When a worker fails unexpectedly (API error, timeout, kill), its worker_runs record stays in "running" indefinitely, so the tracking system shows phantom progress for dead workers.
Reproduction:
- Spawn a worker that fails (API error, network issue)
- Worker dies
- Check the DB:
SELECT * FROM worker_runs WHERE status='running';
- The dead worker's record is still returned; its status never updates to "failed"
Expected: Worker execution is wrapped in error handling that marks the DB record as "failed" on any exception.
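A minimal sketch of the expected behavior, assuming the tracking DB is SQLite (the workaround query uses SQLite's datetime()) and is accessed through Python's sqlite3 module. The function name run_tracked, the id column, and the "completed" status value are hypothetical and do not come from the actual codebase:

```python
import sqlite3


def run_tracked(conn: sqlite3.Connection, run_id: int, work):
    """Run work() and record the final status of the worker_runs row.

    Assumptions (hypothetical, not from the real schema): rows are keyed
    by an `id` column and a successful run ends in status 'completed'.
    """
    try:
        result = work()
    except BaseException:
        # BaseException also covers KeyboardInterrupt and SystemExit,
        # so Ctrl-C or sys.exit() inside the worker is recorded too.
        conn.execute(
            "UPDATE worker_runs "
            "SET status='failed', completed_at=datetime('now') "
            "WHERE id=? AND status='running'",
            (run_id,),
        )
        conn.commit()
        raise  # re-raise so the caller still sees the original error
    conn.execute(
        "UPDATE worker_runs "
        "SET status='completed', completed_at=datetime('now') "
        "WHERE id=? AND status='running'",
        (run_id,),
    )
    conn.commit()
    return result
```

A handler like this only runs if the process survives long enough to execute it. A hard kill (for example SIGKILL) cannot be caught in-process, so a stale-run cleanup would probably still be needed for that case.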
Workaround: manually mark any run that has been "running" for more than 10 minutes as failed:
UPDATE worker_runs SET status='failed', completed_at=datetime('now') WHERE status='running' AND started_at < datetime('now', '-10 minutes');
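Until a fix lands, the workaround query could be run periodically instead of by hand. The sketch below wraps the same UPDATE in a hypothetical helper, reap_stale_runs, using Python's sqlite3 module. It assumes started_at is stored in the text format that SQLite's datetime('now') produces (UTC, YYYY-MM-DD HH:MM:SS):

```python
import sqlite3


def reap_stale_runs(conn: sqlite3.Connection, max_age_minutes: int = 10) -> int:
    """Mark 'running' rows older than max_age_minutes as 'failed'.

    Returns the number of rows that were updated.
    """
    # SQLite accepts the time modifier as an ordinary bound string,
    # e.g. '-10 minutes', so the threshold can be parameterized.
    modifier = f"-{int(max_age_minutes)} minutes"
    cur = conn.execute(
        "UPDATE worker_runs "
        "SET status='failed', completed_at=datetime('now') "
        "WHERE status='running' AND started_at < datetime('now', ?)",
        (modifier,),
    )
    conn.commit()
    return cur.rowcount
```

Like the manual query, this cannot tell a dead worker from a healthy one that has legitimately been running longer than the threshold, so the cutoff should exceed the longest expected run.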