Description
My review database is getting large (over 5000 reviews, 177 megabytes), so I am thinking about restructuring things so that the metadata tables do not store the prompt and review content inline, which should improve SQLite scan performance.
Schema Refactoring: Separate Large Content from Metadata Tables
Problem
The review_jobs and reviews tables store large text content (prompts,
diffs, review output) inline alongside small metadata columns. As the database
grows (currently 177MB), this has several consequences:
- ListJobs pulls rv.output for every row just to compute verdicts at query time via ParseVerdict(). Review output can be 10-100KB+.
- ListJobs pulls j.prompt for every row. The TUI queue view only uses it to check prompt != "" for running jobs (to decide if the prompt view is available). Prompts can be 5-20KB+.
- SQLite B-tree leaf pages carry per-value overhead even when large values overflow to separate pages. Scanning the main table touches more pages than necessary for metadata-only queries.
- PostgreSQL TOAST detoasting has real CPU overhead, and wide rows hurt HOT updates and index-only scans.
- VACUUM effectiveness is reduced when large and small data share the same table structure.
Proposed Schema Changes
New tables
-- Large content from review_jobs
CREATE TABLE job_content (
job_id INTEGER PRIMARY KEY REFERENCES review_jobs(id),
prompt TEXT,
diff_content TEXT,
output_prefix TEXT
);
-- Large content from reviews
CREATE TABLE review_content (
review_id INTEGER PRIMARY KEY REFERENCES reviews(id),
prompt TEXT NOT NULL,
output TEXT NOT NULL
);

Columns to remove
- review_jobs: drop prompt, diff_content, output_prefix
- reviews: drop prompt, output
New column on reviews
ALTER TABLE reviews ADD COLUMN verdict TEXT; -- 'P' or 'F', computed at write time

Storing the verdict at review creation time (in CompleteJob) eliminates the need to fetch rv.output in ListJobs entirely.
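A minimal sketch of that write path, with parseVerdict standing in for the existing ParseVerdict helper (the "VERDICT: PASS" marker below is an assumed placeholder, not the real parsing rule):

```go
package storage

import (
	"database/sql"
	"strings"
)

// parseVerdict is a local stand-in for the existing ParseVerdict helper; the
// "VERDICT: PASS" marker is an assumed placeholder for the real parsing rules.
func parseVerdict(output string) string {
	if strings.Contains(output, "VERDICT: PASS") {
		return "P"
	}
	return "F"
}

// storeVerdict shows the proposed write path: the verdict is computed once in
// CompleteJob and persisted with the review metadata, so ListJobs never needs
// to load the output text again.
func storeVerdict(tx *sql.Tx, reviewID int64, output string) error {
	_, err := tx.Exec(`UPDATE reviews SET verdict = ? WHERE id = ?`,
		parseVerdict(output), reviewID)
	return err
}
```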
Query Impact Analysis
review_jobs content columns
| Function | File | Read/Write | Columns | Migration impact |
|---|---|---|---|---|
| EnqueueJob | jobs.go:794 | Write | prompt, diff_content, output_prefix | Write to job_content instead |
| SaveJobPrompt | jobs.go:927 | Write | prompt | Update job_content.prompt |
| ClaimJob | jobs.go:880 | Read | prompt, diff_content | JOIN job_content (worker needs both) |
| CompleteJob | jobs.go:962 | Read | output_prefix | JOIN job_content (needs prefix) |
| ListJobs | jobs.go:1154 | Read | prompt | Stop selecting it. Add has_prompt bool column or use EXISTS subquery. TUI only checks prompt != "" for running jobs. |
reviews content columns
| Function | File | Read/Write | Columns | Migration impact |
|---|---|---|---|---|
| CompleteJob | jobs.go:990 | Write | prompt, output | Write to review_content instead. Also compute and store reviews.verdict. |
| GetReviewByJobID | reviews.go:20 | Read | prompt, output | JOIN review_content (detail view needs both) |
| GetReviewByCommitSHA | reviews.go:98 | Read | prompt, output | JOIN review_content |
| GetAllReviewsForGitRef | reviews.go:169 | Read | prompt, output | JOIN review_content (synthesis context) |
| GetRecentReviewsForRepo | reviews.go:199 | Read | prompt, output | JOIN review_content |
| GetReviewByID | reviews.go:280 | Read | prompt, output | JOIN review_content |
| GetBatchReviews | ci.go:227 | Read | output | JOIN review_content |
| ListJobs | jobs.go:1154 | Read | output (via rv JOIN) | Stop selecting it. Read reviews.verdict directly instead. |
Key optimization: ListJobs after migration
The ListJobs query becomes a pure metadata scan:
SELECT j.id, j.repo_id, j.commit_id, j.git_ref, j.branch, j.agent,
j.reasoning, j.status, j.enqueued_at, j.started_at, j.finished_at,
j.worker_id, j.error, j.has_prompt, j.retry_count,
COALESCE(j.agentic, 0), r.root_path, r.name, c.subject,
rv.addressed, rv.verdict,
j.source_machine_id, j.uuid, j.model, j.job_type, j.review_type
FROM review_jobs j
JOIN repos r ON r.id = j.repo_id
LEFT JOIN commits c ON c.id = j.commit_id
LEFT JOIN reviews rv ON rv.job_id = j.id

No large text columns touched. The verdict is pre-computed. has_prompt replaces the full prompt text for the "is prompt viewable?" check.
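Keeping has_prompt accurate means setting it wherever the prompt is written. A hypothetical sketch of the SaveJobPrompt write path after the split (names follow this proposal, not the current code):

```go
package storage

import "database/sql"

// saveJobPrompt sketches keeping has_prompt in step with job_content once the
// prompt moves out of review_jobs. The real SaveJobPrompt lives in jobs.go;
// the table and column names here follow this proposal.
func saveJobPrompt(tx *sql.Tx, jobID int64, prompt string) error {
	if _, err := tx.Exec(
		`UPDATE job_content SET prompt = ? WHERE job_id = ?`,
		prompt, jobID); err != nil {
		return err
	}
	// has_prompt is what ListJobs and the TUI queue view read instead of the
	// prompt text itself.
	_, err := tx.Exec(
		`UPDATE review_jobs SET has_prompt = ? WHERE id = ?`,
		prompt != "", jobID)
	return err
}
```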
Migration Strategy
SQLite
roborev statically links a modern SQLite (3.35+), so ALTER TABLE DROP COLUMN
is available.
- Begin transaction
- Create job_content and review_content tables
- INSERT INTO job_content SELECT id, prompt, diff_content, output_prefix FROM review_jobs
- INSERT INTO review_content SELECT rv.id, rv.prompt, rv.output FROM reviews rv
- Add has_prompt column: ALTER TABLE review_jobs ADD COLUMN has_prompt INTEGER NOT NULL DEFAULT 0, then UPDATE review_jobs SET has_prompt = 1 WHERE prompt IS NOT NULL AND prompt != ''
- Add verdict column: ALTER TABLE reviews ADD COLUMN verdict TEXT
- Compute and store verdicts (requires application-level logic since ParseVerdict is in Go -- iterate rows in Go and batch update; see the sketch below)
- ALTER TABLE review_jobs DROP COLUMN prompt
- ALTER TABLE review_jobs DROP COLUMN diff_content
- ALTER TABLE review_jobs DROP COLUMN output_prefix
- ALTER TABLE reviews DROP COLUMN prompt
- ALTER TABLE reviews DROP COLUMN output
- Commit
- VACUUM to reclaim space (outside transaction)
The VACUUM after dropping the large columns is the big win -- it rewrites the
entire database, compacting the main tables to just metadata.
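A minimal sketch of the application-level verdict backfill, assuming review_content is already populated and taking the parser as a parameter (the real ParseVerdict lives in the existing codebase). Shown with its own transaction for clarity; in practice it would run as part of the migration flow:

```go
package storage

import "database/sql"

// backfillVerdicts sketches the "iterate rows in Go and batch update" step:
// read each review's output, compute the verdict in Go, write it back.
func backfillVerdicts(db *sql.DB, parse func(output string) string) error {
	rows, err := db.Query(`SELECT review_id, output FROM review_content`)
	if err != nil {
		return err
	}
	// Buffer the updates first so no read cursor is open while writing
	// (avoids busy errors on a single-writer SQLite database).
	type upd struct {
		id      int64
		verdict string
	}
	var pending []upd
	for rows.Next() {
		var id int64
		var output string
		if err := rows.Scan(&id, &output); err != nil {
			rows.Close()
			return err
		}
		pending = append(pending, upd{id, parse(output)})
	}
	if err := rows.Err(); err != nil {
		rows.Close()
		return err
	}
	rows.Close()

	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds
	for _, u := range pending {
		if _, err := tx.Exec(`UPDATE reviews SET verdict = ? WHERE id = ?`,
			u.verdict, u.id); err != nil {
			return err
		}
	}
	return tx.Commit()
}
```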
PostgreSQL
Postgres can ALTER TABLE DROP COLUMN natively, but dropped columns leave
dead space until VACUUM FULL or a table rewrite.
- Begin transaction
- Create job_content and review_content tables
- INSERT INTO ... SELECT for both
- Add has_prompt and verdict columns
- Populate has_prompt and verdict
- ALTER TABLE review_jobs DROP COLUMN prompt, DROP COLUMN diff_content, DROP COLUMN output_prefix
- ALTER TABLE reviews DROP COLUMN prompt, DROP COLUMN output
- Commit
- VACUUM FULL review_jobs; VACUUM FULL reviews; (outside transaction, requires exclusive lock)
Schema version
Bump the schema version for both backends. The migration should be
backward-compatible in the sense that old code will fail gracefully (missing
columns) rather than silently corrupt data.
Sync Considerations
The sync system (internal/storage/sync.go) serializes jobs and reviews for
cross-machine sync. The sync format will need to include content from the new
tables. This should be handled by JOINing the content tables in the sync export
queries and writing to both tables on sync import.
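A sketch of what the export side could look like, with an abbreviated column list and an assumed id cursor; the import side would do the inverse, writing both tables in one transaction:

```go
package storage

import "database/sql"

// exportReviewsForSync sketches the sync export after the split: metadata
// comes from reviews, content from the JOINed review_content row, so the
// sync payload itself is unchanged. Column list and cursor are illustrative.
func exportReviewsForSync(db *sql.DB, afterID int64) (*sql.Rows, error) {
	return db.Query(`
		SELECT rv.id, rv.job_id, rv.verdict, rc.prompt, rc.output
		FROM reviews rv
		JOIN review_content rc ON rc.review_id = rv.id
		WHERE rv.id > ?
		ORDER BY rv.id`, afterID)
}
```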
Schema version gating
EnsureSchema in postgres.go hard-fails when the database schema version is
newer than the client's pgSchemaVersion:
return fmt.Errorf("database schema version %d is newer than supported version %d", ...)

This means all sync clients must be upgraded before the Postgres migration runs, or they will stop syncing entirely. Options:
- Upgrade-first: Require all machines to upgrade roborev before any one of them runs the Postgres migration. The new code can handle both old and new schemas (check for content tables, fall back to inline columns if missing).
- Two-phase migration: First release adds the new tables and starts writing to both locations (old columns + new tables) without bumping the schema version or dropping columns. Second release bumps the version, drops the old columns, and stops dual-writing.
Rolling upgrade strategy (recommended)
The existing schema version gate provides clean rolling upgrade behavior:
- First client upgrades: Runs Postgres migration (creates content tables, drops old columns, bumps schema version). Also runs local SQLite migration.
- Other clients (old version): EnsureSchema fails with version mismatch. The sync worker closes the connection and retries periodically. The daemon continues working locally -- reviews still run, the TUI still works, only sync is paused.
- Other clients upgrade at their own pace: Each runs its local SQLite migration on startup. On next sync attempt, EnsureSchema succeeds (version now matches), sync resumes.
This requires no two-phase migration or dual-write complexity. The schema
version gate already provides the right semantics.
Pre-work: Improve the EnsureSchema error path to log a user-friendly
message: "Sync: upgrade roborev to resume syncing (database schema v%d, client supports v%d)" instead of the current developer-facing error format.
This could ship in any release before the migration.
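As a sketch (illustrative function and variable names; the real check lives in EnsureSchema in postgres.go):

```go
package storage

import "fmt"

// checkSchemaVersion sketches the friendlier gate message proposed above.
func checkSchemaVersion(dbVersion, supported int) error {
	if dbVersion > supported {
		return fmt.Errorf(
			"Sync: upgrade roborev to resume syncing (database schema v%d, client supports v%d)",
			dbVersion, supported)
	}
	return nil
}
```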
TUI Changes
- ReviewJob.Prompt field becomes empty in ListJobs results. Add a HasPrompt bool field to the struct. Update TUI code that checks job.Prompt != "" to check job.HasPrompt instead.
- For running jobs where the TUI displays the prompt directly (before a review exists), fetch from job_content on demand (same as fetching a review).
- ReviewJob.Verdict is already a *string field populated from ListJobs -- just read it from reviews.verdict instead of computing from rv.output.
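A sketch of the struct and check changes, showing only the affected fields:

```go
package storage

// Only the affected fields of ReviewJob (internal/storage/models.go) are
// shown; everything else is elided.
type ReviewJob struct {
	Prompt    string  // no longer populated by ListJobs
	HasPrompt bool    // new: replaces prompt != "" checks in the queue view
	Verdict   *string // now read from reviews.verdict, not parsed from output
}

// promptViewable is what the queue view's availability check becomes.
func promptViewable(j ReviewJob) bool {
	return j.HasPrompt // was: j.Prompt != ""
}
```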
Risks and Rollback
- Data integrity: The migration copies data before dropping columns. If the migration fails partway, the transaction rolls back and nothing changes.
- Rollback path: Keep the old schema version code available for one release cycle. A downgrade migration could re-merge the content tables back (slow but safe).
- Performance during migration: VACUUM can be slow on large databases. The 177MB database should VACUUM in seconds, but larger deployments may take longer. Consider warning the user or running VACUUM in the background.
Scope
This is a standalone change that touches:
- internal/storage/db.go (schema, migrations)
- internal/storage/jobs.go (EnqueueJob, SaveJobPrompt, ClaimJob, CompleteJob, ListJobs)
- internal/storage/reviews.go (all Get* functions)
- internal/storage/ci.go (GetBatchReviews)
- internal/storage/sync.go (export/import)
- internal/storage/postgres.go (Postgres equivalents of all the above)
- internal/storage/models.go (HasPrompt field)
- cmd/roborev/tui.go (HasPrompt checks, verdict from DB)
- internal/daemon/server.go (API responses, if prompt/output are returned)
Estimated at 15-25 files touched depending on test coverage.