@deepfates
Owner

What

  • Add JSONL-based checkpointing and workspace
    • FsStore: content-addressed objects under <workspace>/objects and checkpoint manifests under <workspace>/checkpoints
    • Checkpoint manifest fields:
      • id, createdAt, schemaVersion, parentId
      • sourceRefs: provenance entries (e.g., { kind: "twitter", uri })
      • inputs.itemsRef: JSONL of all normalized ContentItem
      • transforms: entries for filter and grouping with config, inputRef, outputRef, and stats
      • materialized: threadsRef and conversationsRef
      • decisionsRef: optional JSONL of decisions
  • Wire checkpoint creation into the CLI run
    • Defaults the workspace to a .splice directory inside the --out directory (overridable via --workspace)
    • Writes items JSONL, filtered items JSONL, threads JSON, and conversations JSON; saves a manifest chained to the previous checkpoint
    • Still produces standard outputs based on --format
  • Decisions support (foundation for interactive UI)
    • src/core/decisions.ts: pure helpers to fold a decisions JSONL stream to the latest-per-id statuses (unread/export/skip/custom), group by status, and filter selected items
    • CLI flags:
      • --decisions-import : import an existing decisions JSONL into the checkpoint
      • --set-status with --ids ... and/or --ids-file : programmatically mark items in this run (appends to decisions JSONL)
  • Entry forwarder updated so npx tsx splice.ts runs the modular CLI (ensures checkpoint code path is used during dev)
  • Why
    • Establishes persistent, diffable artifacts between pipeline stages without reprocessing
    • Enables resumability and fan-out to future outputs
    • Decisions JSONL is the building block for an interactive review UI (spreadsheet-like), with append-only logs and simple merges
  • How to test
    • Run:
      • npx tsx splice.ts --source --out ./out --format json --log-level info
      • Inspect workspace: ./out/.splice
        • Checkpoints: ls ./out/.splice/checkpoints
        • Objects: ls ./out/.splice/objects
        • Open manifest: jq . ./out/.splice/checkpoints/.json
      • Decision example:
        • npx tsx splice.ts --source --out ./out --format json --set-status export --ids 123 456
        • Manifest should include decisionsRef; decisions JSONL stored under objects/
  • Notes
    • No breaking changes to existing CLI outputs
    • --checkpoint flag is parsed but not yet used to read from prior checkpoints (planned)
  • Next
    • Add a minimal UI (HTML or TUI) to view items in a spreadsheet-like list with sorting/filtering and status toggles
    • Add endpoints/CLI commands to read a checkpoint and apply decisions to materialize selected sets
    • Incremental ingest cursors
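
The manifest fields listed under What could be typed roughly as below. This is a sketch under assumptions, not the contents of src/core/store.ts: the field names come from the description, while the concrete types, the TransformEntry shape, the use of SHA-256, and the objectRef and chainManifest helpers are hypothetical names introduced only for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes inferred from the PR description; the real types may differ.
interface SourceRef { kind: string; uri: string; }

interface TransformEntry {
  name: string;                 // assumed field, e.g. "filter" or "grouping"
  config: unknown;
  inputRef: string;
  outputRef: string;
  stats?: Record<string, number>;
}

interface CheckpointManifest {
  id: string;
  createdAt: string;            // assumed to be an ISO-8601 timestamp
  schemaVersion: number;
  parentId: string | null;      // previous checkpoint in the chain, if any
  sourceRefs: SourceRef[];
  inputs: { itemsRef: string };
  transforms: TransformEntry[];
  materialized?: { threadsRef?: string; conversationsRef?: string };
  decisionsRef?: string;
}

// Content addressing: the reference is derived from the bytes themselves,
// so identical content always maps to the same object name.
function objectRef(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Build a manifest that points back at its parent (hypothetical id scheme).
function chainManifest(
  parent: CheckpointManifest | null,
  itemsJsonl: string,
  now: Date = new Date(),
): CheckpointManifest {
  const itemsRef = objectRef(itemsJsonl);
  const createdAt = now.toISOString();
  return {
    id: objectRef(createdAt + ":" + itemsRef).slice(0, 16),
    createdAt,
    schemaVersion: 1,
    parentId: parent ? parent.id : null,
    sourceRefs: [{ kind: "twitter", uri: "file:///hypothetical/archive" }],
    inputs: { itemsRef },
    transforms: [],
  };
}
```

Because each reference is a digest of the stored bytes, two runs that produce the same items JSONL would share one object, which is what makes the artifacts cheap to diff between checkpoints.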

…isions import/set-status; wire into CLI and forwarder
@deepfates deepfates requested a review from Copilot October 19, 2025 01:36

Copilot AI left a comment

Pull Request Overview

This PR introduces checkpointing and decisions capabilities to the splice tool. It refactors the monolithic splice.ts into a modular CLI with persistent workspace artifacts (JSONL-based), enabling resumable pipelines and laying the foundation for an interactive UI to review and curate items.

  • Adds JSONL-based checkpointing system with content-addressed storage (FsStore) under <workspace>/objects and manifests under <workspace>/checkpoints
  • Introduces decisions JSONL to track manual statuses (unread/export/skip) per item for interactive review workflows
  • Refactors entry point (splice.ts) to a thin forwarder that loads the modular CLI (src/cli/splice.ts), ensuring the checkpoint code path runs during dev
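
The decisions log is described as an append-only JSONL stream folded to the latest status per id. A minimal sketch of that fold follows; the DecisionRecord shape, the function names, and the choice to treat items without a decision as "unread" are assumptions rather than the actual exports of src/core/decisions.ts.

```typescript
// Hypothetical record shape: one JSON object per line of the decisions log.
interface DecisionRecord { id: string; status: string; }

// Parse a JSONL string, skipping blank lines.
function parseDecisions(jsonl: string): DecisionRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as DecisionRecord);
}

// Fold the stream: later records overwrite earlier ones for the same id.
function foldDecisions(records: Iterable<DecisionRecord>): Map<string, string> {
  const latest = new Map<string, string>();
  for (const r of records) latest.set(r.id, r.status);
  return latest;
}

// Group ids by their latest status.
function groupByStatus(latest: Map<string, string>): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const [id, status] of latest) {
    const bucket = groups.get(status) ?? [];
    bucket.push(id);
    groups.set(status, bucket);
  }
  return groups;
}

// Keep only items whose latest status matches; undecided items count as "unread" (assumption).
function selectItems<T extends { id: string }>(
  items: T[],
  latest: Map<string, string>,
  status: string,
): T[] {
  return items.filter((item) => (latest.get(item.id) ?? "unread") === status);
}
```

Since the fold only ever looks at the last record per id, merging two logs amounts to concatenating them in the intended order and folding again.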

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/core/types.ts | Adds CLI options for workspace, checkpoint, decisions import, and status management |
| src/core/store.ts | Implements FsStore with content-addressed objects, checkpoint manifests, and helpers for JSONL persistence |
| src/core/decisions.ts | Pure functions for folding decision streams, applying statuses to items, and generating decision records |
| src/cli/splice.ts | Wires checkpoint creation into CLI run; stores artifacts, processes decisions flags, and chains manifests |
| splice.ts | Refactored to a thin forwarder that loads TypeScript or compiled JavaScript CLI entry |

Comment on lines 76 to 79
function stableStringify(value: unknown): string {
// Deterministic JSON stringify (sort object keys)
const seen = new WeakSet<object>();
const stringify = (v: any): string => {

Copilot AI Oct 19, 2025

The stringify inner function uses any type for parameter v. Consider using unknown for better type safety.
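
Only the first lines of stableStringify are quoted in this thread, so the body below is a reconstruction under assumptions, shown to illustrate how the inner function can take unknown and narrow it instead of using any. It sorts object keys, skips undefined-valued properties the way JSON.stringify does, and rejects cycles; it does not honor toJSON, which the real helper may or may not do.

```typescript
function stableStringify(value: unknown): string {
  // Tracks the objects on the current path so cycles can be detected.
  const seen = new WeakSet<object>();

  const stringify = (v: unknown): string => {
    if (v === null || typeof v !== "object") {
      // Primitives: JSON.stringify returns undefined for undefined/functions,
      // which is mapped to "null" here (an assumption for this sketch).
      return JSON.stringify(v) ?? "null";
    }
    if (seen.has(v)) {
      throw new TypeError("stableStringify: circular reference");
    }
    seen.add(v);
    let out: string;
    if (Array.isArray(v)) {
      out = "[" + v.map((el) => stringify(el)).join(",") + "]";
    } else {
      const obj = v as Record<string, unknown>;
      const keys = Object.keys(obj)
        .filter((k) => obj[k] !== undefined)
        .sort();
      out =
        "{" +
        keys.map((k) => JSON.stringify(k) + ":" + stringify(obj[k])).join(",") +
        "}";
    }
    // Remove after visiting so shared (non-circular) references are allowed.
    seen.delete(v);
    return out;
  };

  return stringify(value);
}
```

With unknown, every branch has to narrow the value before touching it, so the single remaining cast (to Record<string, unknown>) is explicit and local.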

Copilot uses AI. Check for mistakes.
Comment on lines 198 to 201
for await (const item of iterable as any) {
const line = JSON.stringify(item) + "\n";
await fh.write(line);
hashUpdate(h, line);

Copilot AI Oct 19, 2025

Type casting to any bypasses type safety. Consider using a proper type guard or conditional type handling for the iterable.

Suggested change
- for await (const item of iterable as any) {
-   const line = JSON.stringify(item) + "\n";
-   await fh.write(line);
-   hashUpdate(h, line);
+ if (typeof (iterable as AsyncIterable<T>)[Symbol.asyncIterator] === "function") {
+   for await (const item of iterable as AsyncIterable<T>) {
+     const line = JSON.stringify(item) + "\n";
+     await fh.write(line);
+     hashUpdate(h, line);
+   }
+ } else if (typeof (iterable as Iterable<T>)[Symbol.iterator] === "function") {
+   for (const item of iterable as Iterable<T>) {
+     const line = JSON.stringify(item) + "\n";
+     await fh.write(line);
+     hashUpdate(h, line);
+   }
+ } else {
+   throw new TypeError("putJSONL: Provided value is not Iterable or AsyncIterable");
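
A possible alternative to both the cast and the branching: for await...of also accepts synchronous iterables, so a parameter typed as Iterable<T> | AsyncIterable<T> can be consumed by a single loop. The sketch below assumes that approach; jsonlDigest is a hypothetical name, and it accumulates text in memory instead of writing to a file handle as putJSONL does.

```typescript
import { createHash } from "node:crypto";

// Serialize items as JSONL and hash the exact bytes that would be written.
async function jsonlDigest<T>(
  iterable: Iterable<T> | AsyncIterable<T>,
): Promise<{ text: string; hash: string; count: number }> {
  const h = createHash("sha256");
  let text = "";
  let count = 0;
  // One loop covers arrays, generators, and async generators alike.
  for await (const item of iterable) {
    const line = JSON.stringify(item) + "\n";
    text += line;
    h.update(line);
    count++;
  }
  return { text, hash: h.digest("hex"), count };
}
```

One behavioral difference to keep in mind: iterating a synchronous iterable with for await awaits each element, so an iterable of promises would be unwrapped before serialization.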

async saveCheckpoint(manifest: CheckpointManifest): Promise<string> {
await this.init();
const id = manifest.id || this.generateCheckpointId(manifest);

Copilot AI Oct 19, 2025

Empty string '' is falsy and will trigger ID generation even if explicitly set. Use manifest.id ?? this.generateCheckpointId(manifest) to handle only nullish values.

Suggested change
- const id = manifest.id || this.generateCheckpointId(manifest);
+ const id = manifest.id ?? this.generateCheckpointId(manifest);
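
The difference between the two operators can be seen in isolation. In this sketch generateId is a hypothetical stand-in for generateCheckpointId, whose real behavior is not shown in this thread.

```typescript
// Hypothetical stand-in for the store's id generator.
function generateId(): string {
  return "generated-id";
}

// `||` falls back on any falsy value, including the empty string.
function idWithOr(id: string | undefined): string {
  return id || generateId();
}

// `??` falls back only on null or undefined.
function idWithNullish(id: string | undefined): string {
  return id ?? generateId();
}
```

Note that with ?? an explicitly empty id is kept as-is, so if an empty id is not meaningful for a checkpoint file name, the caller would probably want to validate it separately.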

const source = path.resolve(opts.source);
const outDir = path.resolve(opts.out);
const workspaceDir = path.resolve(
(opts as any).workspace || path.join(outDir, ".splice"),

Copilot AI Oct 19, 2025

Using (opts as any).workspace bypasses type safety. Define workspace and checkpoint in the CLIOptions type (they appear to be added in types.ts) and remove the cast.

Suggested change
- (opts as any).workspace || path.join(outDir, ".splice"),
+ opts.workspace || path.join(outDir, ".splice"),

Comment on lines 254 to 256
const decisionsPath = (opts as any).decisionsImport as
| string
| undefined;

Copilot AI Oct 19, 2025

Multiple uses of (opts as any) throughout this section bypass type safety. The CLIOptions type should include decisionsImport, setStatus, ids, and idsFile fields to avoid casting.
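
A sketch of what declaring these fields on the options type might look like, so the casts become unnecessary. The field names follow the ones mentioned in the review comments; their exact types and optionality, and the resolveWorkspace and resolveIds helpers, are assumptions rather than the definitions in src/core/types.ts.

```typescript
import * as path from "node:path";

// Assumed shape: only the fields discussed in this review are listed.
interface CLIOptions {
  source: string;
  out: string;
  workspace?: string;
  checkpoint?: string;
  decisionsImport?: string;
  setStatus?: string;
  ids?: string[];
  idsFile?: string;
}

// Workspace defaults to ".splice" inside the output directory.
function resolveWorkspace(opts: CLIOptions): string {
  const outDir = path.resolve(opts.out);
  return path.resolve(opts.workspace || path.join(outDir, ".splice"));
}

// With `ids` typed as string[] | undefined, no Array.isArray check or cast is needed.
function resolveIds(opts: CLIOptions): string[] {
  return opts.ids ?? [];
}
```

Once the fields are part of the type, a misspelled option name becomes a compile-time error instead of a silent undefined.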

Comment on lines 265 to 267
let ids: string[] = Array.isArray((opts as any).ids)
? ((opts as any).ids as string[])
: [];

Copilot AI Oct 19, 2025

Redundant type casting. If opts.ids is typed as string[] in CLIOptions, this can be simplified to let ids: string[] = opts.ids || [];.

Suggested change
- let ids: string[] = Array.isArray((opts as any).ids)
-   ? ((opts as any).ids as string[])
-   : [];
+ let ids: string[] = (opts as any).ids || [];

@deepfates deepfates merged commit 654b3cf into main Oct 19, 2025
9 checks passed