introduce caching system #472

@luxass

The pipeline supports caching route outputs to avoid reprocessing unchanged files. This is especially useful when running the pipeline repeatedly during development, or when only a subset of files has changed.

Cache Store

A cache store handles persisting and retrieving cached route outputs. The pipeline ships with an in-memory implementation.

import { definePipeline, createMemoryCacheStore } from "@ucdjs/pipelines";

// mySource and myRoute are assumed to be defined elsewhere.
const cacheStore = createMemoryCacheStore();

const pipeline = definePipeline({
  versions: ["16.0.0"],
  source: mySource,
  cacheStore,
  routes: [myRoute],
});

await pipeline.run();           // First run: cache miss
await pipeline.run();           // Second run: cache hit
await pipeline.run({ cache: false }); // Force recompute

Cache Keys

Cache entries are keyed by a combination of factors to ensure correctness. If any of them changes, the cached entry no longer applies and the route is recomputed.

interface CacheKey {
  routeId: string;                    // Which route produced this
  version: string;                    // Unicode version
  inputHash: string;                  // Hash of file content
  artifactHashes: Record<string, string>; // Hashes of consumed artifacts
}

The artifact hashes ensure that if an upstream artifact changes, all routes that consume it will recompute.
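
As a rough illustration of how such a key could be turned into a single lookup string, here is a minimal sketch. `serializeCacheKey` is a hypothetical helper and not part of the proposed API; the only thing taken from this issue is the `CacheKey` shape. Sorting the artifact ids is an assumption made so that two keys with the same contents always serialize identically.

```typescript
// Shape copied from the CacheKey interface above.
interface CacheKey {
  routeId: string;
  version: string;
  inputHash: string;
  artifactHashes: Record<string, string>;
}

// Hypothetical helper (assumption): produce one stable string per CacheKey.
function serializeCacheKey(key: CacheKey): string {
  // Sort artifact ids so that property insertion order does not affect the result.
  const artifacts = Object.keys(key.artifactHashes)
    .sort()
    .map((id) => [id, key.artifactHashes[id]]);

  // A JSON array avoids ambiguity that plain string concatenation could introduce.
  return JSON.stringify([key.routeId, key.version, key.inputHash, artifacts]);
}
```

With this approach, changing any artifact hash yields a different string, so a route that consumes a changed upstream artifact would miss the cache.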

Per-Route Cache Control

Individual routes can opt out of caching if their output depends on external factors that the cache key does not capture.

const volatileRoute = definePipelineRoute({
  id: "volatile",
  filter: byName("SomeFile.txt"),
  cache: false, // Never cache this route
  parser: async function* (ctx) { /* ... */ },
  resolver: async (ctx, rows) => { /* ... */ },
});
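
One way the run-level `cache: false` option and the per-route `cache: false` flag might combine is sketched below. `shouldUseCache`, `RunOptions`, and `RouteCacheConfig` are hypothetical names used only for illustration, and the rule that either flag disables caching is an assumption rather than confirmed pipeline behavior.

```typescript
// Hypothetical shapes (assumptions), reduced to the fields relevant here.
interface RunOptions {
  cache?: boolean;
}

interface RouteCacheConfig {
  id: string;
  cache?: boolean;
}

// Hypothetical helper: decide whether a route should consult the cache store.
function shouldUseCache(
  hasCacheStore: boolean,
  route: RouteCacheConfig,
  options: RunOptions = {},
): boolean {
  // Without a configured store there is nothing to read from or write to.
  if (!hasCacheStore) return false;
  // run({ cache: false }) forces a recompute for every route.
  if (options.cache === false) return false;
  // A route declared with cache: false is never cached.
  return route.cache !== false;
}
```

Under these assumptions caching is on by default whenever a store is configured, and both switches can only turn it off.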

Custom Cache Stores

The CacheStore interface can be implemented for different storage backends, such as the filesystem or Redis.

interface CacheStore {
  get(key: CacheKey): Promise<CacheEntry | null>;
  set(entry: CacheEntry): Promise<void>;
  delete(key: CacheKey): Promise<boolean>;
  clear(): Promise<void>;
  stats(): Promise<CacheStats>;
}
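
To make the interface more concrete, here is a minimal sketch of a store backed by a `Map`. The `CacheEntry` and `CacheStats` shapes are not specified in this issue, so the fields used below (`key`, `output`, `entries`, `hits`, `misses`) are assumptions, as are the `MapCacheStore` and `keyToString` names. A filesystem or Redis store would presumably follow the same pattern with a different backing storage.

```typescript
interface CacheKey {
  routeId: string;
  version: string;
  inputHash: string;
  artifactHashes: Record<string, string>;
}

// Assumed shapes: the issue does not define CacheEntry or CacheStats.
interface CacheEntry {
  key: CacheKey;
  output: unknown;
}

interface CacheStats {
  entries: number;
  hits: number;
  misses: number;
}

interface CacheStore {
  get(key: CacheKey): Promise<CacheEntry | null>;
  set(entry: CacheEntry): Promise<void>;
  delete(key: CacheKey): Promise<boolean>;
  clear(): Promise<void>;
  stats(): Promise<CacheStats>;
}

// Hypothetical helper: stable string form of a key, used as the Map key.
function keyToString(key: CacheKey): string {
  const artifacts = Object.keys(key.artifactHashes)
    .sort()
    .map((id) => [id, key.artifactHashes[id]]);
  return JSON.stringify([key.routeId, key.version, key.inputHash, artifacts]);
}

class MapCacheStore implements CacheStore {
  private entries = new Map<string, CacheEntry>();
  private hits = 0;
  private misses = 0;

  async get(key: CacheKey): Promise<CacheEntry | null> {
    const entry = this.entries.get(keyToString(key)) ?? null;
    // Track hit and miss counts so stats() can report them.
    if (entry) this.hits++;
    else this.misses++;
    return entry;
  }

  async set(entry: CacheEntry): Promise<void> {
    this.entries.set(keyToString(entry.key), entry);
  }

  async delete(key: CacheKey): Promise<boolean> {
    // Resolves to true only if an entry was actually removed.
    return this.entries.delete(keyToString(key));
  }

  async clear(): Promise<void> {
    this.entries.clear();
  }

  async stats(): Promise<CacheStats> {
    return { entries: this.entries.size, hits: this.hits, misses: this.misses };
  }
}
```

An instance of such a class could then be passed as `cacheStore` to `definePipeline`, assuming the pipeline accepts any object that satisfies `CacheStore`.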
