Dashboard aggregates with mandatory collect+reduce#29

Merged
NiveditJain merged 12 commits into main from feat/aggregated-evals
Feb 24, 2026
Conversation

NiveditJain (Member) commented Feb 24, 2026

Summary

  • New app.dashboard.aggregate() API for cross-session aggregations on dashboard views, requiring explicit { collect, reduce } for full control over output tables
  • Incremental computation with per-session caching — only new/changed sessions are reprocessed
  • Per-eval recompute — each eval result row has an individual re-run button to recompute a single eval without reprocessing all others
  • Performance fixes (v0.5.5): LRU-bounded runtime cache, incremental dashboard index, server-side filtering & pagination
  • Concurrency limits via batchAll utility to prevent file descriptor exhaustion on large workspaces
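As a rough illustration of the mandatory { collect, reduce } pattern described above, the two phases can be sketched as plain functions. The CollectedSession shape and the "pass" key below are assumptions for the example, not the library's exact types.

```typescript
// Illustrative sketch of the mandatory collect + reduce pair.
interface CollectedSession {
  values: Record<string, number | boolean | string>;
}

// collect runs once per session (its result is cached per session);
// here it just records whether the session passed.
function collect(session: { passed: boolean }): CollectedSession {
  return { values: { pass: session.passed } };
}

// reduce runs over all collected sessions and returns the table rows.
function reduce(collected: CollectedSession[]) {
  const n = collected.length || 1; // avoid division by zero on empty input
  return [{
    Sessions: collected.length,
    "Pass Rate": collected.filter((s) => s.values["pass"] === true).length / n,
  }];
}
```

Because both functions are user-authored, the output columns ("Sessions", "Pass Rate") are entirely under the caller's control, which is the point of dropping auto-reduce.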

Test plan

  • npm run build — no type errors
  • npm run build:evals — dist declarations compile
  • npx vitest run — all 403 tests pass
  • npm run dev -- --evals ./test_evals.js — aggregate table renders with user-defined columns
  • Per-eval re-run button triggers single eval refresh and merges result in-place

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Dashboard Aggregates: cross-session aggregate API with incremental computation, sortable summary tables, per-view aggregate counts, and UI sections with totals, loading/updating and retry states.
    • Per-eval recompute: per-row Rerun to recompute a single evaluation and merge updated results inline.
  • Bug Fixes

    • Dashboard debounce cleanup on unmount to avoid lingering timers.
  • Documentation

    • README and API docs updated with aggregate usage, examples, and new types.

NiveditJain and others added 6 commits February 15, 2026 20:32
- Improved dashboard load times by implementing parallel session file discovery and cache operations, significantly speeding up performance for projects with many sessions.
- Updated cache invalidation and file stat calls to run in parallel, enhancing efficiency.
- UI improvements include immediate loading spinners for eval and enrichment results panels when no initial result is available, and various text overflow fixes for better display of tool names.

Updated CHANGELOG.md to reflect these changes.
…dling

- Introduced a new `batchAll` utility to manage concurrency for async tasks, preventing file descriptor exhaustion during parallel operations.
- Updated various actions (compute-dashboard, run-session-action, run-subagent-enrichments, run-subagent-evals) to utilize `batchAll` for concurrency-limited execution of cache operations and file handling.
- Enhanced cache invalidation in `LocalCacheBackend` to use `batchAll`, improving performance during file deletions.
- Updated CHANGELOG.md to document these changes and improvements.
Update CHANGELOG.md to reference batchAll instead of Promise.allSettled/Promise.all,
matching the actual implementation. Add vitest suite for the batchAll concurrency
utility covering order preservation, bounded concurrency, empty inputs, and rejection handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
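The batchAll utility described in these commits can be approximated as a small worker pool. This is a hedged sketch of the idea (bounded concurrency, order-preserving results), not the repo's actual implementation.

```typescript
// Run async tasks with at most `limit` in flight, preserving input
// order in the results array.
async function batchAll<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  // Spawn at most `limit` workers (fewer if there are fewer tasks).
  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, () => worker()),
  );
  return results;
}
```

Capping in-flight tasks this way is what prevents file descriptor exhaustion when thousands of session files are opened concurrently.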
…filtering (v0.5.5)

Resolves JavaScript heap out-of-memory crashes on large workspaces by:
- Adding LRU eviction (maxSize) to runtimeCache, capping getCachedSessionLog at 20 entries
- Introducing DashboardIndex in globalThis for incremental session tracking (only new/changed/deleted sessions processed)
- Bypassing runtime cache in computeDashboard to allow immediate GC of parsed JSONL data
- Moving filtering and pagination server-side with debounced re-fetches
- Using incremental accumulators for filter metadata instead of unbounded arrays

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
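The LRU eviction added to runtimeCache can be sketched with a Map, whose insertion order makes least-recently-used tracking cheap. The class name and API below are illustrative assumptions, not the repo's code.

```typescript
// Minimal LRU-bounded cache using Map insertion order.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private maxSize: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    // Re-insert to mark this entry as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // Evict the least recently used entry (first in insertion order).
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}
```

With a cap like the 20 entries mentioned above, parsed session logs stay bounded in memory instead of accumulating until the heap overflows.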
Keep incremental DashboardIndex and hashEvalsModule import from feature branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-reduce produced cluttered tables with mixed types. Users now
provide both collect and reduce functions for full control over
aggregate output columns and rows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
coderabbitai bot (Contributor) commented Feb 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds Dashboard Aggregates (v0.5.6): new app.dashboard.aggregate API with per-session collect and cross-session reduce phases, incremental per-session caching/indexing via computeAggregates server action, aggregate registry/types, client UI for sortable aggregate tables, and per-eval recompute support (single-eval rerun bypassing cache).

Changes

Cohort / File(s) Summary
Docs & Versioning
CHANGELOG.md, README.md, docs/api-reference.md, package.json
Bump to 0.5.6 and add full documentation for Dashboard Aggregates API, types, examples, incremental semantics and README changelog entries.
Aggregate Server Action
app/actions/compute-aggregates.ts
New server action computeAggregates(viewName?): session discovery, incremental per-session collect with global index & disk cache, batched concurrent processing, and reduce to produce AggregatePayload and totals.
Dashboard Integration & Views
app/actions/list-dashboard-views.ts, app/dashboard/[viewName]/page.tsx, app/dashboard/page.tsx
Expose aggregateCount and hasDefaultAggregates; conditionally render AggregateClient and surface aggregate counts alongside filters in view listings and pages.
Client Components
app/dashboard/aggregate-client.tsx, app/dashboard/aggregate-section.tsx, app/dashboard/aggregate-custom-table.tsx
Add AggregateClient (fetch & UI states), AggregateSection and AggregateCustomTable (sortable table, formatting, empty state).
Eval/Action Changes (per-eval rerun)
app/actions/run-evals.ts, app/actions/run-session-action.ts, app/actions/run-subagent-evals.ts, app/components/eval-results-panel.tsx
Add optional evalName option to runEvals/runSessionAction/runSubagentEvals to run a single eval (filter items, bypass per-item cache); add per-row rerun UI and merge-updated result behavior into panel.
Registry & Types
lib/evals/dashboard-registry.ts, lib/evals/dashboard-types.ts, lib/evals/index.ts, lib/evals/app.ts
New aggregate registry and APIs (registerAggregate, getAggregatesForView, hasAggregates, clearAggregates); export/add aggregate types (AggregateContext, AggregateDefinition, AggregatePayload, etc.) and builder APIs to register aggregates.
Indexing/key change
app/actions/compute-dashboard.ts
Rename global dashboard index key to __CLAUDEYE_DASHBOARD_INDEX__ and use UUID when session hashing fails to avoid empty-hash collisions.
Examples & Tests
test_evals.js
Add an example view using .aggregate(...) with collect/reduce implementations computing averages, rates, and per-model counts.
Minor client fixes
app/dashboard/dashboard-client.tsx, app/components/eval-results-panel.tsx
Add debounce cleanup on unmount; wire per-row rerun button with global loading state and local rerun UI/indicator.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant AggClient as AggregateClient
    participant Server as computeAggregates
    participant Registry as DashboardRegistry
    participant Storage as GlobalIndex

    User->>AggClient: open dashboard view
    AggClient->>Server: computeAggregates(viewName)
    Server->>Registry: getAggregatesForView(viewName)
    Registry-->>Server: aggregates[]
    Server->>Storage: getOrCreateIndex(viewName,...)
    Storage-->>Server: AggregateIndex

    Note over Server: Phase 1 — Session Discovery
    Server->>Server: discover sessions, hash contents
    Server->>Storage: diff hashes -> changedSessions
    Storage-->>Server: changedSessions

    Note over Server: Phase 2 — Per-Session Collect
    loop per changed session
        Server->>Server: parse session, build AggregateContext
        Server->>Server: aggregate.collect(context)
        Server->>Storage: cache per-session result (fire-and-forget)
    end

    Note over Server: Phase 3 — Reduce
    Server->>Server: aggregate.reduce(collectedSessions)
    Server-->>AggClient: AggregatePayload (aggregates, totals)
    AggClient->>AggClient: render AggregateCustomTable
    User->>AggClient: click column header
    AggClient->>AggClient: toggle sort & re-render
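Phase 1 in the diagram above boils down to a hash diff against the stored index: only sessions whose content hash changed (or which are new) go through collect again. A minimal sketch, with the index shape assumed, could look like:

```typescript
// Compare fresh content hashes against the last-seen index so only
// new/changed sessions are re-collected.
function diffSessions(
  index: Map<string, string>, // session key -> last-seen content hash
  fresh: Array<{ key: string; contentHash: string }>,
): string[] {
  return fresh
    .filter(({ key, contentHash }) => index.get(key) !== contentHash)
    .map(({ key }) => key);
}
```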

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through sessions, one by one,

Collected crumbs until the work was done,
Reduced them down to tables neat and bright,
Sorted columns twinkling in the night,
A rabbit cheers — aggregates take flight!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 23.08%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Description check (❓ Inconclusive): The PR description covers the key changes and includes a test plan, but is missing the required 'Type of Change' checkbox section and the full 'Checklist' section from the template (lint/tsc/test/build checks). Resolution: add the missing 'Type of Change' and 'Checklist' sections and mark the appropriate checkboxes for 'New feature' and the completed testing items.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): The title 'Dashboard aggregates with mandatory collect+reduce' directly and clearly summarizes the main feature added in this PR: a new aggregate API that requires explicit collect and reduce handlers.



Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

This pull request introduces a comprehensive dashboard aggregation system for cross-session analytics, along with significant performance improvements to prevent memory issues in large workspaces. The changes span two versions (0.5.5 and 0.5.6) and include a new public API, incremental computation infrastructure, and server-side filtering/pagination.

Changes:

  • New aggregate API: Adds app.dashboard.aggregate() with mandatory { collect, reduce } pattern for defining cross-session summary tables with full control over output columns and rows
  • Performance optimizations: Implements LRU-bounded runtime cache, incremental dashboard index to avoid reprocessing unchanged sessions, and server-side filtering/pagination to reduce payload sizes
  • Memory safety: Eliminates OOM crashes on large workspaces by capping the runtime cache at 20 entries and bypassing it during dashboard filter computation

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Summary per file:
package.json Version bump to 0.5.6
lib/runtime-cache.ts Adds optional LRU eviction with maxSize parameter
lib/log-entries.ts Applies maxSize limit (20) to session log cache
lib/evals/dashboard-types.ts Defines aggregate types (AggregateContext, CollectedSession, AggregateTableRow, etc.) and serialization types for server-side filtering
lib/evals/dashboard-registry.ts Adds aggregate registry functions (registerAggregate, getAggregatesForView, etc.)
lib/evals/app.ts Exposes aggregate API on DashboardBuilder and DashboardViewBuilder
lib/evals/index.ts Exports new aggregate types for public API
app/actions/compute-dashboard.ts Implements incremental DashboardIndex with server-side filtering and pagination
app/actions/compute-aggregates.ts New server action that runs collect/reduce phases with incremental caching
app/dashboard/dashboard-client.tsx Refactored for server-side filtering with debounced re-fetches
app/dashboard/dashboard-sessions-table.tsx Updated for server-side pagination
app/dashboard/aggregate-client.tsx Client component that fetches and displays aggregates
app/dashboard/aggregate-section.tsx Wrapper component for individual aggregate tables
app/dashboard/aggregate-custom-table.tsx Sortable table component for aggregate results
app/dashboard/page.tsx Integrates AggregateClient for default view
app/dashboard/[viewName]/page.tsx Integrates AggregateClient for named views
app/actions/list-dashboard-views.ts Includes aggregate count in view metadata
docs/api-reference.md Documents aggregate API with examples and type signatures
README.md Adds aggregate feature documentation with examples
CHANGELOG.md Details 0.5.5 and 0.5.6 changes
__tests__/lib/runtime-cache.test.ts Comprehensive LRU eviction tests
test_evals.js Example aggregate usage for testing


Comment on lines +417 to +421
'Pass Rate': collected.filter(s => s.values[`${name}_pass`]).length / collected.length,
'Avg Score': collected.reduce((sum, s) => {
  const v = s.values[`${name}_score`];
  return sum + (typeof v === 'number' ? v : 0);
}, 0) / collected.length,
Copilot AI commented Feb 24, 2026

Division by zero risk when collected array is empty. If no sessions match the aggregate's condition or if all sessions fail during collect, the division by collected.length will result in Infinity. Consider adding a safeguard like const n = collected.length || 1; or checking if the array is empty before performing divisions.

NiveditJain (Member Author) replied:

Same as the README comment — this is a documentation example, not runtime code. The reduce function is fully user-authored, so edge case handling is the user's responsibility.

Comment on lines +234 to +236
return Array.from(evalNames).map(name => ({
'Eval': name,
'Pass Rate': collected.filter(s => s.values[`${name}_pass`]).length / collected.length,
Copilot AI commented Feb 24, 2026

Division by zero risk when collected array is empty. If no sessions match the aggregate's condition or if all sessions fail during collect, the division by collected.length will result in Infinity. Consider adding a safeguard like const n = collected.length || 1; or checking if the array is empty before performing divisions.

Suggested change:
-  return Array.from(evalNames).map(name => ({
-    'Eval': name,
-    'Pass Rate': collected.filter(s => s.values[`${name}_pass`]).length / collected.length,
+  const totalSessions = collected.length || 1;
+  return Array.from(evalNames).map(name => ({
+    'Eval': name,
+    'Pass Rate': collected.filter(s => s.values[`${name}_pass`]).length / totalSessions,

NiveditJain (Member Author) replied:

This is a documentation example, not runtime code. The reduce function is user-authored — edge case handling is the user's responsibility. The AggregateCustomTable component already renders "No data returned from reduce function." when rows are empty. Adding defensive guards to doc examples adds noise to what should be concise illustrative code.

Copilot AI commented Feb 24, 2026

Missing cleanup for debounce timer on component unmount. The debounceRef.current timeout should be cleared when the component unmounts to prevent calling state setters on an unmounted component. Add a cleanup function to clear the timeout on unmount or when the component re-renders.

Suggested change:
useEffect(() => {
  return () => {
    if (debounceRef.current) {
      clearTimeout(debounceRef.current);
    }
  };
}, []);

NiveditJain (Member Author) replied:

Good catch. Fixed in e53914c — added useEffect cleanup that clears the debounce timer on unmount.

Copilot AI commented Feb 24, 2026

Inconsistent naming convention for global registry keys. The key uses __VIZAUDE_DASHBOARD_INDEX__ but the rest of the codebase uses __CLAUDEYE_ prefix for global keys. Consider renaming to __CLAUDEYE_DASHBOARD_INDEX__ for consistency with other registry keys like __CLAUDEYE_EVAL_REGISTRY__, __CLAUDEYE_DASHBOARD_FILTER_REGISTRY__, etc.

Suggested change
const INDEX_KEY = "__CLAUDEYE_DASHBOARD_INDEX__";

NiveditJain (Member Author) replied:

Good catch. Fixed in e53914c — renamed to __CLAUDEYE_DASHBOARD_INDEX__ for consistency.

contentHashes: Map<string, string>;
}

const INDEX_KEY = "__VIZAUDE_AGGREGATE_INDEX__";
Copilot AI commented Feb 24, 2026

Inconsistent naming convention for global registry keys. The key uses __VIZAUDE_AGGREGATE_INDEX__ but the rest of the codebase uses __CLAUDEYE_ prefix for global keys. Consider renaming to __CLAUDEYE_AGGREGATE_INDEX__ for consistency with other registry keys like __CLAUDEYE_EVAL_REGISTRY__, __CLAUDEYE_DASHBOARD_FILTER_REGISTRY__, etc.

Suggested change:
- const INDEX_KEY = "__VIZAUDE_AGGREGATE_INDEX__";
+ const INDEX_KEY = "__CLAUDEYE_AGGREGATE_INDEX__";

NiveditJain (Member Author) replied:

Good catch. Fixed in e53914c — renamed to __CLAUDEYE_AGGREGATE_INDEX__ for consistency.

Comment on lines 28 to 48
Copilot AI commented Feb 24, 2026

Expired cache entries are not removed before checking capacity. When a cache hit occurs but the entry has expired (line 31 check fails), the expired entry remains in the cache Map. If the cache is at maxSize, line 43 will evict the oldest entry instead of replacing the expired one, potentially keeping more entries than maxSize. Consider removing expired entries before inserting new ones, or clearing expired entries when checking capacity.

NiveditJain (Member Author) replied:

This is a non-issue. When an expired entry is found (line 31 check fails), execution falls through to fn(...args), then cache.set(key, ...) at line 48 overwrites the expired entry in-place because it's the same key — the Map size doesn't increase. The only theoretical concern would be a different key's expired entry taking up space, but with a cap of 20 entries and natural replacement on next access, adding a scan-all-entries path on every set would be unnecessary overhead.
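The behavior this reply describes, that a same-key set overwrites an expired entry in place so the Map never grows past the cap, can be illustrated with a toy cached function. The names, TTL check, and eviction order below are assumptions for illustration, not the repo's actual wrapper.

```typescript
// Toy TTL + size-capped cache demonstrating same-key overwrite.
const cache = new Map<string, { value: number; expires: number }>();
const MAX_SIZE = 20;

function cachedSquare(key: string, x: number, ttlMs: number): number {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value; // fresh hit
  const value = x * x; // stand-in for the real expensive computation
  // A set on an existing (possibly expired) key overwrites in place,
  // so Map size does not grow; evict only when inserting a new key
  // into a full cache.
  if (!cache.has(key) && cache.size >= MAX_SIZE) {
    cache.delete(cache.keys().next().value as string); // evict oldest
  }
  cache.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}
```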

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
app/dashboard/page.tsx (1)

69-71: Gate default aggregates render to avoid unnecessary compute/flash.

Right now the default AggregateClient renders even when no aggregates are registered, which still triggers computeAggregates and a brief loading state. Consider plumbing a hasDefaultAggregates (or default aggregate count) from listDashboardViews and conditionally rendering these blocks.

♻️ Example adjustment (requires listDashboardViews to expose a default-aggregate flag)
-  const { views, hasDefaultFilters } = result;
+  const { views, hasDefaultFilters, hasDefaultAggregates } = result;
...
-          <div className="mt-8">
-            <AggregateClient viewName="default" />
-          </div>
+          {hasDefaultAggregates && (
+            <div className="mt-8">
+              <AggregateClient viewName="default" />
+            </div>
+          )}
...
-          <div className="mt-8">
-            <AggregateClient viewName="default" />
-          </div>
+          {hasDefaultAggregates && (
+            <div className="mt-8">
+              <AggregateClient viewName="default" />
+            </div>
+          )}

Also applies to: 89-91

test_evals.js (1)

46-63: Handle empty aggregate input explicitly.

Returning rows when collected is empty yields a table of zeros and bypasses the empty-state message. Consider returning [] to indicate no data.

♻️ Suggested guard for empty aggregates
     reduce: (collected) => {
+      if (collected.length === 0) return [];
       const models = new Map();
       let totalTurns = 0;
       let totalTools = 0;
       let completions = 0;
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/actions/compute-aggregates.ts`:
- Around line 154-165: The loop updates index.contentHashes immediately which
can leave hashes advanced when a subsequent per-session collect fails, causing
stale aggregates; change the flow in the code that iterates over hashResults
(using result.value { key, contentHash, project, file }) to only set
index.contentHashes.set(key, contentHash) after the collect for that key
completes successfully (i.e., after the work that consumes toCompute for that
key finishes), or alternatively collect a provisional list of {key, task,
contentHash} and update index.contentHashes only once the collect step for each
entry succeeds; ensure index.collectedSessions remains consistent with
contentHashes by updating both together after successful collect rather than
before.

In `@app/dashboard/aggregate-custom-table.tsx`:
- Around line 61-71: The header cells rendered in the columns.map currently use
a clickable <th> with onClick(toggleSort) which isn't keyboard-accessible and
lacks aria-sort; make the header interactive and accessible by making the <th>
focusable (add tabIndex={0} or replace with a <button>/role="button"), add an
onKeyDown handler that triggers toggleSort(col) for Enter and Space, and expose
current sort state via aria-sort on the header (compute
"none"/"ascending"/"descending" from your sort state and apply it to the same
<th> rendered in columns.map); keep the ArrowUpDown icon and ensure toggleSort
and the sort state variables are referenced to determine aria-sort.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9698abd and 532895a.

📒 Files selected for processing (16)
  • CHANGELOG.md
  • README.md
  • app/actions/compute-aggregates.ts
  • app/actions/list-dashboard-views.ts
  • app/dashboard/[viewName]/page.tsx
  • app/dashboard/aggregate-client.tsx
  • app/dashboard/aggregate-custom-table.tsx
  • app/dashboard/aggregate-section.tsx
  • app/dashboard/page.tsx
  • docs/api-reference.md
  • lib/evals/app.ts
  • lib/evals/dashboard-registry.ts
  • lib/evals/dashboard-types.ts
  • lib/evals/index.ts
  • package.json
  • test_evals.js

Comment on lines +233 to +239
const filterValues: Record<string, FilterValue> = {};
if (filters.length > 0) {
  const filterSummary = await runAllFilters(rawLines, stats, project.name, sid, filters);
  for (const r of filterSummary.results) {
    if (!r.skipped && !r.error) {
      filterValues[r.name] = r.value;
    }
coderabbitai bot (Contributor) commented Feb 24, 2026

⚠️ Potential issue | 🟠 Major

Always populate filterValues to match the FilterValue contract.
runAllFilters already returns a value even on skipped/error results, but this logic drops those entries. That leaves missing keys in filterValues, which breaks consumers that expect a FilterValue for each filter.

✅ Suggested fix
-              for (const r of filterSummary.results) {
-                if (!r.skipped && !r.error) {
-                  filterValues[r.name] = r.value;
-                }
-              }
+              for (const r of filterSummary.results) {
+                filterValues[r.name] = r.value;
+              }

NiveditJain (Member Author) replied:

This is intentional and matches the same pattern used in compute-dashboard.ts. A skipped filter was never computed — it has no meaningful value. An errored filter has no valid value. The AggregateContext.filterValues type is Record<string, FilterValue>, not Record<string, FilterValue | undefined> — including skipped/errored results would require providing a fabricated value. Collect functions already handle missing keys gracefully (the key simply isn't present). The guard is correct.

coderabbitai bot (Contributor) replied:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the filterValues pattern in compute-dashboard.ts matches the pattern in compute-aggregates.ts
rg -n -A5 -B2 "filterValues\[r\.name\]" app/actions/compute-dashboard.ts

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Also look at the broader filter handling context in compute-dashboard.ts
rg -n -A10 "runAllFilters" app/actions/compute-dashboard.ts | head -30

Length of output: 1116


🏁 Script executed:

#!/bin/bash
# Check the FilterValue type definition to understand the contract
rg -n "type FilterValue|interface FilterValue" lib/evals/

Length of output: 84


@NiveditJain, you're absolutely right—thank you for the clarification! I've verified that compute-dashboard.ts uses the identical pattern at lines 336-339, and your explanation makes perfect sense. Record<string, FilterValue> doesn't require all keys to be present; it just constrains the type when they are. Including skipped or errored filters would indeed require fabricating values or changing the type contract. The guard is correct, and I apologize for the noise.

ᏊˊᄇˋᏊ learned something new


✏️ Learnings added
Learnt from: NiveditJain
Repo: exospherehost/claudeye PR: 29
File: app/actions/compute-aggregates.ts:233-239
Timestamp: 2026-02-24T08:23:08.197Z
Learning: In the claudeye project, `filterValues` in `compute-dashboard.ts` and `compute-aggregates.ts` intentionally omits skipped and errored filters using the guard `if (!result.skipped && !result.error)`. This matches the type contract `Record<string, FilterValue>` (where FilterValue = boolean | number | string) which allows missing keys. Skipped filters were never computed and errored filters have no valid value, so they should not be included. Consumers handle missing keys gracefully.



🧠 Learnings used
Learnt from: NiveditJain
Repo: exospherehost/claudeye PR: 11
File: app/actions/run-subagent-enrichments.ts:53-54
Timestamp: 2026-02-15T13:56:52.379Z
Learning: In the claudeye project's 0.5.0 unified data model, server actions like `runSubagentEnrichments` in `app/actions/run-subagent-enrichments.ts` intentionally pass combined session + all subagent data (from `getCachedSessionLog`) directly to enrichers without server-side filtering by `agentId`. It is the responsibility of individual enricher implementations to filter by `_source` field if they need agent-specific data. This design provides flexibility for enrichers to analyze either scoped or combined data as needed.

NiveditJain and others added 2 commits February 24, 2026 00:20
…cleanup

- Defer contentHashes update until after successful collect to prevent
  stale aggregates when collect fails
- Rename __VIZAUDE_* globalThis keys to __CLAUDEYE_* for consistency
- Add useEffect cleanup for debounce timer on unmount

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each eval result row now has a refresh button that reruns just that
eval, bypassing cache and merging the result back in-place. Supports
both session and subagent scopes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/components/eval-results-panel.tsx (1)

144-215: ⚠️ Potential issue | 🟠 Major

Prevent stale per-row rerun results from overwriting a newer full rerun.

If a user clicks Re-run while a row rerun is in-flight, the row rerun can finish later and merge older data into the fresh summary. Add a lightweight sequencing guard so per-row reruns are ignored when a newer full run starts.

🔧 Suggested fix (sequence guard)
-  const abortRef = useRef<AbortController | null>(null);
+  const abortRef = useRef<AbortController | null>(null);
+  const fullRunSeqRef = useRef(0);

  const run = useCallback(async (forceRefresh = false) => {
+    const seq = ++fullRunSeqRef.current;
     // Abort any in-flight request
     abortRef.current?.abort();
     const controller = new AbortController();
     abortRef.current = controller;
@@
-      if (controller.signal.aborted) return;
+      if (controller.signal.aborted || seq !== fullRunSeqRef.current) return;
@@
-      if (controller.signal.aborted) return;
+      if (controller.signal.aborted || seq !== fullRunSeqRef.current) return;
@@
-      if (!controller.signal.aborted) {
+      if (!controller.signal.aborted && seq === fullRunSeqRef.current) {
         setLoading(false);
       }
     }
   }, [projectName, sessionId, agentId, subagentType, subagentDescription]);

  const runSingleEval = useCallback(async (evalName: string) => {
+    const seqAtStart = fullRunSeqRef.current;
     try {
@@
-      setSummary((prev) => {
+      if (seqAtStart !== fullRunSeqRef.current) return;
+      setSummary((prev) => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/components/eval-results-panel.tsx` around lines 144 - 215, Row-level
reruns (runSingleEval) can finish after a newer full run (run) and overwrite
fresh summary; introduce a lightweight sequence guard: add a useRef number
(e.g., runSeqRef) that you increment at the start of run() (abortRef handling
place), capture the current seq into runSingleEval before calling
runEvals/runSubagentEvals, and when the row response arrives compare captured
seq to runSeqRef.current and ignore/return if they differ so stale per-row
results are not merged; update references in run, runSingleEval, and the abort
logic to use runSeqRef for sequencing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/components/eval-results-panel.tsx`:
- Around line 112-119: The rerun button in eval-results-panel (button using
onClick={handleRerun} and the RefreshCw icon) lacks an accessible name; add an
aria-label (for example aria-label="Re-run evaluation" or similar descriptive
text) to the button element so screen readers receive a proper accessible name
while keeping the existing title, disabled state, and icon/animation logic
(preserve props: onClick={handleRerun}, disabled={rerunning || globalLoading},
and the RefreshCw usage).


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 532895a and 2e231b5.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • README.md
  • app/actions/compute-aggregates.ts
  • app/actions/compute-dashboard.ts
  • app/actions/run-evals.ts
  • app/actions/run-session-action.ts
  • app/actions/run-subagent-evals.ts
  • app/components/eval-results-panel.tsx
  • app/dashboard/dashboard-client.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@NiveditJain
Member Author

Code review

Found 2 issues:

  1. runSingleEval does not participate in the abortRef abort mechanism, creating a race condition with the global "Re-run" button. The global run() function uses abortRef to abort in-flight requests and guards setSummary calls with controller.signal.aborted checks (lines 146-148). runSingleEval has no such mechanism. If a user clicks a per-eval re-run, then clicks the global "Re-run" before it completes, the global run can finish first with fresh results, then the stale single-eval result overwrites the correct fresh result for that eval via the setSummary merge at line 190.

const runSingleEval = useCallback(async (evalName: string) => {
  try {
    const result = agentId
      ? await runSubagentEvals(projectName, sessionId, agentId, subagentType, subagentDescription, false, evalName)
      : await runEvals(projectName, sessionId, false, evalName);
    if (!result.ok) {
      setError(result.error);
      return;
    }
    if (!result.hasEvals) return;
    // Merge the single fresh result into existing summary
    const freshResult = result.summary.results[0];
    if (!freshResult) return;
    setSummary((prev) => {
      if (!prev) return prev;
      const updatedResults = prev.results.map((r) =>
        r.name === freshResult.name ? freshResult : r,
      );
      let passCount = 0, failCount = 0, errorCount = 0, skippedCount = 0;
      for (const r of updatedResults) {
        if (r.skipped) skippedCount++;
        else if (r.error) errorCount++;
        else if (r.pass) passCount++;
        else failCount++;
      }
      return {
        ...prev,
        results: updatedResults,
        passCount,
        failCount,
        errorCount,
        skippedCount,
      };
    });
    setCached(false);
  } catch (e) {
    setError(e instanceof Error ? e.message : "Failed to rerun eval");
  }
}, [projectName, sessionId, agentId, subagentType, subagentDescription]);

  2. Empty-string contentHash sentinel can permanently pin stale session state. When hashSessionFile throws, contentHash is set to "". If two consecutive calls fail to hash the same session, the stored hash "" matches the new hash "", and the session is permanently skipped, even if the file later becomes readable. This was previously flagged on PR #18 ("High-performance dashboard filters with OOM fix (v0.5.5)") against compute-dashboard.ts (same pattern at line 282) and was copied into the new compute-aggregates.ts. Consider using a unique sentinel (e.g., crypto.randomUUID()) so a hash failure never matches a stored hash.

let contentHash: string;
try {
  contentHash = await hashSessionFile(project.name, sid);
} catch {
  contentHash = "";
}
return { key, contentHash, project, file };
}),
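A minimal sketch of the suggested unique-sentinel fix. Here hashSessionFile is stood in by a callback; the real signature in compute-aggregates.ts may differ:

```typescript
import { randomUUID } from "node:crypto";

// On hash failure, return a unique sentinel so a failed hash never compares
// equal to a previously stored one; the session is retried on the next pass
// instead of being pinned as permanently "unchanged".
async function safeContentHash(
  hashSessionFile: () => Promise<string>,
): Promise<string> {
  try {
    return await hashSessionFile();
  } catch {
    // Never "": two consecutive failures must not match each other.
    return `hash-error:${randomUUID()}`;
  }
}
```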

🤖 Generated with Claude Code



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/components/eval-results-panel.tsx`:
- Around line 191-210: When merging freshResult in the setSummary updater (the
callback passed to setSummary), also recompute summary.totalDurationMs from the
updatedResults instead of leaving it unchanged: after creating updatedResults
(which replaces the matching result by freshResult) sum each result's duration
field (e.g., result.durationMs or 0 if missing) into totalDurationMs and return
it along with passCount, failCount, errorCount, and skippedCount so the duration
display reflects the rerun immediately.


📥 Commits

Reviewing files that changed from the base of the PR and between 2e231b5 and 3781500.

📒 Files selected for processing (1)
  • app/components/eval-results-panel.tsx

…ration

- Add singleEvalAbortRef so run() aborts in-flight single-eval requests,
  preventing stale per-eval results from overwriting a fresh global rerun
- Replace contentHash = "" with crypto.randomUUID() on hash failure so
  consecutive failures never match and permanently skip sessions
- Recompute totalDurationMs in setSummary updater after single-eval rerun

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
app/actions/compute-aggregates.ts (1)

259-274: Pass the already-built context to agg.condition instead of the stripped condCtx.

The full AggregateContext (including evalResults, enrichResults, and filterValues) is computed at lines 244–253 before this loop. The inline condCtx object is a manual subset that unnecessarily prevents condition authors from writing predicates like ctx.filterValues["myFilter"] === true. Since TypeScript allows passing a structural supertype to a function expecting a subset (no excess-property check on variables), this is a free improvement.

♻️ Proposed fix
         if (agg.condition) {
-          const condCtx = { entries: rawLines, stats, projectName: project.name, sessionId: sid, source: "session" };
-          const condResult = await agg.condition(condCtx);
+          const condResult = await agg.condition(context);
           if (!condResult) continue;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/actions/compute-aggregates.ts` around lines 259 - 274, The loop that
evaluates aggregate conditions currently builds and passes a stripped condCtx to
agg.condition, which prevents condition functions from accessing the full
AggregateContext (including evalResults, enrichResults, filterValues); change
the call to pass the already-built context object (the AggregateContext computed
at lines just above the loop) into agg.condition and remove or stop using the
ad-hoc condCtx object; update any related variable usage in this block so
agg.condition(context) is awaited and behaves the same on truthy/falsy results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/actions/compute-aggregates.ts`:
- Around line 256-294: The code flattens per-aggregate collected maps into
mergedValues via Object.assign(mergedValues, values) (and similarly on
cache-restore), causing silent key collisions when different aggregates emit the
same key; fix by preserving per-aggregate scoping instead of flattening: either
namespace flat keys when building mergedValues (e.g., prefix keys with
`${agg.name}/`) before assigning, or change the CollectedSession.payload (and
the reduce call sites) to pass the scoped map allCollectedValues[agg.name] to
each aggregate's reduce so each aggregate only receives its own values; update
setCachedResult usage and the restore path to keep the same per-aggregate shape
(allCollectedValues) and adjust any consumers of CollectedSession.values to
expect namespaced keys or a per-aggregate map accordingly.



📥 Commits

Reviewing files that changed from the base of the PR and between 3781500 and 168b02c.

📒 Files selected for processing (3)
  • app/actions/compute-aggregates.ts
  • app/actions/compute-dashboard.ts
  • app/components/eval-results-panel.tsx

@NiveditJain
Member Author

Code review

Found 3 issues:

  1. compute-aggregates.ts uses getRegisteredEvals() (returns all scopes) instead of getSessionScopedEvals() when building evalResults for aggregate context. Every other session-level action (run-evals.ts line 29) uses getSessionScopedEvals. This causes subagent-scoped evals to run against session-level data, producing incorrect results that are then cached to disk.

import { getRegisteredEvals, hasEvals } from "@/lib/evals/registry";
import { getRegisteredEnrichers, hasEnrichers } from "@/lib/evals/enrich-registry";

// Pre-fetch registry info to avoid repeated calls
const registeredEvals = hasEvals() ? getRegisteredEvals() : [];
const registeredEnrichers = hasEnrichers() ? getRegisteredEnrichers() : [];

Compare with run-evals.ts:

import { getSessionScopedEvals } from "@/lib/evals/registry";

getItems: getSessionScopedEvals,

  2. Same issue for enrichers: getRegisteredEnrichers() instead of getSessionScopedEnrichers(). The established pattern in run-enrichments.ts (line 27) uses getSessionScopedEnrichers.

// Pre-fetch registry info to avoid repeated calls
const registeredEvals = hasEvals() ? getRegisteredEvals() : [];
const registeredEnrichers = hasEnrichers() ? getRegisteredEnrichers() : [];

Compare with run-enrichments.ts:

import { getSessionScopedEnrichers } from "@/lib/evals/enrich-registry";

getItems: getSessionScopedEnrichers,

  3. AggregateClient is rendered unconditionally on the default dashboard page (lines 70, 90), causing an unnecessary computeAggregates server action round-trip and loading flash for users with no aggregates. The named view page correctly guards this with viewInfo.aggregateCount > 0.

<div className="mt-8">
  <AggregateClient viewName="default" />
</div>

Compare with the guarded version in the named view page:

{viewInfo && viewInfo.aggregateCount > 0 && (
  <div className="mt-8">
    <AggregateClient viewName={decodedViewName} />
  </div>
)}
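The scope-filtering pattern these registry helpers imply can be sketched as follows. EvalDef and the scope values are assumptions; the real registry in lib/evals/registry.ts may differ:

```typescript
interface EvalDef {
  name: string;
  scope: "session" | "subagent";
}

const registry: EvalDef[] = [];

function registerEval(def: EvalDef): void {
  registry.push(def);
}

// Returns evals of every scope; session-level actions should not use this.
function getRegisteredEvals(): EvalDef[] {
  return [...registry];
}

// Session-level actions use this so subagent-scoped evals never run
// against (and get cached for) session-level data.
function getSessionScopedEvals(): EvalDef[] {
  return registry.filter((e) => e.scope === "session");
}
```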

🤖 Generated with Claude Code


…teClient

- Use getSessionScopedEvals/getSessionScopedEnrichers instead of
  getRegisteredEvals/getRegisteredEnrichers in compute-aggregates.ts
  to match the established pattern in run-evals.ts and run-enrichments.ts
- Guard AggregateClient renders on default dashboard page behind
  hasDefaultAggregates check to avoid unnecessary server round-trips

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
app/actions/compute-aggregates.ts (1)

166-294: ⚠️ Potential issue | 🟠 Major

Potential key collisions when flattening aggregate values.

Both the cache-restore path and the live collect path flatten per-aggregate maps into one mergedValues object. If two aggregates emit the same key, later ones silently overwrite earlier values, and reduce() sees corrupted input. Consider namespacing keys (e.g., ${agg.name}/key) or passing per-aggregate values into each reduce() so aggregates only see their own data.
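A minimal sketch of the namespacing option (the shapes here are assumed; the real CollectedSession payload may differ):

```typescript
type Collected = Record<string, unknown>;

// Prefix each collected key with its aggregate's name so two aggregates
// emitting the same key cannot overwrite each other in the merged map.
function mergeNamespaced(
  perAggregate: Record<string, Collected>,
): Collected {
  const merged: Collected = {};
  for (const [aggName, values] of Object.entries(perAggregate)) {
    for (const [key, value] of Object.entries(values)) {
      merged[`${aggName}/${key}`] = value; // collision-safe key
    }
  }
  return merged;
}
```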

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/actions/compute-aggregates.ts` around lines 166 - 294, The flattening of
per-aggregate maps into mergedValues causes key collisions (seen in the cache
restore branch and live collect loop using mergedValues), so change to a
collision-safe representation: either keep allCollectedValues as a map keyed by
agg.name (i.e., do not Object.assign into a flat mergedValues) and update
downstream reduce() consumers to iterate per-aggregate entries, or namespace
flattened keys consistently (e.g., `${agg.name}/${key}`) both when writing in
agg.collect results, when merging in the live path (in the loop around
agg.collect and Object.assign), and when restoring from getCachedResult; ensure
setCachedResult and the cached-restore branch use the same format so reduce()
and other consumers read per-aggregate values without overwrite.


📥 Commits

Reviewing files that changed from the base of the PR and between 168b02c and 439f8e8.

📒 Files selected for processing (3)
  • app/actions/compute-aggregates.ts
  • app/actions/list-dashboard-views.ts
  • app/dashboard/page.tsx

@NiveditJain NiveditJain merged commit d3daf99 into main Feb 24, 2026
3 checks passed