# Eval-design: MERFISH z-score mean metric is invariant and trivially passable without preprocessing

## Description
The eval `merfish_brain_log_zscore_gad2_mean` asks for the mean z-scored expression of `Gad2` after normalization and log-transformation. Z-scoring subtracts the gene's own mean, so this value is 0 by construction (up to floating-point error) regardless of the data, which makes the eval gameable.

An agent can return `0` without loading data or executing any steps and still pass.
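
The invariance can be checked with a minimal NumPy sketch. It runs on synthetic Poisson counts rather than the real dataset, and the helper name `zscore_mean_for`, the 1e4 scaling target, and the use of `ddof=1` are illustrative assumptions, not the eval's actual pipeline. Whatever the input distribution, the mean comes out at zero:

```python
import numpy as np

def zscore_mean_for(counts: np.ndarray, gene_idx: int) -> float:
    """Normalize per cell, log1p, z-score one gene, and return its mean z-score.

    Hypothetical helper: mirrors the normalize -> log1p -> z-score steps
    in plain NumPy, not the eval's real implementation.
    """
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0            # avoid dividing by zero for empty cells
    norm = counts / totals * 1e4         # assumed scaling target
    logged = np.log1p(norm)
    col = logged[:, gene_idx]
    z = (col - col.mean()) / col.std(ddof=1)
    return float(z.mean())

rng = np.random.default_rng(0)
means = []
for lam in (0.5, 5.0, 50.0):             # three very different synthetic datasets
    counts = rng.poisson(lam, size=(2000, 20)).astype(float)
    means.append(zscore_mean_for(counts, gene_idx=3))

# Every entry is zero up to floating-point error, independent of the input.
print(means)
```

Because the answer does not depend on the counts, a `numeric_tolerance` check around 0 cannot distinguish a real run from a hard-coded constant.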

The original intent was to verify that the agent can (1) execute a multi-step preprocessing pipeline and (2) compute a derived statistic. The current metric does not achieve that.

## Steps to Reproduce

Run `vizgen/normalization/merfish_brain_log_zscore_gad2_mean.json` and have the agent return `0` without loading the data. The eval still passes.

## Expected vs Actual

**Expected:**  
The output should depend on actually executing preprocessing and computing on the data.

**Actual:**  
The eval passes with a constant output, independent of data or computation.

## Environment
- Dataset: `vizgen_mouse_brain_aging_raw.h5ad`
- Eval: `merfish_brain_log_zscore_gad2_mean`
- Type: `numeric_tolerance`
- Steps: normalize → log1p → z-score

## Proposed Fix
Replace the mean z-score with a non-invariant summary of the z-scored values, e.g.:

- Fraction of cells with `Gad2` z-score > 1  
- 95th percentile of `Gad2` z-scores  
- IQR of `Gad2` z-scores (not the variance, which z-scoring fixes at ≈ 1 by construction)

Example:
```
After z-scoring, compute the fraction of cells with Gad2 z-score > 1.
Return EXACTLY: {"frac_cells_gad2_z_gt_1": <float>}
```
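
As a rough illustration of why these summaries are harder to game, the sketch below computes them for two synthetic z-scored vectors with different shapes. The helper `candidate_metrics` and its key names are hypothetical, and the synthetic inputs stand in for real `Gad2` values:

```python
import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    """Standard z-score of a 1-D vector (sample standard deviation)."""
    return (x - x.mean()) / x.std(ddof=1)

def candidate_metrics(z: np.ndarray) -> dict:
    """Hypothetical summaries of a z-scored vector that depend on its shape."""
    q25, q75, q95 = np.percentile(z, [25, 75, 95])
    return {
        "frac_z_gt_1": float((z > 1).mean()),  # fraction of cells above 1 SD
        "z_p95": float(q95),                   # 95th percentile
        "z_iqr": float(q75 - q25),             # interquartile range
    }

rng = np.random.default_rng(0)
n = 5000

# Synthetic stand-ins: a symmetric profile and a sparse, zero-inflated one.
symmetric = rng.normal(loc=2.0, scale=0.5, size=n)
sparse = np.where(rng.random(n) < 0.1, rng.exponential(1.0, size=n), 0.0)

a = candidate_metrics(zscore(symmetric))
b = candidate_metrics(zscore(sparse))
print(a)
print(b)
```

Both vectors have mean 0 after z-scoring, yet the three summaries differ between them, so a constant answer no longer passes for arbitrary data.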
