Column scoping and structural metadata — defining which columns to include, key alignment, and normalization rules for report tools like rvl, shape, and compare.
No AI. No inference. Pure deterministic configuration and validation.
```bash
brew install cmdrvl/tap/profile
```

- **What:** `profile` creates YAML profiles that define column scope, keys, and normalization for report tools.
- **Why:** It replaces ad-hoc column lists and one-off CLI flags with versioned, deterministic, reusable config.
- **How:** Draft from real data, validate/lint, freeze to an immutable, hashable profile, then pass `--profile` to `shape`/`rvl`/`compare`.
| Feature | What It Does |
|---|---|
| Column scoping | Declare which columns matter — report tools only analyze `include_columns` |
| Key declaration | Specify the join/alignment key — no more guessing which column is the identifier |
| Normalization rules | Float precision, string trimming, order invariance — consistent across tools |
| Versioned & frozen | Each profile has a version and a SHA-256 content hash — immutable once frozen |
| Drafting workflow | `profile draft init` reads a CSV header and generates a starting profile |
| Validation | `profile lint` catches schema drift between profile and dataset |
| Tool-agnostic | One profile consumed by `shape`, `rvl`, `compare`, and `lock` |
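The column-scoping behavior above can be sketched in a few lines of Python. This is a hypothetical illustration of the semantics, not the tool's implementation:

```python
# Hypothetical sketch of column scoping: a report tool drops every
# column that is not listed in the profile's include_columns.
def scope_row(row: dict, include_columns: list[str]) -> dict:
    """Keep only the profiled columns; all others are ignored."""
    return {col: row[col] for col in include_columns if col in row}

row = {"loan_id": "L-001", "current_balance": "250000.00", "servicer_notes": "call back"}
scoped = scope_row(row, ["loan_id", "current_balance", "note_rate"])
# servicer_notes is ignored; note_rate is absent from the data, so lint would flag it
```

Scoping happens before any comparison, so out-of-scope churn (free-text columns, reordered extras) never shows up as a difference.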
```bash
# 1) Create a draft profile from a real dataset
profile draft init loan_tape.csv --out loan_tape.draft.yaml

# 2) Validate schema and lint against the dataset
profile validate loan_tape.draft.yaml
profile lint loan_tape.draft.yaml --against loan_tape.csv

# 3) Freeze to an immutable profile
profile freeze loan_tape.draft.yaml \
  --family csv.loan_tape.core \
  --version 0 \
  --out profiles/csv.loan_tape.core.v0.yaml

# 4) Use the frozen profile in report tools
shape old.csv new.csv --profile profiles/csv.loan_tape.core.v0.yaml --json
rvl old.csv new.csv --profile profiles/csv.loan_tape.core.v0.yaml --json
compare old.csv new.csv --profile profiles/csv.loan_tape.core.v0.yaml --json
```

```bash
profile draft init loan_tape.csv --out loan_tape.draft.yaml
# writes: loan_tape.draft.yaml

profile lint loan_tape.draft.yaml --against loan_tape.csv
# exit 0 (or exit 1 with deterministic lint issues)

profile freeze loan_tape.draft.yaml \
  --family csv.loan_tape.core \
  --version 0 \
  --out profiles/csv.loan_tape.core.v0.yaml
# writes: profiles/csv.loan_tape.core.v0.yaml
# frozen profile includes: profile_id, profile_family, profile_version, profile_sha256
```

A draft is cheap to iterate. A frozen profile is immutable and hashable for reproducible downstream analysis.
`profile` is a metadata tool that configures how report tools operate:

```
                      ┌── shape ──┐
vacuum → hash → lock  │           │
                      ├── rvl ────┤ ← --profile
                      │           │
                      └── compare ┘
profile ──────────────────────────┘
```
`profile` doesn't sit in the stream pipeline (`vacuum → hash → lock`). Instead, it produces configuration files that report tools consume via `--profile`. `lock` records which profiles were active in its `profiles` array.
| If you need... | Use |
|---|---|
| Enumerate files in a directory | `vacuum` |
| Compute content hashes | `hash` |
| Match files against templates | `fingerprint` |
| Pin artifacts into a lockfile | `lock` |
| Check structural comparability | `shape` |
| Explain numeric changes | `rvl` |

`profile` only answers: which columns matter, what's the key, and how should values be compared?
A profile is a YAML file with a defined schema:

```yaml
profile_id: "csv.loan_tape.core.v0"
profile_version: 0
include_columns:
  - loan_id
  - current_balance
  - note_rate
  - maturity_date
  - property_type
  - occupancy
key: ["loan_id"]
equivalence:
  order: "order-invariant"
  float_decimals: 6
  trim_strings: true
```

| Field | Type | Description |
|---|---|---|
| `profile_id` | string | Unique identifier with version suffix |
| `profile_version` | integer | Monotonically increasing version number |
| `include_columns` | string[] | Columns to include in analysis (others ignored) |
| `key` | string[] | Column(s) used for row alignment/joining |
| `equivalence.order` | string | `"order-invariant"` or `"order-sensitive"` |
| `equivalence.float_decimals` | integer | Decimal places for float comparison |
| `equivalence.trim_strings` | boolean | Trim whitespace before string comparison |
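A minimal sketch of how the `equivalence` rules could be applied, assuming normalization happens per value before comparison (the function names here are illustrative, not the tool's API):

```python
# Hypothetical sketch of the equivalence rules: normalize each value,
# then compare row sets (order-invariant) or row lists (order-sensitive).
def normalize(value, float_decimals: int = 6, trim_strings: bool = True):
    if isinstance(value, float):
        return round(value, float_decimals)      # equivalence.float_decimals
    if isinstance(value, str) and trim_strings:
        return value.strip()                     # equivalence.trim_strings
    return value

def rows_equal(a, b, order: str = "order-invariant") -> bool:
    na = [tuple(normalize(v) for v in row) for row in a]
    nb = [tuple(normalize(v) for v in row) for row in b]
    if order == "order-invariant":
        return sorted(na) == sorted(nb)          # equivalence.order
    return na == nb

old = [("L-001", 0.1234567), ("L-002", " fixed ")]
new = [("L-002", "fixed"), ("L-001", 0.1234568)]
assert rows_equal(old, new)   # equal at 6 decimals, trimmed, any order
```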
Once frozen, a profile is immutable:

```yaml
profile_id: "csv.loan_tape.core.v0"
profile_version: 0
profile_sha256: "sha256:a1b2c3d4e5f6..."
frozen: true
# ... rest of profile
```

Any semantic change requires a new `profile_version` and a new `profile_id`.
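How a content hash could be computed is easy to sketch. Note that the canonicalization below (sorted keys, compact JSON) is an assumption for illustration; the tool defines its own canonical form:

```python
import hashlib
import json

# Hypothetical content hash for a profile: serialize canonically
# (assumed: sorted keys, compact separators), then hash the bytes.
def profile_sha256(profile: dict) -> str:
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

p = {"profile_id": "csv.loan_tape.core.v0", "profile_version": 0,
     "include_columns": ["loan_id", "current_balance"], "key": ["loan_id"]}

# Any semantic change produces a different digest, which is why a
# frozen profile cannot be edited in place.
assert profile_sha256(p) != profile_sha256({**p, "include_columns": ["loan_id"]})
```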
Generate a draft profile from a CSV header:
```bash
profile draft init loan_tape.csv --out loan_profile.yaml
```

Auto-populates `include_columns` from the header. You edit the draft to remove unwanted columns and set the key.
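A sketch of what drafting could look like internally, assuming the draft simply mirrors the CSV header (the function and the default values here are illustrative):

```python
import csv
import io

# Hypothetical draft generation: pre-populate include_columns from the
# CSV header and leave key empty for the author to fill in.
def draft_profile(csv_text: str) -> dict:
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "profile_version": 0,
        "include_columns": header,   # edit this list down by hand
        "key": [],                   # set after running suggest-key
        "equivalence": {"order": "order-invariant",
                        "float_decimals": 6,
                        "trim_strings": True},
    }

draft = draft_profile("loan_id,current_balance,note_rate\nL-001,250000.00,0.0525\n")
```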
Rank candidate key columns by uniqueness, null rate, and deterministic order:
```bash
profile suggest-key loan_tape.csv
# loan_id: unique=100%, nulls=0%, type=string ← recommended
# property_id: unique=85%, nulls=0%, type=string
```

Validate a profile against a dataset:
```bash
profile lint loan_profile.yaml --against loan_tape.csv
```

Catches: missing columns, non-unique keys, type mismatches, schema drift.
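The checks lint performs can be sketched as plain data validation. This is an illustrative outline under assumed rules, not the tool's actual rule set:

```python
# Hypothetical sketch of lint: profiled columns must exist in the
# dataset, and the declared key must be unique across rows.
def lint(profile: dict, header: list[str], rows: list[dict]) -> list[str]:
    issues = []
    for col in profile["include_columns"]:
        if col not in header:
            issues.append(f"Column '{col}' in profile not found in dataset")
    seen = set()
    for row in rows:
        k = tuple(row.get(c) for c in profile["key"])
        if k in seen:
            issues.append(f"Duplicate key value: {k}")
        seen.add(k)
    return sorted(issues)  # deterministic ordering of findings

issues = lint({"include_columns": ["loan_id", "occupancy"], "key": ["loan_id"]},
              ["loan_id", "current_balance"],
              [{"loan_id": "L-001"}, {"loan_id": "L-001"}])
# one missing-column issue plus one duplicate-key issue
```

An empty list maps naturally to exit 0 and a non-empty list to exit 1, matching the tool's deterministic lint semantics.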
Surface structural statistics about a dataset:
```bash
profile stats loan_tape.csv
# rows: 1,247 | columns: 42 | nulls: 3.2% | key candidates: loan_id, property_id
```

Validate and mark a profile immutable with a SHA-256 content hash:
```bash
profile freeze loan_profile.yaml \
  --family csv.loan_tape.core \
  --version 0 \
  --out profiles/csv.loan_tape.core.v0.yaml
```

| Capability | profile | Manual column lists | Config files | SQL views |
|---|---|---|---|---|
| Versioned and frozen | Yes | No | No | No |
| Content hash (tamper-evident) | Yes | No | No | No |
| Validated against dataset | Yes (lint) | No | No | At query time |
| Key declaration | Yes | Ad-hoc | Ad-hoc | Yes |
| Normalization rules | Yes | No | No | No |
| Cross-tool (shape/rvl/compare) | Yes | No | No | No |
| Draft from header | Yes (draft init) | Manual | Manual | Manual |
```bash
brew install cmdrvl/tap/profile
```

```bash
curl -fsSL https://raw.githubusercontent.com/cmdrvl/profile/main/scripts/install.sh | bash
```

```bash
cargo build --release
./target/release/profile --help
```

All report tools accept `--profile`:
```bash
# shape — only check overlap on profile columns
shape old.csv new.csv --profile loan_profile.yaml --json

# rvl — only explain changes in profile columns
rvl old.csv new.csv --profile loan_profile.yaml --json

# compare — only diff profile columns
compare old.csv new.csv --profile loan_profile.yaml --json
```

`lock` records which profiles were active:
```json
{
  "profiles": [
    {
      "profile_id": "csv.loan_tape.core.v0",
      "profile_version": 0,
      "profile_sha256": "sha256:a1b2c3d4..."
    }
  ]
}
```

| Flag | Behavior |
|---|---|
| `--describe` | Print `operator.json` and exit 0 before normal input validation |
| `--schema` | Print the profile JSON Schema and exit 0 before normal input validation (deferred in v0.1) |
| `--version` | Print `profile <semver>` and exit 0 |
| `--no-witness` | Suppress witness ledger recording |
| Exit | Meaning | When |
|---|---|---|
| 0 | SUCCESS | Operation completed with no issues |
| 1 | ISSUES_FOUND | Lint/diff found issues or differences |
| 2 | REFUSAL | Invalid input, schema violation, parse/IO refusal, or CLI error |
Refusal codes: `E_INVALID_SCHEMA`, `E_MISSING_FIELD`, `E_BAD_VERSION`, `E_ALREADY_FROZEN`, `E_IO`, `E_CSV_PARSE`, `E_EMPTY`, `E_COLUMN_NOT_FOUND`.

With `--json`, refusals are emitted in the unified output envelope (`outcome=REFUSAL`, refusal detail in `result`). Without `--json`, refusals are human-readable errors on stderr with the refusal code.
- Witness append is enabled for: `freeze`, `validate`, `lint`, `stats`, `suggest-key`
- Witness append is skipped for: `draft new`, `draft init`, `list`, `show`, `diff`, `push`, `pull`
- `--no-witness` disables witness writes without changing domain outcome or exit semantics
- Ledger path: `~/.epistemic/witness.jsonl`
- Witness append failures warn on stderr and do not change the primary command outcome/exit code
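The non-fatal append behavior can be sketched as follows (the record shape is illustrative; the actual ledger schema is defined by the tool):

```python
import json
import os
import sys
import tempfile

# Hypothetical witness append: one JSON line per command. Failures
# warn on stderr and never change the command's outcome or exit code.
def append_witness(record: dict, ledger_path: str) -> None:
    try:
        with open(ledger_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record, sort_keys=True) + "\n")
    except OSError as e:
        print(f"warning: witness append failed: {e}", file=sys.stderr)

ledger = os.path.join(tempfile.mkdtemp(), "witness.jsonl")
append_witness({"command": "freeze", "outcome": "SUCCESS"}, ledger)
append_witness({"command": "lint", "outcome": "ISSUES_FOUND"}, ledger)
```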
A column in `include_columns` doesn't exist in the dataset. Run `profile lint` to diagnose:

```bash
profile lint loan_profile.yaml --against new_tape.csv
# ERROR: Column 'occupancy' in profile not found in dataset
```

The column(s) declared in `key` have duplicate values. Use `profile suggest-key` to find better candidates:
```bash
profile suggest-key loan_tape.csv
```

Frozen profiles are immutable. If you need to change columns, create a new version:
```yaml
# Old: csv.loan_tape.core.v0 (frozen)
# New: csv.loan_tape.core.v1 (add new columns, re-freeze)
profile_id: "csv.loan_tape.core.v1"
profile_version: 1
```

If `rvl` reports spurious changes, your `float_decimals` may be too high. Try reducing precision:
```yaml
equivalence:
  float_decimals: 2  # was 6, reduced to match business precision
```

Ensure the profile file is committed and the path is correct. Profiles are plain YAML — no environment dependencies.
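The effect of `float_decimals` is easy to demonstrate with plain rounding (a stand-in for whatever comparison the tools actually perform):

```python
# At 6 decimals these balances look like a change; at the business
# precision of 2 decimals they are equivalent.
old_balance, new_balance = 250000.123456, 250000.123789

assert round(old_balance, 6) != round(new_balance, 6)   # spurious "change"
assert round(old_balance, 2) == round(new_balance, 2)   # equivalent at 2 dp
```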
| Limitation | Detail |
|---|---|
| CSV only | v0 profiles scope CSV/TSV columns; XLSX sheet/range scoping is deferred |
| Single key type | Composite keys supported, but only column-based — no expression keys |
| No auto-update | Profile doesn't auto-detect schema changes — use lint to catch drift |
| No profile registry | Profiles are local files — centralized registry is deferred |
| Network publish deferred | push/pull data-fabric wrappers are deferred in v0.1 |
| Pre-release | Implementation in progress — spec is complete in the epistemic spine plan |
**Why not just pass column flags on the command line?** Flags don't compose. With 15 columns, a key, and normalization rules, the command line becomes unmanageable. A profile captures all scoping decisions in a versioned, validated, shareable file.

**Why freeze profiles?** Immutability. Once a profile is frozen and referenced by a lockfile, you can prove that the exact same column scoping was used. Any change requires a new version, creating an audit trail.

**Can I reuse a profile across datasets?** Yes — as long as the datasets have the same schema. Use `profile lint --against` to verify compatibility before use.

**What happens to columns not in `include_columns`?** They're ignored. Report tools only analyze columns in `include_columns`. This is the whole point — focus on what matters.

**How is `profile` different from `fingerprint`?** Fingerprint identifies what kind of file something is (template recognition). Profile declares which columns to analyze in report tools. They solve different problems and can be used together.

**Why does a profile need a key?** Without a key, report tools like `rvl` can't align rows between two datasets. The key column(s) define how rows map from old to new. `profile suggest-key` helps identify the best candidate.

**Can I write a profile by hand?** Yes. `profile draft init` generates a starting profile from a CSV header. You can also write YAML directly or generate it from any tool.

**What's the implementation status?** The profile specification is part of the epistemic spine plan. This README covers intended behavior; implementation is in progress.
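The key-based row alignment described above can be sketched as a dictionary lookup. This illustrates the concept, not `rvl`'s implementation:

```python
# Hypothetical key alignment: index new rows by key, then pair each
# old row with its counterpart regardless of row order.
def align(old_rows, new_rows, key: str):
    new_by_key = {row[key]: row for row in new_rows}
    return [(row, new_by_key.get(row[key])) for row in old_rows]

old = [{"loan_id": "L-001", "bal": 100.0}, {"loan_id": "L-002", "bal": 200.0}]
new = [{"loan_id": "L-002", "bal": 210.0}, {"loan_id": "L-001", "bal": 100.0}]
pairs = align(old, new, "loan_id")
# L-002's balance change is attributable only because the key
# made the rows line up.
```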
```bash
cargo fmt --check
cargo clippy --all-targets -- -D warnings
cargo test
```