A structured constraint framework for Claude Code in research workflows.
Claude Code can be a powerful boost to research workflows if we use it correctly and carefully. Without structure, an unconstrained LLM produces the mean of its training data — ask Claude to "implement RVQ" and you get a plausible-looking average of every RVQ implementation it has seen, not the one that matches your paper, your architecture, your constraints. The output compiles, but it's noisy: wrong assumptions baked in, silent numerical bugs, design decisions made without asking.
The fix isn't better prompts — it's structured constraints. This is the core insight behind obra/superpowers: when we constrain an LLM with domain-specific rules, verification gates, and forced checkpoints, the output goes from "plausible average" to precisely what we need. Propel applies this to research workflows where the cost of undetected noise is highest — a silent broadcasting bug in a loss function doesn't crash, it produces subtly wrong training runs that waste compute.
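That failure mode can be made concrete. A minimal NumPy sketch of a silent broadcasting bug in a loss (shapes and values are made up for illustration; this is not Propel code):

```python
import numpy as np

preds = np.array([[0.9], [0.2], [0.7]])   # model outputs, shape (3, 1)
targets = np.array([1.0, 0.0, 1.0])       # labels, shape (3,)

# (3, 1) - (3,) broadcasts to (3, 3): every prediction is compared
# against every target, and no error is raised.
diff = preds - targets
loss = (diff ** 2).mean()

# The intended element-wise loss on matching shapes:
correct = ((preds.squeeze(-1) - targets) ** 2).mean()

print(diff.shape)   # (3, 3): the bug shows up only in the shape
print(loss, correct)
```

Both values look like plausible training losses; only a shape check, which is exactly what the shape and silent-bug auditors perform, reveals the bug.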
Propel's constraints are necessary but not sufficient. The framework forces Claude to stop and ask structured questions at every phase transition, but the quality of the output depends entirely on what you bring to those checkpoints:
- Your research question — not "implement X" but "test whether X improves Y under condition Z." The more specific you are, the less Claude has to guess.
- Your hypothesis — what do you expect to happen and why? This is what the auditors verify against. Without a target, Claude cannot tell you when it missed.
- Your method — which paper, which equations, which specific algorithmic choices? Claude cannot infer "use stop-gradient on the codebook as in Section 3.2" from context alone.
- Your domain knowledge — the pitfalls that aren't in any paper, the configurations that look correct but silently fail, the things that only work in your specific setup.
It is critical to thoroughly review what Claude finds during investigation — the investigation README is the blueprint for everything that follows. If the blueprint is correct, the code will be correct. Ask Claude to compare its proposals with what's in the paper, question why it made certain decisions, and have it introspect on its reasoning when something feels off.
Each gate is designed to extract your specific insight before Claude acts on it. Gate 0 asks your research intent, Gate 1 validates its understanding against yours, Gate 2 confirms the plan matches your method. Skipping these means accepting the noisy mean. The more specific your constraints, the less noise in the output.
Propel gives you three places to embed the expertise that makes the difference:
| Where | What to Put There | Why It Matters |
|---|---|---|
| Project CLAUDE.md | Research context, conventions, known pitfalls, what "correct" means for your project | Read on every session — sets the baseline constraints |
| Custom agents | Domain-specific auditors that check what matters in your field (see customization) | Automated verification tuned to your failure modes |
| Gate 0 answers | Your actual research question, hypothesis, success criteria, scope boundaries | The single biggest lever — this is where the mean becomes specific |
A generic "implement the loss function" gets you the average loss function. "Implement equation 7 from [paper], using stop-gradient on the codebook as in section 3.2, with straight-through estimator for the backward pass" gets you what you actually need.
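In a project CLAUDE.md, those constraints might be recorded like this (illustrative content only; Propel does not mandate a particular layout):

```markdown
## Research context
Testing whether depth-2 residual VQ improves codebook utilization over flat VQ.

## Known pitfalls
- Codebook updates use stop-gradient, as in Section 3.2 of the reference paper;
  EMA updates look correct here but silently diverge in this setup.
- Reduce losses over the feature axis before the batch axis.

## What "correct" means
- Codebook perplexity within the range reported by the reference paper.
```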
Not every session needs the full pipeline. Propel offers three modes that filter which skills and gates are active. Choose a mode at the start of each session (via /intro or /switch), or default to Engineer.
| Mode | Scope | Active Gates | When to Use |
|---|---|---|---|
| Researcher | Literature, investigation, deep research | Gate 0, Gate 1 | Understanding the problem space — reading papers, tracing code, exploring approaches |
| Engineer | Full pipeline (default) | All (0-4) | Building something — investigation through implementation with all auditors |
| Trainer | Training execution, runtime debugging | Gate 4 (runtime only) | Code is ready — launching training runs, fixing CUDA/OOM/path errors |
- Researcher Mode keeps you in the understanding phase. Implementation skills are paused — if you try to build something, Propel suggests /switch engineer.
- Engineer Mode is the default and matches the existing full Propel workflow. Nothing changes if you always use this mode.
- Trainer Mode scans for training commands, launches them in screen sessions, and fixes runtime bugs. It does NOT touch training logic (architecture, loss, data pipeline) — for those, /switch engineer.
Switch anytime with /switch researcher, /switch engineer, or /switch trainer. Mode state persists in .propel/mode.json (gitignored) and survives /clear.
Propel enforces five human-in-the-loop gates, dispatches domain-specific auditors after every code change, and maintains living documentation across /clear boundaries.
The full pipeline has seven stages, five human-in-the-loop gates, and two questioner checkpoints (see diagram above):
```
Intake → G0 → Q0 (ground) → Investigation → G1 (findings) → Q1 (detail) →
Design → G2 → Implementation → G3 → Debug → G4 →
Training (Trainer) → Retrospective (All)
```
- Gates 0-1 (Researcher + Engineer): Scoping and investigation checkpoints
- Questioner Q0 (Researcher + Engineer): Grounds work in concrete reference implementations, architectures, and examples before investigation
- Questioner Q1 (Researcher + Engineer): Nails down implementation details — interfaces, data formats, edge cases — before design
- Gates 2-3 (Engineer only): Design approval and implementation auditing
- Gate 4 (Engineer + Trainer): Debug diagnosis before applying fixes
- Training (Trainer Mode): Launch runs, monitor, fix runtime errors
- Retrospective (All modes): Capture learnings and failed attempts
The Questioners exist because Claude is great at morphing an existing implementation into what you need, but bad at creating from scratch when the problem is unconstrained. See Pitfalls for details.
At each gate, Claude stops and asks structured questions that reveal design assumptions — never "shall I proceed?" but "should we [A] or [B]? A means [trade-off], B means [trade-off]."
```bash
# Clone and install
git clone https://github.com/KevinBian107/propel.git
cd propel && pip install -e .

# Initialize in any project
cd /path/to/your/project
propel init
```

propel init copies all skills, agents, commands, and hooks into your project's .claude/ directory, configures the session-start hook in settings.local.json, and adds scratch/, sessions/, .propel/, .claude/, and propel/ to .gitignore.
Then start Claude and run /intro. If you have an existing codebase, this scans it to draft a project-specific .claude/CLAUDE.md and optionally builds a persistent project profile. If you're starting from an empty repo, it seeds a minimal CLAUDE.md that grows progressively as you work — Gate 0 answers fill in research context, first code written fills in conventions, investigations fill in domain pitfalls. No need to fill out 12 sections before writing your first line of code.
See docs/quickstart.md for a 5-minute setup guide.
| Category | Skill | Trigger |
|---|---|---|
| Meta | using-propel | Always active — routes to correct skill |
| Literature | deep-research | "survey", "literature review", "compare methods" |
| | paper-extraction | "process these papers", "build paper database" |
| Investigation | investigation | "start investigation", "trace X", "what touches X" |
| Design | research-design | "propose how to", "design the implementation" |
| | writing-plans | "write the plan", "break into tasks" |
| Implementation | subagent-driven-research | User says "go" after plan approval |
| Validation | research-validation | "validate this", "test the implementation" |
| | verification-before-completion | Before claiming "done" |
| Debugging | systematic-debugging | Bug reports, training failures |
| Learning | retrospective | "retrospective", "capture learnings", auto-suggests at ~20 turns |
| Cross-cutting | think-deeply | Confirmation-seeking statements, leading questions |
| | context-hygiene | >15 turns, "getting long" |
| | using-git-worktrees | "create worktree", "experiment branch" |
| Training | trainer-mode | "train", "launch training", "run training" (Trainer Mode) |
| Customization | project-customization | "customize Propel", "analyze my project", "detect conventions" |
| Agent | Purpose | Auto-dispatched? |
|---|---|---|
| paper-alignment-auditor | Cross-reference code against paper equations | Yes — after paper-derived components |
| jax-logic-auditor | Trace shapes through JAX transforms | Yes — after JAX code changes |
| silent-bug-detector | Scan for 11 silent failure categories | Yes — after model/loss/data changes |
| data-flow-tracer | End-to-end tensor annotation | No — explicit invocation |
| regression-guard | Verify existing configs unchanged | Yes — after any code change |
| env-researcher | Deep-dive simulation env docs (MuJoCo, robosuite, Isaac, etc.) | Yes — during investigation of env-dependent code |
| failure-mode-researcher | Internet search for training failures | No — explicit invocation |
| code-reviewer | General code quality with research awareness | No — invoked during review stage |
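The annotation style that data-flow-tracer and /trace-shapes produce can be sketched with plain shape comments; the toy nearest-code lookup below is illustrative, not Propel code:

```python
import numpy as np

B, T, D, K = 4, 16, 32, 8                 # batch, time, feature dim, codebook size

x = np.random.randn(B, T, D)              # (B, T, D)  encoder output
codebook = np.random.randn(K, D)          # (K, D)     learned codes

# Squared distance between every frame and every code.
dists = ((x[:, :, None, :] - codebook[None, None]) ** 2).sum(-1)  # (B, T, K)
idx = dists.argmin(-1)                    # (B, T)     nearest-code indices
quantized = codebook[idx]                 # (B, T, D)  quantized output

assert quantized.shape == (B, T, D)       # a one-line regression guard
```

Annotating every intermediate this way makes broadcasting surprises visible at a glance.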
| Command | Purpose |
|---|---|
| /intro | [Propel] Introduction — lists all commands, skills, and agents |
| /read-paper | [Propel] Extract structured reference from a paper |
| /debug-training | [Propel] Diagnose training issues |
| /trace-shapes | [Propel] Quick shape annotation through a code path |
| /primer | [Propel] Load project context |
| /switch | [Propel] Switch between modes (researcher, engineer, trainer) |
| /new-session | [Propel] Create and track a session |
```bash
# Create a new session and launch Claude Code
propel session launch "RVQ depth-2 rotation experiment"

# List past sessions
propel session list

# Save chat history
propel session save <session-id> <session-dir>
```

Sessions are stored in sessions/ with chat history, prompt templates, and symlinks to investigation artifacts. See docs/workflow.md for details.
- Quick Start — 5-minute setup
- Full Workflow — Walkthrough with all 5 gates and 2 questioners
- Customization — Adding project-specific agents/skills
- Pitfalls — Known failure modes when working with Claude
- Design Document — Full specification (in code-manual repo)
Propel combines ideas from multiple sources:
- obra/superpowers — Plugin architecture, discipline enforcement, verification gates, micro-task planning. Propel's plugin structure, hook system, and "check skills before acting" pattern come directly from Superpowers.
- code-manual — Research methodology, investigation skills, domain-specific agents, paper-alignment auditing, retrospective system. The investigation-first workflow, all auditor agents, and the literature skills originate from code-manual.
- scott-yj-yang/new-prompt — Session management CLI. The propel session tool is adapted from new-prompt with auto-detection of project root, investigation artifact linking, and session indexing.
- Talmo's sleap-io — Investigation skill template. The structured scratch/ investigation pattern with living READMEs originates from Talmo's sleap-io project.
- Sionic AI's experiment registry — Retrospective skill and /advise + /retrospective workflow for capturing experiment learnings into a reusable registry.
- brunoasm's claude skills — Think-deeply anti-sycophancy skill and PDF extraction skill.
- Weizhena's Deep-Research workflow — Structured literature review with human-in-the-loop checkpoints.
- Context Engineering Template — Basic Claude Code usage patterns and context engineering principles.
MIT