A structured constraint framework for Claude Code in research workflows.
Claude Code can be a powerful boost to research workflows if we use it correctly and carefully. Without structure, an unconstrained LLM produces the mean of its training data — ask Claude to "implement RVQ" and you get a plausible-looking average of every RVQ implementation it has seen, not the one that matches your paper, your architecture, your constraints. The output compiles, but it's noisy: wrong assumptions baked in, silent numerical bugs, design decisions made without asking.
The fix isn't better prompts — it's structured constraints. This is the core insight behind obra/superpowers: when we constrain an LLM with domain-specific rules, verification gates, and forced checkpoints, the output goes from "plausible average" to precisely what we need. Propel applies this to research workflows where the cost of undetected noise is highest — a silent broadcasting bug in a loss function doesn't crash, it produces subtly wrong training runs that waste compute.
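That failure mode can be made concrete. A minimal NumPy sketch of a silent broadcasting bug in a loss (shapes and values are made up for illustration; this is not Propel code):

```python
import numpy as np

preds = np.array([[0.9], [0.2], [0.7]])   # model outputs, shape (3, 1)
targets = np.array([1.0, 0.0, 1.0])       # labels, shape (3,)

# (3, 1) - (3,) broadcasts to (3, 3): every prediction is compared
# against every target, and no error is raised.
diff = preds - targets
loss = (diff ** 2).mean()

# The intended element-wise loss on matching shapes:
correct = ((preds.squeeze(-1) - targets) ** 2).mean()

print(diff.shape)   # (3, 3): the bug shows up only in the shape
print(loss, correct)
```

Both values look like plausible training losses; only a shape check, which is exactly what the shape and silent-bug auditors perform, reveals the bug.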
Propel's constraints are necessary but not sufficient. The framework forces Claude to stop and ask structured questions at every phase transition, but the quality of the output depends entirely on what you bring to those checkpoints:
- Your research question — not "implement X" but "test whether X improves Y under condition Z." The more specific you are, the less Claude has to guess.
- Your hypothesis — what do you expect to happen and why? This is what the auditors verify against. Without a target, Claude cannot tell you when it missed.
- Your method — which paper, which equations, which specific algorithmic choices? Claude cannot infer "use stop-gradient on the codebook as in Section 3.2" from context alone.
- Your domain knowledge — the pitfalls that aren't in any paper, the configurations that look correct but silently fail, the things that only work in your specific setup.
It is critical to thoroughly review what Claude finds during investigation — the investigation README is the blueprint for everything that follows. If the blueprint is correct, the code will be correct. Ask Claude to compare its proposals with what's in the paper, question why it made certain decisions, and have it introspect on its reasoning when something feels off.
Each gate is designed to extract your specific insight before Claude acts on it. Gate 0 asks your research intent, Gate 1 validates its understanding against yours, Gate 2 confirms the plan matches your method. Skipping these means accepting the noisy mean. The more specific your constraints, the less noise in the output.
Propel gives you three places to embed the expertise that makes the difference:
| Where | What to Put There | Why It Matters |
|---|---|---|
| Project CLAUDE.md | Research context, conventions, known pitfalls, what "correct" means for your project | Read on every session — sets the baseline constraints |
| Custom agents | Domain-specific auditors that check what matters in your field (see customization) | Automated verification tuned to your failure modes |
| Gate 0 answers | Your actual research question, hypothesis, success criteria, scope boundaries | The single biggest lever — this is where the mean becomes specific |
A generic "implement the loss function" gets you the average loss function. "Implement equation 7 from [paper], using stop-gradient on the codebook as in section 3.2, with straight-through estimator for the backward pass" gets you what you actually need.
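In a project CLAUDE.md, those constraints might be recorded like this (illustrative content only; Propel does not mandate a particular layout):

```markdown
## Research context
Testing whether depth-2 residual VQ improves codebook utilization over flat VQ.

## Known pitfalls
- Codebook updates use stop-gradient, as in Section 3.2 of the reference paper;
  EMA updates look correct here but silently diverge in this setup.
- Reduce losses over the feature axis before the batch axis.

## What "correct" means
- Codebook perplexity within the range reported by the reference paper.
```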
Not every session needs the full pipeline. Propel offers three modes that filter which skills and gates are active. Choose a mode at the start of each session (via /intro or /switch), or default to Engineer.
| Mode | Scope | Active Gates | When to Use |
|---|---|---|---|
| Researcher | Literature, investigation, deep research | Gate 0, Gate 1 | Understanding the problem space — reading papers, tracing code, exploring approaches |
| Engineer | Full pipeline (default) | All (0-4) | Building something — investigation through implementation with all auditors |
| Trainer | Training execution, runtime debugging | Gate 4 (runtime only) | Code is ready — launching training runs, fixing CUDA/OOM/path errors |
- Researcher Mode keeps you in the understanding phase. Implementation skills are paused — if you try to build something, Propel suggests /switch engineer.
- Engineer Mode is the default and matches the existing full Propel workflow. Nothing changes if you always use this mode.
- Trainer Mode scans for training commands, launches them in screen sessions, and fixes runtime bugs. It does NOT touch training logic (architecture, loss, data pipeline) — for those, /switch engineer.
Switch anytime with /switch researcher, /switch engineer, or /switch trainer. Mode state persists in .propel/mode.json (gitignored) and survives /clear.
Propel enforces five human-in-the-loop gates, dispatches domain-specific auditors after every code change, and maintains living documentation across /clear boundaries.
The full pipeline has seven stages, five human-in-the-loop gates, and two questioner checkpoints (see diagram above):
```
Intake → G0 → Q0 (ground) → Investigation → G1 (findings) → Q1 (detail) →
Design → G2 → Implementation → G3 → Debug → G4 →
Training (Trainer) → Retrospective (All)
```
- Gates 0-1 (Researcher + Engineer): Scoping and investigation checkpoints
- Questioner Q0 (Researcher + Engineer): Grounds work in concrete reference implementations, architectures, and examples before investigation
- Questioner Q1 (Researcher + Engineer): Nails down implementation details — interfaces, data formats, edge cases — before design
- Gates 2-3 (Engineer only): Design approval and implementation auditing
- Gate 4 (Engineer + Trainer): Debug diagnosis before applying fixes
- Training (Trainer Mode): Launch runs, monitor, fix runtime errors
- Retrospective (All modes): Capture learnings and failed attempts
The Questioners exist because Claude is great at morphing an existing implementation into what you need, but bad at creating from scratch when the problem is unconstrained. See Pitfalls for details.
At each gate, Claude stops and asks structured questions that reveal design assumptions — never "shall I proceed?" but "should we [A] or [B]? A means [trade-off], B means [trade-off]."
```bash
# Clone and install
git clone https://github.com/KevinBian107/propel.git
cd propel && pip install -e .

# Initialize in any project
cd /path/to/your/project
propel init
```

propel init copies all skills, agents, commands, and hooks into your project's .claude/ directory, configures the session-start hook in settings.local.json, and adds scratch/, sessions/, .propel/, .claude/, and propel/ to .gitignore.
Then start Claude and run /intro. If you have an existing codebase, this scans it to draft a project-specific .claude/CLAUDE.md and optionally builds a persistent project profile. If you're starting from an empty repo, it seeds a minimal CLAUDE.md that grows progressively as you work — Gate 0 answers fill in research context, first code written fills in conventions, investigations fill in domain pitfalls. No need to fill out 12 sections before writing your first line of code.
See docs/quickstart.md for a 5-minute setup guide.
| Category | Skill | Trigger |
|---|---|---|
| Meta | using-propel | Always active — routes to correct skill |
| Literature | deep-research | "survey", "literature review", "compare methods" |
| | paper-extraction | "process these papers", "build paper database" |
| Investigation | investigation | "start investigation", "trace X", "what touches X" |
| Design | research-design | "propose how to", "design the implementation" |
| | writing-plans | "write the plan", "break into tasks" |
| Implementation | subagent-driven-research | User says "go" after plan approval |
| Validation | research-validation | "validate this", "test the implementation" |
| | verification-before-completion | Before claiming "done" |
| Debugging | systematic-debugging | Bug reports, training failures |
| Learning | retrospective | "retrospective", "capture learnings", auto-suggests at ~20 turns |
| Cross-cutting | think-deeply | Confirmation-seeking statements, leading questions |
| | context-hygiene | >15 turns, "getting long" |
| | using-git-worktrees | "create worktree", "experiment branch" |
| Training | trainer-mode | "train", "launch training", "run training" (Trainer Mode) |
| Customization | project-customization | "customize Propel", "analyze my project", "detect conventions" |
| Agent | Purpose | Auto-dispatched? |
|---|---|---|
| paper-alignment-auditor | Cross-reference code against paper equations | Yes — after paper-derived components |
| jax-logic-auditor | Trace shapes through JAX transforms | Yes — after JAX code changes |
| silent-bug-detector | Scan for 11 silent failure categories | Yes — after model/loss/data changes |
| data-flow-tracer | End-to-end tensor annotation | No — explicit invocation |
| regression-guard | Verify existing configs unchanged | Yes — after any code change |
| env-researcher | Deep-dive simulation env docs (MuJoCo, robosuite, Isaac, etc.) | Yes — during investigation of env-dependent code |
| failure-mode-researcher | Internet search for training failures | No — explicit invocation |
| code-reviewer | General code quality with research awareness | No — invoked during review stage |
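The annotation style that data-flow-tracer and /trace-shapes produce can be sketched with plain shape comments; the toy nearest-code lookup below is illustrative, not Propel code:

```python
import numpy as np

B, T, D, K = 4, 16, 32, 8                 # batch, time, feature dim, codebook size

x = np.random.randn(B, T, D)              # (B, T, D)  encoder output
codebook = np.random.randn(K, D)          # (K, D)     learned codes

# Squared distance between every frame and every code.
dists = ((x[:, :, None, :] - codebook[None, None]) ** 2).sum(-1)  # (B, T, K)
idx = dists.argmin(-1)                    # (B, T)     nearest-code indices
quantized = codebook[idx]                 # (B, T, D)  quantized output

assert quantized.shape == (B, T, D)       # a one-line regression guard
```

Annotating every intermediate this way makes broadcasting surprises visible at a glance.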
| Command | Purpose |
|---|---|
| /intro | [Propel] Introduction — lists all commands, skills, and agents |
| /read-paper | [Propel] Extract structured reference from a paper |
| /debug-training | [Propel] Diagnose training issues |
| /trace-shapes | [Propel] Quick shape annotation through a code path |
| /primer | [Propel] Load project context |
| /switch | [Propel] Switch between modes (researcher, engineer, trainer) |
| /new-session | [Propel] Create and track a session |
```bash
# Create a new session and launch Claude Code
propel session launch "RVQ depth-2 rotation experiment"

# List past sessions
propel session list

# Save chat history
propel session save <session-id> <session-dir>
```

Sessions are stored in sessions/ with chat history, prompt templates, and symlinks to investigation artifacts. See docs/workflow.md for details.
- Quick Start — 5-minute setup
- Full Workflow — Walkthrough with all 5 gates and 2 questioners
- Customization — Adding project-specific agents/skills
- Pitfalls — Known failure modes when working with Claude
- Design Document — Full specification (in code-manual repo)
Propel combines ideas from multiple sources:
- obra/superpowers — Plugin architecture, discipline enforcement, verification gates, micro-task planning. Propel's plugin structure, hook system, and "check skills before acting" pattern come directly from Superpowers.
- code-manual — Research methodology, investigation skills, domain-specific agents, paper-alignment auditing, retrospective system. The investigation-first workflow, all auditor agents, and the literature skills originate from code-manual.
- scott-yj-yang/new-prompt — Session management CLI. The propel session tool is adapted from new-prompt with auto-detection of project root, investigation artifact linking, and session indexing.
- Talmo's sleap-io — Investigation skill template. The structured scratch/ investigation pattern with living READMEs originates from Talmo's sleap-io project.
- Sionic AI's experiment registry — Retrospective skill and /advise + /retrospective workflow for capturing experiment learnings into a reusable registry.
- brunoasm's claude skills — Think-deeply anti-sycophancy skill and PDF extraction skill.
- Weizhena's Deep-Research workflow — Structured literature review with human-in-the-loop checkpoints.
- Context Engineering Template — Basic Claude Code usage patterns and context engineering principles.
MIT