A modern, web-based Hi-C contact map viewer and genome assembly curation tool.
Try it now at shandley.github.io/openpretext — no installation required. Click Load Example to explore a real koala genome assembly.
OpenPretext is designed as a browser-based alternative to
PretextView, the Wellcome Sanger
Institute's desktop application used by genome assembly teams worldwide
(Darwin Tree of Life, Vertebrate Genomes Project, Earth BioGenome Project).
It reads native .pretext files directly in the browser with no installation
required.
- WebGL2-accelerated contact map display at 60fps
- Tile-based level-of-detail rendering with LRU cache and background decompression
- Six color maps (Red-White, Blue-White-Red, Viridis, Hot, Cool, Grayscale) with keyboard cycling
- Adjustable gamma correction (0.1 to 2.0) via slider or keyboard
- Contig grid overlay with anti-aliased boundary lines
- Contig labels along map edges
- Minimap overview with click-to-navigate viewport indicator
- Scaffold color bands showing chromosome assignments
- Waypoint position markers
- Annotation track overlays (line, heatmap, marker modes)
- Comparison mode overlay showing original vs curated assembly boundaries
- Cut, join, invert, and move contigs with full undo/redo
- Drag-and-drop contig reordering in sidebar and on map
- Selection system: click, shift-range, ctrl-toggle, select-all
- Contig exclusion (mark contigs to hide from exports without deleting)
- Scaffold painting mode for chromosome assignment (create, rename, delete, paint, unpaint)
- Waypoint markers for positions of interest with keyboard navigation
- Batch operations: select by name pattern or size range, batch cut/join/invert, sort by length
- Contig meta tags: classify contigs as haplotig, contaminant, unlocalised, or sex chromosome with colored sidebar badges
- Auto Sort (Union Find) — Scores all contig pairs using Hi-C inter-contig link analysis across 4 orientations, then chains contigs into chromosome groups using a greedy Union Find algorithm. Automatically applies inversions and reordering via the command palette.
- Auto Cut (Breakpoint Detection) — Analyzes diagonal Hi-C signal density to detect misassembly breakpoints where the contact signal drops, automatically splitting contigs at discontinuities.
- Vision-based contact map analysis using the Anthropic Messages API (Claude)
- Captures a screenshot of the current map state, builds assembly context from contig ordering, metrics, and scaffold assignments, and sends it for analysis
- Returns executable DSL command suggestions with one-click "Run" buttons
- Prompt strategy system with 8 built-in strategies:
  - General Analysis -- balanced, suitable for any assembly
  - Inversion Detection -- specialized for anti-diagonal inversion patterns
  - Scaffold Assignment -- guides chromosome-level organization
  - Fragmented Assembly -- optimized for many-contig assemblies, emphasizes autosort/autocut
  - Micro-chromosomes -- for bird and reptile genomes with micro-chromosomes
  - Analysis-Guided Curation -- uses computed analysis tracks and health score to guide decisions
  - Haplotig Detection -- identifies haplotigs in dual-haplotype or partially phased assemblies
  - Telomere-Aware Curation -- uses telomere detection to assess chromosome completeness
- Custom strategy editor: create, edit, and delete your own strategies with supplementary prompt text and few-shot examples
- Strategy export/import as JSON files for sharing between users
- Browse community strategies from the openpretext-strategies repository directly from the AI panel
- Per-suggestion feedback (thumbs up/down) with aggregate strategy ratings
- Insulation score + TAD boundaries — sliding off-diagonal window (Crane et al. 2015) computes insulation profile and detects topologically associating domain boundaries as prominent local minima
- P(s) contact decay curve — intra-chromosomal contact frequency vs genomic distance with power-law exponent fitting in log-log space; inline SVG chart with comparative baseline overlay
- Per-chromosome P(s) — scaffold-aware decay curves computed independently per chromosome
- A/B compartments — observed/expected normalization, correlation matrix, first eigenvector via power iteration; rendered as a heatmap track
- ICE normalization — Sinkhorn-Knopp iterative matrix balancing for bias correction; computes per-bin bias vector and normalized contact matrix with low-coverage bins masked by quantile filtering
- Directionality index — Dixon et al. 2012 signed chi-square directionality scores with configurable window size; detects TAD boundaries at sign-change zero crossings
- Hi-C library quality — cis/trans contact ratio, short/long range ratio, contact density, and per-contig cis ratios; integrates into the 5-component assembly health score
- Saddle plot — compartment strength visualization showing O/E enrichment by eigenvector quantile; inline SVG heatmap with strength metric
- Virtual 4C — interactive locus contact profiling from any viewpoint bin via Alt+click; distance-expected normalization with optional log2 transform
- KR normalization — Knight-Ruiz iterative matrix balancing (Knight & Ruiz 2013); faster convergence than ICE; produces "KR Bias" line track and re-runs downstream analysis on the normalized matrix
- Telomere repeat detection — scans loaded FASTA sequences for TTAGGG/CCCTAA repeat motifs at contig ends; computes genome-wide density profiles and identifies telomere-positive ends; requires loading a reference FASTA first
- Composite health score — 0-100 score combining contiguity (N50), P(s) decay quality, assembly integrity, compartment strength, and Hi-C library quality; displayed as a prominent card in the sidebar
- All analyses run in a background Web Worker to avoid blocking the UI
- Adjustable insulation window size; auto-computation on file load
- BedGraph and TSV export for all analysis tracks
- Automatic detection of potential chimeric contigs using TAD boundary and compartment switch signals that fall inside (not at edges of) contigs
- Misassembly flags shown as orange "MIS" badges in the contig sidebar
- Cut suggestions with composite confidence scoring (TAD 50% + compartment 30% + decay 20%) shown as sorted cards with colored badges (green/yellow/red)
- Cut review panel — step-by-step guided walkthrough of each cut suggestion with camera navigation, accept/skip/back controls
- Pattern detection — algorithmic inversion (anti-diagonal butterfly) and translocation (off-diagonal enrichment) detection with clickable result cards and "Go" navigation
- Scaffold auto-detection — detects chromosome block boundaries from the block-diagonal structure of the contact map and assigns scaffolds automatically
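The composite confidence for cut suggestions is a weighted sum, which a few lines can make concrete. The weights below come from the list above; everything else is an assumption for illustration: the signal values are taken to lie in 0..1, and the badge thresholds (0.7 and 0.4) are hypothetical.

```typescript
// Sketch only: weighted confidence for a cut suggestion.
interface CutSignals {
  tad: number;         // TAD boundary evidence, assumed 0..1
  compartment: number; // compartment switch evidence, assumed 0..1
  decay: number;       // decay evidence, assumed 0..1
}

// Weights as stated in the feature list: TAD 50%, compartment 30%, decay 20%.
const CUT_WEIGHTS = { tad: 0.5, compartment: 0.3, decay: 0.2 } as const;

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

function cutConfidence(signals: CutSignals): number {
  return (
    CUT_WEIGHTS.tad * clamp01(signals.tad) +
    CUT_WEIGHTS.compartment * clamp01(signals.compartment) +
    CUT_WEIGHTS.decay * clamp01(signals.decay)
  );
}

// Hypothetical thresholds; the real badge cutoffs are not stated in this README.
function badgeColor(confidence: number): "green" | "yellow" | "red" {
  if (confidence >= 0.7) return "green";
  if (confidence >= 0.4) return "yellow";
  return "red";
}

/** Sort suggestions so the most confident cut comes first. */
function sortByConfidence<T extends { signals: CutSignals }>(cards: T[]): T[] {
  return [...cards].sort((x, y) => cutConfidence(y.signals) - cutConfidence(x.signals));
}
```

A suggestion backed only by a TAD boundary scores 0.5, so under these assumed thresholds it would get a yellow badge.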
- Real-time ordering quality feedback using Kendall tau rank correlation vs reference
- Longest correct contiguous run counter
- Trend arrows (improving/declining/neutral) after each curation operation
- Resettable reference ordering baseline
- Coverage, telomere, gap, and GC content tracks from embedded `.pretextgraph` extensions
- Telomere density track from loaded reference FASTA sequences
- BedGraph file upload with automatic contig coordinate mapping
- Per-track configuration: color picker, rendering mode (line/heatmap/marker), visibility toggle
- Track management panel in sidebar
- AGP 2.1 export (scaffold-aware assembly structure)
- BED6 export (scaffold-aware genomic intervals)
- FASTA export with reverse complement for inverted contigs
- PNG screenshot export
- BedGraph/TSV export of all analysis tracks (insulation, P(s), compartments, directionality, ICE bias, KR bias, quality metrics, saddle plot)
- Session save/load (JSON with full undo/redo stack + analysis data)
- Curation log export (JSON operation history)
- Strategy JSON export/import for AI prompt strategies
- Reference FASTA loading for curated sequence export
- BedGraph track file loading
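FASTA export writes inverted contigs as their reverse complement. A minimal sketch of that transformation follows; it is not the project's `FASTAWriter.ts`, and mapping any character outside A/C/G/T/N to `N` is an assumption made here for simplicity.

```typescript
// Sketch only: reverse complement for an inverted contig's sequence.
const COMPLEMENT: Record<string, string> = {
  A: "T", C: "G", G: "C", T: "A", N: "N",
  a: "t", c: "g", g: "c", t: "a", n: "n",
};

function reverseComplement(sequence: string): string {
  const out: string[] = [];
  // Walk the sequence backwards, complementing each base.
  for (let i = sequence.length - 1; i >= 0; i--) {
    out.push(COMPLEMENT[sequence[i]] ?? "N");
  }
  return out.join("");
}
```

The telomere motif pair mentioned earlier is a convenient check: `reverseComplement("TTAGGG")` gives `"CCCTAA"`.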
- N50/L50, N90/L90, total length, contig count, longest/shortest/mean/median contig
- Live stats panel in the sidebar with delta tracking after each operation
- Automatic metric snapshots on file load and after every curation operation
- Scaffold count and operation counter
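N50/L50 and N90/L90 follow the same recipe, so one helper covers both. The sketch below is illustrative rather than a copy of `QualityMetrics.ts`; the function name and return shape are hypothetical.

```typescript
// Sketch only: Nx/Lx assembly statistics from a list of contig lengths.
function nxLx(lengths: number[], fraction: number): { nx: number; lx: number } {
  // Longest contigs first.
  const sorted = [...lengths].sort((a, b) => b - a);
  const total = sorted.reduce((sum, len) => sum + len, 0);
  const threshold = total * fraction;

  let cumulative = 0;
  for (let i = 0; i < sorted.length; i++) {
    cumulative += sorted[i];
    // Nx is the length of the contig that pushes the running sum past the
    // threshold; Lx is how many contigs it took to get there.
    if (cumulative >= threshold) return { nx: sorted[i], lx: i + 1 };
  }
  return { nx: 0, lx: 0 }; // empty input
}

const n50 = (lengths: number[]) => nxLx(lengths, 0.5);
const n90 = (lengths: number[]) => nxLx(lengths, 0.9);
```

For lengths 80, 70, 50, 40, 30, 20, 10 (total 300), the running sum reaches 150 at the second contig, so N50 is 70 and L50 is 2. It reaches 270 at the fifth contig, so N90 is 30 and L90 is 5.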
- 18-command curation DSL covering all operations (cut, join, invert, move, select, deselect, scaffold create/paint/unpaint/delete, zoom, goto, autocut, autosort, echo)
- Contig references by name or by index (`#0`, `#15`)
- Script console with syntax highlighting and output panel
- Replay curation sessions from operation logs via the "From Log" button
- All UI curation operations have script equivalents
- 9 interactive lessons covering the full curation workflow:
  - Reading the Contact Map
  - Understanding Chromosomes
  - Detecting Misassembly
  - Cutting and Joining
  - Scaffold Assignment
  - Full Curation Exercise
  - 3D Genomics Analysis
  - Classifying Contigs with Meta Tags
  - Automated Misassembly Detection
- Step-based progression with instructions, hints, and UI element highlighting
- Auto-advance when the expected user action is detected
- Assessment scoring using Kendall tau similarity against ground-truth orderings
- Difficulty levels: beginner, intermediate, advanced
- 10 curated specimen datasets from GenomeArk spanning mammals, birds, reptiles, fish, amphibians, and invertebrates, loadable from the welcome screen:
  - Koala, Bluehead Wrasse, King Quail, Zebra Finch, Nile Crocodile, Spinyfin, Wait's Blind Snake, Couch's Spadefoot Toad, European Lancelet, Great Fruit-eating Bat
- Hi-C pattern gallery with 11 reference patterns (including strong diagonal, chromosome blocks, inversions, translocations, micro-chromosomes, low coverage, unplaced contigs, and A/B compartments), each with a visual description and click-to-navigate
- Each specimen includes metadata: species name, genome size, chromosome count, contig count, difficulty level, and characteristic Hi-C patterns
- Four interaction modes: Navigate, Edit, Scaffold, Waypoint
- Command palette (Cmd+K / Ctrl+K) with fuzzy search across all commands
- Keyboard shortcuts for all operations (press `?` for reference)
- Pan via mouse drag, zoom via scroll wheel or trackpad pinch (0.5x to 200x)
- Touch and trackpad gesture support (pinch-zoom, two-finger pan)
- Jump to diagonal, reset view, zoom to specific contigs
- Responsive layout with mobile and tablet breakpoints
- Searchable contig list with scaffold badges, meta tag badges, exclusion indicators, and inversion markers
- Sidebar panels: contigs, scaffolds, assembly metrics, history, 3D analysis, curation progress, tracks
- Toast notifications, detailed hover tooltips, loading progress overlay
- Drag-and-drop file opening
- CLI-based benchmark pipeline for automated curation algorithm evaluation
- Specimen download tools for acquiring test data from GenomeArk
- Metrics: breakpoint detection F1 score, Kendall tau ordering accuracy, orientation accuracy
- Parameter sweep for tuning AutoSort and AutoCut thresholds
- Regression testing against stored baselines (`bench/baselines.json`)
- Ordering metrics shared between browser assessment and CLI benchmarks (kendallTau, adjustedRandIndex, longestCorrectRun)
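Two of the ordering metrics named above are easy to sketch. The code below is an illustration, not the project's `OrderingMetrics.ts`: it assumes both orderings are permutations of the same contig indices, and it counts only forward runs (a reversed run does not count as correct). The function names mirror those in the list, but the signatures are assumptions.

```typescript
// Sketch only: ordering metrics over arrays of contig indices.

/** Reference position of each contig, read in predicted order. */
function referenceRanks(predicted: number[], reference: number[]): number[] {
  const position = new Map<number, number>();
  reference.forEach((contig, i) => position.set(contig, i));
  return predicted.map((contig) => {
    const p = position.get(contig);
    if (p === undefined) throw new Error(`contig ${contig} missing from reference`);
    return p;
  });
}

/** Kendall tau: (concordant - discordant) / total pairs, in the range -1..1. */
function kendallTau(predicted: number[], reference: number[]): number {
  const ranks = referenceRanks(predicted, reference);
  const n = ranks.length;
  if (n < 2) return 1;
  let concordant = 0;
  let discordant = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      if (ranks[i] < ranks[j]) concordant++;
      else if (ranks[i] > ranks[j]) discordant++;
    }
  }
  return (concordant - discordant) / ((n * (n - 1)) / 2);
}

/** Longest stretch of contigs that are consecutive, in order, in the reference. */
function longestCorrectRun(predicted: number[], reference: number[]): number {
  const ranks = referenceRanks(predicted, reference);
  if (ranks.length === 0) return 0;
  let best = 1;
  let run = 1;
  for (let i = 1; i < ranks.length; i++) {
    run = ranks[i] === ranks[i - 1] + 1 ? run + 1 : 1;
    best = Math.max(best, run);
  }
  return best;
}
```

The pairwise loop is quadratic in the number of contigs, which is fine for a sketch; a merge-sort inversion count would scale better for very fragmented assemblies.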
- Node.js 18 or later
git clone https://github.com/shandley/openpretext.git
cd openpretext
npm install
npm run dev

Open http://localhost:3000 in a browser, then either:
- Click Load Example to download and explore a real genome assembly
- Click Open to load a `.pretext` file from your computer
- Drag and drop a `.pretext` file onto the window
- Click Start Tutorial for a guided walkthrough
Real .pretext files are available from the
GenomeArk public S3 bucket. For example, to
download a zebra finch contact map (~56 MB):
aws s3 cp \
s3://genomeark/species/Taeniopygia_guttata/bTaeGut2/assembly_curated/evaluation/pretext/bTaeGut2.mat.pretext \
. --no-sign-request

The `--no-sign-request` flag allows access without AWS credentials.
Press ? at any time to open the shortcuts reference.
| Key | Action |
|---|---|
| `E` | Edit mode (cut/join/invert/move) |
| `S` | Scaffold painting mode |
| `W` | Waypoint mode |
| `N` / `Esc` | Navigate mode |
| `C` | Cut contig at cursor (edit mode) |
| `J` | Join selected contigs (edit mode) / Jump to diagonal |
| `F` | Flip/invert selected (edit mode) |
| `H` | Toggle contig exclusion (edit mode) |
| `P` | Toggle comparison mode |
| `L` | Toggle contig grid |
| `I` | Toggle info sidebar |
| `X` | Toggle annotation tracks |
| `M` | Toggle minimap |
| `?` | Keyboard shortcuts reference |
| `` ` `` | Script console |
| `]` / `.` | Next waypoint |
| `[` / `,` | Previous waypoint |
| `Up`/`Down` | Cycle color maps |
| `Left`/`Right` | Adjust gamma |
| `Home` | Reset view |
| `Delete` | Clear all waypoints |
| `Cmd+K` | Command palette |
| `Cmd+Z` | Undo |
| `Cmd+Shift+Z` | Redo |
| `Cmd+O` | Open file |
| `Cmd+S` | Screenshot |
| `Cmd+A` | Select all contigs (edit mode) |
- Load a `.pretext` file
- Press `E` to enter edit mode
- Select contigs by clicking (shift-click for range, ctrl-click to toggle)
- Use cut (`C`), join (`J`), invert (`F`), and drag to reorder the assembly
- Press `H` to exclude contigs from export
- Press `S` to enter scaffold mode and paint contigs into chromosomes
- Export the curated assembly as AGP, BED, or FASTA via the toolbar
- Toggle comparison mode (`P`) to see original vs curated boundaries
Open the command palette (Cmd+K) and run:
- Auto cut: detect breakpoints — Scans all contigs for misassembly breakpoints by analyzing the diagonal Hi-C signal. Contigs are split wherever the signal drops significantly. Each cut is a separate undo step.
- Auto sort: Union Find — Scores every contig pair across 4 orientations (head-head, head-tail, tail-head, tail-tail), then greedily chains contigs into chromosome groups. The algorithm applies inversions where needed and reorders the assembly. Each operation is individually undoable.
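The breakpoint step of Auto Cut can be pictured as a search for dips in a per-bin signal. The sketch below is a simplified stand-in, not the project's `AutoCut.ts`: it assumes the near-diagonal Hi-C density has already been computed as one number per bin along a contig, and the `dropFraction` threshold (30% of the contig median) is a hypothetical default.

```typescript
// Sketch only: find dips in a per-bin diagonal density profile.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

/**
 * Returns one breakpoint (the middle bin) for each run of consecutive bins
 * whose density falls below dropFraction * median. Runs that touch either
 * end of the contig are ignored, since a cut there would split nothing.
 */
function detectBreakpoints(density: number[], dropFraction = 0.3): number[] {
  if (density.length === 0) return [];
  const cutoff = median(density) * dropFraction;
  const breakpoints: number[] = [];
  let runStart = -1;
  for (let i = 0; i <= density.length; i++) {
    const isLow = i < density.length && density[i] < cutoff;
    if (isLow && runStart < 0) runStart = i;
    if (!isLow && runStart >= 0) {
      const touchesEdge = runStart === 0 || i === density.length;
      if (!touchesEdge) breakpoints.push(Math.floor((runStart + i - 1) / 2));
      runStart = -1;
    }
  }
  return breakpoints;
}
```

Using the median as the baseline keeps a single deep dip from dragging the threshold down, which a mean would do.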
A typical automated workflow: run Auto Cut first to break misassemblies, then Auto Sort to group and orient the fragments into chromosomes.
All operations support undo (Cmd+Z) and redo (Cmd+Shift+Z).
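The greedy chaining in Auto Sort can be sketched with a small Union Find. This is an illustration under stated assumptions, not the project's `AutoSort.ts`: link scores are taken as already computed, the four orientations are encoded as which end (head or tail) of each contig a link uses, and the `Link` shape, `chainContigs`, and `minScore` are hypothetical names.

```typescript
// Sketch only: greedy chaining of contigs from scored end-to-end links.
type End = "head" | "tail";
interface Link { a: number; aEnd: End; b: number; bEnd: End; score: number }

class UnionFind {
  private parent: number[];
  constructor(size: number) {
    this.parent = Array.from({ length: size }, (_, i) => i);
  }
  find(x: number): number {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }
  /** Merges the two sets; returns false if they were already one set. */
  union(a: number, b: number): boolean {
    const rootA = this.find(a);
    const rootB = this.find(b);
    if (rootA === rootB) return false;
    this.parent[rootA] = rootB;
    return true;
  }
}

function chainContigs(contigCount: number, links: Link[], minScore = 0) {
  const sets = new UnionFind(contigCount);
  const usedEnds = new Set<string>();
  const accepted: Link[] = [];
  // Best-scoring links are considered first.
  for (const link of [...links].sort((x, y) => y.score - x.score)) {
    if (link.score < minScore) break;
    const endA = `${link.a}:${link.aEnd}`;
    const endB = `${link.b}:${link.bEnd}`;
    if (usedEnds.has(endA) || usedEnds.has(endB)) continue; // each end joins once
    if (!sets.union(link.a, link.b)) continue; // same chain already: would close a loop
    usedEnds.add(endA);
    usedEnds.add(endB);
    accepted.push(link);
  }
  // Collect contigs by their set representative.
  const byRoot = new Map<number, number[]>();
  for (let i = 0; i < contigCount; i++) {
    const root = sets.find(i);
    const group = byRoot.get(root) ?? [];
    group.push(i);
    byRoot.set(root, group);
  }
  return { groups: Array.from(byRoot.values()), accepted };
}
```

The accepted links carry enough information to derive order and inversions afterwards: a head-head or tail-tail join implies that one of the two contigs must be flipped. That walk is left out here to keep the sketch short.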
- Open the AI Assist panel from the toolbar
- Enter your Anthropic API key (stored locally, never sent to any server except Anthropic)
- Select a prompt strategy from the dropdown (or create a custom one)
- Click Analyze Map to send a screenshot and assembly context to Claude
- Review the suggestions and click Run to execute any DSL command block
- Use thumbs up/down to rate suggestions and improve strategy selection
Browse community-contributed strategies at openpretext-strategies or share your own via the Export button.
Open the script console with the backtick key or the Console button. Commands can be typed directly or generated from the curation log using the From Log button. Run scripts with Ctrl+Enter.
Example script:
# Invert a misoriented contig
invert chr3
# Move a contig to a new position
move #5 to 0
# Cut a contig at a pixel offset
cut chr1 500
# Join two adjacent contigs
join chr1_L chr1_R
# Assign contigs to a scaffold
scaffold create Chromosome_1
scaffold paint #0 Chromosome_1
scaffold paint #1 Chromosome_1
# Select and deselect
select chr2
deselect all
# Navigate
zoom chr5
goto 0.25 0.25
# Automated operations
autosort
autocut
# Print a message
echo Curation complete
See the full DSL reference by typing help in the script console.
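To show how a script like the one above might be split into commands and how `#5`-style references differ from names, here is a minimal sketch. It is not the project's `ScriptParser.ts`. It assumes that a line beginning with `#` is a comment, that tokens are whitespace-separated, and that a `#` followed by digits is a zero-based index into the current contig order; all type and function names are hypothetical.

```typescript
// Sketch only: line-level parsing and contig reference resolution.
interface Command { name: string; args: string[] }
type ContigRef =
  | { kind: "index"; index: number }
  | { kind: "name"; name: string };

function parseScript(source: string): Command[] {
  const commands: Command[] = [];
  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment
    const [name, ...args] = line.split(/\s+/);
    commands.push({ name: name.toLowerCase(), args });
  }
  return commands;
}

function parseContigRef(token: string): ContigRef {
  const match = /^#(\d+)$/.exec(token);
  return match
    ? { kind: "index", index: Number(match[1]) }
    : { kind: "name", name: token };
}

/** Maps a reference to a position in the current contig order. */
function resolveContig(ref: ContigRef, contigNames: string[]): number {
  if (ref.kind === "index") {
    if (ref.index >= contigNames.length) throw new Error(`index out of range: #${ref.index}`);
    return ref.index;
  }
  const position = contigNames.indexOf(ref.name);
  if (position < 0) throw new Error(`unknown contig: ${ref.name}`);
  return position;
}
```

Because comments are recognized only at the start of a line, `move #5 to 0` keeps its index argument intact.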
- `.pretext` -- native BC4-compressed contact maps produced by PretextMap, including embedded graph extensions from PretextGraph
- `.bedgraph` -- annotation tracks loaded via the Load Track button
- `.fasta` -- reference sequences loaded via Load FASTA for curated export
For technical details on the binary format, see docs/PRETEXT_FORMAT.md.
npm run dev # Start development server with hot reload
npm test # Run unit tests (2,139 tests across 79 files)
npm run test:visual # Run E2E tests (35 tests, Playwright + Chromium)
npm run build # Production build to dist/
npm run preview # Preview the production build

npm run bench:acquire # Download test specimens from GenomeArk
npm run bench:run # Execute benchmarks
npm run bench:sweep # Sweep parameter ranges
npm run bench:regression # Compare against stored baselines

src/
main.ts Application entry point and orchestrator
core/
State.ts Application state with undo/redo
EventBus.ts Typed event emitter
DerivedState.ts Computed state selectors
formats/
PretextParser.ts .pretext binary format parser (BC4/deflate)
SyntheticData.ts Demo contact map generator
SyntheticTracks.ts Demo annotation track generator
FASTAParser.ts FASTA sequence parser
BedGraphParser.ts BedGraph annotation track parser
renderer/
WebGLRenderer.ts WebGL2 contact map renderer
Camera.ts Pan/zoom camera with mouse/touch/trackpad
TileManager.ts Tile-based LOD with LRU cache
TileDecoder.ts Background tile decompression worker
ColorMaps.ts Color map implementations (6 maps)
LabelRenderer.ts Contig label overlay
Minimap.ts Overview minimap with viewport indicator
TrackRenderer.ts Annotation track renderer (line/heatmap/marker)
ScaffoldOverlay.ts Scaffold color band overlay
WaypointOverlay.ts Waypoint marker overlay
curation/
CurationEngine.ts Cut/join/invert/move with undo/redo
SelectionManager.ts Contig selection (click/shift/ctrl)
DragReorder.ts Visual drag reordering
ScaffoldManager.ts Scaffold (chromosome) assignment CRUD
WaypointManager.ts Named position markers
ContigExclusion.ts Contig hide/exclude management
MisassemblyFlags.ts Singleton manager for flagged contigs
MetaTagManager.ts Contig classification meta tags
AutoCut.ts Diagonal signal breakpoint detection
AutoSort.ts Union Find link scoring and chaining
BatchOperations.ts Batch select/cut/join/invert/sort
QualityMetrics.ts N50/L50/N90/L90 assembly statistics
OrderingMetrics.ts Shared ordering metrics (kendallTau, ARI)
ai/
AIClient.ts Anthropic Messages API wrapper (vision)
AIContext.ts Assembly state context builder for AI prompts
AIPrompts.ts System prompt with DSL reference + Hi-C guide
AIFeedback.ts Per-suggestion feedback storage + aggregation
AIStrategyIO.ts Strategy export/import as JSON files
data/
SpecimenCatalog.ts Types + loader for specimen-catalog.json
LessonSchema.ts Types + loader for tutorial lesson JSON files
PromptStrategy.ts Types + loader + custom strategy CRUD
scripting/
ScriptParser.ts Tokenizer + parser for 18-command DSL
ScriptExecutor.ts Executes parsed AST via ScriptContext DI
ScriptReplay.ts Converts operation logs to DSL scripts
analysis/
InsulationScore.ts Insulation score + TAD boundary detection
ContactDecay.ts P(s) contact decay curve + exponent fitting
CompartmentAnalysis.ts A/B compartment eigenvector (O/E + PCA)
ICENormalization.ts ICE (Sinkhorn-Knopp) matrix balancing
KRNormalization.ts Knight-Ruiz matrix balancing
DirectionalityIndex.ts Directionality index + TAD boundaries
HiCQualityMetrics.ts Library-level quality metrics
SaddlePlot.ts Compartment strength visualization
Virtual4C.ts Interactive locus contact profiling
TelomereDetector.ts Telomere repeat detection from FASTA
MisassemblyDetector.ts Chimeric contig detection + confidence scoring
HealthScore.ts Composite assembly quality score (0-100)
ScaffoldDetection.ts Auto-detect chromosome blocks from contact map
PatternDetector.ts Inversion + translocation detection
CurationProgress.ts Real-time ordering progress scoring
AnalysisWorker.ts Background Web Worker for analysis
AnalysisWorkerClient.ts Promise-based main-thread worker client
export/
AGPWriter.ts AGP 2.1 format generation
BEDWriter.ts BED6 format export (scaffold-aware)
FASTAWriter.ts FASTA export with reverse complement
AnalysisExport.ts BedGraph/TSV export for analysis tracks
SnapshotExporter.ts PNG screenshot via canvas.toBlob
CurationLog.ts JSON operation history export
io/
SessionManager.ts Session save/load (JSON with undo stack + analysis)
ui/ 35 UI modules (pure DOM, no framework)
data/
specimen-catalog.json Curated multi-specimen catalog (10 species)
lessons/ Tutorial lesson JSON files (9 lessons)
pattern-gallery.json Hi-C pattern reference gallery (11 patterns)
prompt-strategies.json AI prompt strategy library (8 strategies)
tests/
unit/ 2,139 unit tests across 79 test files
e2e/ 35 E2E tests (Playwright + Chromium)
bench/
cli.ts Benchmark CLI (run/sweep/report/regression)
runner.ts Benchmark pipeline orchestrator
regression.ts Regression runner against baselines
baselines.json Regression thresholds for CI
metrics/ AutoSort, AutoCut, chromosome metrics
acquire/ GenomeArk specimen download tools
- TypeScript with strict mode
- Vite for development and builds
- WebGL2 for GPU-accelerated rendering
- pako (single runtime dependency) for deflate decompression
- Vitest for unit testing
- Playwright for E2E testing
- Pure DOM manipulation for UI (no React/Vue/Angular)
PretextView is an essential tool in genome assembly curation, used to visualize Hi-C contact maps and manually arrange contigs into chromosomes. It is developed by the Wellcome Sanger Institute as part of the Pretext suite:
- PretextMap -- converts aligned Hi-C reads into `.pretext` contact maps
- PretextView -- desktop viewer for manual curation
- PretextGraph -- embeds bedgraph annotation tracks into `.pretext` files
- PretextSnapshot -- command-line screenshot tool
OpenPretext aims to provide a browser-based alternative that works on any platform, requires no installation, supports trackpad input, and offers scriptable and AI-assisted curation workflows.
- The Pretext suite by the Wellcome Sanger Institute Tree of Life programme
- GenomeArk and the Vertebrate Genomes Project for public genome assembly data
- The Darwin Tree of Life, Earth BioGenome Project, and genome curation communities