CAPR is a Dockerized stack (Flask API + Svelte UI + Caddy) for managing wordlists, cognate boards, and finite-state transducers (FSTs). The project currently focuses on the Burmish and Germanic pipelines; the Germanic dataset now tracks four doculects (English, Old English, Dutch, German).
- From the repo root: `docker compose up -d`
  - Backend ⇨ http://127.0.0.1:5001
  - Frontend ⇨ http://127.0.0.1:8080
- In another terminal, proxy the stack through Caddy: `caddy run --config Caddyfile.dev`
- Open http://localhost:5002, choose `burmish-aligned-final.tsv` or `germanic-aligned-final.tsv`, and load the matching FST from `server/fsts/`.
- Need the longer checklist (regressions, tear-down, hand-offs)? See `docs/runbook.md`.
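After starting the stack, a small TCP probe is one quick way to confirm that the three ports above are accepting connections. This is a minimal sketch, not a script from the repo: `port_open`, `check_stack`, and the `ENDPOINTS` labels are illustrative names, and the probe only shows that something is listening, not that the API responds correctly.

```python
import socket

# Hosts and ports come from the quick-start steps; the labels are illustrative.
ENDPOINTS = {
    "backend": ("127.0.0.1", 5001),
    "frontend": ("127.0.0.1", 8080),
    "caddy proxy": ("localhost", 5002),
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_stack(endpoints=ENDPOINTS) -> dict:
    """Probe every endpoint and return {label: is_listening}."""
    return {name: port_open(host, port) for name, (host, port) in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_stack().items():
        print(f"{name:12s} {'up' if ok else 'DOWN'}")
```

For real API checks, the regression plan in `docs/regression_checks.md` remains the reference.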
- `docs/README.md` – master index for all project docs.
- `SETUP.md` – full installation guide (Docker + manual paths).
- `USAGE.md` – UI walkthrough, including the FST editor workflow.
- `docs/runbook.md` + `docs/regression_checks.md` – operational checklist and API smoke-test plan (`server/tools/api_regression.py`).
- `DEV_NOTES.md` – dated hand-offs; add a new section per session.
- `docs/germanic_transducer_report.md` – Germanic FST coverage/status summary (with supporting files under `docs/germanic_*`).
- `server/tools/add_old_english_rows.py` duplicates every English row into an Old English placeholder so the TSV always contains 1:1 coverage.
- `server/tools/fetch_old_english_from_wiktionary.py` hits the Wiktionary API to pull Old English lemmas from each English entry and writes `server/data/old_english_wiktionary.tsv`. Run it whenever you want a fresh scrape of the etymology data (results are cached under `server/tmp/`).
- `server/data/old_english_swadesh.tsv` stores the Wiktionary Swadesh export used to seed real Old English forms.
- `server/tools/update_old_english_forms.py` applies the Swadesh mappings to the gold-standard TSVs (updating `IPA`, `TOKENS`, `COUNTERPART`, `NOTE`). Run it whenever the stage3 export is regenerated.
- `server/tools/validate_old_english_pairs.py` confirms both TSVs still have a matching Old English row for every English entry (and reports how many placeholders remain).
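The 1:1 coverage check can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in `validate_old_english_pairs.py`: the `DOCULECT` and `CONCEPT` column names, the doculect labels `English` and `Old English`, and the `missing_old_english` helper are all assumed for illustration, and the sample rows are made up.

```python
import csv
import io


def missing_old_english(tsv_text: str,
                        doculect_col: str = "DOCULECT",
                        concept_col: str = "CONCEPT") -> list:
    """Return concepts that have an English row but no Old English row.

    Column names and doculect labels are assumptions (LingPy-style headers);
    adjust them to match the real TSVs.
    """
    english, old_english = set(), set()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        doculect = (row.get(doculect_col) or "").strip()
        concept = (row.get(concept_col) or "").strip()
        if doculect == "English":
            english.add(concept)
        elif doculect == "Old English":
            old_english.add(concept)
    return sorted(english - old_english)


# Made-up sample rows, only to show the shape of the check.
sample = (
    "ID\tDOCULECT\tCONCEPT\tIPA\n"
    "1\tEnglish\tstone\tstəʊn\n"
    "2\tOld English\tstone\tstɑːn\n"
    "3\tEnglish\twater\twɔːtə\n"
)
print(missing_old_english(sample))  # ['water']
```

An empty list means every English concept has an Old English counterpart. Counting remaining placeholders would need whatever marker the real script uses, which is not specified here.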
.
├── cognate-app/ # Svelte interface (boards + FST editor)
├── docs/ # Project documentation & planning bundles
├── server/ # Flask API, FSTs, data, regression harness
├── docker-compose.yml # Development stack (backend + frontend)
├── Caddyfile(.dev) # Reverse proxy definitions
└── SETUP.md / USAGE.md # Detailed setup & usage notes
The project is actively developing the Proto-Germanic → Old English transducer pipeline.
- 31.9% match rate (120/376 OE lexemes) with systematically bucketed mismatches
- Empirical discovery: Heavy-syllable nasal apocope rule (PGmc *-ą deletion after heavy stems)
- A-restoration fix: Corrected foma syntax bug causing unconditional fronting
- Refined diagnostics: Split 256 mismatches into 20+ specific phenomenon buckets
- Latest reports in `server/docs/debug_snapshots/`:
  - `oe_mismatch_report_2026-02-07_refined_v3.txt` (bucketed mismatches)
  - `oe_full_trace_report_2026-02-07_refined_buckets.txt` (stage-by-stage traces)
- Top mismatch buckets: `final_vowel_missing` (38), `vowel_quality_other` (27), `breaking_extra_other` (22)
- Diagnostic tools: `server/tools/oe_mismatch_report.py`, `server/tools/oe_full_trace_report.py`
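The figures above fit together arithmetically, which is a useful sanity check when the numbers are updated. The snippet below only recomputes values quoted in this README; the variable names are illustrative.

```python
total_lexemes = 376
matches = 120
mismatches = total_lexemes - matches      # 256, the number split into buckets

match_rate = matches / total_lexemes      # about 0.319

# Top three buckets as listed above.
top_buckets = {
    "final_vowel_missing": 38,
    "vowel_quality_other": 27,
    "breaking_extra_other": 22,
}
top_total = sum(top_buckets.values())     # 87
top_share = top_total / mismatches        # about 0.340

print(f"match rate: {match_rate:.1%}")
print(f"mismatches: {mismatches}")
print(f"top-3 buckets: {top_total} ({top_share:.1%} of mismatches)")
```

The three largest buckets therefore account for roughly a third of all mismatches, with the rest spread over the remaining buckets.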
- Run mismatch/trace reports to identify issues
- Investigate phonological phenomena in reference sources (Hogg, Ringe/Taylor)
- Implement/fix FST rules in `server/fsts/germanic.txt`
- Regenerate reports to verify improvements
- Document findings in `DEV_NOTES.md`
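Verifying an improvement after regenerating the reports comes down to comparing bucket counts between two runs. The report file format is not described here, so this sketch works on plain dictionaries. `bucket_deltas` is a hypothetical helper, and the "after" counts are invented purely for illustration.

```python
def bucket_deltas(before: dict, after: dict) -> dict:
    """Return {bucket: after - before} for every bucket whose count changed."""
    deltas = {}
    for bucket in sorted(set(before) | set(after)):
        change = after.get(bucket, 0) - before.get(bucket, 0)
        if change:
            deltas[bucket] = change
    return deltas


# "before" uses the counts quoted in this README; "after" is made up.
before = {"final_vowel_missing": 38, "vowel_quality_other": 27, "breaking_extra_other": 22}
after = {"final_vowel_missing": 30, "vowel_quality_other": 27, "breaking_extra_other": 24}

for bucket, change in bucket_deltas(before, after).items():
    print(f"{bucket}: {change:+d}")
```

A negative delta means a bucket shrank. A positive delta in another bucket can indicate that a rule change moved mismatches around rather than resolving them, which is worth noting in `DEV_NOTES.md`.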
- Keep Docker + Caddy steps documented in `docs/runbook.md`
- Record each session in `DEV_NOTES.md` with regression results
- Gong, X. & N. Hill (2020). Materials for an Etymological Dictionary of Burmish. Zenodo. https://doi.org/10.5281/zenodo.4311182
- List, J.-M. & R. Forkel (2022). LingRex. Zenodo.
- List, J.-M. & R. Forkel (2021). LingPy. https://lingpy.org
- Hulden, M. (2009). “Foma: a finite-state compiler and library.” EACL.