Obsidian_annotator is a lightweight, no-frills tool for English learners—especially Japanese speakers—who want to deeply understand English texts with the help of GPT-powered inline footnote annotations.
It reads .txt files, splits them into ~700-word chunks, auto-detects the genre, applies genre-specific annotation guidance, and sends each chunk to GPT-4.1 to generate easy-to-understand annotations in Obsidian-style footnotes.
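The chunk-splitting step can be sketched roughly as follows. This is a simplified word-count approximation, not the tool's actual code — the real implementation budgets by tokens (with cached counts), and `splitIntoChunks` is a hypothetical name:

```javascript
// Split text into chunks of at most `maxWords` words, breaking on
// paragraph boundaries (simplified sketch; the real tool budgets
// by tokens, not words).
function splitIntoChunks(text, maxWords = 700) {
  const paragraphs = text.split(/\n\s*\n/);
  const chunks = [];
  let current = [];
  let count = 0;
  for (const para of paragraphs) {
    const words = para.split(/\s+/).filter(Boolean).length;
    // Flush the current chunk when adding this paragraph would exceed the cap.
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join("\n\n"));
      current = [];
      count = 0;
    }
    current.push(para);
    count += words;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```

Breaking on paragraph boundaries keeps each chunk self-contained, so the model sees complete sentences and surrounding context when annotating.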
- Automatically detects the text's genre (e.g., literary fiction, fantasy, self-help)
- Applies a genre-specific annotation strategy
- Adds annotations for:
- 📚 Difficult or abstract vocabulary
- 🔧 Complex grammar and syntax
- 💓 Emotional nuance and tone
- 🧠 Idioms or ambiguous expressions
- 🗝️ Symbolism and metaphor
- 🌍 Cultural references
- 🧩 Subtle interpretation or logic
- ⏳ Tense, mood, and aspect
- 🇯🇵 Japanese glosses (when helpful)
- Remembers recently annotated terms so later chunks skip duplicate glosses unless the sense changes
- Leaves extra token headroom per chunk so heavy-footnote sections stay intact
- Uses a low sampling temperature for flashcard-friendly, consistent explanations
- Includes strict GPT-4.1 editing safeguards so the original text remains untouched
- Caches token counts so repeated chunking stays fast even on large inputs
- Reuses pre-serialized OpenAI payloads so retry loops avoid redundant work
- Auto-scales OpenAI timeouts for GPT-4.1 so long chunks finish reliably
- Streams via the OpenAI Responses API for tighter latency and clearer errors
- Auto-shrinks problematic chunks on repeated timeouts to keep runs moving
- Outputs `.txt` files with Obsidian-style footnotes like:

  ```
  She kept her composure[^1] even as the storm raged outside.
  [^1]: 🧠 keep one's composure: remain calm and in control 🇯🇵 平静を保つ
  ```
- Footnotes use emoji markers for clarity:
| Emoji | Category | Use Case |
|---|---|---|
| 📚 | Vocabulary | Word definitions |
| 🔧 | Grammar | Structure, syntax |
| 💓 | Emotion | Feelings, emotional tone |
| 🧠 | Idiom/Nuance | Phrases, unclear nuance |
| 🗝️ | Symbolism | Metaphor, allegory, hidden meaning |
| 🌍 | Culture | Social/cultural context |
| 🧩 | Interpretation | Psychological/implicit logic |
| ⏳ | Tense/Aspect | Tense, mood, aspect, subjunctive, etc. |
| 🇯🇵 | JP Gloss | Japanese gloss if helpful after English |
Each footnote definition must start with exactly one of the emojis above.
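The "must start with exactly one of the emojis" rule lends itself to a small validator. The sketch below is illustrative only — the function name and regex are assumptions, not the tool's actual validation code:

```javascript
// Allowed category emojis from the footnote table above.
const FOOTNOTE_EMOJIS = ["📚", "🔧", "💓", "🧠", "🗝️", "🌍", "🧩", "⏳"];

// A footnote definition line looks like "[^1]: 🧠 explanation…".
// It is valid only if the body begins with one of the allowed emojis.
function isValidFootnote(line) {
  const match = line.match(/^\[\^\d+\]:\s*(.*)$/);
  if (!match) return false;
  return FOOTNOTE_EMOJIS.some((emoji) => match[1].startsWith(emoji));
}
```

A check like this is what lets the tool reject malformed GPT output and save the raw response for inspection instead of writing it into the annotated file.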
- Each chunk is cached under `.cache/<input_name>_chunks/` (the input path is sanitised). Re-running without `--no-resume` skips finished chunks so you can recover from API hiccups quickly.
- If GPT output fails validation, the raw response is saved under `.cache/<input_name>_chunks/failed/` (`<chunk>_<attempt>.md`) so you can diff what went wrong.
- A temporary `.cache/<input_name>_vocab_memory.json` stores roughly the last 200 annotated terms during a run (auto-deleted at completion). Only the most recent entries are sent to the model, nudging it to annotate genuinely new vocabulary.
- When the final annotated file already exists, the CLI asks whether to overwrite it or keep both (the new file gets an incremental suffix).
- If a chunk times out twice in a row, the tool automatically splits it into smaller fallback chunks and resumes processing so long passages do not block the run.
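The vocab-memory behaviour described above — keep roughly the last 200 terms, send only the most recent ones — can be sketched with an insertion-ordered `Map`. Class and method names here are illustrative, not the tool's actual API:

```javascript
// Rolling vocabulary memory: remembers recently annotated terms so
// later chunks can skip duplicate glosses. Sketch only — the real
// tool persists this to a JSON file and caps it at ~200 terms.
class VocabMemory {
  constructor(maxTerms = 200) {
    this.maxTerms = maxTerms;
    this.terms = new Map(); // Map preserves insertion order == recency order
  }
  add(term) {
    this.terms.delete(term); // re-adding moves the term to "most recent"
    this.terms.set(term, true);
    if (this.terms.size > this.maxTerms) {
      // Evict the oldest entry once the cap is exceeded.
      const oldest = this.terms.keys().next().value;
      this.terms.delete(oldest);
    }
  }
  has(term) {
    return this.terms.has(term);
  }
  recent(n) {
    return [...this.terms.keys()].slice(-n); // newest last
  }
}
```

Sending only `recent(n)` terms to the model keeps the prompt small while still discouraging duplicate glosses for vocabulary the reader has just seen.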
| Setting | How to change | Default | Purpose |
|---|---|---|---|
| `--chunk-tokens` | CLI flag | 3400 | Base token cap per chunk before annotations are added (tuned for GPT-4.1). |
| `FOOTNOTE_MARGIN_TOKENS` | Env var | 300 | Reserved annotation budget; effective cap is `chunk_tokens - margin` (never below 200). |
| `TOKEN_COUNT_CACHE_SIZE` | Env var | 4096 | Max entries in the token-count LRU cache (set higher for many repeated paragraphs). |
| `OPENAI_TIMEOUT` | Env var | dynamic (≈ `max(60, min(180, 0.03 * chunk_tokens))`) | Override the per-request timeout if you need longer or shorter waits. |
| `ANNOTATION_TEMPERATURE` | Env var | 0.3 | Sampling temperature for annotation prompts; lower values reduce wording drift. |
| `KEEP_CHUNKS` | Env var (`1`/`true`/`yes`) | disabled | Keep chunk cache files after a successful run for inspection. |
| `OPENAI_BASE_URL` | Env var | `https://api.openai.com/v1` | Point to a proxy or compatible endpoint if you self-host the API. |
| `OPENAI_ORGANIZATION` | Env var | unset | Optional header for organisation-scoped OpenAI keys. |
```bash
# Clone this repo
git clone https://github.com/mtskf/Obsidian_annotator.git
cd Obsidian_annotator

# Install Node dependencies
npm install

# Add your OpenAI key to .env
echo "OPENAI_API_KEY=your_api_key" > .env
# ⚠️ Make sure the .env file exists and contains a valid key. The program will exit if the key is missing.

# Run the annotator on a text file
npm run annotate -- books/sample.txt
# or invoke the binary directly
node ./src/index.mjs books/sample.txt
```

Output will be saved as `your_text_annotated.txt` (or `*_annotated_1.txt`, `*_annotated_2.txt`, ... if you choose to keep existing files). The CLI requires Node.js 18+ (built-in `fetch`).
MIT. Free to use, modify, and share. Contributions welcome!