✨ A simple AI-powered English reading assistant that adds helpful annotations to literary text. Great for learners and language lovers.

Obsidian_annotator

Obsidian_annotator is a lightweight, no-frills tool for English learners (especially Japanese speakers) who want to deeply understand English texts with the help of GPT-powered inline footnote annotations.

It reads .txt files, splits them into ~700-word chunks, auto-detects the genre, applies genre-specific annotation guidance, and sends each chunk to GPT-4.1 to generate easy-to-understand annotations in Obsidian-style footnotes.
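The chunking step can be sketched roughly as below. This is a simplified illustration under stated assumptions: the real splitter is token-based (the ~700-word figure is approximate), and `chunkText` is a hypothetical name, not the tool's actual API.

```javascript
// Illustrative paragraph-preserving chunking: group whole paragraphs
// until the next one would push the chunk past the word budget.
function chunkText(text, maxWords = 700) {
  const paragraphs = text.split(/\n\s*\n/);
  const chunks = [];
  let current = [];
  let count = 0;
  for (const para of paragraphs) {
    const words = para.trim().split(/\s+/).filter(Boolean).length;
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join("\n\n")); // close the current chunk
      current = [];
      count = 0;
    }
    current.push(para);
    count += words;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```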


✨ What It Does

  • Automatically detects the text's genre (e.g., literary fiction, fantasy, self-help)

  • Applies genre-specific annotation strategy

  • Adds annotations for:

    • 📚 Difficult or abstract vocabulary
    • 🔧 Complex grammar and syntax
    • 💓 Emotional nuance and tone
    • 🧠 Idioms or ambiguous expressions
    • 🗝️ Symbolism and metaphor
    • 🌍 Cultural references
    • 🧩 Subtle interpretation or logic
    • ⏳ Tense, mood, and aspect
    • 🇯🇵 Japanese glosses (when helpful)

  • Remembers recently annotated terms so later chunks skip duplicate glosses unless the sense changes

  • Leaves extra token headroom per chunk so heavy-footnote sections stay intact

  • Uses a low sampling temperature for flashcard-friendly, consistent explanations

  • Includes strict GPT-4.1 editing safeguards so the original text remains untouched

  • Caches token counts so repeated chunking stays fast even on large inputs

  • Reuses pre-serialized OpenAI payloads so retry loops avoid redundant work

  • Auto-scales OpenAI timeouts for GPT-4.1 so long chunks finish reliably

  • Streams via the OpenAI Responses API for tighter latency and clearer errors

  • Auto-shrinks problematic chunks on repeated timeouts to keep runs moving

  • Outputs .txt files with Obsidian-style footnotes like:

She kept her composure[^1] even as the storm raged outside.

[^1]: 🧠 keep one's composure: remain calm and in control  🇯🇵 平静を保つ
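As a rough illustration of the "original text remains untouched" safeguard above: stripping the footnote markers and definitions from annotated output should reproduce the source exactly. The helper names here are invented for illustration; the actual safeguard lives in the tool's prompting and validation, not in this code.

```javascript
// Drop footnote definition lines ([^1]: ...) and inline markers ([^1]),
// then compare what remains against the original text.
function stripAnnotations(annotated) {
  return annotated
    .split("\n")
    .filter((line) => !/^\[\^\d+\]:/.test(line)) // remove footnote definitions
    .join("\n")
    .replace(/\[\^\d+\]/g, "") // remove inline markers
    .trim();
}

function originalUnchanged(original, annotated) {
  return stripAnnotations(annotated) === original.trim();
}
```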

📌 Annotation Rules

  • Footnotes use emoji markers for clarity:

| Emoji | Category | Use Case |
| --- | --- | --- |
| 📚 | Vocabulary | Word definitions |
| 🔧 | Grammar | Structure, syntax |
| 💓 | Emotion | Feelings, emotional tone |
| 🧠 | Idiom/Nuance | Phrases, unclear nuance |
| 🗝️ | Symbolism | Metaphor, allegory, hidden meaning |
| 🌍 | Culture | Social/cultural context |
| 🧩 | Interpretation | Psychological/implicit logic |
| ⏳ | Tense/Aspect | Tense, mood, aspect, subjunctive, etc. |
| 🇯🇵 | JP Gloss | Japanese gloss (after the English) if helpful |

Each footnote definition must start with exactly one of the emojis above.
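That rule is easy to check mechanically. A minimal sketch, assuming footnote definitions follow the `[^n]: ...` shape shown above (`footnoteIsValid` is a hypothetical helper, not the tool's actual API):

```javascript
// The nine category emojis from the table above.
const CATEGORY_EMOJIS = ["📚", "🔧", "💓", "🧠", "🗝️", "🌍", "🧩", "⏳", "🇯🇵"];

function footnoteIsValid(line) {
  // Footnote definitions look like: [^1]: 🧠 keep one's composure: ...
  const match = line.match(/^\[\^[^\]]+\]:\s*(.*)$/);
  if (!match) return false; // not a footnote definition at all
  const body = match[1];
  return CATEGORY_EMOJIS.some((emoji) => body.startsWith(emoji));
}
```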


♻️ Resume & Memory

  • Each chunk is cached under .cache/<input_name>_chunks/ (input path is sanitised). Re-running without --no-resume skips finished chunks so you can recover from API hiccups quickly.
  • If GPT output fails validation, the raw response is saved under .cache/<input_name>_chunks/failed/ (<chunk>_<attempt>.md) so you can diff what went wrong.
  • A temporary .cache/<input_name>_vocab_memory.json stores roughly the last 200 annotated terms during a run (auto-deleted at completion). Only the most recent entries are sent to the model, nudging it to annotate genuinely new vocabulary.
  • When the final annotated file already exists, the CLI asks whether to overwrite it or keep both (the new file gets an incremental suffix).
  • If a chunk times out twice in a row, the tool automatically splits it into smaller fallback chunks and resumes processing so long passages do not block the run.
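The rolling vocabulary memory described above can be sketched as follows. `rememberTerms` is a hypothetical helper (the README only fixes the behavior, not the implementation); the ~200-term cap is the figure quoted above.

```javascript
// Keep roughly the last `maxTerms` annotated terms, dropping the oldest.
// A term that is annotated again moves to the most-recent end of the list.
function rememberTerms(memory, newTerms, maxTerms = 200) {
  const merged = [...memory];
  for (const term of newTerms) {
    const i = merged.indexOf(term);
    if (i !== -1) merged.splice(i, 1); // re-annotated term moves to the end
    merged.push(term);
  }
  return merged.slice(-maxTerms); // only the most recent entries survive
}
```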

🛠️ Configuration

| Setting | How to change | Default | Purpose |
| --- | --- | --- | --- |
| `--chunk-tokens` | CLI flag | 3400 | Base token cap per chunk before annotations are added (tuned for GPT-4.1). |
| `FOOTNOTE_MARGIN_TOKENS` | Env var | 300 | Reserved annotation budget; effective cap is `chunk_tokens - margin` (never below 200). |
| `TOKEN_COUNT_CACHE_SIZE` | Env var | 4096 | Max entries in the token-count LRU cache (set higher for many repeated paragraphs). |
| `OPENAI_TIMEOUT` | Env var | dynamic (≈ `max(60, min(180, 0.03 * chunk_tokens))`) | Override the per-request timeout if you need longer or shorter waits. |
| `ANNOTATION_TEMPERATURE` | Env var | 0.3 | Sampling temperature for annotation prompts; lower values reduce wording drift. |
| `KEEP_CHUNKS` | Env var (`1`/`true`/`yes`) | disabled | Keep chunk cache files after a successful run for inspection. |
| `OPENAI_BASE_URL` | Env var | `https://api.openai.com/v1` | Point to a proxy or compatible endpoint if you self-host the API. |
| `OPENAI_ORGANIZATION` | Env var | unset | Optional header for organisation-scoped OpenAI keys. |
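The dynamic `OPENAI_TIMEOUT` default follows the formula quoted above; a sketch of how it could be resolved (a simplified illustration, with `resolveTimeoutSeconds` as a hypothetical helper, not the tool's actual code):

```javascript
// Resolve the per-request timeout in seconds: honor an explicit
// OPENAI_TIMEOUT env override, else use max(60, min(180, 0.03 * chunkTokens)).
function resolveTimeoutSeconds(env, chunkTokens) {
  const override = Number.parseFloat(env.OPENAI_TIMEOUT ?? "");
  if (Number.isFinite(override) && override > 0) return override;
  return Math.max(60, Math.min(180, 0.03 * chunkTokens));
}
```

With the default `--chunk-tokens 3400`, this yields roughly 102 seconds, clamped to the 60–180 second band for very short or very long chunks.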

🚀 Quickstart

# Clone this repo
git clone https://github.com/mtskf/Obsidian_annotator.git
cd Obsidian_annotator

# Install Node dependencies
npm install

# Add your OpenAI key to .env
echo "OPENAI_API_KEY=your_api_key" > .env

# ⚠️ Make sure the .env file exists and contains a valid key. The program will exit if the key is missing.

# Run the annotator on a text file
npm run annotate -- books/sample.txt
# or invoke the binary directly
node ./src/index.mjs books/sample.txt

Output will be saved as your_text_annotated.txt (or *_annotated_1.txt, *_annotated_2.txt, ... if you choose to keep existing files). The CLI requires Node.js 18+ (built-in fetch).
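The "keep both" naming scheme above can be sketched like this (`nextOutputName` is a hypothetical helper; the real CLI checks the filesystem, here abstracted as an `exists` callback):

```javascript
// Find a free output filename: base name first, then _1, _2, ... suffixes.
function nextOutputName(base, ext, exists) {
  let candidate = `${base}${ext}`;
  for (let i = 1; exists(candidate); i++) {
    candidate = `${base}_${i}${ext}`;
  }
  return candidate;
}
```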


🪪 License

MIT. Free to use, modify, and share. Contributions welcome!
