✨ A simple AI-powered English reading assistant that adds helpful annotations to literary text. Great for learners and language lovers.

Obsidian_annotator

Obsidian_annotator is a lightweight, no-frills tool for English learners (especially Japanese speakers) who want to deeply understand English texts with the help of GPT-powered inline footnote annotations.

It reads .txt files, splits them into ~700-word chunks, auto-detects the genre, applies genre-specific annotation guidance, and sends each chunk to GPT-4.1 to generate easy-to-understand annotations in Obsidian-style footnotes.
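The chunking step can be sketched roughly as below. This is a simplified illustration under stated assumptions: the real splitter is token-based (the ~700-word figure is approximate), and `chunkText` is a hypothetical name, not the tool's actual API.

```javascript
// Illustrative paragraph-preserving chunking: group whole paragraphs
// until the next one would push the chunk past the word budget.
function chunkText(text, maxWords = 700) {
  const paragraphs = text.split(/\n\s*\n/);
  const chunks = [];
  let current = [];
  let count = 0;
  for (const para of paragraphs) {
    const words = para.trim().split(/\s+/).filter(Boolean).length;
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join("\n\n")); // close the current chunk
      current = [];
      count = 0;
    }
    current.push(para);
    count += words;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```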


✨ What It Does

  • Automatically detects the text's genre (e.g., literary fiction, fantasy, self-help)

  • Applies genre-specific annotation strategy

  • Adds annotations for:

    • 📚 Difficult or abstract vocabulary
    • 🔧 Complex grammar and syntax
    • 💓 Emotional nuance and tone
    • 🧠 Idioms or ambiguous expressions
    • 🗝️ Symbolism and metaphor
    • 🌍 Cultural references
    • 🧩 Subtle interpretation or logic
    • ⏳ Tense, mood, and aspect
    • 🇯🇵 Japanese glosses (when helpful)

  • Remembers recently annotated terms so later chunks skip duplicate glosses unless the sense changes

  • Leaves extra token headroom per chunk so heavy-footnote sections stay intact

  • Uses a low sampling temperature for flashcard-friendly, consistent explanations

  • Includes strict GPT-4.1 editing safeguards so the original text remains untouched

  • Caches token counts so repeated chunking stays fast even on large inputs

  • Reuses pre-serialized OpenAI payloads so retry loops avoid redundant work

  • Auto-scales OpenAI timeouts for GPT-4.1 so long chunks finish reliably

  • Streams via the OpenAI Responses API for tighter latency and clearer errors

  • Auto-shrinks problematic chunks on repeated timeouts to keep runs moving

  • Outputs .txt files with Obsidian-style footnotes like:

She kept her composure[^1] even as the storm raged outside.

[^1]: 🧠 keep one's composure: remain calm and in control  🇯🇵 平静を保つ
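As a rough illustration of the "original text remains untouched" safeguard above: stripping the footnote markers and definitions from annotated output should reproduce the source exactly. The helper names here are invented for illustration; the actual safeguard lives in the tool's prompting and validation, not in this code.

```javascript
// Drop footnote definition lines ([^1]: ...) and inline markers ([^1]),
// then compare what remains against the original text.
function stripAnnotations(annotated) {
  return annotated
    .split("\n")
    .filter((line) => !/^\[\^\d+\]:/.test(line)) // remove footnote definitions
    .join("\n")
    .replace(/\[\^\d+\]/g, "") // remove inline markers
    .trim();
}

function originalUnchanged(original, annotated) {
  return stripAnnotations(annotated) === original.trim();
}
```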

📌 Annotation Rules

  • Footnotes use emoji markers for clarity:

| Emoji | Category | Use Case |
| --- | --- | --- |
| 📚 | Vocabulary | Word definitions |
| 🔧 | Grammar | Structure, syntax |
| 💓 | Emotion | Feelings, emotional tone |
| 🧠 | Idiom/Nuance | Phrases, unclear nuance |
| 🗝️ | Symbolism | Metaphor, allegory, hidden meaning |
| 🌍 | Culture | Social/cultural context |
| 🧩 | Interpretation | Psychological/implicit logic |
| ⏳ | Tense/Aspect | Tense, mood, aspect, subjunctive, etc. |
| 🇯🇵 | JP Gloss | Japanese gloss (after the English) if helpful |

Each footnote definition must start with exactly one of the emojis above.
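That rule is easy to check mechanically. A minimal sketch, assuming footnote definitions follow the `[^n]: ...` shape shown above (`footnoteIsValid` is a hypothetical helper, not the tool's actual API):

```javascript
// The nine category emojis from the table above.
const CATEGORY_EMOJIS = ["📚", "🔧", "💓", "🧠", "🗝️", "🌍", "🧩", "⏳", "🇯🇵"];

function footnoteIsValid(line) {
  // Footnote definitions look like: [^1]: 🧠 keep one's composure: ...
  const match = line.match(/^\[\^[^\]]+\]:\s*(.*)$/);
  if (!match) return false; // not a footnote definition at all
  const body = match[1];
  return CATEGORY_EMOJIS.some((emoji) => body.startsWith(emoji));
}
```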


♻️ Resume & Memory

  • Each chunk is cached under .cache/<input_name>_chunks/ (input path is sanitised). Re-running without --no-resume skips finished chunks so you can recover from API hiccups quickly.
  • If GPT output fails validation, the raw response is saved under .cache/<input_name>_chunks/failed/ (<chunk>_<attempt>.md) so you can diff what went wrong.
  • A temporary .cache/<input_name>_vocab_memory.json stores roughly the last 200 annotated terms during a run (auto-deleted at completion). Only the most recent entries are sent to the model, nudging it to annotate genuinely new vocabulary.
  • When the final annotated file already exists, the CLI asks whether to overwrite it or keep both (the new file gets an incremental suffix).
  • If a chunk times out twice in a row, the tool automatically splits it into smaller fallback chunks and resumes processing so long passages do not block the run.
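The rolling vocabulary memory described above can be sketched as follows. `rememberTerms` is a hypothetical helper (the README only fixes the behavior, not the implementation); the ~200-term cap is the figure quoted above.

```javascript
// Keep roughly the last `maxTerms` annotated terms, dropping the oldest.
// A term that is annotated again moves to the most-recent end of the list.
function rememberTerms(memory, newTerms, maxTerms = 200) {
  const merged = [...memory];
  for (const term of newTerms) {
    const i = merged.indexOf(term);
    if (i !== -1) merged.splice(i, 1); // re-annotated term moves to the end
    merged.push(term);
  }
  return merged.slice(-maxTerms); // only the most recent entries survive
}
```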

🛠️ Configuration

| Setting | How to change | Default | Purpose |
| --- | --- | --- | --- |
| `--chunk-tokens` | CLI flag | 3400 | Base token cap per chunk before annotations are added (tuned for GPT-4.1). |
| `FOOTNOTE_MARGIN_TOKENS` | Env var | 300 | Reserved annotation budget; effective cap is `chunk_tokens - margin` (never below 200). |
| `TOKEN_COUNT_CACHE_SIZE` | Env var | 4096 | Max entries in the token-count LRU cache (set higher for many repeated paragraphs). |
| `OPENAI_TIMEOUT` | Env var | dynamic (≈ `max(60, min(180, 0.03 * chunk_tokens))`) | Override the per-request timeout if you need longer or shorter waits. |
| `ANNOTATION_TEMPERATURE` | Env var | 0.3 | Sampling temperature for annotation prompts; lower values reduce wording drift. |
| `KEEP_CHUNKS` | Env var (`1`/`true`/`yes`) | disabled | Keep chunk cache files after a successful run for inspection. |
| `OPENAI_BASE_URL` | Env var | `https://api.openai.com/v1` | Point to a proxy or compatible endpoint if you self-host the API. |
| `OPENAI_ORGANIZATION` | Env var | unset | Optional header for organisation-scoped OpenAI keys. |
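The dynamic `OPENAI_TIMEOUT` default follows the formula quoted above; a sketch of how it could be resolved (a simplified illustration, with `resolveTimeoutSeconds` as a hypothetical helper, not the tool's actual code):

```javascript
// Resolve the per-request timeout in seconds: honor an explicit
// OPENAI_TIMEOUT env override, else use max(60, min(180, 0.03 * chunkTokens)).
function resolveTimeoutSeconds(env, chunkTokens) {
  const override = Number.parseFloat(env.OPENAI_TIMEOUT ?? "");
  if (Number.isFinite(override) && override > 0) return override;
  return Math.max(60, Math.min(180, 0.03 * chunkTokens));
}
```

With the default `--chunk-tokens 3400`, this yields roughly 102 seconds, clamped to the 60–180 second band for very short or very long chunks.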

🚀 Quickstart

# Clone this repo
git clone https://github.com/mtskf/Obsidian_annotator.git
cd Obsidian_annotator

# Install Node dependencies
npm install

# Add your OpenAI key to .env
echo "OPENAI_API_KEY=your_api_key" > .env

# ⚠️ Make sure the .env file exists and contains a valid key. The program will exit if the key is missing.

# Run the annotator on a text file
npm run annotate -- books/sample.txt
# or invoke the binary directly
node ./src/index.mjs books/sample.txt

Output will be saved as your_text_annotated.txt (or *_annotated_1.txt, *_annotated_2.txt, ... if you choose to keep existing files). The CLI requires Node.js 18+ (built-in fetch).
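The "keep both" naming scheme above can be sketched like this (`nextOutputName` is a hypothetical helper; the real CLI checks the filesystem, here abstracted as an `exists` callback):

```javascript
// Find a free output filename: base name first, then _1, _2, ... suffixes.
function nextOutputName(base, ext, exists) {
  let candidate = `${base}${ext}`;
  for (let i = 1; exists(candidate); i++) {
    candidate = `${base}_${i}${ext}`;
  }
  return candidate;
}
```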


🪪 License

MIT. Free to use, modify, and share. Contributions welcome!
