jenish-rudani/PageMatch

🎧 PageMatch — AudioBook Place Finder

Lost your place in an audiobook? Paste a sentence. Get the exact timestamp.

PageMatch transcribes your audiobook once using NVIDIA's Parakeet model running locally on your Apple Silicon GPU via MLX. After that, finding any moment in a 20-hour book takes under a second — just paste a sentence from the text.

No internet. No API keys. No subscriptions. Runs entirely on your machine.



✨ Why Parakeet + MLX?

Most audiobook tools use Whisper, an autoregressive model that processes audio in 30-second windows and can accumulate timestamp drift over hours of audio. PageMatch takes a different approach:

| | Whisper | Parakeet + MLX |
|---|---|---|
| Architecture | Autoregressive | Non-autoregressive (CTC/TDT) |
| Timestamps | Per-chunk, drift accumulates | Absolute, sentence-level |
| Hardware (Apple Silicon) | CPU or slow MPS | Native GPU via unified memory |
| Long files (2 GB+) | Needs manual chunking | Built-in chunk_duration param |
| Speed | 1–4× realtime | 8–20× realtime on M-series |
| Deterministic | No (beam search) | Yes |

✨ Features

  • Paste-to-timestamp — paste any sentence, get HH:MM:SS.mmm back instantly
  • Apple Silicon GPU acceleration — Parakeet runs on the M-series GPU via MLX, using unified memory
  • One-time indexing — transcribe once (~5–10 min for a 10-hour book), search forever in milliseconds
  • Near-zero timestamp drift — non-autoregressive model produces absolute timestamps, no per-chunk error
  • Top-3 ranked results — confidence scores, matched audio text, single- and cross-segment search
  • Drift correction — fine-tune with an offset spinner; correction is stored permanently in the index
  • One-click VLC playback — jump straight to the moment in a running VLC instance, no new windows
  • Drag & drop GUI — dark PyQt6 interface with live progress and speed readout
  • Batch indexer — pre-index an entire library overnight with one command
  • Multilingual — switch to parakeet-tdt-0.6b-v3 for 25 European languages

🚀 Quick Start

PageMatch uses uv for dependency management — it handles the parakeet-mlx native dependencies cleanly and is dramatically faster than pip.

1. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Clone and sync

git clone https://github.com/yourusername/pagematch.git
cd pagematch
uv sync

uv sync creates a virtualenv and installs everything from pyproject.toml: parakeet-mlx, PyQt6, rapidfuzz, faster-whisper, and dev tooling (ruff, black, pre-commit).

3. Launch the GUI

uv run python src/gui.py

4. Use it

  1. Drop your audiobook (.m4b, .mp3, .m4a, .flac, .wav) onto the drop zone
  2. Build Index — Parakeet transcribes locally on your GPU; first run downloads the model (~600 MB), subsequent runs start immediately
  3. Paste any sentence from the book into the text box
  4. Find Timestamp — top 3 matches appear with timestamps and confidence scores
  5. Click Play in VLC to jump there instantly (requires VLC)

🖥️ CLI Usage

Index an audiobook

# First-time index (downloads model on first run)
uv run python src/find_my_place.py --audio "book.m4b" --index

# Specific model (multilingual)
uv run python src/find_my_place.py --audio "book.m4b" --index --model parakeet-tdt-0.6b-v3

# Bake in a drift correction
uv run python src/find_my_place.py --audio "book.m4b" --index --offset 2.0

# Force rebuild
uv run python src/find_my_place.py --audio "book.m4b" --index --reindex

Search for a snippet

uv run python src/find_my_place.py --audio "book.m4b" --text "The morning brought no relief"

Interactive mode

uv run python src/find_my_place.py --audio "book.m4b"

Batch index a whole library

uv run python src/index_folder.py /path/to/audiobooks/

# Force rebuild all
uv run python src/index_folder.py /path/to/audiobooks/ --reindex
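
As a rough illustration of what a batch pass has to decide, the sketch below walks a folder, keeps the supported audio formats, and skips files that already have an index unless a rebuild is forced. It is not the code in index_folder.py: the helper names and the `<audio file name>.abfinder.db` naming scheme are assumptions made for the example.

```python
from pathlib import Path

# Extensions taken from the supported formats listed in Quick Start.
AUDIO_EXTENSIONS = {".m4b", ".mp3", ".m4a", ".flac", ".wav"}


def index_path_for(audio: Path) -> Path:
    """Hypothetical naming: '<audio file name>.abfinder.db' next to the audio file."""
    return audio.with_name(audio.name + ".abfinder.db")


def files_needing_index(root: Path, reindex: bool = False) -> list[Path]:
    """Return audio files under root that lack an index (or all of them if reindex)."""
    candidates = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    if reindex:
        return candidates
    return [p for p in candidates if not index_path_for(p).exists()]
```

Passing `reindex=True` mirrors the `--reindex` flag: every audio file is returned regardless of whether an index already sits beside it.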

⏱️ Drift Correction

Parakeet uses absolute timestamps — drift should be near zero for most files. But if your audiobook has a silent intro or publisher bumper that shifts everything:

  1. Search for a sentence you know the exact position of in your player
  2. Measure the gap between what PageMatch reports and the actual audio time
  3. Enter that value (seconds) in the Drift Correction spinner in the GUI
  4. Click Rebuild — the offset is stored in the index permanently

Via CLI: pass --offset 3.0 during --index and it's saved for all future searches automatically.
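
The arithmetic behind the steps above is small enough to sketch. The function names are hypothetical, and the sign convention (offset = actual time minus reported time, added to every stored timestamp) is an assumption rather than something confirmed from the PageMatch source.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, clamping negative values to zero."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def measure_offset(reported: float, actual: float) -> float:
    """Offset to enter: positive when the real audio position is later than reported."""
    return actual - reported


def apply_offset(reported: float, offset: float) -> float:
    """Shift a reported timestamp by the stored offset, never going below zero."""
    return max(0.0, reported + offset)
```

For example, if PageMatch reports 100.0 s for a sentence that actually plays at 103.0 s, `measure_offset(100.0, 103.0)` gives 3.0, which is the value to enter in the spinner or pass as `--offset 3.0`.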


🤖 Models

| Model | Languages | Speed | Notes |
|---|---|---|---|
| parakeet-tdt-0.6b-v2 | English only | ⚡ Fastest | Default |
| parakeet-tdt-0.6b-v3 | 25 European languages | Slightly slower | Non-English books |

Models are downloaded automatically from mlx-community/ on HuggingFace on first run and cached locally.


📁 Project Structure

pagematch/
├── src/
│   ├── find_my_place.py    # Core: MLX Parakeet transcription, SQLite index, fuzzy search
│   ├── gui.py              # PyQt6 GUI — drag & drop, indexing, search, VLC control
│   └── index_folder.py     # Batch indexer CLI
├── pyproject.toml          # uv / PEP 517 project config
├── README.md
└── LICENSE

🛠️ Requirements

  • macOS with Apple Silicon (M1 / M2 / M3 / M4) — required for MLX / Parakeet GPU acceleration

  • Python 3.10+

  • ffmpeg — required for audio decoding

    brew install ffmpeg
  • VLC (optional) — for one-click playback at timestamp

Intel Mac / Linux / Windows? faster-whisper is already in pyproject.toml. Open an issue if you'd like the non-MLX code path wired up in the GUI.


🗄️ How It Works

Indexing — PageMatch feeds your audiobook through Parakeet via parakeet-mlx with chunk_duration=120s and overlap_duration=15s. Because Parakeet is non-autoregressive, it processes each chunk in parallel on the GPU and returns absolute sentence-level timestamps. These are stored in a lightweight SQLite .abfinder.db file sitting next to your audio file — no external database, nothing to configure.
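
To make the storage step concrete, here is a minimal sketch of persisting sentence-level timestamps with Python's built-in sqlite3 module. The table layout, column names, and the `meta` table holding the drift offset are assumptions for illustration; the actual schema of the .abfinder.db file may differ. Sentences are passed in as plain `(start, end, text)` tuples so the example does not depend on the transcription library.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, sentences, offset: float = 0.0) -> None:
    """Store (start_s, end_s, text) tuples plus a drift offset (hypothetical schema)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS segments (
            id      INTEGER PRIMARY KEY,
            start_s REAL NOT NULL,
            end_s   REAL NOT NULL,
            text    TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS meta (
            key   TEXT PRIMARY KEY,
            value TEXT NOT NULL
        );
        """
    )
    conn.execute("DELETE FROM segments")  # a rebuild replaces any previous index
    conn.executemany(
        "INSERT INTO segments (start_s, end_s, text) VALUES (?, ?, ?)", sentences
    )
    conn.execute(
        "INSERT OR REPLACE INTO meta (key, value) VALUES ('offset', ?)", (str(offset),)
    )
    conn.commit()


def load_segments(conn: sqlite3.Connection):
    """Return segments ordered by start time, with the stored offset applied."""
    offset = float(
        conn.execute("SELECT value FROM meta WHERE key = 'offset'").fetchone()[0]
    )
    rows = conn.execute(
        "SELECT start_s, end_s, text FROM segments ORDER BY start_s"
    ).fetchall()
    return [(start + offset, end + offset, text) for start, end, text in rows]
```

Keeping the offset in the same file as the segments is one simple way to make a correction persist across searches without re-transcribing.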

Searching — Your pasted query is matched against all stored segments using rapidfuzz partial ratio scoring. Both single-segment and two-segment sliding window matches are evaluated, catching queries that straddle a transcript boundary. The top 3 results are returned ranked by confidence score.
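
The search strategy described above can be sketched as follows. This is an illustration, not the code in find_my_place.py: the `search` function, the tuple layout, and the tie-breaking rule (prefer the shorter window when scores are equal) are assumptions. It uses `rapidfuzz.fuzz.partial_ratio` when rapidfuzz is installed and otherwise falls back to a slower difflib-based stand-in, so the example runs either way.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz.fuzz import partial_ratio
except ImportError:
    def partial_ratio(a: str, b: str) -> float:
        """Stand-in scorer: best match of the shorter string inside the longer one."""
        short, long_ = (a, b) if len(a) <= len(b) else (b, a)
        if not short:
            return 0.0
        best = 0.0
        for i in range(len(long_) - len(short) + 1):
            window = long_[i:i + len(short)]
            best = max(best, SequenceMatcher(None, short, window).ratio())
        return best * 100.0


def search(query: str, segments, top_k: int = 3):
    """Score single segments and adjacent pairs; return (score, start, text) tuples."""
    q = query.lower().strip()
    candidates = []
    for i, (start, _end, text) in enumerate(segments):
        candidates.append((partial_ratio(q, text.lower()), start, text))
        if i + 1 < len(segments):
            # Two-segment window: catches queries that straddle a boundary.
            joined = text + " " + segments[i + 1][2]
            candidates.append((partial_ratio(q, joined.lower()), start, joined))
    # Highest score first; on ties prefer the shorter (more specific) window.
    candidates.sort(key=lambda c: (-c[0], len(c[2])))
    return candidates[:top_k]
```

The tie-break matters because a query contained in one segment also scores perfectly against any pair that includes that segment; preferring the shorter window keeps the reported timestamp on the segment that actually holds the text.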

VLC integration — PageMatch either launches VLC with --start-time set to the matched position, or sends a seek command to an already-running VLC instance via its built-in HTTP interface on localhost:9090 — so you never get a second VLC window opening.
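
The two playback paths can be illustrated with a pair of small helpers that only build the command line and the request URL, without launching or contacting VLC. The function names are hypothetical. The `--start-time` option and the `/requests/status.xml?command=seek&val=...` endpoint are part of VLC; port 9090 follows the description above, and how PageMatch authenticates against the HTTP interface (VLC expects a password via HTTP basic auth) is not shown here.

```python
from urllib.parse import urlencode


def vlc_launch_args(audio_path: str, seconds: float, vlc_binary: str = "vlc") -> list[str]:
    """Argument list for starting a new VLC instance at the matched position."""
    return [vlc_binary, f"--start-time={seconds:.3f}", audio_path]


def vlc_seek_url(seconds: float, host: str = "localhost", port: int = 9090) -> str:
    """URL that asks a running VLC (HTTP interface enabled) to seek to a position.

    The position is truncated to whole seconds for the seek command.
    """
    query = urlencode({"command": "seek", "val": int(seconds)})
    return f"http://{host}:{port}/requests/status.xml?{query}"
```

The argument list can be handed to `subprocess.Popen`, and the URL to any HTTP client; trying the seek request first and launching VLC only if it fails is one way to avoid opening a second window.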


🤝 Contributing

PRs and issues welcome. Some ideas on the roadmap:

  • Non-MLX code path for Linux / Windows (faster-whisper backend already in deps)
  • Export results to .srt or chapter markers
  • Chapter-aware search (show chapter name alongside timestamp)
  • Apple Shortcuts / Raycast extension
  • Sentence highlighting in transcript preview

Please open an issue before starting large changes.


📄 License

MIT © 2025 — see LICENSE for details.


Made for people who listen at 3× speed and still lose their place.
