Skip to content

This project aims to build an intelligent, end-to-end pipeline that converts a PDF document into a professionally narrated, chapter-wise audiobook. The pipeline leverages AI technologies—specifically OpenAI’s ChatGPT for intelligent content structuring and narration planning, and Sarvam AI’s text-to-speech (TTS) engine for natural voice synthesis.

Notifications You must be signed in to change notification settings

dhiraj7kr/BookToAudioBook

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

Project Description: Automated Chapter-wise Audiobook Generation from PDF Using AI

This project aims to build an intelligent, end-to-end pipeline that converts a PDF document into a professionally narrated, chapter-wise audiobook. The pipeline leverages AI technologies—specifically OpenAI’s ChatGPT for intelligent content structuring and narration planning, and Sarvam AI’s text-to-speech (TTS) engine for natural voice synthesis.

Step-by-Step Workflow

  1. PDF Ingestion and Parsing

    • The user uploads a PDF document containing the content to be converted into an audiobook.
    • The system extracts the text content from the PDF while preserving the logical and visual structure (e.g., chapters, sections, headings).
    • Special handling ensures that footnotes, captions, or formatting artifacts are either cleaned or appropriately integrated into the narration.
  2. Content Structuring and Chapter Segmentation

    • ChatGPT processes the extracted text to identify and segment it into chapters and sub-sections.
    • If the original PDF lacks explicit chapter markers, ChatGPT intelligently infers logical divisions based on headings, thematic breaks, and content flow.
    • Each chapter is labeled appropriately and stored for further processing.
  3. Narration Planning and Voice Guidance

    • For each chapter, ChatGPT generates a narration plan. This includes:

      • Narrative Tone & Style: e.g., formal, conversational, storytelling, academic.
      • Voice Instructions: e.g., where to pause, emphasize, soften or raise tone, based on content type (quotes, emotions, dialogues).
      • Contextual Enhancements: Optionally add brief introductory or closing statements for each chapter, or clarify complex terms if the target audience is general.
  4. Text-to-Speech Conversion

    • The processed and structured narration-ready text is sent to Sarvam AI’s TTS engine.
    • One chapter at a time is fed into the TTS model, using the narration guidance provided by ChatGPT to influence voice modulation, inflection, and timing.
    • The output is a high-quality audio file for each chapter, named and organized sequentially.
  5. Output Compilation

    • The final output is a collection of chapter-wise audio files (e.g., Chapter_01_Introduction.mp3, Chapter_02_The_History.mp3, etc.).
    • Optional metadata (e.g., duration, transcript, summary) can be generated and attached to each chapter for audiobook publishing platforms.

Optional Features for Enhancement

  • Voice Personalization: Allow the user to choose different voices or accents from Sarvam AI.
  • Multilingual Support: Translate content before TTS for bilingual audiobooks.
  • Summary Versions: Generate short summaries of each chapter as bonus audio content.
  • Accessibility Enhancements: Add descriptions for images or diagrams in the original PDF.

Use Case Applications

  • Educational audiobooks (textbooks, guides)
  • Storybooks and fiction
  • Corporate training materials
  • Accessibility solutions for visually impaired users

This pipeline ensures that the audiobook is not just a robotic readout of text but a context-aware, tone-sensitive, chapter-organized narrative tailored for listener engagement and clarity.

image

About

This project aims to build an intelligent, end-to-end pipeline that converts a PDF document into a professionally narrated, chapter-wise audiobook. The pipeline leverages AI technologies—specifically OpenAI’s ChatGPT for intelligent content structuring and narration planning, and Sarvam AI’s text-to-speech (TTS) engine for natural voice synthesis.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages