The AI Voice Tutor is an interactive web application designed to bridge the gap between written content and auditory learning. Users can input text, or upload image and PDF files, to receive intelligent answers from Google's Gemini AI. The application then converts the AI's response into natural-sounding speech using Murf.ai's text-to-speech technology.
The primary goal of this project is to create an accessible and versatile tool that:
- Transforms Text to Knowledge: Converts static text from various sources into dynamic, audible information.
- Enhances Accessibility: Provides an audio-based learning alternative for users who may have reading difficulties, visual impairments, or simply prefer auditory learning.
- Leverages Modern AI: Integrates powerful APIs for content understanding (Google Gemini) and voice synthesis (Murf.ai) to provide a seamless and intelligent user experience.
- Multi-Modal Input: Accepts user input through direct text entry, as well as text extraction from images and PDF documents, making it a flexible tool for various use cases.
- AI-Powered Q&A: Utilizes the Google Gemini API to understand and respond to user queries from the provided text.
- High-Quality Text-to-Speech: Employs Murf.ai to generate realistic and natural-sounding audio from the AI's answers.
- Optical Character Recognition (OCR): Extracts text directly from uploaded images (
.jpg,.png,.jpeg) using Tesseract.js. - PDF Text Extraction: Parses and extracts text content from uploaded PDF files using PDF.js.
- Interactive Audio Player: Allows users to play, pause, and listen to the generated speech directly in the browser.
- Downloadable Audio: Provides a direct download link for the generated MP3 audio file.
- Responsive Frontend: A clean and user-friendly interface that works seamlessly across different devices.
- Node.js Backend: A robust server-side application built with Express.js to securely handle API requests.
Main Interface & Text Input
The clean and simple user interface for text entry.
File Upload & Text Extraction
Demonstrating the result of uploading a PDF or image file.
AI Response & Audio Playback
Showing the AI's generated answer with the audio player ready.
This project is built upon a modern stack, integrating several powerful libraries and APIs.
- HTML (
index.html): Structures the web application's user interface. - CSS (
style.css): Provides the styling for a clean and responsive design. - JavaScript (
script.js): Manages all client-side logic, including:- DOM manipulation and event handling.
- API calls to the backend server.
- Client-side text extraction from files.
- Node.js: A JavaScript runtime for the server-side environment.
- Express.js: A web application framework for Node.js, used to create the API endpoint.
- CORS: A package to enable Cross-Origin Resource Sharing, allowing the frontend to communicate with the backend.
- Dotenv: A module to load environment variables from a
.envfile for secure key management.
- Google Gemini API: The core AI model used for understanding context and generating intelligent answers from the user's text.
- Murf.ai API: A leading text-to-speech service that provides high-quality, natural-sounding AI voices.
- Tesseract.js: A powerful JavaScript library that performs Optical Character Recognition (OCR) directly in the browser.
- PDF.js: A JavaScript library by Mozilla for rendering and parsing PDF files in the browser.
- Node.js and npm (or yarn) installed on your machine.
- A valid API key from Google AI for the Gemini API.
- A valid API key from Murf.ai.
-
Clone the repository:
git clone <your-repository-url> cd <repository-folder>
-
Install backend dependencies:
npm install
-
Create a
.envfile in the root directory and add your API keys:GEMINI_API_KEY=YOUR_GEMINI_API_KEY_HERE -
Add your Murf.ai API Key: Open
script.jsand replace the placeholder with your actual Murf API key:// In script.js const MURF_API_KEY = "YOUR_WORKING_MURF_API_KEY_HERE";
-
Start the server:
node server.js
The server will be running at
http://localhost:3000. -
Open the application: Open the
index.htmlfile in your web browser to start using the AI Voice Tutor.
This project is open-source and available for anyone to use. Please refer to the LICENSE file for more details.
