AI Voice Tutor

The AI Voice Tutor is an interactive web application designed to bridge the gap between written content and auditory learning. Users can input text, or upload image and PDF files, to receive intelligent answers from Google's Gemini AI. The application then converts the AI's response into natural-sounding speech using Murf.ai's text-to-speech technology.

🚀 Objectives

The primary goal of this project is to create an accessible and versatile tool that:

- Transforms Text to Knowledge: Converts static text from various sources into dynamic, audible information.
- Enhances Accessibility: Provides an audio-based learning alternative for users who have reading difficulties or visual impairments, or who simply prefer auditory learning.
- Leverages Modern AI: Integrates APIs for content understanding (Google Gemini) and voice synthesis (Murf.ai) to provide a seamless user experience.
- Accepts Multi-Modal Input: Takes direct text entry as well as text extracted from images and PDF documents, making it a flexible tool for various use cases.

✨ Features

- AI-Powered Q&A: Utilizes the Google Gemini API to understand and respond to user queries from the provided text.
- High-Quality Text-to-Speech: Employs Murf.ai to generate realistic and natural-sounding audio from the AI's answers.
- Optical Character Recognition (OCR): Extracts text directly from uploaded images (.jpg, .png, .jpeg) using Tesseract.js.
- PDF Text Extraction: Parses and extracts text content from uploaded PDF files using PDF.js.
- Interactive Audio Player: Allows users to play, pause, and listen to the generated speech directly in the browser.
- Downloadable Audio: Provides a direct download link for the generated MP3 audio file.
- Responsive Frontend: A clean and user-friendly interface that works seamlessly across different devices.
- Node.js Backend: A robust server-side application built with Express.js to securely handle API requests.
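
The OCR and PDF extraction features can be sketched roughly as follows. This is a minimal illustration, not the project's actual script.js: the function names (`detectFileKind`, `extractImageText`, `extractPdfText`) are hypothetical, and the libraries are passed in as parameters so the sketch does not assume how they are loaded on the page. It relies on `Tesseract.recognize` resolving to an object with `data.text`, and on PDF.js's `getDocument(...).promise`, `getPage`, and `getTextContent`.

```javascript
// Decide how to handle an uploaded file from its MIME type.
// Returns "image", "pdf", or null for unsupported types.
function detectFileKind(file) {
  if (file.type === "application/pdf") return "pdf";
  if (file.type === "image/png" || file.type === "image/jpeg") return "image";
  return null;
}

// OCR an image with Tesseract.js. `Tesseract` is the library object
// (for example, the global exposed by its browser bundle).
async function extractImageText(Tesseract, file) {
  const result = await Tesseract.recognize(file, "eng");
  return result.data.text.trim();
}

// Extract text from every page of a PDF with PDF.js.
// `pdfjsLib` is the PDF.js library object; `data` is an ArrayBuffer.
async function extractPdfText(pdfjsLib, data) {
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Each text item carries its string in `str`.
    pages.push(content.items.map((item) => item.str).join(" "));
  }
  return pages.join("\n");
}
```

In a browser, a PDF upload could then be handled with something like `extractPdfText(pdfjsLib, await file.arrayBuffer())`. PDF.js usually also needs its worker script configured before `getDocument` is called.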

🖼️ Screenshots

Main Interface & Text Input
The clean and simple user interface for text entry.

File Upload & Text Extraction
Demonstrating the result of uploading a PDF or image file.

AI Response & Audio Playback
Showing the AI's generated answer with the audio player ready.

⚙️ Core Components & Technologies

The project combines a browser-based frontend, a Node.js backend, and several third-party libraries and APIs.

Frontend (`/`)

- HTML (index.html): Structures the web application's user interface.
- CSS (style.css): Provides the styling for a clean and responsive design.
- JavaScript (script.js): Manages all client-side logic, including:
  - DOM manipulation and event handling.
  - API calls to the backend server.
  - Client-side text extraction from files.
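
As a rough illustration of the "API calls to the backend server" step, client-side code along these lines would send the user's text and read back the answer. The endpoint path `/api/ask`, the `text` and `answer` field names, and the function names are assumptions made for this sketch; the actual names are whatever script.js and server.js define.

```javascript
// Base URL of the local backend (the address given in Getting Started).
const BACKEND_URL = "http://localhost:3000";

// Build the fetch() arguments for a question. Kept separate from the
// network call so the request is easy to inspect.
function buildAskRequest(text) {
  const trimmed = String(text).trim();
  if (!trimmed) throw new Error("Please enter or upload some text first.");
  return {
    url: `${BACKEND_URL}/api/ask`, // hypothetical endpoint path
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: trimmed }),
    },
  };
}

// Send the question to the backend and return the AI's answer.
// `fetchImpl` defaults to the global fetch and can be swapped out.
async function askTutor(text, fetchImpl = fetch) {
  const { url, options } = buildAskRequest(text);
  const response = await fetchImpl(url, options);
  if (!response.ok) throw new Error(`Backend returned ${response.status}`);
  const payload = await response.json();
  return payload.answer; // hypothetical response field
}
```

Separating `buildAskRequest` from `askTutor` is a design choice for this sketch only: it keeps input validation out of the network path.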

Backend (`server.js`)

- Node.js: A JavaScript runtime for the server-side environment.
- Express.js: A web application framework for Node.js, used to create the API endpoint.
- CORS: A package to enable Cross-Origin Resource Sharing, allowing the frontend to communicate with the backend.
- Dotenv: A module to load environment variables from a .env file for secure key management.
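
The backend pieces listed above could fit together roughly as sketched here. The route path `/api/ask`, the handler and helper names, and the `askModel` callback are hypothetical; `express` and `cors` are passed in as arguments so the sketch stays independent of how server.js actually loads them.

```javascript
// Validate the JSON body of a question request.
// Returns the trimmed text, or null if it is missing or empty.
function readQuestion(body) {
  if (!body || typeof body.text !== "string") return null;
  const text = body.text.trim();
  return text.length > 0 ? text : null;
}

// Create an Express-style handler. `askModel` is any async function
// that takes the question text and resolves to the AI's answer.
function makeAskHandler(askModel) {
  return async function handleAsk(req, res) {
    const text = readQuestion(req.body);
    if (text === null) {
      return res.status(400).json({ error: "Field 'text' is required." });
    }
    try {
      const answer = await askModel(text);
      return res.json({ answer });
    } catch (err) {
      return res.status(502).json({ error: "AI request failed." });
    }
  };
}

// Wire the handler into an app. Pass in require("express") and require("cors").
function createApp(express, cors, askModel) {
  const app = express();
  app.use(cors());         // allow the browser frontend to call the API
  app.use(express.json()); // parse JSON request bodies
  app.post("/api/ask", makeAskHandler(askModel)); // hypothetical route
  return app;
}
```

Under these assumptions, the server could be started with `createApp(require("express"), require("cors"), askModel).listen(3000)`.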

External Services & Libraries

- Google Gemini API: The core AI model used for understanding context and generating intelligent answers from the user's text.
- Murf.ai API: A leading text-to-speech service that provides high-quality, natural-sounding AI voices.
- Tesseract.js: A powerful JavaScript library that performs Optical Character Recognition (OCR) directly in the browser.
- PDF.js: A JavaScript library by Mozilla for rendering and parsing PDF files in the browser.
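
To make the Gemini integration more concrete, here is a hedged sketch of a request to the Gemini REST API's `generateContent` method and of reading the first candidate's text from the response. The model name `gemini-1.5-flash` and the helper names are assumptions; the project may use a different model, or Google's SDK rather than raw HTTP.

```javascript
const GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/models";

// Build the URL and fetch() options for a generateContent call.
// The model name is an assumption made for this sketch.
function buildGeminiRequest(apiKey, prompt, model = "gemini-1.5-flash") {
  return {
    url: `${GEMINI_BASE}/${model}:generateContent`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}

// Pull the answer text out of a generateContent response body.
// Returns an empty string when no candidate text is present.
function extractGeminiText(responseBody) {
  const parts = responseBody?.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}

// Send the prompt and return the generated text.
async function askGemini(apiKey, prompt, fetchImpl = fetch) {
  const { url, options } = buildGeminiRequest(apiKey, prompt);
  const response = await fetchImpl(url, options);
  if (!response.ok) throw new Error(`Gemini API returned ${response.status}`);
  return extractGeminiText(await response.json());
}
```

Running a call like this on the server keeps the Gemini key out of the browser.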

🛠️ Getting Started

Prerequisites

- Node.js and npm (or yarn) installed on your machine.
- A valid API key from Google AI for the Gemini API.
- A valid API key from Murf.ai.

Installation & Setup

Clone the repository:

git clone <your-repository-url>
cd <repository-folder>

Install backend dependencies:
```
npm install
```
Create a .env file in the root directory and add your Gemini API key:
```
GEMINI_API_KEY=YOUR_GEMINI_API_KEY_HERE
```
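
Once dotenv has loaded this file, the key is available on `process.env`. A small guard like the following sketch makes a missing key fail at startup rather than on the first request; the helper name `requireEnv` is hypothetical and not taken from server.js.

```javascript
// Return the named environment variable, or throw a clear error if it
// is missing or blank. `env` defaults to process.env.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In server.js, after require("dotenv").config():
// const geminiKey = requireEnv("GEMINI_API_KEY");
```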
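Add your Murf.ai API key: open script.js and replace the placeholder with your actual Murf API key. Because script.js runs in the browser, this key is visible to anyone who can load the page, so keep this setup to local use: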
```
// In script.js
const MURF_API_KEY = "YOUR_WORKING_MURF_API_KEY_HERE";
```
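
For orientation, a request to Murf's speech generation endpoint might be assembled as sketched below. The endpoint URL, the `api-key` header, the `voiceId` value, and the `audioFile` response field are assumptions based on Murf's public REST API and should be checked against its current documentation; the function names are hypothetical.

```javascript
// In script.js (sketch)
// Build the URL and fetch() options for a text-to-speech request.
function buildMurfRequest(apiKey, text, voiceId = "en-US-natalie") {
  return {
    url: "https://api.murf.ai/v1/speech/generate", // assumed endpoint
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json", "api-key": apiKey },
      body: JSON.stringify({ text, voiceId }),
    },
  };
}

// Request speech for `text` and return the URL of the generated audio.
async function synthesizeSpeech(apiKey, text, fetchImpl = fetch) {
  const { url, options } = buildMurfRequest(apiKey, text);
  const response = await fetchImpl(url, options);
  if (!response.ok) throw new Error(`Murf API returned ${response.status}`);
  const payload = await response.json();
  return payload.audioFile; // assumed field holding the audio URL
}
```

The returned URL could then be assigned to the `src` of an `<audio>` element and reused as the download link.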
Start the server:
```
node server.js
```
The server will be running at http://localhost:3000.
Open the application: Open the index.html file in your web browser to start using the AI Voice Tutor.

📜 License

This project is open-source and available for anyone to use. Please refer to the LICENSE file for more details.
