Sahayak AI is a revolutionary conversational web navigator that makes any website accessible through natural voice commands, providing real-time audio guidance and dynamic visual highlights in multiple languages.
A brief video is the best way to experience the power and fluidity of Sahayak AI in action.
https://youtu.be/--FzH8gNI44?si=zrPZHvL4BHRd-e9i
The web is the backbone of modern society, but for millions of people (the elderly, non-native speakers, or those with digital literacy challenges) it's a maze. Complex layouts, unfamiliar languages, and confusing jargon on essential sites for banking, healthcare, or government services create a frustrating barrier, locking people out of the digital world. Standard screen readers help the visually impaired, but they don't solve the core problem of comprehension and navigation.
Sahayak (meaning "helper") is not just another screen reader; it's a compassionate and intelligent co-pilot for the web. It uses a state-of-the-art AI stack to see the webpage like a human, understand your voice commands in your native language, and guide you step-by-step with both audio and visual cues, creating an experience that feels like having a helpful friend by your side.
- ⚡ Lightning-Fast Conversational AI: Powered by Murf.ai's WebSocket endpoint, Sahayak delivers instantaneous audio responses. This isn't just a feature; it's the core of a natural, lag-free conversational experience, directly addressing a key challenge criterion.
- 🗣️ Advanced Multi-Language Support: Sahayak fully embraces Murf.ai's new Indian languages, demonstrating two sophisticated use cases:
  - Native Language TTS: For languages like Hindi, Sahayak provides guidance in native Hindi script, spoken by Murf's authentic native Hindi voice (`hi-IN-amit`).
  - Regional Accents: For Telugu, Marathi, Kannada, and Gujarati, Sahayak showcases the powerful `multiNativeLocale` feature, speaking English guidance with a clear, regional accent.
- 🧠 "Conversation Mode": Sahayak doesn't just answer one question and stop. After providing guidance, it intelligently asks "What's next?" and automatically listens for a follow-up command, maintaining the context of the conversation for multi-step tasks.
- 👆 Dynamic Visual Highlighting (The Finger): To eliminate any confusion, Sahayak intelligently identifies the exact button, link, or input field you need to interact with and applies a prominent, glowing highlight, ensuring you never get lost.
- 📖 AI-Powered Summarization ("Read This Page"): With a single click, Sahayak can analyze an entire article, use AI to generate a concise summary, and read it back to you in your chosen language, turning any webpage into a quick audio brief.
- 🖼️ Persistent UI Window: Sahayak runs in its own dedicated, persistent window. It doesn't disappear when you click on the webpage, allowing for a seamless and continuous guidance experience.
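The two language strategies above could be expressed as a small voice-selection table on the backend. Only `hi-IN-amit` and the `multiNativeLocale` parameter come from this README; every other voice ID and locale code below is a hypothetical placeholder, not Murf's actual catalog:

```python
# Maps the UI language choice to Murf voice settings.
# "hi-IN-amit" and multiNativeLocale appear in this README; every other
# voiceId / locale value here is a hypothetical placeholder.
VOICE_CONFIG = {
    "hindi":    {"voiceId": "hi-IN-amit"},  # native Hindi script + native voice
    "telugu":   {"voiceId": "en-IN-example", "multiNativeLocale": "te-IN"},
    "marathi":  {"voiceId": "en-IN-example", "multiNativeLocale": "mr-IN"},
    "kannada":  {"voiceId": "en-IN-example", "multiNativeLocale": "kn-IN"},
    "gujarati": {"voiceId": "en-IN-example", "multiNativeLocale": "gu-IN"},
}

def voice_settings(language: str) -> dict:
    """Return Murf voice settings for a language, defaulting to Hindi."""
    return VOICE_CONFIG.get(language.strip().lower(), VOICE_CONFIG["hindi"])
```

Keeping this as one lookup means adding a language is a single-line change on the backend, with the frontend dropdown driven by the same table.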
Sahayak was architected from the ground up to be a showcase for the Murf.ai platform, demonstrating a deep understanding of its most advanced features.
- Exemplary Use of WebSockets: We didn't just use an API; we implemented Murf's documented WebSocket endpoint (`wss://api.murf.ai/v1/speech/stream-input`). This provides the true low-latency, bidirectional communication that the challenge explicitly asked for, resulting in "lightning fast" audio responses that are crucial for a conversational AI.
- Strategic Showcase of New Indian Languages: We didn't just add a language dropdown. We built a system that demonstrates the versatility of Murf's voice library by implementing both native-language TTS and the nuanced `multiNativeLocale` accent feature. This shows a deep dive into the API's capabilities.
- Voice as an Integral Solution: In Sahayak, Murf is not an add-on; it's the voice of the AI brain. It delivers dynamic, context-aware instructions generated in real time. It's the perfect embodiment of "using voice as a solution" to a complex problem.
- A Complete, Polished Product: From the multi-feature UI to the conversational memory and the intelligent content scraper, Sahayak is a complete, user-focused product, not just a technical demo.
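To make the streaming concrete: a client consuming the stream-input socket would decode and concatenate audio chunks as messages arrive, rather than waiting for one big file. The message shape below (a JSON object with a base64 `audio` field and a `final` flag) is an assumption for illustration, not Murf's documented schema:

```python
import base64
import json

def assemble_audio(messages):
    """Reassemble one audio clip from a sequence of WebSocket messages.

    Assumes each message is a JSON string with a base64-encoded 'audio'
    chunk and an optional 'final' flag -- these field names are
    illustrative, not Murf's documented schema.
    """
    chunks = []
    for raw in messages:
        msg = json.loads(raw)
        if "audio" in msg:
            chunks.append(base64.b64decode(msg["audio"]))
        if msg.get("final"):  # server signals end of the utterance
            break
    return b"".join(chunks)

# Example: two chunks, the second carrying the final marker.
stream = [
    json.dumps({"audio": base64.b64encode(b"hel").decode()}),
    json.dumps({"audio": base64.b64encode(b"lo").decode(), "final": True}),
]
```

Because playback can begin as soon as the first chunk decodes, perceived latency is bounded by the first message, not the whole clip, which is what makes the conversation feel instantaneous.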
| Category | Technology / Service |
|---|---|
| Voice AI | Murf.ai API (WebSocket Streaming, Native & Accented Multi-Language Voices) |
| AI Logic | Google Gemini API (Gemini 2.5 Flash) |
| Backend | Python 3.12, FastAPI, Uvicorn, websockets, httpx |
| Frontend | JavaScript (ES6+), HTML5, CSS3, webkitSpeechRecognition API |
| Extension | Chrome Extension Manifest V3 (Service Worker, chrome.scripting, chrome.windows) |
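The "Conversation Mode" context described earlier can be kept as a rolling history that is replayed into each AI prompt. This is a minimal sketch of the idea, not Sahayak's actual implementation; the prompt wording and turn limit are illustrative:

```python
class Conversation:
    """Rolling memory so follow-up commands resolve against earlier steps."""

    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.turns = []  # list of (user_command, assistant_reply) pairs

    def add(self, command: str, reply: str) -> None:
        self.turns.append((command, reply))
        self.turns = self.turns[-self.max_turns:]  # keep prompts small

    def build_prompt(self, new_command: str) -> str:
        """Fold prior turns into the next model prompt, so a vague
        follow-up like "what's next?" is answered in context."""
        history = "\n".join(f"User: {c}\nAssistant: {r}" for c, r in self.turns)
        return f"{history}\nUser: {new_command}\nAssistant:".lstrip()

convo = Conversation()
convo.add("how do I sign in?", "Click the highlighted Sign In button.")
prompt = convo.build_prompt("what's next?")
```

Capping the history at a few turns keeps each Gemini request small while still giving the model enough context for multi-step tasks.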
sahayak-project/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   │   └── routes.py          # Defines API endpoints (/api/guide, /api/summarize)
│   │   ├── services/
│   │   │   ├── gemini_service.py  # Logic for calling Google Gemini API
│   │   │   └── murf_service.py    # Logic for Murf.ai WebSocket communication
│   │   └── config.py              # Handles loading of API keys from .env
│   ├── venv/                      # Python virtual environment (ignored)
│   ├── main.py                    # Entry point to run the FastAPI server
│   ├── requirements.txt           # Python dependencies
│   └── .env                       # Secret API keys (ignored)
│
├── extension/
│   ├── icons/                     # Extension icons (16x16, 48x48, 128x128)
│   ├── src/
│   │   ├── background.js          # Handles window creation & screenshots
│   │   ├── content.js             # Injected into webpages to handle highlighting
│   │   └── popup/
│   │       ├── popup.html         # The UI for the persistent window
│   │       ├── popup.js           # Frontend logic, speech recognition, API calls
│   │       └── popup.css          # Styling for the UI
│   └── manifest.json              # Core configuration file for the Chrome Extension
│
├── .gitignore                     # Specifies files for Git to ignore
└── README.md                      # This file
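Tying the tree to behavior: `routes.py` exposes `/api/summarize`, which presumably passes scraped page text to `gemini_service.py` for the "Read This Page" feature. A hedged sketch of the prompt-building step; the character budget and wording are illustrative choices, not the project's actual code:

```python
def summarize_prompt(page_text: str, language: str, max_chars: int = 8000) -> str:
    """Build a summarization prompt for the AI model.

    The truncation limit and prompt wording here are illustrative,
    not Sahayak's actual implementation.
    """
    text = page_text[:max_chars]  # keep the request within a context budget
    return (
        f"Summarize the following webpage in 3-4 sentences, "
        f"written in {language}:\n\n{text}"
    )

prompt = summarize_prompt("A long article about online banking..." * 50, "Hindi")
```

Asking for the summary directly in the user's language lets the same Murf voice settings used for guidance read the result back without a separate translation step.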
- Python 3.12+
- Google Chrome
- API Keys from Murf.ai and Google AI Studio.
git clone https://github.com/ajith-kumar99/Sahayak-AI-Extension.git
cd Sahayak-AI-Extension
- Navigate to the `backend` directory: `cd backend`
- Create and activate a virtual environment: `python -m venv venv`, then `.\venv\Scripts\activate` (on macOS/Linux: `source venv/bin/activate`)
- Install dependencies: `pip install -r requirements.txt`
- Create a `.env` file in the `backend` directory and add your API keys: `GOOGLE_API_KEY="your_google_gemini_api_key"` and `MURF_API_KEY="your_murf_api_key"`
- Run the backend server: `uvicorn main:app --reload --port 5001`
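Under the hood, `config.py` presumably reads those keys at startup (possibly via python-dotenv). A stdlib-only sketch of that idea, where `require_env` is a hypothetical helper name, failing fast with a clear message when a key is missing:

```python
import os

def require_env(name: str) -> str:
    """Read a required API key from the environment, failing fast with a
    clear message if it is missing. Hypothetical helper for illustration;
    Sahayak's config.py may load .env differently."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to backend/.env first.")
    return value

# At startup, e.g. in config.py (the setdefault line is demo-only,
# so this snippet runs without a real .env file):
os.environ.setdefault("GOOGLE_API_KEY", "your_google_gemini_api_key")
GOOGLE_API_KEY = require_env("GOOGLE_API_KEY")
```

Failing at startup rather than on the first request makes a missing or misnamed key obvious before you wire up the extension.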
- Open Google Chrome and navigate to `chrome://extensions`.
- Enable "Developer mode" in the top-right corner.
- Click "Load unpacked".
- Select the `extension` folder from the repository.
- The Sahayak AI icon will appear in your Chrome toolbar.
- Open the Sahayak AI extension's site settings and allow microphone access.
- Navigate to any webpage you want help with.
- Click the Sahayak AI icon in your toolbar to open the persistent assistant window.
- Select your desired language (e.g., Hindi, Telugu, English).
- Click "🎤 Guide Me" and speak your command (e.g., "how do I sign in?").
- Listen to the audio guidance and watch for the red highlight on the page!
- For multi-step tasks, simply speak your next command when Sahayak automatically listens again.
- To get a summary of a page, click "📖 Read This Page".
A huge thank you to the Murf.ai team for hosting this inspiring challenge and providing a powerful, low-latency WebSocket API that made the core of this project possible.
