Sahayak AI 🚀

Sahayak AI is a revolutionary conversational web navigator that makes any website accessible through natural voice commands, providing real-time audio guidance and dynamic visual highlights in multiple languages.

🎥 Demo Video (Must Watch!)

A brief video is the best way to experience the power and fluidity of Sahayak AI in action.

https://youtu.be/--FzH8gNI44?si=zrPZHvL4BHRd-e9i


🎯 The Problem: The Digital Divide is Real

The web is the backbone of modern society, but for millions of people (the elderly, non-native speakers, and those with digital literacy challenges) it's a maze. Complex layouts, unfamiliar languages, and confusing jargon on essential sites for banking, healthcare, or government services create a frustrating barrier, locking people out of the digital world. Standard screen readers help the visually impaired, but they don't solve the core problem of comprehension and navigation.


✨ Our Solution: Sahayak AI (The Helper)

Sahayak (meaning "helper") is not just another screen reader; it's a compassionate and intelligent co-pilot for the web. It uses a state-of-the-art AI stack to see the webpage like a human, understand your voice commands in your native language, and guide you step-by-step with both audio and visual cues, creating an experience that feels like having a helpful friend by your side.


🌟 Key Features: Built to Impress

  • ⚡ Lightning-Fast Conversational AI: Powered by Murf.ai's WebSocket endpoint, Sahayak streams audio responses with very low latency. This isn't just a feature; it's the core of a natural, lag-free conversational experience, directly addressing a key challenge criterion.

  • 🗣️ Advanced Multi-Language Support: Sahayak fully embraces Murf.ai's new Indian languages, demonstrating two sophisticated use cases:

    • Native Language TTS: For languages like Hindi, Sahayak provides guidance in native Hindi script, spoken by Murf's authentic native Hindi voice (hi-IN-amit).
    • Regional Accents: For Telugu, Marathi, Kannada, and Gujarati, Sahayak showcases the powerful multiNativeLocale feature, speaking English guidance with a clear, regional accent.
  • 🧠 "Conversation Mode": Sahayak doesn't just answer one question and stop. After providing guidance, it intelligently asks "What's next?" and automatically listens for a follow-up command, maintaining the context of the conversation for multi-step tasks.

  • 👆 Dynamic Visual Highlighting (The Finger): To eliminate any confusion, Sahayak intelligently identifies the exact button, link, or input field you need to interact with and applies a prominent, glowing highlight, ensuring you never get lost.

  • 📖 AI-Powered Summarization ("Read This Page"): With a single click, Sahayak can analyze an entire article, use AI to generate a concise summary, and read it back to you in your chosen language, turning any webpage into a quick audio brief.

  • 🖼️ Persistent UI Window: Sahayak runs in its own dedicated, persistent window. It doesn't disappear when you click on the webpage, allowing for a seamless and continuous guidance experience.
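
The "Conversation Mode" feature above depends on carrying earlier turns into each new request. The sketch below shows one way that context could be kept; the class name, the six-turn window, and the prompt layout are all assumptions for illustration, not the repository's actual implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Keeps the most recent turns so follow-up commands have context.

    Hypothetical helper: the window size and prompt format are assumptions.
    """
    max_turns: int = 6
    turns: deque = field(default_factory=deque)

    def add(self, role: str, text: str) -> None:
        """Record one turn and drop the oldest turns beyond the window."""
        self.turns.append((role, text))
        while len(self.turns) > self.max_turns:
            self.turns.popleft()

    def as_prompt(self, new_command: str) -> str:
        """Join the remembered turns and the new command into one prompt."""
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        return f"{history}\nuser: {new_command}".strip()
```

A caller would add the user's command and the assistant's reply after each exchange, then pass as_prompt(...) to the AI model when the next command arrives. Bounding the window keeps the prompt short during long sessions.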


🏆 Showcasing the Power of Murf.ai

Sahayak was architected from the ground up to be a showcase for the Murf.ai platform, demonstrating a deep understanding of its most advanced features.

  1. Exemplary Use of WebSockets: We didn't just use an API; we implemented Murf's documented WebSocket endpoint (wss://api.murf.ai/v1/speech/stream-input). This provides the true low-latency, bidirectional communication that the challenge explicitly asked for, resulting in "lightning fast" audio responses that are crucial for a conversational AI.

  2. Strategic Showcase of New Indian Languages: We didn't just add a language dropdown. We built a system that demonstrates the versatility of Murf's voice library by implementing both native language TTS and the nuanced multiNativeLocale accent feature. This shows a deep dive into the API's capabilities.

  3. Voice as an Integral Solution: In Sahayak, Murf is not an add-on; it's the voice of the AI brain. It delivers dynamic, context-aware instructions generated in real-time. It's the perfect embodiment of "using voice as a solution" to a complex problem.

  4. A Complete, Polished Product: From the multi-feature UI to the conversational memory and the intelligent content scraper, Sahayak is a complete, user-focused product, not just a technical demo.
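
The two language modes described above (native Hindi TTS versus English with a regional accent) can be pictured as a small mapping from the selected language to a voice configuration. This is a rough sketch only. The endpoint URL, the voice hi-IN-amit, and the multiNativeLocale name come from this README; the locale codes, the placeholder accent voice ID, and the message field names (voice_config, voiceId, text, end, api-key) are assumptions that should be checked against Murf's WebSocket documentation.

```python
import json
from urllib.parse import urlencode

MURF_WS_URL = "wss://api.murf.ai/v1/speech/stream-input"  # from this README

# Placeholder: the README does not name the voice used for accented English.
ACCENT_VOICE_ID = "ACCENT_CAPABLE_VOICE_ID"  # hypothetical value

# Assumed locale codes for the regional-accent languages listed above.
ACCENT_LOCALES = {
    "telugu": "te-IN",
    "marathi": "mr-IN",
    "kannada": "kn-IN",
    "gujarati": "gu-IN",
}


def voice_config(language: str) -> dict:
    """Pick a native voice for Hindi, or an accented voice for the others."""
    language = language.lower()
    if language == "hindi":
        return {"voiceId": "hi-IN-amit"}
    if language in ACCENT_LOCALES:
        return {
            "voiceId": ACCENT_VOICE_ID,
            "multiNativeLocale": ACCENT_LOCALES[language],
        }
    raise ValueError(f"No voice configured for language: {language}")


def build_messages(language: str, text: str) -> list:
    """Return the JSON messages to send: voice settings first, then the text."""
    return [
        json.dumps({"voice_config": voice_config(language)}),
        json.dumps({"text": text, "end": True}),
    ]


def ws_url(api_key: str) -> str:
    """Build the connection URL, passing the key as a query parameter (assumed)."""
    return f"{MURF_WS_URL}?{urlencode({'api-key': api_key})}"
```

The backend would send these messages over an open WebSocket connection (for example with the websockets package listed in the tech stack) and stream the returned audio back to the extension.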


🛠️ Tech Stack

| Category | Technology / Service |
| --- | --- |
| Voice AI | Murf.ai API (WebSocket Streaming, Native & Accented Multi-Language Voices) |
| AI Logic | Google Gemini API (Gemini 2.5 Flash) |
| Backend | Python 3.12, FastAPI, Uvicorn, websockets, httpx |
| Frontend | JavaScript (ES6+), HTML5, CSS3, webkitSpeechRecognition API |
| Extension | Chrome Extension Manifest V3 (Service Worker, chrome.scripting, chrome.windows) |

📁 Folder Structure


sahayak-project/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   │   └── routes.py         # Defines API endpoints (/api/guide, /api/summarize)
│   │   ├── services/
│   │   │   ├── gemini_service.py # Logic for calling Google Gemini API
│   │   │   └── murf_service.py   # Logic for Murf.ai WebSocket communication
│   │   └── config.py             # Handles loading of API keys from .env
│   ├── venv/                     # Python virtual environment (ignored)
│   ├── main.py                   # Entry point to run the FastAPI server
│   ├── requirements.txt          # Python dependencies
│   └── .env                      # Secret API keys (ignored)
│
├── extension/
│   ├── icons/                    # Extension icons (16x16, 48x48, 128x128)
│   ├── src/
│   │   ├── background.js         # Handles window creation & screenshots
│   │   ├── content.js            # Injected into webpages to handle highlighting
│   │   └── popup/
│   │       ├── popup.html        # The UI for the persistent window
│   │       ├── popup.js          # Frontend logic, speech recognition, API calls
│   │       └── popup.css         # Styling for the UI
│   └── manifest.json             # Core configuration file for the Chrome Extension
│
├── .gitignore                    # Specifies files for Git to ignore
└── README.md                     # This file


⚙️ Setup and Installation

Prerequisites

  • Python 3.12+
  • Google Chrome
  • API Keys from Murf.ai and Google AI Studio.

1. Clone the Repository

git clone https://github.com/ajith-kumar99/Sahayak-AI-Extension.git
cd Sahayak-AI-Extension

2. Backend Setup

  1. Navigate to the backend directory:
    cd backend
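  2. Create and activate a virtual environment (the activate command shown is for Windows; on macOS or Linux, run source venv/bin/activate instead):
    python -m venv venv
    .\venv\Scripts\activate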
  3. Install dependencies:
    pip install -r requirements.txt
  4. Create a .env file in the backend directory and add your API keys:
    GOOGLE_API_KEY="your_google_gemini_api_key"
    MURF_API_KEY="your_murf_api_key"
  5. Run the backend server:
    uvicorn main:app --reload --port 5001
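
As a rough illustration of step 4, the sketch below reads the two keys from a .env file using only the standard library and fails early if either is missing. The function names are hypothetical, and the repository's config.py may well do this differently (for example, with a dotenv package).

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("GOOGLE_API_KEY", "MURF_API_KEY")  # names from the .env step above


def parse_env(text: str) -> dict:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_keys(env_path: str = ".env") -> dict:
    """Return the required keys; real environment variables win over the file."""
    path = Path(env_path)
    file_values = parse_env(path.read_text()) if path.exists() else {}
    keys = {name: os.environ.get(name) or file_values.get(name)
            for name in REQUIRED_KEYS}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
    return keys
```

Calling load_keys() once at startup surfaces a missing or misspelled key immediately, rather than on the first request to Gemini or Murf.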

3. Frontend (Chrome Extension) Setup

  1. Open Google Chrome and navigate to chrome://extensions.
  2. Enable "Developer mode" in the top-right corner.
  3. Click "Load unpacked".
  4. Select the extension folder from the repository.
  5. The Sahayak AI icon will appear in your Chrome toolbar.
  6. Open the site settings for the Sahayak AI extension and allow microphone access.

🚀 How to Use

  1. Navigate to any webpage you want help with.
  2. Click the Sahayak AI icon in your toolbar to open the persistent assistant window.
  3. Select your desired language (e.g., Hindi, Telugu, English).
  4. Click "🎤 Guide Me" and speak your command (e.g., "how do I sign in?").
  5. Listen to the audio guidance and watch for the red highlight on the page!
  6. For multi-step tasks, simply speak your next command when Sahayak automatically listens again.
  7. To get a summary of a page, click "📖 Read This Page".

Acknowledgements

A huge thank you to the Murf.ai team for hosting this inspiring challenge and providing a powerful, low-latency WebSocket API that made the core of this project possible.
