Offline Voice Chatbot

A fully offline, privacy-focused voice assistant that runs entirely on your local machine. This project integrates lightweight, high-performance AI models to provide a fluid voice-to-voice chat experience without sending any data to the cloud.

Features

  • 100% Offline: No internet connection required after initial setup.
  • Low Latency: Optimized for local execution using llama.cpp and Vosk.
  • High-Quality TTS: Uses Kokoro for natural-sounding speech synthesis.
  • Smart Conversation: Powered by Ministral 3B (or any GGUF model) for intelligent responses.
  • Resource Efficient: Designed to run smoothly on consumer hardware (Apple Silicon/CPU).
  • Privacy First: Your voice and data never leave your device.

Technologies Used

  • Language: Python 3.10+
  • Speech-to-Text (STT): Vosk (Lightweight, offline speech recognition).
  • Large Language Model (LLM): Llama.cpp (running a Ministral 3B GGUF).
  • Text-to-Speech (TTS): Kokoro (High-quality offline TTS).
  • Audio Handling: sounddevice, soundfile.

Setup Instructions

1. Prerequisites

Ensure you have the following installed on your system:

  • Python 3.10 or higher.
  • Git.
  • System Audio Libraries (required for sounddevice audio I/O and for Kokoro's espeak-ng phonemizer):
    • macOS (Homebrew):
      brew install portaudio espeak-ng
    • Linux (Ubuntu/Debian):
      sudo apt-get install -y libportaudio2 espeak-ng

2. Installation

  1. Clone the repository:

    git clone https://github.com/ZaidK07/Offline-Voice-Chatbot
    cd Offline-Voice-Chatbot
  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install Python dependencies:

    pip install -r requirements.txt
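
After the installation finishes, an optional one-line check confirms that sounddevice can see your audio hardware (query_devices() lists every input and output device it finds):

    python -c "import sounddevice as sd; print(sd.query_devices())"

Your microphone should appear in the list with at least one input channel.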

3. Model Setup (Crucial Step)

You need to download the models manually as they are too large to include in the repository.

A. Speech-to-Text (Vosk)

  1. Create a models directory if it doesn't exist.
  2. Download the vosk-model-small-en-us-0.15 model from the official Vosk models page (https://alphacephei.com/vosk/models).
  3. Extract it into models/ so the path is models/vosk-model-small-en-us-0.15.
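
Once extracted, the model loads in a couple of lines. A minimal sketch using the vosk package (the repository's speech_to_text.py may structure this differently):

    from vosk import Model, KaldiRecognizer

    # Load the acoustic model from the extracted folder
    model = Model("models/vosk-model-small-en-us-0.15")
    # The small en-us model expects 16 kHz, 16-bit mono PCM audio
    recognizer = KaldiRecognizer(model, 16000)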

B. LLM (Ministral 3B)

  1. Download a GGUF quantized version of Ministral 3B (e.g., Ministral-3B-Instruct-v0.1.Q4_K_M.gguf).
    • Recommended Source: Search for "Ministral GGUF" on Hugging Face.
  2. Place the .gguf file directly inside the models/ directory.
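
As a quick sanity check that the download is valid, the file can be loaded with llama-cpp-python. A minimal sketch (the filename below is an example; use whichever quantization you downloaded, and treat gen_ai_model.py as the authoritative loader):

    from llama_cpp import Llama

    llm = Llama(
        model_path="models/Ministral-3B-Instruct-v0.1.Q4_K_M.gguf",
        n_ctx=2048,       # context window; mirrors the setting in gen_ai_model.py
        n_gpu_layers=-1,  # -1 offloads all layers (Metal on Apple Silicon)
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Say hello in one sentence."}]
    )
    print(out["choices"][0]["message"]["content"])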

Directory Structure Verification:

project_root/
├── main.py
├── speech_to_text.py
├── text_to_speech.py
├── gen_ai_model.py
└── models/
    ├── vosk-model-small-en-us-0.15/              <-- Extracted Vosk model folder
    └── Ministral-3B-Instruct-v0.1.Q4_K_M.gguf    <-- Your GGUF model file
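
A short script can verify this layout automatically (paths assume the structure above):

    from pathlib import Path

    vosk_dir = Path("models/vosk-model-small-en-us-0.15")
    gguf = next(Path("models").glob("*.gguf"), None)

    assert vosk_dir.is_dir(), "Vosk model folder not found under models/"
    assert gguf is not None, "No .gguf file found under models/"
    print(f"OK: Vosk at {vosk_dir}, LLM at {gguf.name}")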

4. Configuration

You can tweak settings in the following files:

  • gen_ai_model.py: Change n_ctx (context size) or n_gpu_layers (GPU offloading).
  • text_to_speech.py: Change the voice variable (e.g., 'af_sarah', 'am_michael').
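
To illustrate what the voice variable controls, here is a minimal synthesis sketch (assuming the kokoro package's KPipeline API; text_to_speech.py may wrap it differently):

    import sounddevice as sd
    from kokoro import KPipeline

    pipeline = KPipeline(lang_code="a")  # "a" selects American English voices
    # voice is the same setting exposed in text_to_speech.py
    for _, _, audio in pipeline("Hello, I am online.", voice="af_sarah"):
        sd.play(audio, samplerate=24000)  # Kokoro generates 24 kHz audio
        sd.wait()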

Usage

  1. Start the Chatbot:
    python main.py
  2. Interaction:
    • Wait for the initialization message ("I am online...").
    • Speak clearly into your microphone.
    • The bot will listen, process your request, and speak back.
    • Note: The microphone is disabled while the bot is speaking to prevent it from hearing itself.
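
Put together, the listen-think-speak cycle above amounts to a loop like this condensed sketch (it reuses the loading code from the setup section; the repository's main.py is the authoritative version):

    import json
    import queue

    import sounddevice as sd
    from kokoro import KPipeline
    from llama_cpp import Llama
    from vosk import Model, KaldiRecognizer

    stt = KaldiRecognizer(Model("models/vosk-model-small-en-us-0.15"), 16000)
    llm = Llama(model_path="models/Ministral-3B-Instruct-v0.1.Q4_K_M.gguf",
                n_ctx=2048, verbose=False)
    tts = KPipeline(lang_code="a")
    audio_q = queue.Queue()

    def on_audio(indata, frames, time, status):
        audio_q.put(bytes(indata))  # hand raw mic chunks to the main loop

    print("I am online...")
    while True:
        # Listen: stream the microphone into Vosk until it ends an utterance.
        with sd.RawInputStream(samplerate=16000, blocksize=8000, dtype="int16",
                               channels=1, callback=on_audio):
            while not stt.AcceptWaveform(audio_q.get()):
                pass
        text = json.loads(stt.Result())["text"]
        if not text:
            continue  # silence or unrecognized audio
        print("You:", text)
        # Think: run the transcript through the local GGUF model.
        reply = llm.create_chat_completion(
            messages=[{"role": "user", "content": text}]
        )["choices"][0]["message"]["content"]
        print("Bot:", reply)
        # Speak: the input stream is closed here, so the microphone is off
        # while the bot talks and it cannot hear itself.
        for _, _, audio in tts(reply, voice="af_sarah"):
            sd.play(audio, samplerate=24000)
            sd.wait()

Closing the input stream during playback is the simplest way to implement the "microphone disabled while speaking" behavior noted above.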

Testing

Since this is an interactive, hardware-dependent project, automated testing is limited.

  • Manual Test: Run main.py and verify:
    1. Initialization logs appear without error.
    2. Microphone input is detected (text appears on screen).
    3. LLM generates a coherent response.
    4. Audio plays back clearly.
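
A scriptable first step for that manual test is a one-line import check, which catches missing dependencies before the interactive run (module names assume the stack listed under Technologies Used):

    python -c "import vosk, llama_cpp, kokoro, sounddevice, soundfile; print('imports OK')"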

Contributing

Contributions are welcome!

  1. Fork the repository.
  2. Create a feature branch (git checkout -b feature/AmazingFeature).
  3. Commit your changes (git commit -m 'Add some AmazingFeature').
  4. Push to the branch (git push origin feature/AmazingFeature).
  5. Open a Pull Request.

License

Distributed under the MIT License. See LICENSE for more information.
