A fully offline, privacy-focused voice assistant that runs entirely on your local machine. This project integrates lightweight, high-performance AI models to provide a fluid voice-to-voice chat experience without sending any data to the cloud.
- 100% Offline: No internet connection required after initial setup.
- Low Latency: Optimized for local execution using `llama.cpp` and `vosk`.
- High-Quality TTS: Uses Kokoro for natural-sounding speech synthesis.
- Smart Conversation: Powered by Ministral 3B (or any GGUF model) for intelligent responses.
- Resource Efficient: Designed to run smoothly on consumer hardware (Apple Silicon/CPU).
- Privacy First: Your voice and data never leave your device.
- Language: Python 3.10+
- Speech-to-Text (STT): Vosk (Lightweight, offline speech recognition).
- Large Language Model (LLM): Llama.cpp (Running Ministral 3B GGUF).
- Text-to-Speech (TTS): Kokoro (High-quality offline TTS).
- Audio Handling: `sounddevice`, `soundfile`.
Ensure you have the following installed on your system:
- Python 3.10 or higher.
- Git.
- System Audio Libraries (required for `sounddevice` and `espeak-ng`):
  - macOS (Homebrew): `brew install portaudio espeak-ng`
  - Linux (Ubuntu/Debian): `sudo apt-get install -y libportaudio2 espeak-ng`
1. Clone the repository:

   ```bash
   git clone https://github.com/ZaidK07/Offline-Voice-Chatbot
   cd Offline-Voice-Chatbot
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```
You need to download the models manually as they are too large to include in the repository.
A. Speech-to-Text (Vosk)
- Create a `models` directory if it doesn't exist.
- Download the vosk-model-small-en-us-0.15 model from Vosk Models.
- Extract it into `models/` so the path is `models/vosk-model-small-en-us-0.15`.
B. LLM (Ministral 3B)
- Download a GGUF quantized version of Ministral 3B (e.g., `Ministral-3B-Instruct-v0.1.Q4_K_M.gguf`).
- Recommended source: search for "Ministral GGUF" on Hugging Face.
- Place the `.gguf` file directly inside the `models/` directory.
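Once both models are in place, a quick path check can save a confusing first-run error. This is a hypothetical helper (not part of the project's code); it assumes the `models/` layout described above, and only checks that some `.gguf` file exists since the exact filename depends on the quantization you downloaded.

```python
from pathlib import Path

def missing_model_paths(root: str = ".") -> list[str]:
    """Return expected model paths that are absent under the project root."""
    root_path = Path(root)
    missing = []
    vosk_dir = root_path / "models" / "vosk-model-small-en-us-0.15"
    if not vosk_dir.is_dir():
        missing.append(str(vosk_dir))
    # The GGUF filename varies by quantization, so accept any .gguf file.
    if not list((root_path / "models").glob("*.gguf")):
        missing.append(str(root_path / "models" / "*.gguf"))
    return missing

if __name__ == "__main__":
    problems = missing_model_paths()
    print("Models OK" if not problems else f"Missing: {problems}")
```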
Directory structure verification:

```
project_root/
├── main.py
├── speech_to_text.py
├── text_to_speech.py
├── gen_ai_model.py
└── models/
    ├── vosk-model-small-en-us-0.15/           <-- Folder containing Vosk files
    └── Ministral-3B-Instruct-v0.1.Q4_K_M.gguf <-- Your GGUF model file
```
You can tweak settings in the following files:
- `gen_ai_model.py`: change `n_ctx` (context size) or `n_gpu_layers` (GPU offloading).
- `text_to_speech.py`: change the `voice` variable (e.g., `'af_sarah'`, `'am_michael'`).
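For reference, these tunables typically appear in the `Llama` constructor call from `llama-cpp-python`. This is an illustrative config fragment, not the project's exact code; the model path and values are examples to adapt to your hardware.

```python
from llama_cpp import Llama

# Illustrative values only; tune for your machine.
llm = Llama(
    model_path="models/Ministral-3B-Instruct-v0.1.Q4_K_M.gguf",  # your GGUF file
    n_ctx=4096,       # context window in tokens; larger uses more RAM
    n_gpu_layers=-1,  # -1 offloads all layers (Metal on Apple Silicon); 0 = CPU only
)
```

A larger `n_ctx` lets the bot remember more of the conversation at the cost of memory; `n_gpu_layers=0` is the safe fallback if GPU offloading misbehaves.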
- Start the chatbot:

  ```bash
  python main.py
  ```
- Interaction:
- Wait for the initialization message ("I am online...").
- Speak clearly into your microphone.
- The bot will listen, process your request, and speak back.
- Note: The microphone is disabled while the bot is speaking to prevent it from hearing itself.
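The listen-process-speak flow above can be sketched as a single turn function. The callables here are stand-ins for the project's `speech_to_text`, `gen_ai_model`, and `text_to_speech` modules; the names are hypothetical, not the actual API.

```python
def run_turn(listen, generate, speak):
    """One conversational turn: transcribe, respond, play back."""
    text = listen()          # STT: block until an utterance is transcribed
    if not text:
        return None          # silence or noise; the caller loops again
    reply = generate(text)   # LLM produces a response
    speak(reply)             # TTS playback; the mic stays muted meanwhile
    return reply
```

Keeping the microphone muted for the duration of `speak` is what prevents the echo loop mentioned in the note above.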
Since this is an interactive, hardware-dependent project, automated testing is limited.

- Manual test: run `main.py` and verify:
  - Initialization logs appear without error.
  - Microphone input is detected (transcribed text appears on screen).
  - The LLM generates a coherent response.
  - Audio plays back clearly.
Contributions are welcome!
- Fork the repository.
- Create a feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
Distributed under the MIT License. See LICENSE for more information.