Production-grade, full-stack voice AI platform delivering sub-3-second voice interactions with 25+ AI personalities, enterprise-level caching, and real-time audio streaming.
- Overview
- Architecture
- Key Features
- Technology Stack
- Performance Metrics
- Quick Start
- API Documentation
- Frontend Documentation
- Deployment
- Contributing
LiV.AI is a comprehensive voice AI platform that combines cutting-edge speech recognition, natural language processing, and neural text-to-speech synthesis to deliver human-like conversational experiences. Built for scale, the system handles real-time voice interactions with sub-3-second latency while maintaining 99.9% uptime.
- β‘ Sub-3s Response Time - From voice input to audio output
- π 25+ AI Personalities - Multi-cultural, age-diverse voice profiles
- π 99.9% Uptime - Production-tested with intelligent fallbacks
- πΎ 90%+ Cache Hit Rate - Redis-powered performance optimization
- π Multi-Language Support - English, Hindi, Japanese, French, German
graph TB
subgraph "Frontend Layer"
A[React UI] --> B[Voice Interface]
B --> C[Audio Recording]
C --> D[WebSocket/HTTP Client]
end
subgraph "API Gateway"
D --> E[FastAPI Server]
E --> F[CORS & Auth Middleware]
end
subgraph "Processing Pipeline"
F --> G[Speech-to-Text Engine]
G --> H{Cache Check}
H -->|Hit| I[Cached Response]
H -->|Miss| J[AI Response Generator]
J --> K[Text-to-Speech Engine]
K --> L[Audio Optimizer]
end
subgraph "AI Services"
G --> M[Deepgram API]
G --> N[AssemblyAI Fallback]
J --> O[Gemini Flash]
J --> P[OpenAI Fallback]
K --> Q[Cartesia TTS]
K --> R[gTTS Fallback]
end
subgraph "Data Layer"
H --> S[(Redis Cache)]
J --> T[(Supabase DB)]
T --> U[Conversation History]
T --> V[User Analytics]
end
L --> W[Audio Response]
W --> D
sequenceDiagram
participant U as User
participant F as Frontend
participant API as FastAPI
participant STT as Deepgram STT
participant Cache as Redis Cache
participant AI as Gemini Flash
participant TTS as Cartesia TTS
U->>F: Speaks into microphone
F->>F: Records audio (WebM/WAV)
F->>API: POST /voice-call-ultra-fast
par STT Processing
API->>STT: Audio buffer (direct)
STT-->>API: Transcript (1.5-2.2s)
and Preload Resources
API->>API: Load bot personality
API->>Cache: Check instant responses
end
alt Cache Hit
Cache-->>API: Instant response (<0.1s)
else Cache Miss
API->>AI: Generate response (2-3s)
AI-->>API: AI response
API->>Cache: Store response
end
par TTS Generation
API->>Cache: Check TTS cache
alt TTS Cache Hit
Cache-->>API: Cached audio (<0.1s)
else TTS Cache Miss
API->>TTS: Generate audio (0.5-1.8s)
TTS-->>API: Audio base64
API->>Cache: Store audio
end
and Background Tasks
API->>API: Log conversation
API->>API: Update analytics
end
API-->>F: JSON Response
F->>F: Decode audio
F->>U: Play audio response
Note over U,TTS: Total Time: 2.5-3.5s (Target: <6s)
graph LR
subgraph "Input Processing"
A[Audio Input] --> B[Format Validation]
B --> C[Buffer Processing]
C --> D[STT Engine]
end
subgraph "Intelligence Layer"
D --> E{Instant Response?}
E -->|Yes| F[Pattern Match Cache]
E -->|No| G[AI Model Selection]
G --> H[Context Retrieval]
H --> I[Response Generation]
end
subgraph "Output Processing"
F --> J[TTS Cache Check]
I --> J
J -->|Hit| K[Serve Cached Audio]
J -->|Miss| L[Generate Audio]
L --> M[Audio Optimization]
M --> N[Cache Storage]
end
subgraph "Monitoring"
O[Performance Metrics]
P[Error Tracking]
Q[Analytics Dashboard]
end
K --> R[Client Response]
N --> R
R --> O
D --> P
I --> Q
| Category | Personalities | Voice Characteristics |
|---|---|---|
| Indian | Old Male/Female, Mid Male/Female, Romantic Male/Female | Hindi-English blend, cultural wisdom, warm tonality |
| Japanese | Old Male/Female, Mid Male/Female, Romantic Male/Female | Honorific language, zen philosophy, refined elegance |
| Parisian | Old Male/Female, Mid Male/Female, Romantic Female | French sophistication, intellectual charm, artistic flair |
| Berlin | Old Male/Female, Mid Male/Female, Romantic Male/Female | German efficiency, progressive values, creative depth |
| Spiritual | Krishna, Rama, Hanuman, Shiva, Trimurti | Divine wisdom, philosophical depth, transcendent guidance |
# Before: 7+ seconds | After: 1.5-2.2 seconds
- Direct buffer processing (no temporary files)
- Deepgram Nova-2 primary engine
- Intelligent fallback chain
- Enhanced model configurationsCACHE_LAYERS = {
"instant_responses": "<0.1s", # Pattern-matched phrases
"tts_cache": "<0.1s", # Pre-generated audio
"response_cache": "0.5-1s", # AI responses
"memory_cache": "1-2s" # Conversation context
}def get_smart_audio_format(text: str, use_case: str):
"""
Dynamically optimize audio format based on:
- Text length and complexity
- Use case (voice_call, streaming, high_quality)
- Network conditions
Returns optimal encoding, sample rate, and container
"""# Concurrent execution of:
- STT processing + Resource preloading
- AI response generation + TTS preparation
- Audio generation + Background logging
- Cache storage + Analytics update| Component | Technology | Purpose |
|---|---|---|
| API Framework | FastAPI 0.104+ | Async HTTP server with automatic OpenAPI docs |
| Speech-to-Text | Deepgram Nova-2 | Primary STT with 1.5-2.2s latency |
| STT Fallback | AssemblyAI, Google Speech | Redundancy for 99.9% uptime |
| AI Models | Gemini Flash, OpenAI GPT-3.5 | Fast response generation (2-3s) |
| Text-to-Speech | Cartesia Sonic | Neural voice synthesis (0.5-1.8s) |
| TTS Fallback | gTTS | Reliability fallback |
| Caching | Redis (Upstash) | Sub-100ms cache retrieval |
| Database | Supabase (PostgreSQL) | Conversation logs, analytics |
| Audio Processing | pydub, FFmpeg | Format conversion, optimization |
| Async Runtime | uvicorn, asyncio | High-concurrency request handling |
| Component | Technology | Purpose |
|---|---|---|
| Framework | React 18+ | Component-based UI architecture |
| Styling | Tailwind CSS | Utility-first responsive design |
| Animations | Framer Motion | Smooth UI transitions |
| Audio | Web Audio API | Real-time audio recording/playback |
| HTTP Client | Axios | API communication |
| State Management | React Hooks | Local state management |
| Build Tool | Create React App | Development environment |
| Component | Technology | Purpose |
|---|---|---|
| Containerization | Docker | Reproducible deployments |
| CI/CD | GitHub Actions | Automated testing and deployment |
| Monitoring | Custom Analytics API | Performance metrics, error tracking |
| Logging | Python logging + File storage | Comprehensive system logs |
| Version Control | Git, GitHub | Source code management |
| Metric | Target | Actual | Status |
|---|---|---|---|
| Total Response Time | <6.0s | 2.5-3.5s | β 58% faster |
| STT Latency | <2.5s | 1.5-2.2s | β 20% faster |
| AI Response Time | <3.0s | 2.1-2.8s | β 7% faster |
| TTS Generation | <2.0s | 0.5-1.8s | β 10% faster |
| Cache Hit Rate | >80% | 85-92% | β 15% better |
| Memory Usage | <512MB | 256-384MB | β 25% less |
| Uptime | >99% | 99.9% | β Exceeded |
| Concurrent Users | 100+ | 150+ | β 50% more |
xychart-beta
title "Response Time Optimization (Last 6 Months)"
x-axis [May, Jun, Jul, Aug, Sep, Oct]
y-axis "Response Time (seconds)" 0 --> 10
line [8.5, 7.2, 5.8, 4.5, 3.8, 3.2]
# System Requirements
- Python 3.8+ with pip
- Node.js 16+ with npm
- Redis instance (local or cloud)
- FFmpeg (for audio processing)
# API Keys Required
- Deepgram API key (STT)
- Gemini API key (AI responses)
- Cartesia API key (TTS)
- Supabase credentials (database)
- Redis credentials (caching)# 1. Clone the repository
git clone https://github.com/Likhith623/LiV.AI-Voice-Agent.git
cd LiV.AI-Voice-Agent
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment variables
cp .env.example .env
# Edit .env with your API keys:
nano .env # or use any text editor
# 5. Start the backend server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Server will be available at http://localhost:8000
# API docs at http://localhost:8000/docs# 1. Navigate to frontend directory
cd frontend
# 2. Install dependencies
npm install
# 3. Configure API endpoint (optional)
# Edit src/config/api.js if backend is not on localhost:8000
# 4. Start development server
npm start
# Frontend will open at http://localhost:3000# 1. Build backend image
docker build -t liv-ai-backend .
# 2. Run with docker-compose
docker-compose up -d
# Services will be available at:
# - Backend: http://localhost:8000
# - Frontend: http://localhost:3000
# - Redis: localhost:6379POST /voice-call-ultra-fast
Content-Type: multipart/form-data
Parameters:
- audio_file: File (required) - Audio file in WAV/MP3/WebM format
- bot_id: string (optional) - Personality ID (default: "delhi_mentor_male")
- email: string (optional) - User email for analytics
- platform: string (optional) - Platform identifier (web/mobile)
Response: 200 OK
{
"transcript": "Hello, how are you?",
"text_response": "I'm doing great! How can I help you today?",
"voice_id": "fd2ada67-c2d9-4afe-b474-6386b87d8fc3",
"audio_base64": "UklGRiQAAABXQVZFZm10IBAAA...",
"performance": {
"total_time": 3.2,
"target_achieved": true,
"optimizations_applied": [
"ultra_fast_stt",
"instant_response_check",
"ultra_fast_audio_format"
]
},
"cached": false
}POST /generate-audio-optimized
Content-Type: application/json
Body:
{
"transcript": "Hello from LiV.AI!",
"bot_id": "indian_old_male",
"output_format": {
"container": "wav",
"encoding": "pcm_s16le",
"sample_rate": 8000
}
}
Response: 200 OK
{
"voice_id": "fd2ada67-c2d9-4afe-b474-6386b87d8fc3",
"audio_base64": "UklGRiQAAABXQVZFZm10IBAAA...",
"cached": true,
"generation_time": 0.15,
"optimization_used": "perfect_http"
}POST /stream-audio-raw
Content-Type: application/json
Body:
{
"transcript": "Streaming audio example",
"bot_id": "japanese_mid_female"
}
Response: 200 OK (audio/wav stream)
Binary audio data streamed in chunksGET /stt-performance/stats
Response: 200 OK
{
"overall_stats": {
"total_requests": 1543,
"successful_requests": 1538,
"success_rate_percentage": 99.7,
"average_processing_time": 1.82
},
"provider_performance": {
"deepgram_direct": {
"requests": 1520,
"average_time": 1.78,
"success_rate": 98.6
}
}
}GET /tts-cache/stats
Response: 200 OK
{
"cache_enabled": true,
"cache_size": 847,
"hit_rate_percentage": 89.3,
"statistics": {
"total_requests": 2341,
"cache_hits": 2091,
"cache_misses": 250
}
}// 400 Bad Request - Invalid audio format
{
"error": "Invalid audio file format",
"details": "Audio must be WAV, MP3, or WebM"
}
// 422 Unprocessable Entity - Missing parameters
{
"error": "Missing required parameter: audio_file"
}
// 500 Internal Server Error - Processing failure
{
"error": "Voice call processing failed",
"details": "AI response generation failed",
"processing_time": 5.84
}| Plan | Requests/Minute | Requests/Day |
|---|---|---|
| Free | 10 | 100 |
| Pro | 60 | 10,000 |
| Enterprise | Unlimited | Unlimited |
frontend/
βββ public/
β βββ index.html
β βββ manifest.json
β βββ photos/ # Personality avatars
βββ src/
β βββ components/
β β βββ Navbar.jsx
β β βββ Footer.jsx
β β βββ PersonalityCard.jsx # Personality selection UI
β β βββ VoiceInterface.jsx # Voice interaction component
β βββ config/
β β βββ api.js # API configuration
β βββ data/
β β βββ personalities.js # Personality definitions
β βββ App.js # Main app component
β βββ App.css
β βββ index.js
β βββ index.css
βββ package.json
βββ tailwind.config.js
βββ README.md
// Ultra-optimized voice call component
const VoiceInterface = ({ personality, onClose }) => {
const [isListening, setIsListening] = useState(false);
const [audioLevel, setAudioLevel] = useState(0);
// Auto-start recording on mount
useEffect(() => {
startRecording();
return () => stopRecording();
}, []);
// Handle voice detection and upload
const handleVoiceDetection = async (audioBlob) => {
const formData = new FormData();
formData.append('audio_file', audioBlob);
formData.append('bot_id', personality.id);
const response = await fetch('/voice-call-ultra-fast', {
method: 'POST',
body: formData
});
const data = await response.json();
playAudio(data.audio_base64);
};
return (
<div className="voice-interface">
{/* Waveform visualization */}
{/* Personality avatar */}
{/* Status indicators */}
</div>
);
};// Interactive personality selection cards
const PersonalityCard = ({ personality, onClick, index }) => {
return (
<motion.div
initial={{ opacity: 0, y: 30 }}
animate={{ opacity: 1, y: 0 }}
transition={{ delay: index * 0.1 }}
whileHover={{ scale: 1.05 }}
onClick={() => onClick(personality)}
className="glass rounded-2xl p-6 cursor-pointer"
>
<img src={personality.avatar} alt={personality.name} />
<h3>{personality.name}</h3>
<p>{personality.description}</p>
<span className="badge">{personality.category}</span>
</motion.div>
);
};// Custom Tailwind configuration
module.exports = {
theme: {
extend: {
colors: {
primary: '#6366f1',
secondary: '#8b5cf6',
},
animation: {
'float': 'float 6s ease-in-out infinite',
'pulse-slow': 'pulse 3s cubic-bezier(0.4, 0, 0.6, 1) infinite',
}
}
},
plugins: []
}# Development build
npm start
# Production build
npm run build
# Test build locally
npx serve -s build
# Deploy to Vercel
vercel --prod
# Deploy to Netlify
netlify deploy --prod# 1. Create Procfile
echo "web: uvicorn main:app --host 0.0.0.0 --port \$PORT" > Procfile
# 2. Create runtime.txt
echo "python-3.10.12" > runtime.txt
# 3. Deploy to Railway
railway up
# 4. Set environment variables in Railway dashboard# 1. Install Vercel CLI
npm i -g vercel
# 2. Login to Vercel
vercel login
# 3. Deploy
cd frontend
vercel --prod
# 4. Set environment variables in Vercel dashboard
# REACT_APP_API_URL=https://your-backend-url.com# docker-compose.prod.yml
version: '3.8'
services:
backend:
build: .
ports:
- "8000:8000"
environment:
- REDIS_HOST=redis
- ENVIRONMENT=production
depends_on:
- redis
restart: always
frontend:
build: ./frontend
ports:
- "80:80"
depends_on:
- backend
restart: always
redis:
image: redis:alpine
ports:
- "6379:6379"
volumes:
- redis_data:/data
restart: always
volumes:
redis_data:# Run all tests
pytest tests/ -v --cov=main --cov-report=html
# Test STT performance
curl -X POST "http://localhost:8000/test-stt-performance" \
-F "audio_file=@test_audio.wav" \
-F "iterations=10"
# Test cache performance
curl "http://localhost:8000/test-cache-now"
# Load testing with locust
locust -f tests/load_test.py --host=http://localhost:8000# Run Jest tests
npm test
# Run with coverage
npm test -- --coverage
# E2E testing with Cypress
npm run cypress:open# Monitor endpoint performance
curl http://localhost:8000/stt-performance/stats | jq
# Monitor cache efficiency
curl http://localhost:8000/tts-cache/stats | jq
# Check system health
curl http://localhost:8000/redis/health | jqWe welcome contributions! Please see our Contributing Guidelines.
# 1. Fork the repository
# 2. Create a feature branch
git checkout -b feature/amazing-feature
# 3. Make your changes
# 4. Run tests
pytest tests/
npm test
# 5. Commit with conventional commits
git commit -m "feat: add amazing feature"
# 6. Push to your fork
git push origin feature/amazing-feature
# 7. Open a Pull Request# Format code
black main.py
isort main.py
# Lint code
flake8 main.py
pylint main.py
# Type checking
mypy main.py --strictThis project is licensed under the MIT License - see the LICENSE file for details.
- Deepgram - Revolutionary speech recognition technology
- Cartesia - Premium neural text-to-speech synthesis
- Google AI - Gemini Flash language model
- Redis - High-performance caching infrastructure
- FastAPI - Modern Python web framework
- React - Component-based UI library
Likhith Vasireddy
Full-Stack AI Engineer specializing in voice AI, real-time systems, and scalable backend architecture. Passionate about building production-grade AI applications that deliver exceptional user experiences.
- ποΈ Voice AI & Speech Processing
- π High-Performance Backend Systems
- βοΈ Modern Frontend Development
- π System Architecture & Design
- π§ DevOps & Cloud Infrastructure
β Star this repo if you find it useful!
- Email: kingjames.08623@gmail.com
Built with β€οΈ by Likhith Vasireddy | Β© 2025 LiV.AI. All rights reserved.