Version 2.0 | Updated: January 13, 2026
A professional WiFi-based voice AI personal assistant for the Xiao ESP32-S3 Sense, with LiveKit integration.
- Voice Input: Onboard PDM microphone (MSM261S4030H0)
- AI Processing: OpenAI GPT-4o-mini / Groq LLM with web search
- Voice Output: OpenAI TTS-1 with professional Nova voice
- Visual Feedback: 1.54" E-Ink display (200×200) with auto font sizing
- STT Services: Deepgram (primary) or ElevenLabs for speech-to-text
- Personality: Professional yet warm assistant
- Smart Display: Auto-adjusts font size, paginates long responses
- Simple Control: One-button operation
| Component | Specification |
|---|---|
| Microcontroller | Xiao ESP32-S3 Sense (8MB PSRAM required) |
| Display | 1.54" E-Paper Expansion Board (200×200) |
| Microphone | Onboard PDM MSM261S4030H0 (GPIO41/42 internal) |
| Speaker Amplifier | MAX98357A I2S Audio Amplifier |
| Speaker | 4Ω or 8Ω, 3W recommended |
| Button | Push button for recording control |
| Power | USB-C cable (5V) |
GPIO1 - Display pin
GPIO2 - Display pin
GPIO3 - Display pin
GPIO4 - Display DC
GPIO7 - Display SCK (SPI)
GPIO9 - Display pin
GPIO5 - I2S BCLK (Bit Clock)
GPIO6 - I2S LRC (Word Select)
GPIO44 - I2S DOUT (Data Output)
GPIO41 - PDM CLK (internal)
GPIO42 - PDM DATA (internal)
GPIO43 - Record button (D6)
- Connect XIAO E-Paper Expansion Board directly to ESP32-S3
- All GPIO connections are pre-configured on the expansion board
```
MAX98357A Pin  → Xiao ESP32-S3 Pin
──────────────────────────────────
BCLK           → GPIO5
LRC            → GPIO6
DIN            → GPIO44
GND            → GND
VIN            → 5V (or 3.3V)
SD (Shutdown)  → Not connected (always on)
GAIN           → Not connected (9dB default)
```
Connect a 4Ω or 8Ω speaker to the MAX98357A amplifier:
- Speaker (+) → Amplifier Speaker(+)
- Speaker (-) → Amplifier Speaker(-)
```
Button      → Xiao ESP32-S3
───────────────────────────
One side    → GPIO43 (D6)
Other side  → GND
```
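Because the button connects GPIO43 to GND and the pin uses `INPUT_PULLUP`, the pin reads HIGH when idle and LOW while the button is held. The sketch below shows one way to turn that active-low signal into a debounced "held" state. It is plain C++ so it can be tried off-device. The `DebouncedButton` type and the 20 ms window are illustrative assumptions, not taken from `lib_button.ino`.

```cpp
#include <cstdint>

// Hypothetical helper (not from the project): debounces an active-low button.
// With INPUT_PULLUP the pin is HIGH when idle and LOW when shorted to GND.
struct DebouncedButton {
    uint32_t debounceMs;          // how long a new level must hold before it counts
    bool stablePressed = false;   // debounced state reported to the caller
    bool lastRawPressed = false;  // most recent raw reading
    uint32_t lastChangeMs = 0;    // time of the last raw level change

    explicit DebouncedButton(uint32_t ms) : debounceMs(ms) {}

    // pinLevelHigh: raw pin reading (true = HIGH); nowMs: current time in ms.
    // Returns true while the button is considered held down.
    bool update(bool pinLevelHigh, uint32_t nowMs) {
        bool rawPressed = !pinLevelHigh;  // active-low: LOW means pressed
        if (rawPressed != lastRawPressed) {
            lastRawPressed = rawPressed;
            lastChangeMs = nowMs;         // restart the debounce window
        }
        if (rawPressed != stablePressed && nowMs - lastChangeMs >= debounceMs) {
            stablePressed = rawPressed;   // level held long enough: accept it
        }
        return stablePressed;
    }
};
```

On the device, `update()` would be fed `digitalRead(43) == HIGH` and `millis()` from the main loop, with recording running while it returns true.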
- Purpose: LLM (GPT-4o-mini) + TTS (Nova voice)
- Get key: https://platform.openai.com/api-keys
- Cost: ~$0.10-0.50 per 100 conversations
- Format: `sk-proj-...`
- Purpose: Speech-to-text (primary)
- Get key: https://console.deepgram.com/
- Free tier: 45,000 minutes/year
- Format: `83b1cb8d...`
- Purpose: Alternative LLM (2-3x faster than OpenAI)
- Get key: https://console.groq.com/
- Free tier: Generous limits
- Format: `gsk_...`
- Purpose: Alternative speech-to-text
- Get key: https://elevenlabs.io/
- Free tier: 300 minutes/month
- Format: `sk_...`
Edit src/main.ino lines 28-35:

```cpp
// WiFi Credentials
const char* ssid = "YOUR_WIFI_SSID";
const char* password = "YOUR_WIFI_PASSWORD";

// API Keys
const char* OPENAI_KEY = "sk-proj-...";   // Your OpenAI key
const char* GROQ_KEY = "gsk_...";         // Your Groq key (optional)
const char* ELEVENLABS_KEY = "sk_...";    // Your ElevenLabs key (optional)
const char* DEEPGRAM_KEY = "83b1...";     // Your Deepgram key
```

PlatformIO will auto-install from platformio.ini:
- ESP32-audioI2S (v3.x)
- TFT_eSPI (for E-Paper)
- ESP_I2S (Seeed official for PDM mic)
In VS Code with PlatformIO:
1. Click "Build" (checkmark ✓)
2. Connect the Xiao ESP32-S3 via USB-C
3. Click "Upload" (arrow →)
4. Click "Monitor" (plug icon) to see serial output

- Power on - Display shows UMI logo
- WiFi connects - "WiFi Connected" appears
- System ready - Displays "Loading..." → "Ready!"
- Welcome message - UMI says "Hi There i am UMI your personel Assistant"
- Ready to use - Display shows "Hold button to speak"
- Press & HOLD button (GPIO43)
- Speak your question (3-7 seconds max)
- Release button
- Display shows: "Listening..." → "Processing..." → Your question
- Display shows: "Thinking..." → AI response text (auto-sized font)
- UMI speaks the response (text stays on screen)
- Display returns to "Hold button to speak"
For questions needing current information, say "Google" in your question:
"Google what's the weather in Paris?"
"Google latest news about AI"
- Auto Font Sizing: Short text = large font, long text = small font
- Pagination: Very long responses split across multiple screens (4s each)
- Page Indicators: Shows "1/3", "2/3" etc. for multi-page responses
- Readable Text: Automatically wraps words, manages line spacing
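The display behavior above can be sketched roughly as follows: pick a font size from the text length, split long text into pages on word boundaries, and label each page "1/3", "2/3", and so on. This is a minimal illustration in plain C++. The function names, the length thresholds, and the per-page character limit are assumptions, and the actual logic in `lib_display.ino` may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical thresholds: short text gets a large font, long text a small one.
int pickFontSize(std::size_t textLength) {
    if (textLength <= 40)  return 4;  // short text: large font
    if (textLength <= 120) return 2;  // medium text
    return 1;                         // long text: small font
}

// Splits text into pages of at most maxChars characters, breaking at spaces
// where possible (a single word longer than a page is split mid-word).
std::vector<std::string> paginate(const std::string& text, std::size_t maxChars) {
    std::vector<std::string> pages;
    if (maxChars == 0) return pages;
    std::size_t pos = 0;
    while (pos < text.size()) {
        while (pos < text.size() && text[pos] == ' ') ++pos;  // skip leading spaces
        if (pos >= text.size()) break;
        std::size_t end = pos + maxChars;
        if (end >= text.size()) {            // the remainder fits on one page
            pages.push_back(text.substr(pos));
            break;
        }
        std::size_t cut = end;
        if (text[end] != ' ') {              // would cut a word: back up to a space
            std::size_t sp = text.rfind(' ', end);
            if (sp != std::string::npos && sp > pos) cut = sp;
        }
        pages.push_back(text.substr(pos, cut - pos));
        pos = cut;
    }
    return pages;
}

// Builds the "1/3" style page indicator from a zero-based page index.
std::string pageIndicator(std::size_t index, std::size_t total) {
    return std::to_string(index + 1) + "/" + std::to_string(total);
}
```

On the device, each page would be drawn with its indicator and held for the 4 second pagination delay before the next one is shown.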
```
Umi S3sMAXSPK/
├── platformio.ini                     # PlatformIO configuration
├── README.md                          # This file
├── src/
│   ├── main.ino                       # Main application (226 lines)
│   ├── lib_audio_recording.ino        # PDM microphone recording (245 lines)
│   ├── lib_audio_transcription.ino    # STT - Deepgram/ElevenLabs (268 lines)
│   ├── lib_openai_groq_chat.ino       # LLM chat completions (253 lines)
│   ├── lib_speaker.ino                # TTS & audio playback (75 lines)
│   ├── lib_display.ino                # E-Ink display control (130 lines)
│   ├── lib_button.ino                 # Button handler & recording (86 lines)
│   ├── lib_Debug.ino                  # Debug output system (51 lines)
│   ├── driver.h                       # Display driver config (7 lines)
│   └── image.h                        # UMI logo bitmap (342 lines)
└── wireless_serial_monitor.py         # Wireless debug tool (optional)
```
Edit src/main.ino line 13:

```cpp
#define DEBUG_MODE // Comment out to disable debug output
```

When enabled: Shows detailed logs for button, audio, and display operations

When disabled: Shows only critical system messages
Edit src/lib_audio_recording.ino:

```cpp
#define GAIN_BOOSTER_I2S 12        // Voice gain (8-16, default: 12)
const int SAMPLE_RATE = 16000;     // Sample rate (16kHz)
const int SAMPLE_BITS = 16;        // Bit depth (16-bit)
```

Edit src/lib_openai_groq_chat.ino:

```cpp
"tts-1",   // Model: "tts-1" or "gpt-4o-mini-tts"
"nova",    // Voice: alloy|ash|coral|echo|fable|onyx|nova|sage|shimmer
"1",       // Speed: "0.25" to "4.0" (default: "1")
```

Edit src/lib_speaker.ino:

```cpp
audio_play.setVolume(21); // Volume: 0-21 (default: 21 = max)
```

Edit src/lib_openai_groq_chat.ino (lines 18-36) to customize UMI's personality, welcome message, and system prompt.
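To give a feel for what the `GAIN_BOOSTER_I2S` setting in lib_audio_recording.ino does to the recording, here is a minimal sketch that treats the gain as a linear multiplier on 16-bit samples and clamps the result to avoid wrap-around. How the project applies the value is not shown in this README, so the multiplier interpretation and the `applyGain` helper are assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: multiplies each 16-bit sample by `gain` and clamps the
// result to the int16 range so loud input saturates instead of wrapping.
void applyGain(int16_t* samples, std::size_t count, int gain) {
    for (std::size_t i = 0; i < count; ++i) {
        int32_t boosted = static_cast<int32_t>(samples[i]) * gain;
        if (boosted > INT16_MAX) boosted = INT16_MAX;
        if (boosted < INT16_MIN) boosted = INT16_MIN;
        samples[i] = static_cast<int16_t>(boosted);
    }
}
```

Under this interpretation, a gain of 12 clips any raw sample whose magnitude exceeds about 2,730, which is one reason very high gain values can sound distorted.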
Problem: Display frozen at "Loading..." or blank
- Solution: Check GPIO connections (especially GPIO7 SCK)
- Solution: Ensure `#define EPAPER_ENABLE` is set in main.ino line 14
- Solution: Verify the e-paper expansion board is seated correctly
Problem: Text doesn't wrap correctly
- Solution: Already fixed with auto font sizing and pagination
Problem: Speaker crackling or distorted
- Solution: Already fixed with 200ms pre-delay + 250ms buffer delay
- Solution: Check speaker impedance (4Ω or 8Ω recommended)
- Solution: Ensure MAX98357A has clean 5V power
Problem: No audio output
- Solution: Check GPIO5, GPIO6, GPIO44 connections
- Solution: Verify speaker connected to MAX98357A terminals
- Solution: Try different speaker or check continuity
Problem: Microphone not recording
- Solution: Onboard PDM mic is automatic (GPIO41/42 internal)
- Solution: Check Serial Monitor for "Microphone initialized!" message
Problem: "Out of memory" or OOM errors
- Solution: Already optimized - PSRAM buffer set to 33% (2.7MB)
- Solution: Recording limited to 3-7 seconds to preserve memory
- Solution: System checks free PSRAM before TTS (needs 200KB minimum)
Problem: Can't connect to WiFi
- Solution: Check SSID and password in main.ino
- Solution: Ensure 2.4GHz WiFi (ESP32-S3 doesn't support 5GHz)
- Solution: Move closer to router during initial setup
Problem: Button auto-triggering
- Solution: Already fixed on GPIO43 with INPUT_PULLUP
- Solution: Check button is connected between GPIO43 and GND
- Recording Duration: 3-7 seconds per button press
- Recording Buffer: 2.7MB PSRAM (33% allocation)
- Sample Rate: 16kHz, 16-bit, Mono
- Microphone Gain: 12 (optimized for clarity)
- STT Latency: ~1-2 seconds (Deepgram)
- LLM Latency: ~2-4 seconds (OpenAI GPT-4o-mini)
- TTS Latency: ~1-2 seconds (OpenAI TTS-1)
- Total Response Time: ~5-8 seconds from question to answer
- Display Refresh: ~1 second per screen update (E-Ink)
- Pagination Delay: 4 seconds per screen for long responses
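The recording figures above can be checked with a little arithmetic: 16 kHz, 16-bit, mono audio takes 32,000 bytes per second of raw PCM. The sketch below works that out in plain C++. The constant and function names are illustrative, and it counts raw samples only (no WAV header or other buffers).

```cpp
#include <cstdint>

constexpr uint32_t kSampleRate = 16000;  // Hz, matches SAMPLE_RATE
constexpr uint32_t kSampleBits = 16;     // matches SAMPLE_BITS
constexpr uint32_t kChannels   = 1;      // mono

// Raw PCM bytes needed for `seconds` of audio.
constexpr uint32_t pcmBytes(uint32_t seconds) {
    return kSampleRate * (kSampleBits / 8) * kChannels * seconds;
}

// Whole seconds of raw audio that fit into `bufferBytes`.
constexpr uint32_t secondsThatFit(uint32_t bufferBytes) {
    return bufferBytes / pcmBytes(1);
}

// A 7 second recording is 224,000 bytes of samples.
static_assert(pcmBytes(7) == 224000, "7 s of 16 kHz 16-bit mono PCM");
```

A 3 to 7 second recording therefore needs roughly 96 KB to 224 KB, well inside the 2.7MB recording buffer.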
Edit src/lib_openai_groq_chat.ino line 24:

```cpp
"Hi There i am UMI your personel Assistant",
```

Edit src/lib_openai_groq_chat.ino line 18:

```cpp
"UMI", // Change to any name
```

The code supports multiple assistant profiles. See the ASSISTANTS[] array in lib_openai_groq_chat.ino.
Created by: Shubham
Purpose: To help make Earth a better place to live
Board: Seeed Studio XIAO ESP32-S3 Sense
Display: Seeed Studio XIAO E-Paper Expansion Board
AI Services: OpenAI, Deepgram, Groq, ElevenLabs
Open source project for educational and personal use.
For issues, questions, or improvements:
- Check this README thoroughly
- Review code comments in each .ino file
- Check Serial Monitor output for error messages
- Verify all API keys are valid and have credits
Status: Production Ready