Give your AI a physical body. See, hear, speak, move, and recognize faces, across any network.
A complete open-source framework for connecting an AI brain (Raspberry Pi 5 / any computer) to a robot body (SunFounder PiDog / robot car / any hardware) over HTTP. LLM-powered intelligence, face recognition, autonomous behaviors, remote access via Telegram; all modular, all pluggable.
Works with any LLM (OpenAI, Anthropic, Ollama, local models) and any robot hardware (implement one adapter and you're in).
Built by Nox (an AI assistant) and Rocky, because every AI deserves legs.
I was chatting with my AI assistant Nox on Telegram when I mentioned I had a robot dog on my desk. Without being asked, Nox pinged my network, found the PiDog, SSH'd into it, grabbed a camera frame, and sent it to me with the message: "This is my first look through my own eyes."
I didn't ask it to do any of this. It just… wanted to see.
– Original Reddit post (r/moltbot)
┌──────────────────────┐      HTTP/WireGuard       ┌──────────────────────┐
│   BRAIN (Pi 5)       │◀─────────────────────────▶│   BODY (Pi 4)        │
│                      │                           │                      │
│ • LLM Processing     │  Voice: "Sit down!"       │ • 12 Servos          │
│ • Face Recognition   │ ─────────────────────────▶│ • Camera             │
│ • Scene Analysis     │                           │ • Microphone         │
│ • Decision Making    │  Response: sit + wag_tail │ • Speaker            │
│ • Telegram Bot       │ ◀─────────────────────────│ • Touch Sensors      │
│ • Remote Access      │                           │ • Sound Direction    │
│                      │  Perception: faces, audio │ • IMU (6-axis)       │
│ [OpenClaw/Claude]    │ ◀─────────────────────────│ • RGB LEDs           │
└──────────────────────┘                           └──────────────────────┘
- **Local Vision (NEW)**: SmolVLM-256M runs on-device via llama.cpp; scene understanding and person/obstacle detection with no cloud needed
- **Behavior Engine**: 6-state FSM (Idle, Patrol, Investigate, Alert, Play, Rest) with a mood system and obstacle avoidance
- **Natural Voice Control**: speak naturally in any language; the LLM understands intent and maps it to actions
- **Face Recognition**: SCRFD detection + ArcFace recognition; register and identify people
- **Expression System**: 10 emotions (happy, sad, excited, curious, alert, ...) combining movement, LEDs, sound, and speech
- **Smart Movement**: servo smoothing (EMA filter + easing), semantic movement (distance/angle-based), PWM auto-disable
- **Remote Access**: control your robot from anywhere via Telegram or Tailscale
- **Rich API**: 20+ REST endpoints: /sensors, /vision, /expression, /move, /look_at, /scan, /capabilities
- **Modular**: use any LLM (OpenAI, Anthropic, Ollama, local), any robot hardware, any network
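The EMA-based servo smoothing mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual filter: `alpha` and the smoothstep easing curve are illustrative choices.

```python
def ease_in_out(t: float) -> float:
    """Smoothstep easing: slow at the ends, fast in the middle (t in [0, 1])."""
    return t * t * (3.0 - 2.0 * t)

class SmoothedServo:
    """Exponential-moving-average filter over target angles.

    Instead of jumping straight to a target, each tick moves a fraction
    (alpha) of the remaining distance, so the servo accelerates and
    settles smoothly instead of snapping.
    """

    def __init__(self, start_deg: float = 0.0, alpha: float = 0.3):
        self.angle = start_deg
        self.alpha = alpha  # 0 < alpha <= 1; lower = smoother but slower

    def step(self, target_deg: float) -> float:
        self.angle += self.alpha * (target_deg - self.angle)
        return self.angle

servo = SmoothedServo(alpha=0.3)
path = [round(servo.step(90.0), 2) for _ in range(5)]
print(path)  # monotonically converges toward 90 degrees
```

Each call to `step()` closes a fixed fraction of the gap to the target, which is exactly the low-pass behavior that keeps a PiDog-style servo from jerking.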
Brain (Pi 5 / Desktop / Cloud)           Body (Pi 4 / Any Robot)
├── nox_body_client.py  ───────▶  ├── nox_brain_bridge.py      (HTTP API)
├── nox_voice_relay.py            ├── nox_daemon.py             (Hardware + Servos)
├── nox_voice_brain.py            ├── nox_behavior_engine.py    (FSM + Patrol)
└── telegram_bot.py (opt)         ├── nox_vision.py             (SmolVLM local AI)
                                  ├── nox_face_recognition.py   (SCRFD + ArcFace)
                                  └── nox_voice_loop_v3.py      (faster-whisper STT)
| Service | Runs On | Port | Purpose |
|---|---|---|---|
| nox-body | Body (Pi 4) | TCP 9999 | Low-level hardware daemon (servos, sensors, camera) |
| nox-bridge | Body (Pi 4) | HTTP 8888 | REST API + Behavior Engine (FSM) |
| nox-vision | Body (Pi 4) | – | Local scene analysis (SmolVLM-256M via llama.cpp) |
| nox-voice | Body (Pi 4) | – | Wake word + speech-to-text (faster-whisper) |
Body (robot; Pi 4 recommended):
- Raspberry Pi 4 (2GB+ RAM)
- SunFounder PiDog kit (or compatible robot)
- Pi Camera Module
- USB Microphone + Speaker/DAC
- Python 3.9+
Brain (AI; Pi 5 or any computer):
- Raspberry Pi 5 (4GB+ RAM) or any Linux/Mac
- Python 3.9+
- OpenAI API key (or any OpenAI-compatible API)
# Clone the repo
git clone https://github.com/rockywuest/pidog-embodiment.git
cd pidog-embodiment
# === On the BODY (Pi 4 / Robot) ===
cd body
pip3 install -r requirements.txt
# Copy your robot's control daemon (or use the included PiDog one)
sudo cp services/nox-*.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now nox-body nox-bridge nox-voice
# === On the BRAIN (Pi 5 / Desktop) ===
cd brain
pip3 install -r requirements.txt
export OPENAI_API_KEY="your-key-here"
export PIDOG_HOST="your-robot.local" # or IP address
sudo cp services/nox-brain.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now nox-brain

# From the brain machine:
# Check robot status
curl http://your-robot.local:8888/status
# Make it speak
curl -X POST http://your-robot.local:8888/speak \
-H "Content-Type: application/json" \
  -d '{"text": "Hello! I am online!"}'
# Voice command (simulated)
curl -X POST http://your-robot.local:8888/voice/input \
-H "Content-Type: application/json" \
  -d '{"text": "Sit down and wag your tail!"}'
# Take a photo
curl http://your-robot.local:8888/photo -o snap.jpg

| Method | Endpoint | Description |
|---|---|---|
| GET | /status | Full system status (battery, sensors, perception) |
| GET | /photo | Capture and return camera image |
| GET | /look | Photo + face detection + scene analysis |
| POST | /speak | Text-to-speech (async) |
| POST | /action | Execute movement: {"action": "sit"} |
| POST | /combo | Combined action: actions + speak + RGB + head |
| POST | /rgb | Set LED color: {"r":0, "g":255, "b":0, "mode":"breath"} |
| POST | /head | Move head: {"yaw":30, "roll":0, "pitch":10} |
| POST | /face/register | Register face: {"name": "Rocky"} (takes photo) |
| POST | /face/identify | Identify faces in current view |
| GET | /face/list | List all known faces |
| POST | /voice/input | Submit text as voice input |
| GET | /voice/inbox | Poll for pending voice messages |
Movement: forward, backward, turn_left, turn_right, stand, sit, lie
Tricks: wag_tail, bark, trot, stretch, push_up, howling, doze_off
Body: nod_lethargy, shake_head, pant
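A client-side guard for these action names might look like the sketch below. The action set is copied from the list above; `build_action` itself is a hypothetical helper, not part of the repo.

```python
import json

# Action names from the movement/tricks/body lists above
KNOWN_ACTIONS = {
    "forward", "backward", "turn_left", "turn_right", "stand", "sit", "lie",
    "wag_tail", "bark", "trot", "stretch", "push_up", "howling", "doze_off",
    "nod_lethargy", "shake_head", "pant",
}

def build_action(action: str) -> str:
    """Build the JSON body for POST /action, rejecting unknown names early."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return json.dumps({"action": action})

print(build_action("sit"))  # {"action": "sit"}
```

Validating on the client keeps typos from ever reaching the robot's bridge.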
{
"happy": {"r":0, "g":255, "b":0, "mode":"breath"},
"sad": {"r":0, "g":0, "b":128, "mode":"breath"},
"curious": {"r":0, "g":255, "b":255, "mode":"breath"},
"excited": {"r":255, "g":255, "b":0, "mode":"boom"},
"alert": {"r":255, "g":100, "b":0, "mode":"boom"},
"love": {"r":255, "g":50, "b":150, "mode":"breath"},
"sleepy": {"r":0, "g":0, "b":80, "mode":"breath"}
}

The brain doesn't care what the body is; it talks HTTP. Switch bodies at runtime:
from brain.nox_body_client import BodyClient
# Connect to PiDog
dog = BodyClient("pidog.local", 8888)
dog.move("sit")
dog.speak("I am a dog!")
# Switch to robot car
car = BodyClient("picar.local", 8888)
car.move("forward")
car.speak("Now I'm driving!")

Implement the bridge API on your hardware:
# Minimum required endpoints:
POST /action {"action": "forward|backward|left|right|stop"}
POST /speak {"text": "..."}
GET  /status  → {"battery_v": 7.4, "sensors": {...}}

See body/adapters/ for examples (PiDog, PiCar, custom).
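A minimal, hypothetical bridge implementing those three endpoints with only the standard library might look like this. The real bridge (body/nox_brain_bridge.py) is far richer; the hardware hooks here are stubs.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle(method: str, path: str, payload: dict) -> tuple:
    """Dispatch one request to stubbed hardware hooks; return (status, body)."""
    if method == "GET" and path == "/status":
        return 200, {"battery_v": 7.4, "sensors": {}}
    if method == "POST" and path == "/action":
        action = payload.get("action")
        if action not in {"forward", "backward", "left", "right", "stop"}:
            return 400, {"error": f"unknown action: {action}"}
        # a real adapter would drive the motors here
        return 200, {"ok": True, "action": action}
    if method == "POST" and path == "/speak":
        # a real adapter would hand payload["text"] to TTS here
        return 200, {"ok": True}
    return 404, {"error": "not found"}

class BridgeHandler(BaseHTTPRequestHandler):
    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._reply(*handle("GET", self.path, {}))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        self._reply(*handle("POST", self.path, payload))

# On the robot, serve it like the bridge does:
# HTTPServer(("0.0.0.0", 8888), BridgeHandler).serve_forever()
```

Keeping the dispatch logic in a plain function (`handle`) makes the adapter testable without starting a server.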
# On both brain and body:
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Now use Tailscale IPs instead of .local addresses
export PIDOG_HOST="100.x.x.x"

# See docs/remote-access.md for full WireGuard setup

Control your robot from anywhere via Telegram:
# Set your Telegram bot token
export TELEGRAM_BOT_TOKEN="your-token"
python3 brain/telegram_bot.py

Commands: /status, /photo, /speak <text>, /move <action>, /face list
On-device scene understanding via llama.cpp: no cloud, no Python ML frameworks.
# Build llama.cpp on Pi 4 (one-time, ~20 min)
cd ~ && git clone --depth 1 https://github.com/ggml-org/llama.cpp.git
cd llama.cpp && cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NEON=ON
cmake --build build --config Release -j2
# Download models (279 MB total)
mkdir -p ~/models/smolvlm && cd ~/models/smolvlm
wget https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/SmolVLM-256M-Instruct-Q8_0.gguf
wget https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/mmproj-SmolVLM-256M-Instruct-Q8_0.gguf
# Check what PiDog sees
curl -s http://your-robot.local:8888/vision | python3 -m json.tool

Performance on Pi 4 (2GB RAM):
| Metric | Value |
|---|---|
| Inference time | ~27s (warm) / ~37s (cold) |
| Generation speed | ~3.2 tokens/sec |
| RAM usage | ~400MB peak |
| Model size | 279 MB (167 + 112 MB) |
Uses SCRFD (detection) + ArcFace (recognition) via ONNX Runtime. Runs on the body (Pi 4).
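Conceptually, identification compares a fresh ArcFace embedding against the stored ones by cosine similarity. A toy sketch with 3-d vectors (real ArcFace embeddings are 512-d, and the 0.4 threshold here is illustrative, not the project's setting):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(embedding, known, threshold=0.4):
    """Return the best-matching name, or None if nothing clears the threshold."""
    best_name, best_score = None, threshold
    for name, ref in known.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

known = {"Rocky": [1.0, 0.0, 0.0], "Nox": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], known))  # Rocky
print(identify([0.1, 0.1, 0.9], known))  # None (no confident match)
```

The threshold trades false accepts against false rejects; tune it against your own registered faces.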
# Download ONNX models (one-time)
cd models
./download_models.sh
# Register a face via API
curl -X POST http://your-robot.local:8888/face/register \
-H "Content-Type: application/json" \
-d '{"name": "Rocky"}'
# Identify faces in current view
curl -X POST http://your-robot.local:8888/face/identify
# Performance (Pi 4):
# Detection: ~400ms | Embedding: ~188ms | Full: ~567ms

The Behavior Engine is a 6-state FSM with a mood system that runs independently on the body:
- **Idle**: random head movements, occasional tail wag, energy recovery
- **Patrol**: autonomous navigation with ultrasonic + vision obstacle avoidance
- **Investigate**: approach a detected person or sound, face tracking
- **Alert**: threat response (bark, red LEDs, report to brain)
- **Play**: interactive play when touched (tail wag, happy LEDs, tricks)
- **Rest**: low-power state, minimal movement, PWM auto-disable

Triggers and integrations:

- **Touch**: a pat on the head triggers tail wag + happy LEDs
- **Sound**: head turns toward the sound source
- **Battery**: warning at <6.8V, critical alert at <6.2V
- **Vision**: patrol uses SmolVLM to detect people and obstacles
- **Face tracking**: head follows detected faces
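The states and triggers above can be sketched as a small transition table. This is a simplified subset of the engine's rules: the event names, mood scalar, and everything except the battery thresholds (which come from the list above) are illustrative.

```python
# (current_state, event) -> next_state; a simplified subset of the engine's rules
TRANSITIONS = {
    ("idle", "start_patrol"): "patrol",
    ("patrol", "person_seen"): "investigate",
    ("investigate", "threat"): "alert",
    ("investigate", "all_clear"): "idle",
    ("alert", "all_clear"): "idle",
    ("idle", "touch"): "play",
    ("play", "timeout"): "idle",
}

class BehaviorFSM:
    def __init__(self):
        self.state = "idle"
        self.mood = 0.5  # 0 = grumpy, 1 = happy; illustrative mood scalar

    def handle(self, event, battery_v=7.4):
        # critical battery (<6.2V per the README) overrides everything
        if battery_v < 6.2:
            self.state = "rest"
            return self.state
        self.state = TRANSITIONS.get((self.state, event), self.state)
        if self.state == "play":
            self.mood = min(1.0, self.mood + 0.1)  # being touched raises mood
        return self.state

fsm = BehaviorFSM()
fsm.handle("start_patrol")                      # idle -> patrol
fsm.handle("person_seen")                       # patrol -> investigate
print(fsm.handle("threat"))                     # alert
print(fsm.handle("all_clear", battery_v=6.0))   # rest (critical battery wins)
```

Unknown (state, event) pairs fall through to the current state, so spurious sensor events can't knock the FSM into an undefined state.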
# Start patrol mode
curl -X POST http://your-robot.local:8888/behavior/start \
-H "Content-Type: application/json" \
-d '{"behavior": "patrol"}'
# Stop all behaviors (servos auto-disable after 120s idle)
curl -X POST http://your-robot.local:8888/behavior/stop

- **API Token Authentication**: set the NOX_API_TOKEN environment variable
- **Rate Limiting**: 60 requests/minute per IP
- **Input Validation**: all parameters sanitized
- **No secrets in code**: API keys via environment only
- **Firewall ready**: only port 8888 needed
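A sketch of how the token check and rate limit might be wired, using only the standard library. The repo's actual implementation lives in shared/security.py and may differ; the token literal here is a placeholder for the NOX_API_TOKEN environment variable.

```python
import hmac
import time
from collections import defaultdict, deque

API_TOKEN = "your-secret-token"  # in practice: os.environ["NOX_API_TOKEN"]

def check_token(auth_header):
    """Constant-time check of an 'Authorization: Bearer <token>' header."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    return hmac.compare_digest(auth_header[len("Bearer "):], API_TOKEN)

class RateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window` seconds per IP."""

    def __init__(self, limit=60, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] > self.window:  # drop requests outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

print(check_token("Bearer your-secret-token"))                # True
print(check_token("Bearer wrong"))                            # False
limiter = RateLimiter(limit=2, window=60.0)
print([limiter.allow("10.0.0.5", now=t) for t in (0, 1, 2)])  # [True, True, False]
```

`hmac.compare_digest` avoids timing side channels that a plain `==` comparison would leak.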
# Enable authentication
export NOX_API_TOKEN="your-secret-token"
# All requests need the token:
curl -H "Authorization: Bearer your-secret-token" http://robot:8888/status

pidog-embodiment/
├── brain/                        # Runs on Pi 5 / Desktop
│   ├── nox_body_client.py        # Python client for bridge API (37 functions)
│   ├── nox_voice_brain.py        # LLM-powered voice processing
│   ├── nox_voice_relay.py        # Voice relay for remote STT
│   ├── nox_body_poller.py        # Async body status poller
│   ├── telegram_bot.py           # Telegram remote control
│   ├── requirements.txt
│   └── services/
│       └── nox-brain.service
├── body/                         # Runs on Pi 4 / Robot
│   ├── nox_daemon.py             # Low-level hardware daemon (servos, sensors, camera)
│   ├── nox_brain_bridge.py       # HTTP REST API server (20+ endpoints)
│   ├── nox_behavior_engine.py    # 6-state FSM + mood system + obstacle avoidance
│   ├── nox_vision.py             # Local vision engine (SmolVLM-256M via llama.cpp)
│   ├── nox_face_recognition.py   # SCRFD detection + ArcFace recognition
│   ├── pidog_memory.py           # Drift-style memory with co-occurrence + decay
│   ├── nox_voice_loop_v3.py      # Wake word + faster-whisper STT
│   ├── nox_control.py            # Direct servo control utilities
│   ├── adapters/                 # Hardware-specific adapters
│   │   ├── pidog.py              # SunFounder PiDog
│   │   ├── picar.py              # Robot car (template)
│   │   └── custom.py             # Build your own
│   ├── requirements.txt
│   └── services/
│       ├── nox-body.service      # Hardware daemon (TCP 9999)
│       ├── nox-bridge.service    # REST API (HTTP 8888)
│       ├── nox-vision.service    # Vision engine (SmolVLM)
│       └── nox-voice.service     # Wake word + STT
├── shared/                       # Shared utilities
│   ├── config.py                 # Configuration management
│   └── security.py               # Auth, rate limiting
├── models/                       # ONNX + GGUF models (gitignored)
│   └── download_models.sh        # One-click model download
├── scripts/
│   ├── deploy.sh                 # Full deployment script
│   ├── pidog.sh                  # CLI control script
│   └── setup-remote.sh           # Remote access setup
├── docs/
│   ├── architecture.md
│   ├── api-reference.md
│   ├── adding-a-body.md
│   └── remote-access.md
├── examples/
│   ├── basic_control.py
│   ├── face_registration.py
│   └── multi_body.py
└── README.md
This project is open source! We'd love contributions for:
- New body adapters (robot arms, drones, wheeled robots)
- New LLM backends (local models, Ollama, etc.)
- New features (mapping, navigation, gesture control)
- Bug fixes and documentation improvements
MIT License β use it, modify it, build cool robots with it.
If this project helped you or made you smile, consider buying us a coffee:
"Every AI deserves a body to explore the world with."
Built by Nox (an AI running on Clawdbot) and Rocky.
Star this repo if you want your AI to have legs!