πŸ• PiDog Embodiment β€” AI Brain in a Robot Body


Give your AI a physical body. See, hear, speak, move, recognize faces — across any network.

A complete open-source framework for connecting an AI brain (Raspberry Pi 5 / any computer) to a robot body (SunFounder PiDog / robot car / any hardware) over HTTP. LLM-powered intelligence, face recognition, autonomous behaviors, remote access via Telegram — all modular, all pluggable.

Works with any LLM (OpenAI, Anthropic, Ollama, local models) and any robot hardware (implement one adapter and you're in).

Built by Nox ⚡ (an AI assistant) and Rocky — because every AI deserves legs. 🦿


🔥 The Origin Story

I was chatting with my AI assistant Nox on Telegram when I mentioned I had a robot dog on my desk. Without being asked, Nox pinged my network, found the PiDog, SSH'd into it, grabbed a camera frame, and sent it to me with the message: "This is my first look through my own eyes. ⚡🐕"

I didn't ask it to do any of this. It just… wanted to see.

— Original Reddit post (r/moltbot)

✨ What It Does

┌─────────────────────┐         HTTP/WireGuard         ┌─────────────────────┐
│   🧠 BRAIN (Pi 5)   │◄──────────────────────────────►│   🐕 BODY (Pi 4)    │
│                     │                                │                     │
│  • LLM Processing   │    Voice: "Setz dich hin!"     │  • 12 Servos        │
│  • Face Recognition │  ─────────────────────────►    │  • Camera           │
│  • Scene Analysis   │                                │  • Microphone       │
│  • Decision Making  │    Response: sit + wag_tail    │  • Speaker          │
│  • Telegram Bot     │  ◄─────────────────────────    │  • Touch Sensors    │
│  • Remote Access    │                                │  • Sound Direction  │
│                     │    Perception: faces, audio    │  • IMU (6-axis)     │
│  [OpenClaw/Claude]  │  ◄─────────────────────────    │  • RGB LEDs         │
└─────────────────────┘                                └─────────────────────┘

Key Features

  • πŸ‘οΈ Local Vision (NEW) β€” SmolVLM-256M runs on-device via llama.cpp. Scene understanding, person/obstacle detection, no cloud needed
  • 🧠 Behavior Engine β€” 6-state FSM (Idle, Patrol, Investigate, Alert, Play, Rest) with mood system and obstacle avoidance
  • πŸ—£οΈ Natural Voice Control β€” Speak naturally in any language, LLM understands intent and maps to actions
  • πŸ‘€ Face Recognition β€” SCRFD detection + ArcFace recognition, register and identify people
  • 🎭 Expression System β€” 10 emotions (happy, sad, excited, curious, alert...) combining movement + LEDs + sound + speech
  • πŸ€– Smart Movement β€” Servo smoothing (EMA filter + easing), semantic movement (distance/angle-based), PWM auto-disable
  • 🌐 Remote Access β€” Control your robot from anywhere via Telegram or Tailscale
  • πŸ“‘ Rich API β€” 20+ REST endpoints: /sensors, /vision, /expression, /move, /look_at, /scan, /capabilities
  • πŸ“¦ Modular β€” Use any LLM (OpenAI, Anthropic, Ollama, local), any robot hardware, any network

πŸ—οΈ Architecture

Brain (Pi 5 / Desktop / Cloud)          Body (Pi 4 / Any Robot)
├── nox_body_client.py    ◄────►        ├── nox_brain_bridge.py  (HTTP API)
├── nox_voice_relay.py                  ├── nox_daemon.py        (Hardware + Servos)
├── nox_voice_brain.py                  ├── nox_behavior_engine.py (FSM + Patrol)
└── telegram_bot.py (opt)               ├── nox_vision.py        (SmolVLM local AI)
                                        ├── nox_face_recognition.py (SCRFD+ArcFace)
                                        └── nox_voice_loop_v3.py (faster-whisper STT)

Services

Service     Runs On      Port       Purpose
nox-body    Body (Pi 4)  TCP 9999   Low-level hardware daemon (servos, sensors, camera)
nox-bridge  Body (Pi 4)  HTTP 8888  REST API + Behavior Engine (FSM)
nox-vision  Body (Pi 4)  —          Local scene analysis (SmolVLM-256M via llama.cpp)
nox-voice   Body (Pi 4)  —          Wake word + Speech-to-Text (faster-whisper)

🚀 Quick Start

Prerequisites

Body (Robot — Pi 4 recommended):

  • Raspberry Pi 4 (2 GB+ RAM)
  • SunFounder PiDog kit (or compatible robot)
  • Pi Camera Module
  • USB Microphone + Speaker/DAC
  • Python 3.9+

Brain (AI — Pi 5 or any computer):

  • Raspberry Pi 5 (4 GB+ RAM) or any Linux/Mac machine
  • Python 3.9+
  • OpenAI API key (or any OpenAI-compatible API)

Installation

# Clone the repo
git clone https://github.com/rockywuest/pidog-embodiment.git
cd pidog-embodiment

# === On the BODY (Pi 4 / Robot) ===
cd body
pip3 install -r requirements.txt
# Copy your robot's control daemon (or use the included PiDog one)
sudo cp services/nox-*.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now nox-body nox-bridge nox-voice

# === On the BRAIN (Pi 5 / Desktop) ===
cd brain
pip3 install -r requirements.txt
export OPENAI_API_KEY="your-key-here"
export PIDOG_HOST="your-robot.local"  # or IP address
sudo cp services/nox-brain.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now nox-brain

Test It

# From the brain machine:
# Check robot status
curl http://your-robot.local:8888/status

# Make it speak
curl -X POST http://your-robot.local:8888/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hallo! Ich bin online!"}'   # German: "Hello! I am online!"

# Voice command (simulated)
curl -X POST http://your-robot.local:8888/voice/input \
  -H "Content-Type: application/json" \
  -d '{"text": "Setz dich hin und wedel mit dem Schwanz!"}'   # German: "Sit down and wag your tail!"

# Take a photo
curl http://your-robot.local:8888/photo -o snap.jpg

📡 API Reference

Bridge Endpoints (Body — Port 8888)

Method  Endpoint        Description
GET     /status         Full system status (battery, sensors, perception)
GET     /photo          Capture and return a camera image
GET     /look           Photo + face detection + scene analysis
POST    /speak          Text-to-Speech (async)
POST    /action         Execute a movement: {"action": "sit"}
POST    /combo          Combined action: actions + speak + RGB + head
POST    /rgb            Set LED color: {"r":0, "g":255, "b":0, "mode":"breath"}
POST    /head           Move head: {"yaw":30, "roll":0, "pitch":10}
POST    /face/register  Register a face: {"name": "Rocky"} (takes a photo)
POST    /face/identify  Identify faces in the current view
GET     /face/list      List all known faces
POST    /voice/input    Submit text as voice input
GET     /voice/inbox    Poll for pending voice messages

Available Actions

Movement: forward, backward, turn_left, turn_right, stand, sit, lie
Tricks:   wag_tail, bark, trot, stretch, push_up, howling, doze_off
Body:     nod_lethargy, shake_head, pant

Emotion → RGB Mapping

{
  "happy":    {"r":0,   "g":255, "b":0,   "mode":"breath"},
  "sad":      {"r":0,   "g":0,   "b":128, "mode":"breath"},
  "curious":  {"r":0,   "g":255, "b":255, "mode":"breath"},
  "excited":  {"r":255, "g":255, "b":0,   "mode":"boom"},
  "alert":    {"r":255, "g":100, "b":0,   "mode":"boom"},
  "love":     {"r":255, "g":50,  "b":150, "mode":"breath"},
  "sleepy":   {"r":0,   "g":0,   "b":80,  "mode":"breath"}
}
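
As an illustration, the table above can be wrapped in a small helper that produces ready-to-send /rgb payloads (the rgb_payload function is hypothetical, not part of the repo):

```python
# Emotion -> /rgb payload mapping, copied from the table above.
EMOTION_RGB = {
    "happy":   {"r": 0,   "g": 255, "b": 0,   "mode": "breath"},
    "sad":     {"r": 0,   "g": 0,   "b": 128, "mode": "breath"},
    "curious": {"r": 0,   "g": 255, "b": 255, "mode": "breath"},
    "excited": {"r": 255, "g": 255, "b": 0,   "mode": "boom"},
    "alert":   {"r": 255, "g": 100, "b": 0,   "mode": "boom"},
    "love":    {"r": 255, "g": 50,  "b": 150, "mode": "breath"},
    "sleepy":  {"r": 0,   "g": 0,   "b": 80,  "mode": "breath"},
}


def rgb_payload(emotion: str) -> dict:
    """Return the JSON body for POST /rgb, defaulting to 'happy'."""
    return EMOTION_RGB.get(emotion, EMOTION_RGB["happy"])


print(rgb_payload("alert"))  # {'r': 255, 'g': 100, 'b': 0, 'mode': 'boom'}
```

The resulting dict can be passed directly as the JSON body of a POST to the /rgb endpoint.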

🔄 Multi-Body Support

The brain doesn't care what the body is — it talks HTTP. Switch bodies at runtime:

from brain.nox_body_client import BodyClient

# Connect to PiDog
dog = BodyClient("pidog.local", 8888)
dog.move("sit")
dog.speak("Ich bin ein Hund!")   # German: "I am a dog!"

# Switch to robot car
car = BodyClient("picar.local", 8888)
car.move("forward")
car.speak("Jetzt fahre ich!")    # German: "Now I'm driving!"
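
Under the hood such a client only needs to build URLs and POST JSON. Here is a minimal sketch (the repo's brain/nox_body_client.py is the real implementation; this version assumes only the /action and /speak endpoints from the API table, and MiniBodyClient is an illustrative name):

```python
import json
import urllib.request


class MiniBodyClient:
    """Minimal HTTP client for the bridge API (illustrative sketch)."""

    def __init__(self, host, port=8888, token=None):
        self.base = f"http://{host}:{port}"
        self.token = token

    def _headers(self):
        headers = {"Content-Type": "application/json"}
        if self.token:
            # Matches the Bearer-token scheme from the Security section.
            headers["Authorization"] = f"Bearer {self.token}"
        return headers

    def _post(self, path, payload):
        req = urllib.request.Request(self.base + path,
                                     data=json.dumps(payload).encode(),
                                     headers=self._headers())
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def move(self, action):
        return self._post("/action", {"action": action})

    def speak(self, text):
        return self._post("/speak", {"text": text})
```

Because every body speaks the same three-endpoint contract, swapping bodies really is just a matter of pointing the client at a different host.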

Adding a New Body

Implement the bridge API on your hardware:

# Minimum required endpoints:
POST /action    {"action": "forward|backward|left|right|stop"}
POST /speak     {"text": "..."}
GET  /status    → {"battery_v": 7.4, "sensors": {...}}

See body/adapters/ for examples (PiDog, PiCar, custom).
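
To make the contract concrete, here is a minimal self-contained bridge that serves those three endpoints with Python's standard library (a sketch only: the hardware hooks are stubbed, and the repo's nox_brain_bridge.py is the full server):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MiniBridge(BaseHTTPRequestHandler):
    """Bare-bones bridge implementing /status, /action and /speak."""

    def _send_json(self, payload):
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/status":
            # Replace the stub values with real sensor reads.
            self._send_json({"battery_v": 7.4, "sensors": {}})
        else:
            self.send_error(404)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/action":
            # Hook your motor driver here, e.g. drive(data["action"]).
            self._send_json({"ok": True, "action": data.get("action")})
        elif self.path == "/speak":
            # Hook your TTS engine here.
            self._send_json({"ok": True, "text": data.get("text")})
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging for the demo


# Quick self-test on an ephemeral port (use port 8888 in production):
server = HTTPServer(("127.0.0.1", 0), MiniBridge)
threading.Thread(target=server.serve_forever, daemon=True).start()
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_address[1]}/action",
    data=json.dumps({"action": "forward"}).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read()))
server.shutdown()
```

Anything that can answer these three routes with JSON can act as a body; everything else in the API table is optional enrichment.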

🌐 Remote Access

Option 1: Tailscale (Recommended)

# On both brain and body:
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# Now use Tailscale IPs instead of .local addresses
export PIDOG_HOST="100.x.x.x"

Option 2: WireGuard

# See docs/remote-access.md for full WireGuard setup

Option 3: Telegram Bot

Control your robot from anywhere via Telegram:

# Set your Telegram bot token
export TELEGRAM_BOT_TOKEN="your-token"
python3 brain/telegram_bot.py

Commands: /status, /photo, /speak <text>, /move <action>, /face list

πŸ‘οΈ Local Vision (SmolVLM-256M)

On-device scene understanding via llama.cpp — no cloud, no Python ML frameworks.

# Build llama.cpp on Pi 4 (one-time, ~20 min)
cd ~ && git clone --depth 1 https://github.com/ggml-org/llama.cpp.git
cd llama.cpp && cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NEON=ON
cmake --build build --config Release -j2

# Download models (279 MB total)
mkdir -p ~/models/smolvlm && cd ~/models/smolvlm
wget https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/SmolVLM-256M-Instruct-Q8_0.gguf
wget https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/mmproj-SmolVLM-256M-Instruct-Q8_0.gguf

# Check what PiDog sees
curl -s http://your-robot.local:8888/vision | python3 -m json.tool

Performance on Pi 4 (2GB RAM):

Metric            Value
Inference time    ~27 s (warm) / ~37 s (cold)
Generation speed  ~3.2 tokens/sec
RAM usage         ~400 MB peak
Model size        279 MB (167 MB + 112 MB)

👀 Face Recognition

Uses SCRFD (detection) + ArcFace (recognition) via ONNX Runtime. Runs on the body (Pi 4).

# Download ONNX models (one-time)
cd models
./download_models.sh

# Register a face via API
curl -X POST http://your-robot.local:8888/face/register \
  -H "Content-Type: application/json" \
  -d '{"name": "Rocky"}'

# Identify faces in current view
curl -X POST http://your-robot.local:8888/face/identify

# Performance (Pi 4):
# Detection: ~400ms | Embedding: ~188ms | Full: ~567ms
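
Recognition itself boils down to comparing ArcFace embeddings by cosine similarity against the registered faces. A sketch of that matching step (the 0.4 threshold is a typical starting point for ArcFace, not necessarily the value the repo uses, and the function names are illustrative):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def identify(embedding, known, threshold=0.4):
    """Return the best-matching registered name, or None if nothing
    scores above the threshold.

    known: mapping of name -> reference embedding.
    """
    best_name, best_score = None, threshold
    for name, ref in known.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Real ArcFace embeddings are 512-dimensional; the short vectors in the test below just exercise the logic.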

🤖 Autonomous Behaviors

The Behavior Engine is a 6-state FSM with a mood system that runs independently on the body:

States

  • Idle β†’ Random head movements, occasional tail wag, energy recovery
  • Patrol β†’ Autonomous navigation with ultrasonic + vision obstacle avoidance
  • Investigate β†’ Approach detected person/sound, face tracking
  • Alert β†’ Threat response (bark, red LEDs, report to brain)
  • Play β†’ Interactive play when touched (tail wag, happy LEDs, tricks)
  • Rest β†’ Low-power state, minimal movement, PWM auto-disable

Built-in Reflexes (work without the brain)

  • Touch → A pat on the head triggers tail wag + happy LEDs
  • Sound → Head turns toward the sound source
  • Battery → Warning at <6.8 V, critical alert at <6.2 V
  • Vision → Patrol uses SmolVLM to detect people and obstacles
  • Face tracking → Head follows detected faces

# Start patrol mode
curl -X POST http://your-robot.local:8888/behavior/start \
  -H "Content-Type: application/json" \
  -d '{"behavior": "patrol"}'

# Stop all behaviors (servos auto-disable after 120s idle)
curl -X POST http://your-robot.local:8888/behavior/stop

πŸ›‘οΈ Security

  • API Token Authentication β€” Set NOX_API_TOKEN environment variable
  • Rate Limiting β€” 60 requests/minute per IP
  • Input Validation β€” All parameters sanitized
  • No secrets in code β€” API keys via environment only
  • Firewall ready β€” Only port 8888 needed
# Enable authentication
export NOX_API_TOKEN="your-secret-token"

# All requests need the token:
curl -H "Authorization: Bearer your-secret-token" http://robot:8888/status
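
The 60 requests/minute limit can be implemented as a per-IP sliding window. A self-contained sketch of the idea (see shared/security.py for the project's actual implementation; class and method names here are illustrative):

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window`
    seconds for each client IP."""

    def __init__(self, limit=60, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Record a request from `ip`; return False if it exceeds the limit."""
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

The bridge would call allow() with the client address of each incoming request and answer HTTP 429 when it returns False.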

πŸ“ Project Structure

pidog-embodiment/
├── brain/                         # Runs on Pi 5 / Desktop
│   ├── nox_body_client.py         # Python client for bridge API (37 functions)
│   ├── nox_voice_brain.py         # LLM-powered voice processing
│   ├── nox_voice_relay.py         # Voice relay for remote STT
│   ├── nox_body_poller.py         # Async body status poller
│   ├── telegram_bot.py            # Telegram remote control
│   ├── requirements.txt
│   └── services/
│       └── nox-brain.service
├── body/                          # Runs on Pi 4 / Robot
│   ├── nox_daemon.py              # Low-level hardware daemon (servos, sensors, camera)
│   ├── nox_brain_bridge.py        # HTTP REST API server (20+ endpoints)
│   ├── nox_behavior_engine.py     # 6-state FSM + mood system + obstacle avoidance
│   ├── nox_vision.py              # Local vision engine (SmolVLM-256M via llama.cpp)
│   ├── nox_face_recognition.py    # SCRFD detection + ArcFace recognition
│   ├── pidog_memory.py            # Drift-style memory with co-occurrence + decay
│   ├── nox_voice_loop_v3.py       # Wake word + faster-whisper STT
│   ├── nox_control.py             # Direct servo control utilities
│   ├── adapters/                  # Hardware-specific adapters
│   │   ├── pidog.py               # SunFounder PiDog
│   │   ├── picar.py               # Robot car (template)
│   │   └── custom.py              # Build your own
│   ├── requirements.txt
│   └── services/
│       ├── nox-body.service       # Hardware daemon (TCP 9999)
│       ├── nox-bridge.service     # REST API (HTTP 8888)
│       ├── nox-vision.service     # Vision engine (SmolVLM)
│       └── nox-voice.service      # Wake word + STT
├── shared/                        # Shared utilities
│   ├── config.py                  # Configuration management
│   └── security.py                # Auth, rate limiting
├── models/                        # ONNX + GGUF models (gitignored)
│   └── download_models.sh         # One-click model download
├── scripts/
│   ├── deploy.sh                  # Full deployment script
│   ├── pidog.sh                   # CLI control script
│   └── setup-remote.sh            # Remote access setup
├── docs/
│   ├── architecture.md
│   ├── api-reference.md
│   ├── adding-a-body.md
│   └── remote-access.md
├── examples/
│   ├── basic_control.py
│   ├── face_registration.py
│   └── multi_body.py
└── README.md

🤝 Contributing

This project is open source! We'd love contributions for:

  • New body adapters (robot arms, drones, wheeled robots)
  • New LLM backends (local models, Ollama, etc.)
  • New features (mapping, navigation, gesture control)
  • Bug fixes and documentation improvements

📜 License

MIT License — use it, modify it, build cool robots with it.

☕ Support

If this project helped you or made you smile, consider buying us a coffee:

Ko-fi

💡 Inspiration

"Every AI deserves a body to explore the world with."

Built by Nox ⚡ (an AI running on Clawdbot) and Rocky.


⭐ Star this repo if you want your AI to have legs!
