VRChat AI Bot 🤖

An advanced VRChat AI companion with gesture recognition, multiple personalities, vision capabilities, and natural movement simulation. Features real-time voice interaction, emotional responses, and intelligent behavior that makes it feel like chatting with a real person.


✨ Key Features

🎭 Multiple Personalities

  • 5 unique characters: Yuki, Unit 734, Greg, Byte, and Brody
  • Each with distinct personalities, speech patterns, and movement behaviors
  • Configurable via experiment/personality.json

๐Ÿ‘๏ธ Vision Capabilities

  • On-demand screenshot analysis using OpenRouter API (GPT-4V/Claude 3.5)
  • Natural first-person responses to "what do you see" questions
  • Smart vision trigger detection in conversations

🎤 Advanced Voice System

  • Voice recognition using Vosk for offline speech-to-text
  • Natural TTS with Edge-TTS supporting multiple voices (a recognition-to-speech round trip is sketched after this list)
  • Real-time conversation with context awareness
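
As a concrete illustration of the pipeline, here is a minimal offline round trip: Vosk transcribes a 16 kHz mono WAV and Edge-TTS speaks a reply. The file names and voice are illustrative; the bot's real audio path runs through the virtual cables described under Quick Start.

import asyncio
import json
import wave

import edge_tts
from vosk import KaldiRecognizer, Model

# Transcribe a 16 kHz mono PCM WAV offline. The model path matches the
# installation step below; "heard.wav" is an illustrative input file.
model = Model("models/vosk-model-small-en-us-0.15")
with wave.open("heard.wav", "rb") as wf:
    rec = KaldiRecognizer(model, wf.getframerate())
    while True:
        chunk = wf.readframes(4000)
        if not chunk:
            break
        rec.AcceptWaveform(chunk)
heard = json.loads(rec.FinalResult())["text"]

# Speak a reply. "en-US-AriaNeural" is one of many Edge voices; the
# project may configure a different voice per character.
async def speak(text: str, voice: str = "en-US-AriaNeural") -> None:
    await edge_tts.Communicate(text, voice).save("reply.mp3")

asyncio.run(speak(f"You said: {heard}"))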

🤖 Intelligent Movement

  • Human-like idle animations: looking around, weight shifting, natural gestures
  • Smart movement patterns with character-specific behaviors
  • Emote system with automatic gesture recognition from text
  • Hand gestures during speech for natural communication

🧠 Machine Learning Integration

  • Gesture recognition framework with participant-based evaluation
  • Pre-computed feature extraction for fast performance
  • Multiple ML models: SVM, Random Forest, Neural Networks, and more

🔗 VRChat Integration

  • OSC support for real-time control
  • Automatic friend acceptance and group invitations
  • Chatbox integration with formatted messages (sketched below)
  • Real-time notifications handling
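
For example, posting a chatbox message over OSC takes only a few lines with pythonosc, using VRChat's /chatbox/input endpoint on the default port 9000 (see Troubleshooting):

from pythonosc.udp_client import SimpleUDPClient

# VRChat listens for OSC input on localhost:9000 by default.
client = SimpleUDPClient("127.0.0.1", 9000)

# The second argument asks VRChat to send the text immediately instead of
# just pre-filling the chatbox keyboard.
client.send_message("/chatbox/input", ["Hello! I'm an AI companion.", True])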

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • A PC that runs VRChat in desktop mode at 15+ FPS
  • Virtual Audio Cables (VB-Audio Cable or Voicemeeter)
  • Microphone for voice input

Installation

  1. Download the project

    git clone https://github.com/your-username/vrchat-ai-bot.git
    cd vrchat-ai-bot
  2. Install dependencies

    pip install -r requirements.txt
  3. Download Vosk model

    • Download vosk-model-small-en-us-0.15 from the Vosk models page (https://alphacephei.com/vosk/models)
    • Extract to models/vosk-model-small-en-us-0.15/
  4. Configure credentials

    • Copy credentials.py.example to credentials.py
    • Fill in your credentials (see Configuration)
  5. Set up audio routing

    • Install VB-Audio Cable or Voicemeeter
    • Route bot output → VRChat input (for bot speech)
    • Route VRChat output → bot input (for hearing others)
  6. Run the bot

    python run.py --character Yuki

โš™๏ธ Configuration

credentials.py

# VRChat API credentials
VRCHAT_USER = 'your_email@domain.com'
VRCHAT_PASSWORD = 'your_vrchat_password'
USER_AGENT = 'YourName\'s Bot - VRChat AI'

# HuggingFace credentials (for chat AI)
HUGGINGFACE_EMAIL = 'your_huggingface_email@example.com'
HUGGINGFACE_PASSWORD = 'your_huggingface_password'

# OpenRouter API (for vision - optional but recommended)
OPENROUTER_API_KEY = 'your_openrouter_api_key'  # Get from https://openrouter.ai/keys

Character Selection

# Available characters:
python run.py --character Yuki      # Friendly and enthusiastic
python run.py --character "Unit 734" # Robotic and analytical  
python run.py --character Greg      # Casual and laid-back
python run.py --character Byte      # Tech-focused personality
python run.py --character Brody     # Energetic and playful

VRChat Setup

  1. Enable OSC in VRChat settings
  2. Configure audio (the snippet after these steps can verify the cables are visible to Python):
    • Set VRChat microphone to Virtual Cable A
    • Set system audio output to Virtual Cable B
    • Set bot to listen on Virtual Cable B, speak on Virtual Cable A
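
If you are unsure which device indices the cables ended up on, a quick check from Python helps. This assumes the sounddevice package, which is not part of this project's dependency list:

import sounddevice as sd

# List every audio device whose name mentions a VB-Audio cable, so the
# bot's input and output can be pointed at the right indices.
for idx, dev in enumerate(sd.query_devices()):
    if "CABLE" in dev["name"].upper():
        print(idx, dev["name"],
              f"in={dev['max_input_channels']}", f"out={dev['max_output_channels']}")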

🎮 Usage

Voice Commands

  • "What do you see?" - Triggers vision analysis
  • "Move forward/backward/left/right" - Manual movement control
  • "Pause/unpause movement" - Controls automatic movement
  • "Reset" - Restarts conversation context

Chat Features

  • Natural conversation powered by HuggingFace models
  • Automatic emotes triggered by keywords (wave, dance, point, etc.)
  • Context awareness with personality-consistent responses
  • Vision responses when asked about visual content

Movement Behaviors

Each character has unique movement patterns:

  • Idle animations: Looking around, weight shifting
  • Natural gestures: Hand movements during speech
  • Special moves: Twerking, bunny hopping, WASD dancing
  • Smooth transitions with realistic timing
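
Locomotion-style control over OSC can be as simple as driving VRChat's /input axes. A minimal sketch, not the project's exact movement code (port 9000 is VRChat's default; the timing value is illustrative):

import time
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)

def step_forward(duration: float = 1.0) -> None:
    """Push the forward axis for `duration` seconds, then release it."""
    client.send_message("/input/Vertical", 1.0)   # axis range is -1.0 .. 1.0
    time.sleep(duration)
    client.send_message("/input/Vertical", 0.0)   # release, or the avatar keeps walking

step_forward(0.5)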

🔧 Advanced Features

Gesture Recognition Training

Run the gesture recognition training system:

python paste.txt  # Gesture recognition model training

Features:

  • Participant-based data splitting for realistic evaluation
  • Direct feature extraction from pre-computed JSON data
  • Multiple ML algorithms with cross-validation
  • Feature importance analysis for model interpretability
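
To make "participant-based splitting" concrete: with scikit-learn, GroupKFold keeps each participant's samples entirely in either the train or the test fold. The random data below stands in for the pre-computed features:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Stand-in data: 300 samples, 16 features, 4 gesture classes,
# 10 participants. Replace with the pre-computed JSON features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))
y = rng.integers(0, 4, size=300)
groups = rng.integers(0, 10, size=300)  # participant ID per sample

# GroupKFold never splits one participant across train and test,
# which is what makes the evaluation realistic.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(n_splits=5))
print(f"held-out-participant accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")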

Personality Customization

Edit experiment/personality.json to:

  • Add new characters with custom prompts
  • Modify movement weights for different behaviors
  • Change voice settings and response styles
  • Customize startup messages and emote triggers
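
A quick way to see which characters the file defines. The schema here is assumed, not guaranteed; adjust the field names to match the actual file:

import json
from pathlib import Path

# Assumes personality.json maps character names to their settings and
# that each entry has a "prompt" field; both are illustrative.
config = json.loads(Path("experiment/personality.json").read_text(encoding="utf-8"))
for name, settings in config.items():
    print(name, "->", str(settings.get("prompt", ""))[:60])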

Vision System

The bot can analyze screenshots when asked:

  • Smart trigger detection: "what do you see", "describe my avatar", etc.
  • First-person responses: Natural perspective from bot's viewpoint
  • Multiple model support: GPT-4V, Claude 3.5 Sonnet via OpenRouter
  • Fallback handling: Graceful degradation if vision unavailable
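
The OpenRouter call itself is a standard OpenAI-style chat completion with an inline base64 image. A hedged sketch, assuming the requests package (not in the dependency list) and an already-captured screenshot file; model IDs change over time, so check https://openrouter.ai/models:

import base64

import requests

from credentials import OPENROUTER_API_KEY

def describe(image_path: str, question: str = "What do you see?") -> str:
    """Ask a vision model about a screenshot via OpenRouter."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENROUTER_API_KEY}"},
        json={
            "model": "anthropic/claude-3.5-sonnet",
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]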

๐Ÿ” Technical Details

Architecture

  • Modular design with separate threads for movement, audio, and API calls
  • Event-driven system with non-blocking operations
  • Robust error handling with automatic recovery
  • Memory efficient with smart caching and cleanup
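
The pattern is a queue-decoupled worker per subsystem, so no thread blocks another. A minimal illustration; the names are illustrative, not the project's actual modules:

import queue
import threading
import time

events: queue.Queue = queue.Queue()
stop = threading.Event()

def movement_worker() -> None:
    # Poll with a timeout so the thread can notice the stop flag
    # instead of blocking forever on an empty queue.
    while not stop.is_set():
        try:
            event = events.get(timeout=0.1)
        except queue.Empty:
            continue
        print("movement reacting to:", event)

worker = threading.Thread(target=movement_worker, daemon=True)
worker.start()
events.put("wave_emote")
time.sleep(0.3)
stop.set()
worker.join()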

Performance

  • Lightweight: Runs on integrated graphics
  • Fast responses: Pre-computed features and optimized models
  • Scalable: Supports multiple concurrent operations
  • Reliable: Extensive testing and fallback mechanisms

Dependencies

  • hugchat - HuggingFace chat integration
  • vrchatapi - VRChat API wrapper
  • edge-tts - High-quality text-to-speech
  • vosk - Offline speech recognition
  • scikit-learn - Machine learning models
  • pythonosc - OSC communication
  • pygame - Audio playback

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add docstrings for new functions
  • Test with multiple personalities
  • Ensure audio routing works properly

๐Ÿ› Troubleshooting

Common Issues

Bot not responding to voice:

  • Check virtual audio cable routing
  • Verify microphone permissions
  • Ensure Vosk model is properly installed

Vision not working:

  • Add OPENROUTER_API_KEY to credentials.py
  • Check API key validity at OpenRouter
  • Verify screenshot permissions

Movement issues:

  • Enable OSC in VRChat settings
  • Check OSC port (default: 9000)
  • Verify keyboard permissions for emotes

Chat problems:

  • Verify HuggingFace credentials
  • Check internet connection
  • Try resetting conversation with "reset" command

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

TL;DR: Feel free to use, modify, and distribute this project. Just keep the original license and credit intact.

๐Ÿ™ Acknowledgments

  • VRChat for the amazing platform
  • HuggingFace for accessible AI models
  • OpenRouter for vision capabilities
  • VB-Audio for virtual audio solutions
  • Vosk for offline speech recognition
  • TuckerIsAPizza for the original repository

⚡ Ready to bring your VRChat experience to life? Get started now!
