This isn't just another AI agent. It's a resilient, multimodal, self-correcting reasoning engine built from the ground up to tackle complex, multi-step objectives in a persistent environment.
This project was forged through rigorous, iterative debugging to create a truly robust agent architecture that overcomes common failure points seen in simpler prototypes. It is designed for stability, power, and genuine autonomy.
Chimera is more than a language model in a loop. It's a complete system with a full suite of senses and tools.
- 👀 See: Utilizes Florence-2-base for state-of-the-art vision, allowing it to perform detailed OCR on full pages of text or generate rich descriptions of images.
- 🎧 Hear: Employs distil-whisper for fast, accurate audio transcription, enabling it to process information from audio files like meetings or recordings.
- 🧠 Reason: Powered by Llama-3.1-8B-Instruct through the high-performance vLLM engine, the agent uses a sophisticated ReAct (Reason + Act) loop to break down complex goals into a sequence of logical steps.
- ⚡ Act: Wields a hardened, sandboxed toolset for interacting with its environment:
  - File System: Full CRUDL (Create, Read, Update, Delete, List) operations, completely jailed to a secure `sandbox` directory.
  - Code Interpreter: Writes and executes Python scripts in an isolated environment, capable of installing its own dependencies on the fly.
  - Shell Access: Can run shell commands (`ls`, `cat`, etc.) directly in the sandbox for powerful system interactions.
  - Web Research Suite: A multi-tool web stack featuring Tavily for AI-native search, a web page scraper for deep reading, and a binary file downloader.
- 📚 Learn: Features a persistent long-term memory powered by a ChromaDB vector store, allowing it to remember and recall facts across sessions.
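
To make the Reason and Act capabilities concrete, here is a minimal sketch of a ReAct-style loop. It is not Chimera's actual implementation: the `TOOLS` registry, the JSON action format, and `scripted_llm` (a stand-in for the real model call) are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical tool registry; the names and signatures are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "echo": lambda arg: arg,
    "word_count": lambda arg: str(len(arg.split())),
}


def react_loop(objective: str, llm: Callable[[List[str]], str], max_steps: int = 5) -> str:
    """Alternate between a model-proposed action and the tool's observation."""
    history = [f"Objective: {objective}"]
    for _ in range(max_steps):
        # Assumed format: the model replies with a JSON object holding
        # "thought", "action", and "input" keys.
        reply = json.loads(llm(history))
        history.append(f"Thought: {reply['thought']}")
        if reply["action"] == "finish":
            return reply["input"]
        observation = TOOLS[reply["action"]](reply["input"])
        history.append(f"Observation: {observation}")
    return "Stopped: step limit reached"


def scripted_llm(history: List[str]) -> str:
    """Stand-in for the real model: count words once, then report the result."""
    prefix = "Observation: "
    last = history[-1]
    if last.startswith(prefix):
        return json.dumps(
            {"thought": "I have the count.", "action": "finish", "input": last[len(prefix):]}
        )
    return json.dumps(
        {"thought": "Count the words.", "action": "word_count", "input": "a b c"}
    )


if __name__ == "__main__":
    print(react_loop("How many words are in 'a b c'?", scripted_llm))
```

The step limit keeps the loop bounded even if the model never emits a `finish` action.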
This project showcases solutions to critical, real-world challenges in building autonomous agents.
The agent is built to survive failure. A `max_consecutive_failures` counter triggers a human-in-the-loop (HITL) fallback: once the agent has failed that many times in a row, it must ask for help, which prevents infinite loops and wasted resources.
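
The fallback described above can be sketched roughly as follows. Only the `max_consecutive_failures` name comes from the project; `run_with_fallback`, the `ask_human` callback, and the choice to reset the counter after a success or a human reply are assumptions made for this example.

```python
from typing import Callable, List


def run_with_fallback(
    steps: List[Callable[[], str]],
    ask_human: Callable[[str], str],
    max_consecutive_failures: int = 3,
) -> List[str]:
    """Run steps in order; escalate to a human after too many failures in a row."""
    results: List[str] = []
    failures = 0
    for step in steps:
        try:
            results.append(step())
            failures = 0  # any success resets the streak
        except Exception as exc:
            failures += 1
            results.append(f"error: {exc}")
            if failures >= max_consecutive_failures:
                # Hypothetical HITL hook: hand the problem to a person.
                results.append(ask_human(f"Stuck after {failures} failures: {exc}"))
                failures = 0  # assumed: the human's reply starts a fresh streak
    return results
```

In an interactive session, `ask_human` could be as simple as `input`, which blocks until the operator responds.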
All operations that interact with the system are strictly sandboxed:
- The `FileSystemTool` is jailed to the `/sandbox` directory, preventing any possibility of path traversal attacks.
- The `CodeInterpreterTool` executes all user-generated code inside temporary, isolated directories that are destroyed after each run.
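
As an illustration of the two sandboxing ideas above, the sketch below shows one common way to confine paths to a root directory and to run a script in a throwaway working directory. The function names `resolve_in_sandbox` and `run_in_temp_dir` are hypothetical and are not taken from Chimera's source; the real tools may work differently.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def resolve_in_sandbox(sandbox_root: Path, user_path: str) -> Path:
    """Resolve user_path under sandbox_root, rejecting anything that escapes it."""
    root = sandbox_root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} escapes the sandbox")
    return candidate


def run_in_temp_dir(code: str, timeout: float = 10.0) -> str:
    """Write code to a temporary directory, run it there, and return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "script.py"
        script.write_text(code, encoding="utf-8")
        done = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return done.stdout
    # The directory and everything in it is deleted when the with-block exits.
```

Note that a temporary directory only isolates files: the child process in this sketch still runs with the same user privileges and network access as the parent, so stronger isolation (containers, restricted users) would be a separate layer.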
Through rigorous testing, a critical failure mode was identified: "context corruption" from handling large, messy data. Chimera solves this with a professional-grade workflow:
- Data-producing tools (`vision`, `audio`) save their output directly to a file.
- The agent never touches the raw data directly. It only ever handles clean, simple filenames.
- This prevents JSON parsing failures and keeps the agent's reasoning context clean and focused, dramatically increasing stability on complex tasks.
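
The workflow above can be sketched as follows, under the assumption that a tool writes its raw output to disk and hands the agent only a small, JSON-safe record. The helper names `save_tool_output` and `observation_for_agent`, and the fields of the returned record, are hypothetical.

```python
from pathlib import Path


def save_tool_output(raw_text: str, out_dir: Path, name: str) -> str:
    """Write a tool's raw output to a file and return only the filename."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    path.write_text(raw_text, encoding="utf-8")
    return path.name


def observation_for_agent(tool: str, filename: str) -> dict:
    """Build the small record the agent sees; it never contains the raw data."""
    return {"tool": tool, "status": "ok", "output_file": filename}
```

Because the record holds only short, predictable strings, it serializes cleanly no matter how many quotes, braces, or control characters the OCR or transcription output contains.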
Instead of a basic Hugging Face pipeline, the core LLM runs on vLLM, a state-of-the-art serving engine that provides significantly higher throughput and lower latency, making the agent faster and more responsive.
| Component | Technology |
|---|---|
| Core Engine | Python, vLLM |
| LLM (Brain) | meta-llama/Llama-3.1-8B-Instruct |
| Vision Model | microsoft/Florence-2-large |
| Audio Model | distil-whisper/distil-medium.en |
| Long-Term Memory | ChromaDB, Sentence-Transformers |
| Web Search | Tavily AI API |
| Core Libraries | transformers, torch, requests, beautifulsoup4 |
- Clone the repository:

      git clone https://github.com/Yash3561/Project_Chimera.git
      cd Project_Chimera

- Create and activate a Python virtual environment:

      python -m venv venv
      source venv/bin/activate

- Install the required packages:

      pip install -r requirements.txt

- Set up your API key:
  - Create a file named `.env` in the project root.
  - Add your Tavily API key to it:

        TAVILY_API_KEY="your-key-here"

- Run the agent:

      python agent.py

You can now give the agent complex objectives directly in your terminal.
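
If the agent cannot find the Tavily key, a quick way to check the `.env` file is to parse it by hand. The sketch below uses only the standard library; how `agent.py` actually loads the key is not shown here, and `load_env_file` is a hypothetical helper written for this check.

```python
from pathlib import Path


def load_env_file(path: Path) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in TAVILY_API_KEY="your-key-here".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    env = load_env_file(Path(".env"))
    print("TAVILY_API_KEY found" if env.get("TAVILY_API_KEY") else "TAVILY_API_KEY missing")
```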