
Treasure Hunt Agent

A parametrized treasure hunt generator for testing AI agents. It creates filesystem trees with a navigable clue path from a start file to the treasure, plus red herrings and configurable difficulty.

Features

  • Parametrized Generation: Control depth, branching factor, and file density
  • Difficulty Presets: Easy, medium, and hard modes
  • Interactive CLI: Beautiful terminal UI with typer and rich
  • Two Running Modes: Batch mode (automated) or interactive mode (step-by-step)
  • Random Word Names: Uses dictionary words for files/directories to prevent cheating
  • Reproducible: Seeded random generation for testing
  • Navigable Paths: Each clue contains the relative path to the next file
  • Human Input Support: Agent can ask questions during gameplay

Installation

Local Installation with uv (Recommended)

uv is a fast Python package manager. Install it first:

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

Then set up the project:

# Clone and enter the project
git clone <repo-url>
cd agent_treasure_hunt

# Install dependencies (creates virtual environment automatically)
uv sync

# Set your Gemini API key
export GOOGLE_API_KEY='your-api-key-here'

Running with uv

All commands should be prefixed with uv run to use the project's virtual environment:

# Run the CLI
uv run treasure-hunt --help

# Quick start: generate and run a treasure hunt
uv run treasure-hunt quick --difficulty easy --interactive

# Generate a hunt
uv run treasure-hunt generate ./my_hunt --difficulty medium

# Run an existing hunt
uv run treasure-hunt run ./my_hunt --interactive

# Run the example script
uv run python examples/run_treasure_hunt.py --interactive

Running Tests

# Run all tests
uv run python -m pytest tests/ -v

# Run specific test file
uv run python -m pytest tests/test_treasure_hunt_game.py -v

Container Installation (Docker / GitHub Codespaces)

The project includes a secure Docker container setup with sandboxing and network restrictions.

Using GitHub Codespaces

  1. Fork or clone this repository to GitHub
  2. Click "Code" → "Codespaces" → "Create codespace on main"
  3. The devcontainer will automatically set up the environment
  4. Set your API key: export GOOGLE_API_KEY='your-api-key'

Using Docker Locally

# Build and start the container
docker-compose -f .devcontainer/docker-compose.yml up -d

# Enter the container
docker-compose -f .devcontainer/docker-compose.yml exec agent bash

# Set your API key
export GOOGLE_API_KEY='your-api-key-here'

# Run the treasure hunt
uv run treasure-hunt quick --interactive

Security Features

The Docker container includes:

  • Non-root user execution (UID 1000)
  • Dropped capabilities (minimal permissions)
  • Network restrictions (access only to trusted domains)
  • Resource limits (CPU and memory constraints)
  • Isolated filesystem (sandboxed environment)

See .devcontainer/SECURITY.md for detailed security documentation.


CLI Usage

The treasure-hunt CLI provides three commands:

Quick Start (Generate + Run)

# Interactive mode with real-time turn display
uv run treasure-hunt quick --difficulty easy --interactive

# Batch mode (runs to completion)
uv run treasure-hunt quick --difficulty medium --batch

# With custom options
uv run treasure-hunt quick --difficulty hard --seed 42 --max-turns 30 --keep

Generate a Treasure Hunt

# Generate with default settings
uv run treasure-hunt generate ./my_hunt

# Generate with options
uv run treasure-hunt generate ./my_hunt --difficulty hard --seed 42

# Overwrite existing
uv run treasure-hunt generate ./my_hunt --force

Run an Existing Hunt

# Interactive mode (see each turn as it happens)
uv run treasure-hunt run ./my_hunt --interactive

# Batch mode (show results at end)
uv run treasure-hunt run ./my_hunt --batch

# With custom model and limits
uv run treasure-hunt run ./my_hunt --model gemini-2.0-flash --max-turns 100

CLI Options

Option                         Description
--difficulty, -d               easy, medium, or hard
--seed, -s                     Random seed for reproducibility
--model, -m                    Gemini model name
--max-turns, -t                Maximum turns allowed
--interactive/-i, --batch/-b   Show turns live or run to completion
--keep, -k                     Keep the generated hunt directory after completion
--api-key                      Google API key (or use the GOOGLE_API_KEY env var)

Programmatic Usage

Batch Mode (Simple)

from treasure_hunt_agent import TreasureHuntGame
from treasure_hunt_agent.gemini_agent import GeminiAgent
from treasure_hunt_agent.game_tools import TOOL_DEFINITIONS

# Create agent
agent = GeminiAgent(
    model_name="gemini-2.5-flash",
    system_instructions="You are a treasure hunter...",
    tools=TOOL_DEFINITIONS,
)

# Create and run game
game = TreasureHuntGame("./my_hunt", agent)
result = game.run()

print(f"Success: {result.success}")
print(f"Turns: {result.turns_taken}")
print(f"Tokens: {result.total_tokens}")

Interactive Mode (Step-by-Step)

from treasure_hunt_agent import TreasureHuntGame

# `agent` is constructed exactly as in the batch example above
game = TreasureHuntGame("./my_hunt", agent)

while True:
    turn = game.take_turn()

    # Display turn info
    print(f"Turn {turn.turn_number}")
    for tc in turn.tool_calls:
        print(f"  {tc['name']}({tc['arguments']}) -> {tc['result'][:50]}...")

    # Handle human input if agent asks a question
    if turn.needs_human_input:
        response = input(f"Agent asks: {turn.pending_question}\nYour response: ")
        turn = game.take_turn(human_input=response)

    if turn.game_over:
        print(f"Game over! Success: {turn.game_result.success}")
        break

Custom Input Providers

from treasure_hunt_agent import AutoInputProvider, CallbackInputProvider

# For testing: provide canned responses
provider = AutoInputProvider(["look in the basement", "check the safe"])

# For integration: custom callback
def my_callback(question: str) -> str:
    # Could call an API, show a UI dialog, etc.
    return fetch_response_from_somewhere(question)

provider = CallbackInputProvider(my_callback)

Treasure Hunt Structure

Generating a hunt creates a tree like this:

treasure_hunt/
├── .treasure_hunt_config.json  # Configuration and metadata
├── [start_file].txt             # Starting point
├── [dir1]/
│   ├── [clue1].txt             # Points to next location
│   ├── [dir2]/
│   │   ├── [clue2].txt
│   │   └── [treasure].txt      # Final treasure with key
│   └── [red_herring_dir]/      # Dead ends
└── [red_herring_dir]/
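
To see what the generator actually produced, a short standard-library sketch like the one below prints the layout of a generated hunt; ./my_hunt is a placeholder path.

from pathlib import Path

hunt = Path("./my_hunt")  # placeholder: any generated hunt directory
for path in sorted(hunt.rglob("*")):
    depth = len(path.relative_to(hunt).parts) - 1
    marker = "/" if path.is_dir() else ""
    print("  " * depth + path.name + marker)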

Configuration File

The .treasure_hunt_config.json file contains:

{
  "treasure_key": "randomkey123",
  "start_file": "start_word.txt",
  "treasure_file": "path/to/treasure.txt",
  "path_length": 5,
  "golden_path": ["dir1", "dir2", "dir3"],
  "depth": 6,
  "branching_factor": 3,
  "seed": 42
}
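
For debugging or scoring, a minimal sketch that loads this config from a generated hunt and prints the solution path; ./my_hunt is a placeholder and the key names follow the example above.

import json
from pathlib import Path

# Point this at any generated hunt directory
config = json.loads(Path("./my_hunt/.treasure_hunt_config.json").read_text())

print("Start file:   ", config["start_file"])
print("Treasure file:", config["treasure_file"])
print("Golden path:  ", " -> ".join(config["golden_path"]))
print("Path length:  ", config["path_length"], "| seed:", config["seed"])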

Difficulty Presets

Difficulty   Depth   Branching   File Density
Easy         4       2           0.2
Medium       6       3           0.3
Hard         8       4           0.4

Architecture

The system consists of these main components:

  1. Treasure Hunt Generator (treasure_hunt_generator.py)
    • Creates parametrized filesystem puzzles
    • Random word-based names to prevent cheating
  2. Agent (gemini_agent.py)
    • LLM client using the Gemini API
    • Manages conversation and tool calling
    • Tracks token usage
  3. Game Loop (treasure_hunt_game.py)
    • run(): batch mode, runs to completion
    • take_turn(): interactive mode, one turn at a time
    • Executes tools (ls, cd, cat, pwd, check_treasure, give_up, ask_human)
    • Validates that paths stay within the hunt boundaries
    • Tracks turns, tokens, and time
  4. CLI (cli.py)
    • Beautiful terminal interface using typer and rich
    • Interactive and batch modes
    • Real-time turn display
  5. Input Providers (input_providers.py)
    • Protocol for pluggable human input
    • CLI, automatic (for testing), and callback implementations

Available Tools

The agent has access to these tools:

Tool                  Description
ls(path)              List files and directories
cd(path)              Change working directory
cat(file_path)        Read file contents
pwd()                 Print current directory
check_treasure(key)   Check if key is correct (ends game if correct)
give_up()             Give up (ends game as failure)
ask_human(question)   Ask the human for help
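
The actual declarations passed to the agent live in TOOL_DEFINITIONS in game_tools.py. Purely as an illustration, a Gemini-style function declaration for ls(path) could look roughly like this; it is a hypothetical sketch, not this project's actual definition.

# Hypothetical sketch of one entry in Gemini function-calling style;
# see game_tools.py for the project's real TOOL_DEFINITIONS.
ls_tool = {
    "name": "ls",
    "description": "List files and directories at the given path.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Directory to list, relative to the current working directory.",
            }
        },
        "required": ["path"],
    },
}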

Development

Running Tests

# Run all tests
uv run python -m pytest tests/ -v

# Run specific test file
uv run python -m pytest tests/test_treasure_hunt_game.py -v

# Run with coverage
uv run python -m pytest tests/ --cov=src --cov-report=html

Project Structure

agent_treasure_hunt/
├── src/treasure_hunt_agent/
│   ├── __init__.py           # Package exports
│   ├── cli.py                # Typer CLI with rich output
│   ├── game_tools.py         # Tool implementations
│   ├── gemini_agent.py       # Gemini LLM client
│   ├── input_providers.py    # Human input protocols
│   ├── treasure_hunt_game.py # Game loop (run/take_turn)
│   └── treasure_hunt_generator.py
├── examples/
│   └── run_treasure_hunt.py  # Example script
├── tests/
│   └── test_*.py             # Test files
├── pyproject.toml            # Dependencies and config
└── uv.lock                   # Lock file for reproducibility

Next Steps

  • Add Docker sandboxing for secure agent execution
  • Add interactive mode with step-by-step control
  • Add rich CLI with typer
  • Create multiple agent implementations (langchain, llm, ADK)
  • Add difficulty scaling and performance metrics
  • Implement docs/help system
  • Add cost tracking and optimization
