Open
Labels
- feature:ml-bots: ML reinforcement learning bot training feature
- type:epic: Epic-level issue containing multiple tasks
Description
Overview
Implement an ML reinforcement learning bot training system for go-towerfall. The system will train PPO-based agents through successive self-play, starting with a rule-based baseline bot and progressively training against stronger generations.
Background
The goal is to create AI players that can compete with human players by learning through reinforcement learning. Key components include:
- Go Server Extensions: Training mode APIs with configurable tick rates for accelerated training
- Python Bot Infrastructure: Gym environment wrapper, observation/action spaces, PPO agent
- Training Pipeline: Orchestration of game servers, model training, successive generation logic
- Monitoring: Spectator support and metrics visualization
Technical Decisions
- RL Algorithm: PPO (Proximal Policy Optimization)
- Action Space: Discrete (key press/release for W/A/S/D, aim direction buckets, shoot)
- Observation Space: Own player state, other players, projectiles, map geometry
- Speed-up: Accelerated tick rate on game server
- "Better" Threshold: A new model counts as better when its average kills/deaths exceeds that of the previous generation
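To make the discrete action space decision concrete, here is a minimal sketch of how a flat action index could be mapped to W/A/S/D key states, an aim bucket, and a shoot flag. The names `BotAction`, `encode_action`, and `decode_action`, the choice of 8 aim buckets, and the modelling of press/release as a held/not-held state per key are all assumptions for illustration; the issue does not fix any of them.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Assumption: 8 aim direction buckets. The issue does not specify a count.
NUM_AIM_BUCKETS = 8
KEYS = ("W", "A", "S", "D")


@dataclass(frozen=True)
class BotAction:
    """Hypothetical structured action decoded from one discrete index."""
    keys: Tuple[bool, ...]  # held state for W, A, S, D (assumed encoding)
    aim_bucket: int         # 0 .. NUM_AIM_BUCKETS - 1
    shoot: bool


def action_space_size() -> int:
    """Number of distinct actions: key combinations x aim buckets x shoot."""
    return (2 ** len(KEYS)) * NUM_AIM_BUCKETS * 2


def encode_action(action: BotAction) -> int:
    """Pack a structured action into a single flat index."""
    key_bits = sum(int(held) << i for i, held in enumerate(action.keys))
    return (key_bits * NUM_AIM_BUCKETS + action.aim_bucket) * 2 + int(action.shoot)


def decode_action(index: int) -> BotAction:
    """Unpack a flat index produced by the policy into a structured action."""
    if not 0 <= index < action_space_size():
        raise ValueError(f"action index out of range: {index}")
    shoot = bool(index % 2)
    index //= 2
    aim_bucket = index % NUM_AIM_BUCKETS
    key_bits = index // NUM_AIM_BUCKETS
    keys = tuple(bool((key_bits >> i) & 1) for i in range(len(KEYS)))
    return BotAction(keys=keys, aim_bucket=aim_bucket, shoot=shoot)


def aim_angle_radians(bucket: int) -> float:
    """Centre angle of an aim bucket, assuming buckets evenly divide the circle."""
    return 2 * math.pi * bucket / NUM_AIM_BUCKETS
```

A single flat index keeps the policy head a plain categorical distribution. A multi-discrete head (one sub-action per key, aim, and shoot) would be an equally reasonable alternative if the flat space proves too large.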
Task Breakdown
| Task ID | Title | Dependencies | GitHub Issue # |
|---|---|---|---|
| TASK-001 | Add configurable tick rate to game server | — | #21 |
| TASK-002 | Extend CreateGame API for training options | TASK-001 | #27 |
| TASK-003 | Add game state snapshot REST endpoint | — | #22 |
| TASK-004 | Add bot action submission REST endpoint | — | #24 |
| TASK-005 | Add game reset/restart endpoint | — | #23 |
| TASK-006 | Add game statistics endpoint | — | #26 |
| TASK-007 | Configure bot2 project dependencies | — | #25 |
| TASK-008 | Create Pydantic models for game state | TASK-007 | #29 |
| TASK-009 | Implement async HTTP client wrapper | TASK-007, TASK-008 | #31 |
| TASK-010 | Refactor GameClient for training mode | TASK-009 | #37 |
| TASK-011 | Define discrete action space | TASK-008 | #30 |
| TASK-012 | Implement observation space normalization | TASK-008 | #35 |
| TASK-013 | Add map geometry encoding | TASK-012 | #36 |
| TASK-014 | Create base Gym environment class | TASK-010, TASK-011, TASK-012 | #42 |
| TASK-015 | Implement reward function | TASK-014 | #43 |
| TASK-016 | Add episode termination logic | TASK-014 | #47 |
| TASK-017 | Support multiple parallel environments | TASK-014 | #44 |
| TASK-018 | Implement rule-based bot movement | TASK-008 | #34 |
| TASK-019 | Implement rule-based bot aiming and shooting | TASK-018 | #38 |
| TASK-020 | Add rule-based bot as opponent option | TASK-019, TASK-014 | #48 |
| TASK-021 | Implement PPO actor-critic network | TASK-007, TASK-011, TASK-012 | #39 |
| TASK-022 | Implement PPO training loop | TASK-021 | #40 |
| TASK-023 | Add model serialization | TASK-021 | #41 |
| TASK-024 | Add hyperparameter configuration | TASK-007 | #28 |
| TASK-025 | Create game server manager component | TASK-002 | #33 |
| TASK-026 | Implement model registry | TASK-023 | #46 |
| TASK-027 | Build training orchestrator | TASK-017, TASK-022, TASK-025, TASK-026 | #50 |
| TASK-028 | Implement successive training logic | TASK-027, TASK-020 | #52 |
| TASK-029 | Add training CLI interface | TASK-027 | #51 |
| TASK-030 | Add WebSocket spectator for training games | TASK-002 | #32 |
| TASK-031 | Implement training metrics logging | TASK-022 | #45 |
| TASK-032 | Add generation comparison dashboard | TASK-031, TASK-026 | #49 |
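As a planning aid, the dependency column above can be turned into "waves" of tasks that may be worked on in parallel. The sketch below transcribes the table into a dictionary (task number to prerequisite task numbers) and groups tasks with the standard-library `graphlib.TopologicalSorter`. The helper name `dependency_waves` is hypothetical, and the grouping is only one possible scheduling view, not a plan stated in the issue.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Task number -> prerequisite task numbers, transcribed from the table above.
DEPENDENCIES: Dict[int, List[int]] = {
    1: [], 2: [1], 3: [], 4: [], 5: [], 6: [], 7: [],
    8: [7], 9: [7, 8], 10: [9], 11: [8], 12: [8], 13: [12],
    14: [10, 11, 12], 15: [14], 16: [14], 17: [14],
    18: [8], 19: [18], 20: [19, 14],
    21: [7, 11, 12], 22: [21], 23: [21], 24: [7],
    25: [2], 26: [23], 27: [17, 22, 25, 26],
    28: [27, 20], 29: [27], 30: [2], 31: [22], 32: [31, 26],
}


def dependency_waves(deps: Dict[int, List[int]]) -> List[List[int]]:
    """Group tasks into waves; every task in a wave has all prerequisites
    satisfied by earlier waves. Raises graphlib.CycleError on a cycle."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves: List[List[int]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(dependency_waves(DEPENDENCIES), start=1):
        print(f"Wave {number}: " + ", ".join(f"TASK-{t:03d}" for t in wave))
```

Run as a script, this prints one line per wave. The first wave contains the tasks with no dependencies, and TASK-028 and TASK-029 land in the last wave.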
Acceptance Criteria
- Training pipeline can start a game server and train PPO agents
- Rule-based bot serves as initial training opponent
- Successive training automatically promotes better models
- Human spectators can watch training games in real-time
- Training metrics are logged and visualizable
- All linked task issues completed
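The "automatically promotes better models" criterion, combined with the "Better" threshold under Technical Decisions, could be checked roughly as follows. This is a sketch under assumptions: it reads "average kills/deaths" as a kill/death ratio over an evaluation run, and the names `EvalStats`, `kd_ratio`, and `should_promote`, as well as the zero-deaths handling, are hypothetical rather than taken from the issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalStats:
    """Hypothetical aggregate of one generation's evaluation games."""
    kills: int
    deaths: int


def kd_ratio(stats: EvalStats) -> float:
    """Kill/death ratio. Assumption: zero deaths is treated as one death
    to avoid division by zero."""
    return stats.kills / max(stats.deaths, 1)


def should_promote(candidate: EvalStats, previous: EvalStats) -> bool:
    """Promote only when the candidate is strictly better than the
    previous generation under the assumed K/D metric."""
    return kd_ratio(candidate) > kd_ratio(previous)
```

For example, `should_promote(EvalStats(kills=30, deaths=10), EvalStats(kills=20, deaths=10))` returns `True`, while two generations with equal ratios are not promoted. A production version would likely also want a minimum number of evaluation games before comparing.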
References
- Repository: https://github.com/tgrunnagle/go-towerfall
- Bot2 project location: go-towerfall/bot2/
- Coding standards: go-towerfall/CLAUDE.md