Epic: ML Reinforcement Learning Bot Training Pipeline #14

## Overview

Implement a reinforcement learning (RL) bot training system for go-towerfall. The system will train PPO-based agents through successive self-play, starting against a rule-based baseline bot and progressively training against stronger generations.
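The successive self-play loop can be sketched as follows. This is a minimal sketch, not the planned implementation: `train` and `evaluate` are hypothetical hooks standing in for the PPO training loop and the evaluation games, model identifiers are plain strings, and "average kills/deaths" is assumed to mean the mean of per-episode kill/death ratios (with zero deaths counted as one).

```python
from typing import Callable, List, Sequence, Tuple

# (kills, deaths) recorded for one evaluation episode.
EpisodeStats = Tuple[int, int]


def average_kd(episodes: Sequence[EpisodeStats]) -> float:
    """Mean per-episode kill/death ratio; zero deaths are clamped to 1."""
    if not episodes:
        return 0.0
    return sum(kills / max(deaths, 1) for kills, deaths in episodes) / len(episodes)


def successive_training(
    train: Callable[[str, int], str],                   # hypothetical: (opponent, generation) -> new model id
    evaluate: Callable[[str], Sequence[EpisodeStats]],  # hypothetical: model id -> evaluation episode stats
    generations: int,
    baseline: str = "rule-based",
) -> List[str]:
    """Return the chain of promoted models, starting with the rule-based baseline."""
    champion = baseline
    champion_kd = average_kd(evaluate(champion))
    promoted = [champion]
    for generation in range(1, generations + 1):
        # Each candidate trains against the strongest model promoted so far.
        candidate = train(champion, generation)
        candidate_kd = average_kd(evaluate(candidate))
        # Promote only on a strictly higher average K/D than the current champion.
        if candidate_kd > champion_kd:
            champion, champion_kd = candidate, candidate_kd
            promoted.append(champion)
    return promoted
```

In this sketch a candidate that misses the threshold is discarded, so the next generation trains against the same champion again. Keeping rejected models in the registry for later comparison would be an equally reasonable choice.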

## Background

The goal is to create AI players that learn, through reinforcement learning, to compete with human players. Key components include:

- **Go Server Extensions**: Training mode APIs with configurable tick rates for accelerated training
- **Python Bot Infrastructure**: Gym environment wrapper, observation/action spaces, PPO agent
- **Training Pipeline**: Orchestration of game servers, model training, successive generation logic
- **Monitoring**: Spectator support and metrics visualization
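A minimal sketch of how the Gym environment wrapper could sit on top of the training-mode server APIs is shown below. It mirrors the Gymnasium `reset()`/`step()` return conventions but does not subclass `gymnasium.Env`, to stay dependency-free. The `TrainingClient` methods and the state keys (`x`, `y`, `kills`, `deaths`, `game_over`) are hypothetical placeholders, and the observation and reward are deliberately trivial stand-ins for the real encoding and reward function.

```python
from typing import Any, Dict, List, Protocol, Tuple

State = Dict[str, Any]


class TrainingClient(Protocol):
    """Hypothetical client over the training-mode REST endpoints (reset, action, snapshot)."""

    def reset_game(self) -> State: ...
    def submit_action(self, action: int) -> None: ...
    def get_state(self) -> State: ...


class TowerfallEnv:
    """Follows the Gymnasium reset/step return conventions without depending on gymnasium."""

    def __init__(self, client: TrainingClient, max_steps: int = 1000) -> None:
        self.client = client
        self.max_steps = max_steps
        self.steps = 0
        self.prev_state: State = {}

    def reset(self) -> Tuple[List[float], Dict[str, Any]]:
        self.steps = 0
        self.prev_state = self.client.reset_game()
        return self._observe(self.prev_state), {}

    def step(self, action: int) -> Tuple[List[float], float, bool, bool, Dict[str, Any]]:
        self.client.submit_action(action)
        state = self.client.get_state()
        self.steps += 1
        reward = self._reward(self.prev_state, state)
        terminated = bool(state.get("game_over", False))
        # Truncation: the step budget ran out before the game itself ended.
        truncated = not terminated and self.steps >= self.max_steps
        self.prev_state = state
        return self._observe(state), reward, terminated, truncated, {}

    @staticmethod
    def _observe(state: State) -> List[float]:
        # Placeholder: only the player's position; the real encoding would add
        # normalized features for other players, projectiles, and map geometry.
        return [float(state.get("x", 0.0)), float(state.get("y", 0.0))]

    @staticmethod
    def _reward(prev: State, cur: State) -> float:
        # Placeholder reward: +1 per new kill, -1 per new death since the last step.
        kills = cur.get("kills", 0) - prev.get("kills", 0)
        deaths = cur.get("deaths", 0) - prev.get("deaths", 0)
        return float(kills - deaths)
```

Keeping the HTTP details behind a small client interface lets the environment be tested against an in-memory fake before the server endpoints exist.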

### Technical Decisions
- **RL Algorithm**: PPO (Proximal Policy Optimization)
- **Action Space**: Discrete (key press/release for W/A/S/D, aim direction buckets, shoot)
- **Observation Space**: Own player state, other players, projectiles, map geometry
- **Speed-up**: Accelerated tick rate on game server
- **"Better" Threshold**: A new generation counts as better when its average kill/death ratio exceeds that of the previous generation
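One way to flatten the discrete action space into a single index range is sketched below. It is an assumption-laden sketch, not the planned design: the number of aim buckets (8), the extra no-op action, and the `BotAction` type are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BotAction:
    kind: str                       # "noop", "key", "aim", or "shoot"
    key: Optional[str] = None       # "W", "A", "S", or "D" for key actions
    pressed: Optional[bool] = None  # True = press, False = release
    angle: Optional[float] = None   # aim direction in radians for aim actions


def build_action_table(num_aim_buckets: int = 8) -> List[BotAction]:
    """Map each discrete action index to a concrete bot command."""
    actions = [BotAction("noop")]
    for key in "WASD":
        for pressed in (True, False):
            actions.append(BotAction("key", key=key, pressed=pressed))
    for bucket in range(num_aim_buckets):
        # Buckets split the full circle evenly: bucket i aims at 2*pi*i/N.
        actions.append(BotAction("aim", angle=2 * math.pi * bucket / num_aim_buckets))
    actions.append(BotAction("shoot"))
    return actions


ACTIONS = build_action_table()
# With 8 aim buckets: 1 no-op + 8 key actions + 8 aim actions + 1 shoot = 18.
```

A flat table like this lets the policy emit only one command per step, so moving, aiming, and shooting at the same moment takes several steps. Gymnasium's `MultiDiscrete` space, with one sub-action per head, is the usual alternative if that turns out to be limiting.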

## Task Breakdown

| Task ID | Title | Dependencies | GitHub Issue # |
|---------|-------|--------------|----------------|
| TASK-001 | Add configurable tick rate to game server | — | #21 |
| TASK-002 | Extend CreateGame API for training options | TASK-001 | #27 |
| TASK-003 | Add game state snapshot REST endpoint | — | #22 |
| TASK-004 | Add bot action submission REST endpoint | — | #24 |
| TASK-005 | Add game reset/restart endpoint | — | #23 |
| TASK-006 | Add game statistics endpoint | — | #26 |
| TASK-007 | Configure bot2 project dependencies | — | #25 |
| TASK-008 | Create Pydantic models for game state | TASK-007 | #29 |
| TASK-009 | Implement async HTTP client wrapper | TASK-007, TASK-008 | #31 |
| TASK-010 | Refactor GameClient for training mode | TASK-009 | #37 |
| TASK-011 | Define discrete action space | TASK-008 | #30 |
| TASK-012 | Implement observation space normalization | TASK-008 | #35 |
| TASK-013 | Add map geometry encoding | TASK-012 | #36 |
| TASK-014 | Create base Gym environment class | TASK-010, TASK-011, TASK-012 | #42 |
| TASK-015 | Implement reward function | TASK-014 | #43 |
| TASK-016 | Add episode termination logic | TASK-014 | #47 |
| TASK-017 | Support multiple parallel environments | TASK-014 | #44 |
| TASK-018 | Implement rule-based bot movement | TASK-008 | #34 |
| TASK-019 | Implement rule-based bot aiming and shooting | TASK-018 | #38 |
| TASK-020 | Add rule-based bot as opponent option | TASK-019, TASK-014 | #48 |
| TASK-021 | Implement PPO actor-critic network | TASK-007, TASK-011, TASK-012 | #39 |
| TASK-022 | Implement PPO training loop | TASK-021 | #40 |
| TASK-023 | Add model serialization | TASK-021 | #41 |
| TASK-024 | Add hyperparameter configuration | TASK-007 | #28 |
| TASK-025 | Create game server manager component | TASK-002 | #33 |
| TASK-026 | Implement model registry | TASK-023 | #46 |
| TASK-027 | Build training orchestrator | TASK-017, TASK-022, TASK-025, TASK-026 | #50 |
| TASK-028 | Implement successive training logic | TASK-027, TASK-020 | #52 |
| TASK-029 | Add training CLI interface | TASK-027 | #51 |
| TASK-030 | Add WebSocket spectator for training games | TASK-002 | #32 |
| TASK-031 | Implement training metrics logging | TASK-022 | #45 |
| TASK-032 | Add generation comparison dashboard | TASK-031, TASK-026 | #49 |
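Because the dependency column forms a directed acyclic graph, a valid implementation order can be derived mechanically. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into "waves" whose prerequisites are all finished. Task IDs are abbreviated to integers (3 means TASK-003), and the dependency map is transcribed from the table above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Task number -> prerequisite task numbers, transcribed from the Task Breakdown table.
DEPENDENCIES: Dict[int, List[int]] = {
    1: [], 2: [1], 3: [], 4: [], 5: [], 6: [], 7: [], 8: [7],
    9: [7, 8], 10: [9], 11: [8], 12: [8], 13: [12], 14: [10, 11, 12],
    15: [14], 16: [14], 17: [14], 18: [8], 19: [18], 20: [19, 14],
    21: [7, 11, 12], 22: [21], 23: [21], 24: [7], 25: [2], 26: [23],
    27: [17, 22, 25, 26], 28: [27, 20], 29: [27], 30: [2], 31: [22],
    32: [31, 26],
}


def implementation_waves(deps: Dict[int, List[int]]) -> List[List[int]]:
    """Group tasks into waves; every task in a wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies contain a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # marking a wave done unlocks the tasks that depend on it
    return waves
```

With the table as written this yields eight waves, starting with TASK-001 and TASK-003 through TASK-007 and ending with TASK-028 and TASK-029, which also gives the length of the longest dependency chain.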

## Acceptance Criteria

- [ ] Training pipeline can start a game server and train PPO agents
- [ ] Rule-based bot serves as initial training opponent
- [ ] Successive training automatically promotes better models
- [ ] Human spectators can watch training games in real-time
- [ ] Training metrics are logged and visualizable
- [ ] All linked task issues completed

## References

- Repository: https://github.com/tgrunnagle/go-towerfall
- Bot2 project location: `go-towerfall/bot2/`
- Coding standards: `go-towerfall/CLAUDE.md`
