Epic: ML Reinforcement Learning Bot Training Pipeline #14

@tgrunnagle

Overview

Implement an ML reinforcement learning bot training system for go-towerfall. The system will train PPO-based agents through successive self-play, starting with a rule-based baseline bot and progressively training against stronger generations.

Background

The goal is to create AI players that can compete with human players by learning through reinforcement learning. Key components include:

  • Go Server Extensions: Training mode APIs with configurable tick rates for accelerated training
  • Python Bot Infrastructure: Gym environment wrapper, observation/action spaces, PPO agent
  • Training Pipeline: Orchestration of game servers, model training, successive generation logic
  • Monitoring: Spectator support and metrics visualization

Technical Decisions

  • RL Algorithm: PPO (Proximal Policy Optimization)
  • Action Space: Discrete (key press/release for W/A/S/D, aim direction buckets, shoot)
  • Observation Space: Own player state, other players, projectiles, map geometry
  • Speed-up: Accelerated tick rate on game server
  • "Better" Threshold: A new generation counts as better when its average kills/deaths ratio exceeds that of the previous generation
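
The discrete action space described above could be flattened into a single integer index, which is the form most PPO implementations expect for a categorical policy head. The sketch below is illustrative only: the names `BotAction`, `encode`, and `decode` are hypothetical, it treats each of W/A/S/D as a held/released flag, and the count of 8 aim buckets is an assumption, since the issue does not state one.

```python
import math
from dataclasses import dataclass

KEYS = ("W", "A", "S", "D")
NUM_AIM_BUCKETS = 8  # assumption: the issue does not specify a bucket count

# Size of each sub-action: four key flags, one aim bucket, one shoot flag.
DIMS = (2, 2, 2, 2, NUM_AIM_BUCKETS, 2)
NUM_ACTIONS = math.prod(DIMS)


@dataclass(frozen=True)
class BotAction:
    keys_down: tuple[bool, ...]  # one flag per entry in KEYS
    aim_bucket: int
    shoot: bool

    def aim_angle(self) -> float:
        """Centre angle of the aim bucket, in radians."""
        return (self.aim_bucket + 0.5) * (2 * math.pi / NUM_AIM_BUCKETS)


def encode(action: BotAction) -> int:
    """Pack the sub-actions into one index (mixed-radix, W most significant)."""
    parts = [int(k) for k in action.keys_down] + [action.aim_bucket, int(action.shoot)]
    index = 0
    for part, size in zip(parts, DIMS):
        index = index * size + part
    return index


def decode(index: int) -> BotAction:
    """Inverse of encode: unpack a flat index into its sub-actions."""
    if not 0 <= index < NUM_ACTIONS:
        raise ValueError(f"action index out of range: {index}")
    parts = []
    for size in reversed(DIMS):
        index, part = divmod(index, size)
        parts.append(part)
    parts.reverse()
    w, a, s, d, aim, shoot = parts
    return BotAction((bool(w), bool(a), bool(s), bool(d)), aim, bool(shoot))
```

A flat index keeps the policy head simple at the cost of a larger output layer; a multi-discrete head with one categorical per sub-action would be a reasonable alternative if the bucket count grows.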

Task Breakdown

| Task ID | Title | Dependencies | GitHub Issue |
|---|---|---|---|
| TASK-001 | Add configurable tick rate to game server | None | #21 |
| TASK-002 | Extend CreateGame API for training options | TASK-001 | #27 |
| TASK-003 | Add game state snapshot REST endpoint | None | #22 |
| TASK-004 | Add bot action submission REST endpoint | None | #24 |
| TASK-005 | Add game reset/restart endpoint | None | #23 |
| TASK-006 | Add game statistics endpoint | None | #26 |
| TASK-007 | Configure bot2 project dependencies | None | #25 |
| TASK-008 | Create Pydantic models for game state | TASK-007 | #29 |
| TASK-009 | Implement async HTTP client wrapper | TASK-007, TASK-008 | #31 |
| TASK-010 | Refactor GameClient for training mode | TASK-009 | #37 |
| TASK-011 | Define discrete action space | TASK-008 | #30 |
| TASK-012 | Implement observation space normalization | TASK-008 | #35 |
| TASK-013 | Add map geometry encoding | TASK-012 | #36 |
| TASK-014 | Create base Gym environment class | TASK-010, TASK-011, TASK-012 | #42 |
| TASK-015 | Implement reward function | TASK-014 | #43 |
| TASK-016 | Add episode termination logic | TASK-014 | #47 |
| TASK-017 | Support multiple parallel environments | TASK-014 | #44 |
| TASK-018 | Implement rule-based bot movement | TASK-008 | #34 |
| TASK-019 | Implement rule-based bot aiming and shooting | TASK-018 | #38 |
| TASK-020 | Add rule-based bot as opponent option | TASK-019, TASK-014 | #48 |
| TASK-021 | Implement PPO actor-critic network | TASK-007, TASK-011, TASK-012 | #39 |
| TASK-022 | Implement PPO training loop | TASK-021 | #40 |
| TASK-023 | Add model serialization | TASK-021 | #41 |
| TASK-024 | Add hyperparameter configuration | TASK-007 | #28 |
| TASK-025 | Create game server manager component | TASK-002 | #33 |
| TASK-026 | Implement model registry | TASK-023 | #46 |
| TASK-027 | Build training orchestrator | TASK-017, TASK-022, TASK-025, TASK-026 | #50 |
| TASK-028 | Implement successive training logic | TASK-027, TASK-020 | #52 |
| TASK-029 | Add training CLI interface | TASK-027 | #51 |
| TASK-030 | Add WebSocket spectator for training games | TASK-002 | #32 |
| TASK-031 | Implement training metrics logging | TASK-022 | #45 |
| TASK-032 | Add generation comparison dashboard | TASK-031, TASK-026 | #49 |
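
Because the dependencies above form a directed acyclic graph, a valid work order can be derived mechanically. The following sketch uses Python's standard `graphlib` module to group tasks into waves, where every task in a wave depends only on tasks from earlier waves and can therefore be worked on in parallel. The `DEPENDENCIES` mapping is transcribed from the table; the helper names `task` and `execution_waves` are hypothetical.

```python
from graphlib import TopologicalSorter


def task(n: int) -> str:
    """Format a task number as its ID, e.g. 7 -> 'TASK-007'."""
    return f"TASK-{n:03d}"


# Task number -> numbers of the tasks it depends on (from the table above).
RAW = {
    1: (), 2: (1,), 3: (), 4: (), 5: (), 6: (), 7: (), 8: (7,),
    9: (7, 8), 10: (9,), 11: (8,), 12: (8,), 13: (12,), 14: (10, 11, 12),
    15: (14,), 16: (14,), 17: (14,), 18: (8,), 19: (18,), 20: (19, 14),
    21: (7, 11, 12), 22: (21,), 23: (21,), 24: (7,), 25: (2,), 26: (23,),
    27: (17, 22, 25, 26), 28: (27, 20), 29: (27,), 30: (2,), 31: (22,),
    32: (31, 26),
}
DEPENDENCIES = {task(n): [task(d) for d in deps] for n, deps in RAW.items()}


def execution_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves whose members have all dependencies satisfied."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(f"Wave {i}: {', '.join(wave)}")
```

The first wave contains the tasks with no dependencies (TASK-001 and TASK-003 through TASK-007), so the Go server endpoints and the Python project setup can start independently.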

Acceptance Criteria

  • Training pipeline can start a game server and train PPO agents
  • Rule-based bot serves as initial training opponent
  • Successive training automatically promotes better models
  • Human spectators can watch training games in real-time
  • Training metrics are logged and visualizable
  • All linked task issues completed
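
The automatic promotion criterion could be checked with a comparison like the one below. This is a minimal sketch, not the pipeline's implementation: `EpisodeStats`, `kd_ratio`, and `should_promote` are hypothetical names, and it assumes "average kills/deaths" means total kills divided by total deaths over a set of evaluation episodes, with the death count floored at 1 to avoid division by zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeStats:
    kills: int
    deaths: int


def kd_ratio(episodes: list[EpisodeStats]) -> float:
    """Aggregate kills/deaths over evaluation episodes (assumed definition)."""
    kills = sum(e.kills for e in episodes)
    deaths = sum(e.deaths for e in episodes)
    return kills / max(deaths, 1)  # floor deaths at 1 to avoid dividing by zero


def should_promote(candidate: list[EpisodeStats],
                   previous: list[EpisodeStats]) -> bool:
    """Promote only when the candidate strictly beats the previous generation."""
    return kd_ratio(candidate) > kd_ratio(previous)


if __name__ == "__main__":
    previous = [EpisodeStats(2, 2), EpisodeStats(1, 1)]
    candidate = [EpisodeStats(3, 1), EpisodeStats(2, 2)]
    print(should_promote(candidate, previous))
```

A strict inequality means a tie keeps the current generation, which avoids churning the model registry on noise; in practice a margin or a minimum number of evaluation episodes may also be worth considering.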

Labels: feature:ml-bots (ML reinforcement learning bot training feature), type:epic (Epic-level issue containing multiple tasks)
