GigaEvo is a machine learning experiment management system with a microservices architecture, featuring Kafka-based messaging and three-tier service separation.
GigaEvo Platform consists of three main components:

**Master API**
- Role: Experiment orchestration and coordination
- Technology: FastAPI, Kafka, PostgreSQL, Redis
- Features:
  - Kafka integration for async messaging
  - Experiment lifecycle management
  - Configuration storage and retrieval
  - uv-based dependency management

**Runner API**
- Role: Task execution with GigaEvolve integration
- Technology: FastAPI, GigaEvolve tools
- Features:
  - Experiment code execution
  - Results visualization
  - Best program extraction
  - Background task processing

**WebUI**
- Role: Gradio-based user interface
- Technology: Gradio, Plotly, Requests
- Features:
  - Interactive experiment creation
  - Real-time progress monitoring
  - Results visualization
  - System status dashboard
Prerequisites:

- Docker & Docker Compose
- Python 3.12+ (for local development)
- uv (recommended) or pip
The GigaEvo platform reads all LLM settings from a single repo-level file, `llm_models.yml`. Create it from the `llm_models.yml.example` template and fill in your credentials.
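The actual schema is defined by `llm_models.yml.example`. As a rough, hypothetical sketch of how a service might load the file (it makes no assumptions about the keys inside), the settings can be read with PyYAML:

```python
# Hypothetical sketch: load the repo-level LLM settings file with PyYAML.
# The real keys are whatever llm_models.yml.example defines; this snippet
# only parses the file and lists its top-level sections.
from pathlib import Path
import yaml

def load_llm_settings(path: str = "llm_models.yml") -> dict:
    """Parse the repo-level LLM settings into a plain dict."""
    config_file = Path(path)
    if not config_file.exists():
        raise FileNotFoundError(
            f"{path} not found; copy llm_models.yml.example and fill in your credentials"
        )
    with config_file.open() as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    print(sorted(load_llm_settings()))  # show top-level keys only
```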
GigaEvo Platform uses the `deploy.sh` script with Docker Compose for service orchestration:

```bash
make deploy
# Or directly:
./deploy.sh deploy
```

This will deploy with automated health checks:
- Infrastructure: PostgreSQL, Kafka, Zookeeper, Redis (2 instances), MinIO
- Applications: Master API, Runner API, WebUI
- Networking: Docker network and shared volumes
- Health Monitoring: Automatic service health verification
```bash
make dev        # Run services locally for development (requires infrastructure running)

make master-api # Master API on port 8000
make runner-api # Runner API on port 8001
make web-ui     # WebUI on port 7860

# Check all services status
make status
# Or:
./deploy.sh status

# Stop all services
make stop
# Or:
./deploy.sh stop

# Restart specific service
make restart SERVICE=master-api
make restart SERVICE=runner-api
make restart SERVICE=web-ui
make restart SERVICE=kafka

# View service logs
./deploy.sh logs [service-name]
```

Service URLs:

- WebUI: http://localhost:7860
- Master API: http://localhost:8000
- Runner API: http://localhost:8001
- MinIO Console: http://localhost:9001 (user: minioadmin, pass: minioadmin)
- Kafka Broker: localhost:9092
- Kafka UI: Available in dev mode at http://localhost:9000 (via `make dev`)
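After deployment, a quick reachability check against these endpoints confirms the application services are up. Below is a minimal sketch using `requests`; the `/health` paths are assumptions (FastAPI apps typically expose `/docs`, which can serve as a fallback probe):

```python
# Probe the deployed services over HTTP.
# The /health paths below are assumptions about the APIs; swap in whatever
# endpoints the services actually expose (e.g. /docs for the FastAPI apps).
import requests

SERVICES = {
    "Master API": "http://localhost:8000/health",  # assumed path
    "Runner API": "http://localhost:8001/health",  # assumed path
    "WebUI": "http://localhost:7860/",
}

for name, url in SERVICES.items():
    try:
        response = requests.get(url, timeout=5)
        print(f"{name}: HTTP {response.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```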
**Master API endpoints**

- `POST /api/v1/experiments/` - Initialize experiment
- `GET /api/v1/experiments/` - Get list of experiments
- `GET /api/v1/experiments/{experiment_id}/status` - Request status
- `POST /api/v1/experiments/{experiment_id}/start` - Start experiment
- `POST /api/v1/experiments/{experiment_id}/stop` - Stop experiment
- `GET /api/v1/experiments/{experiment_id}/results` - Get results

**Runner API endpoints**

- `POST /api/v1/experiments/{experiment_id}/upload` - Load experiment code
- `POST /api/v1/experiments/{experiment_id}/start` - Start experiment
- `POST /api/v1/experiments/{experiment_id}/stop` - Stop experiment
- `GET /api/v1/experiments/{experiment_id}/status` - Get execution status
- `GET /api/v1/experiments/{experiment_id}/visualization` - Get visualization
- `GET /api/v1/experiments/{experiment_id}/best-program` - Get best program
- `GET /api/v1/experiments/{experiment_id}/logs` - Get logs (optional)
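To illustrate the experiment lifecycle against the Master API, here is a minimal client sketch built on `requests`. The request payload fields, response field names, and status values are assumptions for illustration only; the authoritative schemas are in the FastAPI docs at http://localhost:8000/docs.

```python
# Sketch of the experiment lifecycle via the Master API.
# Payload fields ("name"), response fields ("id", "status") and status values
# are assumptions; consult http://localhost:8000/docs for the real schemas.
import time
import requests

BASE = "http://localhost:8000/api/v1/experiments"

# 1. Initialize an experiment
created = requests.post(f"{BASE}/", json={"name": "demo-experiment"})
created.raise_for_status()
experiment_id = created.json()["id"]  # assumed field name

# 2. Start it
requests.post(f"{BASE}/{experiment_id}/start").raise_for_status()

# 3. Poll the status until the experiment is no longer running
while True:
    status = requests.get(f"{BASE}/{experiment_id}/status").json()
    print("status:", status)
    if status.get("status") not in ("pending", "running"):  # assumed values
        break
    time.sleep(10)

# 4. Fetch the results
results = requests.get(f"{BASE}/{experiment_id}/results").json()
print(results)
```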
The system uses these Kafka topics for coordination:
- `experiment-config` - Experiment configuration received
- `experiment-prepared` - Experiment prepared for execution
- `experiment-started` - Experiment execution started
- `experiment-stopped` - Experiment execution stopped
- `runner-status` - Runner status updates
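When debugging the coordination flow it can help to tail these topics directly. Below is a minimal sketch using the `kafka-python` package; it assumes messages decode as UTF-8 text, which may not match the platform's actual serialization:

```python
# Tail the coordination topics for debugging.
# Assumes the kafka-python package is installed and that messages decode as
# UTF-8 text; the platform's actual payload format may differ.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "experiment-started",
    "experiment-stopped",
    "runner-status",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: raw.decode("utf-8", errors="replace"),
)

for message in consumer:
    print(f"[{message.topic}] {message.value}")
```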
```bash
# Install all dependencies
make install

# Run services individually (infrastructure must be running first)
make master-api # Master API on port 8000
make runner-api # Runner API on port 8001
make web-ui     # WebUI on port 7860

# Development with hot reload (legacy architecture)
make dev
# Production environment (legacy)
make prod
# Clean up containers and volumes
make docker-clean

make lint       # Run linting with ruff
make format     # Format code with ruff
make test       # Run tests (individual components)

make db-reset   # Drop and recreate database
make db-migrate # Run database migrations
```
Common issues:

- Port Conflicts: Ensure these ports are free:
  - 5432: PostgreSQL
  - 6379, 6380: Redis (2 instances)
  - 7860: WebUI
  - 8000: Master API
  - 8001: Runner API
  - 9000, 9001: MinIO
  - 9092, 29092, 29093: Kafka
  - 2181: Zookeeper
- Deployment Issues:

  ```bash
  # Check deployment status
  ./deploy.sh status
  # Or:
  make status

  # View service logs
  ./deploy.sh logs [service-name]
  # Or for all services:
  ./deploy.sh logs

  # Restart specific service
  make restart SERVICE=master-api
  make restart SERVICE=runner-api
  make restart SERVICE=web-ui
  make restart SERVICE=kafka
  ```
- Service Health Check Failures:

  ```bash
  # The deploy script automatically checks service health
  # If services fail to start, check logs:
  ./deploy.sh logs postgres
  ./deploy.sh logs kafka
  ./deploy.sh logs master-api
  ```
- Database Connection Issues:

  ```bash
  # Reset database (use after schema changes)
  make db-reset
  # Check PostgreSQL logs
  ./deploy.sh logs postgres
  ```
Key environment variables for Master API:
- `DATABASE__URL` - PostgreSQL connection string
- `KAFKA__BOOTSTRAP_SERVERS` - Kafka bootstrap servers
- `REDIS_URL` - Redis connection URL
- `STORAGE__ENDPOINT_URL` - MinIO endpoint
- `STORAGE__ACCESS_KEY` - MinIO access key
- `STORAGE__SECRET_KEY` - MinIO secret key
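The double-underscore names indicate nested settings groups (`DATABASE__URL` maps to a `database.url` field, and so on). As a sketch of how such variables are typically parsed with `pydantic-settings`; the class and field layout below is an assumption, not the platform's actual settings module:

```python
# Illustration of how DATABASE__URL / STORAGE__ACCESS_KEY style variables map
# onto nested settings via pydantic-settings. The class layout and defaults are
# assumptions; the platform's real settings module may differ.
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class DatabaseSettings(BaseModel):
    url: str = "postgresql://localhost:5432/postgres"  # placeholder default

class KafkaSettings(BaseModel):
    bootstrap_servers: str = "localhost:9092"

class StorageSettings(BaseModel):
    endpoint_url: str = "http://localhost:9000"
    access_key: str = "minioadmin"
    secret_key: str = "minioadmin"

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_nested_delimiter="__")
    database: DatabaseSettings = DatabaseSettings()
    kafka: KafkaSettings = KafkaSettings()
    storage: StorageSettings = StorageSettings()
    redis_url: str = "redis://localhost:6379/0"

settings = Settings()  # DATABASE__URL overrides settings.database.url, etc.
print(settings.kafka.bootstrap_servers)
```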
The platform uses a modern microservices architecture with:
- Kafka Message Broker - Asynchronous service communication with topics for experiment coordination
- Separate Docker Compositions - Modular deployment with infrastructure and application services
- Health Monitoring - Automated service health checks and recovery
- Resource Isolation - Dedicated Redis instances and MinIO storage
- uv Dependency Management - Fast package installation and dependency caching
Key files:

- `deploy.sh`: Main deployment script with health checks and service management
- `docker-compose.kafka.yml`: Core infrastructure services
- `docker-compose.*.yml`: Individual application service configurations
- `Makefile`: Development commands and shortcuts
To contribute:

- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting: `make test && make lint`
- Submit a pull request
MIT License - see LICENSE file for details.