A scalable, idempotent, async event ingestion and processing system designed to reliably handle high-volume, retry-prone event streams using FastAPI, async SQLAlchemy, and background workers.
Plutus is built around the following production guarantees:
- Idempotent ingestion: Duplicate events (e.g., webhook retries) are safely ignored using idempotency keys.
- At-least-once processing with safe retries: Events are processed asynchronously, and failures are recorded for inspection or retry.
- Non-blocking ingestion: API ingestion is decoupled from processing via background workers.
- Observability-first: Metrics and traces are emitted for ingestion, processing, and failures.
- Horizontal scalability: Multiple API instances and workers can be added without changing application logic.
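The idempotency guarantee rests on a unique constraint on the idempotency key combined with a conflict-tolerant insert, so duplicate deliveries become no-ops. Below is a minimal sketch of that pattern, assuming an async SQLAlchemy session, PostgreSQL, and a RawEvent model with a unique idempotency_key column (names and the import path are illustrative, not the exact internals):

```python
# Sketch only: assumes a RawEvent model with a unique idempotency_key column.
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.ext.asyncio import AsyncSession

from app.db.models import RawEvent  # hypothetical import path

async def ingest_event(
    session: AsyncSession, source: str, idempotency_key: str, payload: dict
) -> bool:
    """Persist a raw event; return False if the key was already seen."""
    stmt = (
        insert(RawEvent)
        .values(
            source=source,
            idempotency_key=idempotency_key,
            payload=payload,
            status="RECEIVED",
        )
        # Webhook retries hit the unique constraint and are silently skipped.
        .on_conflict_do_nothing(index_elements=["idempotency_key"])
    )
    result = await session.execute(stmt)
    await session.commit()
    return result.rowcount == 1  # True only for the first delivery
```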
Use Case: User behavior tracking & analytics
Events: page_view, product_click, add_to_cart, checkout_started, payment_completed
Processing:
- Real-time recommendations ("Customers who viewed this also bought...")
- Inventory management (track cart abandonment vs. purchases)
- Fraud detection (multiple rapid checkout attempts)
Why idempotent? Payment webhooks may retry, and customers must never be double-charged.
Use Case: Transaction processing & compliance monitoring
Events: transaction_initiated, card_swipe, transfer_requested, kyc_document_uploaded
Processing:
- Real-time fraud detection (anomaly scoring across transaction patterns)
- Regulatory reporting (aggregate transactions for AML compliance)
- Balance updates (ensure exactly-once semantics)
Why idempotent? Duplicate transaction processing causes financial losses and errors.
Use Case: Sensor data collection from vehicles/machines
Events: engine_temperature:215°F, gps_location_update, fuel_level:42%
Processing:
- Predictive maintenance (analyze sensor patterns for failure prediction)
- Real-time fleet tracking & optimization
- Usage-based insurance calculations
Why async? Thousands of devices send data simultaneously.
Use Case: Player engagement & social graph updates
Events: player_login, match_completed, friend_request_sent, in_game_purchase
Processing:
- Leaderboard updates (real-time ranking calculations)
- Anti-cheat detection (analyze gameplay patterns)
- Social feed updates (propagate friend activities)
Why scalability? Millions of concurrent players during peak events.
Use Case: Patient monitoring & HIPAA-compliant data handling
Events: heart_rate_reading, medication_administered, doctor_notes_updated
Processing:
- Real-time alerting (abnormal vital signs)
- Care pathway compliance (ensure treatment protocols are followed)
- Audit trail generation (for compliance)
Why async SQL? Handle bursts of patient data during emergencies.
Use Case: Ad impression tracking & bid optimization
Events: ad_impression, click, conversion, viewability_measured
Processing:
- Real-time bidding decisions (process within 100ms latency budgets)
- Attribution modeling (which ad led to the conversion?)
- Budget pacing (spend ad budget evenly through the day)
Why background workers? Heavy analytics don't block ad serving.
Use Case: Package tracking & route optimization
Events: package_scanned, location_update, temperature_breach, delivery_attempted
Processing:
- ETA predictions (machine learning on historical data)
- Exception handling (reroute packages automatically)
- SLA monitoring (alert on delivery delays)
Why observability? Debug why specific shipments were delayed.
Use Case: Product usage tracking for B2B SaaS
Events: feature_used, user_invited, dashboard_viewed, export_triggered
Processing:
- Customer health scoring (predict churn risk)
- Feature adoption metrics
- Usage-based billing calculations
Why clean architecture? Multiple teams add new event types continuously.
- Revenue impact: Duplicate payment processing leads to chargebacks and customer loss
- Operational efficiency: Real-time inventory prevents overselling
- Customer experience: Personalization and low latency increase conversion
- Compliance: Financial and healthcare data require strict audit trails
- Cost control: Efficient async processing reduces infrastructure spend
- Uber: Processes billions of events (rides, payments, tracking) daily
- Netflix: Ingests viewing events for recommendations & content decisions
- Airbnb: Handles booking events, search queries, and host interactions
- Robinhood: Processes market data and trading events with strict idempotency
- Shopify: Manages merchant events across thousands of stores
Plutus/
├── app/
│ ├── api/ # FastAPI routers
│ │ └── v1/
│ ├── core/ # App wiring & infrastructure
│ ├── db/ # Database layer
│ ├── ingestion/ # Event validation & persistence
│ ├── processing/ # Event handlers & business logic
│ ├── workers/ # Long-running async consumers
│ ├── observability/ # Metrics & tracing
│ ├── main.py # FastAPI entrypoint
│ └── __init__.py
│
├── migrations/ # Alembic migrations
├── tests/
│ ├── unit/
│ └── integration/
│
├── docker/
│ ├── api.Dockerfile
│ └── worker.Dockerfile
│
├── docker-compose.yml
├── pyproject.toml
├── README.md
└── .gitignore
Language & Framework
- Python 3.12
- FastAPI
- Pydantic
Persistence
- PostgreSQL
- SQLAlchemy (async)
Infrastructure
- Docker & Docker Compose
- Poetry
Observability
- Prometheus
- OpenTelemetry
Create a .env file (or use environment injection), following the structure provided in .example.env.
Build and start services
docker compose up --build
⚠️ Ensure database migrations are applied before running workers. Example:
docker compose exec api alembic upgrade head
This will start:
- API service (:8000)
- Background worker
- PostgreSQL database
Ingest Event: POST /api/v1/ingest
Example payload:
{
"source": "payment_service",
"schema_version": "1.0",
"idempotency_key": "evt_123456",
"payload": {
"amount": 100,
"currency": "KES"
}
}
Response:
{
"event_id": "550e8400-e29b-41d4-a716-446655440000"
}
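A minimal client sketch for exercising the endpoint, assuming the API is reachable at localhost:8000 (httpx is used here purely for illustration and is not necessarily a project dependency):

```python
# Sketch only: ingest one event; retries with the same idempotency_key are safe.
import httpx

event = {
    "source": "payment_service",
    "schema_version": "1.0",
    "idempotency_key": "evt_123456",
    "payload": {"amount": 100, "currency": "KES"},
}

with httpx.Client(base_url="http://localhost:8000") as client:
    # Re-sending the same body (e.g. after a timeout) will not create a
    # second event; the duplicate is recognised by its idempotency_key.
    response = client.post("/api/v1/ingest", json=event)
    response.raise_for_status()
    print(response.json()["event_id"])
```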
Prometheus metrics are exposed at:
GET /metrics
Example metrics:
ingest_requests_total
raw_events_created_total
raw_events_duplicate_total
processing_success_total
processing_failure_total
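As an illustration of how such counters might be defined and incremented with prometheus_client (the metric names mirror the list above; the actual registration lives in app/observability and may differ):

```python
# Sketch only: counters matching the metric names listed above.
from prometheus_client import Counter

INGEST_REQUESTS = Counter("ingest_requests_total", "Ingest API requests received")
RAW_EVENTS_CREATED = Counter("raw_events_created_total", "New raw events persisted")
RAW_EVENTS_DUPLICATE = Counter("raw_events_duplicate_total", "Duplicate events ignored")
PROCESSING_SUCCESS = Counter("processing_success_total", "Events processed successfully")
PROCESSING_FAILURE = Counter("processing_failure_total", "Events that failed processing")

def record_ingest(created: bool) -> None:
    """Update ingestion counters after an insert attempt."""
    INGEST_REQUESTS.inc()
    (RAW_EVENTS_CREATED if created else RAW_EVENTS_DUPLICATE).inc()
```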
The worker:
- Polls for RawEvent records with status RECEIVED
- Processes events asynchronously
- Updates event status atomically
- Records failures for inspection and retry
- Supports graceful shutdown via SIGTERM
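A condensed sketch of that loop, with hypothetical helpers standing in for the real database and handler code in app/workers:

```python
# Sketch only: poll -> process -> update loop with graceful SIGTERM shutdown.
import asyncio
import signal

async def run_consumer(poll_interval: float = 1.0) -> None:
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    # SIGTERM/SIGINT set the stop flag so in-flight events can finish first.
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)

    while not stop.is_set():
        events = await claim_received_events(limit=50)  # hypothetical DB helper
        if not events:
            await asyncio.sleep(poll_interval)
            continue
        for event in events:
            try:
                await handle_event(event)         # hypothetical business-logic handler
                await mark_processed(event)       # atomic status update
            except Exception as exc:
                await record_failure(event, exc)  # kept for inspection and retry

if __name__ == "__main__":
    asyncio.run(run_consumer())
```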
Run manually (outside Docker):
python -m workers.consumer
Recommended approach:
- Unit tests for:
  - Idempotency logic
  - Payload normalization
- Integration tests for:
  - API → DB → Worker flow
Run tests with:
pytest
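As an example, a unit test for the idempotency guarantee might look like the following, assuming pytest-asyncio, a database session fixture, and an ingest_event function as a stand-in for the real ingestion entry point:

```python
# Sketch only: the second delivery of the same idempotency_key must be a no-op.
import pytest

@pytest.mark.asyncio
async def test_duplicate_idempotency_key_is_ignored(session):
    kwargs = dict(
        source="payment_service",
        idempotency_key="evt_123456",
        payload={"amount": 100, "currency": "KES"},
    )
    assert await ingest_event(session, **kwargs) is True   # first delivery persisted
    assert await ingest_event(session, **kwargs) is False  # retry safely ignored
```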
Plutus does not include built-in rate limiting or abuse prevention.
It is expected that deployments enforce request limits at the edge (e.g. API gateways, load balancers, or reverse proxies).
This design allows Plutus to remain focused on ingestion correctness and processing reliability while deferring traffic control to infrastructure better suited for that role.
Plutus is intended to run as a trusted internal service. For production deployments, the following practices are recommended:
- Run the API behind a reverse proxy or API gateway
- Enable TLS termination at the edge
- Restrict database access to internal networks only
- Store secrets (database URLs, credentials) in a secure secret manager
- Avoid logging raw event payloads in production
- Limit access to the /metrics endpoint to trusted monitoring systems
Plutus does not implement authentication or authorization by default. These concerns should be handled at the infrastructure or gateway level.
Contributions are welcome! Please read CONTRIBUTING.md
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License.