Agentic Field Manual

The missing operations manual for production AI systems.

Quick Start · Assess Your System · Crisis Playbook · Quick Reference

Agentic systems fail silently, then expensively, then catastrophically. This manual gives you the patterns, checklists, and code to prevent that, written by a team that learned the hard way while operating systems with 1.5M+ MAU.

Who This Is For

Principal Engineers inheriting or building agentic systems
CTOs/VPEs who need to understand AI operational risk
AI/ML Engineers shipping to production (not just prototyping)
Platform Teams building shared AI infrastructure

This manual is for production systems where failure has consequences.

Quick Start

Add this to your inference calls today:

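import uuid
from datetime import datetime, timezone

def log_inference(request, response, model_version, prompt_version):
    # calculate_cost is your own pricing helper (token usage -> USD); it is not defined here.
    return {
        "trace_id": str(uuid.uuid4()),
        "created_at": datetime.now(timezone.utc).isoformat(),  # the weekly query filters on this
        "trigger_type": request.get("trigger", "user_explicit"),
        "model_version": model_version,
        "prompt_version": prompt_version,
        "cost_usd": calculate_cost(response.usage),
        "state": "speculative",  # → "committed" when user accepts
    }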

Then run weekly:

SELECT SUM(cost_usd) / NULLIF(COUNT(DISTINCT CASE 
  WHEN state = 'committed' THEN trace_id END), 0) as cost_per_outcome
FROM inference_logs WHERE created_at > NOW() - INTERVAL '7 days';

If cost per outcome is rising, you have a problem → Cost Investigation
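
If your logs are not in a SQL store yet, the same ratio can be computed in application code. The following is a minimal sketch, not the manual's reference implementation: the function name `cost_per_outcome` and the sample records are assumptions, and it expects records shaped like the dictionaries returned by `log_inference`.

```python
from typing import Iterable, Mapping, Optional


def cost_per_outcome(records: Iterable[Mapping]) -> Optional[float]:
    """Total spend divided by the number of distinct committed traces."""
    total_cost = 0.0
    committed = set()
    for rec in records:
        # Every inference counts toward spend, accepted or not.
        total_cost += rec["cost_usd"]
        if rec["state"] == "committed":
            committed.add(rec["trace_id"])
    if not committed:
        # Mirrors NULLIF(..., 0) in the query: no outcomes, no ratio.
        return None
    return total_cost / len(committed)


# Hypothetical sample data for illustration only.
logs = [
    {"trace_id": "a", "cost_usd": 0.25, "state": "committed"},
    {"trace_id": "b", "cost_usd": 0.50, "state": "speculative"},
    {"trace_id": "c", "cost_usd": 0.25, "state": "committed"},
]
print(cost_per_outcome(logs))
```

Speculative calls add to the numerator but not the denominator, which is why abandoned or regenerated outputs push the number up even when accepted outcomes stay flat.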

Start Here

Your Situation	Go To
Just inherited a system	System Assessment → Failure Modes
Building something new	State Model → Interaction Contract
In crisis mode	Crisis Playbook
Costs are spiking	Cost Investigation
Presenting to leadership	Board Explainer
Setting up observability	Metrics Reference
Preparing for an audit	Audit Preparation

Tip

First time here? Run the 10-minute assessment to identify your gaps and get personalized recommendations.

The 4 Failure Modes

Agentic systems break in four ways:

Failure Mode	Signal	Fix
Legibility Loss	"Why did it do that?" takes hours	Decision envelopes
Control Surface Drift	Costs rise, traffic flat	Interaction contracts
Auditability Gap	Can show outputs, not rationale	Provenance logging
Margin Fragility	Success destroys margin	Cost-per-outcome tracking
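
The "costs rise, traffic flat" signal for Control Surface Drift can be checked mechanically. This is a rough sketch under stated assumptions: the function name `control_surface_drift`, the week-over-week comparison, and the 10% tolerance are illustrative choices, not values taken from the manual.

```python
def control_surface_drift(prev_cost, curr_cost, prev_requests, curr_requests,
                          tolerance=0.10):
    """Return True when spend grew noticeably faster than traffic.

    tolerance is an assumed threshold: the gap between cost growth and
    traffic growth (as fractions) that you are willing to ignore.
    """
    if prev_cost <= 0 or prev_requests <= 0:
        return False  # not enough history to compare
    cost_growth = curr_cost / prev_cost - 1.0
    traffic_growth = curr_requests / prev_requests - 1.0
    return cost_growth - traffic_growth > tolerance


# Hypothetical weekly totals: spend up 40%, requests up 2%.
print(control_surface_drift(1000.0, 1400.0, 50_000, 51_000))
```

A positive result only says that something other than traffic is driving compute; finding which action is responsible is a separate investigation.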

The 3 Irreversible Decisions

These decisions harden quickly—get them right early:

Decision	Controls	Why It Hardens
State Model	What you persist	Downstream systems depend on schema
Interaction Contract	What triggers compute	Users form habits around UX
Control Plane Ownership	What you own vs rent	Contracts and migrations lock in
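
Because the Interaction Contract governs what triggers compute, one way to see it in your data is to break spend down by the `trigger_type` field from the Quick Start log record. The sketch below assumes that record shape; the function name `cost_by_trigger` and the `ui_undo` trigger value are hypothetical and used only for illustration.

```python
from collections import defaultdict


def cost_by_trigger(records):
    """Sum cost_usd per trigger_type, defaulting to 'user_explicit'."""
    totals = defaultdict(float)
    for rec in records:
        trigger = rec.get("trigger_type", "user_explicit")
        totals[trigger] += rec["cost_usd"]
    return dict(totals)


# Hypothetical records: 'ui_undo' stands in for any UI action that
# silently re-runs inference.
logs = [
    {"trigger_type": "user_explicit", "cost_usd": 0.50},
    {"trigger_type": "ui_undo", "cost_usd": 0.25},
    {"trigger_type": "ui_undo", "cost_usd": 0.25},
]
print(cost_by_trigger(logs))
```

If a trigger the user never consciously asked for accounts for a large share of spend, that is a candidate for renegotiating the contract before habits form around it.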

Case Studies

Story	Lesson
The Undo Button That Killed Our Margin	Hidden recompute from "free" UI actions
Why We Rebuilt State Twice	State model closes reversibility window
The Compliance Question We Couldn't Answer	Auditability gaps kill enterprise deals
From API to Owned in 90 Days	When to transition inference ownership

Reference

Resource	Purpose
Quick Reference Card	One-page summary - print this
System Assessment	10-minute self-assessment with scoring
Adoption Guide	How to implement these patterns incrementally
Anti-Patterns	Common mistakes to avoid
Glossary	All terms with technical and executive definitions
Metrics Reference	Formulas and queries for every metric
Examples	Production code and schemas
Templates	Documents you copy and fill in

Full Topic Index

Architecture and Design

Topic	Document
State persistence	State Model
User action triggers	Interaction Contract
Build vs buy	API vs Owned
Control plane	Control Plane Ownership
Multi-agent orchestration	Orchestration
Tool failure handling	Tool Reliability

Cost and Economics

Topic	Document
Cost investigation	Cost Investigation
Cost per outcome	Cost Model
Hidden recompute	Hidden Recompute
Capacity planning	Capacity Planning
Margin at scale	Margin Fragility

Quality and Reliability

Topic	Document
Evals and regression	Eval and Regression
Latency and SLOs	Latency and SLOs
Rollout and rollback	Rollout and Rollback
Safety and guardrails	Safety Surface
Human oversight	Human in the Loop

Compliance

Topic	Document
Audit preparation	Audit Preparation
Auditability requirements	Auditability
Data sovereignty	Sovereignty
Operational independence	Operational Independence
Data privacy	Data Privacy

Templates

Situation	Template
System failing	Crisis Playbook
Costs spiked	Cost Spike Runbook
Weekly review	Weekly Ops Checklist
After incident	Incident Post-Mortem
Before shipping	Pre-Ship Checklist
Architecture decision	Decision Record

Contributing

Found a bug? Have a war story to share? See CONTRIBUTING.md.

About the Author

Rade Joksimovic - Principal engineer focused on AI systems at scale.

15+ years building SaaS systems
Recent focus: LLM-driven products and agentic infrastructure
Scale: 1.5M+ MAU, 30M+ monthly API calls, 50K+ orchestrated agents

Twitter ・ LinkedIn ・ Email

Acknowledgments

Thanks to the teams at Filekit, ShortlistIQ, Olovka, and Rumora for battle-testing these patterns across diverse agentic architectures—from document generation to autonomous interviewing to social media orchestration.

Special thanks to everyone who shared war stories, reported issues, and contributed patterns from their own production systems.

If you can explain the output, you have control.
If you cannot, you are negotiating with your own product.
