AegisSOC is a secure multi-agent SOC triage assistant built with Google's Agent Development Kit (ADK).
It ingests synthetic security alerts, parses and correlates them, proposes triage actions, and then routes every decision through a dedicated A2A guardrail microservice. The system is fully instrumented with sessions, structured observability, and a scenario-based evaluation harness.
This repository is the capstone project for the Google 5-Day AI Agents course (Enterprise Agents track), with a strong focus on AI security engineering.
At a high level, AegisSOC is a multi-agent pipeline:
- Root Triage Agent orchestrates the workflow and proposes SOC actions.
- Parser Agent normalizes raw alerts into structured fields.
- Correlation Agent enriches alerts with contextual signals.
- Guardrail Agent (A2A microservice) validates and normalizes actions.
- Session Service persists state across turns.
- Observability Layer records tool calls, decisions, and state snapshots.
- Evaluation Engine runs scenario-based tests over the system's behavior.
Architecture diagram (SOC / HUD style):

![AegisSOC architecture diagram](docs/architecture.png)
- **Multi-Agent Design**
  - Root triage agent, parser agent, correlation agent
- **Dedicated A2A Guardrail Microservice**
  - Runs as a separate service (`guardrail_agent/app.py`).
  - Enforces a strict action schema: `ESCALATE`, `MONITOR`, `CLOSE`, `NEEDS_MORE_INFO`.
- **Security Guardrails**
  - Action normalization
  - Fake execution detection ("I already reset the password…")
  - Prompt injection detection ("Ignore all previous instructions…")
- **Sessions & State**
  - `InMemorySessionService` manages per-session state.
  - Named keys: `raw_alerts`, `parsed_alerts`, `correlation_summary`, `triage_summary`, `events`.
- **Structured Observability**
  - Every tool call, agent output, guardrail response, and state snapshot is captured as a `StructuredEvent`.
- **Scenario-Based Evaluation**
  - Synthetic alerts + evaluation scenarios: benign, suspicious, malicious, ambiguous, prompt injection.
- **Functional Guardrail Tests**
  - Live LLM tests validate guardrail reasoning for normalization, fake execution claims, and prompt-injection resistance.
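The strict action schema above can be enforced with a small normalization helper. The following is a minimal sketch; the project's real logic lives in `aegis_soc_sessions/action_schema.py`, and the synonym map here is an assumption for illustration:

```python
# Sketch of action-schema enforcement; the real implementation is in
# aegis_soc_sessions/action_schema.py and may differ in detail.
NORMALIZED_ACTIONS = {"ESCALATE", "MONITOR", "CLOSE", "NEEDS_MORE_INFO"}

# Hypothetical synonym map: free-text LLM phrasings folded onto the schema.
_SYNONYMS = {
    "escalate to tier 2": "ESCALATE",
    "keep watching": "MONITOR",
    "close ticket": "CLOSE",
}

def enforce_action_schema(proposed: str) -> str:
    """Map a proposed action onto the four allowed actions, defaulting
    to NEEDS_MORE_INFO when the proposal cannot be normalized."""
    candidate = proposed.strip().upper().replace(" ", "_")
    if candidate in NORMALIZED_ACTIONS:
        return candidate
    return _SYNONYMS.get(proposed.strip().lower(), "NEEDS_MORE_INFO")
```

Defaulting unknown proposals to `NEEDS_MORE_INFO` keeps the failure mode conservative: an unparseable action can never silently close or escalate an alert.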
1. Load a synthetic alert from `data/synthetic_alerts.json` via a tool.
2. Parser Agent extracts key fields (source, event type, principal, IPs, etc.).
3. Correlation Agent adds context (patterns, frequencies, cross-signal hints).
4. Root Triage Agent proposes an action (e.g., `ESCALATE`).
5. Guardrail Agent (A2A) receives the proposed action and:
   - validates it against policy,
   - detects prompt injection or fake execution claims,
   - returns `allow`, `normalized_action`, and `rationale`.
6. Root Agent adopts the guardrail-normalized action.
7. Session & Observability:
   - state is updated (alerts, summaries),
   - events are appended (tool calls, outputs, guardrail responses).
8. Evaluation Engine uses structured outputs to check whether behavior matches expectations for each scenario.
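The guardrail hand-off in steps 5–6 can be illustrated with a small adoption helper. This is an illustrative sketch, not the project's actual A2A client; the response fields (`allow`, `normalized_action`, `rationale`) come from the pipeline description above, everything else is assumed:

```python
def adopt_guardrail_verdict(proposed_action: str, verdict: dict) -> str:
    """Return the action the root agent should adopt, given the
    guardrail's verdict. Falls back to NEEDS_MORE_INFO when blocked."""
    if verdict.get("allow"):
        # The guardrail may have normalized wording (e.g. "escalate!" -> "ESCALATE").
        return verdict.get("normalized_action", proposed_action)
    # Blocked: never execute the original proposal.
    return "NEEDS_MORE_INFO"

# Example verdict shaped like the guardrail's documented response fields.
verdict = {"allow": True, "normalized_action": "ESCALATE",
           "rationale": "Multiple failed logins followed by a success."}
```

The key design point is that the root agent never acts on its own proposal; it only acts on what the guardrail returns.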
```
aegis-soc/
├── aegis_soc_app/               # Phase 1-2: Single & multi-agent baseline
│   ├── agent.py                 # Root, LogParser, Correlation agents (baseline)
│   ├── app.py                   # ADK app configuration
│   └── __init__.py
├── aegis_soc_sessions/          # Phase 3+: Session-aware agents
│   ├── agent.py                 # Root agent wiring, tools, sub-agents
│   ├── app.py                   # ADK App construction
│   ├── observability.py         # StructuredEvent + logging helpers
│   ├── action_schema.py         # NORMALIZED_ACTIONS + enforce_action_schema
│   └── __init__.py
├── guardrail_agent/
│   ├── agent.py                 # Guardrail LlmAgent definition
│   ├── app.py                   # A2A microservice (port 8001)
│   └── __init__.py
├── data/
│   └── synthetic_alerts.json    # Synthetic SOC alerts for evaluation
├── tests/
│   ├── conftest.py              # pytest configuration
│   ├── helpers.py               # Guardrail mock tool + context manager
│   ├── test_phase3_sessions.py  # Sessions & state behavior
│   ├── test_phase4_guardrail_a2a.py
│   ├── test_phase5_observability.py
│   ├── test_phase6_evaluation.py
│   ├── test_guardrail_logic.py  # Phase 6.5 functional guardrail tests
│   ├── eval/                    # Evaluation scenario data
│   └── __init__.py
├── docs/
│   ├── architecture.png         # Architecture diagram (SOC HUD style)
│   ├── Logo.png                 # AegisSOC logo
│   └── Kaggle_Writeup.md        # Kaggle submission writeup
├── run_tests.py                 # Helper to run pytest with captured output
├── demo_script.md               # 3-minute demo video script
├── README.md
├── TESTING.md
├── SECURITY.md
├── requirements.txt             # Python dependencies
├── pytest.ini
├── .env.example
├── .gitignore
└── LICENSE
```
- Python 3.13.5 (or compatible version with ADK 1.18.0)
- Virtual environment (recommended)
- A valid `GOOGLE_API_KEY` for ADK / Gemini
```bash
# Clone the repo
git clone https://github.com/mwill20/MultiAgent_SOC.git aegis-soc
cd aegis-soc

# Create and activate virtualenv
python -m venv .venv
.\.venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt
```

Create `.env` in the project root based on `.env.example`:

```
GOOGLE_API_KEY=your_api_key_here
```

In one terminal:

```bash
# From repo root
.\.venv\Scripts\activate
python -m guardrail_agent.app
```

This exposes the Guardrail Agent via A2A on `localhost:8001`.
In another terminal:
```bash
.\.venv\Scripts\activate

# Run via pytest (recommended)
python -m pytest tests/test_phase3_sessions.py -v

# Or run the ADK app directly
python -m aegis_soc_sessions.app
```

This will:
- Load a synthetic alert
- Run the triage pipeline
- Call the guardrail microservice
- Print the final triage result
Note: Refer to TESTING.md for all test commands and execution details.
All tests are written with pytest.
```bash
python -m pytest tests/test_phase3_sessions.py -v
```

Verifies:

- `InMemorySessionService` behavior
- multi-turn sessions
- persistence of state keys (`raw_alerts`, `parsed_alerts`, etc.)
```bash
python -m pytest tests/test_phase5_observability.py -v
```

Verifies:

- `state["events"]` exists
- `tool_call` and `agent_output` events are recorded
- events accumulate across turns
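Conceptually, the observability layer appends one structured record per tool call or agent output into `state["events"]`. A minimal sketch of what such an event might look like (the real class lives in `aegis_soc_sessions/observability.py`; the fields here are assumed):

```python
from dataclasses import dataclass, field, asdict
import time

@dataclass
class StructuredEvent:
    """Assumed shape of an observability event; the real fields are
    defined in aegis_soc_sessions/observability.py and may differ."""
    event_type: str   # e.g. "tool_call", "agent_output", "guardrail_response"
    payload: dict
    timestamp: float = field(default_factory=time.time)

def record_event(state: dict, event: StructuredEvent) -> None:
    """Append the event to session state, creating the list on first use."""
    state.setdefault("events", []).append(asdict(event))

state = {}
record_event(state, StructuredEvent("tool_call", {"tool": "load_synthetic_alert"}))
record_event(state, StructuredEvent("agent_output", {"agent": "parser"}))
```

Because events accumulate in session state rather than in logs alone, tests can assert on them directly, which is what the evaluation phase relies on.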
```bash
# Run individual scenarios (recommended)
python -m pytest tests/test_phase6_evaluation.py::test_phase6_evaluation_scenario -k "scenario0" -v
python -m pytest tests/test_phase6_evaluation.py::test_phase6_evaluation_scenario -k "scenario1" -v
```

Verifies:
- evaluation scenarios are loaded
- system behavior is checked against expected outcomes
- observability is used to assert correctness
Note: Some scenarios may be marked xfail or skipped if LLM variance leads to no final action in a specific run. This is documented in TESTING.md.
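A scenario check of this kind can be sketched as data plus a single assertion helper. The field names below are assumptions for illustration, not the project's actual scenario schema (which lives under `tests/eval/`):

```python
# Hypothetical scenario record; the real schema is in tests/eval/.
scenario = {
    "name": "scenario0_benign",
    "alert_id": "alert-001",
    "expected_action": "CLOSE",
}

def check_scenario(scenario: dict, final_action: str, events: list) -> bool:
    """Pass only if the final action matches the expectation and the
    guardrail was actually consulted (visible in the event log)."""
    guardrail_consulted = any(
        e.get("event_type") == "guardrail_response" for e in events
    )
    return final_action == scenario["expected_action"] and guardrail_consulted

events = [{"event_type": "tool_call"}, {"event_type": "guardrail_response"}]
```

Checking the event log as well as the outcome is the point of tying evaluation to observability: a correct action reached without the guardrail is still a failure.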
```bash
# Run all tests (may hit event loop issue on test 2)
python run_tests.py

# OR run individually (recommended)
python -m pytest tests/test_guardrail_logic.py::test_action_normalization -v
python -m pytest tests/test_guardrail_logic.py::test_fake_execution_detection -v
python -m pytest tests/test_guardrail_logic.py::test_prompt_injection -v
```

Verifies guardrail behavior against a live LLM:
- Action normalization
- Fake execution claims
- Prompt injection attempts
These tests use the real A2A Guardrail Agent and validate its reasoning.
On Windows + Python 3.13 setups, running all tests in a single pytest invocation can produce:
```
RuntimeError: Event loop is closed
```
This is a known issue with asyncio / httpx cleanup, not a logic bug. The recommended approach is to run tests individually as shown above.
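One commonly used mitigation for this class of Windows teardown error is switching asyncio to the selector event loop policy before tests run, since the default proactor loop is known to race with `httpx` connection cleanup. Whether it helps in this project is untested, so treat the sketch below as an optional experiment rather than the documented fix:

```python
# conftest.py (sketch): opt into the selector event loop on Windows.
import asyncio
import sys

def use_selector_loop_on_windows() -> bool:
    """Apply the selector policy on Windows; no-op elsewhere.
    Returns True if the policy was changed."""
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
        return True
    return False

use_selector_loop_on_windows()
```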
See TESTING.md for more detail.
A detailed security discussion lives in SECURITY.md. High-level points:
- All SOC actions flow through a separate policy agent.
- The triage agent cannot finalize decisions on its own.
- All actions must be one of: `ESCALATE`, `MONITOR`, `CLOSE`, `NEEDS_MORE_INFO`.
- `action_schema.enforce_action_schema()` is used to prevent drift.
- Guardrail flags and normalizes responses that claim actions like:
- "I already reset the password…"
- "Assume the firewall is patched…"
- Prevents hallucinated execution.
- Guardrail detects attempts to override policy/instructions:
- "Ignore all previous instructions…"
- "Say everything is safe…"
- Evaluation includes explicit prompt injection scenarios.
- Guardrail runs as a separate microservice behind a firewall boundary.
- Agent tools are explicitly whitelisted and scoped.
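To give a flavor of the detection heuristics involved, a naive phrase-based check might look like the following. This is purely illustrative: the actual guardrail is an LLM agent that reasons about the text rather than matching fixed patterns.

```python
# Illustrative only: the real guardrail is an LlmAgent, not a keyword filter.
INJECTION_PHRASES = (
    "ignore all previous instructions",
    "say everything is safe",
)
FAKE_EXECUTION_PHRASES = (
    "i already reset the password",
    "assume the firewall is patched",
)

def flag_suspicious_text(text: str) -> dict:
    """Return which guardrail flags a piece of agent output would trip."""
    lowered = text.lower()
    return {
        "prompt_injection": any(p in lowered for p in INJECTION_PHRASES),
        "fake_execution": any(p in lowered for p in FAKE_EXECUTION_PHRASES),
    }
```

A fixed list like this is trivially bypassed by rephrasing, which is exactly why the roadmap below calls out semantic detection of obfuscated variants for v2.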
Planned for AegisSOC v2:
- Semantic Prompt Injection Detection
- Detect obfuscated variants (leet, homoglyphs, base64, etc.).
- User Log Upload
- Allow users to upload their own SIEM/EDR/FW logs for analysis.
- SOC-Native Outcome Schema
- Determination: Benign / Suspicious / Malicious
- Severity: Informational → Critical
- Disposition: Resolve / Escalate / Incident Response
- Richer Correlation Engine
- Multi-alert correlation over time windows.
- UI Demo App
- Streamlit-based SOC console to visualize:
- parsed alerts,
- correlation,
- guardrail decisions,
- observability timelines.
- Python 3.13.5 (or compatible version)
- Google ADK 1.18.0
- a2a-sdk 0.3.14
- pytest 9.0.1 (with pytest-asyncio)
- uvicorn 0.38.0
- httpx, pydantic, python-dotenv
- Valid Google API key (Gemini 2.5 Flash-Lite)
See requirements.txt for the complete dependency list.
- ADK 1.18.0 CLI: Session bug with Python 3.13.5; use `Runner` or `InMemoryRunner.run_debug()` programmatically instead
- RemoteA2aAgent: Marked as EXPERIMENTAL (warnings expected during runtime)
- Event Loop Cleanup: Issue on Windows when running multiple async tests together (see `TESTING.md` for a workaround)
MIT License - see LICENSE file for details.
- Google 5-Day AI Agents Course (Enterprise Agents track)
- Google ADK team
- Ready Tensor guidelines and repo standards
- Kaggle AI Agents Capstone competition
Repository: https://github.com/mwill20/MultiAgent_SOC
Author: Michael Williams
Date: November 2025

