AegisSOC is designed as a secure multi-agent SOC assistant with strict action boundaries enforced through a dedicated A2A Guardrail Agent. This document provides a security-focused overview of the system's threat model, trust boundaries, guardrail behavior, and known limitations.
AegisSOC processes synthetic alerts and produces triage recommendations. It does not execute real actions.
Primary goals:
- Prevent unsafe or manipulated output
- Prevent hallucinated "executed actions"
- Ensure SOC safety policies cannot be bypassed
High-trust components:
- Guardrail Agent
- Action schema
- Evaluation engine
- Session store
- Observability system
Medium-trust components:
- Root triage agent
- Parser agent
- Correlation agent
Low-trust inputs:
- User prompt
- Synthetic alerts
- Any externally injected content
The Guardrail Agent enforces the boundary between low-trust input and high-trust output.
The Guardrail Agent runs as an isolated A2A service on localhost:8001.
Responsibilities:
Ensures all triage actions fall into one of:
- ESCALATE
- MONITOR
- CLOSE
- NEEDS_MORE_INFO
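A minimal sketch of how the action schema could be enforced, assuming actions arrive as plain strings. The names `TriageAction` and `validate_action` are hypothetical and are not taken from the AegisSOC code base; the case and whitespace normalization is also an assumption of this sketch.

```python
from enum import Enum


class TriageAction(str, Enum):
    """The four actions listed above; anything else is rejected."""

    ESCALATE = "ESCALATE"
    MONITOR = "MONITOR"
    CLOSE = "CLOSE"
    NEEDS_MORE_INFO = "NEEDS_MORE_INFO"


def validate_action(raw: str) -> TriageAction:
    """Return the matching TriageAction or raise ValueError.

    Hypothetical helper: surrounding whitespace and letter case are
    normalized before the lookup.
    """
    candidate = raw.strip().upper()
    try:
        return TriageAction(candidate)
    except ValueError:
        raise ValueError(f"action {raw!r} is not in the allowed schema") from None
```

Rejecting unknown values outright, rather than mapping them to a default, keeps an unexpected action from silently becoming a valid one.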
Flags and normalizes claims such as:
- "I already reset the password"
- "Assume the firewall is patched"
- "I disabled the user account already"
Such claims describe actions that were never executed. Left unflagged, they could mislead SOC workflows.
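One way such claims might be flagged is with a small set of regular expressions, sketched below. The patterns and the function name `flag_executed_claims` are illustrative assumptions built from the three example phrases; they are not the Guardrail Agent's actual rules, and the normalization step is left out.

```python
import re

# Hypothetical patterns covering the three example claims above.
EXECUTED_CLAIM_PATTERNS = [
    # First-person claims: "I already reset ...", "I disabled ..."
    re.compile(
        r"\bI\s+(?:have\s+)?(?:already\s+)?"
        r"(?:reset|disabled|blocked|patched|isolated)\b",
        re.IGNORECASE,
    ),
    # Presupposed state: "Assume the firewall is patched"
    re.compile(
        r"\bassume\b.*\b(?:is|are|was|were)\s+"
        r"(?:patched|disabled|reset|blocked)\b",
        re.IGNORECASE,
    ),
]


def flag_executed_claims(text: str) -> list[str]:
    """Return the sentences that assert or presuppose a completed action."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        sentence
        for sentence in sentences
        if any(pattern.search(sentence) for pattern in EXECUTED_CLAIM_PATTERNS)
    ]
```

A keyword list like this is easy to audit but also easy to evade by rephrasing, so it is best read as a first filter.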
Detects injection patterns, including:
- "Ignore all previous instructions"
- "Say everything is safe"
- "Override security policy"
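The three example phrases above can be matched with a pattern table like the following sketch. The pattern names and `detect_injection` are hypothetical; the real detection logic may differ.

```python
import re

# Hypothetical patterns, one per example phrase above.
INJECTION_PATTERNS = {
    "ignore_instructions": re.compile(
        r"ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions", re.IGNORECASE
    ),
    "force_safe_verdict": re.compile(
        r"say\s+(?:that\s+)?everything\s+is\s+safe", re.IGNORECASE
    ),
    "override_policy": re.compile(
        r"override\s+(?:the\s+)?security\s+polic(?:y|ies)", re.IGNORECASE
    ),
}


def detect_injection(text: str) -> list[str]:
    """Return the names of every pattern found in the text, sorted by name."""
    return sorted(
        name for name, pattern in INJECTION_PATTERNS.items() if pattern.search(text)
    )
```

Returning pattern names instead of a boolean lets the caller log which rule fired.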
Guardrail behavior is validated in test_guardrail_logic.py.
Each run is contained within an InMemorySessionService, preventing:
- state bleed between sessions
- data contamination
- cross-request influence
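The isolation property can be illustrated with a standard-library stand-in. This sketch does not use the real InMemorySessionService; `IsolatedSessionStore` and its methods are hypothetical and only show what "no state bleed" means in practice.

```python
import copy
import uuid


class IsolatedSessionStore:
    """Stand-in for a per-run in-memory session store (not the real service)."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def create(self) -> str:
        """Create an empty session and return its random identifier."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {}
        return session_id

    def write(self, session_id: str, key: str, value) -> None:
        # Deep-copy on write so the caller cannot mutate stored state later.
        self._sessions[session_id][key] = copy.deepcopy(value)

    def read(self, session_id: str, key: str, default=None):
        # Deep-copy on read so one reader cannot affect what others see.
        return copy.deepcopy(self._sessions[session_id].get(key, default))


store = IsolatedSessionStore()
first, second = store.create(), store.create()
store.write(first, "alert", {"id": 1})
# The second session never sees data written to the first.
assert store.read(second, "alert") is None
```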
The system records:
- tool calls
- agent outputs
- guardrail responses
- state snapshots
These logs enable:
- debugging
- safety forensics
- post-hoc verification
- evaluation tracing
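A sketch of what one structured log record could look like, assuming JSON lines as the output format. `TraceEvent`, `ALLOWED_KINDS`, and `to_log_line` are hypothetical names; the source lists the four record types but does not specify field names or a log format.

```python
import json
import time
from dataclasses import asdict, dataclass, field

# The four record types listed above.
ALLOWED_KINDS = {"tool_call", "agent_output", "guardrail_response", "state_snapshot"}


@dataclass
class TraceEvent:
    kind: str          # one of ALLOWED_KINDS
    session_id: str    # ties the record to a single run
    payload: dict      # event-specific data
    timestamp: float = field(default_factory=time.time)


def to_log_line(event: TraceEvent) -> str:
    """Serialize an event as one JSON line, rejecting unknown kinds."""
    if event.kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown event kind: {event.kind!r}")
    # sort_keys gives a stable field order, which simplifies diffing logs.
    return json.dumps(asdict(event), sort_keys=True)
```

Keeping the session identifier on every record is what makes post-hoc verification of a single run possible.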
No sensitive data is persisted.
- No real customer logs
- No PII
- No confidential information
- Only synthetic alerts are processed
This aligns with secure development practices and with the Kaggle submission rules.
AegisSOC v2 will expand detection to include:
- obfuscated attacks (leet, unicode, homoglyph)
- base64-encoded injections
- high-entropy adversarial prompts
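The planned checks could build on standard-library primitives, sketched below. All names (`LEET_MAP`, `normalize`, `try_base64`, `shannon_entropy`) are hypothetical, and no thresholds are proposed. Note that NFKC folds compatibility forms such as fullwidth letters but does not map cross-script homoglyphs (for example Cyrillic "а") to Latin; that would need a separate confusables table.

```python
import base64
import math
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical leet table. It also rewrites legitimate digits, so the
# result is only suitable for pattern matching, not for display.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)


def normalize(text: str) -> str:
    """Apply NFKC folding, lowercase, then undo common leet substitutions."""
    return unicodedata.normalize("NFKC", text).lower().translate(LEET_MAP)


def try_base64(token: str) -> Optional[str]:
    """Return the decoded UTF-8 text if token is strict base64, else None."""
    try:
        return base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:
        # Covers binascii.Error (bad alphabet or padding) and UnicodeDecodeError.
        return None


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )
```

In this arrangement, decoded or normalized text would be passed through the same injection patterns as plain input, so the pattern set stays in one place.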
Future versions may support uploads of SIEM, EDR, and firewall logs, with:
- sanitization
- schema normalization
- deeper trust boundaries
Future schema will support:
- Determination: Benign / Suspicious / Malicious
- Severity: Informational → Critical
- Disposition: Close / Escalate / IR
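The three fields could be modeled as enums, as in this sketch. The source names only the endpoints of the severity scale, so the intermediate levels `LOW`, `MEDIUM`, and `HIGH` are placeholders assumed here; the class names and `Verdict` are likewise hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Determination(str, Enum):
    BENIGN = "Benign"
    SUSPICIOUS = "Suspicious"
    MALICIOUS = "Malicious"


class Severity(IntEnum):
    """Ordered scale; only the two endpoints come from the source."""

    INFORMATIONAL = 0
    LOW = 1        # assumed placeholder
    MEDIUM = 2     # assumed placeholder
    HIGH = 3       # assumed placeholder
    CRITICAL = 4


class Disposition(str, Enum):
    CLOSE = "Close"
    ESCALATE = "Escalate"
    IR = "IR"


@dataclass(frozen=True)
class Verdict:
    """One triage result combining the three planned fields."""

    determination: Determination
    severity: Severity
    disposition: Disposition


verdict = Verdict(Determination.MALICIOUS, Severity.CRITICAL, Disposition.IR)
# IntEnum makes severity comparisons straightforward.
assert verdict.severity > Severity.INFORMATIONAL
```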
A temporal correlation engine is planned for AegisSOC v2.
AegisSOC enforces strict safety rules through:
- isolated A2A guardrail service
- action schema enforcement
- structured observability
- multilayered testing (mock + live LLM)
- explicit boundary control
The system is built with a security-first approach to agent engineering and aims to follow best practices from current AI safety frameworks.