GitHub - Base4Security/GraphHunter: Hypothesis-driven threat hunting on temporal knowledge graphs

Graph-based & Hypothesis-driven threat hunting

Ingest security logs, build an entity-relationship graph with causal ordering, and hunt for attack paths using pattern matching with optional MITRE ATT&CK–aligned detection templates. Integrates GNN-based threat classification via ONNX models with optional NPU/GPU acceleration.

About

Graph Hunter is a graph-based threat hunting engine that turns heterogeneous security telemetry (Sysmon, Microsoft Sentinel, generic JSON, CSV) into a single knowledge graph. Analysts define hypotheses as chains of entity types and relation types (e.g., User →[Auth]→ Host →[Execute]→ Process). The engine finds all paths that match the pattern while enforcing causal monotonicity: each step occurs at or after the previous one in time. Results are explored via an interactive graph canvas, IOC search, timeline and heatmap views, and optional ATT&CK-mapped hypothesis templates.

The engine includes an endogenous anomaly scoring system with five components — Entity Rarity, Edge Rarity, Neighborhood Concentration, Temporal Novelty, and GNN Threat — that automatically prioritizes the most suspicious paths. The GNN component integrates ONNX models (e.g., exported from GraphOS-APT) that classify k-hop subgraphs into threat categories (Benign, Exfiltration, C2 Beacon, Lateral Movement, Privilege Escalation), with optional NPU/GPU acceleration via DirectML.
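
To make the k-hop subgraph idea concrete, here is a minimal Python sketch of neighborhood extraction around one node. It is an illustration, not the engine's Rust implementation: the function name `k_hop_subgraph`, the edge-list layout, and the choice to ignore edge direction while expanding are all assumptions.

```python
from collections import deque

def k_hop_subgraph(edges, center, k):
    """Return the nodes within k hops of `center` and the edges among them.

    `edges` is a list of (src, dst) pairs. Direction is ignored while
    expanding the neighborhood (an assumption made for this sketch).
    """
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand past k hops
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    nodes = set(depth)
    sub_edges = [(s, d) for s, d in edges if s in nodes and d in nodes]
    return nodes, sub_edges

# Hypothetical toy graph
edges = [("alice", "host1"), ("host1", "cmd.exe"),
         ("cmd.exe", "evil.dll"), ("bob", "host2")]
nodes, sub_edges = k_hop_subgraph(edges, "host1", 1)
```

A subgraph like this would then be converted to model features before inference; that step depends on the model and is not shown.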

Screenshot: Exploring nodes on map

Why Graph-Based Hunting?

Traditional SIEM-style queries are rigid and schema-bound. Attack chains span multiple data sources and event types; correlating them often requires custom rules and manual pivoting. Graph Hunter instead:

- Normalizes diverse log formats into a unified model (entities + typed relations + timestamps).
- Searches by pattern (who executed what, who connected where, what wrote which file) instead of by field names.
- Surfaces multi-hop attack paths that satisfy temporal order, so you see full chains, not isolated events.

How It Works

Security Logs ──► Parser ──► Knowledge Graph ──► Hypothesis Search ──► Hunt Attack Paths

1. Ingest: Load logs in any supported format. The engine auto-detects the format, or you can specify it. Parsers extract entities (IP, Host, User, Process, File, Domain, Registry, URL, Service) and relations (Auth, Connect, Execute, Read, Write, DNS, Modify, Spawn, Delete) with timestamps.
2. Build Graph: Entities become nodes, relations become directed edges. Duplicate entities are deduplicated; metadata is merged.
3. Hunt: Define a hypothesis as a chain of typed steps (e.g., User →[Auth]→ Host →[Execute]→ Process). The engine finds all paths matching the pattern with causal monotonicity (each step at or after the previous one). Optional k-simplicity allows a vertex to repeat up to k times per path.
4. Explore: Search for IOCs, expand node neighborhoods, inspect metadata and anomaly scores, and pivot via the Events, Heatmap, and Timeline views.
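
The Hunt step can be sketched as a depth-first search that rejects any edge older than the previous one and caps how often a vertex may appear. The Python below is a simplified illustration under stated assumptions, not the Rust engine's code: the `Edge` tuple, the flat `pattern` list, and the function name `hunt` are hypothetical.

```python
from collections import Counter, namedtuple

Edge = namedtuple("Edge", "src rel dst ts")

def hunt(node_types, edges, pattern, k=1):
    """Find paths matching `pattern` with non-decreasing edge timestamps.

    pattern: [type0, rel0, type1, rel1, type2, ...]; "*" matches anything.
    k: maximum number of times one vertex may appear in a path.
    """
    types, rels = pattern[0::2], pattern[1::2]
    out = {}
    for e in edges:
        out.setdefault(e.src, []).append(e)

    def matches(want, got):
        return want == "*" or want == got

    results = []

    def dfs(current, path_edges, step, last_ts, visits):
        if step == len(rels):
            results.append(list(path_edges))
            return
        for e in out.get(current, []):
            if not matches(rels[step], e.rel):
                continue
            if not matches(types[step + 1], node_types[e.dst]):
                continue
            if e.ts < last_ts:       # causal monotonicity
                continue
            if visits[e.dst] >= k:   # k-simplicity
                continue
            visits[e.dst] += 1
            path_edges.append(e)
            dfs(e.dst, path_edges, step + 1, e.ts, visits)
            path_edges.pop()
            visits[e.dst] -= 1

    for node, ntype in node_types.items():
        if matches(types[0], ntype):
            dfs(node, [], 0, float("-inf"), Counter({node: 1}))
    return results

# Hypothetical toy data: old.exe ran before the logon, so it is excluded.
node_types = {"alice": "User", "host1": "Host",
              "cmd.exe": "Process", "old.exe": "Process"}
edges = [Edge("alice", "Auth", "host1", 100),
         Edge("host1", "Execute", "cmd.exe", 120),
         Edge("host1", "Execute", "old.exe", 50)]
paths = hunt(node_types, edges, ["User", "Auth", "Host", "Execute", "Process"])
```

With `k=1` every path is simple; raising `k` lets a pattern such as Process →[Spawn]→ Process revisit a vertex.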

Screenshot: Ingesting data

Key Features

| Area | Features |
| --- | --- |
| Engine | Temporal pattern matching (DFS + causal monotonicity), 5-component endogenous anomaly scoring (ER, EdgeR, NC, TN, GNN Threat), parallel parsing (Rayon), entity/relation deduplication |
| GNN Scoring | ONNX model inference for k-hop subgraph classification (5 threat classes), DirectML NPU/GPU acceleration, batch scoring, configurable k-hop depth, feature-gated (`ml-scoring`) |
| Formats | Sysmon, EVTX, Microsoft Sentinel, generic JSON (80+ field variants), CSV |
| Hypotheses | Visual step builder or DSL (`User -[Auth]-> Host -[Execute]-> Process`); wildcards (`*`) for any type; ATT&CK hypothesis catalog with one-click load |
| UI | Sessions (multiple graphs, persisted); Hunt vs Explorer modes; Events, Heatmap, Timeline views; Path Nodes (pinned nodes); Notes (standalone or node-linked); GNN Threat Model panel; paginated hunt results for large path sets |
| Data | Configurable generic parser (field → entity type mapping); preview before ingest; dataset list per session (remove/rename) |
| SIEM integrations | Azure Sentinel (Log Analytics): KQL queries, workspace + tenant/client/secret (env or UI). Elasticsearch: index + query JSON, API key or user/password (env or UI). Query-based ingest via gateway or CLI; results loaded into the graph. |
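
As an illustration of the field → entity type mapping idea behind the configurable generic parser, the following Python sketch pulls typed entities out of one record. The field names in `FIELD_MAP` and the function `extract_entities` are hypothetical and do not reflect Graph Hunter's actual configuration format.

```python
# Hypothetical mapping from log field names to entity types.
FIELD_MAP = {
    "src_ip": "IP",
    "dst_ip": "IP",
    "user": "User",
    "hostname": "Host",
    "process_name": "Process",
}

def extract_entities(record, field_map=FIELD_MAP):
    """Return sorted, de-duplicated (entity_type, value) pairs from one record."""
    found = set()
    for field, value in record.items():
        entity_type = field_map.get(field)
        if entity_type and value not in (None, ""):
            found.add((entity_type, str(value)))
    return sorted(found)

record = {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "user": "alice", "bytes": 512}
entities = extract_entities(record)
```

Unmapped fields such as `bytes` are ignored here; a real parser would more likely keep them as metadata on the resulting nodes or edges.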

Supported Log Formats

Graph Hunter supports Sysmon, Microsoft Sentinel, generic JSON (80+ field variants), and CSV. Use Auto-detect to let the engine choose the parser from content heuristics, or select a format manually.
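
Content-based auto-detection could look roughly like the Python sketch below. The specific heuristics are assumptions chosen for illustration (a `Channel` value containing "Sysmon" for Sysmon JSON exports, `TimeGenerated` plus `Type` columns for Sentinel tables); the engine's real rules are described in its documentation and may differ.

```python
import json

def detect_format(text):
    """Guess a format label from the first non-empty line of `text`."""
    first = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")
    if not first:
        return "unknown"
    if first[0] in "{[":
        try:
            obj = json.loads(first)
        except json.JSONDecodeError:
            return "json"  # e.g. pretty-printed JSON spanning several lines
        if isinstance(obj, list):
            obj = obj[0] if obj else {}
        if not isinstance(obj, dict):
            return "json"
        if "Sysmon" in str(obj.get("Channel", "")):
            return "sysmon"
        if "TimeGenerated" in obj and "Type" in obj:
            return "sentinel"
        return "json"
    if "," in first:
        return "csv"
    return "unknown"
```

When a guess like this is wrong, selecting the format manually remains the fallback.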

Full details (event IDs, Sentinel tables, triples, generic field mapping, CSV): Supported log formats in the documentation.

SIEM Integrations

Graph Hunter can pull data directly from Azure Sentinel (Log Analytics) and Elasticsearch via their APIs—run a query, then ingest the results into your session.

| SIEM | Auth | Usage |
| --- | --- | --- |
| Azure Sentinel | Tenant ID, Client ID, Client Secret (env or UI) | Workspace ID + KQL query; default: SecurityEvent, last 24h |
| Elasticsearch | API key or User/Password (env or UI) | Cluster URL, index, query JSON, size |

Available in the web app with gateway (Datasets → Data Ingestion) or via the gateway API (POST /api/ingest/query). Desktop app without gateway: use From file and export from your SIEM first. See SIEM query-based ingest in the docs for env vars and pagination.
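
For orientation, this Python sketch builds the two kinds of query mentioned above: a KQL string matching the documented default (SecurityEvent, last 24h) and an Elasticsearch query body using the standard range query. The `@timestamp` field name is an assumption, and how Graph Hunter expects these values to be submitted (UI field, CLI flag, or gateway request body) is not shown here.

```python
import json

# KQL equivalent of the documented default: SecurityEvent, last 24 hours.
kql_query = "SecurityEvent | where TimeGenerated > ago(24h)"

def elasticsearch_query(hours=24, timestamp_field="@timestamp"):
    """Build an Elasticsearch query body for events in the last `hours` hours.

    `timestamp_field` is an assumption; use whatever your index calls it.
    """
    return {
        "query": {"range": {timestamp_field: {"gte": f"now-{hours}h"}}},
        "sort": [{timestamp_field: {"order": "asc"}}],
    }

body = elasticsearch_query(hours=24)
query_json = json.dumps(body, indent=2)
```

Sorting ascending by time keeps the ingested events in the order the causal-monotonicity check reasons about, although the graph itself relies on timestamps rather than arrival order.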

Hypothesis DSL & ATT&CK Catalog

DSL — Build hypotheses as arrow chains with optional wildcards:

User -[Auth]-> Host -[Execute]-> Process
Process -[DNS]-> Domain -[Connect]-> IP
* -[Execute]-> Process -[Spawn]-> Process
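
A chain in this arrow syntax is easy to split into entity and relation types. The Python sketch below is a minimal illustration and not the project's parser; the function name `parse_hypothesis` is hypothetical, and accepting `*` inside the brackets is an assumption.

```python
import re

# One step arrow, e.g. " -[Auth]-> ", capturing the relation type.
STEP = re.compile(r"\s*-\[\s*(\*|\w+)\s*\]->\s*")

def parse_hypothesis(dsl):
    """Split 'A -[R]-> B -[S]-> C' into (entity_types, relation_types)."""
    parts = STEP.split(dsl.strip())
    # re.split with one capture group alternates: entity, relation, entity, ...
    entities, relations = parts[0::2], parts[1::2]
    if len(entities) != len(relations) + 1 or not all(
        re.fullmatch(r"\*|\w+", e) for e in entities
    ):
        raise ValueError(f"malformed hypothesis: {dsl!r}")
    return entities, relations

entities, relations = parse_hypothesis("User -[Auth]-> Host -[Execute]-> Process")
```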

Catalog — Pre-built hypotheses mapped to MITRE ATT&CK (e.g., Valid Accounts T1078, Credential Dumping T1003, RDP Lateral Movement T1021.001, C2 T1071). Load from the catalog or use them as templates for custom chains.

GNN Threat Scoring

Graph Hunter can use GNN-based threat classification via ONNX models (e.g. from GraphOS-APT): the engine extracts k-hop subgraphs, runs inference (DirectML/GPU or CPU), and injects a 5-class threat score (Benign, Exfiltration, C2 Beacon, Lateral Movement, Privilege Escalation) into the anomaly scorer as weight W5. Hunt results are then ranked by the composite score so high-threat paths appear first. GNN scoring is optional and off by default; load a model and click Compute Scores in the GNN Threat Model panel to enable it.
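
One way to picture the composite score is a weighted average over the five components, with the GNN term dropped when no model is loaded. The Python sketch below assumes exactly that; the weight values, the component key names, and the renormalization rule are hypothetical and are not taken from the project.

```python
def composite_score(components, weights):
    """Weighted average of component scores in [0, 1].

    Components missing from `components` (e.g. GNN Threat when no model is
    loaded) are skipped and the remaining weights are renormalized.
    """
    active = [name for name in weights if name in components]
    total = sum(weights[name] for name in active)
    if total == 0:
        return 0.0
    return sum(weights[n] * components[n] for n in active) / total

# Hypothetical weights W1..W5; the project's actual defaults are not stated here.
WEIGHTS = {
    "entity_rarity": 0.25,
    "edge_rarity": 0.25,
    "neighborhood_concentration": 0.2,
    "temporal_novelty": 0.1,
    "gnn_threat": 0.2,
}

path_a = {"entity_rarity": 0.9, "edge_rarity": 0.8,
          "neighborhood_concentration": 0.5, "temporal_novelty": 0.7,
          "gnn_threat": 0.95}
path_b = {"entity_rarity": 0.2, "edge_rarity": 0.1,
          "neighborhood_concentration": 0.3, "temporal_novelty": 0.1,
          "gnn_threat": 0.05}

# Rank paths so the highest composite score comes first.
ranked = sorted([("a", path_a), ("b", path_b)],
                key=lambda item: composite_score(item[1], WEIGHTS),
                reverse=True)
```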

Pre-trained ONNX model: Download from Hugging Face.

Full details (pipeline, threat classes, UI workflow, training): GNN Threat Scoring in the documentation.

Architecture

Graph Hunter is split into a Rust core (domain logic, parsing, graph, search), a Tauri + React desktop app (UI and persistence), and optional graph-hunter-mcp for AI assistants. The core holds all business logic; the app exposes commands, session state, and an HTTP API.

Full details (directory layout, core modules, app structure, data flow): Architecture in the documentation.

Installation

You need Rust, Node.js, and the Tauri v2 prerequisites—no extra services or accounts. Follow the steps below; the first run may take a few minutes while dependencies build.

1. Install prerequisites (if not already installed):
   - Rust (2024 edition)
   - Node.js (v18+)
   - Platform-specific build tools: see Tauri prerequisites
2. Clone and run in development:
   ```
   cd app
   npm install
   npm run tauri dev
   ```
3. Verify: The app window opens. Create a session, load demo_data/apt_attack_simulation.json with Auto-detect, then run a hunt (e.g. Hunt Mode → add step User -[Auth]-> Host → Run). If you see paths and the graph, you’re ready to go.

Run tests:

cd graph_hunter_core
cargo test

Build for production:

cd app
npm run tauri build

Usage

Minimal run: start the app, load a log file, and hunt.

cd app && npm run tauri dev

Then in the UI: create or select a session → Select Log File → choose a file from demo_data/ (or your own) → Auto-detect → load. Switch to Hunt Mode, build a hypothesis (or pick one from the ATT&CK catalog), and click Run. Results appear in the graph and in the hunt table when there are many paths.

Demo Data & Try It

Three attack simulation datasets are included in demo_data/:

| File | Format | Scenario |
| --- | --- | --- |
| `apt_attack_simulation.json` | Sysmon | APT kill chain: spearphishing, discovery, Mimikatz, PsExec, C2, exfiltration |
| `sentinel_attack_simulation.json` | Sentinel | Cloud-to-on-prem: brute-force DC, Azure AD abuse, lateral movement, beacon, exfiltration |
| `generic_csv_logs.csv` | CSV | Firewall/proxy logs: normal + C2, SMB lateral, exfiltration attempts |

Quick run:

1. Start the app: npm run tauri dev (from app/).
2. Create or select a session; choose Auto-detect (or a specific format), then load a demo file.
3. Open Hunt Mode and build a hypothesis, e.g.:
   - User →[Execute]→ Process →[Write]→ File (malware drop)
   - User →[Auth]→ Host (lateral auth)
   - Host →[Connect]→ IP (C2)
   - Process →[Spawn]→ Process (parent-child chains)
   - Or pick a pattern from the ATT&CK catalog.
4. Switch to Explorer Mode to search IOCs and expand neighborhoods; use Events, Heatmap, and Timeline for context.

Real-world datasets (OTRF/Mordor, Splunk attack_data)

For large-scale testing with real attack telemetry, see demo_data/DOWNLOAD_REAL_DATA.md for download and conversion instructions (OTRF Security-Datasets, Mordor, Splunk attack_data).

Privacy & data

All processing is local. Logs are read from files you select; no data is sent to external services. Sessions and notes are stored in your OS application data directory. No telemetry or analytics are included.

Core Engine Details

The engine provides temporal pattern matching (DFS with causal monotonicity), time-window filtering, 5-component endogenous anomaly scoring (optional GNN), k-simplicity for path constraints, parallel parsing (Rayon), and entity/relation deduplication. Entity and relation types, and full module descriptions, are in the documentation.
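
Entity/relation deduplication with metadata merging can be illustrated with a small Python sketch. The event tuple layout and the function name `build_graph` are hypothetical, and letting later metadata values overwrite earlier ones on conflict is an assumption, not the engine's documented merge rule.

```python
def build_graph(events):
    """Collapse repeated entities and relations into unique nodes and edges.

    Each event is (src, src_type, relation, dst, dst_type, timestamp, metadata),
    where metadata may hold "src" and "dst" dicts of node attributes.
    """
    nodes = {}  # (type, id) -> merged metadata dict
    edges = {}  # (src key, relation, dst key) -> sorted list of timestamps
    for src, src_type, rel, dst, dst_type, ts, meta in events:
        src_key, dst_key = (src_type, src), (dst_type, dst)
        # Later values overwrite earlier ones on conflict (sketch assumption).
        nodes.setdefault(src_key, {}).update(meta.get("src", {}))
        nodes.setdefault(dst_key, {}).update(meta.get("dst", {}))
        edges.setdefault((src_key, rel, dst_key), []).append(ts)
    for stamps in edges.values():
        stamps.sort()
    return nodes, edges

# Hypothetical events: the same logon seen twice with different metadata.
events = [
    ("alice", "User", "Auth", "host1", "Host", 100, {"src": {"domain": "CORP"}}),
    ("alice", "User", "Auth", "host1", "Host", 90, {"dst": {"os": "Windows"}}),
]
nodes, edges = build_graph(events)
```

Keeping every timestamp on the merged edge preserves the information the temporal search needs even though the edge itself is stored once.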

Full details: Architecture (core modules and data flow); Hypothesis & catalog (DSL, k-simplicity); Log formats (entity and relation types).

HTTP API & MCP (AI integration)

When the desktop app is running with a session loaded, it exposes an HTTP API on 127.0.0.1:37891 (configurable via GRAPHHUNTER_API_PORT). This allows external tools to query the graph (entity types, search, expand nodes, run hunts, create notes) without using the UI. The API is protected by token authentication: at startup the app prints GRAPHHUNTER_API_TOKEN=<uuid> to the console; clients (e.g. the MCP server) must send this token (e.g. via Authorization: Bearer <token> or the GRAPHHUNTER_API_TOKEN env var) or requests return 401 Unauthorized.
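
A client needs little more than the port and the bearer token. The Python sketch below builds (but does not send) an authenticated request using only the standard library; the path `/api/entity-types` is a placeholder chosen for illustration, not a documented endpoint, and the helper name `api_request` is hypothetical.

```python
import os
import urllib.request

def api_request(path, token=None, port=None):
    """Build an authenticated request to the local Graph Hunter API."""
    token = token or os.environ.get("GRAPHHUNTER_API_TOKEN", "")
    port = port or os.environ.get("GRAPHHUNTER_API_PORT", "37891")
    if not token:
        raise RuntimeError("GRAPHHUNTER_API_TOKEN is not set")
    url = f"http://127.0.0.1:{port}{path}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

# Placeholder path and dummy token, for illustration only.
req = api_request("/api/entity-types",
                  token="00000000-0000-0000-0000-000000000000", port=37891)
# To send it while the app is running: urllib.request.urlopen(req)
```

A missing or wrong token would surface as the 401 Unauthorized response described above.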

The graph-hunter-mcp package is an MCP (Model Context Protocol) server that turns these operations into tools for AI assistants (e.g. Claude Code). You can ask the AI to hunt for malicious paths, expand nodes, or summarize findings while the app holds the session and graph.

| Prerequisite | Description |
| --- | --- |
| App running | Start the Tauri app and load or create a session with data. |
| API token | Copy `GRAPHHUNTER_API_TOKEN` from the app startup log into your MCP config `env` so the MCP can authenticate. |
| MCP config | Add the `graph-hunter-mcp` server to your MCP client pointing at the app’s API URL. |

Usage sample — Once the MCP is connected, you can ask the AI assistant in natural language to run hunts and explore the graph. For example:

"Use Graph Hunter to find any user who logged into a Host and then ran a suspicious process that wrote to the System32 folder."

The assistant translates your request into one or more searches and an appropriate hypothesis (e.g. User -[Auth]-> Host -[Execute]-> Process -[Write]-> File with filters), then runs the hunt.

Quick setup: See graph-hunter-mcp/README.md for install, mcp.json example, tool list, and troubleshooting (firewall, port, 401, session required).

Screenshots

- Hunt mode: Hypothesis builder, DSL, and hunt results with graph and path table
- Explorer + graph: IOC search, neighborhood expansion, and graph canvas
- Hypothesis & ATT&CK catalog: Step builder and one-click catalog templates
- Events, Heatmap, Timeline: Event list, entity/relation heatmap, temporal view
- GNN Threat Model: ONNX model load, k-hop config, Compute Scores (optional)
- AI Analysis: AI-assisted threat hunting; natural-language hunts

License

This project is licensed under the GNU General Public License v3.0 — see the LICENSE file for details.
