A structured framework to document and analyze benign tools abused for data exfiltration. It highlights detection-relevant features, stealth techniques, and forensic artifacts to support threat hunting, detection engineering, and post-incident analysis.
This project provides a centralized knowledge base of legitimate tools that have been used, or could be used, for data exfiltration in adversary operations.
While these tools are not inherently malicious, they are often repurposed by threat actors to exfiltrate sensitive information from compromised environments. The goal is to document these tools in a consistent format, making the data easily accessible for defensive use.
This project uses automated validation to ensure YAML files are properly formatted and contain all required fields.
- Generic YAML Linting (yamllint)
  - Enforces consistent YAML formatting
  - Catches syntax errors and common issues
  - Configuration in .yamllint
- Custom Schema Validation (scripts/validate_yml.py)
  - Validates required top-level fields (Name, Description, Category, etc.)
  - Ensures Forensics section contains all required fields
  - Validates against project-specific JSON schema
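The required-field checks listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts/validate_yml.py: the field lists are assumptions (this README names only Name, Description, and Category, and the Forensics keys are borrowed from the tag vocabulary), and the entry is assumed to be already parsed into a dictionary.

```python
# Hypothetical required fields; the authoritative lists live in the project's schema.
REQUIRED_TOP_LEVEL = ("Name", "Description", "Category", "Forensics")
REQUIRED_FORENSICS = ("binary-location", "network-indicator")


def find_missing_fields(entry):
    """Return a list of human-readable problems for one parsed tool entry."""
    problems = []

    # Every required top-level key must be present.
    for field in REQUIRED_TOP_LEVEL:
        if field not in entry:
            problems.append(f"missing top-level field: {field}")

    # When a Forensics section exists, it must be a mapping with its own required keys.
    forensics = entry.get("Forensics")
    if isinstance(forensics, dict):
        for field in REQUIRED_FORENSICS:
            if field not in forensics:
                problems.append(f"missing Forensics field: {field}")
    elif forensics is not None:
        problems.append("Forensics must be a mapping")

    return problems


if __name__ == "__main__":
    # Made-up entry used only to exercise the function.
    sample = {"Name": "example-tool", "Description": "Placeholder", "Forensics": {}}
    for problem in find_missing_fields(sample):
        print(problem)
```

Returning a list of problems instead of raising on the first one lets a single run report every gap in an entry.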
Install Python dependencies:
pip install -r requirements.txt
Enable pre-commit hooks:
pre-commit install
Pre-commit (automatic): Both yamllint and custom validation run automatically before each commit.
Manual testing:
# Run all validation checks
./scripts/test-validation.sh
# Or run individually:
yamllint .
python scripts/validate_yml.py
GitHub Actions: Every push and pull request automatically runs both validation checks to ensure quality standards.
- yamllint rules: .yamllint (document-start, line-length, and comments spacing disabled)
- Custom validation: scripts/validate_yml.py and YML-Schema.yml
Each tool is documented in its own YAML file, located in the /yml/ directory. These entries capture:
- Tool metadata (name, category, platform, execution method)
- Capabilities relevant to exfiltration
- Stealth techniques used to avoid detection
- Forensic artifacts left on disk or in memory
- Threat actor usage and external references
- Tags to support filtering and grouping
This format is designed to support both human review and programmatic consumption (e.g., automation, detection generation or AI projects).
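As a rough illustration of programmatic consumption, the sketch below filters already-parsed entries by tag. The key names `Name` and `Tags`, the function name, and the placeholder entries are assumptions made for this example and are not taken from the real files in yml/.

```python
def tools_with_tags(entries, wanted):
    """Return the names of entries that carry every tag in `wanted`."""
    wanted = set(wanted)
    return [
        entry["Name"]
        for entry in entries
        # An entry without a Tags key is treated as having no tags.
        if wanted.issubset(entry.get("Tags", []))
    ]


if __name__ == "__main__":
    # Placeholder entries; the tag assignments are invented for the example.
    entries = [
        {"Name": "example-a", "Tags": ["third-party", "cli", "encrypted-transfer"]},
        {"Name": "example-b", "Tags": ["native", "cli"]},
        {"Name": "example-c", "Tags": ["third-party", "gui"]},
    ]
    print(tools_with_tags(entries, ["third-party", "cli"]))
```

The same pattern extends to other list-valued fields, such as capabilities, when building detection content from the entries.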
exfiltration-framework/
├── yml/ # One YAML file per tool, structured for parsing
│ ├── awscli.yml
│ ├── pscp.yml
│ ├── restic.yml
│ ├── rclone.yml
│ ├── dropboxapi.yml
│ ├── syncthing.yml
│ ├── curl.yml
│ ├── powershell.yml
│ ├── azcopy.yml
│ └── s3browser.yml
│
├── LICENSE # Apache License 2.0
├── README.md # Project overview and usage
└── CONTRIBUTING.md # Contribution guidelines
Tools may include tags to help categorize them by behavior, usage, or context. Examples include:
By Origin:
native, third-party, cloud-based
By Execution Type:
cli, gui, api
By Detection-Relevant Behavior:
masquerading, encrypted-transfer, scheduled-task, background-execution
By Threat Context:
ransomware, apt, exfiltration-only
Contributions are welcome. If you would like to propose a new tool, improve an existing entry, or suggest new fields or tags, please refer to the contributing guidelines (coming soon).
To ensure consistency and avoid errors, all tool entries in the yml/ folder should be validated against the schema defined in YML-Schema.yml.
Before running the validation script, install the required Python libraries:
pip install pyyaml jsonschema
From the root of the repository, run:
python scripts/validate_yml.py
This will check all .yml files in the yml/ directory (excluding templates and meta files) and print the validation results.
A file is considered invalid if it is missing required fields, contains unsupported tags, or does not conform to the schema.
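One of the failure conditions above, unsupported tags, can be illustrated with a small sketch. The allowed values are copied from four groups of this project's tag vocabulary. The function name, the group keys, and the assumption that an entry exposes its values as one list per group are illustrative and do not describe the actual validation script.

```python
# Allowed values for four groups, taken from the project's tag vocabulary.
ALLOWED = {
    "categories": {"native", "third-party", "cloud-based"},
    "platforms": {"windows", "linux", "macos"},
    "execution": {"cli", "gui"},
    "threat-actors": {"apt", "ransomware"},
}


def unsupported_values(entry_values):
    """Map each group name to the sorted values that are not in ALLOWED."""
    report = {}
    for group, values in entry_values.items():
        allowed = ALLOWED.get(group)
        if allowed is None:
            # Unknown group: flag everything it contains.
            report[group] = sorted(values)
            continue
        extra = sorted(set(values) - allowed)
        if extra:
            report[group] = extra
    return report


if __name__ == "__main__":
    # Made-up values used only to exercise the function.
    print(unsupported_values({"platforms": ["windows", "solaris"], "execution": ["cli"]}))
```

An empty result means every value was found in the vocabulary; any non-empty result would mark the file as invalid under the rule stated above.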
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
This framework is intended for educational and defensive purposes only. All tools listed are legitimate and not inherently malicious. Their inclusion is based on publicly documented misuse by threat actors.
# =========================
# CATEGORIES: General classification of the tool's origin or deployment model
# =========================
categories:
- native # Built into the OS (e.g., PowerShell, certutil)
- third-party # Tools developed independently from the OS (e.g., rclone, curl)
- cloud-based # Tools designed for use with cloud services or platforms (e.g., AWS CLI, Dropbox API)
# =========================
# PLATFORMS: Operating systems supported by the tool
# =========================
platforms:
- windows # Microsoft Windows
- linux # Linux distributions
- macos # Apple macOS
# =========================
# EXECUTION: How the tool is typically executed or interfaced with
# =========================
execution:
- cli # Command-line interface
- gui # Graphical user interface
# =========================
# CAPABILITIES: Functional features relevant to exfiltration
# =========================
capabilities:
- file-sync # Continuous synchronization between folders or systems (e.g., rclone, Syncthing).
- cloud-sync # Sync or upload to cloud storage platforms like S3, Azure Blob, Dropbox.
- api-transfer # Upload via official or custom APIs (e.g., Dropbox API, AWS SDK).
- direct-to-cloud # Exfiltrates data directly to cloud endpoints without local staging.
- selective-upload # Allows filtering or targeting specific file types or directories.
- recursive-upload # Recursively uploads entire folder trees.
- credentialed-upload # Requires or supports authenticated uploads (e.g., tokens, IAM keys).
- anonymous-upload # Supports unauthenticated uploads (e.g., pre-signed URLs or public buckets).
- proxy-aware # Can route traffic through proxies to mask destination.
- silent-execution # Executes without user interaction or visible output (used in scripts or automation).
- portable-execution # Runs from non-standard paths or without installation (portable binaries).
- service-identity # Supports managed identities or service principals (e.g., AzCopy with Azure roles).
- user-agent-spoofing # Can spoof or customize the user-agent string in HTTP requests.
- endpoint-override # Allows setting custom or attacker-controlled endpoints (e.g., `--endpoint-url`).
- ftp-upload # Can exfiltrate via FTP protocol to external servers.
- header-exfiltration # Exfiltrates data inside HTTP headers (e.g., `X-Data:`).
- multipart-upload # Simulates browser-style form uploads (e.g., using `curl -F`).
# =========================
# FORENSICS: Artifacts and indicators that may appear on a compromised system
# =========================
forensics:
- binary-location # Known install or execution paths, especially outside standard directories.
- config-file-path # Presence of tool-specific configuration or credential files.
- command-line-flags # Flags or arguments used by attackers to trigger upload, sync, or stealth behavior.
- registry-entry # Registry keys used to auto-launch or persist the tool (Windows only).
- scheduled-task-created # Use of task scheduler or cron to automate tool execution.
- log-file-location # Logs written by the tool that may reveal execution or errors.
- network-indicator # Domain or API patterns associated with this tool’s upload behavior.
# =========================
# THREAT ACTORS: Types of actors known to abuse the tool
# =========================
threat-actors:
- apt # Advanced Persistent Threat groups
- ransomware # Ransomware gangs or affiliates