A set of lightweight, standard-library Python scripts designed to prepare data (codebases and network traffic) for AI analysis and security auditing.
Prerequisites: Python 3.x (No external libraries required).
scraper.py consolidates an entire Git repository or local directory into a single .txt file, ideal for feeding context to LLMs.
Features:
- Automatically filters out binaries and large files.
- Interactive prompt to exclude specific files by size.
- Supports GitHub URLs or local paths.
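For context, the filtering described above can be sketched roughly as follows (a minimal, illustrative sketch, not the actual implementation; MAX_SIZE, looks_binary, and consolidate are placeholder names, and the real script's heuristics and output format may differ):

    # Minimal sketch of the consolidation logic; the real scraper.py's
    # heuristics, size threshold, and output format may differ.
    import os

    MAX_SIZE = 1_000_000  # hypothetical cutoff for "large" files (1 MB)

    def looks_binary(path, probe=1024):
        """Treat a file as binary if its first bytes contain a NUL byte."""
        with open(path, "rb") as f:
            return b"\x00" in f.read(probe)

    def consolidate(root, out_path="repo_dump.txt"):
        with open(out_path, "w", encoding="utf-8") as out:
            for dirpath, dirnames, filenames in os.walk(root):
                dirnames[:] = [d for d in dirnames if d != ".git"]  # skip Git metadata
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if os.path.getsize(path) > MAX_SIZE or looks_binary(path):
                        continue  # drop large and binary files
                    out.write(f"\n===== {os.path.relpath(path, root)} =====\n")
                    with open(path, "r", encoding="utf-8", errors="replace") as f:
                        out.write(f.read())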
Usage:
python3 scraper.py
# Follow the interactive prompts to select source and exclusions.

harcleaner.py optimizes HAR (network log) files for AI analysis by removing "bloat" (images, fonts, binary blobs) to save tokens while preserving request logic.
Usage:
- Default (Best for Logic Analysis): Removes static assets, binary data, and timing metadata.
python3 harcleaner.py traffic.har
- Keep CSS (For UI Analysis):
python3 harcleaner.py traffic.har --keep-css
- Keep Everything (Just remove metadata):
python3 harcleaner.py traffic.har --keep-static --keep-css --keep-binary
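To illustrate what this kind of cleanup involves (an assumed sketch, not harcleaner.py's actual code; the MIME-type list and output naming are placeholders): HAR files are plain JSON, so pruning amounts to filtering the log.entries array and stripping heavy fields.

    # Illustrative sketch of HAR pruning; not the actual harcleaner.py logic.
    import json
    import sys

    STATIC_TYPES = ("image/", "font/", "audio/", "video/")  # assumed list

    def clean_har(in_path, out_path):
        with open(in_path, encoding="utf-8") as f:
            har = json.load(f)

        kept = []
        for entry in har["log"]["entries"]:
            content = entry.get("response", {}).get("content", {})
            mime = content.get("mimeType", "")
            if mime.startswith(STATIC_TYPES):
                continue                      # drop static assets entirely
            if content.get("encoding") == "base64":
                content.pop("text", None)     # strip binary blobs stored as base64
            entry.pop("timings", None)        # drop timing metadata
            kept.append(entry)

        har["log"]["entries"] = kept
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(har, f, indent=2)

    if __name__ == "__main__":
        clean_har(sys.argv[1], sys.argv[1] + ".clean")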
har_redact.py surgically removes or replaces specific secrets (tokens, passwords, API keys) in HAR files without breaking the JSON structure.
Usage:
- Replace Secret: Finds the token and replaces it with [REDACTED].
python3 har_redact.py traffic.har "MY_SECRET_TOKEN" --replace
- Delete Line: Removes the specific JSON key/value pair containing the secret.
python3 har_redact.py traffic.har "MY_SECRET_TOKEN" --delete-line
- Delete Request: Removes the entire HTTP request/response entry if the secret is found anywhere inside it.
python3 har_redact.py traffic.har "MY_SECRET_TOKEN" --delete-req
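A rough sketch of what replace-style redaction can look like (assumed logic, not the script's actual code; redact and redact_file are illustrative names): parse the HAR, walk every string value, and substitute the secret, so the JSON structure stays valid by construction.

    # Assumed sketch of secret replacement; not har_redact.py's actual functions.
    import json
    import sys

    def redact(value, secret):
        """Recursively replace the secret inside every string in the structure."""
        if isinstance(value, dict):
            return {k: redact(v, secret) for k, v in value.items()}
        if isinstance(value, list):
            return [redact(v, secret) for v in value]
        if isinstance(value, str):
            return value.replace(secret, "[REDACTED]")
        return value

    def redact_file(path, secret):
        with open(path, encoding="utf-8") as f:
            har = json.load(f)
        with open(path + ".redacted", "w", encoding="utf-8") as f:
            json.dump(redact(har, secret), f, indent=2)

    if __name__ == "__main__":
        redact_file(sys.argv[1], sys.argv[2])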
Make scripts executable to run them directly from your terminal:
chmod +x scraper.py harcleaner.py har_redact.py
Now you can run them as ./scraper.py, etc.
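Direct execution also relies on each script starting with a Python shebang line, for example:

    #!/usr/bin/env python3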