SecMask: Mixture-of-Experts Secret Detection


Production-ready secret detection and masking using Mixture-of-Experts (MoE) architecture.

SecMask combines two specialized NER models—a fast expert (DistilBERT, 512 tokens) for short contexts and a long expert (Longformer, 2048 tokens) for documents—with intelligent routing to achieve high-accuracy secret detection at low latency. Choose between rule-based routing or a learned MoE with a tiny 12KB gating network.

Note: This repository contains inference code and documentation. Models are hosted separately on HuggingFace.


🎯 Overview

SecMask detects and redacts sensitive information across multiple secret types:

  • AWS Access Keys (AKIA...)
  • GitHub Personal Access Tokens
  • JWT Tokens
  • API Keys
  • PEM Certificate Blocks
  • Kubernetes Secrets
  • Database Credentials

Architecture Options

SecMask offers two routing strategies:

Option 1: Heuristic Routing (Rule-Based)

  • Uses hand-crafted rules (token count, PEM blocks, etc.)
  • Fast and explainable
  • No additional model needed
  • 92.7% routing accuracy

Option 2: Learned Routing (True MoE)

  • Uses a trained 12KB gating network
  • Learns optimal routing from data
  • Combines both expert outputs with learned weights
  • 100% training accuracy, 92.7% test accuracy
  • Only +0.19ms latency overhead

Key Features

  • High Precision: 92.3% precision with the Fast + Filters configuration
  • Good Recall: 80.0% recall (F1 score: 0.857)
  • Fast Inference: 11ms P50 latency on CPU
  • High Throughput: 84 req/s on CPU
  • Dual Routing Modes: rule-based or learned MoE
  • Multi-Stage Pipeline: NER models plus deterministic filters for comprehensive coverage
  • Production Ready: tested, benchmarked, and documented
  • Easy Integration: simple CLI and Python API
  • Tiny MoE Gate: 12KB gating network (3,042 parameters)

Performance Note: The recommended configuration (Fast Expert + Filters) achieves 92.3% precision and 80.0% recall (F1: 0.857), competitive with commercial secret scanning tools. Post-processing filters improve precision by +12.3%. See BENCHMARK_RESULTS.md and CONFIGURATION_GUIDE.md for detailed metrics and usage recommendations.

Model Components

This system uses 3 separate models (download as needed):

| Component | HuggingFace Repo | Size | Purpose |
|---|---|---|---|
| Fast Expert | andrewandrewsen/distilbert-secret-masker | 265MB | Short texts (≤512 tokens) |
| Long Expert | andrewandrewsen/longformer-secret-masker | 592MB | Long documents (≤2048 tokens) |
| MoE Gate (optional) | andrewandrewsen/secretmask-gate | 12KB | Learned routing weights |

🚀 Quick Start

Installation

# Install dependencies
pip install transformers torch

# Clone repository (contains inference code)
git clone https://github.com/andrewandrewsen/secmask.git
cd secmask

Basic Usage

Heuristic Routing (Rule-Based):

# Mask secrets in text using rule-based routing
python infer_moe.py \
    --text "My AWS key: AKIAIOSFODNN7EXAMPLE" \
    --fast-model andrewandrewsen/distilbert-secret-masker \
    --routing-mode heuristic \
    --tau 0.80

# Output: My AWS key: [SECRET]

Learned Routing (True MoE):

# Mask secrets using learned gating network
python infer_moe.py \
    --text "My AWS key: AKIAIOSFODNN7EXAMPLE" \
    --fast-model andrewandrewsen/distilbert-secret-masker \
    --long-model andrewandrewsen/longformer-secret-masker \
    --routing-mode learned \
    --gate-model andrewandrewsen/secretmask-gate \
    --tau 0.80

# Output: My AWS key: [SECRET]


Python API:

from infer_moe import mask_text_moe

# Heuristic routing
result = mask_text_moe(
    "AWS key: AKIAIOSFODNN7EXAMPLE and password: hunter2",
    fast_model_dir="andrewandrewsen/distilbert-secret-masker",
    tau=0.80,
    routing_mode="heuristic"
)
print(result)  # "AWS key: [SECRET] and password: [SECRET]"

# Learned MoE routing
result = mask_text_moe(
    "AWS key: AKIAIOSFODNN7EXAMPLE and password: hunter2",
    fast_model_dir="andrewandrewsen/distilbert-secret-masker",
    long_model_dir="andrewandrewsen/longformer-secret-masker",
    tau=0.80,
    routing_mode="learned",
    gate_model_path="andrewandrewsen/secretmask-gate"
)
print(result)  # "AWS key: [SECRET] and password: [SECRET]"

Process Files:

# With heuristic routing
python infer_moe.py \
    --in config.yaml \
    --fast-model andrewandrewsen/distilbert-secret-masker \
    --routing-mode heuristic \
    --tau 0.80

# With learned MoE routing
python infer_moe.py \
    --in config.yaml \
    --fast-model andrewandrewsen/distilbert-secret-masker \
    --long-model andrewandrewsen/longformer-secret-masker \
    --routing-mode learned \
    --gate-model andrewandrewsen/secretmask-gate \
    --tau 0.80

📊 Performance Highlights

Secret Detection Performance (Fast + Filters Configuration)

| Metric | Value |
|---|---|
| F1 Score | 0.857 |
| Precision | 92.3% |
| Recall | 80.0% |
| P50 Latency | 11ms |
| P90 Latency | 14ms |
| P99 Latency | 17ms |
| Throughput | 84 req/s (CPU) |

Note: Metrics from the recommended Fast Expert + Filters configuration at τ=0.80. Post-processing filters improve precision by +12.3%. See BENCHMARK_RESULTS.md for comprehensive benchmarks and CONFIGURATION_GUIDE.md for usage recommendations.

Configuration Comparison

| Configuration | Precision | Recall | F1 Score | Use Case |
|---|---|---|---|---|
| Fast + Filters (recommended) | 92.3% | 80.0% | 0.857 | General purpose |
| Full MoE (Fast+Long+Filters) | 90.9% | 76.9% | 0.833 | Long documents |
| Fast NER Only | 80.0% | 80.0% | 0.800 | Development/testing |

Routing Performance

| Metric | Heuristic | Learned MoE | Difference |
|---|---|---|---|
| Routing Accuracy | 92.7% | 92.7% | Equal |
| Fast Expert Usage | 92.7% | 92.7% | Equal |
| Long Expert Usage | 7.3% | 7.3% | Equal |
| Latency Overhead | 0.065ms | 0.256ms | +0.19ms |
| Model Size | 0KB | 12KB | +12KB |
| Training Accuracy | N/A | 100% | - |

Key Insight: Learned MoE achieves identical routing decisions to heuristics with minimal overhead, validating the rule-based approach while providing a learned alternative.

See BENCHMARK_RESULTS.md for detailed performance analysis and CONFIGURATION_GUIDE.md for configuration-specific guidance.


🏗️ Architecture

SecMask uses a Mixture-of-Experts (MoE) approach with flexible routing:

Input Text
    ↓
┌─────────────┐
│   Chunking  │  Split long documents (480 token chunks)
└──────┬──────┘
       ↓
┌─────────────┐
│   Router    │  Choose mode:
│  Heuristic  │  • Heuristic: Rule-based
│     OR      │  • Learned: 12KB gating network
│  Learned    │
└──────┬──────┘
       ↓
   ┌───┴───┐
   ↓       ↓
┌──────┐ ┌──────┐
│ Fast │ │ Long │  Fast: DistilBERT (512 tokens, 265MB)
│Expert│ │Expert│  Long: Longformer (2048 tokens, 592MB)
└───┬──┘ └──┬───┘
    ↓       ↓
    └───┬───┘
        ↓
┌────────────┐
│  Combiner  │  Heuristic: single output
│            │  Learned: weighted combination
└─────┬──────┘
      ↓
┌────────────┐
│Post-Filters│  Guaranteed patterns (AKIA, github_pat_)
└─────┬──────┘
      ↓
Masked Output
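The chunking stage at the top of the pipeline can be sketched as follows. A whitespace split stands in for the real subword tokenizer, so the 480 figure counts words here rather than tokens, and `chunk` is an illustrative name rather than the actual infer_moe.py helper:

```python
def chunk(text: str, max_tokens: int = 480) -> list[str]:
    """Split text into pieces of at most max_tokens whitespace-separated
    units, so each piece fits the fast expert's 512-token window."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]
```

Each chunk is then routed and masked independently, and the masked chunks are rejoined into the output.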

Routing Strategies

Heuristic Routing (Rule-Based)

The router selects the optimal expert based on:

  1. Token Count: >480 tokens → Long Expert
  2. Pattern Detection: PEM blocks, K8s manifests → Long Expert
  3. Entropy Analysis: High entropy → Long Expert (likely encoded secrets)
  4. Structure: YAML/JSON → Long Expert (configuration files)
  5. Default: Fast Expert (lower latency)

Confidence Escalation: If the Fast Expert returns low confidence (<0.85), the router automatically escalates to the Long Expert for higher accuracy.
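The ordered rules above can be sketched as a small routing function. The specific thresholds (the 4.5-bit entropy cutoff, the crude YAML/JSON check) and all names here are illustrative assumptions, not the actual infer_moe.py internals; confidence escalation happens after the fast pass, outside this function:

```python
import math
import re

PEM_RE = re.compile(r"-----BEGIN .+?-----")
TOKEN_LIMIT = 480  # chunks above this go to the long expert

def shannon_entropy(text: str) -> float:
    """Bits per character; high values suggest encoded/random secrets."""
    if not text:
        return 0.0
    counts: dict[str, int] = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def route(text: str, approx_tokens: int) -> str:
    """Return 'long' or 'fast', applying the rules in the order above."""
    if approx_tokens > TOKEN_LIMIT:                 # 1. token count
        return "long"
    if PEM_RE.search(text) or "kind: Secret" in text:  # 2. patterns
        return "long"
    if shannon_entropy(text) > 4.5:                 # 3. entropy (assumed cutoff)
        return "long"
    if text.lstrip().startswith(("{", "---")):      # 4. YAML/JSON structure
        return "long"
    return "fast"                                   # 5. default: lower latency
```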

Learned Routing (True MoE)

Uses a trained 12KB gating network (3,042 parameters):

  1. Extract 10 features from text:

    • Token count, entropy, has PEM, has K8s secret
    • AWS/GitHub/JWT pattern counts
    • Base64 count, line count, avg line length
  2. Gating network predicts weights: [w_fast, w_long] (sum to 1.0)

    • Architecture: 3-layer MLP (10 → 64 → 32 → 2)
    • Training: 100% validation accuracy on 6,000 examples
  3. Run both experts and combine outputs using learned weights:

    final_output = w_fast * fast_expert_output + w_long * long_expert_output
  4. Performance: Same routing accuracy as heuristics (92.7%) with only +0.19ms overhead
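As a sketch of step 2's forward pass, here is the 10 → 64 → 32 → 2 MLP with a softmax head. The weights are random stand-ins (the trained checkpoint lives at andrewandrewsen/secretmask-gate); the layer shapes follow the README, while the ReLU activation choice is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 32)), np.zeros(32)
W3, b3 = rng.normal(size=(32, 2)), np.zeros(2)

def gate(features: np.ndarray) -> np.ndarray:
    """Map a 10-dim feature vector to [w_fast, w_long], summing to 1."""
    h = np.maximum(features @ W1 + b1, 0.0)  # ReLU
    h = np.maximum(h @ W2 + b2, 0.0)
    logits = h @ W3 + b3
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()

# Both experts run, and their outputs are mixed with the learned weights:
# final_output = w_fast * fast_expert_output + w_long * long_expert_output
w_fast, w_long = gate(rng.normal(size=10))
```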


📦 Installation

Prerequisites

  • Python 3.11+
  • PyTorch 2.0+
  • transformers 4.30+

Using Conda (Recommended)

# Create environment
conda create -n secmask python=3.11
conda activate secmask

# Install dependencies
pip install -r requirements.txt

Using venv

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

GPU Support

CUDA (NVIDIA):

pip install torch --index-url https://download.pytorch.org/whl/cu118

MPS (Apple Silicon):

# PyTorch automatically detects and uses MPS
# No additional installation needed

🔧 Usage

CLI Options

python infer_moe.py [OPTIONS]

Options:
  --text TEXT           Text to mask (or use --in for a file)
  --in FILE             Input file path
  --fast-model MODEL    HuggingFace model ID or local path
  --long-model MODEL    Long expert model (optional)
  --routing-mode MODE   Routing strategy: heuristic or learned
  --gate-model MODEL    Gating network for learned routing
  --tau FLOAT           Detection threshold (0.0-1.0, default: 0.80)
  --token TOKEN         HuggingFace token for private models
  --no-filters          Disable post-processing filters
  --no-escalation       Disable confidence-based escalation
  --device DEVICE       Device: cpu, cuda, mps (auto-detected if not set)
  --stats               Show routing and performance statistics

Python API

Basic Masking:

from infer_moe import mask_text_moe

masked_text = mask_text_moe(
    text="Your text here",
    fast_model_dir="andrewandrewsen/distilbert-secret-masker",
    long_model_dir="andrewandrewsen/longformer-secret-masker",  # Optional
    tau=0.80,
    enable_filters=True,
    enable_escalation=True,
    device="cpu"  # or "cuda", "mps"
)

With Statistics:

from infer_moe import mask_text_moe_with_stats

result = mask_text_moe_with_stats(
    text="Your text here",
    fast_model_dir="andrewandrewsen/distilbert-secret-masker",
    tau=0.80
)

print(result["masked_text"])
print(f"Chunks processed: {result['num_chunks']}")
print(f"Fast/Long routing: {result['num_fast']}/{result['num_long']}")
print(f"Escalated: {result['num_escalated']}")

Batch Processing

# Process multiple files
for file in logs/*.log; do
    python infer_moe.py --in "$file" \
        --fast-model andrewandrewsen/distilbert-secret-masker > "${file}.masked"
done

Piping

# From stdin
cat config.yaml | python infer_moe.py \
    --fast-model andrewandrewsen/distilbert-secret-masker

# Git diff scanning
git diff | python infer_moe.py \
    --fast-model andrewandrewsen/distilbert-secret-masker

🧪 Testing

Run the comprehensive test suite:

# Install test dependencies
pip install pytest

# Run all tests
pytest tests/ -v

# Run integration tests
python test_moe_comprehensive.py

Expected Output:

✅ Test 1: Short AWS Key - PASSED
✅ Test 2: PEM Block (Long Expert) - PASSED
✅ Test 3: Multiple Secrets - PASSED
✅ Test 4: Clean Text (No False Positives) - PASSED

🎉 All Tests Passed!

📖 Documentation


🔒 Security Guarantees

SecMask provides multiple layers of protection:

  1. NER Models: Trained to detect secrets in context (92.3% precision, 80% recall)
  2. Post-Processing Filters: Deterministic patterns for guaranteed detection (PEM blocks, K8s secrets, AWS patterns)
  3. Configurable Thresholds: Adjust tau to balance precision vs recall for your use case

See CONFIGURATION_GUIDE.md for security best practices and recommended configurations.


🚀 Use Cases

  • CI/CD Pipelines: Pre-commit hooks, GitHub Actions
  • Log Sanitization: Real-time and batch log processing
  • Configuration Auditing: Scan K8s manifests, Terraform files
  • API Security: Filter secrets from API responses
  • Documentation: Clean README files before open-sourcing
  • Security Scanning: Repository audits, compliance checks

See USE_CASES.md for detailed examples.


🎓 How It Works

Training Data

Each expert was trained on 2,000+ examples covering:

  • AWS keys, GitHub tokens, JWTs, API keys
  • PEM certificate blocks
  • Kubernetes secrets, database credentials
  • Synthetic + real-world samples

Models

Fast Expert (andrewandrewsen/distilbert-secret-masker):

  • Base: DistilBERT-base-uncased
  • Max Context: 512 tokens
  • Size: 268MB
  • Speed: ~6ms avg latency

Long Expert (andrewandrewsen/longformer-secret-masker):

  • Base: Longformer-base-4096
  • Max Context: 2048 tokens (trained on M3 Pro)
  • Size: 592MB
  • Speed: ~12ms avg latency

Post-Processing Filters

Deterministic filters for guaranteed detection:

  • AWS Key pattern: AKIA[0-9A-Z]{16}
  • GitHub PAT: github_pat_[0-9a-zA-Z]{22}_[0-9a-zA-Z]{59}
  • JWT: eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
  • PEM blocks: -----BEGIN .+-----
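The filter pass can be sketched like this: after the NER models run, the patterns listed above are masked unconditionally. The patterns are taken from the list, with the JWT character class written with a trailing hyphen so it is treated as a literal; `apply_filters` is an illustrative name, not the actual infer_moe.py API:

```python
import re

FILTERS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                      # AWS access key
    re.compile(r"github_pat_[0-9a-zA-Z]{22}_[0-9a-zA-Z]{59}"),            # GitHub PAT
    re.compile(r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT
    re.compile(r"-----BEGIN .+-----"),                                    # PEM header
]

def apply_filters(text: str, mask: str = "[SECRET]") -> str:
    """Replace every filter match with the mask token."""
    for pattern in FILTERS:
        text = pattern.sub(mask, text)
    return text
```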

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Areas for contribution:

  • Support for new secret types
  • Performance optimizations
  • Additional deployment examples
  • Documentation improvements
  • Bug fixes

📄 License

MIT License - see LICENSE file for details.

Base Model Licenses:

  • DistilBERT (distilbert-base-uncased): Apache 2.0 - © Hugging Face
  • Longformer (allenai/longformer-base-4096): Apache 2.0 - © Allen Institute for AI

Our fine-tuned models inherit and comply with Apache 2.0 license terms. The MIT license applies to the SecMask codebase, training scripts, and documentation.


🙏 Acknowledgments

  • Built with Transformers by Hugging Face
  • DistilBERT by Hugging Face (Apache 2.0)
  • Longformer by Allen Institute for AI (Apache 2.0)
  • Inspired by production secret scanning needs

Note: This is a learning/hobby project exploring Mixture-of-Experts architectures for secret detection. While functional and achieving competitive results (92.3% precision, 80% recall), it's not intended as an enterprise replacement for commercial tools. Use cases include development environments, personal projects, and educational purposes.


📬 Contact


⭐ Star History

If SecMask helps your project, consider giving it a star! ⭐
