SecMask: Mixture-of-Experts Secret Detection


Production-ready secret detection and masking using Mixture-of-Experts (MoE) architecture.

SecMask combines two specialized NER models—a fast expert (DistilBERT, 512 tokens) for short contexts and a long expert (Longformer, 2048 tokens) for documents—with intelligent routing to achieve high-accuracy secret detection at low latency. Choose between rule-based routing or a learned MoE with a tiny 12KB gating network.

Note: This repository contains inference code and documentation. Models are hosted separately on HuggingFace.


🎯 Overview

SecMask detects and redacts sensitive information across multiple secret types:

  • AWS Access Keys (AKIA...)
  • GitHub Personal Access Tokens
  • JWT Tokens
  • API Keys
  • PEM Certificate Blocks
  • Kubernetes Secrets
  • Database Credentials

Architecture Options

SecMask offers two routing strategies:

Option 1: Heuristic Routing (Rule-Based)

  • Uses hand-crafted rules (token count, PEM blocks, etc.)
  • Fast and explainable
  • No additional model needed
  • 92.7% routing accuracy

Option 2: Learned Routing (True MoE)

  • Uses a trained 12KB gating network
  • Learns optimal routing from data
  • Combines both expert outputs with learned weights
  • 100% training accuracy, 92.7% test accuracy
  • Only +0.19ms latency overhead

Key Features

  • High Precision: 92.3% precision with the Fast + Filters configuration
  • Good Recall: 80.0% recall (F1 score: 0.857)
  • Fast Inference: 11ms P50 latency on CPU
  • High Throughput: 84 req/s on CPU
  • Dual Routing Modes: rule-based or learned MoE
  • Multi-Stage Pipeline: NER models plus deterministic filters for comprehensive coverage
  • Production Ready: tested, benchmarked, and documented
  • Easy Integration: simple CLI and Python API
  • Tiny MoE Gate: 12KB gating network (3,042 parameters)

Performance Note: The recommended configuration (Fast Expert + Filters) achieves 92.3% precision and 80.0% recall (F1: 0.857), competitive with commercial secret scanning tools. Post-processing filters improve precision by +12.3%. See BENCHMARK_RESULTS.md and CONFIGURATION_GUIDE.md for detailed metrics and usage recommendations.

Model Components

This system uses 3 separate models (download as needed):

| Component | HuggingFace Repo | Size | Purpose |
|---|---|---|---|
| Fast Expert | andrewandrewsen/distilbert-secret-masker | 265MB | Short texts (≤512 tokens) |
| Long Expert | andrewandrewsen/longformer-secret-masker | 592MB | Long documents (≤2048 tokens) |
| MoE Gate (optional) | andrewandrewsen/secretmask-gate | 12KB | Learned routing weights |

🚀 Quick Start

Installation

# Install dependencies
pip install transformers torch

# Clone repository (contains inference code)
git clone https://github.com/andrewandrewsen/secmask.git
cd secmask

Basic Usage

Heuristic Routing (Rule-Based):

# Mask secrets in text using rule-based routing
python infer_moe.py \
    --text "My AWS key: AKIAIOSFODNN7EXAMPLE" \
    --fast-model andrewandrewsen/distilbert-secret-masker \
    --routing-mode heuristic \
    --tau 0.80

# Output: My AWS key: [SECRET]

Learned Routing (True MoE):

# Mask secrets using learned gating network
python infer_moe.py \
    --text "My AWS key: AKIAIOSFODNN7EXAMPLE" \
    --fast-model andrewandrewsen/distilbert-secret-masker \
    --long-model andrewandrewsen/longformer-secret-masker \
    --routing-mode learned \
    --gate-model andrewandrewsen/secretmask-gate \
    --tau 0.80

# Output: My AWS key: [SECRET]


Python API:

from infer_moe import mask_text_moe

# Heuristic routing
result = mask_text_moe(
    "AWS key: AKIAIOSFODNN7EXAMPLE and password: hunter2",
    fast_model_dir="andrewandrewsen/distilbert-secret-masker",
    tau=0.80,
    routing_mode="heuristic"
)
print(result)  # "AWS key: [SECRET] and password: [SECRET]"

# Learned MoE routing
result = mask_text_moe(
    "AWS key: AKIAIOSFODNN7EXAMPLE and password: hunter2",
    fast_model_dir="andrewandrewsen/distilbert-secret-masker",
    long_model_dir="andrewandrewsen/longformer-secret-masker",
    tau=0.80,
    routing_mode="learned",
    gate_model_path="andrewandrewsen/secretmask-gate"
)
print(result)  # "AWS key: [SECRET] and password: [SECRET]"

Process Files:

# With heuristic routing
python infer_moe.py \
    --in config.yaml \
    --fast-model andrewandrewsen/distilbert-secret-masker \
    --routing-mode heuristic \
    --tau 0.80

# With learned MoE routing
python infer_moe.py \
    --in config.yaml \
    --fast-model andrewandrewsen/distilbert-secret-masker \
    --long-model andrewandrewsen/longformer-secret-masker \
    --routing-mode learned \
    --gate-model andrewandrewsen/secretmask-gate \
    --tau 0.80

📊 Performance Highlights

Secret Detection Performance (Fast + Filters Configuration)

| Metric | Value |
|---|---|
| F1 Score | 0.857 |
| Precision | 92.3% |
| Recall | 80.0% |
| P50 Latency | 11ms |
| P90 Latency | 14ms |
| P99 Latency | 17ms |
| Throughput | 84 req/s (CPU) |

Note: Metrics from the recommended Fast Expert + Filters configuration at τ=0.80. Post-processing filters improve precision by +12.3%. See BENCHMARK_RESULTS.md for comprehensive benchmarks and CONFIGURATION_GUIDE.md for usage recommendations.

Configuration Comparison

| Configuration | Precision | Recall | F1 Score | Use Case |
|---|---|---|---|---|
| Fast + Filters (recommended) | 92.3% | 80.0% | 0.857 | General purpose |
| Full MoE (Fast+Long+Filters) | 90.9% | 76.9% | 0.833 | Long documents |
| Fast NER Only | 80.0% | 80.0% | 0.800 | Development/testing |

Routing Performance

| Metric | Heuristic | Learned MoE | Difference |
|---|---|---|---|
| Routing Accuracy | 92.7% | 92.7% | Equal |
| Fast Expert Usage | 92.7% | 92.7% | Equal |
| Long Expert Usage | 7.3% | 7.3% | Equal |
| Latency Overhead | 0.065ms | 0.256ms | +0.19ms |
| Model Size | 0KB | 12KB | +12KB |
| Training Accuracy | N/A | 100% | - |

Key Insight: Learned MoE achieves identical routing decisions to heuristics with minimal overhead, validating the rule-based approach while providing a learned alternative.

See BENCHMARK_RESULTS.md for detailed performance analysis and CONFIGURATION_GUIDE.md for configuration-specific guidance.


🏗️ Architecture

SecMask uses a Mixture-of-Experts (MoE) approach with flexible routing:

Input Text
    ↓
┌─────────────┐
│   Chunking  │  Split long documents (480 token chunks)
└──────┬──────┘
       ↓
┌─────────────┐
│   Router    │  Choose mode:
│  Heuristic  │  • Heuristic: Rule-based
│     OR      │  • Learned: 12KB gating network
│  Learned    │
└──────┬──────┘
       ↓
   ┌───┴───┐
   ↓       ↓
┌──────┐ ┌──────┐
│ Fast │ │ Long │  Fast: DistilBERT (512 tokens, 265MB)
│Expert│ │Expert│  Long: Longformer (2048 tokens, 592MB)
└───┬──┘ └──┬───┘
    ↓       ↓
    └───┬───┘
        ↓
┌────────────┐
│  Combiner  │  Heuristic: single output
│            │  Learned: weighted combination
└─────┬──────┘
      ↓
┌────────────┐
│Post-Filters│  Guaranteed patterns (AKIA, github_pat_)
└─────┬──────┘
      ↓
Masked Output
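The chunking stage at the top of the pipeline can be sketched as follows. A whitespace split stands in for the real subword tokenizer, so the 480 figure counts words here rather than tokens, and `chunk` is an illustrative name rather than the actual infer_moe.py helper:

```python
def chunk(text: str, max_tokens: int = 480) -> list[str]:
    """Split text into pieces of at most max_tokens whitespace-separated
    units, so each piece fits the fast expert's 512-token window."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]
```

Each chunk is then routed and masked independently, and the masked chunks are rejoined into the output.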

Routing Strategies

Heuristic Routing (Rule-Based)

The router selects the optimal expert based on:

  1. Token Count: >480 tokens → Long Expert
  2. Pattern Detection: PEM blocks, K8s manifests → Long Expert
  3. Entropy Analysis: High entropy → Long Expert (likely encoded secrets)
  4. Structure: YAML/JSON → Long Expert (configuration files)
  5. Default: Fast Expert (lower latency)

Confidence Escalation: If the Fast Expert returns low confidence (<0.85), the router automatically escalates to the Long Expert for higher accuracy.
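The ordered rules above can be sketched as a small routing function. The specific thresholds (the 4.5-bit entropy cutoff, the crude YAML/JSON check) and all names here are illustrative assumptions, not the actual infer_moe.py internals; confidence escalation happens after the fast pass, outside this function:

```python
import math
import re

PEM_RE = re.compile(r"-----BEGIN .+?-----")
TOKEN_LIMIT = 480  # chunks above this go to the long expert

def shannon_entropy(text: str) -> float:
    """Bits per character; high values suggest encoded/random secrets."""
    if not text:
        return 0.0
    counts: dict[str, int] = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def route(text: str, approx_tokens: int) -> str:
    """Return 'long' or 'fast', applying the rules in the order above."""
    if approx_tokens > TOKEN_LIMIT:                 # 1. token count
        return "long"
    if PEM_RE.search(text) or "kind: Secret" in text:  # 2. patterns
        return "long"
    if shannon_entropy(text) > 4.5:                 # 3. entropy (assumed cutoff)
        return "long"
    if text.lstrip().startswith(("{", "---")):      # 4. YAML/JSON structure
        return "long"
    return "fast"                                   # 5. default: lower latency
```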

Learned Routing (True MoE)

Uses a trained 12KB gating network (3,042 parameters):

  1. Extract 10 features from text:

    • Token count, entropy, has PEM, has K8s secret
    • AWS/GitHub/JWT pattern counts
    • Base64 count, line count, avg line length
  2. Gating network predicts weights: [w_fast, w_long] (sum to 1.0)

    • Architecture: 3-layer MLP (10 → 64 → 32 → 2)
    • Training: 100% validation accuracy on 6,000 examples
  3. Run both experts and combine outputs using learned weights:

    final_output = w_fast * fast_expert_output + w_long * long_expert_output
  4. Performance: Same routing accuracy as heuristics (92.7%) with only +0.19ms overhead
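As a sketch of step 2's forward pass, here is the 10 → 64 → 32 → 2 MLP with a softmax head. The weights are random stand-ins (the trained checkpoint lives at andrewandrewsen/secretmask-gate); the layer shapes follow the README, while the ReLU activation choice is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 32)), np.zeros(32)
W3, b3 = rng.normal(size=(32, 2)), np.zeros(2)

def gate(features: np.ndarray) -> np.ndarray:
    """Map a 10-dim feature vector to [w_fast, w_long], summing to 1."""
    h = np.maximum(features @ W1 + b1, 0.0)  # ReLU
    h = np.maximum(h @ W2 + b2, 0.0)
    logits = h @ W3 + b3
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()

# Both experts run, and their outputs are mixed with the learned weights:
# final_output = w_fast * fast_expert_output + w_long * long_expert_output
w_fast, w_long = gate(rng.normal(size=10))
```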


📦 Installation

Prerequisites

  • Python 3.11+
  • PyTorch 2.0+
  • transformers 4.30+

Using Conda (Recommended)

# Create environment
conda create -n secmask python=3.11
conda activate secmask

# Install dependencies
pip install -r requirements.txt

Using venv

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

GPU Support

CUDA (NVIDIA):

pip install torch --index-url https://download.pytorch.org/whl/cu118

MPS (Apple Silicon):

# PyTorch automatically detects and uses MPS
# No additional installation needed

🔧 Usage

CLI Options

python infer_moe.py [OPTIONS]

Options:
  --text TEXT           Text to mask (or use --in for a file)
  --in FILE             Input file path
  --fast-model MODEL    HuggingFace model ID or local path
  --long-model MODEL    Long expert model (optional)
  --routing-mode MODE   Routing strategy: heuristic or learned
  --gate-model MODEL    Gating network for learned routing
  --tau FLOAT           Detection threshold (0.0-1.0, default: 0.80)
  --token TOKEN         HuggingFace token for private models
  --no-filters          Disable post-processing filters
  --no-escalation       Disable confidence-based escalation
  --device DEVICE       Device: cpu, cuda, mps (auto-detected if not set)
  --stats               Show routing and performance statistics

Python API

Basic Masking:

from infer_moe import mask_text_moe

masked_text = mask_text_moe(
    text="Your text here",
    fast_model_dir="andrewandrewsen/distilbert-secret-masker",
    long_model_dir="andrewandrewsen/longformer-secret-masker",  # Optional
    tau=0.80,
    enable_filters=True,
    enable_escalation=True,
    device="cpu"  # or "cuda", "mps"
)

With Statistics:

from infer_moe import mask_text_moe_with_stats

result = mask_text_moe_with_stats(
    text="Your text here",
    fast_model_dir="andrewandrewsen/distilbert-secret-masker",
    tau=0.80
)

print(result["masked_text"])
print(f"Chunks processed: {result['num_chunks']}")
print(f"Fast/Long routing: {result['num_fast']}/{result['num_long']}")
print(f"Escalated: {result['num_escalated']}")

Batch Processing

# Process multiple files
for file in logs/*.log; do
    python infer_moe.py --in "$file" \
        --fast-model andrewandrewsen/distilbert-secret-masker > "${file}.masked"
done

Piping

# From stdin
cat config.yaml | python infer_moe.py \
    --fast-model andrewandrewsen/distilbert-secret-masker

# Git diff scanning
git diff | python infer_moe.py \
    --fast-model andrewandrewsen/distilbert-secret-masker

🧪 Testing

Run the comprehensive test suite:

# Install test dependencies
pip install pytest

# Run all tests
pytest tests/ -v

# Run integration tests
python test_moe_comprehensive.py

Expected Output:

✅ Test 1: Short AWS Key - PASSED
✅ Test 2: PEM Block (Long Expert) - PASSED
✅ Test 3: Multiple Secrets - PASSED
✅ Test 4: Clean Text (No False Positives) - PASSED

🎉 All Tests Passed!

📖 Documentation


🔒 Security Guarantees

SecMask provides multiple layers of protection:

  1. NER Models: Trained to detect secrets in context (92.3% precision, 80% recall)
  2. Post-Processing Filters: Deterministic patterns for guaranteed detection (PEM blocks, K8s secrets, AWS patterns)
  3. Configurable Thresholds: Adjust tau to balance precision vs recall for your use case

See CONFIGURATION_GUIDE.md for security best practices and recommended configurations.


🚀 Use Cases

  • CI/CD Pipelines: Pre-commit hooks, GitHub Actions
  • Log Sanitization: Real-time and batch log processing
  • Configuration Auditing: Scan K8s manifests, Terraform files
  • API Security: Filter secrets from API responses
  • Documentation: Clean README files before open-sourcing
  • Security Scanning: Repository audits, compliance checks

See USE_CASES.md for detailed examples.


🎓 How It Works

Training Data

Each expert was trained on 2,000+ examples covering:

  • AWS keys, GitHub tokens, JWTs, API keys
  • PEM certificate blocks
  • Kubernetes secrets, database credentials
  • Synthetic + real-world samples

Models

Fast Expert (andrewandrewsen/distilbert-secret-masker):

  • Base: DistilBERT-base-uncased
  • Max Context: 512 tokens
  • Size: 268MB
  • Speed: ~6ms avg latency

Long Expert (andrewandrewsen/longformer-secret-masker):

  • Base: Longformer-base-4096
  • Max Context: 2048 tokens (trained on M3 Pro)
  • Size: 592MB
  • Speed: ~12ms avg latency

Post-Processing Filters

Deterministic filters for guaranteed detection:

  • AWS Key pattern: AKIA[0-9A-Z]{16}
  • GitHub PAT: github_pat_[0-9a-zA-Z]{22}_[0-9a-zA-Z]{59}
  • JWT: eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
  • PEM blocks: -----BEGIN .+-----
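The filter pass can be sketched like this: after the NER models run, the patterns listed above are masked unconditionally. The patterns are taken from the list, with the JWT character class written with a trailing hyphen so it is treated as a literal; `apply_filters` is an illustrative name, not the actual infer_moe.py API:

```python
import re

FILTERS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                      # AWS access key
    re.compile(r"github_pat_[0-9a-zA-Z]{22}_[0-9a-zA-Z]{59}"),            # GitHub PAT
    re.compile(r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT
    re.compile(r"-----BEGIN .+-----"),                                    # PEM header
]

def apply_filters(text: str, mask: str = "[SECRET]") -> str:
    """Replace every filter match with the mask token."""
    for pattern in FILTERS:
        text = pattern.sub(mask, text)
    return text
```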

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Areas for contribution:

  • Support for new secret types
  • Performance optimizations
  • Additional deployment examples
  • Documentation improvements
  • Bug fixes

📄 License

MIT License - see LICENSE file for details.

Base Model Licenses:

  • DistilBERT (distilbert-base-uncased): Apache 2.0 - © Hugging Face
  • Longformer (allenai/longformer-base-4096): Apache 2.0 - © Allen Institute for AI

Our fine-tuned models inherit and comply with Apache 2.0 license terms. The MIT license applies to the SecMask codebase, training scripts, and documentation.


🙏 Acknowledgments

  • Built with Transformers by Hugging Face
  • DistilBERT by Hugging Face (Apache 2.0)
  • Longformer by Allen Institute for AI (Apache 2.0)
  • Inspired by production secret scanning needs

Note: This is a learning/hobby project exploring Mixture-of-Experts architectures for secret detection. While functional and achieving competitive results (92.3% precision, 80% recall), it's not intended as an enterprise replacement for commercial tools. Use cases include development environments, personal projects, and educational purposes.


📬 Contact


⭐ Star History

If SecMask helps your project, consider giving it a star! ⭐
