Production-ready secret detection and masking using Mixture-of-Experts (MoE) architecture.
SecMask combines two specialized NER models—a fast expert (DistilBERT, 512 tokens) for short contexts and a long expert (Longformer, 2048 tokens) for documents—with intelligent routing to achieve high-accuracy secret detection at low latency. Choose between rule-based routing or a learned MoE with a tiny 12KB gating network.
Note: This repository contains inference code and documentation. Models are hosted separately on HuggingFace.
SecMask detects and redacts sensitive information across multiple secret types:
- AWS Access Keys (AKIA...)
- GitHub Personal Access Tokens
- JWT Tokens
- API Keys
- PEM Certificate Blocks
- Kubernetes Secrets
- Database Credentials
SecMask offers two routing strategies:

Heuristic Routing (Rule-Based):
- Uses hand-crafted rules (token count, PEM blocks, etc.)
- Fast and explainable
- No additional model needed
- 92.7% routing accuracy

Learned Routing (True MoE):
- Uses a trained 12KB gating network
- Learns optimal routing from data
- Combines both expert outputs with learned weights
- 100% training accuracy, 92.7% test accuracy
- Only +0.19ms latency overhead
✅ High Precision: 92.3% precision with Fast + Filters configuration
✅ Good Recall: 80.0% recall (F1 Score: 0.857)
✅ Fast Inference: 11ms P50 latency on CPU
✅ High Throughput: 84 req/s on CPU
✅ Dual Routing Modes: Rule-based or learned MoE
✅ Multi-Stage Pipeline: NER models + deterministic filters for comprehensive coverage
✅ Production Ready: Tested, benchmarked, and documented
✅ Easy Integration: Simple CLI and Python API
✅ Tiny MoE Gate: Only 12KB gating network (3,042 parameters)
Performance Note: The recommended configuration (Fast Expert + Filters) achieves 92.3% precision and 80.0% recall (F1: 0.857), competitive with commercial secret scanning tools. Post-processing filters improve precision by +12.3%. See BENCHMARK_RESULTS.md and CONFIGURATION_GUIDE.md for detailed metrics and usage recommendations.
This system uses 3 separate models (download as needed):
| Component | HuggingFace Repo | Size | Purpose |
|---|---|---|---|
| Fast Expert | andrewandrewsen/distilbert-secret-masker | 265MB | Short texts (≤512 tokens) |
| Long Expert | andrewandrewsen/longformer-secret-masker | 592MB | Long documents (≤2048 tokens) |
| MoE Gate (Optional) | andrewandrewsen/secretmask-gate | 12KB | Learned routing weights |
# Install dependencies
pip install transformers torch
# Clone repository (contains inference code)
git clone https://github.com/andrewandrewsen/secmask.git
cd secmask

Heuristic Routing (Rule-Based):
# Mask secrets in text using rule-based routing
python infer_moe.py \
--text "My AWS key: AKIAIOSFODNN7EXAMPLE" \
--fast-model andrewandrewsen/distilbert-secret-masker \
--routing-mode heuristic \
--tau 0.80
# Output: My AWS key: [SECRET]

Learned Routing (True MoE):
# Mask secrets using learned gating network
python infer_moe.py \
--text "My AWS key: AKIAIOSFODNN7EXAMPLE" \
--fast-model andrewandrewsen/distilbert-secret-masker \
--long-model andrewandrewsen/longformer-secret-masker \
--routing-mode learned \
--gate-model andrewandrewsen/secretmask-gate \
--tau 0.80
# Output: My AWS key: [SECRET]

Output:
My AWS key: [SECRET]
Python API:
from infer_moe import mask_text_moe
# Heuristic routing
result = mask_text_moe(
"AWS key: AKIAIOSFODNN7EXAMPLE and password: hunter2",
fast_model_dir="andrewandrewsen/distilbert-secret-masker",
tau=0.80,
routing_mode="heuristic"
)
print(result) # "AWS key: [SECRET] and password: [SECRET]"
# Learned MoE routing
result = mask_text_moe(
"AWS key: AKIAIOSFODNN7EXAMPLE and password: hunter2",
fast_model_dir="andrewandrewsen/distilbert-secret-masker",
long_model_dir="andrewandrewsen/longformer-secret-masker",
tau=0.80,
routing_mode="learned",
gate_model_path="andrewandrewsen/secretmask-gate"
)
print(result) # "AWS key: [SECRET] and password: [SECRET]"

Process Files:
# With heuristic routing
python infer_moe.py \
--in config.yaml \
--fast-model andrewandrewsen/distilbert-secret-masker \
--routing-mode heuristic \
--tau 0.80
# With learned MoE routing
python infer_moe.py \
--in config.yaml \
--fast-model andrewandrewsen/distilbert-secret-masker \
--long-model andrewandrewsen/longformer-secret-masker \
--routing-mode learned \
--gate-model andrewandrewsen/secretmask-gate \
--tau 0.80

| Metric | Value |
|---|---|
| F1 Score | 0.857 |
| Precision | 92.3% |
| Recall | 80.0% |
| P50 Latency | 11ms |
| P90 Latency | 14ms |
| P99 Latency | 17ms |
| Throughput | 84 req/s (CPU) |
Note: Metrics from the recommended Fast Expert + Filters configuration at τ=0.80. Post-processing filters improve precision by +12.3%. See BENCHMARK_RESULTS.md for comprehensive benchmarks and CONFIGURATION_GUIDE.md for usage recommendations.
| Configuration | Precision | Recall | F1 Score | Use Case |
|---|---|---|---|---|
| Fast + Filters (RECOMMENDED) | 92.3% | 80.0% | 0.857 | General purpose |
| Full MoE (Fast+Long+Filters) | 90.9% | 76.9% | 0.833 | Long documents |
| Fast NER Only | 80.0% | 80.0% | 0.800 | Development/testing |
| Metric | Heuristic | Learned MoE | Difference |
|---|---|---|---|
| Routing Accuracy | 92.7% | 92.7% | Equal |
| Fast Expert Usage | 92.7% | 92.7% | Equal |
| Long Expert Usage | 7.3% | 7.3% | Equal |
| Latency Overhead | 0.065ms | 0.256ms | +0.19ms |
| Model Size | 0KB | 12KB | +12KB |
| Training Accuracy | N/A | 100% | - |
Key Insight: Learned MoE achieves identical routing decisions to heuristics with minimal overhead, validating the rule-based approach while providing a learned alternative.
See BENCHMARK_RESULTS.md for detailed performance analysis and CONFIGURATION_GUIDE.md for configuration-specific guidance.
SecMask uses a Mixture-of-Experts (MoE) approach with flexible routing:
Input Text
↓
┌─────────────┐
│ Chunking │ Split long documents (480 token chunks)
└──────┬──────┘
↓
┌─────────────┐
│ Router │ Choose mode:
│ Heuristic │ • Heuristic: Rule-based
│ OR │ • Learned: 12KB gating network
│ Learned │
└──────┬──────┘
↓
┌───┴───┐
↓ ↓
┌──────┐ ┌──────┐
│ Fast │ │ Long │ Fast: DistilBERT (512 tokens, 265MB)
│Expert│ │Expert│ Long: Longformer (2048 tokens, 592MB)
└───┬──┘ └──┬───┘
↓ ↓
└───┬───┘
↓
┌────────────┐
│ Combiner │ Heuristic: single output
│ │ Learned: weighted combination
└─────┬──────┘
↓
┌────────────┐
│Post-Filters│ Guaranteed patterns (AKIA, github_pat_)
└─────┬──────┘
↓
Masked Output
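As a rough illustration of the chunking stage, a whitespace-token splitter looks like this (a sketch only: the actual pipeline chunks on model-tokenizer tokens, and the function name is hypothetical):

```python
def chunk_tokens(text: str, max_tokens: int = 480) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace tokens.

    Illustrative only: SecMask's real chunker operates on model
    tokenizer tokens, not whitespace-separated words.
    """
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]
```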
The router selects the optimal expert based on:
- Token Count: >480 tokens → Long Expert
- Pattern Detection: PEM blocks, K8s manifests → Long Expert
- Entropy Analysis: High entropy → Long Expert (likely encoded secrets)
- Structure: YAML/JSON → Long Expert (configuration files)
- Default: Fast Expert (lower latency)
Confidence Escalation: If the Fast Expert returns low confidence (<0.85), the text is automatically escalated to the Long Expert for higher accuracy.
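The heuristic criteria above can be sketched as a small routing function (illustrative only: the function name, the K8s check, and the 4.5 bits/char entropy threshold are assumptions, not the actual infer_moe.py internals):

```python
import math

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    counts = {c: s.count(c) for c in set(s)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def route(text: str, token_count: int) -> str:
    """Pick an expert using the rule-based criteria described above."""
    if token_count > 480:
        return "long"                # exceeds the fast expert's window
    if "-----BEGIN" in text:         # PEM blocks
        return "long"
    if "kind: Secret" in text:       # Kubernetes secret manifests
        return "long"
    if shannon_entropy(text) > 4.5:  # high entropy: likely encoded secrets
        return "long"
    return "fast"                    # default: lower latency
```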
Uses a trained 12KB gating network (3,042 parameters):

1. Feature extraction: 10 features are computed from the text:
   - Token count, entropy, has PEM, has K8s secret
   - AWS/GitHub/JWT pattern counts
   - Base64 count, line count, avg line length
2. Weight prediction: the gating network predicts weights [w_fast, w_long] (summing to 1.0):
   - Architecture: 3-layer MLP (10 → 64 → 32 → 2)
   - Training: 100% validation accuracy on 6,000 examples
3. Expert combination: both experts run and their outputs are combined using the learned weights:
   final_output = w_fast * fast_expert_output + w_long * long_expert_output
4. Performance: same routing accuracy as heuristics (92.7%) with only +0.19ms latency overhead
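A minimal sketch of the gate's forward pass, with random placeholder weights standing in for the trained 12KB checkpoint (the real weights live in andrewandrewsen/secretmask-gate; the helper names here are hypothetical):

```python
import math
import random

random.seed(0)

def layer(n_in: int, n_out: int):
    """Random placeholder weights; the real gate is trained."""
    return ([[random.gauss(0, 0.1) for _ in range(n_out)] for _ in range(n_in)],
            [0.0] * n_out)

# 3-layer MLP matching the stated architecture: 10 -> 64 -> 32 -> 2.
(W1, b1), (W2, b2), (W3, b3) = layer(10, 64), layer(64, 32), layer(32, 2)

def dense(x, W, b, relu=True):
    out = [sum(xi * wij for xi, wij in zip(x, col)) + bj
           for col, bj in zip(zip(*W), b)]
    return [max(v, 0.0) for v in out] if relu else out

def gate(features):
    """Map 10 text features to expert weights [w_fast, w_long] via softmax."""
    logits = dense(dense(dense(features, W1, b1), W2, b2), W3, b3, relu=False)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    return [e / sum(exps) for e in exps]

# Example 10-feature vector: token count, entropy, has_pem, has_k8s,
# aws/github/jwt pattern counts, base64 count, line count, avg line length.
w_fast, w_long = gate([120, 3.1, 0, 0, 1, 0, 0, 2, 8, 41.0])
```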
- Python 3.11+
- PyTorch 2.0+
- transformers 4.30+
# Create environment
conda create -n secmask python=3.11
conda activate secmask
# Install dependencies
pip install -r requirements.txt

Or with venv:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

CUDA (NVIDIA):

pip install torch --index-url https://download.pytorch.org/whl/cu118

MPS (Apple Silicon):
# PyTorch automatically detects and uses MPS
# No additional installation needed

python infer_moe.py [OPTIONS]
Options:
--text TEXT Text to mask (or use --in for file)
--in FILE Input file path
--fast-model MODEL HuggingFace model ID or local path
--long-model MODEL Long expert model (optional)
--tau FLOAT Detection threshold (0.0-1.0, default: 0.80)
--token TOKEN HuggingFace token for private models
--no-filters Disable post-processing filters
--no-escalation Disable confidence-based escalation
--device DEVICE Device: cpu, cuda, mps (auto-detect if not set)
--stats Show routing and performance statistics

Basic Masking:
from infer_moe import mask_text_moe
masked_text = mask_text_moe(
text="Your text here",
fast_model_dir="andrewandrewsen/distilbert-secret-masker",
long_model_dir="andrewandrewsen/longformer-secret-masker", # Optional
tau=0.80,
enable_filters=True,
enable_escalation=True,
device="cpu" # or "cuda", "mps"
)

With Statistics:
from infer_moe import mask_text_moe_with_stats
result = mask_text_moe_with_stats(
text="Your text here",
fast_model_dir="andrewandrewsen/distilbert-secret-masker",
tau=0.80
)
print(result["masked_text"])
print(f"Chunks processed: {result['num_chunks']}")
print(f"Fast/Long routing: {result['num_fast']}/{result['num_long']}")
print(f"Escalated: {result['num_escalated']}")

# Process multiple files
for file in logs/*.log; do
python infer_moe.py --in "$file" \
--fast-model andrewandrewsen/distilbert-secret-masker > "${file}.masked"
done

# From stdin
cat config.yaml | python infer_moe.py \
--fast-model andrewandrewsen/distilbert-secret-masker
# Git diff scanning
git diff | python infer_moe.py \
--fast-model andrewandrewsen/distilbert-secret-masker

Run the comprehensive test suite:
# Install test dependencies
pip install pytest
# Run all tests
pytest tests/ -v
# Run integration tests
python test_moe_comprehensive.py

Expected Output:
✅ Test 1: Short AWS Key - PASSED
✅ Test 2: PEM Block (Long Expert) - PASSED
✅ Test 3: Multiple Secrets - PASSED
✅ Test 4: Clean Text (No False Positives) - PASSED
🎉 All Tests Passed!
- BENCHMARKS.md - Performance metrics, hardware comparisons
- BENCHMARK_RESULTS.md - Latest comprehensive benchmark results
- CONFIGURATION_GUIDE.md - Configuration recommendations and best practices
- USE_CASES.md - Real-world applications with code examples
- EXAMPLES.md - Quick reference code snippets
- DEPLOYMENT.md - Production deployment guide
- FAQ.md - Frequently asked questions
- CONTRIBUTING.md - Contribution guidelines
SecMask provides multiple layers of protection:
- NER Models: Trained to detect secrets in context (92.3% precision, 80% recall)
- Post-Processing Filters: Deterministic patterns for guaranteed detection (PEM blocks, K8s secrets, AWS patterns)
- Configurable Thresholds: Adjust tau to balance precision vs recall for your use case
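To see how tau trades recall for precision, here is a toy post-filter over hypothetical (start, end, confidence) detections; raising tau drops low-confidence spans (a sketch, not the actual thresholding code in infer_moe.py):

```python
def mask_detections(text, detections, tau=0.80, mask="[SECRET]"):
    """Replace detected spans whose confidence >= tau, working right to
    left so earlier character offsets stay valid after each edit."""
    for start, end, score in sorted(detections, reverse=True):
        if score >= tau:
            text = text[:start] + mask + text[end:]
    return text

text = "key=AKIAIOSFODNN7EXAMPLE pwd=hunter2"
dets = [(4, 24, 0.97), (29, 36, 0.62)]  # (start, end, confidence)
mask_detections(text, dets, tau=0.80)   # masks only the high-confidence AWS key
mask_detections(text, dets, tau=0.50)   # masks both detections
```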
See CONFIGURATION_GUIDE.md for security best practices and recommended configurations.
- CI/CD Pipelines: Pre-commit hooks, GitHub Actions
- Log Sanitization: Real-time and batch log processing
- Configuration Auditing: Scan K8s manifests, Terraform files
- API Security: Filter secrets from API responses
- Documentation: Clean README files before open-sourcing
- Security Scanning: Repository audits, compliance checks
See USE_CASES.md for detailed examples.
Models trained on 2000+ examples per expert:
- AWS keys, GitHub tokens, JWTs, API keys
- PEM certificate blocks
- Kubernetes secrets, database credentials
- Synthetic + real-world samples
Fast Expert (andrewandrewsen/distilbert-secret-masker):
- Base: DistilBERT-base-uncased
- Max Context: 512 tokens
- Size: 268MB
- Speed: ~6ms avg latency
Long Expert (andrewandrewsen/longformer-secret-masker):
- Base: Longformer-base-4096
- Max Context: 2048 tokens (trained on M3 Pro)
- Size: 592MB
- Speed: ~12ms avg latency
Deterministic filters for guaranteed detection:
- AWS Key pattern: `AKIA[0-9A-Z]{16}`
- GitHub PAT: `github_pat_[0-9a-zA-Z]{22}_[0-9a-zA-Z]{59}`
- JWT: `eyJ[A-Za-z0-9-_]+\.eyJ[A-Za-z0-9-_]+\.[A-Za-z0-9-_]+`
- PEM blocks: `-----BEGIN .+-----`
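These patterns can be combined into a small deterministic pass with Python's re (patterns copied from the list above; the helper name and mask token handling are illustrative):

```python
import re

# Deterministic patterns from the list above; matches are always masked
# regardless of NER confidence. Note the PEM pattern, as listed, matches
# only the header line of a block.
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                      # AWS access key
    re.compile(r"github_pat_[0-9a-zA-Z]{22}_[0-9a-zA-Z]{59}"),            # GitHub PAT
    re.compile(r"eyJ[A-Za-z0-9-_]+\.eyJ[A-Za-z0-9-_]+\.[A-Za-z0-9-_]+"),  # JWT
    re.compile(r"-----BEGIN .+-----"),                                    # PEM header
]

def apply_filters(text: str, mask: str = "[SECRET]") -> str:
    """Replace every deterministic pattern match with the mask token."""
    for pattern in PATTERNS:
        text = pattern.sub(mask, text)
    return text
```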
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas for contribution:
- Support for new secret types
- Performance optimizations
- Additional deployment examples
- Documentation improvements
- Bug fixes
MIT License - see LICENSE file for details.
Base Model Licenses:
- DistilBERT (`distilbert-base-uncased`): Apache 2.0, © Hugging Face
- Longformer (`allenai/longformer-base-4096`): Apache 2.0, © Allen Institute for AI
Our fine-tuned models inherit and comply with Apache 2.0 license terms. The MIT license applies to the SecMask codebase, training scripts, and documentation.
- Built with Transformers by Hugging Face
- DistilBERT by Hugging Face (Apache 2.0)
- Longformer by Allen Institute for AI (Apache 2.0)
- Inspired by production secret scanning needs
Note: This is a learning/hobby project exploring Mixture-of-Experts architectures for secret detection. While functional and achieving competitive results (92.3% precision, 80% recall), it's not intended as an enterprise replacement for commercial tools. Use cases include development environments, personal projects, and educational purposes.
- GitHub Issues: andrewandrewsen/secmask/issues
- Discussions: andrewandrewsen/secmask/discussions
- HuggingFace: andrewandrewsen
If SecMask helps your project, consider giving it a star! ⭐