# Save 30-50% on OpenAI & Anthropic API Costs

Drop-in replacement clients with automatic caching, compression, and GPU acceleration.
## Installation

```bash
git clone https://github.com/DeadManOfficial/token-optimization.git
cd token-optimization
pip install -r requirements.txt

# GPU support (optional)
pip install torch --index-url https://download.pytorch.org/whl/cu118
```

## Quick Start

```python
from src.auto_optimizer import OptimizedAnthropic, OptimizedOpenAI

# Anthropic (Claude)
client = OptimizedAnthropic(use_gpu_embeddings=True)
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

# OpenAI (GPT)
client = OptimizedOpenAI(use_gpu_embeddings=True)
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Check savings
print(client.get_stats())
```

## Features

| Feature | Description |
|---|---|
| Drop-in Replacement | No code changes needed |
| Smart Caching | MD5 exact-match + semantic deduplication (sketched below) |
| GPU Acceleration | CUDA-powered semantic search |
| Multi-Provider | Anthropic & OpenAI support |
| Analytics | Track savings & performance |
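
How the smart caching works in outline: each prompt is hashed with MD5 for exact-match lookups, and a semantic layer catches near-duplicate prompts that hash differently. Below is a minimal sketch of that two-level idea; the `TwoLevelCache` class and its `semantic_lookup` hook are illustrative assumptions, not the library's actual API (a semantic lookup sketch follows under Project Structure).

```python
import hashlib

class TwoLevelCache:
    """Illustrative two-level cache: exact MD5 match first, semantic fallback second."""

    def __init__(self, semantic_lookup=None):
        self.exact = {}                          # md5(prompt) -> cached response
        self.semantic_lookup = semantic_lookup   # optional near-duplicate search

    def _key(self, prompt: str) -> str:
        return hashlib.md5(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        # 1. Exact match: byte-identical prompts share an MD5 key.
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return hit
        # 2. Semantic match: paraphrased prompts, if a lookup is configured.
        return self.semantic_lookup(prompt) if self.semantic_lookup else None

    def put(self, prompt: str, response) -> None:
        self.exact[self._key(prompt)] = response
```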

## Performance

| Configuration | Savings | Use Case |
|---|---|---|
| CPU Caching | 20-30% | General usage |
| GPU Semantic | 40-50% | Agent frameworks |
| Mixed Workload | 30% avg | Production apps |
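
The configurations above correspond to the `use_gpu_embeddings` flag from the quick start. A hedged example of selecting each mode (passing `use_gpu_embeddings=False` is an assumption based on the flag shown in the quick start; savings figures come from the table):

```python
from src.auto_optimizer import OptimizedAnthropic

# CPU caching only: ~20-30% savings, no GPU required
client_cpu = OptimizedAnthropic(use_gpu_embeddings=False)

# GPU semantic caching: ~40-50% savings, best for agent frameworks
client_gpu = OptimizedAnthropic(use_gpu_embeddings=True)
```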

## GPU Requirements

- NVIDIA GPU (RTX recommended)
- CUDA 11.8+
- 4GB+ VRAM

The optimizer falls back to CPU automatically if no compatible GPU is available.
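
A minimal sketch of the kind of device check that fallback implies, assuming PyTorch is the GPU backend; `pick_device` and the VRAM threshold are illustrative, not the repository's actual code:

```python
import torch

def pick_device(min_vram_gb: float = 4.0) -> str:
    """Use CUDA only when a GPU with enough VRAM is present; else fall back to CPU."""
    if torch.cuda.is_available():
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        if total_gb >= min_vram_gb:
            return "cuda"
    return "cpu"

print(f"Embeddings device: {pick_device()}")
```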

## Project Structure

```
src/
├── auto_optimizer.py    # Drop-in API wrappers
├── token_optimizer.py   # 7 optimization techniques
└── gpu_embeddings.py    # GPU semantic caching
```
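
To illustrate what `gpu_embeddings.py` covers, here is a hedged sketch of GPU semantic caching built on `sentence-transformers`; the `SemanticCache` class, the model name, and the 0.95 similarity threshold are assumptions for illustration, not the repository's implementation:

```python
import torch
from sentence_transformers import SentenceTransformer, util

class SemanticCache:
    """Illustrative semantic cache: reuse responses for near-duplicate prompts."""

    def __init__(self, threshold: float = 0.95):
        device = "cuda" if torch.cuda.is_available() else "cpu"  # automatic CPU fallback
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device=device)
        self.threshold = threshold
        self.embeddings = []   # prompt embeddings (tensors)
        self.responses = []    # cached responses, parallel to embeddings

    def get(self, prompt: str):
        if not self.embeddings:
            return None
        query = self.model.encode(prompt, convert_to_tensor=True)
        scores = util.cos_sim(query, torch.stack(self.embeddings))[0]
        best = int(scores.argmax())
        # Only reuse a cached response above the similarity threshold.
        return self.responses[best] if float(scores[best]) >= self.threshold else None

    def put(self, prompt: str, response) -> None:
        self.embeddings.append(self.model.encode(prompt, convert_to_tensor=True))
        self.responses.append(response)
```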

## Related Projects

- mcp-auditor — Security & compliance auditor for Claude
- claude-canvas — External monitor for Claude Code
- AI-Updates — Daily AI intelligence briefs

## License

MIT