A lightweight Google Gemini API-compatible proxy server that allows you to call OpenAI-compatible LLM services using the Gemini API format.
Currently, the Gemini CLI cannot easily use models other than Gemini. This Python tool was developed to fill that gap.
Usage Steps:
- Modify `config.json` to add your API provider configurations.
- Install dependencies and run `python gemini_proxy_for_kimi.py`.
- When prompted, select the API provider you want to use.
- Set environment variables:

  ```bash
  export GOOGLE_GEMINI_BASE_URL=http://localhost:8000/
  export GEMINI_API_KEY=sk-1234
  ```

- In Gemini CLI, run `/auth` and select "Use Gemini API Key".
⚠️ Important Note: The current version has been fully tested and optimized only on Moonshot Kimi. Other OpenAI-compatible services require your own testing and adjustments.
中文文档 | Chinese Documentation
- Complete API Compatibility - Supports all Gemini API endpoints
- Intelligent Format Conversion - Seamless Gemini ↔ OpenAI format conversion
- Streaming Response Support - Complete Server-Sent Events (SSE) streaming processing
- Function Calling Support - Bidirectional conversion of tool calls
- Multi-turn Conversations - Complete conversation history handling
- Model Mapping - Flexible model name mapping configuration
- Detailed Logging - Configurable access logs and detailed request logs
- Configuration Files - Unified management through JSON configuration files
- Automatic Retries - Automatically retries requests on failure
- Python 3.8+
- Dependencies: `fastapi`, `uvicorn`, `openai`

Install them with:

```bash
pip install -r requirements.txt
```

Create a `config.json` file, which now supports multiple providers:
```json
{
  "providers": [
    {
      "name": "Moonshot",
      "openai_api_key": "sk-1234",
      "openai_base_url": "https://api.moonshot.cn/v1",
      "daily_limit": -1,
      "model_mapping": {
        "gemini-2.5-pro": "kimi-k2-0711-preview",
        "gemini-2.5-flash": "moonshot-v1-auto"
      },
      "default_openai_model": "kimi-k2-0711-preview"
    },
    {
      "name": "OpenAI",
      "openai_api_key": "sk-5678",
      "openai_base_url": "https://api.openai.com/v1",
      "daily_limit": -1,
      "model_mapping": {
        "gemini-2.5-pro": "gpt-4o",
        "gemini-2.5-flash": "gpt-4o-mini"
      },
      "default_openai_model": "gpt-4o-mini"
    }
  ],
  "server": {
    "host": "0.0.0.0",
    "port": 8000,
    "log_level": "info"
  },
  "logging": {
    "enable_detailed_logs": false,
    "enable_access_logs": true,
    "log_directory": "logs"
  },
  "retry": {
    "max_retries": 999,
    "wait_fixed": 5
  }
}
```

When you start the service, you will be prompted to select a provider:
```bash
python gemini_proxy_for_kimi.py
```

The service will start at `http://0.0.0.0:8000`.
Basic text generation:

```bash
curl -X POST http://localhost:8000/v1beta/models/gemini-2.5-pro:generateContent \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Hello, how are you?"}]
    }],
    "generationConfig": {
      "temperature": 0.7,
      "maxOutputTokens": 200
    }
  }'
```

Streaming generation:

```bash
curl -X POST http://localhost:8000/v1beta/models/gemini-2.5-pro:streamGenerateContent \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Write a short story"}]
    }]
  }'
```

Token counting:

```bash
curl -X POST http://localhost:8000/v1beta/models/gemini-2.5-pro:countTokens \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Count tokens for this text"}]
    }]
  }'
```

Function calling:

```bash
curl -X POST http://localhost:8000/v1beta/models/gemini-2.5-pro:generateContent \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "What is the weather like in Beijing?"}]
    }],
    "tools": [{
      "functionDeclarations": [{
        "name": "get_weather",
        "description": "Get weather information for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"},
            "date": {"type": "string", "description": "Date in YYYY-MM-DD format"}
          },
          "required": ["city"]
        }
      }]
    }]
  }'
```

Example `config.json` with a single provider:

```json
{
  "providers": [
    {
      "name": "Moonshot",
      "openai_api_key": "sk-1234",
      "openai_base_url": "https://api.moonshot.cn/v1",
      "daily_limit": -1,
      "model_mapping": {
        "gemini-2.5-pro": "kimi-k2-0711-preview",
        "gemini-2.5-flash": "moonshot-v1-auto"
      },
      "default_openai_model": "kimi-k2-0711-preview"
    }
  ],
  "server": {
    "host": "0.0.0.0",
    "port": 8000,
    "log_level": "info"
  },
  "logging": {
    "enable_detailed_logs": false,
    "enable_access_logs": true,
    "log_directory": "logs"
  },
  "retry": {
    "max_retries": 999,
    "wait_fixed": 5
  }
}
```

| Option | Description | Default |
|---|---|---|
| `providers` | A list of API provider configurations. | `[]` |
| `providers[].name` | The name of the provider, shown for selection at startup. | `Unnamed Provider` |
| `providers[].openai_api_key` | OpenAI API key for the provider. | Required |
| `providers[].openai_base_url` | OpenAI API base URL for the provider. | `https://api.openai.com/v1` |
| `providers[].daily_limit` | Daily request limit for the provider; `-1` means unlimited. | `-1` |
| `providers[].model_mapping` | Gemini-to-OpenAI model mapping for the provider. | `{}` |
| `providers[].default_openai_model` | Default OpenAI model for the provider. | `gpt-3.5-turbo` |
| `server.host` | Listen address | `0.0.0.0` |
| `server.port` | Listen port | `8000` |
| `server.log_level` | Log level | `info` |
| `logging.enable_detailed_logs` | Enable detailed request logs | `false` |
| `logging.enable_access_logs` | Enable access logs | `true` |
| `logging.log_directory` | Log directory | `logs` |
| `retry.max_retries` | Maximum number of retries on failure | `3` |
| `retry.wait_fixed` | Fixed wait time between retries in seconds | `2` |
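The `model_mapping` and `default_openai_model` options interact in the obvious way: the requested Gemini model name is looked up in the mapping, falling back to the provider's default. A sketch of that lookup (the `resolve_model` name and data shape are illustrative, not the proxy's internals):

```python
def resolve_model(gemini_model, provider):
    """Return the OpenAI model for a requested Gemini model name,
    falling back to the provider's default_openai_model."""
    mapping = provider.get("model_mapping", {})
    return mapping.get(gemini_model,
                       provider.get("default_openai_model", "gpt-3.5-turbo"))


provider = {
    "model_mapping": {"gemini-2.5-pro": "kimi-k2-0711-preview"},
    "default_openai_model": "moonshot-v1-auto",
}
print(resolve_model("gemini-2.5-pro", provider))  # kimi-k2-0711-preview
print(resolve_model("gemini-1.5-pro", provider))  # moonshot-v1-auto (fallback)
```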
The service includes an automatic failover mechanism based on each provider's `daily_limit` to ensure high availability:

- Primary Provider: The service always tries the provider you selected at startup first.
- Unlimited Failover: If the primary provider reaches its daily limit, the service automatically switches to the first available provider with an unlimited quota (`"daily_limit": -1`).
- Limited Failover: If no unlimited providers are available, the service switches to the first available provider that still has remaining request quota.
- Service Unavailable: If all configured providers have reached their daily limits, the API returns a `503 Service Unavailable` error.
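The failover order above can be sketched as a pure selection function. The names and data shapes here are illustrative assumptions, not the proxy's internals:

```python
def pick_provider(providers, usage, preferred):
    """Select a provider following the documented failover order:
    preferred first, then unlimited providers, then any with quota left.
    `usage` maps provider name to today's request count."""
    def has_quota(p):
        limit = p.get("daily_limit", -1)
        return limit == -1 or usage.get(p["name"], 0) < limit

    # 1. The provider selected at startup, if it still has quota.
    for p in providers:
        if p["name"] == preferred and has_quota(p):
            return p
    # 2. The first provider with an unlimited quota.
    for p in providers:
        if p.get("daily_limit", -1) == -1:
            return p
    # 3. The first provider with remaining quota.
    for p in providers:
        if has_quota(p):
            return p
    return None  # all exhausted: the caller would answer 503


providers = [{"name": "A", "daily_limit": 10}, {"name": "B", "daily_limit": -1}]
print(pick_provider(providers, {"A": 10}, "A")["name"])  # B
```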
Moonshot Kimi (fully tested):

```json
{
  "openai_api_key": "sk-xxx",
  "openai_base_url": "https://api.moonshot.cn/v1",
  "model_mapping": {
    "gemini-2.5-pro": "kimi-k2-0711-preview",
    "gemini-2.5-flash": "kimi-k2-0711-preview"
  }
}
```

Note: The following services are theoretically compatible but require your own testing and adjustments. You can enable `enable_detailed_logs` in the config file to output detailed request and response information for targeted adaptation.

OpenAI:

```json
{
  "openai_api_key": "sk-xxx",
  "openai_base_url": "https://api.openai.com/v1",
  "model_mapping": {
    "gemini-2.5-pro": "gpt-4o",
    "gemini-2.5-flash": "gpt-4o-mini"
  }
}
```

Azure OpenAI:

```json
{
  "openai_api_key": "your-azure-key",
  "openai_base_url": "https://your-resource.openai.azure.com/openai/deployments/your-deployment",
  "model_mapping": {
    "gemini-2.5-pro": "gpt-4",
    "gemini-2.5-flash": "gpt-35-turbo"
  }
}
```

DeepSeek:

```json
{
  "openai_api_key": "sk-xxx",
  "openai_base_url": "https://api.deepseek.com/v1",
  "model_mapping": {
    "gemini-2.5-pro": "deepseek-chat",
    "gemini-2.5-flash": "deepseek-chat"
  }
}
```

Zhipu GLM:

```json
{
  "openai_api_key": "your-zhipu-key",
  "openai_base_url": "https://open.bigmodel.cn/api/paas/v4",
  "model_mapping": {
    "gemini-2.5-pro": "glm-4",
    "gemini-2.5-flash": "glm-4-flash"
  }
}
```

Ollama (local):

```json
{
  "openai_api_key": "ollama",
  "openai_base_url": "http://localhost:11434/v1",
  "model_mapping": {
    "gemini-2.5-pro": "llama3:8b",
    "gemini-2.5-flash": "llama3:8b"
  }
}
```

If you need to adapt other OpenAI-compatible services:
- Clone the project: `git clone`
- Modify configuration: update API endpoints and model mappings in `config.json`
- Test functionality: focus on streaming responses, function calls, and multi-turn conversations; enable `enable_detailed_logs` in the config file to output detailed request and response information for targeted adaptation
- Optimize code: adjust the conversion logic based on the target service's characteristics
Concise access logs show basic information for each request:

```
POST /v1beta/models/gemini-2.5-pro:generateContent - 200 - Model: gemini-2.5-pro - ID: abc12345 - 2.341s
POST /v1beta/models/gemini-2.5-pro:streamGenerateContent - 200 - Model: gemini-2.5-pro(stream) - ID: def67890 - 5.123s
```
Detailed logs capture the complete request/response conversion process (optional):

- `1_GEMINI_REQUEST` - Original Gemini request
- `2_OPENAI_REQUEST` - Converted OpenAI request
- `3_OPENAI_RESPONSE` - Raw OpenAI response
- `4_GEMINI_RESPONSE` - Final Gemini response
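The four stages map naturally onto one log file per step. A minimal sketch of such a logger follows; the file-naming scheme and `log_stage` helper are assumptions for illustration, not the proxy's actual log layout:

```python
import json
from pathlib import Path

STAGES = ("1_GEMINI_REQUEST", "2_OPENAI_REQUEST",
          "3_OPENAI_RESPONSE", "4_GEMINI_RESPONSE")


def log_stage(log_dir, request_id, stage, payload):
    """Write one conversion stage as pretty-printed JSON; the file name
    combines the request ID and the stage label."""
    assert stage in STAGES
    path = Path(log_dir) / f"{request_id}_{stage}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2))
    return path
```

With `enable_detailed_logs` on, correlating the four files by request ID makes it easy to see exactly where a conversion goes wrong for a new provider.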
- Install Gunicorn: `pip install gunicorn`
- Create a startup script `start.sh`:

  ```bash
  #!/bin/bash
  gunicorn gemini_proxy_for_kimi:app \
    --worker-class uvicorn.workers.UvicornWorker \
    --workers 4 \
    --bind 0.0.0.0:8000 \
    --access-logfile - \
    --error-logfile - \
    --log-level info
  ```

- Run it:

  ```bash
  chmod +x start.sh
  ./start.sh
  ```

Create `/etc/systemd/system/gemini-proxy.service`:

```ini
[Unit]
Description=Gemini API Proxy
After=network.target

[Service]
Type=exec
User=your-user
Group=your-group
WorkingDirectory=/path/to/your/app
ExecStart=/path/to/your/venv/bin/gunicorn gemini_proxy_for_kimi:app \
    --worker-class uvicorn.workers.UvicornWorker \
    --workers 4 \
    --bind 0.0.0.0:8000
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```

Start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable gemini-proxy
sudo systemctl start gemini-proxy
```

Create an Nginx configuration:
```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Support streaming responses
        proxy_buffering off;
        proxy_cache off;

        # Increase timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
}
```

- Worker Count: set based on CPU cores; `2 * CPU_CORES + 1` is recommended
- Memory Optimization:

  ```bash
  # Limit memory usage by recycling workers periodically
  gunicorn --max-requests 1000 --max-requests-jitter 100 ...
  ```

- Connection Pool: add connection pool settings in the configuration
- Caching: consider adding a Redis caching layer for responses
```bash
curl http://localhost:8000/health
```

Returns:

```json
{"status": "healthy", "service": "gemini-proxy"}
```

Use logrotate to manage log files:

```
# /etc/logrotate.d/gemini-proxy
/path/to/your/app/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    create 644 your-user your-group
    postrotate
        systemctl reload gemini-proxy
    endscript
}
```

Access logs include:
- Request method and path
- Response status code
- Model used
- Request ID
- Response time
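Those fields appear in a fixed order, so a log line can be split back apart mechanically. A parsing sketch follows; the regex assumes the access-log format shown in the example lines earlier and is not part of the proxy itself:

```python
import re

# Field order matches the documented access-log layout:
# METHOD PATH - STATUS - Model: NAME - ID: REQUEST_ID - SECONDSs
LOG_RE = re.compile(
    r"(?P<method>\S+) (?P<path>\S+) - (?P<status>\d+) - "
    r"Model: (?P<model>\S+) - ID: (?P<request_id>\S+) - (?P<seconds>[\d.]+)s"
)


def parse_access_log(line):
    """Extract method, path, status, model, request ID, and duration."""
    m = LOG_RE.search(line)
    return m.groupdict() if m else None


line = ("POST /v1beta/models/gemini-2.5-pro:generateContent - 200 - "
        "Model: gemini-2.5-pro - ID: abc12345 - 2.341s")
print(parse_access_log(line)["request_id"])  # abc12345
```

A parser like this is handy for ad-hoc latency analysis, e.g. averaging the `seconds` field per model.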
Enable detailed logs in `config.json`:

```json
{
  "logging": {
    "enable_detailed_logs": true,
    "enable_access_logs": true
  }
}
```

Start the development server:

```bash
python gemini_proxy_for_kimi.py
```

- View detailed logs: enable `enable_detailed_logs` to see the complete conversion process
- Model mapping testing: test the mapping with different Gemini model names
- Streaming response debugging: observe the SSE data streams
- Function call debugging: check the format conversion of tool calls
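For the tool-call path specifically, Gemini `functionDeclarations` and OpenAI `tools` carry the same information in different envelopes. A one-direction conversion sketch (the `to_openai_tools` name is illustrative, not the proxy's actual function):

```python
def to_openai_tools(gemini_tools):
    """Convert a Gemini `tools` list with `functionDeclarations`
    into the OpenAI `tools` format."""
    tools = []
    for tool in gemini_tools:
        for decl in tool.get("functionDeclarations", []):
            tools.append({
                "type": "function",
                "function": {
                    "name": decl["name"],
                    "description": decl.get("description", ""),
                    "parameters": decl.get("parameters", {}),
                },
            })
    return tools


gemini_tools = [{"functionDeclarations": [
    {"name": "get_weather", "parameters": {"type": "object"}}
]}]
print(to_openai_tools(gemini_tools)[0]["function"]["name"])  # get_weather
```

When debugging tool calls, comparing the `2_OPENAI_REQUEST` detailed log against output like this quickly shows whether the declaration conversion is the problem.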
Issues and Pull Requests are welcome!
```bash
git clone
cd gemini-proxy
pip install -r requirements.txt
```

- Follow PEP 8 code style
- Add appropriate type annotations
- Write clear docstrings
- Ensure backward compatibility
This project is licensed under the MIT License. See LICENSE file for details.
Q: Which Gemini API endpoints are supported?
A: `generateContent`, `streamGenerateContent`, `countTokens`, and the health endpoint.

Q: Which backend services are supported?
A: Currently only Moonshot Kimi is fully tested. Other OpenAI-compatible services are theoretically usable but require your own testing and adjustments.

Q: How do I adapt another OpenAI-compatible service?
A: 1) Clone the project code. 2) Modify the `config.json` configuration. 3) Focus on testing streaming responses, function calls, etc. 4) Adjust the code logic as needed.

Q: Why do other services need extra adaptation?
A: Different LLM services vary in API details, response formats, error handling, and so on, requiring targeted testing and optimization. Current efforts focus on complete Kimi adaptation.

Q: What should I check if streaming responses are not working?
A: Check whether the client correctly handles the `text/event-stream` format and ensure your network environment supports SSE.

Q: How can I handle higher load?
A: Increase the Gunicorn worker count, use load balancers, and consider adding caching layers.
⭐ If this project helps you, please give it a Star!