Model A/B testing, prompt management, and tracing for LLM applications. Zero latency, production-ready.
pip install fallom
import fallom
from openai import OpenAI
# Initialize Fallom once at app startup
fallom.init(api_key="your-api-key")
# Create a session for this conversation/request
session = fallom.session(
config_key="my-app",
session_id="session-123",
customer_id="user-456", # optional
)
# Wrap your LLM client
openai = session.wrap_openai(OpenAI())
# All LLM calls are now automatically traced!
response = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
Wrap any of these LLM clients:
# OpenAI
openai = session.wrap_openai(OpenAI())
# Anthropic
from anthropic import Anthropic
anthropic = session.wrap_anthropic(Anthropic())
# Google AI
import google.generativeai as genai
genai.configure(api_key="...")
model = genai.GenerativeModel("gemini-1.5-flash")
gemini = session.wrap_google_ai(model)
# OpenRouter (uses OpenAI SDK)
openrouter = session.wrap_openai(
OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key="your-openrouter-key",
)
)
Run A/B tests on models with zero latency. The same session always gets the same model (sticky assignment).
from fallom import models
# Get assigned model for this session
model = models.get("summarizer-config", session_id)
# Returns: "gpt-4o" or "claude-3-5-sonnet" based on your config weights
Or use the session's get_model() method:
session = fallom.session(
config_key="summarizer-config",
session_id=session_id,
)
# Get model for this session's config
model = await session.get_model(fallback="gpt-4o-mini")
Configs are versioned; use the latest version or pin a specific one:
# Use latest version (default)
model = models.get("my-config", session_id)
# Pin to specific version
model = models.get("my-config", session_id, version=2)Always provide a fallback so your app works even if Fallom is down:
model = models.get(
"my-config",
session_id,
fallback="gpt-4o-mini" # Used if config not found or Fallom unreachable
)
Override the weighted distribution for specific users or segments:
model = models.get(
"my-config",
session_id,
fallback="gpt-4o-mini",
customer_id="user-123", # For individual targeting
context={ # For rule-based targeting
"plan": "enterprise",
"region": "us-west"
}
)
Resilience guarantees (a short sketch follows this list):
- Short timeouts (1-2 seconds max)
- Background config sync (never blocks your requests)
- Graceful degradation (returns fallback on any error)
- Your app is never impacted by Fallom being down
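A minimal sketch of what these guarantees mean for calling code, reusing the fallback pattern above (the config key and helper name are illustrative); no try/except is needed around the lookup:
from fallom import models

def pick_model(session_id: str) -> str:
    # If the config is missing or Fallom is unreachable, models.get returns
    # the fallback within the SDK's short timeout instead of raising.
    return models.get(
        "my-config",
        session_id,
        fallback="gpt-4o-mini",
    )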
Manage prompts centrally and A/B test them with zero latency.
from fallom import prompts
# Get a managed prompt (with template variables)
prompt = prompts.get("onboarding", variables={
"user_name": "John",
"company": "Acme"
})
# Use the prompt with any LLM
response = openai.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": prompt.system},
{"role": "user", "content": prompt.user}
]
)
The prompt object contains the following fields (a short usage sketch follows this list):
- key: The prompt key
- version: The prompt version
- system: The system prompt (with variables replaced)
- user: The user template (with variables replaced)
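For example, these fields can be read directly off the returned object; a small sketch reusing the prompt fetched above:
# prompt was returned by prompts.get("onboarding", variables={...}) above
print(prompt.key, prompt.version)  # e.g. for your own logging
messages = [
    {"role": "system", "content": prompt.system},
    {"role": "user", "content": prompt.user},
]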
Run experiments on different prompt versions:
from fallom import prompts
# Get prompt from A/B test (sticky assignment based on session_id)
prompt = prompts.get_ab("onboarding-test", session_id, variables={
"user_name": "John"
})
# prompt.ab_test_key and prompt.variant_index are set
# for analytics in your dashboard
When you call prompts.get() or prompts.get_ab(), the next LLM call is automatically tagged with the prompt information:
# Get prompt - sets up auto-tagging for next LLM call
prompt = prompts.get("onboarding", variables={"user_name": "John"})
# This call is automatically tagged with prompt_key, prompt_version, etc.
response = openai.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": prompt.system},
{"role": "user", "content": prompt.user}
]
)
Sessions group related LLM calls together (e.g., a conversation or agent run):
session = fallom.session(
config_key="my-agent", # Groups traces in dashboard
session_id="session-123", # Conversation/request ID
customer_id="user-456", # Optional: end-user identifier
metadata={ # Optional: custom key-value metadata
"deployment": "dedicated",
"request_type": "transcript",
},
tags=["production", "high-priority"], # Optional: simple string tags
)
Sessions are isolated, so they are safe for concurrent requests:
import concurrent.futures
def handle_request(user_id: str, conversation_id: str):
session = fallom.session(
config_key="my-agent",
session_id=conversation_id,
customer_id=user_id,
)
openai = session.wrap_openai(OpenAI())
# This session's context is isolated
return openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
# Safe to run concurrently!
with concurrent.futures.ThreadPoolExecutor() as executor:
futures = [
executor.submit(handle_request, "user-1", "conv-1"),
executor.submit(handle_request, "user-2", "conv-2"),
]
Record business metrics that can't be captured automatically:
from fallom import trace
# Record custom metrics (requires session context via set_session)
trace.set_session("my-agent", session_id)
trace.span({
"outlier_score": 0.8,
"user_satisfaction": 4,
"conversion": True
})
# Or explicitly specify session (for batch jobs)
trace.span(
{"outlier_score": 0.8},
config_key="my-agent",
session_id="user123-convo456"
)
Configure Fallom through environment variables:
FALLOM_API_KEY=your-api-key
FALLOM_TRACES_URL=https://traces.fallom.com
FALLOM_CONFIGS_URL=https://configs.fallom.com
FALLOM_PROMPTS_URL=https://prompts.fallom.com
FALLOM_CAPTURE_CONTENT=true # Set to false for privacy mode
The same settings can be passed to fallom.init() directly:
fallom.init(
api_key="your-api-key", # Or use FALLOM_API_KEY env var
traces_url="...", # Override traces endpoint
configs_url="...", # Override configs endpoint
prompts_url="...", # Override prompts endpoint
capture_content=True, # Set False for privacy mode
debug=False, # Enable debug logging
)
For companies with strict data policies, disable prompt/completion capture:
# Via parameter
fallom.init(capture_content=False)
# Or via environment variable
# FALLOM_CAPTURE_CONTENT=false
In privacy mode, Fallom still tracks the following (a short sketch follows this list):
- ✅ Model used
- ✅ Token counts
- ✅ Latency
- ✅ Session/config context
- ✅ Prompt key/version (metadata only)
- ❌ Prompt content (not captured)
- ❌ Completion content (not captured)
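As an end-to-end sketch of privacy mode (keys and IDs are illustrative), combining the quickstart with capture_content=False:
import fallom
from openai import OpenAI

# Privacy mode: prompt/completion text is never sent to Fallom
fallom.init(api_key="your-api-key", capture_content=False)

session = fallom.session(config_key="my-app", session_id="session-123")
client = session.wrap_openai(OpenAI())

# Still traced: model, token counts, latency, and session context,
# but not the message contents below
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)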
fallom.init(): Initialize the SDK. Call this once at app startup.
fallom.session(): Create a session for tracing.
- config_key: Your config name from the dashboard
- session_id: Unique session/conversation ID
- customer_id: Optional user identifier
- metadata: Optional dict of custom metadata
- tags: Optional list of string tags
session.wrap_openai(): Wrap an OpenAI client for automatic tracing.
session.wrap_anthropic(): Wrap an Anthropic client for automatic tracing.
session.wrap_google_ai(): Wrap a Google AI GenerativeModel for automatic tracing.
session.get_model(): Get the model assignment for this session.
models.get(): Get the model assignment for a session.
prompts.get(): Get a managed prompt.
prompts.get_ab(): Get a prompt from an A/B test (sticky assignment).
trace.set_session(): Set trace context (legacy API for backwards compatibility).
trace.span(): Record custom business metrics.
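Putting these together, a minimal end-to-end sketch combining model assignment, a managed prompt, and tracing (the config and prompt keys are illustrative, and get_model is awaited as shown earlier):
import fallom
from openai import OpenAI
from fallom import prompts

fallom.init(api_key="your-api-key")

async def answer(session_id: str, user_name: str) -> str:
    session = fallom.session(config_key="my-app", session_id=session_id)
    client = session.wrap_openai(OpenAI())
    # Sticky A/B model assignment, with a fallback if Fallom is unreachable
    model = await session.get_model(fallback="gpt-4o-mini")
    # Managed prompt; the next LLM call is auto-tagged with its key/version
    prompt = prompts.get("onboarding", variables={"user_name": user_name, "company": "Acme"})
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": prompt.system},
            {"role": "user", "content": prompt.user},
        ],
    )
    return response.choices[0].message.content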
For backwards compatibility, you can still use the global set_session() API with auto-instrumentation:
import fallom
fallom.init()
from openai import OpenAI
client = OpenAI()
fallom.trace.set_session("my-agent", session_id)
# Calls are traced if OpenTelemetry instrumentation is installed
response = client.chat.completions.create(...)
However, we recommend using the new session-based API (an equivalent sketch follows this list) for:
- Better isolation in concurrent environments
- Explicit wrapping (no import order dependencies)
- Clearer code structure
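For comparison, a sketch of the same call using the session-based API (mirroring the quickstart; session_id is assumed to be defined as in the legacy snippet):
import fallom
from openai import OpenAI

fallom.init()

# Explicit session instead of global set_session state
session = fallom.session(config_key="my-agent", session_id=session_id)
client = session.wrap_openai(OpenAI())

# Traced via the wrapped client; no auto-instrumentation required
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)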
Run the test suite:
cd sdk/python-sdk
pip install pytest
pytest tests/ -v
License: MIT