This is a specialized ReAct agent built with LangGraph and the Model Context Protocol (MCP) to assist human agents with escalated Medicare insurance member conversations.
The agent analyzes conversations between members and automated systems, then provides structured guidance to human agents on how to respond. It uses documentation (welcome call scripts, FAQs, response templates) to generate compliant, empathetic responses.
- MCP Gateway Server: Manages MCP server processes and provides unified tool access
- File System MCP Server: Provides access to documentation in the docs/ directory
- ReAct Agent: Analyzes conversations and generates proposed responses
- State Management Tools: Local tools for reading/writing structured state
The agent uses a structured state system with:
- conversation_history: Messages between member and automated system
- escalation_context: Why escalated (reason, urgency, member sentiment)
- proposed_response: Agent's suggested message with reasoning, tone, and references
- accessed_documents: Tracking of documentation used
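For orientation, here is a minimal sketch of that state shape. It is a simplified TypedDict version (the repository's src/react_agent/state.py uses dataclasses per the customization notes later in this document); the field names follow the list above, everything else is illustrative.

```python
# Simplified, hypothetical sketch of the structured state (see src/react_agent/state.py for the real definitions).
from typing import Annotated, List, Optional, TypedDict

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages


class ConversationMessage(TypedDict):
    role: str        # "member" or "system"
    content: str
    timestamp: str


class EscalationContext(TypedDict):
    reason: str             # e.g. "member_frustrated"
    urgency: str            # e.g. "high"
    member_sentiment: str   # e.g. "frustrated"


class State(TypedDict, total=False):
    messages: Annotated[List[AnyMessage], add_messages]   # LLM/tool message log
    conversation_history: List[ConversationMessage]
    escalation_context: EscalationContext
    proposed_response: Optional[dict]                      # message, reasoning, tone, references
    accessed_documents: List[str]
```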
- retrieve_context: Get conversation history, escalation context, and preloaded docs confirmation in ONE call
- submit_response: Submit the final proposed message and track accessed documents in ONE call
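For illustration, a consolidated state tool can be declared roughly as below. This is a hedged sketch of the pattern (InjectedState plus a Command update), not the exact body of retrieve_context in tools.py.

```python
# Hypothetical sketch of a consolidated state tool; the real retrieve_context lives in src/react_agent/tools.py.
from typing import Annotated

from langchain_core.messages import ToolMessage
from langchain_core.tools import InjectedToolCallId, tool
from langgraph.prebuilt import InjectedState
from langgraph.types import Command


@tool
def retrieve_context(
    state: Annotated[dict, InjectedState],
    tool_call_id: Annotated[str, InjectedToolCallId],
) -> Command:
    """Return conversation history and escalation context in one tool call."""
    summary = (
        f"## Conversation History\n{state.get('conversation_history')}\n\n"
        f"## Escalation Context\n{state.get('escalation_context')}"
    )
    return Command(update={"messages": [ToolMessage(content=summary, tool_call_id=tool_call_id)]})
```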
All documentation is preloaded into the system prompt at startup for optimal performance:
- blueprint.md: Welcome call campaign script with structured talking points
- faq.md: Common member questions and templated responses
- samples.md: Live chat response templates for various scenarios
No MCP calls needed during the reasoning loop, resulting in 60-70% faster response times.
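A minimal sketch of what that preloading step might look like, assuming the load_documentation() helper described in the startup flow below (exact file layout and prompt text are illustrative):

```python
# Hypothetical sketch of docs preloading; the real logic lives in src/react_agent/docs_loader.py and prompts.py.
from pathlib import Path


def load_documentation(docs_dir: str = "docs") -> str:
    """Read the required docs once at import time and make them str.format()-safe."""
    required_docs = ["blueprint.md", "faq.md", "samples.md"]
    sections = []
    for name in required_docs:
        text = (Path(docs_dir) / name).read_text(encoding="utf-8")
        # Escape curly braces so the docs survive SYSTEM_PROMPT.format(system_time=...)
        sections.append(f"## {name}\n" + text.replace("{", "{{").replace("}", "}}"))
    return "\n\n".join(sections)


_PRELOADED_DOCS = load_documentation()
SYSTEM_PROMPT = (
    "You assist human agents with escalated Medicare member conversations.\n\n"
    + _PRELOADED_DOCS
    + "\n\nSystem time: {system_time}"
)
```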
This section explains how a request flows through the system from startup to response generation.
langgraph dev
└─> LangGraph Server starts (port 2024)
└─> Loads langgraph.json
└─> Imports react_agent.graph.py:graph
└─> MODULE INITIALIZATION
├─> prompts.py imports (line 3)
│ └─> load_documentation() executes (prompts.py:6)
│ ├─> Reads docs/blueprint.md
│ ├─> Reads docs/faq.md
│ ├─> Reads docs/samples.md
│ ├─> Escapes curly braces for format() safety
│ └─> Returns _PRELOADED_DOCS string
│ └─> Injected into SYSTEM_PROMPT (prompts.py:108)
│
├─> Configuration.load_from_langgraph_json() (graph.py:22-23)
│ └─> Sets mcp_gateway_url = "http://localhost:8808"
│
└─> asyncio.run(initialize_tools(config))
├─> Local tools: [retrieve_context, submit_response]
│
└─> MCP tools via mcp_client.list_tools()
└─> HTTP POST to gateway:8808/message
└─> Returns memory tools (filesystem not needed for docs)
Parallel Process - MCP Gateway (port 8808):
cd gateway && python3 -m mcp_gateway.server
├─> Loads gateway/config.json
├─> Spawns MCP server subprocesses
│ ├─> npx @modelcontextprotocol/server-filesystem ../docs
│ └─> npx @modelcontextprotocol/server-memory
└─> Listens on port 8808 for tool requests
When a request arrives at POST http://localhost:2024/runs/stream:
1. State Initialization
LangGraph Runtime
└─> Creates State from InputState (state.py)
├─> messages: [HumanMessage("Please analyze...")]
├─> conversation_history: [ConversationMessage(...), ...]
├─> escalation_context: EscalationContext(...)
├─> proposed_response: None
└─> accessed_documents: []
2. StateGraph Execution
The agent uses a ReAct (Reasoning and Acting) pattern:
__start__ → call_model → [route] → __end__
↑ ↓
└─── tools ──┘
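Wiring a graph of this shape in LangGraph typically looks like the sketch below; node and router names follow the walkthrough in this section, and the exact construction in graph.py may differ.

```python
# Hypothetical sketch of the ReAct graph wiring (see graph.py for the real construction).
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode

builder = StateGraph(State)                    # State as defined in state.py
builder.add_node("call_model", call_model)     # LLM reasoning step
builder.add_node("tools", ToolNode(TOOLS))     # executes requested tool calls
builder.add_edge(START, "call_model")
builder.add_conditional_edges("call_model", route_model_output, ["tools", END])
builder.add_edge("tools", "call_model")        # loop back after tool execution
graph = builder.compile()
```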
3. The ReAct Loop
Node: call_model (graph.py:25-75)
call_model(state, config)
│
├─> 1. Load Configuration
│ configuration = Configuration.from_runnable_config(config)
│ └─> Extracts: model, azure_*, system_prompt, etc.
│
├─> 2. Initialize LLM (lines 42-48)
│ model = load_chat_model(
│ model_name="azure/gpt-4", # or anthropic/openai/openrouter
│ azure_endpoint=...,
│ ...
│ ).bind_tools(TOOLS)
│ │
│ └─> utils.py:load_chat_model()
│ ├─> Parses "provider/model" pattern
│ └─> Routes to provider:
│ ├─> azure → AzureChatOpenAI(endpoint, deployment, key, version)
│ ├─> anthropic → ChatAnthropic(model)
│ ├─> openai → ChatOpenAI(model)
│ └─> openrouter → ChatOpenAI(base_url)
│
├─> 3. Format System Prompt
│ system_message = configuration.system_prompt.format(
│ system_time=datetime.now().isoformat()
│ )
│
├─> 4. Invoke LLM
│ response = await model.ainvoke([
│ {"role": "system", "content": system_message},
│ *state["messages"]
│ ])
│ │
│ └─> LLM decides:
│ ├─> Option A: Return text (no tools) → __end__
│ └─> Option B: Request tools → tools node
│
└─> 5. Return Response
return {"messages": [response]}
Conditional Routing: route_model_output (graph.py:90-110)
if not last_message.tool_calls:
    return "__end__"  # Finish execution
else:
    return "tools"  # Execute tools, continue loop
Node: tools (graph.py:83) - If tools requested
ToolNode(TOOLS)
└─> For each tool_call in response.tool_calls:
│
├─> retrieve_context (Consolidated State Tool)
│ └─> tools.py:122-178
│ ├─> Reads state["conversation_history"]
│ ├─> Reads state["escalation_context"]
│ ├─> Confirms preloaded docs available in system prompt
│ └─> Returns Command(update={
│ "messages": [ToolMessage(
│ content="# RETRIEVED CONTEXT\n\n## Conversation History...\n\n## Escalation Context...\n\n## Preloaded Documentation\nAll documentation has been preloaded..."
│ )]
│ })
│
├─> submit_response (Consolidated Output Tool)
│ └─> tools.py:181-230
│ ├─> Validates message, reasoning, tone, relevant_docs, key_points
│ ├─> Updates state["proposed_response"]
│ ├─> Updates state["accessed_documents"]
│ └─> Returns Command(update={...})
│
└─> MCP Tool (e.g., memory operations - if configured)
└─> tools.py:_create_tool_wrapper()
└─> mcp_client.call_tool("memory_operation", {...})
└─> HTTP POST to gateway:8808/message
└─> MCP Gateway routes to memory server
└─> Tool results appended to state["messages"]
└─> Edge: tools → call_model (LOOP CONTINUES)
4. Loop Termination
The agent cycles through call_model → tools → call_model until:
- Model returns a response without tool calls
- Recursion limit reached (default: 50)
- is_last_step=True safety check triggers
5. Final State & Response
Final State:
{
"messages": [
HumanMessage("Please analyze..."),
AIMessage("I'll check the docs", tool_calls=[...]),
ToolMessage("Blueprint contents: ..."),
AIMessage("Based on the docs, I'll set the response"),
ToolMessage("Response set successfully")
],
"proposed_response": {
"message": "I understand your frustration...",
"reasoning": "Member is frustrated due to...",
"suggested_tone": "empathetic_and_solution_focused",
"relevant_docs": ["samples.md#apologies"],
"key_points": [...]
},
"accessed_documents": ["docs/blueprint.md", "docs/samples.md"]
}
LangGraph streams response via Server-Sent Events:
- Event: metadata (run_id)
- Event: messages (each AI/Tool message)
- Event: debug (node execution)
- Event: values (final state)
- Event: end
1. State Injection for Local Tools
@tool
def get_conversation_history(
    state: Annotated[dict, InjectedState],  # LangGraph injects current state
) -> Command:
    return Command(update={"messages": [...]})  # Updates state
2. MCP Tool Wrapping
def _create_tool_wrapper(tool_def: Dict) -> BaseTool:
    async def wrapper(**kwargs):
        result = await mcp_client.call_tool(name, kwargs)  # HTTP to gateway
        return result
    return StructuredTool(name=..., coroutine=wrapper, args_schema=...)
3. Provider Abstraction
def load_chat_model(model_name: str, **provider_config):
    provider, model = model_name.split("/", 1)  # "azure/gpt-4"
    if provider == "azure":
        return AzureChatOpenAI(...)
    elif provider == "anthropic":
        return ChatAnthropic(...)
    # ... etc.
User HTTP Request
└─> LangGraph Server (port 2024)
└─> StateGraph.compile().ainvoke()
└─> __start__ node
└─> call_model node
├─> Configuration.from_runnable_config()
├─> load_chat_model()
│ └─> AzureChatOpenAI(...) or ChatAnthropic(...) or ChatOpenAI(...)
├─> model.bind_tools(TOOLS)
└─> model.ainvoke([system_prompt, *messages])
└─> route_model_output()
├─> IF no tool_calls → __end__
└─> IF tool_calls → tools node
└─> ToolNode(TOOLS)
├─> Local Tool → Command(update={...})
└─> MCP Tool → HTTP to gateway:8808
└─> MCP Gateway → MCP Server
└─> Tool execution → ToolMessage
└─> Edge: tools → call_model (LOOP)
This architecture enables:
- ✅ Separation of concerns: LLM reasoning separate from tool execution
- ✅ Extensibility: Easy to add new tools via MCP or local functions
- ✅ Provider flexibility: Swap LLM providers via configuration
- ✅ State management: Clean state updates via Command pattern
- ✅ Observability: LangSmith tracing of entire execution flow
- Python 3.11 or higher
- uv (recommended) or pip for package management
- Node.js and npm (for MCP servers)
- An API key for your chosen LLM provider:
- Anthropic (Claude models)
- OpenAI (GPT models)
- Azure OpenAI (Azure-hosted GPT models)
- OpenRouter (Access to multiple providers)
Create a .env file in the project root with your API keys:
# LLM Configuration - supports openai, anthropic, openrouter, azure
# Choose ONE provider by setting LLM_MODEL:
# Option 1: Anthropic (recommended)
LLM_MODEL=anthropic/claude-3-5-sonnet-20241022
ANTHROPIC_API_KEY=your_api_key_here
# Option 2: OpenAI
# LLM_MODEL=openai/gpt-4o
# OPENAI_API_KEY=your_openai_api_key
# Option 3: Azure OpenAI
# LLM_MODEL=azure/your-deployment-name
# AZURE_OPENAI_API_KEY=your_azure_api_key
# AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
# AZURE_OPENAI_API_VERSION=2024-10-21
# Option 4: OpenRouter
# LLM_MODEL=openrouter/anthropic/claude-3-5-sonnet-20241022
# OPENROUTER_API_KEY=your_openrouter_api_key
# OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
# LangSmith Configuration - for tracing and monitoring (optional)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langsmith_key_here
LANGCHAIN_PROJECT=ohl-agent
If using Azure OpenAI, you need to gather configuration from your Azure Portal:
- AZURE_OPENAI_ENDPOINT:
  - Go to your Azure OpenAI resource → "Keys and Endpoint"
  - Copy the "Endpoint" URL (e.g., https://your-resource.openai.azure.com/)
- AZURE_OPENAI_API_KEY:
  - Same location → Copy "KEY 1" or "KEY 2"
- Deployment Name (for LLM_MODEL):
  - Go to your Azure OpenAI resource → "Model deployments" (or Azure OpenAI Studio)
  - Find your deployment name (e.g., gpt-4, gpt-35-turbo, etc.)
  - Important: This is YOUR deployment name, not the model name
  - Set LLM_MODEL=azure/your-deployment-name
- AZURE_OPENAI_API_VERSION:
  - Use a stable version like 2024-10-21 (recommended)
  - See Azure OpenAI API versions for the latest
The MCP gateway needs access to documentation. Create a directory at the same level as the project:
# From the project root
cd ..
mkdir docs
# Add your documentation files (blueprint.md, faq.md, samples.md) to this directory
Using uv (recommended):
# Create and activate virtual environment for the agent
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the agent package
uv pip install -e .
# Create and activate virtual environment for the gateway
cd gateway
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the gateway package
uv pip install -e .
cd ..
Using pip:
# Create and activate virtual environment for the agent
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the agent package
pip install -e .
# Create and activate virtual environment for the gateway
cd gateway
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the gateway package
pip install -e .
cd ..
Update gateway/config.json to configure MCP servers:
{
"mcp": {
"servers": {
"filesystem": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"[fully qualified path to]/docs"
]
},
"memory": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-memory"
]
}
}
}
}
The agent uses MCP servers to access external resources. Here's what each server provides:
1. File System Server (Required)
- Purpose: Access to documentation files (blueprint.md, faq.md, samples.md)
- Path Configuration: Must point to a directory containing your documentation files
- Example: "/Users/username/docs"
- What to include:
  - blueprint.md - Welcome call scripts and talking points
  - faq.md - Common questions and answers
  - samples.md - Response templates and examples
2. Memory Server (Optional)
- Purpose: Stateful memory across conversations
- Use case: Remembering context from previous interactions
- Note: This is separate from LangGraph's built-in thread persistence
3. Provider Search (Optional - Advanced Feature)
If you want the agent to search and recommend medical providers, you need to:
A. Prepare Provider Data:
- Create a providers/ subdirectory in your filesystem MCP path
- Add provider CSV files organized by specialty (e.g., radiology.csv, cardiology.csv)
- CSV files must include these columns:
  - organization_name - Provider/facility name
  - address_1, city, state, postal_code - Full address
  - telephone_number - Contact phone
  - taxonomy_desc - Provider specialty/type
Example CSV structure:
organization_name,address_1,city,state,postal_code,telephone_number,taxonomy_desc
Mandell & Blau Radiology,140 Main St,Middletown,CT,06457,860-346-7400,Diagnostic Radiology
Stamford Radiological Associates,76 Progress Dr,Stamford,CT,06902,203-359-0130,Diagnostic Radiology
B. Update Gateway Config:
{
"mcp": {
"servers": {
"filesystem": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/path/to/your/data"
]
}
}
}
}
Where /path/to/your/data contains:
/path/to/your/data/
├── blueprint.md
├── faq.md
├── samples.md
└── providers/
├── radiology.csv
├── cardiology.csv
└── primary_care.csv
C. How Provider Search Works:
When a member needs a provider:
- Agent receives patient_data with the member's ZIP code
- Agent uses MCP filesystem tools to:
  - List files in the providers/ directory
  - Read relevant specialty CSV files
  - Parse provider data
- Agent estimates distance from the member's ZIP code (a rough heuristic; see the sketch after this list):
  - Same ZIP prefix (first 3 digits) = ~10-15 miles
  - Same city = ~20 miles
  - Same state = ~20-50 miles
- Agent recommends 3-5 closest providers with contact info
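The distance estimate is a simple heuristic; the sketch below illustrates the bands listed above and is not code from the agent (the agent reasons about this in natural language).

```python
# Rough, hypothetical illustration of the ZIP-proximity bands used to rank providers.
def estimate_distance_miles(member_zip: str, provider_zip: str, same_city: bool, same_state: bool) -> float:
    if member_zip[:3] == provider_zip[:3]:
        return 12.5           # same ZIP prefix (first 3 digits): ~10-15 miles
    if same_city:
        return 20.0           # same city: ~20 miles
    if same_state:
        return 35.0           # same state: ~20-50 miles
    return float("inf")       # out of state: deprioritize
```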
Important Notes:
- Provider search requires real provider data - the agent cannot generate fake providers
- Without provider CSV files, the agent will refer members to the plan's provider directory website
- Distance calculations are estimates based on ZIP code proximity, not precise GPS coordinates
- The agent will remind members to verify network status and call ahead
You need to run both servers. Open two terminal windows:
Terminal 1 - MCP Gateway:
cd gateway
source .venv/bin/activate # Activate the gateway virtual environment
python3 -m mcp_gateway.server
The gateway will start on port 8808 and connect to the File System and Memory MCP servers.
Terminal 2 - LangGraph Dev Server:
# From the project root
source .venv/bin/activate # Activate the agent virtual environment
langgraph dev
The LangGraph server will start on port 2024 and automatically:
- Load the agent graph
- Connect to the MCP gateway at http://localhost:8808
- Discover and load all available tools (local + MCP tools)
You should see output indicating:
- MCP Gateway started with filesystem and memory servers
- LangGraph server started and loaded tools
- Both servers ready to accept requests
The agent can be invoked via the LangGraph API or through LangGraph Studio.
Once both servers are running, LangGraph Studio will open in your browser at:
https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
You can interact with the agent through the Studio UI by providing Medicare input.
The agent is invoked via HTTP POST to http://localhost:2024/runs/stream with a structured payload.
IMPORTANT: The messages field is required in the input. This field contains the initial prompt that instructs the agent to analyze the conversation. Even though the Medicare-specific data is in conversation_history and escalation_context, the agent needs at least one message in the messages array to start processing.
{
"input": {
"messages": [
{
"content": "Please analyze this escalated conversation and provide guidance for the human agent.",
"type": "human"
}
],
"conversation_history": [
{
"role": "member",
"content": "I haven't received my ID card yet",
"timestamp": "2025-01-20T10:00:00Z"
},
{
"role": "system",
"content": "Your card was mailed on January 5th",
"timestamp": "2025-01-20T10:00:30Z"
},
{
"role": "member",
"content": "That was 3 weeks ago! I need it now!",
"timestamp": "2025-01-20T10:01:00Z"
}
],
"escalation_context": {
"reason": "member_frustrated",
"urgency": "high",
"member_sentiment": "frustrated"
}
},
"config": {
"tags": [],
"recursion_limit": 50,
"configurable": {}
},
"metadata": {},
"stream_mode": ["debug", "messages"],
"stream_subgraphs": true,
"assistant_id": "agent",
"interrupt_before": [],
"interrupt_after": [],
"multitask_strategy": "rollback"
}
The /runs/stream endpoint returns Server-Sent Events (SSE) that must be parsed to extract the agent's response. The stream includes multiple event types:
- metadata: Run metadata (run_id, etc.)
- messages: AI messages and tool calls as they're generated
- debug: Node execution lifecycle events (task start/completion)
- values: Complete state snapshots after each node execution
- error: Error events if execution fails
- end: Stream completion signal
Key Requirements:
- Parse SSE format: Events are separated by \n\n, with event: and data: lines
- Handle incremental updates: The agent state evolves as the stream progresses
- Extract final state: The proposed_response field appears in the final values event
For detailed information on consuming the streaming API, see:
Alternatively, use the /runs/wait endpoint for a simpler non-streaming response that returns the complete final state in a single HTTP response.
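As a rough illustration of both options, the sketch below posts a minimal version of the example payload and pulls proposed_response out of the final values event. It assumes the requests library and the payload shape shown above; any HTTP client with streaming support works.

```python
# Hypothetical client sketch: stream a run and read proposed_response from the last `values` snapshot.
import json

import requests

payload = {
    "assistant_id": "agent",
    "stream_mode": ["values"],
    "input": {
        "messages": [{"type": "human", "content": "Please analyze this escalated conversation "
                                                  "and provide guidance for the human agent."}],
        "conversation_history": [
            {"role": "member", "content": "I haven't received my ID card yet",
             "timestamp": "2025-01-20T10:00:00Z"},
        ],
        "escalation_context": {"reason": "member_frustrated", "urgency": "high",
                               "member_sentiment": "frustrated"},
    },
}

proposed_response = None
with requests.post("http://localhost:2024/runs/stream", json=payload, stream=True) as resp:
    event = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue                                   # blank line marks the end of an SSE event
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:") and event == "values":
            state = json.loads(line.split(":", 1)[1])
            proposed_response = state.get("proposed_response")   # the last snapshot wins

print(proposed_response)

# Simpler alternative: /runs/wait returns the final state in a single JSON response.
# final_state = requests.post("http://localhost:2024/runs/wait", json=payload).json()
```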
The agent returns a structured response in state.proposed_response:
{
"message": "I understand your frustration, [Member Name]. I'm truly sorry about the delay with your ID card. Let me help you right away. I can see that your card was mailed on January 5th, which is longer than our typical delivery time. I'd like to offer you two immediate solutions: First, I can help you print a temporary ID card from our member website right now, which you can use immediately. Second, I'll request a replacement card to be sent via expedited shipping. Would you like me to walk you through printing the temporary card?",
"reasoning": "Member is frustrated due to delayed ID card delivery beyond normal timeframe. Using empathetic opening from samples.md#apologies-to-members, acknowledging the delay, and offering immediate actionable solutions per blueprint.md#verify-plan-information. Providing both immediate (temporary card) and long-term (replacement) solutions to address urgency.",
"suggested_tone": "empathetic_and_solution_focused",
"confidence_score": 0.7,
"relevant_docs": [
"samples.md#apologies-to-members",
"blueprint.md#verify-plan-information",
"faq.md#id-card-issues"
],
"key_points": [
"Acknowledge frustration and apologize for delay",
"Explain the situation (card mailed but delayed)",
"Offer immediate solution (temporary card)",
"Offer long-term solution (expedited replacement)",
"Provide clear next steps"
]
}
The confidence_score field (0.0 to 1.0) represents the agent's confidence that the proposed response is appropriate and likely to resolve the member's issue. The UX can use this to determine when human review is most needed:
- 0.8 - 1.0 (High): Clear question with direct documentation match, positive sentiment
- 0.5 - 0.8 (Medium): Some complexity or minor documentation gaps, manageable sentiment
- 0.3 - 0.5 (Low): Significant complexity, member frustration, likely needs human intervention
- 0.0 - 0.3 (Very Low): Explicit request for human agent, high agitation, requires immediate escalation
Factors that lower confidence:
- Member agitation or explicit request for human agent (major impact)
- Documentation gaps or need for inferences (moderate impact)
- Complex multi-step resolution required (moderate impact)
- Ambiguity in member's request (moderate impact)
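A consuming UX might gate human review on this score. Below is a tiny, hypothetical illustration using the bands above; the actual routing policy is up to the integrating application.

```python
# Hypothetical UX-side gating on the agent's confidence_score.
def review_priority(proposed_response: dict) -> str:
    score = proposed_response.get("confidence_score", 0.0)
    if score >= 0.8:
        return "light_review"         # clear question, direct documentation match
    if score >= 0.5:
        return "standard_review"      # some complexity or minor documentation gaps
    if score >= 0.3:
        return "priority_review"      # likely needs human intervention
    return "immediate_escalation"     # explicit human request or high agitation
```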
gateway/config.json:
{
"mcp": {
"servers": {
"filesystem": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/Users/dan/code/ohl-agent/docs"
]
},
"memory": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-memory"
]
}
}
}
}
langgraph.json:
{
"dependencies": ["."],
"graphs": {
"agent": "./src/react_agent/graph.py:graph"
},
"env": ".env",
"mcp": {
"gateway_url": "http://localhost:8808"
}
}
- Preloaded Documentation: All docs loaded into the system prompt at startup for instant access
- 60-70% Faster Response Times: Eliminates HTTP round-trips to MCP gateway during reasoning
- 50-60% Cost Reduction: Fewer LLM calls due to consolidated tools
- Verbatim Language Usage: Agent uses exact phrases from documentation for consistency
- Tracks which documents were referenced in responses
- Data-Driven Constraints: Response length limits (50-100 words typical, max 125) based on analysis of actual documentation
- 89% Token Reduction: Reduced from 2,819 output tokens to ~200-300 tokens per response
- 91% Faster Generation: Response generation reduced from 34+ seconds to 3-5 seconds
- Focused Responses: Agent provides actionable, concise guidance rather than exhaustive explanations
- Automatic Checkpointing: LangGraph platform provides built-in persistence when running with langgraph dev
- Thread API Support: Use
/threads/{thread_id}/runs/streamendpoint for stateful conversations - Cross-Request Efficiency: State persists between requests within the same thread
- 2 Optimized Tools:
retrieve_contextandsubmit_responsereplace 4 separate tools - Single-Call Context Retrieval: Get conversation history, escalation context, and docs confirmation in one call
- Single-Call Response Submission: Set response and track documents in one call
- Reduces reasoning chain length and LLM token usage
- Clear separation between conversation history and escalation context
- Structured output with message, reasoning, tone, and references
- Easy integration with existing systems
- System prompt emphasizes required disclaimers
- Documentation includes compliance requirements
- Agent trained to include necessary legal language
- Tools use LangGraph's
InjectedStateandInjectedToolCallId - Returns
Commandobjects for state updates - Clean separation between local and MCP tools
You can easily switch between different LLM providers by updating the LLM_MODEL variable in your .env file:
Switch to Anthropic Claude:
LLM_MODEL=anthropic/claude-3-5-sonnet-20241022
# Ensure ANTHROPIC_API_KEY is set
Switch to OpenAI GPT:
LLM_MODEL=openai/gpt-4o
# Ensure OPENAI_API_KEY is set
Switch to Azure OpenAI:
LLM_MODEL=azure/your-deployment-name
# Ensure AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and AZURE_OPENAI_API_VERSION are set
Switch to OpenRouter:
LLM_MODEL=openrouter/anthropic/claude-3-5-sonnet-20241022
# or
LLM_MODEL=openrouter/openai/gpt-4o
# Ensure OPENROUTER_API_KEY is set
After changing LLM_MODEL, restart the LangGraph dev server for changes to take effect.
- Add markdown files to the docs/ directory
- Update load_documentation() in src/react_agent/docs_loader.py to include the new files in the required_docs list
- Update the system prompt in src/react_agent/prompts.py to reference the new documentation if needed
- Restart the LangGraph dev server - documentation is loaded at module initialization, not dynamically
Important: Unlike MCP-based approaches, documentation changes require a server restart to take effect.
- Update the ProposedResponse dataclass in src/react_agent/state.py
- Update the submit_response tool in src/react_agent/tools.py
- Update the system prompt to reflect the new structure
Update the docs_dir parameter in load_documentation() call or set DOCS_DIR environment variable:
Option 1: Modify prompts.py
# src/react_agent/prompts.py
_PRELOADED_DOCS = load_documentation(docs_dir="/path/to/your/docs")
Option 2: Use environment variable
# .env
DOCS_DIR=/path/to/your/docs
Then update docs_loader.py to read from os.getenv("DOCS_DIR") if provided.
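A minimal sketch of that override, assuming the load_documentation(docs_dir=...) signature shown above:

```python
# Hypothetical sketch: fall back to the DOCS_DIR environment variable when it is set.
import os

_PRELOADED_DOCS = load_documentation(docs_dir=os.getenv("DOCS_DIR", "docs"))
```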
# Unit tests
pytest tests/unit_tests/
# Integration tests
pytest tests/integration_tests/
Install LangGraph Studio (guide) and open this project. The agent will automatically:
- Connect to the local gateway server
- Discover available tools
- Make tools available for use in conversations
- Ensure MCP gateway is running on port 8808
- Check gateway/config.json for correct paths
- Review logs for tool initialization errors
- Verify tools return Command objects
- Check that InjectedState and InjectedToolCallId are properly annotated
- Ensure state field names match between tools and state classes
- Verify the File System MCP server path in gateway/config.json
- Test MCP gateway directly:
curl http://localhost:8808/tools
Problem: "The API deployment for this resource does not exist"
Solution: Your deployment name is incorrect. The deployment name in LLM_MODEL must match exactly what's in Azure.
- Go to Azure Portal → Your Azure OpenAI resource → "Model deployments"
- Find the actual deployment name (e.g., gpt-4, gpt-35-turbo, or your custom name)
- Update .env:
  LLM_MODEL=azure/your-actual-deployment-name
  AZURE_OPENAI_DEPLOYMENT=your-actual-deployment-name
- Common mistake: Using the resource name instead of the deployment name
  - ❌ Wrong: ohlazureaihubd0505198248 (resource name)
  - ✅ Correct: gpt-4 or gpt-35-turbo (deployment name)
Problem: Invalid or expired API key
Solution:
- Verify AZURE_OPENAI_API_KEY in .env is correct
- Check the key hasn't been regenerated in the Azure Portal
- Ensure no extra spaces or quotes around the key
Problem: API version is deprecated or invalid
Solution:
- Update AZURE_OPENAI_API_VERSION to a current version (e.g., 2024-10-21)
- See Azure OpenAI API versions for the latest
Problem: Incorrect endpoint URL format
Solution:
- Verify the endpoint format in .env:
  - ✅ Correct: https://your-resource.openai.azure.com/
  - ✅ Also valid: https://your-resource.services.ai.azure.com/
- Must end with /
- Must use https://
This project is licensed under the MIT License - see the LICENSE file for details.