This is a specialized ReAct agent built with LangGraph and the Model Context Protocol (MCP) to assist human agents with escalated Medicare insurance member conversations.
The agent analyzes conversations between members and automated systems, then provides structured guidance to human agents on how to respond. It uses documentation (welcome call scripts, FAQs, response templates) to generate compliant, empathetic responses.
- MCP Gateway Server: Manages MCP server processes and provides unified tool access
- File System MCP Server: Provides access to documentation in the docs/ directory
- ReAct Agent: Analyzes conversations and generates proposed responses
- State Management Tools: Local tools for reading/writing structured state
The agent uses a structured state system with:
- conversation_history: Messages between member and automated system
- escalation_context: Why escalated (reason, urgency, member sentiment)
- proposed_response: Agent's suggested message with reasoning, tone, and references
- accessed_documents: Tracking of documentation used
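For orientation, here is a minimal sketch of that state shape. It is a simplified TypedDict version (the repository's src/react_agent/state.py uses dataclasses per the customization notes later in this document); the field names follow the list above, everything else is illustrative.

```python
# Simplified, hypothetical sketch of the structured state (see src/react_agent/state.py for the real definitions).
from typing import Annotated, List, Optional, TypedDict

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages


class ConversationMessage(TypedDict):
    role: str        # "member" or "system"
    content: str
    timestamp: str


class EscalationContext(TypedDict):
    reason: str             # e.g. "member_frustrated"
    urgency: str            # e.g. "high"
    member_sentiment: str   # e.g. "frustrated"


class State(TypedDict, total=False):
    messages: Annotated[List[AnyMessage], add_messages]   # LLM/tool message log
    conversation_history: List[ConversationMessage]
    escalation_context: EscalationContext
    proposed_response: Optional[dict]                      # message, reasoning, tone, references
    accessed_documents: List[str]
```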
- retrieve_context: Get conversation history, escalation context, and preloaded docs confirmation in ONE call
- submit_response: Submit the final proposed message and track accessed documents in ONE call
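For illustration, a consolidated state tool can be declared roughly as below. This is a hedged sketch of the pattern (InjectedState plus a Command update), not the exact body of retrieve_context in tools.py.

```python
# Hypothetical sketch of a consolidated state tool; the real retrieve_context lives in src/react_agent/tools.py.
from typing import Annotated

from langchain_core.messages import ToolMessage
from langchain_core.tools import InjectedToolCallId, tool
from langgraph.prebuilt import InjectedState
from langgraph.types import Command


@tool
def retrieve_context(
    state: Annotated[dict, InjectedState],
    tool_call_id: Annotated[str, InjectedToolCallId],
) -> Command:
    """Return conversation history and escalation context in one tool call."""
    summary = (
        f"## Conversation History\n{state.get('conversation_history')}\n\n"
        f"## Escalation Context\n{state.get('escalation_context')}"
    )
    return Command(update={"messages": [ToolMessage(content=summary, tool_call_id=tool_call_id)]})
```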
All documentation is preloaded into the system prompt at startup for optimal performance:
- blueprint.md: Welcome call campaign script with structured talking points
- faq.md: Common member questions and templated responses
- samples.md: Live chat response templates for various scenarios
No MCP calls needed during the reasoning loop, resulting in 60-70% faster response times.
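A minimal sketch of what that preloading step might look like, assuming the load_documentation() helper described in the startup flow below (exact file layout and prompt text are illustrative):

```python
# Hypothetical sketch of docs preloading; the real logic lives in src/react_agent/docs_loader.py and prompts.py.
from pathlib import Path


def load_documentation(docs_dir: str = "docs") -> str:
    """Read the required docs once at import time and make them str.format()-safe."""
    required_docs = ["blueprint.md", "faq.md", "samples.md"]
    sections = []
    for name in required_docs:
        text = (Path(docs_dir) / name).read_text(encoding="utf-8")
        # Escape curly braces so the docs survive SYSTEM_PROMPT.format(system_time=...)
        sections.append(f"## {name}\n" + text.replace("{", "{{").replace("}", "}}"))
    return "\n\n".join(sections)


_PRELOADED_DOCS = load_documentation()
SYSTEM_PROMPT = (
    "You assist human agents with escalated Medicare member conversations.\n\n"
    + _PRELOADED_DOCS
    + "\n\nSystem time: {system_time}"
)
```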
This section explains how a request flows through the system from startup to response generation.
langgraph dev
└─> LangGraph Server starts (port 2024)
└─> Loads langgraph.json
└─> Imports react_agent.graph.py:graph
└─> MODULE INITIALIZATION
├─> prompts.py imports (line 3)
│ └─> load_documentation() executes (prompts.py:6)
│ ├─> Reads docs/blueprint.md
│ ├─> Reads docs/faq.md
│ ├─> Reads docs/samples.md
│ ├─> Escapes curly braces for format() safety
│ └─> Returns _PRELOADED_DOCS string
│ └─> Injected into SYSTEM_PROMPT (prompts.py:108)
│
├─> Configuration.load_from_langgraph_json() (graph.py:22-23)
│ └─> Sets mcp_gateway_url = "http://localhost:8808"
│
└─> asyncio.run(initialize_tools(config))
├─> Local tools: [retrieve_context, submit_response]
│
└─> MCP tools via mcp_client.list_tools()
└─> HTTP POST to gateway:8808/message
└─> Returns memory tools (filesystem not needed for docs)
Parallel Process - MCP Gateway (port 8808):
cd gateway && python3 -m mcp_gateway.server
├─> Loads gateway/config.json
├─> Spawns MCP server subprocesses
│ ├─> npx @modelcontextprotocol/server-filesystem ../docs
│ └─> npx @modelcontextprotocol/server-memory
└─> Listens on port 8808 for tool requests
When a request arrives at POST http://localhost:2024/runs/stream:
1. State Initialization
LangGraph Runtime
└─> Creates State from InputState (state.py)
├─> messages: [HumanMessage("Please analyze...")]
├─> conversation_history: [ConversationMessage(...), ...]
├─> escalation_context: EscalationContext(...)
├─> proposed_response: None
└─> accessed_documents: []
2. StateGraph Execution
The agent uses a ReAct (Reasoning and Acting) pattern:
__start__ → call_model → [route] → __end__
↑ ↓
└─── tools ──┘
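Wiring a graph of this shape in LangGraph typically looks like the sketch below; node and router names follow the walkthrough in this section, and the exact construction in graph.py may differ.

```python
# Hypothetical sketch of the ReAct graph wiring (see graph.py for the real construction).
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode

builder = StateGraph(State)                    # State as defined in state.py
builder.add_node("call_model", call_model)     # LLM reasoning step
builder.add_node("tools", ToolNode(TOOLS))     # executes requested tool calls
builder.add_edge(START, "call_model")
builder.add_conditional_edges("call_model", route_model_output, ["tools", END])
builder.add_edge("tools", "call_model")        # loop back after tool execution
graph = builder.compile()
```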
3. The ReAct Loop
Node: call_model (graph.py:25-75)
call_model(state, config)
│
├─> 1. Load Configuration
│ configuration = Configuration.from_runnable_config(config)
│ └─> Extracts: model, azure_*, system_prompt, etc.
│
├─> 2. Initialize LLM (lines 42-48)
│ model = load_chat_model(
│ model_name="azure/gpt-4", # or anthropic/openai/openrouter
│ azure_endpoint=...,
│ ...
│ ).bind_tools(TOOLS)
│ │
│ └─> utils.py:load_chat_model()
│ ├─> Parses "provider/model" pattern
│ └─> Routes to provider:
│ ├─> azure → AzureChatOpenAI(endpoint, deployment, key, version)
│ ├─> anthropic → ChatAnthropic(model)
│ ├─> openai → ChatOpenAI(model)
│ └─> openrouter → ChatOpenAI(base_url)
│
├─> 3. Format System Prompt
│ system_message = configuration.system_prompt.format(
│ system_time=datetime.now().isoformat()
│ )
│
├─> 4. Invoke LLM
│ response = await model.ainvoke([
│ {"role": "system", "content": system_message},
│ *state["messages"]
│ ])
│ │
│ └─> LLM decides:
│ ├─> Option A: Return text (no tools) → __end__
│ └─> Option B: Request tools → tools node
│
└─> 5. Return Response
return {"messages": [response]}
Conditional Routing: route_model_output (graph.py:90-110)
if not last_message.tool_calls:
    return "__end__"  # Finish execution
else:
    return "tools"  # Execute tools, continue loop
Node: tools (graph.py:83) - If tools requested
ToolNode(TOOLS)
└─> For each tool_call in response.tool_calls:
│
├─> retrieve_context (Consolidated State Tool)
│ └─> tools.py:122-178
│ ├─> Reads state["conversation_history"]
│ ├─> Reads state["escalation_context"]
│ ├─> Confirms preloaded docs available in system prompt
│ └─> Returns Command(update={
│ "messages": [ToolMessage(
│ content="# RETRIEVED CONTEXT\n\n## Conversation History...\n\n## Escalation Context...\n\n## Preloaded Documentation\nAll documentation has been preloaded..."
│ )]
│ })
│
├─> submit_response (Consolidated Output Tool)
│ └─> tools.py:181-230
│ ├─> Validates message, reasoning, tone, relevant_docs, key_points
│ ├─> Updates state["proposed_response"]
│ ├─> Updates state["accessed_documents"]
│ └─> Returns Command(update={...})
│
└─> MCP Tool (e.g., memory operations - if configured)
└─> tools.py:_create_tool_wrapper()
└─> mcp_client.call_tool("memory_operation", {...})
└─> HTTP POST to gateway:8808/message
└─> MCP Gateway routes to memory server
└─> Tool results appended to state["messages"]
└─> Edge: tools → call_model (LOOP CONTINUES)
4. Loop Termination
The agent cycles through call_model → tools → call_model until:
- Model returns a response without tool calls
- Recursion limit reached (default: 50)
- is_last_step=True safety check triggers
5. Final State & Response
Final State:
{
"messages": [
HumanMessage("Please analyze..."),
AIMessage("I'll check the docs", tool_calls=[...]),
ToolMessage("Blueprint contents: ..."),
AIMessage("Based on the docs, I'll set the response"),
ToolMessage("Response set successfully")
],
"proposed_response": {
"message": "I understand your frustration...",
"reasoning": "Member is frustrated due to...",
"suggested_tone": "empathetic_and_solution_focused",
"relevant_docs": ["samples.md#apologies"],
"key_points": [...]
},
"accessed_documents": ["docs/blueprint.md", "docs/samples.md"]
}
LangGraph streams response via Server-Sent Events:
- Event: metadata (run_id)
- Event: messages (each AI/Tool message)
- Event: debug (node execution)
- Event: values (final state)
- Event: end
1. State Injection for Local Tools
@tool
def get_conversation_history(
    state: Annotated[dict, InjectedState],  # LangGraph injects current state
) -> Command:
    return Command(update={"messages": [...]})  # Updates state
2. MCP Tool Wrapping
def _create_tool_wrapper(tool_def: Dict) -> BaseTool:
    async def wrapper(**kwargs):
        result = await mcp_client.call_tool(name, kwargs)  # HTTP to gateway
        return result
    return StructuredTool(name=..., coroutine=wrapper, args_schema=...)
3. Provider Abstraction
def load_chat_model(model_name: str, **provider_config):
    provider, model = model_name.split("/", 1)  # "azure/gpt-4"
    if provider == "azure":
        return AzureChatOpenAI(...)
    elif provider == "anthropic":
        return ChatAnthropic(...)
    # ... etc.
User HTTP Request
└─> LangGraph Server (port 2024)
└─> StateGraph.compile().ainvoke()
└─> __start__ node
└─> call_model node
├─> Configuration.from_runnable_config()
├─> load_chat_model()
│ └─> AzureChatOpenAI(...) or ChatAnthropic(...) or ChatOpenAI(...)
├─> model.bind_tools(TOOLS)
└─> model.ainvoke([system_prompt, *messages])
└─> route_model_output()
├─> IF no tool_calls → __end__
└─> IF tool_calls → tools node
└─> ToolNode(TOOLS)
├─> Local Tool → Command(update={...})
└─> MCP Tool → HTTP to gateway:8808
└─> MCP Gateway → MCP Server
└─> Tool execution → ToolMessage
└─> Edge: tools → call_model (LOOP)
This architecture enables:
- ✅ Separation of concerns: LLM reasoning separate from tool execution
- ✅ Extensibility: Easy to add new tools via MCP or local functions
- ✅ Provider flexibility: Swap LLM providers via configuration
- ✅ State management: Clean state updates via Command pattern
- ✅ Observability: LangSmith tracing of entire execution flow
- Python 3.11 or higher
- uv (recommended) or pip for package management
- Node.js and npm (for MCP servers)
- An API key for your chosen LLM provider:
- Anthropic (Claude models)
- OpenAI (GPT models)
- Azure OpenAI (Azure-hosted GPT models)
- OpenRouter (Access to multiple providers)
Create a .env file in the project root with your API keys:
# LLM Configuration - supports openai, anthropic, openrouter, azure
# Choose ONE provider by setting LLM_MODEL:
# Option 1: Anthropic (recommended)
LLM_MODEL=anthropic/claude-3-5-sonnet-20241022
ANTHROPIC_API_KEY=your_api_key_here
# Option 2: OpenAI
# LLM_MODEL=openai/gpt-4o
# OPENAI_API_KEY=your_openai_api_key
# Option 3: Azure OpenAI
# LLM_MODEL=azure/your-deployment-name
# AZURE_OPENAI_API_KEY=your_azure_api_key
# AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
# AZURE_OPENAI_API_VERSION=2024-10-21
# Option 4: OpenRouter
# LLM_MODEL=openrouter/anthropic/claude-3-5-sonnet-20241022
# OPENROUTER_API_KEY=your_openrouter_api_key
# OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
# LangSmith Configuration - for tracing and monitoring (optional)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langsmith_key_here
LANGCHAIN_PROJECT=ohl-agent
If using Azure OpenAI, you need to gather configuration from your Azure Portal:
- AZURE_OPENAI_ENDPOINT:
  - Go to your Azure OpenAI resource → "Keys and Endpoint"
  - Copy the "Endpoint" URL (e.g., https://your-resource.openai.azure.com/)
- AZURE_OPENAI_API_KEY:
  - Same location → Copy "KEY 1" or "KEY 2"
- Deployment Name (for LLM_MODEL):
  - Go to your Azure OpenAI resource → "Model deployments" (or Azure OpenAI Studio)
  - Find your deployment name (e.g., gpt-4, gpt-35-turbo, etc.)
  - Important: This is YOUR deployment name, not the model name
  - Set LLM_MODEL=azure/your-deployment-name
- AZURE_OPENAI_API_VERSION:
  - Use a stable version like 2024-10-21 (recommended)
  - See Azure OpenAI API versions for the latest
The MCP gateway needs access to documentation. Create a directory at the same level as the project:
# From the project root
cd ..
mkdir docs
# Add your documentation files (blueprint.md, faq.md, samples.md) to this directory
Using uv (recommended):
# Create and activate virtual environment for the agent
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the agent package
uv pip install -e .
# Create and activate virtual environment for the gateway
cd gateway
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the gateway package
uv pip install -e .
cd ..
Using pip:
# Create and activate virtual environment for the agent
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the agent package
pip install -e .
# Create and activate virtual environment for the gateway
cd gateway
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the gateway package
pip install -e .
cd ..
Update gateway/config.json to configure MCP servers:
{
"mcp": {
"servers": {
"filesystem": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"[fully qualified path to]/docs"
]
},
"memory": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-memory"
]
}
}
}
}
The agent uses MCP servers to access external resources. Here's what each server provides:
1. File System Server (Required)
- Purpose: Access to documentation files (blueprint.md, faq.md, samples.md)
- Path Configuration: Must point to a directory containing your documentation files
- Example: "/Users/username/docs"
- What to include:
  - blueprint.md - Welcome call scripts and talking points
  - faq.md - Common questions and answers
  - samples.md - Response templates and examples
2. Memory Server (Optional)
- Purpose: Stateful memory across conversations
- Use case: Remembering context from previous interactions
- Note: This is separate from LangGraph's built-in thread persistence
3. Provider Search (Optional - Advanced Feature)
If you want the agent to search and recommend medical providers, you need to:
A. Prepare Provider Data:
- Create a providers/ subdirectory in your filesystem MCP path
- Add provider CSV files organized by specialty (e.g., radiology.csv, cardiology.csv)
- CSV files must include these columns:
  - organization_name - Provider/facility name
  - address_1, city, state, postal_code - Full address
  - telephone_number - Contact phone
  - taxonomy_desc - Provider specialty/type
Example CSV structure:
organization_name,address_1,city,state,postal_code,telephone_number,taxonomy_desc
Mandell & Blau Radiology,140 Main St,Middletown,CT,06457,860-346-7400,Diagnostic Radiology
Stamford Radiological Associates,76 Progress Dr,Stamford,CT,06902,203-359-0130,Diagnostic Radiology
B. Update Gateway Config:
{
"mcp": {
"servers": {
"filesystem": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/path/to/your/data"
]
}
}
}
}
Where /path/to/your/data contains:
/path/to/your/data/
├── blueprint.md
├── faq.md
├── samples.md
└── providers/
├── radiology.csv
├── cardiology.csv
└── primary_care.csv
C. How Provider Search Works:
When a member needs a provider:
- Agent receives patient_data with the member's ZIP code
- Agent uses MCP filesystem tools to:
  - List files in the providers/ directory
  - Read relevant specialty CSV files
  - Parse provider data
- Agent estimates distance from the member's ZIP code (a rough heuristic; see the sketch after this list):
  - Same ZIP prefix (first 3 digits) = ~10-15 miles
  - Same city = ~20 miles
  - Same state = ~20-50 miles
- Agent recommends 3-5 closest providers with contact info
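The distance estimate is a simple heuristic; the sketch below illustrates the bands listed above and is not code from the agent (the agent reasons about this in natural language).

```python
# Rough, hypothetical illustration of the ZIP-proximity bands used to rank providers.
def estimate_distance_miles(member_zip: str, provider_zip: str, same_city: bool, same_state: bool) -> float:
    if member_zip[:3] == provider_zip[:3]:
        return 12.5           # same ZIP prefix (first 3 digits): ~10-15 miles
    if same_city:
        return 20.0           # same city: ~20 miles
    if same_state:
        return 35.0           # same state: ~20-50 miles
    return float("inf")       # out of state: deprioritize
```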
Important Notes:
- Provider search requires real provider data - the agent cannot generate fake providers
- Without provider CSV files, the agent will refer members to the plan's provider directory website
- Distance calculations are estimates based on ZIP code proximity, not precise GPS coordinates
- The agent will remind members to verify network status and call ahead
You need to run both servers. Open two terminal windows:
Terminal 1 - MCP Gateway:
cd gateway
source .venv/bin/activate # Activate the gateway virtual environment
python3 -m mcp_gateway.server
The gateway will start on port 8808 and connect to the File System and Memory MCP servers.
Terminal 2 - LangGraph Dev Server:
# From the project root
source .venv/bin/activate # Activate the agent virtual environment
langgraph dev
The LangGraph server will start on port 2024 and automatically:
- Load the agent graph
- Connect to the MCP gateway at http://localhost:8808
- Discover and load all available tools (local + MCP tools)
You should see output indicating:
- MCP Gateway started with filesystem and memory servers
- LangGraph server started and loaded tools
- Both servers ready to accept requests
The agent can be invoked via the LangGraph API or through LangGraph Studio.
Once both servers are running, LangGraph Studio will open in your browser at:
https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
You can interact with the agent through the Studio UI by providing Medicare input.
The agent is invoked via HTTP POST to http://localhost:2024/runs/stream with a structured payload.
IMPORTANT: The messages field is required in the input. This field contains the initial prompt that instructs the agent to analyze the conversation. Even though the Medicare-specific data is in conversation_history and escalation_context, the agent needs at least one message in the messages array to start processing.
{
"input": {
"messages": [
{
"content": "Please analyze this escalated conversation and provide guidance for the human agent.",
"type": "human"
}
],
"conversation_history": [
{
"role": "member",
"content": "I haven't received my ID card yet",
"timestamp": "2025-01-20T10:00:00Z"
},
{
"role": "system",
"content": "Your card was mailed on January 5th",
"timestamp": "2025-01-20T10:00:30Z"
},
{
"role": "member",
"content": "That was 3 weeks ago! I need it now!",
"timestamp": "2025-01-20T10:01:00Z"
}
],
"escalation_context": {
"reason": "member_frustrated",
"urgency": "high",
"member_sentiment": "frustrated"
}
},
"config": {
"tags": [],
"recursion_limit": 50,
"configurable": {}
},
"metadata": {},
"stream_mode": ["debug", "messages"],
"stream_subgraphs": true,
"assistant_id": "agent",
"interrupt_before": [],
"interrupt_after": [],
"multitask_strategy": "rollback"
}
The /runs/stream endpoint returns Server-Sent Events (SSE) that must be parsed to extract the agent's response. The stream includes multiple event types:
- metadata: Run metadata (run_id, etc.)
- messages: AI messages and tool calls as they're generated
- debug: Node execution lifecycle events (task start/completion)
- values: Complete state snapshots after each node execution
- error: Error events if execution fails
- end: Stream completion signal
Key Requirements:
- Parse SSE format: Events are separated by \n\n, with event: and data: lines
- Handle incremental updates: The agent state evolves as the stream progresses
- Extract final state: The proposed_response field appears in the final values event
For detailed information on consuming the streaming API, see:
Alternatively, use the /runs/wait endpoint for a simpler non-streaming response that returns the complete final state in a single HTTP response.
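As a rough illustration of both options, the sketch below posts a minimal version of the example payload and pulls proposed_response out of the final values event. It assumes the requests library and the payload shape shown above; any HTTP client with streaming support works.

```python
# Hypothetical client sketch: stream a run and read proposed_response from the last `values` snapshot.
import json

import requests

payload = {
    "assistant_id": "agent",
    "stream_mode": ["values"],
    "input": {
        "messages": [{"type": "human", "content": "Please analyze this escalated conversation "
                                                  "and provide guidance for the human agent."}],
        "conversation_history": [
            {"role": "member", "content": "I haven't received my ID card yet",
             "timestamp": "2025-01-20T10:00:00Z"},
        ],
        "escalation_context": {"reason": "member_frustrated", "urgency": "high",
                               "member_sentiment": "frustrated"},
    },
}

proposed_response = None
with requests.post("http://localhost:2024/runs/stream", json=payload, stream=True) as resp:
    event = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue                                   # blank line marks the end of an SSE event
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:") and event == "values":
            state = json.loads(line.split(":", 1)[1])
            proposed_response = state.get("proposed_response")   # the last snapshot wins

print(proposed_response)

# Simpler alternative: /runs/wait returns the final state in a single JSON response.
# final_state = requests.post("http://localhost:2024/runs/wait", json=payload).json()
```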
The agent returns a structured response in state.proposed_response:
{
"message": "I understand your frustration, [Member Name]. I'm truly sorry about the delay with your ID card. Let me help you right away. I can see that your card was mailed on January 5th, which is longer than our typical delivery time. I'd like to offer you two immediate solutions: First, I can help you print a temporary ID card from our member website right now, which you can use immediately. Second, I'll request a replacement card to be sent via expedited shipping. Would you like me to walk you through printing the temporary card?",
"reasoning": "Member is frustrated due to delayed ID card delivery beyond normal timeframe. Using empathetic opening from samples.md#apologies-to-members, acknowledging the delay, and offering immediate actionable solutions per blueprint.md#verify-plan-information. Providing both immediate (temporary card) and long-term (replacement) solutions to address urgency.",
"suggested_tone": "empathetic_and_solution_focused",
"confidence_score": 0.7,
"relevant_docs": [
"samples.md#apologies-to-members",
"blueprint.md#verify-plan-information",
"faq.md#id-card-issues"
],
"key_points": [
"Acknowledge frustration and apologize for delay",
"Explain the situation (card mailed but delayed)",
"Offer immediate solution (temporary card)",
"Offer long-term solution (expedited replacement)",
"Provide clear next steps"
]
}
The confidence_score field (0.0 to 1.0) represents the agent's confidence that the proposed response is appropriate and likely to resolve the member's issue. The UX can use this to determine when human review is most needed:
- 0.8 - 1.0 (High): Clear question with direct documentation match, positive sentiment
- 0.5 - 0.8 (Medium): Some complexity or minor documentation gaps, manageable sentiment
- 0.3 - 0.5 (Low): Significant complexity, member frustration, likely needs human intervention
- 0.0 - 0.3 (Very Low): Explicit request for human agent, high agitation, requires immediate escalation
Factors that lower confidence:
- Member agitation or explicit request for human agent (major impact)
- Documentation gaps or need for inferences (moderate impact)
- Complex multi-step resolution required (moderate impact)
- Ambiguity in member's request (moderate impact)
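A consuming UX might gate human review on this score. Below is a tiny, hypothetical illustration using the bands above; the actual routing policy is up to the integrating application.

```python
# Hypothetical UX-side gating on the agent's confidence_score.
def review_priority(proposed_response: dict) -> str:
    score = proposed_response.get("confidence_score", 0.0)
    if score >= 0.8:
        return "light_review"         # clear question, direct documentation match
    if score >= 0.5:
        return "standard_review"      # some complexity or minor documentation gaps
    if score >= 0.3:
        return "priority_review"      # likely needs human intervention
    return "immediate_escalation"     # explicit human request or high agitation
```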
gateway/config.json:
{
"mcp": {
"servers": {
"filesystem": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/Users/dan/code/ohl-agent/docs"
]
},
"memory": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-memory"
]
}
}
}
}
langgraph.json:
{
"dependencies": ["."],
"graphs": {
"agent": "./src/react_agent/graph.py:graph"
},
"env": ".env",
"mcp": {
"gateway_url": "http://localhost:8808"
}
}
- Preloaded Documentation: All docs loaded into the system prompt at startup for instant access
- 60-70% Faster Response Times: Eliminates HTTP round-trips to MCP gateway during reasoning
- 50-60% Cost Reduction: Fewer LLM calls due to consolidated tools
- Verbatim Language Usage: Agent uses exact phrases from documentation for consistency
- Tracks which documents were referenced in responses
- Data-Driven Constraints: Response length limits (50-100 words typical, max 125) based on analysis of actual documentation
- 89% Token Reduction: Reduced from 2,819 output tokens to ~200-300 tokens per response
- 91% Faster Generation: Response generation reduced from 34+ seconds to 3-5 seconds
- Focused Responses: Agent provides actionable, concise guidance rather than exhaustive explanations
- Automatic Checkpointing: LangGraph platform provides built-in persistence when running with langgraph dev
- Thread API Support: Use
/threads/{thread_id}/runs/streamendpoint for stateful conversations - Cross-Request Efficiency: State persists between requests within the same thread
- 2 Optimized Tools:
retrieve_contextandsubmit_responsereplace 4 separate tools - Single-Call Context Retrieval: Get conversation history, escalation context, and docs confirmation in one call
- Single-Call Response Submission: Set response and track documents in one call
- Reduces reasoning chain length and LLM token usage
- Clear separation between conversation history and escalation context
- Structured output with message, reasoning, tone, and references
- Easy integration with existing systems
- System prompt emphasizes required disclaimers
- Documentation includes compliance requirements
- Agent trained to include necessary legal language
- Tools use LangGraph's
InjectedStateandInjectedToolCallId - Returns
Commandobjects for state updates - Clean separation between local and MCP tools
You can easily switch between different LLM providers by updating the LLM_MODEL variable in your .env file:
Switch to Anthropic Claude:
LLM_MODEL=anthropic/claude-3-5-sonnet-20241022
# Ensure ANTHROPIC_API_KEY is set
Switch to OpenAI GPT:
LLM_MODEL=openai/gpt-4o
# Ensure OPENAI_API_KEY is set
Switch to Azure OpenAI:
LLM_MODEL=azure/your-deployment-name
# Ensure AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and AZURE_OPENAI_API_VERSION are set
Switch to OpenRouter:
LLM_MODEL=openrouter/anthropic/claude-3-5-sonnet-20241022
# or
LLM_MODEL=openrouter/openai/gpt-4o
# Ensure OPENROUTER_API_KEY is set
After changing LLM_MODEL, restart the LangGraph dev server for changes to take effect.
- Add markdown files to the docs/ directory
- Update load_documentation() in src/react_agent/docs_loader.py to include the new files in the required_docs list
- Update the system prompt in src/react_agent/prompts.py to reference the new documentation if needed
- Restart the LangGraph dev server - documentation is loaded at module initialization, not dynamically
Important: Unlike MCP-based approaches, documentation changes require a server restart to take effect.
- Update the ProposedResponse dataclass in src/react_agent/state.py
- Update the submit_response tool in src/react_agent/tools.py
- Update the system prompt to reflect the new structure
Update the docs_dir parameter in load_documentation() call or set DOCS_DIR environment variable:
Option 1: Modify prompts.py
# src/react_agent/prompts.py
_PRELOADED_DOCS = load_documentation(docs_dir="/path/to/your/docs")
Option 2: Use environment variable
# .env
DOCS_DIR=/path/to/your/docs
Then update docs_loader.py to read from os.getenv("DOCS_DIR") if provided.
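A minimal sketch of that override, assuming the load_documentation(docs_dir=...) signature shown above:

```python
# Hypothetical sketch: fall back to the DOCS_DIR environment variable when it is set.
import os

_PRELOADED_DOCS = load_documentation(docs_dir=os.getenv("DOCS_DIR", "docs"))
```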
# Unit tests
pytest tests/unit_tests/
# Integration tests
pytest tests/integration_tests/
Install LangGraph Studio (guide) and open this project. The agent will automatically:
- Connect to the local gateway server
- Discover available tools
- Make tools available for use in conversations
- Ensure MCP gateway is running on port 8808
- Check gateway/config.json for correct paths
- Review logs for tool initialization errors
- Verify tools return Command objects
- Check that InjectedState and InjectedToolCallId are properly annotated
- Ensure state field names match between tools and state classes
- Verify the File System MCP server path in gateway/config.json
- Test MCP gateway directly:
curl http://localhost:8808/tools
Problem: "The API deployment for this resource does not exist"
Solution: Your deployment name is incorrect. The deployment name in LLM_MODEL must match exactly what's in Azure.
- Go to Azure Portal → Your Azure OpenAI resource → "Model deployments"
- Find the actual deployment name (e.g., gpt-4, gpt-35-turbo, or your custom name)
- Update .env:
  LLM_MODEL=azure/your-actual-deployment-name
  AZURE_OPENAI_DEPLOYMENT=your-actual-deployment-name
- Common mistake: Using the resource name instead of the deployment name
  - ❌ Wrong: ohlazureaihubd0505198248 (resource name)
  - ✅ Correct: gpt-4 or gpt-35-turbo (deployment name)
Problem: Invalid or expired API key
Solution:
- Verify AZURE_OPENAI_API_KEY in .env is correct
- Check the key hasn't been regenerated in the Azure Portal
- Ensure no extra spaces or quotes around the key
Problem: API version is deprecated or invalid
Solution:
- Update AZURE_OPENAI_API_VERSION to a current version (e.g., 2024-10-21)
- See Azure OpenAI API versions for the latest
Problem: Incorrect endpoint URL format
Solution:
- Verify the endpoint format in .env:
  - ✅ Correct: https://your-resource.openai.azure.com/
  - ✅ Also valid: https://your-resource.services.ai.azure.com/
- Must end with /
- Must use https://
This project is licensed under the MIT License - see the LICENSE file for details.