diff --git a/plugin/skills/azure-aigateway/SKILL.md b/plugin/skills/azure-aigateway/SKILL.md index 5e6aa28d..00050d48 100644 --- a/plugin/skills/azure-aigateway/SKILL.md +++ b/plugin/skills/azure-aigateway/SKILL.md @@ -1,747 +1,163 @@ --- name: azure-aigateway -description: Bootstrap and configure Azure API Management as an AI Gateway for securing, observing, and controlling AI models, tools (MCP Servers), and agents. Use this skill when setting up a gateway for models or tools, rate limiting model/tool requests, adding semantic caching, content safety, or load balancing to AI endpoints. +description: >- + Configure Azure API Management as an AI Gateway for AI models, MCP tools, and agents. + Use this skill for: (1) AI-specific policies (semantic caching, token limits, content safety, load balancing), + (2) Governance of AI models (cost control, usage metrics), MCP tools (rate limiting), and agents (jailbreak detection), + (3) Adding AI backends from Azure OpenAI or AI Foundry, (4) Testing AI endpoints through the gateway. + For deploying APIM or general API policies, use the azure-deploy skill. + Trigger phrases: "configure my model", "configure my tool", "add Azure OpenAI backend", "add AI Foundry model", + "semantic caching", "token limits", "content safety", "protect my AI model", "rate limit MCP", "jailbreak detection", + "test AI gateway", "AI governance", "LLM policies", "add model to gateway", "configure AI backend". +metadata: + author: microsoft + version: "3.0" +compatibility: Requires Azure CLI (az) for configuration and testing --- # Azure AI Gateway -Bootstrap and configure Azure API Management (APIM) as an AI Gateway for securing, observing, and controlling AI models, tools (MCP Servers), and agents. 
- -## Skill Activation Triggers - -**Use this skill immediately when the user asks to:** -- "Set up a gateway for my model" -- "Set up a gateway for my tools" -- "Set up a gateway for my agents" -- "Add a gateway to my MCP server" -- "Protect my AI model with a gateway" -- "Secure my AI agents" -- "Ratelimit my model requests" -- "Ratelimit my tool requests" -- "Limit tokens for my model" -- "Add rate limiting to my MCP server" -- "Enable semantic caching for my AI API" -- "Add content safety to my AI endpoint" -- "Add my model behind gateway" -- "Import API from OpenAPI spec" -- "Add API to gateway from swagger" -- "Convert my API to MCP" -- "Expose my API as MCP server" - -**Key Indicators:** -- User deploying Azure OpenAI, AI Foundry, or other AI models -- User creating or managing MCP servers -- User needs token limits, rate limiting, or quota management -- User wants to cache AI responses to reduce costs -- User needs content filtering or safety controls -- User wants load balancing across multiple AI backends - -**Secondary Triggers (Proactive Recommendations):** -- After model creation: Recommend AI Gateway for security, caching, and token limits -- After MCP server creation: Recommend AI Gateway for rate limiting, content safety, and auth - -## Overview - -Azure API Management serves as an AI Gateway that provides: -- **Security**: Authentication, authorization, and content safety -- **Observability**: Token metrics, logging, and monitoring -- **Control**: Rate limiting, token limits, and load balancing -- **Optimization**: Semantic caching to reduce costs and latency +Configure Azure API Management (APIM) as an AI Gateway for governing AI models, MCP tools, and agents. 
-``` -AI Models ──┐ ┌── Azure OpenAI -MCP Tools ──┼── AI Gateway (APIM) ──┼── AI Foundry -Agents ─────┘ └── Custom Models -``` - -## Key Resources - -- **GitHub Repo**: https://github.com/Azure-Samples/AI-Gateway (aka.ms/aigateway) -- **Docs**: - - [GenAI Gateway Capabilities](https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities) - - [MCP Server Overview](https://learn.microsoft.com/en-us/azure/api-management/mcp-server-overview) - - [Azure AI Foundry API](https://learn.microsoft.com/en-us/azure/api-management/azure-ai-foundry-api) - - [Semantic Caching](https://learn.microsoft.com/en-us/azure/api-management/azure-openai-enable-semantic-caching) - - [Token Limits & LLM Logs](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-llm-logs) - -## Configuration Rules - -**Default to `Basicv2` SKU** when creating new APIM instances: -- Cheaper than other tiers -- Creates quickly (~5-10 minutes vs 30+ for Premium) -- Supports all AI Gateway policies - -## Pattern 1: Quick Bootstrap AI Gateway - -Deploy APIM with Basicv2 SKU for AI workloads. - -```bash -# Create resource group -az group create --name rg-aigateway --location eastus - -# Deploy APIM with Bicep -az deployment group create \ - --resource-group rg-aigateway \ - --template-file main.bicep \ - --parameters apimSku=Basicv2 -``` +> **To deploy APIM**, use the **azure-deploy** skill. -### Bicep Template - -```bicep -param location string = resourceGroup().location -param apimSku string = 'Basicv2' -param apimManagedIdentityType string = 'SystemAssigned' - -// NOTE: Using 2024-06-01-preview because Basicv2 SKU support currently requires this preview API version. -// Update to the latest stable (GA) API version once Basicv2 is available there. 
-resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' = { - name: 'apim-aigateway-${uniqueString(resourceGroup().id)}' - location: location - sku: { - name: apimSku - capacity: 1 - } - properties: { - publisherEmail: 'admin@contoso.com' - publisherName: 'Contoso' - } - identity: { - type: apimManagedIdentityType - } -} - -output gatewayUrl string = apimService.properties.gatewayUrl -output principalId string = apimService.identity.principalId -``` - -## Pattern 2: Semantic Caching - -Cache similar prompts to reduce costs and latency. - -```xml - - - - - - - - - - - - - -``` +## When to Use This Skill -**Options:** -| Parameter | Range | Description | -|-----------|-------|-------------| -| `score-threshold` | 0.7-0.95 | Higher = stricter matching | -| `duration` | 60-3600 | Cache TTL in seconds | - -## Pattern 3: Token Rate Limiting - -Limit tokens per minute to control costs and prevent abuse. - -```xml - - - - - - - - -``` - -**Options:** -| Parameter | Values | Description | -|-----------|--------|-------------| -| `counter-key` | Subscription.Id, Request.IpAddress, custom | Grouping key for limits | -| `tokens-per-minute` | 100-100000 | Token quota | -| `estimate-prompt-tokens` | true/false | true = faster but less accurate | - -## Pattern 4: Content Safety - -Filter harmful content and detect jailbreak attempts. 
- -```xml - - - - - - - - - - - - - - custom-blocklist - - - - -``` +| Category | Triggers | +|----------|----------| +| **Model Governance** | "semantic caching", "token limits", "load balance AI", "track token usage" | +| **Tool Governance** | "rate limit MCP", "protect my tools", "configure my tool" | +| **Agent Governance** | "content safety", "jailbreak detection", "filter harmful content" | +| **Configuration** | "add Azure OpenAI backend", "configure my model", "add AI Foundry" | +| **Testing** | "test AI gateway", "call OpenAI through gateway" | -**Options:** -| Parameter | Range | Description | -|-----------|-------|-------------| -| `threshold` | 0-7 | 0=safe, 7=severe | -| `shield-prompt` | true/false | Detect jailbreak attempts | - -## Pattern 5: Rate Limits for MCPs/OpenAPI Tools - -Protect MCP servers and tools with request rate limiting. - -```xml - - - - - - - - - @(context.Variables.GetValueOrDefault("remainingCalls", 0).ToString()) - - - - -``` - -## Pattern 6: Managed Identity Authentication - -Secure backend access with managed identity instead of API keys. - -```xml - - - - - - - @("Bearer " + (string)context.Variables["managed-id-access-token"]) - - - - - - - - - - -``` - -## Pattern 7: Load Balancing with Retry - -Distribute load across multiple backends with automatic failover. - -```xml - - - - - - - - - - - - - - - - - - - - -``` - -## Pattern 8: Add AI Foundry Model Behind Gateway - -When user asks to "add my model behind gateway", first discover available models from Azure AI Foundry, then ask which model to add. 
- -### Step 1: Discover AI Foundry Projects and Available Models - -```bash -# Set environment variables -accountName="" -resourceGroupName="" - -# List AI Foundry resources (AI Services accounts) -az cognitiveservices account list --query "[?kind=='AIServices'].{name:name, resourceGroup:resourceGroup, location:location}" -o table - -# List available models in the AI Foundry resource -az cognitiveservices account list-models \ - -n $accountName \ - -g $resourceGroupName \ - | jq '.[] | { name: .name, format: .format, version: .version, sku: .skus[0].name, capacity: .skus[0].capacity.default }' - -# List already deployed models -az cognitiveservices account deployment list \ - -n $accountName \ - -g $resourceGroupName -``` - -### Step 2: Ask User Which Model to Add - -After listing the available models, **use the ask_user tool** to present the models as choices and let the user select which model to add behind the gateway. - -Example choices to present: -- Model deployments from the discovered list -- Include model name, format (provider), version, and SKU info - -### Step 3: Deploy the Model (if not already deployed) - -```bash -# Deploy the selected model to AI Foundry -az cognitiveservices account deployment create \ - -n $accountName \ - -g $resourceGroupName \ - --deployment-name \ - --model-name \ - --model-version \ - --model-format \ - --sku-capacity 1 \ - --sku-name -``` - -### Step 4: Configure APIM Backend for Selected Model - -```bash -# Get the AI Foundry inference endpoint -ENDPOINT=$(az cognitiveservices account show \ - -n $accountName \ - -g $resourceGroupName \ - | jq -r '.properties.endpoints["Azure AI Model Inference API"]') - -# Create APIM backend for the selected model -az apim backend create \ - --resource-group \ - --service-name \ - --backend-id -backend \ - --protocol http \ - --url "${ENDPOINT}" -``` - -### Step 5: Create API and Apply Policies - -```bash -# Import Azure OpenAI API specification -az apim api import \ - --resource-group \ 
- --service-name \ - --path \ - --specification-format OpenApiJson \ - --specification-url "https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01/inference.json" -``` - -### Step 6: Grant APIM Access to AI Foundry - -```bash -# Get APIM managed identity principal ID -APIM_PRINCIPAL_ID=$(az apim show \ - --name \ - --resource-group \ - --query "identity.principalId" -o tsv) - -# Get AI Foundry resource ID -AI_RESOURCE_ID=$(az cognitiveservices account show \ - -n $accountName \ - -g $resourceGroupName \ - --query "id" -o tsv) - -# Assign Cognitive Services User role -az role assignment create \ - --assignee $APIM_PRINCIPAL_ID \ - --role "Cognitive Services User" \ - --scope $AI_RESOURCE_ID -``` +--- -### Bicep Template for Backend Configuration - -```bicep -param apimServiceName string -param backendId string -param aiFoundryEndpoint string -param modelDeploymentName string - -resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = { - name: apimServiceName -} - -resource backend 'Microsoft.ApiManagement/service/backends@2024-06-01-preview' = { - parent: apimService - name: backendId - properties: { - protocol: 'http' - url: '${aiFoundryEndpoint}openai/deployments/${modelDeploymentName}' - credentials: { - header: {} - } - tls: { - validateCertificateChain: true - validateCertificateName: true - } - } -} +## Architecture + +``` + ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ + │ Users │ │ Agents │ │ Apps │ + └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ + │ │ │ + ▼ ▼ ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ AI Gateway (APIM) │ +│ Secure • Observe • Control │ +├─────────────────────────────────────────────────────────────────┤ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Models │ │ Tools │ │ Agents │ │ +│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │ +│ │ Token Limits│ │ Rate Limits │ 
│Content Safety│ │ +│ │ Sem. Cache │ │ Auth/AuthZ │ │Jailbreak Det.│ │ +│ │ Load Balance│ │ Quotas │ │ Filtering │ │ +│ └─────────────┘ └─────────────┘ └─────────────┘ │ +└─────────────────────────────────────────────────────────────────┘ + │ │ │ + ▼ ▼ ▼ + ┌──────────┐ ┌──────────┐ ┌──────────┐ + │Azure AOAI│ │MCP Server│ │AI Agents │ + │AI Foundry│ │Tool APIs │ │ Backends │ + │Custom LLM│ │Functions │ │ Services │ + └──────────┘ └──────────┘ └──────────┘ + Models Tools Agents ``` -## Pattern 9: Import API from OpenAPI Specification +--- -Add an API to the gateway from an OpenAPI/Swagger specification, either from a local file or web URL. +## Quick Reference -### Step 1: Import API from Web URL +| Policy | Purpose | Details | +|--------|---------|---------| +| `azure-openai-token-limit` | Cost control | [Model Policies](references/policies.md#token-rate-limiting) | +| `azure-openai-semantic-cache-lookup/store` | 60-80% cost savings | [Model Policies](references/policies.md#semantic-caching) | +| `azure-openai-emit-token-metric` | Observability | [Model Policies](references/policies.md#token-metrics) | +| `llm-content-safety` | Safety & compliance | [Agent Policies](references/policies.md#content-safety) | +| `rate-limit-by-key` | MCP/tool protection | [Tool Policies](references/policies.md#request-rate-limiting) | -```bash -# Import API from a publicly accessible OpenAPI spec URL -az apim api import \ - --resource-group \ - --service-name \ - --api-id \ - --path \ - --display-name "" \ - --specification-format OpenApiJson \ - --specification-url "https://example.com/openapi.json" -``` +--- -### Step 2: Import API from Local File +## Get Gateway Details ```bash -# Import API from a local OpenAPI spec file (JSON or YAML) -az apim api import \ - --resource-group \ - --service-name \ - --api-id \ - --path \ - --display-name "" \ - --specification-format OpenApi \ - --specification-path "./openapi.yaml" -``` +# Get gateway URL +az apim show --name --resource-group 
--query "gatewayUrl" -o tsv -### Step 3: Configure Backend for the API +# List backends (AI models) +az apim backend list --service-name --resource-group \ + --query "[].{id:name, url:url}" -o table -```bash -# Create backend pointing to your API server -az apim backend create \ - --resource-group \ - --service-name \ - --backend-id \ - --protocol http \ - --url "https://your-api-server.com" - -# Update API to use the backend -az apim api update \ - --resource-group \ - --service-name \ - --api-id \ - --set properties.serviceUrl="https://your-api-server.com" +# Get subscription key +az apim subscription keys list \ + --service-name --resource-group --subscription-id ``` -### Step 4: Apply Policies (Optional) - -```xml - - - - - - - - - - - -``` - -### Supported Specification Formats - -| Format | Value | File Extension | -|--------|-------|----------------| -| OpenAPI 3.x JSON | `OpenApiJson` | `.json` | -| OpenAPI 3.x YAML | `OpenApi` | `.yaml`, `.yml` | -| Swagger 2.0 JSON | `SwaggerJson` | `.json` | -| Swagger 2.0 (link) | `SwaggerLinkJson` | URL | -| WSDL | `Wsdl` | `.wsdl` | -| WADL | `Wadl` | `.wadl` | - -## Pattern 10: Convert API to MCP Server - -Convert existing APIM API operations into an MCP (Model Context Protocol) server, enabling AI agents to discover and use your APIs as tools. - -### Prerequisites - -- APIM instance with Basicv2 SKU or higher -- Existing API imported into APIM -- MCP feature enabled on APIM +--- -### Step 1: List Existing APIs in APIM +## Test AI Endpoint ```bash -# List all APIs in APIM -az apim api list \ - --resource-group \ - --service-name \ - --query "[].{id:name, displayName:displayName, path:path}" \ - -o table -``` - -### Step 2: Ask User Which API to Convert +GATEWAY_URL=$(az apim show --name --resource-group --query "gatewayUrl" -o tsv) -After listing the APIs, **use the ask_user tool** to let the user select which API to convert to an MCP server. 
- -### Step 3: List API Operations - -```bash -# List all operations for the selected API -az apim api operation list \ - --resource-group \ - --service-name \ - --api-id \ - --query "[].{operationId:name, displayName:displayName, method:method, urlTemplate:urlTemplate}" \ - -o table +curl -X POST "${GATEWAY_URL}/openai/deployments//chat/completions?api-version=2024-02-01" \ + -H "Content-Type: application/json" \ + -H "Ocp-Apim-Subscription-Key: " \ + -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}' ``` -### Step 4: Ask User Which Operations to Expose as MCP Tools +--- -After listing the operations, **use the ask_user tool** to present the operations as choices. Let the user select which operations to expose as MCP tools. Users may want to expose all operations or only a subset. +## Common Tasks -Example choices to present: -- All operations (convert entire API) -- Individual operations from the discovered list -- Include operation name, method, and URL template +### Add AI Backend -### Step 5: Enable MCP Server on APIM +See [references/patterns.md](references/patterns.md#pattern-1-add-ai-model-backend) for full steps. 
```bash -# Enable MCP server capability (via ARM/Bicep or Portal) -# Note: MCP configuration is done via APIM policies and product configuration -``` +# Discover AI resources +az cognitiveservices account list --query "[?kind=='OpenAI']" -o table -### Step 6: Configure MCP Endpoint for API - -Create an MCP-compatible endpoint that exposes your API operations as tools: - -```xml - - - - - - - - - - application/json - - @{ - var tools = new JArray(); - // Define your API operations as MCP tools - tools.Add(new JObject( - new JProperty("name", "operation_name"), - new JProperty("description", "Description of what this operation does"), - new JProperty("inputSchema", new JObject( - new JProperty("type", "object"), - new JProperty("properties", new JObject( - new JProperty("param1", new JObject( - new JProperty("type", "string"), - new JProperty("description", "Parameter description") - )) - )) - )) - )); - return new JObject(new JProperty("tools", tools)).ToString(); - } - - - - - -``` +# Create backend +az apim backend create --service-name --resource-group \ + --backend-id openai-backend --protocol http --url "https://.openai.azure.com/openai" -### Step 7: Bicep Template for MCP-Enabled API - -```bicep -param apimServiceName string -param apiId string -param apiDisplayName string -param apiPath string -param backendUrl string - -resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = { - name: apimServiceName -} - -resource api 'Microsoft.ApiManagement/service/apis@2024-06-01-preview' = { - parent: apimService - name: apiId - properties: { - displayName: apiDisplayName - path: apiPath - protocols: ['https'] - serviceUrl: backendUrl - subscriptionRequired: true - // MCP endpoints - apiType: 'http' - } -} - -// MCP tools/list operation -resource mcpToolsListOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = { - parent: api - name: 'mcp-tools-list' - properties: { - displayName: 'MCP Tools List' - method: 'POST' 
- urlTemplate: '/mcp/tools/list' - description: 'List available MCP tools' - } -} - -// MCP tools/call operation -resource mcpToolsCallOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = { - parent: api - name: 'mcp-tools-call' - properties: { - displayName: 'MCP Tools Call' - method: 'POST' - urlTemplate: '/mcp/tools/call' - description: 'Call an MCP tool' - } -} +# Grant access (managed identity) +az role assignment create --assignee \ + --role "Cognitive Services User" --scope ``` -### Step 8: Test MCP Endpoint +### Apply AI Governance Policy -```bash -# Get APIM gateway URL -GATEWAY_URL=$(az apim show \ - --name \ - --resource-group \ - --query "gatewayUrl" -o tsv) - -# Test MCP tools/list endpoint -curl -X POST "${GATEWAY_URL}//mcp/tools/list" \ - -H "Content-Type: application/json" \ - -H "Ocp-Apim-Subscription-Key: " \ - -d '{}' -``` +Recommended policy order in ``: -### MCP Tool Definition Schema - -When converting API operations to MCP tools, use this schema: - -```json -{ - "tools": [ - { - "name": "get_weather", - "description": "Get current weather for a location", - "inputSchema": { - "type": "object", - "properties": { - "location": { - "type": "string", - "description": "City name or coordinates" - } - }, - "required": ["location"] - } - } - ] -} -``` +1. **Authentication** - Managed identity to backend +2. **Semantic Cache Lookup** - Check cache before calling AI +3. **Token Limits** - Cost control +4. **Content Safety** - Filter harmful content +5. **Backend Selection** - Load balancing +6. **Metrics** - Token usage tracking -### Reference +See [references/policies.md](references/policies.md#combining-policies) for complete example. 
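The recommended order above can be sketched as a single policy file. This is a minimal sketch, not the skill's own combined example. The `score-threshold="0.8"` and `duration="120"` values follow the starting points suggested in this document. Everything else is an assumption for illustration: the backend IDs, the `5000` tokens-per-minute figure, the `Hate` threshold, the metric namespace, and the file name. The cache-store step in `<outbound>` is included on the assumption that a lookup is paired with a store. Check each attribute against the APIM policy reference before applying it.

```shell
#!/usr/bin/env bash
# Write an illustrative API-level policy that follows the recommended order.
# All IDs and numeric values below are placeholders (assumptions), not values
# prescribed by this skill.
POLICY_FILE="${POLICY_FILE:-ai-gateway-policy.xml}"

cat > "$POLICY_FILE" <<'EOF'
<policies>
  <inbound>
    <base />
    <!-- 1. Authentication: managed identity token for the AI backend -->
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
    <!-- 2. Semantic cache lookup: answer from cache before calling the model -->
    <azure-openai-semantic-cache-lookup score-threshold="0.8"
        embeddings-backend-id="embeddings-backend"
        embeddings-backend-auth="system-assigned" />
    <!-- 3. Token limits: cost control per subscription -->
    <azure-openai-token-limit counter-key="@(context.Subscription.Id)"
        tokens-per-minute="5000" estimate-prompt-tokens="false" />
    <!-- 4. Content safety: filter harmful prompts, detect jailbreaks -->
    <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
      <categories output-type="EightSeverityLevels">
        <category name="Hate" threshold="4" />
      </categories>
    </llm-content-safety>
    <!-- 5. Backend selection: route to the load-balanced pool -->
    <set-backend-service backend-id="openai-backend-pool" />
    <!-- 6. Metrics: emit token usage -->
    <azure-openai-emit-token-metric namespace="ai-gateway">
      <dimension name="Subscription ID" />
    </azure-openai-emit-token-metric>
  </inbound>
  <backend><base /></backend>
  <outbound>
    <base />
    <azure-openai-semantic-cache-store duration="120" />
  </outbound>
  <on-error><base /></on-error>
</policies>
EOF

echo "Wrote $POLICY_FILE"
```

The script only writes the file. How the policy is attached to an API (portal, Bicep, or other tooling) is left open here.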
-- [MCP Server Overview](https://learn.microsoft.com/en-us/azure/api-management/mcp-server-overview) -- [MCP from API Lab](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/mcp-from-api) - -## Lab References (AI-Gateway Repo) - -**Essential Labs to Get Started:** - -| Scenario | Lab | Description | -|----------|-----|-------------| -| Semantic Caching | [semantic-caching](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/semantic-caching) | Cache similar prompts to reduce costs | -| Token Rate Limiting | [token-rate-limiting](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/token-rate-limiting) | Limit tokens per minute | -| Content Safety | [content-safety](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/content-safety) | Filter harmful content | -| Load Balancing | [backend-pool-load-balancing](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/backend-pool-load-balancing) | Distribute load across backends | -| MCP from API | [mcp-from-api](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/mcp-from-api) | Convert OpenAPI to MCP server | -| Zero to Production | [zero-to-production](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/zero-to-production) | Complete production setup guide | - -**Find more labs at:** https://github.com/Azure-Samples/AI-Gateway/tree/main/labs - -## Quick Start Checklist - -### Prerequisites -- [ ] Azure subscription created -- [ ] Azure CLI installed and authenticated (`az login`) -- [ ] Resource group created for AI Gateway resources +--- -### Deployment -- [ ] Deploy APIM with Basicv2 SKU -- [ ] Configure managed identity -- [ ] Add backend for Azure OpenAI or AI Foundry -- [ ] Apply policies (caching, rate limits, content safety) +## Troubleshooting -### Verification -- [ ] Test API endpoint through gateway -- [ ] Verify token metrics in Application Insights -- [ ] Check rate limiting headers in response -- [ ] Validate content safety filtering +| Issue | Solution | 
+|-------|----------| +| Token limit 429 | Increase `tokens-per-minute` or add load balancing | +| No cache hits | Lower `score-threshold` to 0.7 | +| Content false positives | Increase category thresholds (5-6) | +| Backend auth 401 | Grant APIM "Cognitive Services User" role | -## Best Practices +See [references/troubleshooting.md](references/troubleshooting.md) for details. -| Practice | Description | -|----------|-------------| -| **Default to Basicv2** | Use Basicv2 SKU for cost/speed optimization | -| **Use managed identity** | Prefer managed identity over API keys for backend auth | -| **Enable token metrics** | Use `azure-openai-emit-token-metric` for cost tracking | -| **Semantic caching** | Cache similar prompts to reduce costs (60-80% savings possible) | -| **Rate limit by key** | Use subscription ID or IP for granular rate limiting | -| **Content safety** | Enable `shield-prompt` to detect jailbreak attempts | +--- -## Troubleshooting +## References -| Issue | Symptom | Solution | -|-------|---------|----------| -| **Slow APIM creation** | Deployment takes 30+ minutes | Use Basicv2 SKU instead of Premium | -| **Token limit exceeded** | 429 response | Increase `tokens-per-minute` or add load balancing | -| **Cache not working** | No cache hits | Lower `score-threshold` (e.g., 0.7) | -| **Content blocked** | False positives | Increase category thresholds | -| **Backend auth fails** | 401 from Azure OpenAI | Assign Cognitive Services User role to APIM managed identity | -| **Rate limit too strict** | Legitimate requests blocked | Increase `calls` or `renewal-period` | - -## Additional Resources - -- [Azure API Management Documentation](https://learn.microsoft.com/azure/api-management/) -- [AI Gateway Samples Repository](https://github.com/Azure-Samples/AI-Gateway) -- [APIM Policies Reference](https://learn.microsoft.com/azure/api-management/api-management-policies) -- [Azure OpenAI 
Integration](https://learn.microsoft.com/azure/api-management/azure-openai-api-from-specification) +- [**Detailed Policies**](references/policies.md) - Full policy examples +- [**Configuration Patterns**](references/patterns.md) - Step-by-step patterns +- [**Troubleshooting**](references/troubleshooting.md) - Common issues +- [AI-Gateway Samples](https://github.com/Azure-Samples/AI-Gateway) +- [GenAI Gateway Docs](https://learn.microsoft.com/azure/api-management/genai-gateway-capabilities) diff --git a/plugin/skills/azure-aigateway/references/patterns.md b/plugin/skills/azure-aigateway/references/patterns.md new file mode 100644 index 00000000..3f5ed55f --- /dev/null +++ b/plugin/skills/azure-aigateway/references/patterns.md @@ -0,0 +1,288 @@ +# Azure AI Gateway Configuration Patterns + +This document contains patterns for configuring Azure API Management as an AI Gateway for AI models, MCP tools, and agents. + +> **For deploying a new APIM instance**, see the [azure-deploy skill](../../azure-deploy/reference/apim.md). + +--- + +## Pattern 1: Add AI Model Backend + +Configure a backend for your Azure OpenAI or AI Foundry model. + +### Step 1: Discover AI Resources + +```bash +# List Azure OpenAI resources +az cognitiveservices account list \ + --query "[?kind=='OpenAI'].{name:name, resourceGroup:resourceGroup, location:location}" -o table + +# List AI Foundry resources +az cognitiveservices account list \ + --query "[?kind=='AIServices'].{name:name, resourceGroup:resourceGroup, location:location}" -o table + +# List model deployments +az cognitiveservices account deployment list \ + --name \ + --resource-group -o table +``` + +### Step 2: Ask User Which Model + +Use `ask_user` tool to present discovered models and let user select which to add. 
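One way to prepare the choices for this step is to flatten the deployment list into one label per deployment. The sketch below is hedged: `format_choices` is a hypothetical helper, and the JMESPath fields in the commented usage (`properties.model.name`, `properties.model.version`, `sku.name`) are assumptions about the deployment output that should be checked against a real `az` response.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn tab-separated rows of
#   <deployment-name> <model-name> <model-version> <sku>
# into one human-readable choice label per line.
# Note: read treats consecutive tabs as one separator, so empty fields collapse.
format_choices() {
  local name model version sku
  while IFS=$'\t' read -r name model version sku; do
    if [ -n "$name" ]; then
      printf '%s (%s %s, %s)\n' "$name" "$model" "$version" "$sku"
    fi
  done
}

# Usage sketch (requires Azure CLI and a logged-in session; ACCOUNT_NAME and
# RESOURCE_GROUP are placeholders you would set yourself):
# az cognitiveservices account deployment list \
#   --name "$ACCOUNT_NAME" --resource-group "$RESOURCE_GROUP" \
#   --query "[].[name, properties.model.name, properties.model.version, sku.name]" \
#   -o tsv | format_choices
```

Each output line can then be passed as one option to the `ask_user` tool.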
+ +### Step 3: Create Backend + +```bash +# Get the endpoint +ENDPOINT=$(az cognitiveservices account show \ + --name \ + --resource-group \ + --query "properties.endpoint" -o tsv) + +# Create backend +az apim backend create \ + --service-name \ + --resource-group \ + --backend-id openai-backend \ + --protocol http \ + --url "${ENDPOINT}openai" +``` + +### Step 4: Grant Access (Managed Identity) + +```bash +# Get APIM principal ID +APIM_PRINCIPAL=$(az apim show \ + --name \ + --resource-group \ + --query "identity.principalId" -o tsv) + +# Get AI resource ID +AI_RESOURCE_ID=$(az cognitiveservices account show \ + --name \ + --resource-group \ + --query "id" -o tsv) + +# Assign Cognitive Services User role +az role assignment create \ + --assignee $APIM_PRINCIPAL \ + --role "Cognitive Services User" \ + --scope $AI_RESOURCE_ID +``` + +### Step 5: Apply Governance Policies + +Apply AI governance policies from [policies.md](policies.md): +- Token limits for cost control +- Semantic caching for cost reduction +- Content safety for protection +- Load balancing for high availability + +--- + +## Pattern 2: Add Backend Pool (Load Balancing) + +Distribute load across multiple AI model deployments. 
+ +### Step 1: Create Multiple Backends + +```bash +# Backend 1 - East US +az apim backend create \ + --service-name \ + --resource-group \ + --backend-id openai-eastus \ + --protocol http \ + --url "https://aoai-eastus.openai.azure.com/openai" + +# Backend 2 - West US +az apim backend create \ + --service-name \ + --resource-group \ + --backend-id openai-westus \ + --protocol http \ + --url "https://aoai-westus.openai.azure.com/openai" +``` + +### Step 2: Create Backend Pool (Bicep) + +```bicep +resource backendPool 'Microsoft.ApiManagement/service/backends@2024-06-01-preview' = { + parent: apimService + name: 'openai-backend-pool' + properties: { + type: 'Pool' + pool: { + services: [ + { id: '/backends/openai-eastus', weight: 50, priority: 1 } + { id: '/backends/openai-westus', weight: 50, priority: 1 } + ] + } + } +} +``` + +### Step 3: Apply Retry Policy + +```xml + + + + + + +``` + +--- + +## Pattern 3: Configure Semantic Caching + +Cache similar prompts to reduce costs (60-80% savings possible). + +### Prerequisites + +- An embeddings model deployment (e.g., text-embedding-ada-002) +- Azure Cache for Redis (for production) or internal cache + +### Step 1: Create Embeddings Backend + +```bash +az apim backend create \ + --service-name \ + --resource-group \ + --backend-id embeddings-backend \ + --protocol http \ + --url "https://aoai.openai.azure.com/openai" +``` + +### Step 2: Apply Policy + +```xml + + + + + + +``` + +### Tuning + +| Threshold | Cache Hits | Accuracy | +|-----------|------------|----------| +| 0.7 | High | Lower (broader matching) | +| 0.8 | Medium | Balanced (recommended) | +| 0.9 | Low | High (strict matching) | + +--- + +## Pattern 4: Configure Content Safety + +Protect AI endpoints with content filtering and jailbreak detection. 
+ +### Prerequisites + +- Azure AI Content Safety resource + +### Step 1: Create Content Safety Backend + +```bash +# Get Content Safety endpoint +CS_ENDPOINT=$(az cognitiveservices account show \ + --name \ + --resource-group \ + --query "properties.endpoint" -o tsv) + +# Create backend +az apim backend create \ + --service-name \ + --resource-group \ + --backend-id content-safety-backend \ + --protocol http \ + --url "${CS_ENDPOINT}" +``` + +### Step 2: Grant Access + +```bash +# Get Content Safety resource ID +CS_RESOURCE_ID=$(az cognitiveservices account show \ + --name \ + --resource-group \ + --query "id" -o tsv) + +# Assign role +az role assignment create \ + --assignee $APIM_PRINCIPAL \ + --role "Cognitive Services User" \ + --scope $CS_RESOURCE_ID +``` + +### Step 3: Apply Policy + +```xml + + + + + + + + +``` + +--- + +## Pattern 5: Convert API to MCP Server + +Convert existing APIM API operations into MCP server for AI agent tool discovery. + +### Step 1: List APIs + +```bash +az apim api list \ + --service-name \ + --resource-group \ + --query "[].{id:name, displayName:displayName, path:path}" -o table +``` + +### Step 2: Select API + +Use `ask_user` tool to let user select which API to convert. + +### Step 3: List Operations + +```bash +az apim api operation list \ + --service-name \ + --resource-group \ + --api-id \ + --query "[].{id:name, method:method, url:urlTemplate}" -o table +``` + +### Step 4: Configure MCP Endpoints + +Add MCP tools/list and tools/call operations with appropriate policies. 
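Once the two operations exist, a quick smoke test is to call them through the gateway. The sketch below assumes the operations are exposed under `<api-path>/mcp/tools/list` and `<api-path>/mcp/tools/call`, which is the layout the earlier version of this skill used. `mcp_url` is a hypothetical helper, and `GATEWAY_URL`, `API_PATH`, and `APIM_KEY` are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the URL of an MCP operation behind the gateway.
# Assumption: operations live under <gateway>/<api-path>/mcp/<operation>.
mcp_url() {
  local gateway="${1%/}"    # drop one trailing slash from the gateway URL
  local api_path="${2#/}"   # drop one leading slash from the API path
  api_path="${api_path%/}"  # drop one trailing slash from the API path
  local operation="$3"      # e.g. tools/list or tools/call
  printf '%s/%s/mcp/%s\n' "$gateway" "$api_path" "$operation"
}

# Usage sketch (requires a real gateway and subscription key):
# curl -sS -X POST "$(mcp_url "$GATEWAY_URL" "$API_PATH" tools/list)" \
#   -H "Content-Type: application/json" \
#   -H "Ocp-Apim-Subscription-Key: $APIM_KEY" \
#   -d '{}'
```

Normalizing the slashes in one place avoids the double-slash URLs that appear when a gateway URL and an API path are concatenated by hand.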
+ +### Reference + +- [MCP Server Overview](https://learn.microsoft.com/azure/api-management/mcp-server-overview) + +--- + +## Lab References + +| Scenario | Lab | +|----------|-----| +| Semantic Caching | [semantic-caching](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/semantic-caching) | +| Token Rate Limiting | [token-rate-limiting](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/token-rate-limiting) | +| Content Safety | [content-safety](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/content-safety) | +| Load Balancing | [backend-pool-load-balancing](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/backend-pool-load-balancing) | +| MCP from API | [mcp-from-api](https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/mcp-from-api) | + +**All labs:** https://github.com/Azure-Samples/AI-Gateway/tree/main/labs diff --git a/plugin/skills/azure-aigateway/references/policies.md b/plugin/skills/azure-aigateway/references/policies.md new file mode 100644 index 00000000..84b39e2a --- /dev/null +++ b/plugin/skills/azure-aigateway/references/policies.md @@ -0,0 +1,431 @@ +# Azure AI Gateway Policy Reference + +This document contains detailed policy patterns for governing AI models, MCP tools, and agents through Azure API Management. + +## Table of Contents + +### Model Governance +- [Semantic Caching](#semantic-caching) +- [Token Rate Limiting](#token-rate-limiting) +- [Token Metrics](#token-metrics) +- [Load Balancing with Retry](#load-balancing-with-retry) +- [Managed Identity Authentication](#managed-identity-authentication) + +### Tool Governance (MCP Servers) +- [Request Rate Limiting](#request-rate-limiting) + +### Agent Governance +- [Content Safety](#content-safety) + +### Complete Examples +- [Combining Policies](#combining-policies) + +--- + +# Model Governance Policies + +## Semantic Caching + +Cache similar prompts to reduce costs and latency. 
+ +```xml + + + + + + + + + + + + + +``` + +### Configuration Options + +| Parameter | Range | Description | +|-----------|-------|-------------| +| `score-threshold` | 0.7-0.95 | Higher = stricter matching. Use 0.7-0.8 for broader caching, 0.9+ for exact matching | +| `duration` | 60-3600 | Cache TTL in seconds. Start with 120, adjust based on content freshness needs | +| `embeddings-backend-id` | string | Backend ID for embeddings model | +| `embeddings-backend-auth` | system-assigned, user-assigned | Authentication method for embeddings backend | + +### When to Use + +- Reduce costs on repetitive or similar queries (60-80% savings possible) +- Lower latency for common prompts +- FAQ-style applications with predictable queries + +### Tips + +- Start with `score-threshold="0.8"` and adjust based on cache hit rate +- Lower threshold = more cache hits but potentially less relevant responses +- Monitor cache metrics to optimize threshold + +--- + +## Token Rate Limiting + +Limit tokens per minute to control costs and prevent abuse. 
+ +```xml + + + + + + + + +``` + +### Configuration Options + +| Parameter | Values | Description | +|-----------|--------|-------------| +| `counter-key` | Subscription.Id, Request.IpAddress, custom | Grouping key for limits | +| `tokens-per-minute` | 100-100000 | Token quota per key | +| `estimate-prompt-tokens` | true/false | true = faster but less accurate | +| `remaining-tokens-variable-name` | string | Variable to store remaining tokens | + +### Counter Key Examples + +```xml + +counter-key="@(context.Subscription.Id)" + + +counter-key="@(context.Request.IpAddress)" + + +counter-key="@(context.Request.Headers.GetValueOrDefault("Authorization","").AsJwt()?.Claims["sub"])" + + +counter-key="@(context.Request.Headers.GetValueOrDefault("X-API-Key","anonymous"))" +``` + +### When to Use + +- Control costs by limiting token consumption +- Prevent a single user from exhausting the quota +- Implement tiered pricing based on subscription level + +--- + +## Content Safety + +Filter harmful content and detect jailbreak attempts. + +```xml + + + + + + + + + + + + + + custom-blocklist + + + + +``` + +### Configuration Options + +| Parameter | Range | Description | +|-----------|-------|-------------| +| `threshold` | 0-7 | 0=safe, 7=severe. 
Start with 4 for balanced filtering | +| `shield-prompt` | true/false | Detect jailbreak attempts | +| `output-type` | FourSeverityLevels, EightSeverityLevels | Granularity of severity levels | + +### Categories + +| Category | Description | +|----------|-------------| +| `Hate` | Hate speech, discrimination | +| `Sexual` | Sexual content | +| `SelfHarm` | Self-harm content | +| `Violence` | Violent content | + +### Threshold Guidance + +| Threshold | Use Case | +|-----------|----------| +| 0-2 | Very strict filtering (children's apps) | +| 3-4 | Balanced filtering (general use) | +| 5-6 | Permissive (adult audiences with warnings) | +| 7 | Only block extreme content | + +### When to Use + +- Consumer-facing AI applications +- Compliance requirements +- Protecting brand reputation + +--- + +## Request Rate Limiting + +Protect MCP servers and tools with request rate limiting. + +```xml + + + + + + + + + @(context.Variables.GetValueOrDefault("remainingCalls", 0).ToString()) + + + + +``` + +### Configuration Options + +| Parameter | Description | +|-----------|-------------| +| `calls` | Number of calls allowed per period | +| `renewal-period` | Time window in seconds | +| `counter-key` | Grouping key (IP, subscription, custom) | +| `remaining-calls-variable-name` | Variable for remaining calls | + +### When to Use + +- Protect MCP servers from abuse +- Rate limit tool calls from AI agents +- Prevent DDoS on API endpoints + +--- + +## Managed Identity Authentication + +Secure backend access with managed identity instead of API keys. 
+ +```xml + + + + + + + @("Bearer " + (string)context.Variables["managed-id-access-token"]) + + + + +``` + +### Resource URLs + +| Service | Resource URL | +|---------|--------------| +| Azure OpenAI | `https://cognitiveservices.azure.com` | +| Azure AI Services | `https://cognitiveservices.azure.com` | +| Azure Storage | `https://storage.azure.com` | +| Azure Key Vault | `https://vault.azure.net` | + +### Required Role Assignments + +```bash +# Assign Cognitive Services User role to APIM managed identity +az role assignment create \ + --assignee \ + --role "Cognitive Services User" \ + --scope +``` + +### When to Use + +- Production environments (avoid API keys) +- Secure backend access +- Centralized credential management + +--- + +## Load Balancing with Retry + +Distribute load across multiple backends with automatic failover. + +```xml + + + + + + + + + + + + + + + + + + + + +``` + +### Configuration Options + +| Parameter | Description | +|-----------|-------------| +| `count` | Number of retry attempts | +| `interval` | Delay between retries (seconds) | +| `first-fast-retry` | Skip delay on first retry | +| `condition` | When to retry (status codes) | + +### Backend Pool Configuration (Bicep) + +```bicep +resource backendPool 'Microsoft.ApiManagement/service/backends@2024-06-01-preview' = { + parent: apimService + name: 'openai-backend-pool' + properties: { + type: 'Pool' + pool: { + services: [ + { id: '/backends/openai-eastus' } + { id: '/backends/openai-westus' } + ] + } + } +} +``` + +### When to Use + +- High availability requirements +- Geographic load distribution +- Handling rate limits across regions + +--- + +## Token Metrics + +Emit token metrics for monitoring and cost tracking. 
+ +```xml + + + + + + + + + + + + + +``` + +### Available Dimensions + +| Dimension | Value Example | Use Case | +|-----------|---------------|----------| +| Subscription ID | `context.Subscription.Id` | Cost allocation per customer | +| Client IP | `context.Request.IpAddress` | Usage tracking per client | +| API ID | `context.Api.Id` | Usage per API | +| Operation ID | `context.Operation.Id` | Usage per endpoint | +| Product ID | `context.Product.Id` | Usage per product tier | +| User ID | JWT claim | Per-user tracking | + +### When to Use + +- Cost tracking and chargebacks +- Usage monitoring per customer +- Capacity planning + +--- + +## Combining Policies + +Policies can be combined for comprehensive protection: + +```xml + + + + + + + @("Bearer " + (string)context.Variables["managed-id-access-token"]) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +--- + +## References + +- [GenAI Gateway Capabilities](https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities) +- [Semantic Caching](https://learn.microsoft.com/en-us/azure/api-management/azure-openai-enable-semantic-caching) +- [Token Limits & LLM Logs](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-llm-logs) +- [APIM Policies Reference](https://learn.microsoft.com/azure/api-management/api-management-policies) diff --git a/plugin/skills/azure-aigateway/references/troubleshooting.md b/plugin/skills/azure-aigateway/references/troubleshooting.md new file mode 100644 index 00000000..84a28f26 --- /dev/null +++ b/plugin/skills/azure-aigateway/references/troubleshooting.md @@ -0,0 +1,361 @@ +# Azure AI Gateway Troubleshooting Guide + +Common issues and solutions when working with Azure API Management as an AI Gateway. 
+ +## Quick Reference + +| Issue | Symptom | Solution | +|-------|---------|----------| +| Slow APIM creation | Deployment takes 30+ minutes | Use Basicv2 SKU instead of Developer or Premium | +| Token limit exceeded | 429 response | Increase `tokens-per-minute` or add load balancing | +| Cache not working | No cache hits | Lower `score-threshold` (e.g., 0.7) | +| Content blocked | False positives | Increase category thresholds | +| Backend auth fails | 401 from Azure OpenAI | Assign Cognitive Services User role to APIM managed identity | +| Rate limit too strict | Legitimate requests blocked | Increase `calls` or shorten `renewal-period` | + +--- + +## Detailed Solutions + +### Slow APIM Creation + +**Problem:** APIM deployment takes 30+ minutes + +**Cause:** Premium and Developer SKUs take longer to provision + +**Solution:** Use Basicv2 SKU for faster provisioning (~5-10 minutes) + +```bash +# Use Basicv2 SKU +az deployment group create \ + --resource-group rg-aigateway \ + --template-file main.bicep \ + --parameters apimSku=Basicv2 +``` + +--- + +### Token Limit Exceeded (429) + +**Problem:** Requests failing with 429 "Too Many Requests" + +**Causes:** +- Token limit too low for workload +- Single backend at capacity + +**Solutions:** + +1. **Increase token limit:** +```xml + + counter-key="@(context.Subscription.Id)" /> +``` + +2. **Add load balancing across backends:** +```xml + + + + + +``` + +3. **Check current token usage:** +```bash +# View token metrics in Application Insights +az monitor metrics list \ + --resource \ + --metric "TokenUsage" \ + --interval PT1H +``` + +--- + +### Semantic Cache Not Working + +**Problem:** No cache hits, all requests going to backend + +**Causes:** +- Score threshold too high +- Embeddings backend not configured +- Cache duration expired + +**Solutions:** + +1. **Lower score threshold:** +```xml + + +``` + +2. 
**Verify embeddings backend exists:** +```bash +az apim backend show \ + --resource-group \ + --service-name \ + --backend-id embeddings-backend +``` + +3. **Increase cache duration:** +```xml + +``` + +4. **Check cache metrics:** +```bash +# Look for CacheHit vs CacheMiss in APIM metrics +az monitor metrics list \ + --resource \ + --metric "CacheHitCount,CacheMissCount" +``` + +--- + +### Content Safety False Positives + +**Problem:** Legitimate content being blocked + +**Cause:** Category thresholds too strict + +**Solutions:** + +1. **Increase category thresholds:** +```xml + + + + + + + + + +``` + +2. **Review blocked content:** +```bash +# Check APIM diagnostic logs for blocked requests +az monitor diagnostic-settings list \ + --resource +``` + +3. **Use custom blocklists for fine-grained control:** +```xml + + custom-blocklist + +``` + +--- + +### Backend Authentication Fails (401) + +**Problem:** 401 Unauthorized from Azure OpenAI or AI Foundry + +**Causes:** +- APIM managed identity not assigned role +- Wrong resource URL in authentication policy +- System-assigned identity not enabled + +**Solutions:** + +1. **Enable system-assigned managed identity:** +```bash +az apim update \ + --name \ + --resource-group \ + --set identity.type=SystemAssigned +``` + +2. **Assign Cognitive Services User role:** +```bash +# Get APIM principal ID +APIM_PRINCIPAL_ID=$(az apim show \ + --name \ + --resource-group \ + --query "identity.principalId" -o tsv) + +# Assign role +az role assignment create \ + --assignee $APIM_PRINCIPAL_ID \ + --role "Cognitive Services User" \ + --scope +``` + +3. **Verify correct resource URL:** +```xml + + +``` + +4. **Check role assignment:** +```bash +az role assignment list \ + --assignee $APIM_PRINCIPAL_ID \ + --scope +``` + +--- + +### Rate Limit Too Strict + +**Problem:** Legitimate users being rate limited + +**Cause:** Rate limit configuration too restrictive + +**Solutions:** + +1. 
**Increase rate limit:** +```xml + + renewal-period="60" + counter-key="@(context.Subscription.Id)" /> +``` + +2. **Use different counter key for granular control:** +```xml + + +``` + +3. **Implement tiered rate limits by product:** +```xml + + + + + + + + +``` + +--- + +### MCP Server Not Responding + +**Problem:** MCP tools/list or tools/call endpoints not working + +**Causes:** +- MCP operations not configured +- Policy not applied to MCP endpoints +- Subscription key not provided + +**Solutions:** + +1. **Verify MCP operations exist:** +```bash +az apim api operation list \ + --resource-group \ + --service-name \ + --api-id \ + --query "[?contains(urlTemplate, 'mcp')]" +``` + +2. **Check policy is applied:** +```bash +az apim api operation policy show \ + --resource-group \ + --service-name \ + --api-id \ + --operation-id mcp-tools-list +``` + +3. **Test with subscription key:** +```bash +curl -X POST "${GATEWAY_URL}/api/mcp/tools/list" \ + -H "Content-Type: application/json" \ + -H "Ocp-Apim-Subscription-Key: " \ + -d '{}' +``` + +--- + +### APIM Not Finding Backend + +**Problem:** Backend service not found errors + +**Causes:** +- Backend ID mismatch in policy +- Backend not created +- Backend URL incorrect + +**Solutions:** + +1. **List existing backends:** +```bash +az apim backend list \ + --resource-group \ + --service-name \ + --query "[].{id:name, url:url}" +``` + +2. **Create missing backend:** +```bash +az apim backend create \ + --resource-group \ + --service-name \ + --backend-id my-backend \ + --protocol http \ + --url "https://my-service.azure.com" +``` + +3. 
**Verify backend ID in policy matches:** +```xml + + +``` + +--- + +## Diagnostic Commands + +### Check APIM Status +```bash +az apim show --name --resource-group --query "provisioningState" +``` + +### View Recent API Calls +```bash +az monitor activity-log list \ + --resource-group \ + --query "[?contains(operationName.value, 'Microsoft.ApiManagement')]" +``` + +### Check Policy Syntax +```bash +# Get policy XML and validate +az apim api policy show \ + --resource-group \ + --service-name \ + --api-id +``` + +### Test Backend Connectivity +```bash +# From APIM, test backend URL +az apim api operation invoke \ + --resource-group \ + --service-name \ + --api-id \ + --operation-id +``` + +--- + +## Getting Help + +- [Azure API Management Documentation](https://learn.microsoft.com/azure/api-management/) +- [AI Gateway Samples Repository](https://github.com/Azure-Samples/AI-Gateway) +- [APIM Troubleshooting Guide](https://learn.microsoft.com/azure/api-management/api-management-howto-troubleshoot) diff --git a/plugin/skills/azure-deploy/SKILL.md b/plugin/skills/azure-deploy/SKILL.md index 273f82d5..545fdda2 100644 --- a/plugin/skills/azure-deploy/SKILL.md +++ b/plugin/skills/azure-deploy/SKILL.md @@ -1,6 +1,6 @@ --- name: azure-deploy -description: Deploy applications to Azure App Service, Azure Functions, and Static Web Apps. USE THIS SKILL when users want to deploy, publish, host, or run their application on Azure. This skill detects application type (React, Vue, Angular, Next.js, Python, .NET, Java, etc.), recommends the optimal Azure service, provides local preview capabilities, and guides deployment. Trigger phrases include "deploy to Azure", "host on Azure", "publish to Azure", "run on Azure", "get this running in the cloud", "deploy my app", "Azure deployment", "set up Azure hosting", "deploy to App Service", "deploy to Functions", "deploy to Static Web Apps", "preview locally", "test before deploying", "what Azure service should I use", "help me deploy", etc. 
Also handles multi-service deployments with Azure Developer CLI (azd) and Infrastructure as Code when complexity is detected. +description: Deploy applications to Azure App Service, Azure Functions, Static Web Apps, and API Management (APIM). USE THIS SKILL when users want to deploy, publish, host, or run their application on Azure, or deploy APIM for API Gateway/AI Gateway scenarios. This skill detects application type (React, Vue, Angular, Next.js, Python, .NET, Java, etc.), recommends the optimal Azure service, provides local preview capabilities, and guides deployment. Trigger phrases include "deploy to Azure", "host on Azure", "publish to Azure", "run on Azure", "get this running in the cloud", "deploy my app", "Azure deployment", "set up Azure hosting", "deploy to App Service", "deploy to Functions", "deploy to Static Web Apps", "deploy APIM", "create API Management", "deploy API gateway", "preview locally", "test before deploying", "what Azure service should I use", "help me deploy", etc. Also handles multi-service deployments with Azure Developer CLI (azd) and Infrastructure as Code when complexity is detected. 
--- ## Preferred: Use azd for Deployments @@ -93,6 +93,20 @@ Look for these files first (HIGH confidence signals): | `function.json` or `host.json` | Azure Functions project | **See [Azure Functions Guide](./reference/functions.md)** | | `staticwebapp.config.json` or `swa-cli.config.json` | Static Web Apps project | **See [Static Web Apps Guide](./reference/static-web-apps.md)** | +**Check for API Management / AI Gateway requests:** + +| User Request | Recommendation | Action | +|--------------|----------------|--------| +| "Deploy APIM", "create API Management", "API gateway" | Deploy API Management | **See [APIM Deployment Guide](./reference/apim.md)** | +| "AI gateway", "configure AI model", "add Azure OpenAI backend" | AI Gateway configuration | Deploy with **[APIM Guide](./reference/apim.md)**, configure with **azure-aigateway skill** | + +> 💡 **When to use API Management:** +> - User wants to expose APIs with policies (rate limiting, auth, caching) +> - User needs an AI Gateway for Azure OpenAI or AI Foundry models +> - User mentions APIM, API Management, or API gateway +> +> **📖 See [APIM Deployment Guide](./reference/apim.md)** for deployment, then use **azure-aigateway skill** for AI-specific configuration. 
+ **When `azure.yaml` is found, validate before deployment:** ```javascript // Validate azure.yaml using MCP tool before proceeding @@ -799,6 +813,7 @@ For specialized deployment scenarios, use these comprehensive reference guides: - **⚡ [Azure Functions Deployment Guide](./reference/functions.md)** - Azure Functions deployment with func CLI, triggers/bindings, deployment slots, and function-specific troubleshooting - **☸️ [AKS Deployment Guide](./reference/aks.md)** - Kubernetes deployments with full control, custom operators, and complex microservices - **🌍 [App Service Deployment Guide](./reference/app-service.md)** - Traditional web applications and REST APIs with managed hosting +- **🔌 [APIM Deployment Guide](./reference/apim.md)** - API Management deployment for API Gateway and AI Gateway scenarios with Basicv2 SKU --- @@ -809,6 +824,7 @@ Load these guides as needed for detailed information: - [Azure Functions Guide](./reference/functions.md) - Serverless Functions deployment with func CLI, triggers/bindings, deployment slots, and monitoring - [AKS Guide](./reference/aks.md) - Kubernetes deployment with AKS, node pools, workload identity, scaling, and networking - [App Service Guide](./reference/app-service.md) - Traditional web app deployment with App Service plans, deployment slots, and auto-scaling +- [APIM Guide](./reference/apim.md) - API Management deployment for API gateway and AI gateway, with Basicv2 SKU for fast deployment - Always scan the workspace before generating a deployment plan - Plans integrate with Azure Developer CLI (azd) - Logs require resources deployed through azd diff --git a/plugin/skills/azure-deploy/reference/apim.md b/plugin/skills/azure-deploy/reference/apim.md new file mode 100644 index 00000000..223f1b7d --- /dev/null +++ b/plugin/skills/azure-deploy/reference/apim.md @@ -0,0 +1,444 @@ +# Azure API Management Deployment Guide + +Complete reference for deploying Azure API Management (APIM) for API Gateway and AI Gateway 
scenarios. + +--- + +## Overview + +Azure API Management is a fully managed service for publishing, securing, transforming, maintaining, and monitoring APIs. It serves as both a traditional API Gateway and an AI Gateway for LLM-based applications. + +**Key Benefits:** +- **Security** - Authentication, authorization, rate limiting, IP filtering +- **Transformation** - Request/response modification, protocol translation +- **Observability** - Logging, metrics, tracing, analytics +- **AI Gateway** - Token limits, semantic caching, content safety for AI models +- **Developer Portal** - Self-service API discovery and documentation + +**When to use Azure API Management:** +- **API Gateway** - Centralized entry point for backend APIs +- **AI Gateway** - Govern AI models, MCP tools, and agents +- **Rate Limiting** - Protect APIs from abuse +- **Authentication** - Centralized auth for multiple backends +- **API Versioning** - Manage multiple API versions +- **Monetization** - Usage tracking and billing + +**Deployment Workflow:** +``` +Create APIM → Add Backends → Import APIs → Configure Policies → Test +``` + +--- + +## Prerequisites and Validation + +### Pattern 0: Prerequisites Validation + +**Always validate prerequisites before starting deployment.** + +```bash +# Check Azure CLI authentication +az account show || az login + +# Verify subscription +az account show --query "{name:name, id:id}" -o table + +# Check if APIM already exists +az apim list --query "[].{name:name, rg:resourceGroup, sku:sku.name}" -o table +``` + +### Prerequisites Checklist + +**Setup:** +- [ ] Azure subscription created +- [ ] Azure CLI installed (`az --version`) +- [ ] Azure CLI authenticated (`az login`) +- [ ] Appropriate permissions (Contributor on resource group) + +--- + +## Pattern 1: Deploy APIM Instance + +Create a new API Management instance with Basicv2 SKU (recommended). 
+ +```bash +# Set variables +RESOURCE_GROUP="rg-apim" +LOCATION="eastus" +APIM_NAME="apim-$(openssl rand -hex 4)" +PUBLISHER_EMAIL="admin@contoso.com" +PUBLISHER_NAME="Contoso" + +# Create resource group +az group create --name $RESOURCE_GROUP --location $LOCATION + +# Deploy APIM with Basicv2 SKU (~5-10 min deployment) +az apim create \ + --name $APIM_NAME \ + --resource-group $RESOURCE_GROUP \ + --publisher-email $PUBLISHER_EMAIL \ + --publisher-name "$PUBLISHER_NAME" \ + --sku-name Basicv2 \ + --enable-managed-identity true + +# Get gateway URL +az apim show --name $APIM_NAME --resource-group $RESOURCE_GROUP --query "gatewayUrl" -o tsv +``` + +### SKU Selection Guide + +| SKU | Deployment Time | Monthly Cost | Use Case | +|-----|-----------------|--------------|----------| +| **Basicv2** (recommended) | ~5-10 min | ~$150 | Dev/test, AI Gateway, small production | +| Standardv2 | ~10-15 min | ~$300 | Production APIs, higher throughput | +| Developer | ~30 min | ~$50 | Development only (no SLA) | +| Premium | ~45+ min | ~$2800+ | Enterprise, multi-region, VNet integration | + +**Decision guide:** +- **Default to Basicv2** - Fast deployment, cost-effective, supports all AI Gateway policies +- **Use Standardv2** - Higher throughput needed (>1000 req/sec) +- **Use Developer** - Development/testing only, no production SLA required +- **Use Premium** - Need VNet integration, multi-region, or >99.95% SLA + +--- + +## Pattern 2: Discovery Commands + +Find existing APIM instances and their configuration. 
+ +```bash +# List all APIM instances in subscription +az apim list --query "[].{name:name, rg:resourceGroup, sku:sku.name, location:location}" -o table + +# Get APIM details +az apim show --name --resource-group + +# Get gateway URL +az apim show --name --resource-group --query "gatewayUrl" -o tsv + +# Get management URL +az apim show --name --resource-group --query "managementApiUrl" -o tsv + +# Get managed identity principal ID (for role assignments) +az apim show --name --resource-group --query "identity.principalId" -o tsv + +# List APIs +az apim api list --service-name --resource-group \ + --query "[].{id:name, displayName:displayName, path:path}" -o table + +# List backends +az apim backend list --service-name --resource-group \ + --query "[].{id:name, url:url, protocol:protocol}" -o table + +# List products +az apim product list --service-name --resource-group \ + --query "[].{id:name, displayName:displayName, state:state}" -o table + +# List subscriptions (for API keys) +az apim subscription list --service-name --resource-group \ + --query "[].{name:displayName, scope:scope, state:state}" -o table +``` + +--- + +## Pattern 3: Add Backend + +Configure backends for your APIs and AI models. 
+ +### Custom API Backend + +```bash +az apim backend create \ + --service-name \ + --resource-group \ + --backend-id my-api-backend \ + --protocol http \ + --url "https://my-api.example.com" +``` + +### Azure OpenAI Backend + +```bash +# Get Azure OpenAI endpoint +AOAI_ENDPOINT=$(az cognitiveservices account show \ + --name \ + --resource-group \ + --query "properties.endpoint" -o tsv) + +# Create backend +az apim backend create \ + --service-name \ + --resource-group \ + --backend-id openai-backend \ + --protocol http \ + --url "${AOAI_ENDPOINT}openai" + +# Grant APIM access to Azure OpenAI +APIM_PRINCIPAL=$(az apim show \ + --name \ + --resource-group \ + --query "identity.principalId" -o tsv) + +AOAI_ID=$(az cognitiveservices account show \ + --name \ + --resource-group \ + --query "id" -o tsv) + +az role assignment create \ + --assignee $APIM_PRINCIPAL \ + --role "Cognitive Services User" \ + --scope $AOAI_ID +``` + +### AI Foundry Backend + +```bash +# Get AI Foundry endpoint +AI_ENDPOINT=$(az cognitiveservices account show \ + --name \ + --resource-group \ + --query "properties.endpoints[\"Azure AI Model Inference API\"]" -o tsv) + +# Create backend +az apim backend create \ + --service-name \ + --resource-group \ + --backend-id ai-foundry-backend \ + --protocol http \ + --url "$AI_ENDPOINT" +``` + +--- + +## Pattern 4: Import API + +Import APIs from OpenAPI/Swagger specifications. 
+ +### From URL + +```bash +az apim api import \ + --service-name \ + --resource-group \ + --api-id my-api \ + --path /api \ + --display-name "My API" \ + --specification-format OpenApiJson \ + --specification-url "https://example.com/openapi.json" +``` + +### From Local File + +```bash +# JSON format +az apim api import \ + --service-name \ + --resource-group \ + --api-id my-api \ + --path /api \ + --display-name "My API" \ + --specification-format OpenApiJson \ + --specification-path "./openapi.json" + +# YAML format +az apim api import \ + --service-name \ + --resource-group \ + --api-id my-api \ + --path /api \ + --display-name "My API" \ + --specification-format OpenApi \ + --specification-path "./openapi.yaml" +``` + +### Supported Specification Formats + +| Format | CLI Value | File Extension | +|--------|-----------|----------------| +| OpenAPI 3.x JSON | `OpenApiJson` | `.json` | +| OpenAPI 3.x YAML | `OpenApi` | `.yaml`, `.yml` | +| Swagger 2.0 JSON | `SwaggerJson` | `.json` | +| WSDL | `Wsdl` | `.wsdl` | + +--- + +## Pattern 5: Get Subscription Keys + +Retrieve API keys for testing. + +```bash +# List subscriptions +az apim subscription list \ + --service-name \ + --resource-group \ + --query "[].{name:displayName, id:name, state:state}" -o table + +# Get subscription keys +az apim subscription keys list \ + --service-name \ + --resource-group \ + --subscription-id + +# Create new subscription +az apim subscription create \ + --service-name \ + --resource-group \ + --subscription-id my-subscription \ + --display-name "My Subscription" \ + --scope "/apis" # All APIs +``` + +--- + +## Pattern 6: Test Gateway + +Test API endpoints through the gateway. 
+ +```bash +# Get gateway URL +GATEWAY_URL=$(az apim show \ + --name \ + --resource-group \ + --query "gatewayUrl" -o tsv) + +# Test simple GET request +curl -X GET "${GATEWAY_URL}/" \ + -H "Ocp-Apim-Subscription-Key: " + +# Test POST with JSON body +curl -X POST "${GATEWAY_URL}/" \ + -H "Content-Type: application/json" \ + -H "Ocp-Apim-Subscription-Key: " \ + -d '{"key": "value"}' + +# Test Azure OpenAI through gateway +curl -X POST "${GATEWAY_URL}/openai/deployments//chat/completions?api-version=2024-02-01" \ + -H "Content-Type: application/json" \ + -H "Ocp-Apim-Subscription-Key: " \ + -d '{ + "messages": [{"role": "user", "content": "Hello"}], + "max_tokens": 100 + }' +``` + +--- + +## Pattern 7: Configuration After Deployment + +After deploying APIM, use the **azure-aigateway** skill to: +- Configure APIM policies (rate limiting, authentication, caching) +- Set up AI-specific policies (semantic caching, token limits, content safety) +- Manage governance for models, tools, and agents + +The **azure-aigateway** skill provides comprehensive policy patterns and configuration guidance. 
+ +--- + +## Best Practices + +| Practice | Description | +|----------|-------------| +| **Use Basicv2 SKU** | Fast deployment (~5-10 min vs 30+ for Developer), cost-effective | +| **Enable managed identity** | Use for secure backend authentication without API keys | +| **Use subscription keys** | Require keys for all APIs (`subscriptionRequired: true`) | +| **Configure rate limiting** | Protect backends from abuse | +| **Enable Application Insights** | For comprehensive monitoring and diagnostics | +| **Use named values** | Store secrets and config in named values, not in policies | +| **Version your APIs** | Use path-based or header-based versioning | + +--- + +## Troubleshooting + +### Common Issues + +| Issue | Symptom | Solution | +|-------|---------|----------| +| **Slow deployment** | Takes 30+ minutes | Use Basicv2 SKU instead of Developer or Premium | +| **Backend auth fails** | 401 from Azure OpenAI | Assign "Cognitive Services User" role to APIM managed identity | +| **API not found** | 404 error | Verify API path, check subscription key is valid for API scope | +| **Rate limit exceeded** | 429 error | Increase rate limit in policy or upgrade SKU | +| **CORS errors** | Browser blocks request | Add CORS policy in inbound section | +| **Import fails** | Invalid OpenAPI spec | Validate spec with online validator, check format parameter | + +### Debug Commands + +```bash +# Check APIM status +az apim show --name --resource-group --query "provisioningState" -o tsv + +# View API details +az apim api show --service-name --resource-group --api-id + +# Check backend configuration +az apim backend show --service-name --resource-group --backend-id + +# Test connectivity to backend +curl -v + +# View APIM logs (requires diagnostic settings) +az monitor diagnostic-settings list --resource +``` + +--- + +## Azure Resources Reference + +### Core Resources for APIM + +| Resource Type | Purpose | API Version | +|--------------|---------|-------------| +| 
`Microsoft.ApiManagement/service` | APIM instance | 2024-06-01-preview | +| `Microsoft.ApiManagement/service/apis` | API definitions | 2024-06-01-preview | +| `Microsoft.ApiManagement/service/backends` | Backend configurations | 2024-06-01-preview | +| `Microsoft.ApiManagement/service/subscriptions` | API subscriptions/keys | 2024-06-01-preview | +| `Microsoft.ApiManagement/service/products` | API products | 2024-06-01-preview | + +### Example Bicep Template + +```bicep +@description('Location for all resources') +param location string = resourceGroup().location + +@description('APIM instance name') +param apimName string + +@description('Publisher email') +param publisherEmail string + +@description('Publisher name') +param publisherName string + +@allowed(['Basicv2', 'Standardv2', 'Developer', 'Premium']) +param sku string = 'Basicv2' + +resource apim 'Microsoft.ApiManagement/service@2024-06-01-preview' = { + name: apimName + location: location + sku: { + name: sku + capacity: 1 + } + identity: { + type: 'SystemAssigned' + } + properties: { + publisherEmail: publisherEmail + publisherName: publisherName + } +} + +output gatewayUrl string = apim.properties.gatewayUrl +output managementUrl string = apim.properties.managementApiUrl +output principalId string = apim.identity.principalId +``` + +--- + +## Additional Resources + +- [Azure API Management Documentation](https://learn.microsoft.com/azure/api-management/) +- [GenAI Gateway Capabilities](https://learn.microsoft.com/azure/api-management/genai-gateway-capabilities) +- [APIM Policies Reference](https://learn.microsoft.com/azure/api-management/api-management-policies) +- [AI-Gateway Samples Repository](https://github.com/Azure-Samples/AI-Gateway) +- [MCP Server Overview](https://learn.microsoft.com/azure/api-management/mcp-server-overview)