@james-tn james-tn commented Jan 9, 2026

Enterprise Security Infrastructure for Azure OpenAI Workshop

Summary

This PR adds enterprise-grade security features to the Azure infrastructure deployment, aligning Terraform and Bicep configurations with best practices for production workloads.


🔒 Network Security

  • VNet Integration
    • Container Apps Environment now runs inside a dedicated VNet: 10.10.0.0/16
  • Private Endpoints
    • Added private endpoints for:
      • Cosmos DB
      • Azure OpenAI
    • Eliminates public network exposure
  • Private DNS Zones
    • Configured:
      • privatelink.documents.azure.com
      • privatelink.openai.azure.com
  • Internal MCP Service
    • New option to make the MCP service internal-only
    • Accessible only from within the Container Apps environment
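
To illustrate how the two private DNS zones listed above fit together: when a private endpoint is attached, the service's public hostname is aliased to a name inside the matching `privatelink.*` zone, which then resolves to a private IP in the VNet. The following is a minimal sketch of that name mapping only; the helper name and the account names in the usage note are hypothetical and do not come from this PR.

```python
# Public DNS suffix -> private DNS zone, as listed in the PR description.
PRIVATE_DNS_ZONES = {
    "documents.azure.com": "privatelink.documents.azure.com",
    "openai.azure.com": "privatelink.openai.azure.com",
}


def privatelink_fqdn(public_fqdn: str) -> str:
    """Return the privatelink name that a public service hostname is aliased to.

    Hypothetical helper for illustration; it only rewrites the name and
    performs no DNS lookup.
    """
    host = public_fqdn.rstrip(".").lower()
    for public_suffix, private_zone in PRIVATE_DNS_ZONES.items():
        suffix = "." + public_suffix
        if host.endswith(suffix):
            resource_label = host[: -len(suffix)]
            return f"{resource_label}.{private_zone}"
    raise ValueError(f"no private DNS zone configured for {public_fqdn!r}")
```

For example, `privatelink_fqdn("myaccount.documents.azure.com")` yields `myaccount.privatelink.documents.azure.com`, the record that would live in the Cosmos DB private zone.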

🔐 Identity & Access

  • Managed Identity Authentication
    • Cosmos DB and Azure OpenAI accessed using a user-assigned managed identity
    • No API keys required
  • Key Vault Removed
    • No longer needed due to managed identity usage
  • RBAC Roles
    • Cosmos DB Data Contributor
    • Cognitive Services OpenAI User
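
As a rough illustration of what keyless access looks like from application code, the sketch below builds an Azure OpenAI client that obtains tokens through a user-assigned managed identity instead of an API key. It assumes the `azure-identity` and `openai` packages are installed and that the identity holds the Cognitive Services OpenAI User role; the function name and its parameters are hypothetical, and the workshop's actual backend code may be structured differently.

```python
# Entra ID token scope for Azure OpenAI (Cognitive Services) resources.
COGNITIVE_SERVICES_SCOPE = "https://cognitiveservices.azure.com/.default"


def make_openai_client(endpoint: str, identity_client_id: str, api_version: str):
    """Create an Azure OpenAI client authenticated by a user-assigned managed identity.

    Hypothetical helper: endpoint, identity_client_id and api_version are
    supplied by the caller (for example from container environment variables).
    """
    # Imported lazily so this module still loads where the Azure SDKs are absent.
    from azure.identity import ManagedIdentityCredential, get_bearer_token_provider
    from openai import AzureOpenAI

    # client_id selects which user-assigned identity to use.
    credential = ManagedIdentityCredential(client_id=identity_client_id)
    token_provider = get_bearer_token_provider(credential, COGNITIVE_SERVICES_SCOPE)

    # No api_key argument: every request carries a bearer token instead.
    return AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=token_provider,
        api_version=api_version,
    )
```

Because no secret is ever handed to the client, there is nothing left to store in Key Vault for these two services, which is consistent with its removal above.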

🛠️ Infrastructure Improvements

  • Fixed deploy.ps1
    • Script no longer overwrites existing tfvars files
  • Embedding Model Added
    • Support for text-embedding-ada-002
  • Subnet Sizing Fix
    • Container Apps subnet set to /23 (Azure documents /23 as the minimum for Consumption-only environments; workload profiles environments accept subnets as small as /27)
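
The subnet sizing can be checked with a little address arithmetic. This is a minimal sketch using Python's standard `ipaddress` module; the PR description does not say which /23 inside 10.10.0.0/16 is used, so taking the first one is an assumption for illustration only.

```python
import ipaddress

# VNet address space from the PR description.
vnet = ipaddress.ip_network("10.10.0.0/16")

# Enumerate every /23 block that can be carved out of the VNet.
candidate_subnets = list(vnet.subnets(new_prefix=23))

# Assumption: the Container Apps subnet is the first /23 in the VNet.
container_apps_subnet = candidate_subnets[0]

print(container_apps_subnet)                # 10.10.0.0/23
print(container_apps_subnet.num_addresses)  # 512
print(len(candidate_subnets))               # 128
```

A /23 therefore holds 512 addresses, and the /16 leaves room for 127 more blocks of the same size for private endpoints and other subnets.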

⚙️ Configuration Options

James N. and others added 30 commits November 14, 2025 09:38
Enhanced Agentic AI with Secure Azure Deployment
…ough an exception

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ted NPM libraries and added Dockerfile for containerization
* WIP: Save local changes before switching to int-agentic

* Fix WebSocket reconnect issue and Vite build compatibility

- Add intentionalClose flag to WebSocket manager to prevent auto-reconnect on intentional close
- Fix Dockerfile to copy from Vite 'dist' instead of CRA 'build' directory
- Update backend static file serving to handle both Vite (assets/) and CRA (static/) structures
- Add catch-all exception handler for WebSocket disconnections in backend
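
The static-file change in the third bullet can be pictured as a small directory probe. This is a sketch under assumptions, not the backend's actual code: the function name is hypothetical, and preferring the Vite layout when both directories exist is an illustrative choice.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_static_dir(build_root: Path) -> Optional[Tuple[str, Path]]:
    """Locate the frontend asset directory for either build tool.

    Vite emits bundled files under 'assets/', Create React App under 'static/'.
    Returns (layout, directory), or None when neither directory exists.
    """
    for layout, dirname in (("vite", "assets"), ("cra", "static")):
        candidate = build_root / dirname
        if candidate.is_dir():
            return layout, candidate
    return None
```

A web framework could then mount whichever directory is returned, so the same backend image works with either frontend build output.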

---------

Co-authored-by: James N. <james.nguyen@microsoft.com>
tjsullivan1 and others added 12 commits December 19, 2025 21:46
Updated environment variable handling for jobs based on event types and branch names.
Added commands to ensure key vault is reachable and update its networking settings.
Add checks for existing key vault before updating settings.
Updated Key Vault role assignment to use user assigned identity and added a user assigned managed identity resource for the backend container app.
Infrastructure Automation with Testing
* WIP: Save local changes before switching to int-agentic

* Fix WebSocket reconnect issue and Vite build compatibility

- Add intentionalClose flag to WebSocket manager to prevent auto-reconnect on intentional close
- Fix Dockerfile to copy from Vite 'dist' instead of CRA 'build' directory
- Update backend static file serving to handle both Vite (assets/) and CRA (static/) structures
- Add catch-all exception handler for WebSocket disconnections in backend

* update authentication and bicep deployment to use AAD authentication instead of key

* complete terraform deployment

* update DEPLOYMENT and Terraform

* update DEPLOYMENT and Terraform

* Changed AZURE_OPENAI_API_VERSION to use a variable

* Reverted the OIDC changes on providers.tf

* Reverted the OIDC changes on providers.tf

* Removing key vault reference from orchestration workflow

* removing key vault reference and openai secret key from infrastructure workflow. I have also commented out all the tests for model endpoint, since that currently relies on key-based access.

* changing docker to build off new image

* changing docker to build off new image

* changing docker to build off new image

* Making backend config optionally remote in the proper way

* Reverting backend change, seems to have broken state connection

* adding a local provider file so I can have flexible backends

* upgrade version of agent-framework and allow mcp in internal communication to be insecure

* Updated to work with both local and remote state

* optimize reflection agent code and remove workflow reflection agent

* add github workflow

* update github workflow to use repo level variables

* update github workflow to use repo level variables

* update github workflow to use repo level variables

* update github workflow to use repo level variables

* update test cases & test timeout & exclude MCP test because MCP is deployed internally

* move test to after deployment

* move test to after deployment

* fix api version

* fix api version

* fix test run

* fix: Use placeholder image for Container Apps initial deployment

- Use mcr.microsoft.com/k8se/quickstart:latest as placeholder image
- Add lifecycle ignore_changes for container image (managed by update-containers)
- Solves chicken-and-egg problem: Container Apps created before images exist in ACR
- update-containers.yml sets real images after Docker builds complete

* fix: Remove pull_request triggers from Docker workflows

- Docker workflows should only run via workflow_call from orchestrate.yml
- Prevents duplicate/orphan runs that occur before infrastructure exists
- Manual dispatch still available for ad-hoc builds

* feat: Add james-dev to destroy-infrastructure condition

* feat: Update Bicep for feature parity with Terraform

- Add placeholder image support (mcr.microsoft.com/k8se/quickstart:latest)
- Fix MCP allowInsecure when mcpInternalOnly is true
- Add readiness probe to application container (/docs endpoint)
- Add missing env vars: AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME, AZURE_OPENAI_EMBEDDING_DEPLOYMENT
- Make AZURE_OPENAI_API_VERSION configurable via parameter
- Align naming convention with environment suffix
- Change image name from workshop-app to backend-app for consistency

* docs: enhance README with Mermaid diagrams and enterprise deployment guide

- Replace ASCII architecture diagrams with interactive Mermaid diagrams
- Add comprehensive enterprise security sections (VNet, Private Endpoints, Managed Identity)
- Document security profiles (Dev/Staging/Production)
- Add CI/CD with GitHub Actions OIDC section linking to GITHUB_ACTIONS_SETUP.md
- Update main README with enterprise deployment table linking to all guides
- Add data flow and authentication flow sequence diagrams
- Include troubleshooting guide with common issues

* docs: enhance README with Mermaid diagrams and enterprise deployment guide

- Replace ASCII architecture diagrams with interactive Mermaid diagrams
- Add comprehensive enterprise security sections (VNet, Private Endpoints, Managed Identity)
- Document security profiles (Dev/Staging/Production)
- Add CI/CD with GitHub Actions OIDC section linking to GITHUB_ACTIONS_SETUP.md
- Update main README with enterprise deployment table linking to all guides
- Add data flow and authentication flow sequence diagrams
- Include troubleshooting guide with common issues

* Updated deployment to reference tfvars file for local file/iteration value

---------

Co-authored-by: James N. <james.nguyen@microsoft.com>
Co-authored-by: Tim Sullivan <timothyj.sullivan1@gmail.com>
@james-tn james-tn requested a review from tjsullivan1 January 9, 2026 19:14
James N. and others added 3 commits January 9, 2026 12:32
* WIP: Save local changes before switching to int-agentic

* Fix WebSocket reconnect issue and Vite build compatibility

- Add intentionalClose flag to WebSocket manager to prevent auto-reconnect on intentional close
- Fix Dockerfile to copy from Vite 'dist' instead of CRA 'build' directory
- Update backend static file serving to handle both Vite (assets/) and CRA (static/) structures
- Add catch-all exception handler for WebSocket disconnections in backend

* update authentication and bicep deployment to use AAD authentication instead of key

* complete terraform deployment

* update DEPLOYMENT and Terraform

* update DEPLOYMENT and Terraform

* Changed AZURE_OPENAI_API_VERSION to use a variable

* Reverted the OIDC changes on providers.tf

* Reverted the OIDC changes on providers.tf

* Removing key vault reference from orchestration workflow

* removing key vault reference and openai secret key from infrastructure workflow. I have also commented out all the tests for model endpoint, since that currently relies on key-based access.

* changing docker to build off new image

* changing docker to build off new image

* changing docker to build off new image

* Making backend config optionally remote in the proper way

* Reverting backend change, seems to have broken state connection

* adding a local provider file so I can have flexible backends

* upgrade version of agent-framework and allow mcp in internal communication to be insecure

* Updated to work with both local and remote state

* optimize reflection agent code and remove workflow reflection agent

* add github workflow

* update github workflow to use repo level variables

* update github workflow to use repo level variables

* update github workflow to use repo level variables

* update github workflow to use repo level variables

* update test cases & test timeout & exclude MCP test because MCP is deployed internally

* move test to after deployment

* move test to after deployment

* fix api version

* fix api version

* fix test run

* fix: Use placeholder image for Container Apps initial deployment

- Use mcr.microsoft.com/k8se/quickstart:latest as placeholder image
- Add lifecycle ignore_changes for container image (managed by update-containers)
- Solves chicken-and-egg problem: Container Apps created before images exist in ACR
- update-containers.yml sets real images after Docker builds complete

* fix: Remove pull_request triggers from Docker workflows

- Docker workflows should only run via workflow_call from orchestrate.yml
- Prevents duplicate/orphan runs that occur before infrastructure exists
- Manual dispatch still available for ad-hoc builds

* feat: Add james-dev to destroy-infrastructure condition

* feat: Update Bicep for feature parity with Terraform

- Add placeholder image support (mcr.microsoft.com/k8se/quickstart:latest)
- Fix MCP allowInsecure when mcpInternalOnly is true
- Add readiness probe to application container (/docs endpoint)
- Add missing env vars: AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME, AZURE_OPENAI_EMBEDDING_DEPLOYMENT
- Make AZURE_OPENAI_API_VERSION configurable via parameter
- Align naming convention with environment suffix
- Change image name from workshop-app to backend-app for consistency

* docs: enhance README with Mermaid diagrams and enterprise deployment guide

- Replace ASCII architecture diagrams with interactive Mermaid diagrams
- Add comprehensive enterprise security sections (VNet, Private Endpoints, Managed Identity)
- Document security profiles (Dev/Staging/Production)
- Add CI/CD with GitHub Actions OIDC section linking to GITHUB_ACTIONS_SETUP.md
- Update main README with enterprise deployment table linking to all guides
- Add data flow and authentication flow sequence diagrams
- Include troubleshooting guide with common issues

* docs: enhance README with Mermaid diagrams and enterprise deployment guide

- Replace ASCII architecture diagrams with interactive Mermaid diagrams
- Add comprehensive enterprise security sections (VNet, Private Endpoints, Managed Identity)
- Document security profiles (Dev/Staging/Production)
- Add CI/CD with GitHub Actions OIDC section linking to GITHUB_ACTIONS_SETUP.md
- Update main README with enterprise deployment table linking to all guides
- Add data flow and authentication flow sequence diagrams
- Include troubleshooting guide with common issues

* refactor: merge MCP backends into unified contoso_tools with env switch

- Create _backend_sqlite.py for local SQLite development
- Create _backend_cosmos.py for production Cosmos DB
- Update contoso_tools.py to select backend via USE_COSMOSDB env var
- Remove mcp_service_cosmos.py (merged into mcp_service.py)
- Remove contoso_tools_cosmos.py (merged into _backend_cosmos.py)
- Remove unused sqlite3 import from mcp_service.py

Usage: Set USE_COSMOSDB=true for Cosmos DB, false (default) for SQLite
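
The environment switch described in this commit could look roughly like the sketch below. Only the `USE_COSMOSDB` variable and the two backend module names come from the commit message; the helper names and the extra truthy spellings ("1", "yes") are assumptions, and the real `contoso_tools.py` may differ.

```python
import os
from typing import Mapping, Optional

# Spellings treated as "on"; anything beyond "true" is an assumption.
_TRUTHY = {"1", "true", "yes"}


def use_cosmosdb(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when USE_COSMOSDB is enabled; unset means False (SQLite)."""
    env = os.environ if env is None else env
    return env.get("USE_COSMOSDB", "false").strip().lower() in _TRUTHY


def select_backend_module(env: Optional[Mapping[str, str]] = None) -> str:
    """Name the backend module that the tools layer should import."""
    return "_backend_cosmos" if use_cosmosdb(env) else "_backend_sqlite"
```

The caller can pass the returned name to `importlib.import_module`, which keeps the Cosmos DB SDK from being imported at all during local SQLite development.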

* Update Cosmos DB setup scripts to reference unified backend with USE_COSMOSDB env var

* Enable MCP deployment with CosmosDB: add all 12 containers, fix env vars, add data seeding option

* Simplify deploy.ps1 for local-only execution with sensible defaults

* Remove unused local.env.ps1 - all config is in dev.tfvars

* Updated deployment to reference tfvars file for local file/iteration value

* update mcp service to support CosmosDB

* add bicep update & MCP with Cosmos

* fix bicep script

* update infra readme and mcp readme for CosmosDB as option for mcp backend

---------

Co-authored-by: James N. <james.nguyen@microsoft.com>
Co-authored-by: Tim Sullivan <timothyj.sullivan1@gmail.com>
There is a security warning if we don't set permissions on the GitHub token. I'm adding contents: read as a minimum; this may or may not be enough.

@james-tn I'll leave this for now, but I had thought we packaged the React app into the python FastAPI application.

@tjsullivan1 tjsullivan1 left a comment

These changes are good to go.

@tjsullivan1 tjsullivan1 merged commit 82c8867 into main Jan 12, 2026
13 checks passed

4 participants