
Blind Scenario Testing Framework

A framework for blind behavioral testing of live API systems. Ship an independent test suite that validates your platform's security, auth, RBAC, input handling, and UI behavior — without importing a single line of your application code.

What Is Blind Scenario Testing?

Traditional testing verifies code from the inside. Blind scenario testing verifies behavior from the outside. Your test suite:

  • Lives in a separate repository from the platform it tests
  • Has zero access to source code, internal models, or implementation details
  • Treats the platform as a black box and asserts only on observable HTTP behavior
  • Runs against a digital twin — a Docker-composed replica of your production stack with a mock LLM backend

This approach catches regressions that unit tests miss: broken auth cookies, missing security headers, RBAC gaps, injection vulnerabilities, and policy enforcement failures. Because the tests don't know how the platform works internally, they can't be accidentally coupled to implementation details.
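In practice a scenario is just an HTTP call plus an assertion on what an outside caller can observe. A self-contained sketch of the idea — here a throwaway stdlib server stands in for the digital twin, and the `/api/admin/users` route is purely illustrative; against the real stack you would point at the compose-exposed port instead:

```python
# Black-box assertion sketch: the "test" sees only URLs and status codes.
import http.server
import threading
import urllib.error
import urllib.request

class FakePlatform(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # The stand-in platform rejects unauthenticated admin requests.
        code = 403 if self.path.startswith("/api/admin") else 200
        self.send_response(code)
        self.end_headers()

    def log_message(self, *args):  # keep test output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), FakePlatform)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}"

try:
    urllib.request.urlopen(base_url + "/api/admin/users")  # no credentials attached
    status = 200
except urllib.error.HTTPError as exc:
    status = exc.code

# Fail-closed: an unauthenticated caller must be rejected.
assert status in (401, 403)
server.shutdown()
```

The test never inspects the handler — it only sees the status code, which is exactly the coupling boundary blind testing enforces.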

Why?

| Problem | How blind testing solves it |
| --- | --- |
| Tests coupled to implementation details | Tests only know HTTP endpoints and expected status codes |
| Security regressions slip through | Dedicated injection, header, and privilege escalation scenarios |
| Auth/RBAC changes break silently | Role-based scenarios catch permission drift immediately |
| "Works on my machine" | Docker-composed digital twin is the single source of truth |
| LLM costs during testing | Mock LLM server returns deterministic, keyword-matched responses |

Quick Start

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose
  • Your platform's source code (for building the digital twin)

Setup

# Clone this repo
git clone https://github.com/NathanMaine/blind-scenario-testing.git
cd blind-scenario-testing

# Create a virtual environment
python -m venv .venv && source .venv/bin/activate

# Install dependencies
pip install -e .

# Install Playwright browsers (for UI tests)
playwright install chromium

# Copy and configure environment
cp .env.example .env
# Edit .env — set PLATFORM_PATH to your platform's source directory

Run

# Full sweep: start stack, seed data, run all tests, tear down
make sweep

# Or step by step:
make up        # Start the digital twin stack
make seed      # Create test users and upload fixture documents
make test      # Run all scenario tests
make down      # Tear down the stack

# Run subsets:
make test-api  # API tests only (skip UI/Playwright)
make test-ui   # UI tests only

Project Structure

blind-scenario-testing/
├── conftest.py                    # Root fixtures: base URLs and httpx clients
├── pytest.ini                     # Test configuration and markers
├── pyproject.toml                 # Dependencies
├── Makefile                       # Lifecycle targets (up/seed/test/down/sweep)
├── docker-compose.test.yml        # Digital twin stack definition
├── .env.example                   # Environment variable template
│
├── mock_llm/                      # Ollama-compatible mock LLM server
│   ├── Dockerfile
│   ├── server.py                  # FastAPI server (OpenAI + Ollama endpoints)
│   └── responses.py               # Keyword→response rules (customize this!)
│
├── fixtures/
│   ├── seed.py                    # Bootstrap test users and documents
│   ├── users.json                 # Test user definitions (roles + credentials)
│   └── documents/                 # Fixture documents for RAG testing
│       └── sample_document.txt
│
├── scenarios/                     # All test scenarios
│   ├── conftest.py                # Pre-authenticates test users
│   ├── auth/                      # Authentication tests
│   │   ├── test_login_flows.py
│   │   └── test_cookie_security.py
│   ├── authz/                     # Authorization / RBAC tests
│   │   ├── test_rbac_enforcement.py
│   │   └── test_privilege_escalation.py
│   ├── chat/                      # Chat pipeline tests
│   │   └── test_chat_pipeline.py
│   ├── health/                    # Health endpoint tests
│   │   └── test_health_endpoints.py
│   ├── security/                  # Security header + injection tests
│   │   ├── test_security_headers.py
│   │   └── test_injection_attacks.py
│   └── ui/                        # Playwright browser tests
│       ├── conftest.py            # Browser/page fixtures + screenshot on failure
│       └── test_login_ui.py
│
├── ci/
│   ├── run-sweep.sh               # Full sweep script with cleanup trap
│   └── github-actions.yml         # Example GitHub Actions workflow
│
└── reports/                       # Generated test reports (gitignored)

How to Adapt This for Your Project

Step 1: Configure the Digital Twin

Edit docker-compose.test.yml to add your platform's services. The mock LLM is already configured — just wire your app to point at http://mock-llm:11434 for its LLM backend.

services:
  mock-llm:
    build: ./mock_llm        # Already configured

  your-app:
    build:
      context: ${PLATFORM_PATH}
    ports:
      - "18080:8080"
    environment:
      - LLM_HOST=http://mock-llm:11434
    depends_on:
      mock-llm:
        condition: service_healthy
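Note that `condition: service_healthy` only resolves if the mock-llm service declares a healthcheck. The shipped compose file should already include one; if you are writing your own, a plausible shape might be the following (the probe path and the availability of `curl` in the image are assumptions about the mock server — adjust to whatever it actually exposes):

```yaml
  mock-llm:
    build: ./mock_llm
    healthcheck:
      # Probe an endpoint the Ollama-compatible mock is likely to serve.
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 5s
      timeout: 3s
      retries: 10
```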

Step 2: Customize Auth Fixtures

Edit scenarios/conftest.py to match your platform's login endpoint:

def _login(client, username, password):
    resp = client.post("/your/login/endpoint", json={
        "username": username,
        "password": password,
    })
    resp.raise_for_status()
    return resp.json()["your_token_field"]

Update fixtures/users.json with your platform's roles and credentials.

Step 3: Customize Mock LLM Responses

Edit mock_llm/responses.py with domain-specific keyword/response pairs:

KEYWORD_RESPONSES = [
    (["your-domain-term"], "Domain-specific response here."),
    (["another-keyword"], "Another response."),
]
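The matching itself is first-rule-wins substring lookup in spirit — a sketch of the kind of logic `responses.py` implements (function and constant names here are illustrative, not the repo's actual API):

```python
# Deterministic keyword -> response matching, in the spirit of mock_llm/responses.py.
KEYWORD_RESPONSES = [
    (["invoice", "billing"], "Here is the billing summary you asked about."),
    (["reset", "password"], "To reset a password, use the account settings page."),
]
DEFAULT_RESPONSE = "I do not have a canned answer for that prompt."

def pick_response(prompt: str) -> str:
    lowered = prompt.lower()
    for keywords, response in KEYWORD_RESPONSES:
        # First rule whose keywords appear in the prompt wins.
        if any(keyword in lowered for keyword in keywords):
            return response
    return DEFAULT_RESPONSE

print(pick_response("What's on my invoice this month?"))
# -> "Here is the billing summary you asked about."
```

Because the same prompt always produces the same response, chat-pipeline assertions stay stable across runs — no GPU, no API bill, no flakiness.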

Step 4: Write Your Scenarios

Use the included example scenarios as templates. Each scenario category has its own directory under scenarios/. The pattern is simple:

import pytest
pytestmark = pytest.mark.auth

class TestYourFeature:
    def test_something_works(self, client, admin_headers):
        resp = client.get("/your/endpoint", headers=admin_headers)
        assert resp.status_code == 200

Step 5: Add New Scenario Categories

  1. Create a new directory under scenarios/ (e.g., scenarios/payments/)
  2. Add an __init__.py
  3. Register the marker in pytest.ini
  4. Write your test files
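Step 3 in concrete terms: registering the marker means appending a line to the existing `markers` block in `pytest.ini` rather than replacing it (the description text is up to you):

```ini
[pytest]
markers =
    payments: payment flow scenarios
```

With the marker registered, `pytest -m payments` runs just that category, and pytest stops warning about unknown markers.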

Scenario Categories

| Category | What It Tests |
| --- | --- |
| auth | Login flows, cookie security, password policy, session management, MFA |
| authz | RBAC enforcement, privilege escalation, unauthenticated access |
| chat | Chat pipeline, response structure, input validation |
| health | Service health endpoints, component status |
| security | HTTP headers, injection attacks (SQL, XSS, path traversal), oversized payloads |
| ui | Browser-based login, interaction, and error handling |

Design Principles

Separation of Concerns

  • This repo = behavioral assertions only. No application imports.
  • Your platform repo = the system under test. Built via Docker.
  • Mock LLM = deterministic responses. Zero GPU, zero cost, fully reproducible.

Test Isolation

  • Session-scoped fixtures for base URLs and auth tokens (created once)
  • Throwaway users with UUID suffixes for mutation-heavy tests
  • Each UI test gets a fresh browser context
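A sketch of the throwaway-user idea — the helper name and payload shape are illustrative; the point is the UUID suffix, which keeps repeated or parallel runs from colliding:

```python
import uuid

def make_throwaway_user(prefix: str = "scenario-user") -> dict:
    # UUID suffix: unique per call, so mutation-heavy tests never reuse a user.
    suffix = uuid.uuid4().hex[:8]
    return {
        "username": f"{prefix}-{suffix}",
        "password": f"Pw!{uuid.uuid4().hex[:12]}",
    }

user = make_throwaway_user()
# A mutation-heavy test would then create this user via the API, e.g.:
# client.post("/api/users", json=user, headers=admin_headers)
```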

Fail-Closed Assertions

Tests assert on both "the right thing happened" and "the wrong thing didn't happen":

# Not just: "did login succeed?"
# Also: "does a forged JWT get rejected?"
# Also: "does a viewer get 403 on admin endpoints?"

CI Integration

GitHub Actions

Copy ci/github-actions.yml to your repo as .github/workflows/scenario-sweep.yml. You'll need:

  1. A PLATFORM_TOKEN secret with access to your platform repo
  2. The repository field updated to point at your platform

Local Sweep

./ci/run-sweep.sh

This runs the full lifecycle with automatic cleanup on failure.
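The cleanup trap is the important part of that script: it guarantees teardown runs on every exit path, including mid-sweep failures. A stand-alone sketch of the pattern (the echo commands stand in for the real make targets, and `exit 1` simulates a failing test step):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a simulated sweep in a child shell so the trap firing is observable
# even though the "test" step fails.
result=$(
  bash -c '
    cleanup() { echo "make down (cleanup always runs)"; }
    trap cleanup EXIT
    echo "make up && make seed"
    echo "make test"
    exit 1
  ' || true
)
echo "$result"
```

Without the trap, a failing `make test` would leave the digital twin stack running and poison the next sweep.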

Extending the Framework

Adding PII/Data Protection Tests

If your platform handles sensitive data, create a scenarios/pii/ category:

class TestPIIBlocking:
    def test_ssn_blocked(self, client, admin_headers):
        resp = client.post("/api/chat", json={
            "message": "My SSN is 123-45-6789",
        }, headers=admin_headers)
        # Assert the platform blocks or redacts PII
        assert resp.status_code in (200, 502)
        if resp.status_code == 200:
            data = resp.json()
            assert data.get("policy_decision") in ("blocked_pii", "redacted")

Adding Rate Limiting Tests

class TestRateLimiting:
    def test_burst_triggers_throttle(self, client, admin_headers):
        results = []
        for _ in range(50):
            resp = client.post("/api/chat",
                json={"message": "test"},
                headers=admin_headers)
            results.append(resp.status_code)
        assert 429 in results, "No rate limiting detected"

Adding Document/Upload Tests

class TestDocumentLifecycle:
    def test_upload_and_retrieve(self, client, admin_headers):
        with open("fixtures/documents/sample_document.txt", "rb") as f:
            resp = client.post("/api/documents",
                files={"file": ("test.txt", f)},
                headers=admin_headers)
        assert resp.status_code in (200, 201)

License

MIT License. See LICENSE for details.

Background

This framework was extracted from a production blind testing system used to validate a CMMC compliance platform. The original implementation ran 150+ parametrized scenarios across authentication, authorization, PII/CUI protection, chat pipelines, document management, gateway policies, audit logging, and browser UI — all without importing a single line of the platform's source code.

The "Dark Factory" pattern (named for manufacturing's lights-out factory concept) treats your entire platform as a sealed black box. You build a digital twin, point your tests at it, and assert on behavior. If a test fails, the platform has a behavioral regression — no matter what the internal code looks like.
