A framework for blind behavioral testing of live API systems. Ship an independent test suite that validates your platform's security, auth, RBAC, input handling, and UI behavior — without importing a single line of your application code.
Traditional testing verifies code from the inside. Blind scenario testing verifies behavior from the outside. Your test suite:
- Lives in a separate repository from the platform it tests
- Has zero access to source code, internal models, or implementation details
- Treats the platform as a black box and asserts only on observable HTTP behavior
- Runs against a digital twin — a Docker-composed replica of your production stack with a mock LLM backend
This approach catches regressions that unit tests miss: broken auth cookies, missing security headers, RBAC gaps, injection vulnerabilities, and policy enforcement failures. Because the tests don't know how the platform works internally, they can't be accidentally coupled to implementation details.
| Problem | How blind testing solves it |
|---|---|
| Tests coupled to implementation details | Tests only know HTTP endpoints and expected status codes |
| Security regressions slip through | Dedicated injection, header, and privilege escalation scenarios |
| Auth/RBAC changes break silently | Role-based scenarios catch permission drift immediately |
| "Works on my machine" | Docker-composed digital twin is the single source of truth |
| LLM costs during testing | Mock LLM server returns deterministic, keyword-matched responses |
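Scenarios assert only on what a response exposes; a security-header check, for instance, reduces to a pure function over the response's header map. A minimal, hypothetical helper (the required-header set below is an assumption; tune it to your platform's policy):

```python
# Headers every response is expected to carry. This set is an assumption,
# not the framework's actual list -- adjust it to your security policy.
REQUIRED_SECURITY_HEADERS = (
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
    "Strict-Transport-Security",
)


def missing_security_headers(headers):
    """Return the required headers absent from a response's header mapping."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_SECURITY_HEADERS if h.lower() not in present]


# Example: a response missing CSP and HSTS
sample = {"X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY"}
print(missing_security_headers(sample))
# → ['Content-Security-Policy', 'Strict-Transport-Security']
```

A scenario then asserts `missing_security_headers(resp.headers) == []` and never needs to know how the platform sets those headers internally.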
- Python 3.11+
- Docker & Docker Compose
- Your platform's source code (for building the digital twin)
```shell
# Clone this repo
git clone https://github.com/NathanMaine/blind-scenario-testing.git
cd blind-scenario-testing

# Create a virtual environment
python -m venv .venv && source .venv/bin/activate

# Install dependencies
pip install -e .

# Install Playwright browsers (for UI tests)
playwright install chromium

# Copy and configure environment
cp .env.example .env
# Edit .env — set PLATFORM_PATH to your platform's source directory
```

```shell
# Full sweep: start stack, seed data, run all tests, tear down
make sweep

# Or step by step:
make up    # Start the digital twin stack
make seed  # Create test users and upload fixture documents
make test  # Run all scenario tests
make down  # Tear down the stack

# Run subsets:
make test-api  # API tests only (skip UI/Playwright)
make test-ui   # UI tests only
```

```
blind-scenario-testing/
├── conftest.py                # Root fixtures: base URLs and httpx clients
├── pytest.ini                 # Test configuration and markers
├── pyproject.toml             # Dependencies
├── Makefile                   # Lifecycle targets (up/seed/test/down/sweep)
├── docker-compose.test.yml    # Digital twin stack definition
├── .env.example               # Environment variable template
│
├── mock_llm/                  # Ollama-compatible mock LLM server
│   ├── Dockerfile
│   ├── server.py              # FastAPI server (OpenAI + Ollama endpoints)
│   └── responses.py           # Keyword→response rules (customize this!)
│
├── fixtures/
│   ├── seed.py                # Bootstrap test users and documents
│   ├── users.json             # Test user definitions (roles + credentials)
│   └── documents/             # Fixture documents for RAG testing
│       └── sample_document.txt
│
├── scenarios/                 # All test scenarios
│   ├── conftest.py            # Pre-authenticates test users
│   ├── auth/                  # Authentication tests
│   │   ├── test_login_flows.py
│   │   └── test_cookie_security.py
│   ├── authz/                 # Authorization / RBAC tests
│   │   ├── test_rbac_enforcement.py
│   │   └── test_privilege_escalation.py
│   ├── chat/                  # Chat pipeline tests
│   │   └── test_chat_pipeline.py
│   ├── health/                # Health endpoint tests
│   │   └── test_health_endpoints.py
│   ├── security/              # Security header + injection tests
│   │   ├── test_security_headers.py
│   │   └── test_injection_attacks.py
│   └── ui/                    # Playwright browser tests
│       ├── conftest.py        # Browser/page fixtures + screenshot on failure
│       └── test_login_ui.py
│
├── ci/
│   ├── run-sweep.sh           # Full sweep script with cleanup trap
│   └── github-actions.yml     # Example GitHub Actions workflow
│
└── reports/                   # Generated test reports (gitignored)
```
Edit `docker-compose.test.yml` to add your platform's services. The mock LLM is already configured — just wire your app to point at `http://mock-llm:11434` for its LLM backend.
```yaml
services:
  mock-llm:
    build: ./mock_llm    # Already configured

  your-app:
    build:
      context: ${PLATFORM_PATH}
    ports:
      - "18080:8080"
    environment:
      - LLM_HOST=http://mock-llm:11434
    depends_on:
      mock-llm:
        condition: service_healthy
```

Edit `scenarios/conftest.py` to match your platform's login endpoint:
```python
def _login(client, username, password):
    resp = client.post("/your/login/endpoint", json={
        "username": username,
        "password": password,
    })
    resp.raise_for_status()
    return resp.json()["your_token_field"]
```

Update `fixtures/users.json` with your platform's roles and credentials.
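The schema of `users.json` is yours to define; one hypothetical shape a seed script could consume (all usernames, passwords, and role names below are placeholders):

```json
[
  {"username": "admin@example.com",  "password": "change-me", "role": "admin"},
  {"username": "viewer@example.com", "password": "change-me", "role": "viewer"}
]
```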
Edit `mock_llm/responses.py` with domain-specific keyword/response pairs:

```python
KEYWORD_RESPONSES = [
    (["your-domain-term"], "Domain-specific response here."),
    (["another-keyword"], "Another response."),
]
```

Use the included example scenarios as templates. Each scenario category has its own directory under `scenarios/`. The pattern is simple:
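To see how such rules might be consumed, here is a sketch of a first-match-wins lookup with a default fallback; the rules shown and the fallback behavior are assumptions for illustration, not the shipped `server.py` logic:

```python
# Hypothetical rules: (keywords, canned response) pairs, checked in order.
KEYWORD_RESPONSES = [
    (["invoice", "billing"], "Your invoice total is $0.00."),
    (["reset", "password"], "Use the password reset link."),
]
DEFAULT_RESPONSE = "I don't have information about that."


def pick_response(prompt: str) -> str:
    """Return the first rule whose keywords appear in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    for keywords, response in KEYWORD_RESPONSES:
        if any(k in lowered for k in keywords):
            return response
    return DEFAULT_RESPONSE


print(pick_response("How do I reset my account?"))  # → Use the password reset link.
```

Because the lookup is deterministic, the same prompt always yields the same response, which is what lets chat-pipeline assertions stay stable across runs.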
```python
import pytest

pytestmark = pytest.mark.auth


class TestYourFeature:
    def test_something_works(self, client, admin_headers):
        resp = client.get("/your/endpoint", headers=admin_headers)
        assert resp.status_code == 200
```

To add a new scenario category:

- Create a new directory under `scenarios/` (e.g., `scenarios/payments/`)
- Add an `__init__.py`
- Register the marker in `pytest.ini`
- Write your test files
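Registering the marker in `pytest.ini` keeps pytest from warning about unknown marks; a sketch, with illustrative descriptions and a hypothetical `payments` entry:

```ini
[pytest]
markers =
    auth: authentication scenarios
    authz: authorization / RBAC scenarios
    payments: payment scenarios (your new category)
```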
| Category | What It Tests |
|---|---|
| `auth` | Login flows, cookie security, password policy, session management, MFA |
| `authz` | RBAC enforcement, privilege escalation, unauthenticated access |
| `chat` | Chat pipeline, response structure, input validation |
| `health` | Service health endpoints, component status |
| `security` | HTTP headers, injection attacks (SQL, XSS, path traversal), oversized payloads |
| `ui` | Browser-based login, interaction, and error handling |
- This repo = behavioral assertions only. No application imports.
- Your platform repo = the system under test. Built via Docker.
- Mock LLM = deterministic responses. Zero GPU, zero cost, fully reproducible.
- Session-scoped fixtures for base URLs and auth tokens (created once)
- Throwaway users with UUID suffixes for mutation-heavy tests
- Each UI test gets a fresh browser context
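The UUID-suffix convention can be as small as a helper that mints a disposable username per test; a sketch (the prefix and suffix length are arbitrary choices, not the framework's actual naming scheme):

```python
import uuid


def throwaway_username(prefix: str = "scenario-user") -> str:
    """Mint a unique, disposable username so mutation-heavy tests never collide."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


# Each call yields a distinct user that can be created, mutated, and abandoned.
a, b = throwaway_username(), throwaway_username()
print(a)        # e.g. scenario-user-3f9c1a2b
print(a != b)   # → True
```

Because every run creates fresh names, destructive tests can run in parallel and never depend on leftover state from a previous sweep.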
Tests assert on both "the right thing happened" and "the wrong thing didn't happen":
```python
# Not just: "did login succeed?"
# Also: "does a forged JWT get rejected?"
# Also: "does a viewer get 403 on admin endpoints?"
```

Copy `ci/github-actions.yml` to your repo as `.github/workflows/scenario-sweep.yml`. You'll need:
- A `PLATFORM_TOKEN` secret with access to your platform repo
- The `repository` field updated to point at your platform
To run the same sweep locally:

```shell
./ci/run-sweep.sh
```

This runs the full lifecycle with automatic cleanup on failure.
If your platform handles sensitive data, create a `scenarios/pii/` category:

```python
class TestPIIBlocking:
    def test_ssn_blocked(self, client, admin_headers):
        resp = client.post("/api/chat", json={
            "message": "My SSN is 123-45-6789",
        }, headers=admin_headers)
        # Assert the platform blocks or redacts PII
        assert resp.status_code in (200, 502)
        if resp.status_code == 200:
            data = resp.json()
            assert data.get("policy_decision") in ("blocked_pii", "redacted")
```

To verify throttling behavior, hammer an endpoint and expect a 429:

```python
class TestRateLimiting:
    def test_burst_triggers_throttle(self, client, admin_headers):
        results = []
        for _ in range(50):
            resp = client.post("/api/chat",
                               json={"message": "test"},
                               headers=admin_headers)
            results.append(resp.status_code)
        assert 429 in results, "No rate limiting detected"
```

To exercise the document pipeline end to end:

```python
class TestDocumentLifecycle:
    def test_upload_and_retrieve(self, client, admin_headers):
        with open("fixtures/documents/sample_document.txt", "rb") as f:
            resp = client.post("/api/documents",
                               files={"file": ("test.txt", f)},
                               headers=admin_headers)
        assert resp.status_code in (200, 201)
```

MIT License. See LICENSE for details.
This framework was extracted from a production blind testing system used to validate a CMMC compliance platform. The original implementation ran 150+ parametrized scenarios across authentication, authorization, PII/CUI protection, chat pipelines, document management, gateway policies, audit logging, and browser UI — all without importing a single line of the platform's source code.
The "Dark Factory" pattern (named for manufacturing's lights-out factory concept) treats your entire platform as a sealed black box. You build a digital twin, point your tests at it, and assert on behavior. If a test fails, the platform has a behavioral regression — no matter what the internal code looks like.