MariaDB · brunstof · Jan 14, 2026
diff --git a/.claude/commands/fix-issue.md b/.claude/commands/fix-issue.md
@@ -0,0 +1,20 @@
+# Fix GitHub Issue
+
+Fix GitHub issue #$ARGUMENTS
+
+## Workflow
+
+1. **Fetch issue details** using `gh issue view $ARGUMENTS`
+2. **Understand the root cause** by reading relevant code
+3. **Create a plan** for the fix using extended thinking
+4. **Implement the fix** with minimal changes
+5. **Write or update tests** to cover the fix
+6. **Run tests** to verify the fix works
+7. **Create a commit** with message "Fixes #$ARGUMENTS: <description>"
+
+## Guidelines
+
+- Focus on the specific issue - avoid unrelated changes
+- Follow existing code patterns in the repository
+- Ensure backward compatibility unless a breaking change is explicitly requested
+- Update documentation if behavior changes
diff --git a/.claude/commands/review-pr.md b/.claude/commands/review-pr.md
@@ -0,0 +1,34 @@
+# Review Pull Request
+
+Review PR #$ARGUMENTS
+
+## Workflow
+
+1. **Fetch PR details** using `gh pr view $ARGUMENTS`
+2. **Get the diff** using `gh pr diff $ARGUMENTS`
+3. **Understand the changes** - what problem does this PR solve?
+4. **Review code quality**:
+   - Check for bugs or logic errors
+   - Verify error handling
+   - Look for security issues (SQL injection, XSS, etc.)
+   - Check naming conventions and code clarity
+5. **Verify test coverage** - are changes adequately tested?
+6. **Check for breaking changes** - is backward compatibility maintained?
+7. **Provide constructive feedback** with specific suggestions
+
+## Review Checklist
+
+- [ ] Code follows project conventions
+- [ ] No obvious security vulnerabilities
+- [ ] Error cases are handled appropriately
+- [ ] Tests cover the changes
+- [ ] Documentation updated if needed
+- [ ] No unrelated changes included
+
+## Output Format
+
+Provide a summary with:
+- Overall assessment (approve/request changes/comment)
+- Specific issues found (with line references)
+- Suggestions for improvement
+- Questions for the author
diff --git a/.claude/settings.local.json b/.claude/settings.local.json
@@ -0,0 +1,19 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(git add:*)",
+      "Bash(git commit:*)",
+      "Bash(git push:*)",
+      "Bash(git remote add:*)",
+      "Bash(where:*)",
+      "Bash(/c/Users/brunstof/AppData/Local/gh/bin/gh.exe:*)",
+      "mcp__github__fork_repository",
+      "mcp__github__create_pull_request",
+      "mcp__github__get_pull_request",
+      "mcp__MCP_DOCKER__mcp-find",
+      "mcp__MCP_DOCKER__mcp-config-set",
+      "mcp__MCP_DOCKER__mcp-add",
+      "mcp__MCP_DOCKER__browser_navigate"
+    ]
+  }
+}
diff --git a/.claude/skills/mariadb-debug/SKILL.md b/.claude/skills/mariadb-debug/SKILL.md
@@ -0,0 +1,119 @@
+---
+name: mariadb-debug
+description: Debug MariaDB MCP server issues, analyze connection pool problems, troubleshoot embedding service failures, diagnose vector store operations. Use when working with database connectivity, embedding errors, or MCP tool failures.
+---
+
+# MariaDB MCP Server Debugging
+
+## Key Files to Check
+
+1. **src/server.py** - Main MCP server and tool definitions
+   - Connection pool initialization (`initialize_pool`)
+   - Tool registration (`register_tools`)
+   - Query execution (`_execute_query`)
+
+2. **src/config.py** - Configuration loading
+   - Environment variables validation
+   - Logging setup
+   - Embedding provider configuration
+
+3. **src/embeddings.py** - Embedding service
+   - Provider initialization (OpenAI, Gemini, HuggingFace)
+   - Model dimension lookup
+   - Embedding generation
+
+4. **logs/mcp_server.log** - Server logs
+
+## Common Issues & Solutions
+
+### Connection Pool Exhaustion
+- **Symptom**: "Database connection pool not available"
+- **Check**: `MCP_MAX_POOL_SIZE` in .env (default: 10)
+- **Solution**: Increase pool size or check for connection leaks
+
+### Embedding Service Failures
+- **Symptom**: "Embedding provider not configured" or API errors
+- **Check**: `EMBEDDING_PROVIDER` must be: openai, gemini, or huggingface
+- **Verify**: The provider's required setting is present (`OPENAI_API_KEY` for openai, `GEMINI_API_KEY` for gemini, or `HF_MODEL` for huggingface)
+
+### Read-Only Mode Violations
+- **Symptom**: "Operation forbidden: Server is in read-only mode"
+- **Check**: `MCP_READ_ONLY` environment variable
+- **Note**: Only SELECT, SHOW, DESC, DESCRIBE, and USE statements are allowed when true
+
+### Vector Store Creation Fails
+- **Symptom**: "Failed to create vector store"
+- **Check**:
+  - Database exists and user has CREATE TABLE permission
+  - Embedding dimension matches model (e.g., text-embedding-3-small = 1536)
+  - MariaDB version supports VECTOR type
+
+### Tool Not Registered
+- **Symptom**: Tool not found errors
+- **Check**: EMBEDDING_PROVIDER must be set for vector tools
+- **Verify**: Pool initialized before tool registration
+
+### Connection Timeout
+- **Symptom**: Queries hang or timeout errors
+- **Check**: `DB_CONNECT_TIMEOUT`, `DB_READ_TIMEOUT`, `DB_WRITE_TIMEOUT` in .env
+- **Defaults**: 10s connect, 30s read/write
+- **Solution**: Increase timeout values or check database server load
+
+### Large Result Sets
+- **Symptom**: Memory errors or slow responses
+- **Check**: `MCP_MAX_RESULTS` in .env (default: 10000)
+- **Solution**: Add LIMIT to queries or reduce MCP_MAX_RESULTS
+
+### Embedding Rate Limiting
+- **Symptom**: API quota exceeded or 429 errors
+- **Check**: `EMBEDDING_MAX_CONCURRENT` in .env (default: 5)
+- **Solution**: Reduce concurrent limit or upgrade API plan
+
+### Health Check Failures (Docker)
+- **Symptom**: Container marked unhealthy
+- **Check**: `/health` endpoint returns 503
+- **Verify**: Database connection pool is initialized
+- **Solution**: Check DB credentials and network connectivity
+
+### Multiple Database Config Mismatch
+- **Symptom**: Warning about array length mismatch
+- **Check**: `DB_HOSTS`, `DB_USERS`, `DB_PASSWORDS` must have the same number of entries
+- **Solution**: Ensure comma-separated values align across all multi-DB env vars
+
+### Metadata JSON Parse Errors
+- **Symptom**: Warning logs about metadata parsing
+- **Check**: `logs/mcp_server.log` for JSON decode errors
+- **Solution**: Verify metadata stored correctly in vector store
+
+## Debugging Commands
+
+```bash
+# Check server logs
+tail -f logs/mcp_server.log
+
+# Print the configured database host and port (does not open a connection)
+uv run python -c "from config import *; print(f'DB: {DB_HOST}:{DB_PORT}')"
+
+# Print the configured embedding provider
+uv run python -c "from config import *; print(f'Provider: {EMBEDDING_PROVIDER}')"
+
+# Run tests
+uv run -m pytest src/tests/ -v
+```
+
+## Environment Variables Reference
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| DB_HOST | Yes | localhost | MariaDB host |
+| DB_PORT | No | 3306 | MariaDB port |
+| DB_USER | Yes | - | Database username |
+| DB_PASSWORD | Yes | - | Database password |
+| DB_CONNECT_TIMEOUT | No | 10 | Connection timeout (seconds) |
+| DB_READ_TIMEOUT | No | 30 | Read timeout (seconds) |
+| DB_WRITE_TIMEOUT | No | 30 | Write timeout (seconds) |
+| MCP_READ_ONLY | No | true | Enforce read-only |
+| MCP_MAX_POOL_SIZE | No | 10 | Max connections in pool |
+| MCP_MAX_RESULTS | No | 10000 | Max rows per query |
+| EMBEDDING_PROVIDER | No | None | openai/gemini/huggingface |
+| EMBEDDING_MAX_CONCURRENT | No | 5 | Max concurrent embedding calls |
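
The alignment rule under "Multiple Database Config Mismatch" in SKILL.md can be checked before the server starts. The following is a minimal sketch, not the server's own validation: the three variable names come from the skill file, while `split_csv` and `check_multi_db_alignment` are hypothetical helpers, and the parsing rule (plain comma split, whitespace trimmed, empty entries kept) is an assumption about how the values are read.

```python
from collections.abc import Mapping


def split_csv(value: str) -> list[str]:
    """Split a comma-separated value; an unset or empty variable yields no entries.

    Assumption: entries are separated by plain commas and may be empty
    (for example, a blank password), so empty items are kept.
    """
    return [item.strip() for item in value.split(",")] if value else []


def check_multi_db_alignment(env: Mapping[str, str]) -> dict[str, int]:
    """Return the entry count per variable, raising if the counts differ."""
    names = ("DB_HOSTS", "DB_USERS", "DB_PASSWORDS")
    counts = {name: len(split_csv(env.get(name, ""))) for name in names}
    if len(set(counts.values())) > 1:
        raise ValueError(f"multi-DB env vars are misaligned: {counts}")
    return counts
```

Calling `check_multi_db_alignment(os.environ)` in a startup script would surface the mismatch as an exception rather than a warning in the log. Passwords that themselves contain commas would need a different delimiter or quoting scheme, which this sketch does not attempt.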
diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,26 @@
+# Virtual environments
+.venv/
+venv/
+__pycache__/
+
+# Git
+.git/
+.gitignore
+
+# IDE
+.vscode/
+.idea/
+.claude/
+
+# Logs and caches
+logs/
+*.log
+*.pyc
+*.pyo
+
+# Downloaded data
+scripts/geography_data/
+
+# Local env files (use .env in container)
+.env.local
+.env.*.local
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,78 @@
+## Quick orientation for AI coding agents
+
+This repository implements a Model Context Protocol (MCP) server that exposes MariaDB-focused tools and optional vector/embedding features.
+
+- Entry points & important files:
+  - `src/server.py` — main MCP server implementation and tool definitions (list_databases, list_tables, execute_sql, vector-store tools, etc.). Read this first to understand available tools and their contracts.
+  - `src/embeddings.py` — provider-agnostic EmbeddingService (OpenAI, Gemini, HuggingFace). Embedding clients are initialized at runtime based on env config.
+  - `src/config.py` — loads `.env` and environment variables; contains defaults (notably `MCP_READ_ONLY` default=true) and validation that can raise on missing keys.
+  - `src/tests/` — integration-style tests that demonstrate how the server is started and how the FastMCP client calls tools. Useful runnable examples.
+  - `README.md` — installation, run commands and example tool payloads (useful to replicate CLI behavior).
+
+## Big-picture architecture (short)
+
+- FastMCP-based server: `MariaDBServer` builds a `FastMCP` instance and registers tools. Tools are asynchronous methods on `MariaDBServer`.
+- Database access: Uses `asyncmy` connection pool. Pool is created by `MariaDBServer.initialize_pool()` and used by `_execute_query()` for all SQL operations.
+- Embeddings: Optional feature toggled by `EMBEDDING_PROVIDER` in env. `EmbeddingService` supports OpenAI, Gemini, and HuggingFace. When disabled, all vector-store tools should be treated as unavailable.
+- Vector-store implementation: persisted in MariaDB tables (VECTOR column + VECTOR INDEX). The server uses information_schema queries to validate existence and structure of vector stores.
+
+Why certain choices matter for edits:
+- `config.py` reads env at import time and will raise if required embedding keys are missing — set env before importing modules in tests or scripts.
+- `MCP_READ_ONLY` influences `self.autocommit` and the enforcement in `_execute_query`: when read-only mode is enabled, the code rejects any query that is not a read.
+
+## Developer workflows and concrete commands
+
+- Python version: 3.11 (see `pyproject.toml`).
+- Dependency & environment setup (as in README):
+  - Install `uv` and sync dependencies:
+    ```bash
+    pip install uv
+    uv lock
+    uv sync
+    ```
+- Run server (examples shown in README):
+  - stdio (default): `uv run server.py`
+  - SSE transport: `uv run server.py --transport sse --host 127.0.0.1 --port 9001`
+  - HTTP transport: `uv run server.py --transport http --host 127.0.0.1 --port 9001 --path /mcp`
+- Tests: tests live in `src/tests/` and use `unittest.IsolatedAsyncioTestCase` with `anyio` and `fastmcp.client.Client`. They start the server in-process by calling `MariaDBServer.run_async_server('stdio')` and then call tools through `Client(self.server.mcp)`. Run them with your preferred runner, e.g.:
+  ```bash
+  # With unittest discovery
+  python -m unittest discover -s src/tests
+  ```
+
+## Project-specific patterns & gotchas for agents
+
+- Environment-on-import: `config.py` performs validation and logs/raises if required env vars are not set (e.g., DB_USER/DB_PASSWORD, provider-specific API keys). Always ensure env is configured before importing modules.
+- Read-only enforcement: `_execute_query()` strips comments and checks an allow-list of SQL prefixes (`SELECT`, `SHOW`, `DESC`, `DESCRIBE`, `USE`). Any mutation must either run with `MCP_READ_ONLY=false` or be explicitly implemented as a server tool that bypasses that check (not recommended).
+- Validation via information_schema: many tools check existence and vector-store status using `information_schema` queries — prefer reproducing those queries when writing migrations or tools that manipulate schema.
+- Embedding service lifecycle: `EmbeddingService` may try to import provider SDKs on init and raise if missing; tests and CI should supply the right env or mock the service.
+
+## Integration & external dependencies
+
+- DB: MariaDB reachable via `DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD`. `DB_NAME` is optional; many tools accept `database_name` parameter.
+- Embedding providers:
+  - `openai` (requires `OPENAI_API_KEY`) — uses `openai` AsyncOpenAI client when available.
+  - `gemini` (requires `GEMINI_API_KEY`) — uses `google.genai` when present.
+  - `huggingface` (requires `HF_MODEL`) — uses `sentence-transformers` and may dynamically load models.
+- Logs: default file at `logs/mcp_server.log` (configurable via env). Use this for debugging server startup or tool call failures.
+
+## Examples extracted from the codebase
+
+- How tests start the server (see `src/tests/test_mcp_server.py`):
+  - Instantiate server: `server = MariaDBServer(autocommit=False)`
+  - Start background server task: `tg.start_soon(server.run_async_server, 'stdio')`
+  - Create client: `client = fastmcp.client.Client(server.mcp)` and call `await client.call_tool('list_databases', {})`.
+
+- Tool payload example (from README):
+  ```json
+  {"tool":"execute_sql","parameters":{"database_name":"test_db","sql_query":"SELECT * FROM users WHERE id = %s","parameters":[123]}}
+  ```
+
+## Short checklist for code changes
+
+1. Ensure required env vars are set before imports (or mock config/EmbeddingService in tests).
+2. If adding SQL tools, follow `_execute_query()`'s comment-stripping + prefix checks; avoid enabling writes unless intended.
+3. If changing embedding behavior, reference `src/embeddings.py` model lists and `pyproject.toml` dependencies — CI must install required SDKs.
+4. Run unit/integration tests in `src/tests/` using unittest discovery or pytest if present.
+
+If anything in this document is unclear or you'd like more concrete examples (unit test scaffolds, CI matrix entries, or mock patterns for `EmbeddingService`), tell me which section to expand and I'll iterate.
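
The read-only rule described in copilot-instructions.md (strip comments, then check the statement prefix against an allow-list) can be sketched as follows. This is an illustration under stated assumptions, not the code in `_execute_query()`: the prefix list is taken from the instructions file, but `strip_sql_comments` and `is_read_only_statement` are hypothetical names, and the comment stripping here is deliberately naive (it does not account for comment markers inside string literals).

```python
import re

# Prefixes listed in copilot-instructions.md; anything else is treated as a write.
READ_ONLY_PREFIXES = ("SELECT", "SHOW", "DESC", "DESCRIBE", "USE")

_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"(--|#)[^\n]*")
_FIRST_KEYWORD = re.compile(r"[A-Za-z]+")


def strip_sql_comments(sql: str) -> str:
    """Remove /* ... */ blocks and -- or # line comments (naive: ignores string literals)."""
    without_blocks = _BLOCK_COMMENT.sub(" ", sql)
    return _LINE_COMMENT.sub(" ", without_blocks).strip()


def is_read_only_statement(sql: str) -> bool:
    """Return True if the first keyword after comment stripping is on the allow-list."""
    match = _FIRST_KEYWORD.match(strip_sql_comments(sql))
    return bool(match) and match.group(0).upper() in READ_ONLY_PREFIXES
```

For example, `is_read_only_statement("/* note */ select 1")` returns `True`, while a `DELETE` hidden behind a leading `-- SELECT` comment line returns `False`. A prefix check of this kind only inspects the first statement, so a new SQL tool should not rely on it alone if multi-statement input is possible.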
diff --git a/.gitignore b/.gitignore
@@ -9,5 +9,6 @@ src/logs/*
 .env
 uv.lock
 .DS_Store
-.env
-.env
+
+# Downloaded data files
+scripts/geography_data/
diff --git a/.python-version b/.python-version
@@ -1 +1 @@
-3.11
+3.13