18 commits
3273113
Add CLAUDE.md with project guidance for Claude Code
brunstof Jan 14, 2026
00675e2
Improve CLAUDE.md with accuracy fixes and additional patterns
brunstof Jan 14, 2026
c1635f9
Fix Python version inconsistency and improve CLAUDE.md
brunstof Jan 14, 2026
7cf4139
Add MariaDB version requirement and logging config to CLAUDE.md
brunstof Jan 14, 2026
bbdd419
Add production-readiness improvements and health check endpoint
brunstof Jan 15, 2026
4c3f8bb
Add troubleshooting scenarios to mariadb-debug skill
brunstof Jan 15, 2026
48b6ce6
Fix mypy type errors and add dev tooling
brunstof Jan 20, 2026
2a531ed
Fix asyncmy pool initialization issues
brunstof Jan 20, 2026
7494245
Fix container health checks for MariaDB 11 compatibility
brunstof Jan 26, 2026
6fd852b
Add geography database scripts and MCP client tests
brunstof Jan 26, 2026
03f8f02
Fix Docker container issues and improve compatibility
brunstof Jan 26, 2026
d4ce561
Add geography database views for querying cities
brunstof Jan 26, 2026
ce3ddc3
Update CLAUDE.md with improved documentation
brunstof Jan 29, 2026
a5f1aa7
Clean up settings.local.json: replace one-time entries with reusable …
brunstof Feb 25, 2026
5219db1
Resolve docker-compose.yml merge conflict: combine healthchecks with …
brunstof Feb 25, 2026
3d05a73
Add curl healthcheck to Dockerfile and fix main.py project name
brunstof Feb 25, 2026
cd5525f
Fix two blockers: register /health route and remove .env* from Docker…
brunstof Feb 25, 2026
b731510
Fix five code review warnings across server, config, embeddings, and …
brunstof Feb 25, 2026
20 changes: 20 additions & 0 deletions .claude/commands/fix-issue.md
@@ -0,0 +1,20 @@
# Fix GitHub Issue

Fix GitHub issue #$ARGUMENTS

## Workflow

1. **Fetch issue details** using `gh issue view $ARGUMENTS`
2. **Understand the root cause** by reading relevant code
3. **Create a plan** for the fix using extended thinking
4. **Implement the fix** with minimal changes
5. **Write or update tests** to cover the fix
6. **Run tests** to verify the fix works
7. **Create a commit** with message "Fixes #$ARGUMENTS: <description>"

## Guidelines

- Focus on the specific issue - avoid unrelated changes
- Follow existing code patterns in the repository
- Ensure backward compatibility unless explicitly requested
- Update documentation if behavior changes
34 changes: 34 additions & 0 deletions .claude/commands/review-pr.md
@@ -0,0 +1,34 @@
# Review Pull Request

Review PR #$ARGUMENTS

## Workflow

1. **Fetch PR details** using `gh pr view $ARGUMENTS`
2. **Get the diff** using `gh pr diff $ARGUMENTS`
3. **Understand the changes** - what problem does this PR solve?
4. **Review code quality**:
   - Check for bugs or logic errors
   - Verify error handling
   - Look for security issues (SQL injection, XSS, etc.)
   - Check naming conventions and code clarity
5. **Verify test coverage** - are changes adequately tested?
6. **Check for breaking changes** - is backward compatibility maintained?
7. **Provide constructive feedback** with specific suggestions

## Review Checklist

- [ ] Code follows project conventions
- [ ] No obvious security vulnerabilities
- [ ] Error cases are handled appropriately
- [ ] Tests cover the changes
- [ ] Documentation updated if needed
- [ ] No unrelated changes included

## Output Format

Provide a summary with:
- Overall assessment (approve/request changes/comment)
- Specific issues found (with line references)
- Suggestions for improvement
- Questions for the author
19 changes: 19 additions & 0 deletions .claude/settings.local.json
@@ -0,0 +1,19 @@
{
  "permissions": {
    "allow": [
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Bash(git push:*)",
      "Bash(git remote add:*)",
      "Bash(where:*)",
      "Bash(/c/Users/brunstof/AppData/Local/gh/bin/gh.exe:*)",
      "mcp__github__fork_repository",
      "mcp__github__create_pull_request",
      "mcp__github__get_pull_request",
      "mcp__MCP_DOCKER__mcp-find",
      "mcp__MCP_DOCKER__mcp-config-set",
      "mcp__MCP_DOCKER__mcp-add",
      "mcp__MCP_DOCKER__browser_navigate"
    ]
  }
}
119 changes: 119 additions & 0 deletions .claude/skills/mariadb-debug/SKILL.md
@@ -0,0 +1,119 @@
---
name: mariadb-debug
description: Debug MariaDB MCP server issues, analyze connection pool problems, troubleshoot embedding service failures, diagnose vector store operations. Use when working with database connectivity, embedding errors, or MCP tool failures.
---

# MariaDB MCP Server Debugging

## Key Files to Check

1. **src/server.py** - Main MCP server and tool definitions
   - Connection pool initialization (`initialize_pool`)
   - Tool registration (`register_tools`)
   - Query execution (`_execute_query`)

2. **src/config.py** - Configuration loading
   - Environment variables validation
   - Logging setup
   - Embedding provider configuration

3. **src/embeddings.py** - Embedding service
   - Provider initialization (OpenAI, Gemini, HuggingFace)
   - Model dimension lookup
   - Embedding generation

4. **logs/mcp_server.log** - Server logs

## Common Issues & Solutions

### Connection Pool Exhaustion
- **Symptom**: "Database connection pool not available"
- **Check**: `MCP_MAX_POOL_SIZE` in .env (default: 10)
- **Solution**: Increase pool size or check for connection leaks

### Embedding Service Failures
- **Symptom**: "Embedding provider not configured" or API errors
- **Check**: `EMBEDDING_PROVIDER` must be: openai, gemini, or huggingface
- **Verify**: The setting for the chosen provider is present (`OPENAI_API_KEY`, `GEMINI_API_KEY`, or `HF_MODEL` for HuggingFace, which is a model name rather than an API key)

### Read-Only Mode Violations
- **Symptom**: "Operation forbidden: Server is in read-only mode"
- **Check**: `MCP_READ_ONLY` environment variable
- **Note**: Only read-only statements (`SELECT`, `SHOW`, `DESC`, `DESCRIBE`, `USE`) are allowed when true

### Vector Store Creation Fails
- **Symptom**: "Failed to create vector store"
- **Check**:
  - Database exists and user has CREATE TABLE permission
  - Embedding dimension matches model (e.g., text-embedding-3-small = 1536)
  - MariaDB version supports VECTOR type

### Tool Not Registered
- **Symptom**: Tool not found errors
- **Check**: EMBEDDING_PROVIDER must be set for vector tools
- **Verify**: Pool initialized before tool registration

### Connection Timeout
- **Symptom**: Queries hang or timeout errors
- **Check**: `DB_CONNECT_TIMEOUT`, `DB_READ_TIMEOUT`, `DB_WRITE_TIMEOUT` in .env
- **Defaults**: 10s connect, 30s read/write
- **Solution**: Increase timeout values or check database server load

### Large Result Sets
- **Symptom**: Memory errors or slow responses
- **Check**: `MCP_MAX_RESULTS` in .env (default: 10000)
- **Solution**: Add LIMIT to queries or reduce MCP_MAX_RESULTS

### Embedding Rate Limiting
- **Symptom**: API quota exceeded or 429 errors
- **Check**: `EMBEDDING_MAX_CONCURRENT` in .env (default: 5)
- **Solution**: Reduce concurrent limit or upgrade API plan
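
One common way to cap concurrent embedding calls is an `asyncio.Semaphore`. The sketch below assumes that approach; the source does not show how the server applies `EMBEDDING_MAX_CONCURRENT`, and `embed_all` and `embed_one` are hypothetical names.

```python
import asyncio


async def embed_all(texts, embed_one, max_concurrent=5):
    """Embed every text, never running more than max_concurrent calls at once.

    embed_one is any async callable taking a string (a stand-in for a
    provider call). Results come back in the same order as texts.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(text):
        # Each call waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await embed_one(text)

    return await asyncio.gather(*(guarded(t) for t in texts))
```

Lowering `max_concurrent` trades throughput for fewer simultaneous requests, which is usually what helps with 429 responses.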

### Health Check Failures (Docker)
- **Symptom**: Container marked unhealthy
- **Check**: `/health` endpoint returns 503
- **Verify**: Database connection pool is initialized
- **Solution**: Check DB credentials and network connectivity
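
A minimal sketch of the check this section describes, assuming the handler only needs to know whether the pool exists. The function name `health_response` and the JSON body are hypothetical; only the 503 status comes from the text above.

```python
import json


def health_response(pool) -> tuple[int, str]:
    """Return (status_code, JSON body) for a health probe.

    pool is whatever object the server keeps for its connection pool;
    None is taken to mean "not initialized" (an assumption of this sketch).
    """
    if pool is None:
        body = {"status": "unhealthy", "reason": "pool not initialized"}
        return 503, json.dumps(body)
    return 200, json.dumps({"status": "healthy"})
```

A container health check that treats any non-2xx status as a failure would mark the container unhealthy until the pool is up.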

### Multiple Database Config Mismatch
- **Symptom**: Warning about array length mismatch
- **Check**: `DB_HOSTS`, `DB_USERS`, `DB_PASSWORDS` must have same length
- **Solution**: Ensure comma-separated values align across all multi-DB env vars
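
To see which variable is out of step, the lengths can be compared directly. This is a sketch under assumptions: `count_values` and `check_multi_db` are hypothetical helpers, values are assumed to be split on plain commas, and the sketch raises where the server reportedly only logs a warning.

```python
import os


def count_values(raw: str) -> int:
    """Count comma-separated entries; an unset or empty variable counts as 0."""
    return len(raw.split(",")) if raw else 0


def check_multi_db(environ=os.environ) -> dict[str, int]:
    """Return the entry count per variable, raising if the counts differ."""
    names = ("DB_HOSTS", "DB_USERS", "DB_PASSWORDS")
    lengths = {name: count_values(environ.get(name, "")) for name in names}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Multi-DB variables have different lengths: {lengths}")
    return lengths
```

The error message includes every count, so a missing password or an extra host is visible at a glance.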

### Metadata JSON Parse Errors
- **Symptom**: Warning logs about metadata parsing
- **Check**: `logs/mcp_server.log` for JSON decode errors
- **Solution**: Verify metadata stored correctly in vector store
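
When checking stored metadata by hand, a tolerant parser helps separate bad rows from good ones. The sketch below is an illustration, not the server's implementation; `parse_metadata` and the logger name are hypothetical.

```python
import json
import logging

logger = logging.getLogger("metadata_check")  # hypothetical logger name


def parse_metadata(raw):
    """Return the decoded metadata dict, or {} (with a warning) on bad input."""
    if raw is None or raw == "":
        return {}
    try:
        value = json.loads(raw)
    except (json.JSONDecodeError, TypeError) as exc:
        # Log and fall back rather than failing the whole query.
        logger.warning("Could not parse metadata %r: %s", raw, exc)
        return {}
    # Metadata is expected to be a JSON object; anything else is ignored.
    return value if isinstance(value, dict) else {}
```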

## Debugging Commands

```bash
# Check server logs
tail -f logs/mcp_server.log

# Print the configured database host and port (reads config only; does not open a connection)
uv run python -c "from config import *; print(f'DB: {DB_HOST}:{DB_PORT}')"

# Verify environment
uv run python -c "from config import *; print(f'Provider: {EMBEDDING_PROVIDER}')"

# Run tests
uv run -m pytest src/tests/ -v
```

## Environment Variables Reference

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| DB_HOST | Yes | localhost | MariaDB host |
| DB_PORT | No | 3306 | MariaDB port |
| DB_USER | Yes | - | Database username |
| DB_PASSWORD | Yes | - | Database password |
| DB_CONNECT_TIMEOUT | No | 10 | Connection timeout (seconds) |
| DB_READ_TIMEOUT | No | 30 | Read timeout (seconds) |
| DB_WRITE_TIMEOUT | No | 30 | Write timeout (seconds) |
| MCP_READ_ONLY | No | true | Enforce read-only |
| MCP_MAX_POOL_SIZE | No | 10 | Max connections in pool |
| MCP_MAX_RESULTS | No | 10000 | Max rows per query |
| EMBEDDING_PROVIDER | No | None | openai/gemini/huggingface |
| EMBEDDING_MAX_CONCURRENT | No | 5 | Max concurrent embedding calls |
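
The defaults in the table can be read as the following loader. This is a sketch only: `load_settings` is a hypothetical function, the real `src/config.py` may parse values differently (in particular the boolean parsing of `MCP_READ_ONLY` is an assumption), and the required check here covers only the variables the table lists without a default.

```python
import os

# Integer settings and their defaults, taken from the table above.
INT_DEFAULTS = {
    "DB_PORT": 3306,
    "DB_CONNECT_TIMEOUT": 10,
    "DB_READ_TIMEOUT": 30,
    "DB_WRITE_TIMEOUT": 30,
    "MCP_MAX_POOL_SIZE": 10,
    "MCP_MAX_RESULTS": 10000,
    "EMBEDDING_MAX_CONCURRENT": 5,
}


def load_settings(environ=os.environ) -> dict:
    """Build a settings dict from environ, applying the documented defaults."""
    missing = [name for name in ("DB_USER", "DB_PASSWORD") if not environ.get(name)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")

    settings = {
        name: int(environ.get(name, default))
        for name, default in INT_DEFAULTS.items()
    }
    settings["DB_HOST"] = environ.get("DB_HOST", "localhost")
    settings["DB_USER"] = environ["DB_USER"]
    settings["DB_PASSWORD"] = environ["DB_PASSWORD"]
    # Assumption: only the literal string "true" (any case) enables read-only.
    settings["MCP_READ_ONLY"] = (
        environ.get("MCP_READ_ONLY", "true").strip().lower() == "true"
    )
    settings["EMBEDDING_PROVIDER"] = environ.get("EMBEDDING_PROVIDER") or None
    return settings
```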
26 changes: 26 additions & 0 deletions .dockerignore
@@ -0,0 +1,26 @@
# Virtual environments
.venv/
venv/
__pycache__/

# Git
.git/
.gitignore

# IDE
.vscode/
.idea/
.claude/

# Logs and caches
logs/
*.log
*.pyc
*.pyo

# Downloaded data
scripts/geography_data/

# Local env files (use .env in container)
.env.local
.env.*.local
78 changes: 78 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,78 @@
## Quick orientation for AI coding agents

This repository implements a Model Context Protocol (MCP) server that exposes MariaDB-focused tools and optional vector/embedding features.

- Entry points & important files:
  - `src/server.py` — main MCP server implementation and tool definitions (list_databases, list_tables, execute_sql, vector-store tools, etc.). Read this first to understand available tools and their contracts.
  - `src/embeddings.py` — provider-agnostic EmbeddingService (OpenAI, Gemini, HuggingFace). Embedding clients are initialized at runtime based on env config.
  - `src/config.py` — loads `.env` and environment variables; contains defaults (notably `MCP_READ_ONLY` default=true) and validation that can raise on missing keys.
  - `src/tests/` — integration-style tests that demonstrate how the server is started and how the FastMCP client calls tools. Useful runnable examples.
  - `README.md` — installation, run commands and example tool payloads (useful to replicate CLI behavior).

## Big-picture architecture (short)

- FastMCP-based server: `MariaDBServer` builds a `FastMCP` instance and registers tools. Tools are asynchronous methods on `MariaDBServer`.
- Database access: Uses `asyncmy` connection pool. Pool is created by `MariaDBServer.initialize_pool()` and used by `_execute_query()` for all SQL operations.
- Embeddings: Optional feature toggled by `EMBEDDING_PROVIDER` in env. `EmbeddingService` supports OpenAI, Gemini, and HuggingFace. When disabled, all vector-store tools should be treated as unavailable.
- Vector-store implementation: persisted in MariaDB tables (VECTOR column + VECTOR INDEX). The server uses information_schema queries to validate existence and structure of vector stores.

Why certain choices matter for edits:
- `config.py` reads env at import time and will raise if required embedding keys are missing — set env before importing modules in tests or scripts.
- `MCP_READ_ONLY` influences `self.autocommit` and the enforcement in `_execute_query`: when read-only mode is enabled, the code rejects any query that is not read-only.

## Developer workflows and concrete commands

- Python version: 3.11 (see `pyproject.toml`).
- Dependency & environment setup (as in README):
  - Install `uv` and sync dependencies:
    ```bash
    pip install uv
    uv lock
    uv sync
    ```
- Run server (examples shown in README):
  - stdio (default): `uv run server.py`
  - SSE transport: `uv run server.py --transport sse --host 127.0.0.1 --port 9001`
  - HTTP transport: `uv run server.py --transport http --host 127.0.0.1 --port 9001 --path /mcp`
- Tests: tests live in `src/tests/` and use `unittest.IsolatedAsyncioTestCase` with `anyio` and `fastmcp.client.Client`. They start the server in-process by calling `MariaDBServer.run_async_server('stdio')` and then call tools through `Client(self.server.mcp)`. Run them with your preferred runner, e.g.:
```bash
# With unittest discovery
python -m unittest discover -s src/tests
```

## Project-specific patterns & gotchas for agents

- Environment-on-import: `config.py` performs validation and logs/raises if required env vars are not set (e.g., DB_USER/DB_PASSWORD, provider-specific API keys). Always ensure env is configured before importing modules.
- Read-only enforcement: `_execute_query()` strips comments and checks an allow-list of SQL prefixes (`SELECT`, `SHOW`, `DESC`, `DESCRIBE`, `USE`). Any mutation must either run with `MCP_READ_ONLY=false` or be explicitly implemented as a server tool that bypasses that check (not recommended).
- Validation via information_schema: many tools check existence and vector-store status using `information_schema` queries — prefer reproducing those queries when writing migrations or tools that manipulate schema.
- Embedding service lifecycle: `EmbeddingService` may try to import provider SDKs on init and raise if missing; tests and CI should supply the right env or mock the service.
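
The read-only enforcement described above (strip comments, then check the leading keyword against an allow-list) can be sketched as follows. This is an illustration under assumptions, not the code in `_execute_query()`: `is_read_only` is a hypothetical name, and the regex-based comment stripping is deliberately naive (it does not account for comment markers inside string literals).

```python
import re

# Allow-list of statement prefixes, as listed in the text above.
ALLOWED_PREFIXES = ("SELECT", "SHOW", "DESC", "DESCRIBE", "USE")

# Matches /* ... */ block comments and "--" or "#" comments up to end of line.
_COMMENT_RE = re.compile(r"/\*.*?\*/|--[^\n]*|#[^\n]*", re.DOTALL)


def is_read_only(sql: str) -> bool:
    """Return True if the first keyword after comment stripping is allowed."""
    stripped = _COMMENT_RE.sub(" ", sql).strip()
    match = re.match(r"[A-Za-z]+", stripped)
    return bool(match) and match.group(0).upper() in ALLOWED_PREFIXES
```

Stripping comments first matters: `/* SELECT */ DELETE FROM users` starts with an allowed word only if the comment is left in place. A prefix check alone is a coarse filter, so database-level privileges remain the stronger safeguard.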

## Integration & external dependencies

- DB: MariaDB reachable via `DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD`. `DB_NAME` is optional; many tools accept `database_name` parameter.
- Embedding providers:
  - `openai` (requires `OPENAI_API_KEY`) — uses `openai` AsyncOpenAI client when available.
  - `gemini` (requires `GEMINI_API_KEY`) — uses `google.genai` when present.
  - `huggingface` (requires `HF_MODEL`) — uses `sentence-transformers` and may dynamically load models.
- Logs: default file at `logs/mcp_server.log` (configurable via env). Use this for debugging server startup or tool call failures.

## Examples extracted from the codebase

- How tests start the server (see `src/tests/test_mcp_server.py`):
  - Instantiate server: `server = MariaDBServer(autocommit=False)`
  - Start background server task: `tg.start_soon(server.run_async_server, 'stdio')`
  - Create client: `client = fastmcp.client.Client(server.mcp)` and call `await client.call_tool('list_databases', {})`.

- Tool payload example (from README):
```json
{"tool":"execute_sql","parameters":{"database_name":"test_db","sql_query":"SELECT * FROM users WHERE id = %s","parameters":[123]}}
```
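
The payload above can be built programmatically, which keeps the SQL text and its bound values separate. The helper name `build_execute_sql_payload` is hypothetical; the field names are copied from the example.

```python
import json


def build_execute_sql_payload(database_name: str, sql_query: str, parameters: list) -> str:
    """Serialize an execute_sql tool call in the shape shown in the README example."""
    payload = {
        "tool": "execute_sql",
        "parameters": {
            "database_name": database_name,
            "sql_query": sql_query,  # placeholders such as %s stay in the SQL text
            "parameters": parameters,  # values are passed separately, not interpolated
        },
    }
    return json.dumps(payload, separators=(",", ":"))
```

Passing values through `parameters` instead of formatting them into `sql_query` is what keeps the query safe from SQL injection.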

## Short checklist for code changes

1. Ensure required env vars are set before imports (or mock config/EmbeddingService in tests).
2. If adding SQL tools, follow `_execute_query()`'s comment-stripping + prefix checks; avoid enabling writes unless intended.
3. If changing embedding behavior, reference `src/embeddings.py` model lists and `pyproject.toml` dependencies — CI must install required SDKs.
4. Run unit/integration tests in `src/tests/` using unittest discovery or pytest if present.

If anything in this document is unclear or you'd like more concrete examples (unit test scaffolds, CI matrix entries, or mock patterns for `EmbeddingService`), tell me which section to expand and I'll iterate.
5 changes: 3 additions & 2 deletions .gitignore
@@ -9,5 +9,6 @@ src/logs/*
.env
uv.lock
.DS_Store
.env
.env

# Downloaded data files
scripts/geography_data/
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.11
3.13