Extract information from images using text - an interactive chat interface powered by local models.
Install uv and clone the repository:

```shell
git clone <repo-url>
cd image-chat
export PYTORCH_ENABLE_MPS_FALLBACK=1  # For macOS
uv sync
```

Install Ollama and pull the model:
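For macOS users who prefer not to export the variable in their shell, the same MPS fallback flag can be set from Python instead - a minimal sketch, assuming only that it is set before `torch` is imported:

```python
import os

# Equivalent of `export PYTORCH_ENABLE_MPS_FALLBACK=1`: set the flag in
# this process's environment before torch is imported, so unsupported MPS
# ops can fall back to the CPU. setdefault keeps any value already exported.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```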
```shell
ollama pull qwen2.5:3b
```

Run Ollama in the background (e.g. `ollama serve &`).
Terminal 1 - Start the MCP server:

```shell
uv run python -m detection_mcp_server.main
```

The server will start on http://127.0.0.1:8000. Verify it's running:
```shell
curl http://127.0.0.1:8000/health
```

Terminal 2 - Start the Chat API server:
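If you'd rather check readiness from Python than with `curl`, a small helper like the one below works. This is a hypothetical convenience function, not part of the repo; it only assumes the health endpoint shown above and answers False while the server is still starting up:

```python
import urllib.request
import urllib.error

def check_health(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused / timed out: server not (yet) running.
        return False

# URL from the README; prints False until the MCP server is up.
print(check_health("http://127.0.0.1:8000/health"))
```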
```shell
uv run python -m chat_api.main
```

Terminal 3 - Start the Gradio client:
```shell
uv run python -m gradio_chat_client.main
```

Open http://127.0.0.1:7860 in your browser. Upload an image and chat! Example prompts:
- "What is in this image?"
- "Detect all cars"
- "How many did you find?"
