This application provides a chat interface to interact with your documents using a locally running Small Language Model (SLM). It works entirely offline.
- Offline SLM: Uses `Phi-3-mini-4k-instruct` via `llama-cpp-python`.
- RAG (Retrieval-Augmented Generation): Uses `ChromaDB` and `SentenceTransformers` to index and retrieve relevant document chunks (see the retrieval sketch after this list).
- FastAPI Backend: Handles document ingestion and chat inference.
- Streamlit Frontend: Provides a user-friendly chat interface.
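For the curious, the retrieval side of the RAG pipeline boils down to embedding document chunks and querying the vector store. The snippet below is a minimal sketch rather than the backend's actual code; the embedding model name, collection name, and storage path are assumptions.

```python
# Sketch of the ingest/retrieve flow; model name, collection name, and path are assumed.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")      # assumed embedding model
client = chromadb.PersistentClient(path="chroma_db")    # assumed storage location
collection = client.get_or_create_collection("documents")

def ingest(chunks: list[str]) -> None:
    # Embed each chunk and store it alongside its text with a stable id.
    embeddings = embedder.encode(chunks).tolist()
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
    )

def retrieve(question: str, k: int = 3) -> list[str]:
    # Embed the question and return the k most similar chunks as context.
    query_embedding = embedder.encode([question]).tolist()
    result = collection.query(query_embeddings=query_embedding, n_results=k)
    return result["documents"][0]
```

The retrieved chunks are then placed into the prompt that is sent to the SLM.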
- Python 3.10+
- Basic build tools (for `llama-cpp-python` compilation)
- Create a Virtual Environment (Recommended):

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install Dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download Model:

  ```bash
  python download_model.py
  ```
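The download step fetches the GGUF weights from Hugging Face. If you are curious what `download_model.py` does, it is essentially a single `huggingface_hub` call; the sketch below is an assumption about its contents, and the real script (and its target directory) may differ.

```python
# Hypothetical equivalent of download_model.py; repo and filename mirror backend/config.py,
# the local_dir is an assumption.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",  # Config.MODEL_REPO
    filename="Phi-3-mini-4k-instruct-q4.gguf",        # Config.MODEL_FILENAME
    local_dir="models",                               # assumed download directory
)
print(f"Model downloaded to {model_path}")
```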
You can use the helper script to start both backend and frontend:

```bash
./start.sh
```

Or run them manually:

Backend:

```bash
uvicorn backend.main:app --reload --port 8000
```

Frontend:

```bash
streamlit run frontend/app.py --server.port 8501
```

- Open the Streamlit app (usually http://localhost:8501).
- Upload a PDF or Text file in the sidebar.
- Click "Ingest Document".
- Ask questions in the chat interface.
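The same ingest-then-ask workflow can also be driven against the FastAPI backend directly, which is handy for scripting or debugging. The endpoint paths and payload fields below are illustrative assumptions, not the documented API; check the interactive docs FastAPI serves at http://localhost:8000/docs for the actual routes.

```python
# Hypothetical direct calls to the backend; endpoint names and fields are assumptions.
import requests

BASE_URL = "http://localhost:8000"

# Upload and ingest a document (assumed endpoint).
with open("report.pdf", "rb") as f:
    resp = requests.post(f"{BASE_URL}/ingest", files={"file": f})
resp.raise_for_status()

# Ask a question about the ingested content (assumed endpoint and payload shape).
resp = requests.post(f"{BASE_URL}/chat", json={"question": "What is the report about?"})
resp.raise_for_status()
print(resp.json())
```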
See `archdocs/architecture.md` for C4 model diagrams.
The application uses `backend/config.py` to manage settings. You can easily swap the SLM model by updating this file.
- Edit `backend/config.py`:

  ```python
  class Config:
      MODEL_REPO = "microsoft/Phi-3-mini-4k-instruct-gguf"
      MODEL_FILENAME = "Phi-3-mini-4k-instruct-q4.gguf"
      PROMPT_TEMPLATE = "phi3"  # Options: phi3, chatml, llama2
  ```

- Download New Model:

  ```bash
  python download_model.py
  ```

- Restart Application:

  ```bash
  ./stop.sh
  ./start.sh
  ```
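For context on what the `PROMPT_TEMPLATE` option controls: the backend wraps your question (plus retrieved context) in a model-specific chat template before handing it to `llama-cpp-python`. The sketch below shows the general idea; the exact template strings, file paths, and parameters used by the backend are assumptions.

```python
# Hypothetical sketch of model loading and prompting with llama-cpp-python.
# Paths, parameters, and template strings are assumptions, not the backend's exact code.
from llama_cpp import Llama

llm = Llama(
    model_path="models/Phi-3-mini-4k-instruct-q4.gguf",  # assumed download location
    n_ctx=4096,      # Phi-3-mini-4k-instruct has a 4k context window
    verbose=False,
)

# Each PROMPT_TEMPLATE option corresponds to a different chat-formatting convention.
TEMPLATES = {
    "phi3": "<|user|>\n{prompt}<|end|>\n<|assistant|>\n",
    "chatml": "<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n",
    "llama2": "[INST] {prompt} [/INST]",
}

prompt = TEMPLATES["phi3"].format(prompt="Summarize the ingested document.")
output = llm(prompt, max_tokens=256, stop=["<|end|>"])
print(output["choices"][0]["text"])
```

If you switch to a model that expects a different chat format, update `PROMPT_TEMPLATE` to match, otherwise the model may produce malformed or truncated answers.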