Your friendly workspace buddy for Miles on Slack. Milo uses local AI models through Ollama and Retrieval Augmented Generation (RAG) to provide helpful responses while keeping your data private.
This project demonstrates a complete RAG pipeline in .NET, integrated with Slack. Key features include:
- Setting up a PostgreSQL vector database for knowledge storage
- Generating embeddings from documents using local Ollama models
- Storing documents and their embeddings securely
- Retrieving relevant document chunks based on semantic search
- Using retrieved context to enhance LLM queries with Ollama
- Chatting with Milo directly in Slack by mentioning `@milo`
- Privacy-focused: all processing happens locally or within your controlled environment
- .NET 9.0 SDK
- PostgreSQL 15+ with pgvector extension
- Ollama running locally with compatible models (embedding and generation models)
- (Optional) Slack Bot Token and App Token for Slack integration
The solution consists of five main projects:
- DBSetup: Contains the `Program.cs` which executes SQL scripts to set up the PostgreSQL database schema, including enabling the `vector` extension and creating the `documents` and `embeddings` tables with appropriate indexing.
- Utils: Provides shared utilities, including database connection helpers (`DatabaseHelper.cs`), Ollama client configuration (`OllamaClient.cs`), and potentially text processing functions used across other projects. Relies on the `Npgsql` and `Microsoft.Extensions.AI` libraries.
- Embeddings: Reads documents (e.g., from `HandbookDocuments`), processes them into chunks, generates vector embeddings using the configured Ollama embedding model via `Utils`, and stores both the document content/source and the embeddings in the PostgreSQL database using `Utils`.
- QueryOllama: Takes a user query, generates an embedding for it, performs a semantic similarity search against the stored embeddings in PostgreSQL (using `pgvector`'s `<=>` operator), retrieves the top-k relevant document chunks, constructs a prompt including this context, sends the prompt to the configured Ollama generation model, and outputs the response.
- SlackIntegration: Implements a Slack bot using `SlackNet` (and potentially `Microsoft.Extensions.Hosting`). It listens for mentions (`@milo`), extracts the user's query, uses the logic from `QueryOllama` (or similar shared logic) to get a RAG-based answer, and posts the response back to the Slack channel. Handles configuration loading for Slack tokens and potentially Ollama/DB settings.
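For orientation, the solution layout implied by the paths referenced in this README looks roughly like this (the comments are illustrative):

```
Milo/
├── DBSetup/            # Program.cs executes the SQL setup scripts
├── Utils/              # DatabaseHelper.cs, OllamaClient.cs, Models/ChunkedData.cs
├── Embeddings/         # Program.cs, Chunking/LLMChunking.cs, HandbookDocuments/
├── QueryOllama/        # CLI entry point for RAG queries
└── SlackIntegration/   # Slack bot host
```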
Set up the following environment variables. These are typically loaded using `Microsoft.Extensions.Configuration` within the applications. Default values may be provided in the code if variables are not set.
```bash
# PostgreSQL connection string (Required by DBSetup, Embeddings, QueryOllama, SlackIntegration)
POSTGRES_CONNECTION_STRING="Host=localhost;Username=your_username;Password=your_password;Database=your_database"
# Ollama settings (Used by Embeddings, QueryOllama, SlackIntegration)
# Defaults: Endpoint=http://localhost:11434, Model=gemma3, EmbeddingModel=jeffh/intfloat-multilingual-e5-large-instruct:f16
OLLAMA_ENDPOINT="http://localhost:11434"
OLLAMA_MODEL="gemma3"
OLLAMA_EMBEDDING_MODEL="jeffh/intfloat-multilingual-e5-large-instruct:f16"
# Slack settings (Required by SlackIntegration)
SLACK_BOT_TOKEN="xoxb-your-bot-token"
SLACK_APP_TOKEN="xapp-your-app-level-token"
```

Ensure PostgreSQL is running and the pgvector extension is available. Then, run the DBSetup project:
```bash
cd Milo/DBSetup
dotnet run
```

This creates:
- A `documents` table (`id`, `content`, `source`)
- An `embeddings` table (`id`, `document_id`, `embedding VECTOR(1024)`)
- An IVFFlat index (`idx_embeddings`) on the `embedding` column for efficient similarity search.
Place your text-based documents (e.g., `.md`, `.txt` files) in the `Milo/Embeddings/HandbookDocuments` directory. The Embeddings project will recursively scan this directory.
Run the Embeddings project:
```bash
cd Milo/Embeddings
dotnet run
```

This process will:
- Find document files.
- Read and chunk the content of each file.
- For each chunk, call the Ollama API (via
OLLAMA_ENDPOINT) to generate an embedding usingOLLAMA_EMBEDDING_MODEL. - Store the document source, chunk content, and the generated embedding vector in the PostgreSQL database.
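A minimal sketch of that final store step, assuming a `float[]` embedding has already been generated and the schema created by DBSetup; the helper name and the text-literal `::vector` cast are illustrative, not the project's actual code:

```csharp
using System.Globalization;
using System.Linq;
using Npgsql;

// Hypothetical helper: persists one chunk and its embedding in a single transaction.
static void StoreChunk(string connectionString, string source, string chunk, float[] embedding)
{
    using var conn = new NpgsqlConnection(connectionString);
    conn.Open();
    using var tx = conn.BeginTransaction();

    // Insert the chunk text and capture its generated id.
    using var insertDoc = new NpgsqlCommand(
        "INSERT INTO documents (content, source) VALUES (@content, @source) RETURNING id", conn, tx);
    insertDoc.Parameters.AddWithValue("@content", chunk);
    insertDoc.Parameters.AddWithValue("@source", source);
    var documentId = (int)insertDoc.ExecuteScalar()!;

    // Store the embedding using pgvector's "[f1,f2,...]" text literal.
    var vectorLiteral = "[" + string.Join(",",
        embedding.Select(f => f.ToString(CultureInfo.InvariantCulture))) + "]";
    using var insertEmb = new NpgsqlCommand(
        "INSERT INTO embeddings (document_id, embedding) VALUES (@docId, @vec::vector)", conn, tx);
    insertEmb.Parameters.AddWithValue("@docId", documentId);
    insertEmb.Parameters.AddWithValue("@vec", vectorLiteral);
    insertEmb.ExecuteNonQuery();

    tx.Commit();
}
```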
Run the QueryOllama project to ask questions using the RAG approach:
```bash
cd Milo/QueryOllama
dotnet run "Your question here?"

# Example: dotnet run "What is the process for requesting time off?"
```

You can provide your question as a command-line argument. The system performs the RAG steps: embedding the query, searching for similar chunks in the database, creating a context-enhanced prompt, and querying the `OLLAMA_MODEL`.
To interact with the RAG system via Slack:
```bash
cd Milo/SlackIntegration
dotnet run
```

This starts the Slack bot. Ensure your bot is configured in Slack (App Manifest, OAuth scopes, Socket Mode enabled, Event Subscriptions for `app_mention`) and invited to the relevant channels. Mention the bot (e.g., `@milo How do I reset my password?`) to get a response.
The system leverages PostgreSQL with the pgvector extension:
- Storage: Stores the `VECTOR` data type directly in the `embeddings` table. The dimension (e.g., 1024) is set during table creation, matching the output of the chosen embedding model.
- Indexing: Uses an `IVFFlat` index for Approximate Nearest Neighbor (ANN) search, significantly speeding up similarity queries compared to exact K-Nearest Neighbor (KNN) search on large datasets.
- Similarity Search: Employs the cosine distance operator (`<=>`) provided by `pgvector` to find the embeddings (document chunks) most similar to the query embedding. Queries typically compute `1 - (embedding <=> queryEmbedding)` to turn distance into a similarity score (where 1 is most similar).
Database schema (simplified):
```sql
-- Ensure pgvector is enabled
CREATE EXTENSION IF NOT EXISTS vector;
-- Table for original document source and content
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
source TEXT -- e.g., filename or URL
);
-- Table for embeddings linked to document chunks
CREATE TABLE embeddings (
id SERIAL PRIMARY KEY,
document_id INT REFERENCES documents(id) ON DELETE CASCADE,
embedding VECTOR(1024) -- Dimension matches the embedding model
);
-- Index for fast similarity search
CREATE INDEX idx_embeddings ON embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100); -- Adjust 'lists' based on dataset size
```

The Embeddings project uses an LLM-based approach for chunking:
- Reading: Reads text content from files in the specified directory (`HandbookDocuments`).
- LLM-based Splitting: Instead of a simple rule-based splitter, it sends the entire document content to the configured Ollama generation model (e.g., `gemma3`) via the `Embeddings/Chunking/LLMChunking.cs` class.
- Prompting for Chunks: A specific system prompt instructs the LLM to act as an assistant that divides documents into meaningful segments suitable for embedding. The prompt encourages grouping related content, favoring larger chunks over smaller ones, and maintaining the original language. The LLM is expected to return the chunks in a structured JSON format (defined by `Utils/Models/ChunkedData.cs`).
- Metadata: Each resulting chunk is associated with its original source document (the `source` column in the `documents` table) when stored.
- Goal: Leverage the LLM's understanding to create semantically coherent chunks, potentially leading to better context retrieval compared to simple fixed-size or sentence splitting.
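A minimal sketch of that call, assuming Ollama's standard `/api/generate` endpoint with `format: "json"`; the prompt wording and the simplified response record are illustrative stand-ins for what actually lives in `LLMChunking.cs` and `Utils/Models/ChunkedData.cs`:

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

// Simplified stand-in for Utils/Models/ChunkedData.cs.
record ChunkedData(List<string> Chunks);

static async Task<ChunkedData?> ChunkWithLlmAsync(
    HttpClient http, string endpoint, string model, string document)
{
    // "format": "json" asks Ollama to constrain the model to valid JSON output.
    var request = new
    {
        model,
        system = "Divide the document into meaningful, self-contained segments suitable for " +
                 "embedding. Keep related content together, favor larger chunks, preserve the " +
                 "original language, and reply as JSON: {\"chunks\": [\"...\"]}.",
        prompt = document,
        format = "json",
        stream = false
    };

    var response = await http.PostAsJsonAsync($"{endpoint}/api/generate", request);
    response.EnsureSuccessStatusCode();

    // Ollama returns the generated text in the "response" field.
    using var body = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
    var text = body.RootElement.GetProperty("response").GetString()!;
    return JsonSerializer.Deserialize<ChunkedData>(
        text, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
}
```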
Vector embeddings are generated via the Ollama API:
- Model: Uses the model specified by `OLLAMA_EMBEDDING_MODEL` (e.g., `jeffh/intfloat-multilingual-e5-large-instruct:f16`).
- Client: Uses `Microsoft.Extensions.AI.Ollama` to interact with the Ollama `/api/embeddings` endpoint.
- Process: Text chunks are sent to Ollama, which returns the corresponding vector embeddings. These vectors are then stored in the `embeddings` table.
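For reference, the underlying exchange looks roughly like this when calling the endpoint directly rather than through the `Microsoft.Extensions.AI.Ollama` abstraction the project uses (a sketch, not the project's actual client code):

```csharp
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

static async Task<float[]> EmbedAsync(HttpClient http, string endpoint, string model, string chunk)
{
    // POST /api/embeddings returns {"embedding": [f1, f2, ...]} for the given prompt.
    var response = await http.PostAsJsonAsync(
        $"{endpoint}/api/embeddings", new { model, prompt = chunk });
    response.EnsureSuccessStatusCode();

    using var body = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
    return body.RootElement.GetProperty("embedding")
        .EnumerateArray()
        .Select(e => e.GetSingle())
        .ToArray();
}
```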
The QueryOllama project (and SlackIntegration) performs semantic search:
- Query Embedding: The user's question is first converted into a vector embedding using the same `OLLAMA_EMBEDDING_MODEL`.
- Database Query: A SQL query is executed against the PostgreSQL database using `Npgsql`. It uses the `<=>` operator to find the embeddings with the smallest cosine distance to the query embedding.
- Retrieval: The `content` and `source` from the corresponding `documents` rows are retrieved for the top-k most similar embeddings.
Example Npgsql query structure:
```csharp
using var conn = new NpgsqlConnection(_connectionString);
conn.Open();

using var command = new NpgsqlCommand();
command.Connection = conn;
command.CommandText =
    "SELECT d.id, d.content, d.source, 1 - (e.embedding <=> @embedding::vector) AS similarity " +
    "FROM embeddings e " +
    "JOIN documents d ON e.document_id = d.id " +
    "ORDER BY similarity DESC " +
    "LIMIT @limit";

// The query embedding is passed in a form the ::vector cast accepts,
// e.g., pgvector's text representation "[0.1,0.2,...]".
command.Parameters.AddWithValue("@embedding", embeddings);
command.Parameters.AddWithValue("@limit", limit);

// Collect the top-k hits, highest similarity first.
var results = new List<DocumentSearchResult>();
using var reader = command.ExecuteReader();
while (reader.Read())
{
    results.Add(new DocumentSearchResult
    {
        Id = reader.GetInt32(0),
        Content = reader.GetString(1),
        Source = reader.GetString(2),
        Similarity = reader.GetDouble(3)
    });
}
```

The retrieved document chunks provide context for the final query to the LLM:
- Prompt Construction: A prompt is dynamically created, typically including:
  - System instructions (e.g., "Answer the user's question based only on the provided context. Cite sources if possible.")
  - The retrieved context chunks (formatted clearly).
  - The original user question.
- LLM Interaction: The combined prompt is sent to the Ollama generation model.
- Response Generation: The LLM generates an answer based on the user's question and the provided context.
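Put together, the augmentation step might look like the following sketch, reusing the `DocumentSearchResult` items from the retrieval example and Ollama's `/api/generate` endpoint (the exact prompt wording is illustrative):

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

static async Task<string> AnswerWithContextAsync(
    HttpClient http, string endpoint, string model,
    string question, IEnumerable<DocumentSearchResult> chunks)
{
    // Fold the retrieved chunks into a clearly delimited context section.
    var context = new StringBuilder();
    foreach (var chunk in chunks)
        context.AppendLine($"[Source: {chunk.Source}]\n{chunk.Content}\n---");

    var response = await http.PostAsJsonAsync($"{endpoint}/api/generate", new
    {
        model,
        system = "Answer the user's question based only on the provided context. " +
                 "Cite sources if possible.",
        prompt = $"Context:\n{context}\n\nQuestion: {question}",
        stream = false
    });
    response.EnsureSuccessStatusCode();

    // Return the generated answer from Ollama's "response" field.
    using var body = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
    return body.RootElement.GetProperty("response").GetString()!;
}
```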
The SlackIntegration project bridges the RAG pipeline with Slack:
- Framework: Uses `SlackNet` for handling Slack API interactions (Events API, Web API) via Socket Mode or HTTP.
- Event Handling: Listens for `app_mention` or slash command events.
- Workflow:
  - Receives the event.
  - Extracts the user's text query from the event payload.
  - Invokes the RAG logic (embedding, search, context prompt, LLM query).
  - Replies with the final LLM response back to the user.
- Configuration: Loads `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` from environment variables or other configuration sources.
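A minimal `SlackNet` handler sketch for the workflow above; `IRagService` is a hypothetical abstraction over the shared RAG logic, not a type from the project:

```csharp
using System.Threading.Tasks;
using SlackNet;
using SlackNet.Events;
using SlackNet.WebApi;

// Handles app_mention events and answers via the RAG pipeline.
class MiloMentionHandler : IEventHandler<AppMention>
{
    private readonly ISlackApiClient _slack;
    private readonly IRagService _rag; // hypothetical: embed, search, prompt, generate

    public MiloMentionHandler(ISlackApiClient slack, IRagService rag)
    {
        _slack = slack;
        _rag = rag;
    }

    public async Task Handle(AppMention slackEvent)
    {
        // Strip the leading "<@UBOTID>" token to recover the raw question.
        var question = slackEvent.Text[(slackEvent.Text.IndexOf('>') + 1)..].Trim();

        var answer = await _rag.AnswerAsync(question);

        await _slack.Chat.PostMessage(new Message
        {
            Channel = slackEvent.Channel,
            ThreadTs = slackEvent.Ts, // reply in-thread to keep channels tidy
            Text = answer
        });
    }
}
```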
- .NET 9: Core framework.
- PostgreSQL: Relational database.
- pgvector: PostgreSQL extension for vector similarity search.
- Ollama: Local inference server for running LLMs and embedding models.
- Npgsql: .NET data provider for PostgreSQL.
- SlackNet: .NET library for Slack API interaction.
Milo uses Ollama models for both embeddings and language generation. You can pull different models to experiment or use specialized ones.
Install additional models with:
```bash
ollama pull <model-name>
```

Some recommended models:

- `gemma3` (default LLM model)
- `jeffh/intfloat-multilingual-e5-large-instruct:f16` (default embedding model)
- `llama`
- `mistral`
Ensure the models you want to use are specified in your environment variables or the application configuration.
- DB Connection Issues: Verify `POSTGRES_CONNECTION_STRING` is correct and accessible from where the application is running. Check the PostgreSQL logs. Ensure the user has permissions on the database and tables.
- pgvector Not Found: Ensure the `pgvector` extension is installed in your PostgreSQL instance and enabled in the target database (`CREATE EXTENSION IF NOT EXISTS vector;`).
- Ollama Connection Issues: Verify `OLLAMA_ENDPOINT` is correct and Ollama is running and accessible. Check firewall rules. Test the endpoint directly (e.g., `curl http://localhost:11434/api/tags`). Ensure the specified `OLLAMA_MODEL` and `OLLAMA_EMBEDDING_MODEL` are pulled (`ollama list`).
- Slack Integration Errors: Double-check `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN`. Ensure the bot has the correct OAuth scopes (e.g., `app_mentions:read`, `chat:write`). Verify Socket Mode is enabled in the Slack app settings if used. Check the Slack API dashboard for errors.
- Embedding Dimension Mismatch: Ensure the `VECTOR(dimension)` size in the `embeddings` table schema matches the output dimension of the `OLLAMA_EMBEDDING_MODEL`. If you change the model, you may need to recreate the table and re-embed the documents.
- Document Loaders: Modify `Embeddings/Program.cs` to support different file types (PDF, DOCX) by adding relevant parsing libraries (e.g., `PdfPig`, `DocumentFormat.OpenXml`).
- Chunking Strategies: Experiment with different text splitting methods (e.g., recursive character splitting, semantic chunking) in the `Embeddings` project.
- Vector Databases: Replace PostgreSQL/pgvector with another vector store (e.g., ChromaDB, Qdrant, Weaviate) by updating the data access logic in `Utils` and dependent projects.
- Models: Change the `OLLAMA_MODEL` or `OLLAMA_EMBEDDING_MODEL` environment variables to use different Ollama models. Ensure compatibility (e.g., embedding dimensions).
- LLM Providers: Adapt the code in `QueryOllama` and `Utils` to call different LLM APIs (e.g., OpenAI, Anthropic) instead of Ollama.
Milo processes all queries and document embeddings locally using Ollama and your PostgreSQL database, ensuring your conversations and data stay private and secure within your infrastructure.
Issues and pull requests are welcome! Feel free to contribute to make Milo even better.