ROS 2 wrapper for Retrieval-Augmented Generation (RAG) systems, providing integration with LangChain and LangGraph for intelligent question-answering and document retrieval capabilities.
This package provides a ROS 2 service node for RAG (Retrieval-Augmented Generation) operations. It uses a Chroma vector store with HuggingFace embeddings for semantic search and document retrieval. The node exposes services for storing and retrieving documents, and automatically captures ROS 2 log messages from the /rosout topic for storage in the database.
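For reference, the storage layer described above can be reproduced in a few lines of LangChain. A minimal sketch, assuming recent `langchain-chroma` and `langchain-huggingface` releases (older LangChain versions expose the same classes under `langchain_community`); values mirror the package defaults:

```python
# Minimal sketch of the storage layer: a persistent Chroma collection
# backed by a HuggingFace sentence-transformers embedding model.
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma(
    persist_directory="./chroma_db",  # matches the chroma_directory parameter
    embedding_function=embeddings,
)

# Documents carry free text plus the filterable metadata keys used below.
vectorstore.add_texts(
    texts=["Machine learning is a subset of artificial intelligence"],
    metadatas=[{"source": "example.txt", "node_name": "example_node",
                "node_function": "process", "log_level": "INFO"}],
)
docs = vectorstore.similarity_search("machine learning", k=8)
```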
Features:
- Hybrid Search: Combines semantic search (vector similarity) with BM25 keyword search via EnsembleRetriever
- Semantic search using Chroma vector store and HuggingFace embeddings
- ROS 2 service interface for document retrieval and storage
- Log message storage from the /rosout topic with automatic metadata extraction (see the sketch after this list)
- Flexible configuration via ROS 2 parameters
- Support for metadata-rich document storage and filtering
- Configurable embedding models and search strategies
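The /rosout capture in the feature list comes down to subscribing to the standard `rcl_interfaces/msg/Log` topic and mapping its fields onto document metadata. A hedged sketch of that mapping; the actual node's internals may differ, and which Log field feeds each metadata key (e.g. `source` from `msg.file`) is an assumption here:

```python
# Sketch: capture /rosout messages and turn them into metadata-rich documents.
import rclpy
from rclpy.node import Node
from rcl_interfaces.msg import Log

# Numeric severity levels defined by rcl_interfaces/msg/Log.
LEVELS = {10: "DEBUG", 20: "INFO", 30: "WARN", 40: "ERROR", 50: "FATAL"}

class RosoutCapture(Node):
    def __init__(self):
        super().__init__("rosout_capture")  # illustrative node name
        self.create_subscription(Log, "/rosout", self.on_log, 10)

    def on_log(self, msg: Log) -> None:
        metadata = {
            "source": msg.file,          # file that emitted the log
            "node_name": msg.name,
            "node_function": msg.function,
            "log_level": LEVELS.get(msg.level, "INFO"),
        }
        # In the real node, this is where msg.msg and metadata would be
        # written into the Chroma vector store.
        self.get_logger().debug(f"captured: {msg.msg} {metadata}")

def main():
    rclpy.init()
    rclpy.spin(RosoutCapture())

if __name__ == "__main__":
    main()
```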
Keywords: ROS2, RAG, LangChain, Vector Store, Semantic Search
Author: Alberto Tudela
The rag_ros package has been tested under ROS 2 Jazzy on Ubuntu 24.04. This is research code; expect it to change often, and any fitness for a particular purpose is disclaimed.
Dependencies:
- Robot Operating System (ROS) 2 (middleware for robotics)
- llm_interactions_msgs (Custom ROS 2 messages for LLM interactions)
- LangChain (Framework for LLM applications)
- Chroma (Vector store for embeddings)
- HuggingFace Transformers (Pre-trained embeddings)
To build from source, clone the latest version from the repository into your colcon workspace and compile the package using:
cd colcon_workspace/src
git clone https://github.com/grupo-avispa/rag_ros.git
cd ../
rosdep install -i --from-path src --rosdistro jazzy -y
colcon build --symlink-install
Run the RAG service node with:
ros2 launch rag_ros default.launch.py
The launched node is a ROS 2 service node for RAG operations. It offers the following services:
- retrieve_documents (llm_interactions_msgs/srv/RetrieveDocuments): Retrieve relevant documents from the vector database based on a query, with optional filtering.
Request:
- query (string): The input query to retrieve relevant documents
- k (int32): Number of documents to retrieve (default: 8)
- filters (string): Optional metadata filters as a JSON string. Supported filter keys: source, node_name, node_function, log_level
Response:
- status (string): Response status
- total_results (int32): Total number of documents retrieved
- results (Document[]): Array of retrieved documents with the following structure:
  - id (int32): Unique identifier for the document
  - content (string): Text content of the document
  - metadata (Metadata): Metadata associated with the document
    - source (string): Source or origin of the document
    - node_name (string): Name of the node that processed the document
    - node_function (string): Function of the node that processed the document
    - log_level (string): Log level of the message (DEBUG, INFO, WARN, ERROR, FATAL)
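From Python, this service can be called with a standard rclpy client. A minimal sketch using the request and response fields listed above; the node and variable names are illustrative, not part of the package:

```python
# Sketch: calling /retrieve_documents from an rclpy client.
import json
import rclpy
from rclpy.node import Node
from llm_interactions_msgs.srv import RetrieveDocuments

rclpy.init()
node = Node("rag_client_example")  # illustrative node name
client = node.create_client(RetrieveDocuments, "/retrieve_documents")
client.wait_for_service()

request = RetrieveDocuments.Request()
request.query = "error"
request.k = 5
# Filters travel as a JSON string using the supported keys.
request.filters = json.dumps({"log_level": "ERROR"})

future = client.call_async(request)
rclpy.spin_until_future_complete(node, future)
response = future.result()
print(response.status, response.total_results)
for doc in response.results:
    print(doc.content, doc.metadata.log_level)

node.destroy_node()
rclpy.shutdown()
```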
- store_document (llm_interactions_msgs/srv/StoreDocument): Store a new document in the vector database.
Request:
- document (Document): Document to store with the following structure:
  - id (int32): Unique identifier for the document
  - content (string): Text content to store
  - metadata (Metadata): Metadata associated with the document
    - source (string): Source or origin of the document
    - node_name (string): Name of the node processing the document
    - node_function (string): Function of the node processing the document
    - log_level (string): Log level of the message (DEBUG, INFO, WARN, ERROR, FATAL)
Response:
- success (bool): Operation success status
- message (string): Status message
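The companion storage call follows the same client pattern; a short sketch of building the request, with the message layout taken from the structure above:

```python
# Sketch: building a StoreDocument request with nested metadata.
from llm_interactions_msgs.srv import StoreDocument

request = StoreDocument.Request()
request.document.id = 1
request.document.content = "Machine learning is a subset of artificial intelligence"
request.document.metadata.source = "example.txt"
request.document.metadata.node_name = "example_node"
request.document.metadata.node_function = "process"
request.document.metadata.log_level = "INFO"
# Send with an rclpy client exactly as in the retrieval sketch:
# future = store_client.call_async(request)
```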
Parameters:
- chroma_directory (string, default: "./chroma_db"): Directory where the Chroma vector database persistence data will be stored.
- embedding_model (string, default: "sentence-transformers/all-MiniLM-L6-v2"): HuggingFace embedding model to use for semantic search.
- top_k (int, default: 8): Default number of documents to retrieve per query.
- use_hybrid_search (bool, default: true): Enable hybrid search combining semantic search (vector similarity) with BM25 keyword search. When enabled, an EnsembleRetriever with equal weights (50% semantic + 50% BM25) is used for more comprehensive document retrieval.
Service call examples:
# Basic retrieval
ros2 service call /retrieve_documents llm_interactions_msgs/srv/RetrieveDocuments "{query: 'machine learning', k: 5}"
# Retrieval with log level filter
ros2 service call /retrieve_documents llm_interactions_msgs/srv/RetrieveDocuments "{query: 'error', k: 5, filters: '{\"log_level\": \"ERROR\"}'}"
# Retrieval with multiple filters
ros2 service call /retrieve_documents llm_interactions_msgs/srv/RetrieveDocuments "{query: 'database', k: 5, filters: '{\"log_level\": \"ERROR\", \"node_name\": \"my_node\"}'}"
# Storing a document
ros2 service call /store_document llm_interactions_msgs/srv/StoreDocument "{document: {id: 1, content: 'Machine learning is a subset of artificial intelligence', metadata: {source: 'example.txt', node_name: 'example_node', node_function: 'process', log_level: 'INFO'}}}"
You can customize the RAG service behavior by passing parameters to the launch file:
# Basic configuration with custom top_k and Chroma directory
ros2 launch rag_ros default.launch.py chroma_directory:=/path/to/chroma top_k:=10
# With custom embedding model
ros2 launch rag_ros default.launch.py embedding_model:='sentence-transformers/all-mpnet-base-v2'
# Disable hybrid search (enabled by default)
ros2 launch rag_ros default.launch.py use_hybrid_search:=false
# Full configuration example
ros2 launch rag_ros default.launch.py \
chroma_directory:=/path/to/chroma \
embedding_model:='sentence-transformers/all-MiniLM-L6-v2' \
top_k:=5 \
use_hybrid_search:=true
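For reference on how the filters JSON relates to the underlying store: Chroma expresses metadata constraints as a where-style dictionary, with multiple conditions combined under `$and`. A hedged sketch of the equivalent direct LangChain calls; the node's exact translation logic may differ:

```python
# Sketch: metadata filters applied directly to the Chroma store.
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"),
)

# Single condition: same effect as filters='{"log_level": "ERROR"}'.
docs = vectorstore.similarity_search(
    "error", k=5, filter={"log_level": "ERROR"})

# Multiple conditions require Chroma's $and operator.
docs = vectorstore.similarity_search(
    "database", k=5,
    filter={"$and": [{"log_level": {"$eq": "ERROR"}},
                     {"node_name": {"$eq": "my_node"}}]},
)
```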