A Retrieval-Augmented Generation (RAG) system for documents, texts, and articles using a local Large Language Model (LLM).
This project implements a RAG system that stores documents on a Raspberry Pi (PC A: low performance, but with enough storage for the full document collection) and performs computations on a high-performance, GPU-equipped PC (PC B). The system supports semantic search over the stored documents and retrieval of the most relevant ones for a user query, using the hkunlp/instructor-large model to generate embeddings.
- Document Storage: Store PDFs and text files on a Raspberry Pi (PC A).
- Semantic Search: Perform similarity searches using embeddings.
- Document Retrieval: Retrieve and download relevant documents (PDF or text) based on queries.
- Local Processing: All computations are performed locally without relying on external services.
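The semantic-search feature above reduces to ranking stored document embeddings by cosine similarity against the query embedding. A minimal NumPy sketch with toy 3-dimensional vectors (instructor-large actually produces 768-dimensional ones):

```python
import numpy as np

def cosine_similarity(query, docs):
    """Cosine similarity between a query vector and each row of a matrix."""
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ query

# Toy corpus embeddings (4 documents, 3 dimensions for illustration only).
doc_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.1, 0.0])

scores = cosine_similarity(query, doc_embeddings)
ranked = np.argsort(-scores)   # highest similarity first
print(ranked[0])               # → 0: document 0 is closest to the query
```

Annoy accelerates exactly this ranking by approximating the nearest neighbours instead of scoring every document.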
```
Dual RAG Paper System/
├── server/
│   ├── app.py                    # Flask server: search endpoint and document delivery
│   └── transform_pdf_text.sh     # Converts stored PDFs to plain text
├── processing/
│   ├── embedding_generation.py   # Generates embeddings for the documents
│   ├── vector_search.py          # Builds the Annoy similarity index
│   └── call_server.py            # Client that queries the server
└── README.md
```
Requirements on PC A (Raspberry Pi):

- Python 3
- Flask (`pip install flask`)
- NumPy (`pip install numpy`)
- Annoy (`pip install annoy`)
- Poppler Utils (`sudo apt-get install poppler-utils`)
Requirements on PC B (GPU machine):

- Python 3
- PyTorch (`pip install torch`)
- Sentence Transformers (`pip install -U sentence-transformers`)
- NumPy (`pip install numpy`)
- Annoy (`pip install annoy`)
- Requests (`pip install requests`)
Setup on PC A (Raspberry Pi):

```shell
git clone https://github.com/frrobledo/RAG_paper_search.git
cd RAG_paper_search/server
pip install flask numpy annoy
sudo apt-get install poppler-utils
```
- Place your PDF files in `~/documents/PDF/`.
- Run the script to convert PDFs to text: `./transform_pdf_text.sh`
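For reference, a hypothetical sketch of what `transform_pdf_text.sh` might do, assuming it simply walks `~/documents/PDF/` and calls poppler's `pdftotext` on each file (the actual script in the repo may differ):

```shell
#!/usr/bin/env bash
# Hypothetical sketch (assumption, not the repo's script): convert every PDF
# under PDF_DIR to a sibling .txt file using poppler's pdftotext.
PDF_DIR="${PDF_DIR:-$HOME/documents/PDF}"
for pdf in "$PDF_DIR"/*.pdf; do
    [ -e "$pdf" ] || continue          # glob matched nothing: skip cleanly
    pdftotext "$pdf" "${pdf%.pdf}.txt" # add -layout to preserve column layout
done
```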
- Start the Flask server: `python app.py`

Setup on PC B (GPU machine):

```shell
git clone https://github.com/frrobledo/RAG_paper_search.git
cd RAG_paper_search/processing
pip install -U sentence-transformers numpy annoy requests
```
- Generate Embeddings: `python embedding_generation.py`
- Build Annoy Index: `python vector_search.py`
- Transfer Files to PC A: Copy `embeddings.npy`, `doc_ids.npy`, and `annoy_index.ann` to the Raspberry Pi:

  ```shell
  scp embeddings.npy doc_ids.npy annoy_index.ann pi@<raspberry_pi_ip>:/home/pi/
  ```

- Query the System: `python call_server.py`
- Start the Flask Server on PC A: `cd server && python app.py`
- Run the Client on PC B: `cd processing && python call_server.py`
- Enter Your Query: When prompted, input your search query. The script retrieves and saves the most relevant documents and shows each document's cosine similarity to the query.
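A hedged sketch of the request `call_server.py` plausibly sends to the Flask server on PC A; the endpoint path and JSON field names are assumptions, not the repository's actual API:

```python
# Hypothetical query payload (field names and endpoint are assumptions).
query_embedding = [0.1, 0.2, 0.3]   # in practice: the instructor-large query vector

payload = {
    "embedding": query_embedding,
    "num_results": 5,               # same 'num_results' knob the README exposes
}

# With the server running on PC A, the call itself would look like:
#   import requests
#   resp = requests.post("http://<raspberry_pi_ip>:5000/search", json=payload)
#   resp.raise_for_status()
```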
- Adjusting the Number of Results: In `call_server.py`, change `'num_results': 5` to the number of documents you want to retrieve.
- Changing the Instruction for Embeddings: In `embedding_generation.py` and `call_server.py`, modify the instruction given to the model to better suit your documents.
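Instructor models take an `[instruction, text]` pair per input, which is what the configuration item above changes. A sketch of that format (the instruction strings below are assumptions; tune them to your corpus):

```python
# Sketch of the pair format hkunlp/instructor-large expects. The instruction
# strings are assumed examples, not the ones hard-coded in this repo.
doc_instruction = "Represent the scientific document for retrieval:"
query_instruction = "Represent the question for retrieving supporting documents:"

docs = ["Attention is all you need.", "Deep residual learning for images."]
doc_pairs = [[doc_instruction, d] for d in docs]

# With the model loaded, encoding would look like:
#   from InstructorEmbedding import INSTRUCTOR
#   model = INSTRUCTOR("hkunlp/instructor-large")
#   embeddings = model.encode(doc_pairs)   # one vector per [instruction, text] pair
```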
- Out of Memory Errors: If you encounter memory errors on PC B, reduce `batch_size` in `embedding_generation.py`.
- Connection Issues: Ensure that both PCs are on the same network and that the IP addresses are correctly specified.
- File Not Found Errors: Verify that the documents exist in the specified directories on the Raspberry Pi.
Contributions are welcome! Please open an issue or submit a pull request.