A tool for retrieving Discord messages and storing them in a vector database for semantic search and analysis.
This project provides a secure, privacy-respecting way to retrieve messages from Discord channels using the Discord MCP Server, process them for privacy concerns, and store them in a vector database (ChromaDB) for semantic search and analysis.
- Robust Message Retrieval: Fetch all messages from Discord channels with pagination support
- Privacy Protection: PII detection and redaction, opt-out registry
- Resilient Design: Checkpointing, error recovery, and circuit breaker patterns
- Vector Database Integration: Convert messages to embeddings for semantic search
- Ethical Considerations: Built-in privacy controls and data minimization
# Clone the repository
git clone https://github.com/yourusername/discord_vector_db.git
cd discord_vector_db
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txtfrom discord_retriever.fetcher import DiscordMessageFetcher
# Initialize fetcher
fetcher = DiscordMessageFetcher(
channel_id="731607577481314359",
save_directory="messages"
)
# Retrieve all messages
fetcher.fetch_all()from discord_retriever.processor import process_for_vector_db
# Process messages and add to vector database
collection = process_for_vector_db(
messages_directory="messages",
collection_name="discord_messages"
)
# Search for semantically similar messages
results = collection.query(
query_texts=["tell me about security concerns"],
n_results=5
)# Fetch messages from a Discord channel
python -m discord_retriever.cli fetch --channel-id 731607577481314359 --save-dir messages
# Process messages for vector database
python -m discord_retriever.cli process --messages-dir messages --collection discord_messages
# Search for messages
python -m discord_retriever.cli search --collection discord_messages --query "security concerns"This tool is designed with privacy and ethics in mind:
- All personal identifiable information (PII) can be automatically redacted
- Users can opt-out of having their messages processed
- Data minimization principles are applied by default
- Secure storage options for sensitive data
MIT
Contributions are welcome! Please see CONTRIBUTING.md for details.