IMPORTANT NOTICE: This project is still a work in progress. Some features may be incomplete or subject to change.
PrivacyChatBoX is a Python-based AI privacy protection platform that safeguards sensitive information across multiple document types and AI providers.
This application offers a privacy-focused environment for AI interactions with multiple model integrations while ensuring user data remains secure.
- Multi-Provider AI Integration: Seamlessly switch between OpenAI, Anthropic Claude, Google Gemini, and local LLM models
- Privacy Scanning: Automatically scans text for sensitive information before sending to AI models
- Document Anonymization: Detects and anonymizes sensitive information in documents
- Privacy Alerts: Visual indicators (⚠️) for conversations with detected sensitive information
- Role-Based Access Control: Admin and regular user roles with appropriate permissions
- Privacy-Focused Administration: Admins can see metadata and privacy alerts but not conversation content
- Microsoft DLP Integration: Blocks sensitive files based on Microsoft Sensitivity labels
- Conversation Management: Save, export, and manage conversation history
- Azure AD Authentication: Enterprise-ready authentication with Microsoft identities
- Admin Dashboard: User management, system metrics, and configuration
- Analytics: Comprehensive privacy metrics and visualization
- PDF Export: Export conversations to well-formatted PDF documents
- Web Search: Integrated web search capabilities through SerpAPI
- Docker Deployment: Containerized deployment for easy installation and scaling
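The privacy scanning and anonymization features listed above can be illustrated with a minimal sketch. The pattern set, the `[LABEL]` placeholder format, and the `anonymize` function name are assumptions made for illustration; the project's actual patterns and replacement format are not shown in this README.

```python
import re

# Hypothetical patterns for illustration only; not the project's real pattern set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def anonymize(text):
    """Replace each match with a [LABEL] placeholder and count replacements per label."""
    counts = {}
    for label, pattern in PATTERNS.items():
        # subn returns the rewritten text and the number of substitutions made
        text, replaced = pattern.subn(f"[{label}]", text)
        if replaced:
            counts[label] = replaced
    return text, counts


clean, found = anonymize("Call 555-123-4567 or mail jane@example.com")
print(clean)  # Call [PHONE] or mail [EMAIL]
print(found)  # {'EMAIL': 1, 'PHONE': 1}
```

A non-empty `found` dictionary is the kind of signal that could drive the ⚠️ indicator, while `clean` is the text that would be forwarded to an AI provider.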
Security Notice: In the current version, the database is not encrypted. While administrators cannot access conversation content through the user interface, the data is stored in plaintext in the database. Database encryption is planned for future versions to enhance security (see Future Development section).
The application is built using the following technologies:
- Frontend & Backend: Streamlit (Python web application framework)
- Database: PostgreSQL
- AI Providers: OpenAI API, Anthropic Claude API, Google Gemini API
- Authentication: Local authentication with password hashing, Azure AD integration
- Privacy Analysis: Custom regex patterns and Microsoft DLP integration
PrivacyChatBoX/
├── app.py # Main application entry point
├── pages/ # Streamlit pages
│ ├── admin.py # Admin dashboard
│ ├── chat.py # Main chat interface
│ ├── history.py # Conversation history and analytics
│ ├── model_manager.py # Local LLM model management (admin only)
│ └── settings.py # User settings
├── models.py # Database models
├── database.py # Database connection utilities
├── database_check.py # Database schema validation
├── ai_providers.py # AI provider integration
├── privacy_scanner.py # Privacy scanning functionality
├── ms_dlp.py # Microsoft DLP integration
├── auth.py # Authentication utilities
├── azure_auth.py # Azure AD authentication
├── utils.py # General utilities
├── utils_auth.py # Authentication utilities
├── pdf_export.py # PDF export functionality
├── shared_sidebar.py # Shared UI components (with role-based visibility)
├── style.py # Custom CSS styling
├── assets/ # Static assets
│ ├── logo.png # Application logo
│ └── ... # Other assets
├── .env # Environment variables (not in repo)
├── .streamlit/ # Streamlit configuration
│ └── config.toml # Streamlit configuration file
├── requirements.txt # Python dependencies
├── pyproject.toml # Project metadata
├── migration_add_dlp_columns.py # Microsoft DLP integration migration
├── migration_add_local_llm_columns.py # Local LLM settings migration
├── migration_pattern_levels.py # Privacy pattern levels migration
├── model_utils.py # Local LLM model utilities
├── test_local_llm.py # Testing script for local LLM integration
├── setup.sh # Automated installation script
├── models/ # Directory for local LLM models
└── docs/ # Documentation
    ├── Modules.md # Module documentation
    ├── Database.md # Database documentation
    ├── Setup_Guide.md # Detailed setup instructions
    ├── Troubleshooting.md # Solutions for common issues
    ├── ConversationData.md # Conversation data formatting guide
    ├── LocalLLM.md # Local LLM integration guide
    └── Optimization_Guide.md # Performance optimization techniques
- Python 3.11 or higher
- PostgreSQL database (recommended)
- API keys for desired AI providers (OpenAI, Claude, Gemini) - optional
- Clone the repository:
  git clone https://github.com/yourusername/PrivacyChatBoX.git
  cd PrivacyChatBoX
- Run the setup script:
  ./setup.sh
  This script will:
  - Create a Python virtual environment
  - Install required dependencies
  - Set up the PostgreSQL database
  - Create necessary directories
  - Configure environment variables
  - Run database migrations
- Start the application:
  source venv/bin/activate
  streamlit run app.py
- Access the application at http://localhost:5000
For detailed setup instructions, troubleshooting, and manual setup options, see Setup Guide.
If you prefer a manual setup:
- Clone the repository:
  git clone https://github.com/yourusername/PrivacyChatBoX.git
  cd PrivacyChatBoX
- Install dependencies:
  pip install .
- Configure environment variables by creating a .env file:
  # Database Configuration
  DATABASE_URL=postgresql://username:password@localhost/privacychatbox
  # OpenAI API (Optional)
  OPENAI_API_KEY=your_openai_api_key
  # Anthropic API (Optional)
  ANTHROPIC_API_KEY=your_anthropic_api_key
  # Google Gemini API (Optional)
  GOOGLE_API_KEY=your_gemini_api_key
  # SerpAPI for web search (Optional)
  SERPAPI_KEY=your_serpapi_key
  # Azure AD Authentication (Optional)
  AZURE_CLIENT_ID=your_azure_client_id
  AZURE_CLIENT_SECRET=your_azure_client_secret
  AZURE_TENANT_ID=your_azure_tenant_id
  AZURE_REDIRECT_URI=http://localhost:5000/
  # Microsoft DLP Integration (Optional)
  MS_CLIENT_ID=your_ms_client_id
  MS_CLIENT_SECRET=your_ms_client_secret
  MS_TENANT_ID=your_ms_tenant_id
  MS_DLP_ENDPOINT_ID=your_ms_dlp_endpoint_id
- Create Streamlit configuration:
  mkdir -p .streamlit
  echo "[server]" > .streamlit/config.toml
  echo "headless = true" >> .streamlit/config.toml
  echo "address = \"0.0.0.0\"" >> .streamlit/config.toml
  echo "port = 5000" >> .streamlit/config.toml
- Run database migrations:
  python database_check.py
- Start the application:
  streamlit run app.py
- Access the application at http://localhost:5000
The application can be easily deployed using Docker and Docker Compose:
- Clone the repository:
  git clone https://github.com/yourusername/PrivacyChatBoX.git
  cd PrivacyChatBoX
- Start the application:
  docker-compose up -d
- Access the application at http://localhost:5000
For a complete step-by-step guide on Docker deployment, configuration, and troubleshooting, see:
- Docker Setup Guide - Comprehensive instructions for Docker deployment
- Docker Guide - Detailed reference and advanced configurations
Important: On first run, the application automatically creates an admin user:
- Username: admin
- Password: admin
This default admin account is created by the init_auth() function in auth.py, which runs during application startup. The admin user has full access to all features, including user management.
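The README states that local authentication uses password hashing but does not say which scheme `auth.py` uses. As a minimal sketch of the general technique, assuming salted PBKDF2 from the standard library (the function names, storage format, and iteration count below are all hypothetical):

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_password(password: str, salt: Optional[bytes] = None, iterations: int = 200_000) -> str:
    """Return 'iterations$salt_hex$digest_hex' using salted PBKDF2-HMAC-SHA256."""
    salt = salt or os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


stored = hash_password("admin")
print(verify_password("admin", stored))  # True
print(verify_password("wrong", stored))  # False
```

Because only the salted digest is stored, the plaintext password never needs to be written to the database.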
The application supports two user roles with different permissions:
Regular users:
- Can access Chat, History, and Settings pages
- Can only see and manage their own conversations
- Can configure their own AI provider settings and privacy preferences
- Cannot access Admin Panel or Model Manager pages
Administrators:
- Have full access to all pages including Admin Panel and Model Manager
- Can manage users (create, delete, change roles and passwords)
- Can view metadata about all users' conversations (titles, timestamps, file attachments, privacy alerts)
- Can see which conversations contain privacy alerts (⚠️ indicator)
- Cannot view the actual content of other users' conversations, only metadata
- Can manage system-wide settings and local LLM models
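The permission rules above can be sketched as two small checks: one for page access and one for what a viewer may see of a conversation. The role strings, page names, and dictionary fields are assumptions for illustration and may not match the identifiers used in the codebase.

```python
# Hypothetical names; the real role strings and field names are not given in this README.
SHARED_PAGES = {"Chat", "History", "Settings"}
ADMIN_ONLY_PAGES = {"Admin", "Model Manager"}
METADATA_FIELDS = ("title", "created_at", "attachments", "has_privacy_alert")


def can_access_page(role, page):
    """Regular users get the shared pages; admins additionally get admin-only pages."""
    if page in SHARED_PAGES:
        return role in ("user", "admin")
    return page in ADMIN_ONLY_PAGES and role == "admin"


def visible_conversation(viewer, conversation):
    """Owners see everything; admins see metadata only; anyone else is refused."""
    if viewer["id"] == conversation["owner_id"]:
        return dict(conversation)
    if viewer["role"] == "admin":
        return {k: conversation[k] for k in METADATA_FIELDS if k in conversation}
    raise PermissionError("not allowed to view this conversation")


conv = {"owner_id": 2, "title": "Tax question", "has_privacy_alert": True, "messages": ["..."]}
admin = {"id": 1, "role": "admin"}
print(visible_conversation(admin, conv))  # {'title': 'Tax question', 'has_privacy_alert': True}
```

Filtering to an allow-list of metadata fields, rather than deleting the message field, means newly added content fields stay hidden from admins by default.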
Security Note: At the current stage, the database is not encrypted. While administrators cannot access conversation content through the user interface, the data is stored in plaintext in the database. Future versions will implement database encryption for enhanced security.
- Login: Use the login form or Azure AD login if configured
- Chat: Navigate to the chat page to start conversations with AI
- Settings: Configure your AI providers, privacy settings, and more
- History: View your conversation history and analytics
- Admin: Manage users and view system metrics (admin only)
- Model Manager: Download and configure local LLM models (admin only)
This application uses environment variables to store sensitive configuration like API keys and database credentials. This approach enhances security by keeping secrets out of your code repository.
- Create your .env file:
  - Copy the provided .env.example file:
    cp .env.example .env
  - Edit the .env file with your actual credentials
- Security measures in place:
  - The .gitignore file is configured to exclude the .env file from git
  - Environment variables are loaded at runtime and never stored in the database
  - API keys are accessed only when needed for specific operations
- Available variables:
  - Database connection: DATABASE_URL
  - API keys: OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, SERPAPI_KEY
  - Azure authentication: AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, etc.
  - Microsoft DLP: MS_CLIENT_ID, MS_CLIENT_SECRET, etc.
You can also refer to the Settings page > Environment Config tab for a complete list of available environment variables and additional descriptions.
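Reading these variables at runtime might look like the following sketch. The `load_config` helper is hypothetical, and treating `DATABASE_URL` as required is an assumption; how the `.env` file itself is loaded into the process environment is not covered here.

```python
import os

# Assumption: DATABASE_URL is treated as required, the API keys as optional.
REQUIRED = ("DATABASE_URL",)
OPTIONAL = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "SERPAPI_KEY")


def load_config(env=None):
    """Collect known settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    # Optional keys are included only when set, so callers can test `name in config`.
    config.update({name: env[name] for name in OPTIONAL if env.get(name)})
    return config


config = load_config({
    "DATABASE_URL": "postgresql://username:password@localhost/privacychatbox",
    "OPENAI_API_KEY": "your_openai_api_key",
})
print(sorted(config))  # ['DATABASE_URL', 'OPENAI_API_KEY']
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching real credentials.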
- NEVER commit your .env file to version control
- Double-check your .gitignore file includes .env entries
- Regularly rotate your API keys for production deployments
- Use separate API keys for development and production environments
This application requires various API keys for full functionality:
- OpenAI API Key: Get from OpenAI Platform
- Claude API Key: Get from Anthropic Console
- Gemini API Key: Get from Google AI Studio
- SerpAPI Key: Get from SerpAPI
PrivacyChatBoX includes several performance optimizations to improve scanning speed and memory efficiency:
- Pre-compiled Regex Patterns: All privacy patterns are pre-compiled at startup for faster matching
- Confidence Scoring: Each pattern has a confidence score to reduce false positives
- Chunked File Processing: Large files are processed in manageable chunks to prevent memory issues
- Parallel Processing: Multi-threading support for faster document scanning
- Adaptive Loading: Document processing libraries are loaded conditionally as needed
For detailed information on performance optimizations, see Optimization Guide.
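As a rough illustration of three of these ideas together (pre-compiled patterns, per-pattern confidence scores, and processing input in small pieces), consider the sketch below. The pattern definitions, confidence values, and the `scan_stream` name are assumptions; the sketch reads one line at a time as the simplest form of chunking and assumes no pattern spans a line break.

```python
import io
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyPattern:
    name: str
    regex: re.Pattern   # compiled once at import time, reused for every line
    confidence: float   # 0.0 to 1.0; illustrative values only


# Hypothetical patterns; the project's real set and scores are not shown in this README.
PATTERNS = (
    PrivacyPattern("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), 0.9),
    PrivacyPattern("four_digits", re.compile(r"\b\d{4}\b"), 0.3),
)


def scan_stream(stream, min_confidence=0.5):
    """Yield (line_number, pattern_name, matched_text) without loading the whole input."""
    for line_no, line in enumerate(stream, start=1):
        for pattern in PATTERNS:
            if pattern.confidence < min_confidence:
                continue  # skip low-confidence patterns to reduce false positives
            for match in pattern.regex.finditer(line):
                yield line_no, pattern.name, match.group()


sample = "order 1234 shipped\ncontact: jane@example.com\n"
print(list(scan_stream(io.StringIO(sample))))
# [(2, 'email', 'jane@example.com')]
print(list(scan_stream(io.StringIO(sample), min_confidence=0.0)))
# [(1, 'four_digits', '1234'), (2, 'email', 'jane@example.com')]
```

Because `scan_stream` is a generator over any iterable of lines, the same function works on an open file handle, so memory use stays bounded by the longest line rather than the file size.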
PrivacyChatBoX supports running local language models without requiring an internet connection or API keys, which enhances privacy and reduces operational costs.
The application supports GGUF format models through the llama-cpp-python library. Popular models include:
- Llama 2: Meta's Llama 2 models in various sizes
- Mistral: Mistral AI's efficient models
- Phi-2: Microsoft's compact but capable models
- Any GGUF format model: Compatible with the llama-cpp-python library
The Model Manager page provides a user-friendly interface for:
- Downloading pre-configured models directly within the application
- Customizing model parameters like context length and temperature
- Testing models before deploying them in the chat interface
- Offline Operation: Process all requests entirely on your own hardware
- Bypass Privacy Scanning: Option to disable privacy scanning for local models (since data never leaves your system)
- Hardware Acceleration: GPU acceleration support for faster inference
This application uses several database migration scripts to handle schema updates:
- migration_add_dlp_columns.py: Adds Microsoft DLP integration columns to the Settings table
- migration_add_local_llm_columns.py: Adds local LLM configuration columns to the Settings table
- migration_pattern_levels.py: Adds 'level' attribute to custom patterns in Settings table, enabling categorization of patterns into standard and strict modes
If you encounter database-related errors, especially with missing columns, make sure to run these migration scripts:
python migration_add_dlp_columns.py
python migration_add_local_llm_columns.py
python migration_pattern_levels.py
The application includes auto-migration checks that attempt to detect and apply necessary migrations when features are accessed.
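The idea behind such an auto-migration check can be sketched as "look for the column, add it if absent". The example below uses the standard library's `sqlite3` only so that it runs anywhere; the project targets PostgreSQL, where the column lookup would typically query `information_schema.columns` instead. The `ensure_column` helper is hypothetical and is not taken from the migration scripts.

```python
import sqlite3


def ensure_column(conn, table, column, ddl_type):
    """Add `column` to `table` if it is missing. Returns True when a column was added.

    Identifiers are interpolated directly, so pass only trusted, hard-coded names.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (id INTEGER PRIMARY KEY)")
print(ensure_column(conn, "settings", "enable_ms_dlp", "BOOLEAN DEFAULT 0"))  # True
print(ensure_column(conn, "settings", "enable_ms_dlp", "BOOLEAN DEFAULT 0"))  # False
```

Making the check idempotent, as here, means it is safe to run on every startup or every time a feature is first accessed.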
- Missing Database Columns Error:
  - Error: column settings.enable_ms_dlp does not exist or similar
  - Solution: Run the appropriate migration script as mentioned above
- Detached Instance Errors:
  - Error: Instance <User at 0x...> is not bound to a Session
  - Solution: The application uses the session_scope context manager to properly handle database sessions. Check that all database operations are performed within a session_scope block.
- Conversation Display Issues:
  - Problem: Conversations not displaying correctly or errors when accessing message properties
  - Solution: The application uses a robust message formatting function that handles both model objects and dictionaries. Make sure to use the format_conversation_messages function when working with conversation data.
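For the detached-instance item above, the usual shape of a `session_scope` context manager is commit on success, roll back on error, and always close. The sketch below is an assumption about that pattern, not the project's actual implementation; `FakeSession` is a stand-in used only to keep the example runnable without a database, and the session factory parameter is hypothetical.

```python
from contextlib import contextmanager


class FakeSession:
    """Stand-in that records calls; a real ORM session exposes the same three methods."""

    def __init__(self):
        self.events = []

    def commit(self):
        self.events.append("commit")

    def rollback(self):
        self.events.append("rollback")

    def close(self):
        self.events.append("close")


@contextmanager
def session_scope(session_factory):
    """Provide a transactional scope around a series of operations."""
    session = session_factory()
    try:
        yield session
        session.commit()      # reached only if the block raised nothing
    except Exception:
        session.rollback()    # undo partial work, then re-raise for the caller
        raise
    finally:
        session.close()       # always release the connection


ok = FakeSession()
with session_scope(lambda: ok):
    pass
print(ok.events)  # ['commit', 'close']

failed = FakeSession()
try:
    with session_scope(lambda: failed):
        raise ValueError("simulated failure")
except ValueError:
    pass
print(failed.events)  # ['rollback', 'close']
```

Objects loaded inside the block generally should not be used after it exits; copying the needed attributes into plain values while the session is still open is one common way to avoid the "not bound to a Session" error.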
The following features and improvements are planned for future versions of PrivacyChatBoX:
- Database Encryption: Currently, the database stores data in plaintext. Future versions will implement end-to-end encryption for sensitive data in the database to enhance security.
- Chat Interface Optimization: Improve the chat interface performance and responsiveness, including faster message rendering and reduced latency for long conversations.
- Multi-modal AI Support: Expand capabilities to handle image, audio, and video inputs/outputs with privacy-preserving processing for all media types.
- Advanced RAG Implementation: Add Retrieval-Augmented Generation capabilities with local vector databases for organizational knowledge bases, with privacy-aware embedding generation.
- Federated Learning Integration: Implement privacy-preserving model fine-tuning using federated learning techniques to improve AI responses without compromising user data.
- Enterprise SSO Integration: Expand authentication options to support additional enterprise SSO providers beyond Azure AD, including Okta, Auth0, and Google Workspace.
- Compliance Reporting: Implement automated compliance reporting for privacy regulations like GDPR, HIPAA, and CCPA with audit trails for all AI interactions.
Here are some screenshots showing the various interfaces of the PrivacyChatBoX application:
- The welcome screen with login form and key features overview
- The chat interface showing privacy protection in action with an anonymized phone number
- AI model configuration settings with API key management
- Custom regex pattern configuration for privacy scanning
- Conversation history with privacy alert indicators
This project is licensed under a dual-license model:
Non-commercial use:
- Free for personal, educational, research, and non-profit use
- Permission to use, modify, and redistribute for non-commercial purposes
- Must include original copyright notice and license terms
- Contributions back to the project are welcome under the same license terms
Commercial use:
- Requires a paid commercial license for any business or revenue-generating applications
- Contact the copyright holders for commercial licensing options and pricing
- Enterprise support and customization available under commercial agreements
The full license terms are available in the LICENSE.md file, which includes all conditions, disclaimers, and limitations. This dual-licensing approach helps maintain the project's sustainable development while providing free access for non-commercial users.
DISCLAIMER: This software is provided "AS IS" without warranty of any kind. Use at your own risk. The developers assume no liability for any damages or losses resulting from the use of this software.
