PrivacyChatBoX

IMPORTANT NOTICE: This project is still a work in progress. Some features may be incomplete or subject to change.

Overview

PrivacyChatBoX is a comprehensive Python-based AI privacy protection platform that provides an intuitive and engaging approach to safeguarding sensitive information across multiple document types and providers.

This application offers a privacy-focused environment for AI interactions with multiple model integrations while ensuring user data remains secure.

Key Features

Multi-Provider AI Integration: Seamlessly switch between OpenAI, Anthropic Claude, Google Gemini, and local LLM models
Privacy Scanning: Automatically scans text for sensitive information before sending to AI models
Document Anonymization: Detects and anonymizes sensitive information in documents
Privacy Alerts: Visual indicators (⚠️) for conversations with detected sensitive information
Role-Based Access Control: Admin and regular user roles with appropriate permissions
Privacy-Focused Administration: Admins can see metadata and privacy alerts but not conversation content
Microsoft DLP Integration: Blocks sensitive files based on Microsoft Sensitivity labels
Conversation Management: Save, export, and manage conversation history
Azure AD Authentication: Enterprise-ready authentication with Microsoft identities
Admin Dashboard: User management, system metrics, and configuration
Analytics: Comprehensive privacy metrics and visualization
PDF Export: Export conversations to well-formatted PDF documents
Web Search: Integrated web search capabilities through SerpAPI
Docker Deployment: Containerized deployment for easy installation and scaling

Security Notice: In the current version, the database is not encrypted. While administrators cannot access conversation content through the user interface, the data is stored in plaintext in the database. Database encryption is planned for future versions to enhance security (see Future Development section).

Architecture

The application is built using the following technologies:

Frontend & Backend: Streamlit (Python web application framework)
Database: PostgreSQL
AI Providers: OpenAI API, Anthropic Claude API, Google Gemini API
Authentication: Local authentication with password hashing, Azure AD integration
Privacy Analysis: Custom regex patterns and Microsoft DLP integration

Folder Structure

PrivacyChatBoX/
├── app.py                  # Main application entry point
├── pages/                  # Streamlit pages
│   ├── admin.py            # Admin dashboard
│   ├── chat.py             # Main chat interface
│   ├── history.py          # Conversation history and analytics
│   ├── model_manager.py    # Local LLM model management (admin only)
│   └── settings.py         # User settings
├── models.py               # Database models
├── database.py             # Database connection utilities
├── database_check.py       # Database schema validation
├── ai_providers.py         # AI provider integration
├── privacy_scanner.py      # Privacy scanning functionality
├── ms_dlp.py               # Microsoft DLP integration
├── auth.py                 # Authentication utilities
├── azure_auth.py           # Azure AD authentication
├── utils.py                # General utilities
├── utils_auth.py           # Authentication utilities
├── pdf_export.py           # PDF export functionality
├── shared_sidebar.py       # Shared UI components (with role-based visibility)
├── style.py                # Custom CSS styling
├── assets/                 # Static assets
│   ├── logo.png            # Application logo
│   └── ...                 # Other assets
├── .env                    # Environment variables (not in repo)
├── .streamlit/             # Streamlit configuration
│   └── config.toml         # Streamlit configuration file
├── requirements.txt        # Python dependencies
├── pyproject.toml          # Project metadata
├── migration_add_dlp_columns.py      # Microsoft DLP integration migration
├── migration_add_local_llm_columns.py # Local LLM settings migration
├── migration_pattern_levels.py       # Privacy pattern levels migration
├── model_utils.py          # Local LLM model utilities
├── test_local_llm.py       # Testing script for local LLM integration
├── setup.sh                # Automated installation script
├── models/                 # Directory for local LLM models
└── docs/                   # Documentation
    ├── Modules.md          # Module documentation
    ├── Database.md         # Database documentation
    ├── Setup_Guide.md      # Detailed setup instructions
    ├── Troubleshooting.md  # Solutions for common issues
    ├── ConversationData.md # Conversation data formatting guide
    ├── LocalLLM.md         # Local LLM integration guide
    └── Optimization_Guide.md # Performance optimization techniques

Setup and Installation

Prerequisites

Python 3.11 or higher
PostgreSQL database (recommended)
API keys for desired AI providers (OpenAI, Claude, Gemini) - optional

Automated Setup (Recommended)

Clone the repository:

git clone https://github.com/yourusername/PrivacyChatBoX.git
cd PrivacyChatBoX

Run the setup script:
```
./setup.sh
```
This script will:
- Create a Python virtual environment
- Install required dependencies
- Set up the PostgreSQL database
- Create necessary directories
- Configure environment variables
- Run database migrations

Start the application:

source venv/bin/activate
streamlit run app.py

Access the application at http://localhost:5000

For detailed setup instructions, troubleshooting, and manual setup options, see Setup Guide.

Manual Installation

If you prefer a manual setup:

Clone the repository:

git clone https://github.com/yourusername/PrivacyChatBoX.git
cd PrivacyChatBoX

Install dependencies:
```
pip install .
```

Configure environment variables by creating a .env file:

# Database Configuration
DATABASE_URL=postgresql://username:password@localhost/privacychatbox

# OpenAI API (Optional)
OPENAI_API_KEY=your_openai_api_key

# Anthropic API (Optional)
ANTHROPIC_API_KEY=your_anthropic_api_key

# Google Gemini API (Optional)
GOOGLE_API_KEY=your_gemini_api_key

# SerpAPI for web search (Optional)
SERPAPI_KEY=your_serpapi_key

# Azure AD Authentication (Optional)
AZURE_CLIENT_ID=your_azure_client_id
AZURE_CLIENT_SECRET=your_azure_client_secret
AZURE_TENANT_ID=your_azure_tenant_id
AZURE_REDIRECT_URI=http://localhost:5000/

# Microsoft DLP Integration (Optional)
MS_CLIENT_ID=your_ms_client_id
MS_CLIENT_SECRET=your_ms_client_secret
MS_TENANT_ID=your_ms_tenant_id
MS_DLP_ENDPOINT_ID=your_ms_dlp_endpoint_id

Create Streamlit configuration:

mkdir -p .streamlit
echo "[server]" > .streamlit/config.toml
echo "headless = true" >> .streamlit/config.toml
echo "address = \"0.0.0.0\"" >> .streamlit/config.toml
echo "port = 5000" >> .streamlit/config.toml

Run database migrations:
```
python database_check.py
```
Start the application:
```
streamlit run app.py
```
Access the application at http://localhost:5000

Docker Deployment

The application can be easily deployed using Docker and Docker Compose:

Quick Docker Setup

Clone the repository:

git clone https://github.com/yourusername/PrivacyChatBoX.git
cd PrivacyChatBoX

Start the application:
```
docker-compose up -d
```
Access the application at http://localhost:5000

For a complete step-by-step guide on Docker deployment, configuration, and troubleshooting, see:

Docker Setup Guide - Comprehensive instructions for Docker deployment
Docker Guide - Detailed reference and advanced configurations

Initial Setup

Important: On first run, the application automatically creates an admin user:

Username: admin
Password: admin

This default admin account is created by the init_auth() function in auth.py, which runs during application startup. The admin user has full access to all features, including user management.

⚠️ Security Warning: It's strongly recommended to change this password immediately after your first login by going to the Admin panel > User Management.

User Roles and Permissions

The application supports two types of user roles with different permissions:

Regular Users

Can access Chat, History, and Settings pages
Can only see and manage their own conversations
Can configure their own AI provider settings and privacy preferences
Cannot access Admin Panel or Model Manager pages

Administrators

Have full access to all pages including Admin Panel and Model Manager
Can manage users (create, delete, change roles and passwords)
Can view metadata about all users' conversations (titles, timestamps, file attachments, privacy alerts)
Can see which conversations contain privacy alerts (⚠️ indicator)
Cannot view the actual content of other users' conversations, only metadata
Can manage system-wide settings and local LLM models

Security Note: At the current stage, the database is not encrypted. While administrators cannot access conversation content through the user interface, the data is stored in plaintext in the database. Future versions will implement database encryption for enhanced security.

Usage

Login: Use the login form or Azure AD login if configured
Chat: Navigate to the chat page to start conversations with AI
Settings: Configure your AI providers, privacy settings, and more
History: View your conversation history and analytics
Admin: Manage users and view system metrics (admin only)
Model Manager: Download and configure local LLM models (admin only)

Environment Variables and Security

This application uses environment variables to store sensitive configuration like API keys and database credentials. This approach enhances security by keeping secrets out of your code repository.

Environment Setup

Create your .env file:
- Copy the provided .env.example file:
```
cp .env.example .env
```
- Edit the .env file with your actual credentials
Security measures in place:
- The .gitignore file is configured to exclude the .env file from git
- Environment variables are loaded at runtime and never stored in the database
- API keys are accessed only when needed for specific operations
Available variables:
- Database connection: DATABASE_URL
- API keys: OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, SERPAPI_KEY
- Azure authentication: AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, etc.
- Microsoft DLP: MS_CLIENT_ID, MS_CLIENT_SECRET, etc.

You can also refer to the Settings page > Environment Config tab for a complete list of available environment variables and additional descriptions.

⚠️ Important Security Notes

NEVER commit your .env file to version control
Double-check your .gitignore file includes .env entries
Regularly rotate your API keys for production deployments
Use separate API keys for development and production environments

API Keys

This application requires various API keys for full functionality:

OpenAI API Key: Get from OpenAI Platform
Claude API Key: Get from Anthropic Console
Gemini API Key: Get from Google AI Studio
SerpAPI Key: Get from SerpAPI

Performance Optimizations

PrivacyChatBoX includes several performance optimizations to improve scanning speed and memory efficiency:

Pre-compiled Regex Patterns: All privacy patterns are pre-compiled at startup for faster matching
Confidence Scoring: Each pattern has a confidence score to reduce false positives
Chunked File Processing: Large files are processed in manageable chunks to prevent memory issues
Parallel Processing: Multi-threading support for faster document scanning
Adaptive Loading: Document processing libraries are loaded conditionally as needed

For detailed information on performance optimizations, see Optimization Guide.

Local LLM Integration

PrivacyChatBoX supports running local language models without requiring an internet connection or API keys, which enhances privacy and reduces operational costs.

Local Model Support

The application supports GGUF format models through the llama-cpp-python library. Popular models include:

Llama 2: Meta's Llama 2 models in various sizes
Mistral: Mistral AI's efficient models
Phi-2: Microsoft's compact but capable models
Any GGUF format model: Compatible with the llama-cpp-python library

Model Manager

The Model Manager page provides a user-friendly interface for:

Downloading pre-configured models directly within the application
Customizing model parameters like context length and temperature
Testing models before deploying them in the chat interface

Privacy Features with Local Models

Offline Operation: Process all requests entirely on your own hardware
Bypass Privacy Scanning: Option to disable privacy scanning for local models (since data never leaves your system)
Hardware Acceleration: GPU acceleration support for faster inference

Troubleshooting

Database Migrations

This application uses several database migration scripts to handle schema updates:

migration_add_dlp_columns.py: Adds Microsoft DLP integration columns to the Settings table
migration_add_local_llm_columns.py: Adds local LLM configuration columns to the Settings table
migration_pattern_levels.py: Adds 'level' attribute to custom patterns in Settings table, enabling categorization of patterns into standard and strict modes

If you encounter database-related errors, especially with missing columns, make sure to run these migration scripts:

python migration_add_dlp_columns.py
python migration_add_local_llm_columns.py
python migration_pattern_levels.py

The application includes auto-migration checks that will attempt to detect and apply necessary migrations when features are accessed.

Common Issues and Solutions

Missing Database Columns Error:
- Error: column settings.enable_ms_dlp does not exist or similar
- Solution: Run the appropriate migration script as mentioned above
Detached Instance Errors:
- Error: Instance <User at 0x...> is not bound to a Session
- Solution: The application uses the session_scope context manager to properly handle database sessions. Check that all database operations are performed within a session_scope block.
Conversation Display Issues:
- Problem: Conversations not displaying correctly or errors when accessing message properties
- Solution: The application uses a robust message formatting function that handles both model objects and dictionaries. Make sure to use the format_conversation_messages function when working with conversation data.

Future Development

The following features and improvements are planned for future versions of PrivacyChatBoX:

Database Encryption: Currently, the database stores data in plaintext. Future versions will implement end-to-end encryption for sensitive data in the database to enhance security.
Chat Interface Optimization: Improve the chat interface performance and responsiveness, including faster message rendering and reduced latency for long conversations.
Multi-modal AI Support: Expand capabilities to handle image, audio, and video inputs/outputs with privacy-preserving processing for all media types.
Advanced RAG Implementation: Add Retrieval-Augmented Generation capabilities with local vector databases for organizational knowledge bases, with privacy-aware embedding generation.
Federated Learning Integration: Implement privacy-preserving model fine-tuning using federated learning techniques to improve AI responses without compromising user data.
Enterprise SSO Integration: Expand authentication options to support additional enterprise SSO providers beyond Azure AD, including Okta, Auth0, and Google Workspace.
Compliance Reporting: Implement automated compliance reporting for privacy regulations like GDPR, HIPAA, and CCPA with audit trails for all AI interactions.

Application Screenshots

Here are some screenshots showing the various interfaces of the PrivacyChatBoX application:

Login and Main Screen

The welcome screen with login form and key features overview

Chat Interface with Anonymization

The chat interface showing privacy protection in action with anonymized phone number

Settings Panel

AI model configuration settings with API key management

Custom Privacy Patterns

Custom regex pattern configuration for privacy scanning

Conversation History with Privacy Alerts

Conversation history with privacy alert indicators

License

This project is licensed under a dual-license model:

Non-Commercial Use

Free for personal, educational, research, and non-profit use
Permission to use, modify, and redistribute for non-commercial purposes
Must include original copyright notice and license terms
Contributions back to the project welcome under same license terms

Commercial Use

Requires a paid commercial license for any business or revenue-generating applications
Contact the copyright holders for commercial licensing options and pricing
Enterprise support and customization available under commercial agreements

The full license terms are available in the LICENSE.md file, which includes all conditions, disclaimers, and limitations. This dual-licensing approach helps maintain the project's sustainable development while providing free access for non-commercial users.

DISCLAIMER: This software is provided "AS IS" without warranty of any kind. Use at your own risk. The developers assume no liability for any damages or losses resulting from the use of this software.

Name		Name	Last commit message	Last commit date
Latest commit History 99 Commits
.streamlit		.streamlit
assets		assets
attached_assets		attached_assets
docs		docs
models		models
pages		pages
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.replit		.replit
Dockerfile		Dockerfile
LICENSE.md		LICENSE.md
README.md		README.md
ai_providers.py		ai_providers.py
app.py		app.py
auth.py		auth.py
azure_auth.py		azure_auth.py
database.py		database.py
database_check.py		database_check.py
docker-compose.override.yml		docker-compose.override.yml
docker-compose.yml		docker-compose.yml
docker-entrypoint.sh		docker-entrypoint.sh
file_processor.py		file_processor.py
generated-icon.png		generated-icon.png
migration_add_dlp_columns.py		migration_add_dlp_columns.py
migration_add_local_llm_columns.py		migration_add_local_llm_columns.py
migration_pattern_levels.py		migration_pattern_levels.py
model_utils.py		model_utils.py
models.py		models.py
ms_dlp.py		ms_dlp.py
pdf_export.py		pdf_export.py
privacy_scanner.py		privacy_scanner.py
pyproject.toml		pyproject.toml
pyproject.toml.bak		pyproject.toml.bak
replit.nix		replit.nix
setup.sh		setup.sh
shared_sidebar.py		shared_sidebar.py
style.py		style.py
test_local_llm.py		test_local_llm.py
test_optimizations.py		test_optimizations.py
utils.py		utils.py
utils_auth.py		utils_auth.py
uv.lock		uv.lock

License

mbelsis/PrivacyChatBox

Folders and files

Latest commit

History

Repository files navigation

PrivacyChatBoX

Overview

Key Features

Architecture

Folder Structure

Setup and Installation

Prerequisites

Automated Setup (Recommended)

Manual Installation

Docker Deployment

Quick Docker Setup

Initial Setup

User Roles and Permissions

Regular Users

Administrators

Usage

Environment Variables and Security

Environment Setup

⚠️ Important Security Notes

API Keys

Performance Optimizations

Local LLM Integration

Local Model Support

Model Manager

Privacy Features with Local Models

Troubleshooting

Database Migrations

Common Issues and Solutions

Future Development

Application Screenshots

Login and Main Screen

Chat Interface with Anonymization

Settings Panel

Custom Privacy Patterns

Conversation History with Privacy Alerts

License

Non-Commercial Use

Commercial Use

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages