# UnleashedLLM: Democratizing Access to Uncensored AI Language Models
A sophisticated command-line interface tool for downloading, managing, and interacting with powerful AI models completely offline.
## Table of Contents

- Overview
- Key Features
- Supported Models
- System Requirements
- Installation
- Usage
- Architecture
- Configuration
- API Reference
- Performance Optimization
- Troubleshooting
- Development
- Security & Privacy
- Contributing
- Changelog
- License
- Support
## Overview

UnleashedLLM is a comprehensive Python application that provides seamless access to uncensored AI language models through an intuitive command-line interface. The tool enables users to download, manage, and interact with various AI models while ensuring complete privacy and offline functionality after initial setup.
Mission: To democratize access to powerful AI language models by providing a user-friendly, privacy-focused platform that operates entirely offline without compromising functionality or performance.
## Key Features

- Advanced Model Management: Download, organize, and manage AI models from a curated registry
- Interactive Chat Interface: Engage in real-time conversations with context-aware AI models
- Single Inference Engine: Perform one-off text generation tasks with optimized parameters
- Complete Offline Operation: Full functionality without internet dependency post-installation
- Paginated Model Browser: Browse available models with detailed specifications and filtering
- Real-time Progress Tracking: Download progress with speed indicators and ETA calculations
- Comprehensive System Diagnostics: Automated system requirement validation and performance analysis
- Status Dashboard: Monitor storage usage, model availability, and system health
- Conversation History Management: Maintain context across extended chat sessions
- Thread-safe Operations: Concurrent model operations with robust error handling
- GGUF Format Support: Optimized for CPU inference with quantized models (see the sketch after this list)
- Dynamic Memory Management: Adaptive memory allocation based on model requirements
- Configurable Threading: Optimized CPU utilization with thread limiting
- Automatic Dependency Resolution: Seamless installation of required packages
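As a minimal sketch of what GGUF loading with thread limiting and memory mapping looks like through llama-cpp-python (the model file name is an example taken from the project layout below; the parameter values are illustrative, not the application's exact defaults):

```python
from llama_cpp import Llama

# Illustrative only: load a quantized GGUF model for CPU inference.
llm = Llama(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",
    n_threads=4,      # configurable thread limiting
    use_mmap=True,    # memory-mapped access keeps resident memory low
)

print(llm("Hello!", max_tokens=32)["choices"][0]["text"])
```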
## Supported Models

Model categories:

| Category | Size Range | Memory Requirement | Performance Level |
|---|---|---|---|
| Lightweight | 1-3GB | 4GB RAM | Fast inference, basic capabilities |
| Medium | 4-8GB | 8GB RAM | Balanced performance and quality |
| Large | 7-15GB | 16GB RAM | High quality, advanced reasoning |
| Extra Large | 20GB+ | 32GB RAM | Maximum capability, research-grade |
Available models:

| Model | Size | Context Length | Capabilities | Specialization |
|---|---|---|---|---|
| Phi-2 Uncensored | 1.6GB | 2,048 tokens | Text generation, Chat | Lightweight conversations |
| Llama2 7B Uncensored | 4.1GB | 4,096 tokens | Text generation, Chat, Coding | General purpose |
| OpenChat 7B | 4.2GB | 8,192 tokens | Text generation, Chat, Reasoning | Conversational AI |
| CodeLlama 13B | 7.3GB | 16,384 tokens | Coding, Text generation | Code generation |
| Mixtral 8x7B | 26.9GB | 32,768 tokens | Text generation, Chat, Reasoning | Mixture of experts |
| Llama2 70B Uncensored | 39.5GB | 4,096 tokens | Advanced reasoning, Complex tasks | Research applications |
## System Requirements

Minimum:

- Operating System: Linux (Nix-based environment)
- Python Version: 3.11 or higher
- Memory: 8GB RAM
- Storage: 10GB available space (plus model storage)
- CPU: Multi-core processor (4+ cores recommended)
Recommended:

- Memory: 16GB+ RAM for optimal performance
- Storage: SSD with 100GB+ available space
- CPU: 8+ core processor with high clock speed
- Network: Stable internet connection for initial model downloads
Approximate inference throughput:

- Lightweight Models: 2-5 tokens/second on a 4-core CPU
- Medium Models: 1-3 tokens/second on an 8-core CPU
- Large Models: 0.5-1.5 tokens/second on a high-end CPU
## Installation

Quick start:

```bash
# Clone the repository
git clone <repository-url>
cd UnleashedLLM

# Run the application (dependencies auto-install)
python3 main.py
```

Manual dependency installation:

```bash
# Install core dependencies
pip install llama-cpp-python

# Verify installation
python3 -c "import llama_cpp; print('Installation successful')"
```

Optional environment configuration:

```bash
# Set environment variables (optional)
export LLAMA_CPP_LOG_LEVEL=ERROR
export OMP_NUM_THREADS=8
```

## Usage

Launch the main application:
```bash
python3 main.py
```

The application provides an intuitive menu system:
- Browse Model Library - Explore available models with detailed specifications
- Download Model - Select and download models with progress tracking
- Interactive Chat - Start conversational sessions with downloaded models
- Single Inference - Perform individual text generation tasks
- Manage Models - View, organize, and delete downloaded models
- System Diagnostics - Check system compatibility and performance metrics
- Status Dashboard - Monitor application status and resource usage
- Exit - Safely terminate the application
Command-line options:

```bash
# Download multiple models in sequence
python3 main.py --batch-download llama2-7b-uncensored,openchat-7b

# Override default parameters
python3 main.py --max-threads 4 --context-length 2048

# Start chat with specific model
python3 main.py --chat --model llama2-7b-uncensored
```

## Architecture

Model Registry:

- Centralized model metadata management
- Version control and compatibility tracking
- Dynamic model discovery and registration
Model Manager:

- File system operations and storage management
- Download resumption and integrity verification
- Model lifecycle management
Chat Interface:

- Context-aware conversation handling
- Session persistence and history management
- Response formatting and sanitization
Inference Engine:

- Direct integration with llama-cpp-python
- Memory optimization and thread management
- Performance monitoring and adjustment
System Diagnostics:

- Real-time system monitoring
- Performance benchmarking
- Resource usage tracking
Design patterns used:

- Singleton Pattern: Ensures a single instance of critical managers (sketched below)
- Factory Pattern: Dynamic model instantiation based on configuration
- Observer Pattern: Event-driven status updates and notifications
- Strategy Pattern: Pluggable inference backends and optimization strategies
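For instance, the Singleton pattern for critical managers might look like this hedged sketch (the class name is illustrative, not the application's actual code):

```python
class ModelManagerSingleton:
    """Illustrative sketch: ensure a single shared manager instance."""
    _instance = None

    def __new__(cls):
        # Create the instance on first construction, then reuse it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

# Every construction returns the same object.
assert ModelManagerSingleton() is ModelManagerSingleton()
```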
Data flow:

```
User Input → CLI Parser → Model Manager → Inference Engine → Response Formatter → User Output
     ↓            ↓              ↓                 ↓                    ↑
System Check → Model Registry → Model Loading → Context Management → Result Processing
```
## Configuration

Default generation settings:

```python
DEFAULT_CONFIG = {
"max_threads": 8,
"context_length": "auto", # Dynamically set per model
"batch_size": 512,
"temperature": 0.8,
"top_p": 0.95,
"top_k": 40,
"repeat_penalty": 1.1,
"max_tokens": 400
}
```

Environment variables:

| Variable | Description | Default |
|---|---|---|
| `LLAMA_CPP_LOG_LEVEL` | Logging verbosity | `ERROR` |
| `OMP_NUM_THREADS` | OpenMP thread count | `8` |
| `MODEL_CACHE_DIR` | Model storage directory | `./models` |
| `MAX_CONTEXT_LENGTH` | Maximum context tokens | `32768` |
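As an illustration, here is a hedged sketch of how these defaults and environment variables might be applied when loading a model with llama-cpp-python (the wiring is an assumption, not the application's actual code; the model file name is an example from the layout below):

```python
import os
from llama_cpp import Llama

model_path = os.path.join(
    os.environ.get("MODEL_CACHE_DIR", "./models"),
    "llama-2-7b-chat.Q4_K_M.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=4096,  # "auto" in DEFAULT_CONFIG resolves to a per-model value
    n_threads=int(os.environ.get("OMP_NUM_THREADS", "8")),
    n_batch=512,
)

output = llm(
    "Explain quantization in one sentence.",
    max_tokens=400,
    temperature=0.8,
    top_p=0.95,
    top_k=40,
    repeat_penalty=1.1,
)
print(output["choices"][0]["text"])
```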
Project layout:

```
UnleashedLLM/
├── main.py # Main application entry point
├── models/ # Downloaded model storage
│ ├── llama-2-7b-chat.Q4_K_M.gguf
│ ├── openchat-3.5-0106.Q4_K_M.gguf
│ └── ...
├── cache/ # Runtime cache and temporary files
├── logs/ # Application logs and diagnostics
├── config/ # User configuration files
└── README.md               # Documentation
```
## API Reference

```python
class ModelRegistry:
    @staticmethod
    def get_model_info(model_id: str) -> Dict

    @staticmethod
    def list_models_by_category(category: str) -> List[Dict]

    @staticmethod
    def validate_model(model_data: Dict) -> bool


class ModelManager:
    def download_model(self, model_id: str, progress_callback=None) -> bool
    def list_downloaded_models(self) -> List[str]
    def delete_model(self, model_id: str) -> bool
    def get_model_path(self, model_id: str) -> Path


class ChatInterface:
    def start_chat(self, model_id: str) -> None
    def send_message(self, message: str) -> str
    def end_chat(self) -> None
    def get_chat_history(self) -> List[Dict]
```
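A hedged usage sketch combining these interfaces (import paths follow the src/ layout shown under Development; behavior and the progress callback signature are inferred from the signatures above, so treat this as illustrative):

```python
from core.model_manager import ModelManager
from core.chat_interface import ChatInterface

manager = ModelManager()

# Download the model if it is not already present locally.
if "llama2-7b-uncensored" not in manager.list_downloaded_models():
    # progress_callback signature is assumed here for illustration.
    ok = manager.download_model(
        "llama2-7b-uncensored",
        progress_callback=lambda pct: print(f"Downloading: {pct:.0f}%"),
    )
    if not ok:
        raise RuntimeError("Download failed")

# Run one chat exchange.
chat = ChatInterface()
chat.start_chat("llama2-7b-uncensored")
print(chat.send_message("Summarize the GGUF format in two sentences."))
chat.end_chat()
```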
## Performance Optimization

CPU optimization:

- Thread count limited to prevent CPU overload
- Automatic CPU core detection and utilization
- Dynamic batch size adjustment based on system performance
Memory management:

- Lazy model loading to minimize memory footprint
- Automatic garbage collection for conversation history
- Memory-mapped file access for large models
Storage and I/O:

- Streaming downloads with resume capability (sketched below)
- Asynchronous file operations
- Compressed model storage when possible
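The resume capability can be pictured with this minimal sketch using HTTP Range requests (it assumes the `requests` library and a hypothetical `download_with_resume` helper; the application's actual downloader may differ):

```python
import os
import requests

def download_with_resume(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Hypothetical helper: resume a partial download via an HTTP Range header."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    with requests.get(url, headers=headers, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        # Append when the server honors the range (206); otherwise restart.
        mode = "ab" if offset and resp.status_code == 206 else "wb"
        with open(dest, mode) as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
```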
Typical resource usage and throughput by model size:

| Model Size | RAM Usage | CPU Utilization | Tokens/Second |
|---|---|---|---|
| 1.6GB | 2.1GB | 45% | 4.2 |
| 4.1GB | 5.8GB | 65% | 2.8 |
| 7.3GB | 9.2GB | 78% | 1.9 |
| 26.9GB | 31.4GB | 85% | 0.8 |
## Troubleshooting

Issue: llama-cpp-python compilation fails
```bash
# Solution: Install build dependencies
sudo apt update
sudo apt install build-essential cmake
pip install --upgrade pip setuptools wheel
```

Issue: Out of memory when loading large models
- Reduce context length in configuration
- Close other memory-intensive applications
- Consider using smaller model variants
Issue: Slow inference speed
- Adjust thread count based on CPU cores
- Ensure sufficient RAM is available
- Check for background processes consuming resources
Issue: Network timeouts during model download
- Check internet connection stability
- Verify sufficient disk space
- Resume download using the same model selection
Diagnostic commands:

```bash
# Check system resources
python3 main.py --diagnostics

# Verify model integrity
python3 main.py --verify-models

# Performance benchmark
python3 main.py --benchmark
```

Application logs are stored in the logs/ directory:

- `application.log`: General application events
- `performance.log`: Performance metrics and benchmarks
- `errors.log`: Error messages and stack traces
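For illustration, one hedged way such per-file logging could be set up with the standard library (the handler setup is an assumption, not the application's actual code; only the file names come from the list above):

```python
import logging
from pathlib import Path

LOG_DIR = Path("logs")
LOG_DIR.mkdir(exist_ok=True)

def make_logger(name: str, filename: str, level: int = logging.INFO) -> logging.Logger:
    """Illustrative: route a named logger to one of the files listed above."""
    logger = logging.getLogger(name)
    handler = logging.FileHandler(LOG_DIR / filename)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger

app_log = make_logger("app", "application.log")
err_log = make_logger("errors", "errors.log", level=logging.ERROR)
app_log.info("Application started")
```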
## Development

Source layout:

```
src/
├── core/
│ ├── model_registry.py
│ ├── model_manager.py
│ ├── chat_interface.py
│ └── diagnostics.py
├── utils/
│ ├── file_operations.py
│ ├── network_utils.py
│ └── formatting.py
├── config/
│ ├── default_config.py
│ └── model_definitions.py
└── tests/
    ├── unit/
    ├── integration/
    └── performance/
```
Development environment setup:

```bash
# Create development environment
python3 -m venv dev-env
source dev-env/bin/activate

# Install development dependencies
pip install pytest black flake8 mypy

# Run tests
pytest tests/

# Code formatting
black src/
flake8 src/
```

Development guidelines:

- Code Style: Follow PEP 8 guidelines with Black formatting
- Testing: Maintain >90% test coverage for new features
- Documentation: Update README and inline documentation
- Performance: Benchmark new features for regression testing
- Security: Review code for potential vulnerabilities
Custom inference engines can be plugged in by implementing the following interface:

```python
class CustomInferenceEngine:
    def load_model(self, model_path: str) -> bool
    def generate(self, prompt: str, **kwargs) -> str
    def unload_model(self) -> None
```
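For example, a hedged sketch of an engine backed by llama-cpp-python (a minimal illustration under the interface above, not the project's built-in engine):

```python
from typing import Optional
from llama_cpp import Llama

class LlamaCppEngine:
    """Illustrative CustomInferenceEngine implementation."""

    def __init__(self) -> None:
        self._llm: Optional[Llama] = None

    def load_model(self, model_path: str) -> bool:
        try:
            self._llm = Llama(model_path=model_path, n_ctx=4096, n_threads=8)
            return True
        except (ValueError, OSError):
            return False

    def generate(self, prompt: str, **kwargs) -> str:
        if self._llm is None:
            raise RuntimeError("No model loaded")
        result = self._llm(prompt, max_tokens=kwargs.get("max_tokens", 400))
        return result["choices"][0]["text"]

    def unload_model(self) -> None:
        self._llm = None  # drop the reference so memory can be reclaimed
```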
Plugins implement the plugin lifecycle interface:

```python
class UnleashedLLMPlugin:
    def initialize(self, app_context: Dict) -> None
    def process_input(self, user_input: str) -> str
    def cleanup(self) -> None
```

## Security & Privacy

Privacy guarantees:

- Complete Offline Operation: No data transmission after model download
- Local Storage: All models and conversations stored locally
- No Telemetry: Zero usage data collection or external communications
- Secure Model Verification: Checksum validation for downloaded files (see the sketch below)
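Checksum validation can be pictured with this small sketch (`verify_checksum` is a helper written for illustration; the application's actual verification code may differ):

```python
import hashlib

def verify_checksum(path: str, expected_sha256: str) -> bool:
    """Illustrative: compare a file's SHA-256 digest to a known value."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB blocks to avoid loading multi-GB models into memory.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest() == expected_sha256
```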
Security measures:

- Input Sanitization: All user inputs are sanitized before processing
- File System Isolation: Models stored in dedicated directories with restricted permissions
- Memory Protection: Sensitive data cleared from memory after use
- Audit Trail: Comprehensive logging of all system operations
Best practices:

- Regularly update the application for security patches
- Monitor system resources for unusual activity
- Validate model sources before downloading
- Use appropriate file permissions for model storage
## Contributing

1. Fork the Repository

   ```bash
   git clone <your-fork-url>
   cd UnleashedLLM
   ```

2. Create a Feature Branch

   ```bash
   git checkout -b feature/your-feature-name
   ```

3. Make Changes

   - Follow coding standards
   - Add comprehensive tests
   - Update documentation

4. Submit a Pull Request

   - Provide a clear description of changes
   - Reference related issues
   - Ensure CI passes
Development workflow:

- Issues: Use GitHub issues for bug reports and feature requests
- Code Review: All changes require peer review
- Testing: Automated testing with CI/CD pipeline
- Documentation: Keep documentation in sync with code changes
## Changelog

v1.1.0

- Enhanced model registry with 30+ models
- Improved download resumption capabilities
- Added system diagnostics and performance monitoring
- Optimized memory management for large models
- Enhanced chat interface with better context handling
Initial release

- Basic model download and chat capabilities
- Command-line interface implementation
- Support for the GGUF model format
## License

This project is developed for educational and research purposes. Users are responsible for compliance with applicable AI model licenses and terms of use.
Third-party components:

- llama-cpp-python: MIT License
- Model Files: Various licenses (see individual model documentation)
## Support

- Documentation: Comprehensive README and inline documentation
- Issues: GitHub Issues for bug reports and feature requests
- Community: Discussion forums and community support
- Developer: 0x0806
- Project Repository: [GitHub Repository URL]
- Issue Tracker: [GitHub Issues URL]
UnleashedLLM v1.1.0 | Python 3.11+ | Linux/Nix Environment
Last Updated: 2025