An all-purpose Docker container for data science, machine learning, and NLP work with large datasets.
- Python 3.11 with optimized libraries for large datasets
- ML/DL Frameworks: TensorFlow, PyTorch, scikit-learn
- NLP Tools: spaCy, NLTK, fuzzywuzzy
- Data Processing: pandas, numpy, with performance optimizations
- Jupyter Lab for interactive development
- Claude Code CLI for AI-assisted coding
- Memory optimized for datasets up to 7GB
- Stata support (requires license)
- Build the container:
  ./run.sh build
- Start Jupyter Lab:
  ./run.sh jupyter
  Then open http://localhost:8888 in your browser.
- Run Python scripts:
  ./run.sh run my_script.py
- Interactive Python/Bash:
  ./run.sh python  # Python shell
  ./run.sh bash    # Bash shell
  ./run.sh claude  # Claude Code CLI
- data/ - Mount point for your datasets
- notebooks/ - Jupyter notebooks
- code/ - Python scripts
The container is configured with:
- 16GB memory limit (adjustable in docker-compose.yml)
- 32GB swap limit
- 2GB shared memory
Adjust these in docker-compose.yml based on your system.
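To confirm from inside the container that the memory limit took effect, one option is to read the cgroup limit file. The sketch below is not part of this repository. It assumes the standard Linux cgroup paths, and the `2**60` cutoff for "unlimited" on cgroup v1 is a heuristic chosen for illustration. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup locations for the memory limit (v2 first, then v1).
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def parse_limit(raw: str) -> Optional[int]:
    """Convert the contents of a cgroup limit file to bytes, or None if unlimited."""
    raw = raw.strip()
    if raw == "max":  # cgroup v2 writes "max" when no limit is set
        return None
    value = int(raw)
    # cgroup v1 reports a very large sentinel when unlimited (heuristic cutoff)
    return None if value >= 2**60 else value


def container_memory_limit() -> Optional[int]:
    """Return the memory limit in bytes, or None if none is detected."""
    for path in CGROUP_LIMIT_FILES:
        try:
            return parse_limit(path.read_text())
        except (OSError, ValueError):
            continue  # file missing or unreadable: try the next location
    return None


limit = container_memory_limit()
if limit is None:
    print("No container memory limit detected")
else:
    print(f"Container memory limit: {limit / 1024**3:.1f} GiB")
```

Run it with `./run.sh run` after changing docker-compose.yml. Tools that read `/proc/meminfo` may report the host's memory rather than the container limit, so the cgroup value is usually the better check.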
Edit docker-compose.yml to mount your existing data directories:
volumes:
  - ~/path/to/your/datasets:/workspace/external_data:ro
Install additional packages:
./run.sh install package_name
Or add them to the Dockerfile and rebuild for permanent inclusion.
- For large datasets, use chunked reading:
  import pandas as pd
  for chunk in pd.read_csv('large_file.csv', chunksize=10000):
      process(chunk)  # process() is a placeholder for your own per-chunk logic
- Monitor memory usage:
  import psutil
  print(f"Memory usage: {psutil.virtual_memory().percent}%")
- Use appropriate data types to reduce memory:
  import pandas as pd
  df = pd.read_csv('file.csv', dtype={'id': 'int32', 'category': 'category'})
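The chunked-reading and dtype tips can be combined. The following is a minimal, self-contained sketch, not code from this repository: it writes a small sample CSV (the column names `id`, `category`, and `value` are made up for illustration), aggregates it chunk by chunk, and then compares the memory footprint of default and narrower dtypes.

```python
import os
import tempfile

import pandas as pd

# Write a small sample CSV so the sketch runs on its own.
sample = pd.DataFrame({
    "id": range(1000),
    "category": ["a", "b", "c", "d"] * 250,
    "value": [float(i % 10) for i in range(1000)],
})
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample.csv")
sample.to_csv(csv_path, index=False)


def sum_by_category(path, chunksize=100):
    """Aggregate one chunk at a time and combine the partial sums."""
    totals = None
    for chunk in pd.read_csv(path, chunksize=chunksize, dtype={"id": "int32"}):
        partial = chunk.groupby("category")["value"].sum()
        # fill_value=0 covers categories missing from one side
        totals = partial if totals is None else totals.add(partial, fill_value=0)
    return totals


totals = sum_by_category(csv_path)
print(totals)

# Compare memory use with default dtypes and with narrower ones.
default_df = pd.read_csv(csv_path)
compact_df = pd.read_csv(csv_path, dtype={"id": "int32", "category": "category"})
default_bytes = default_df.memory_usage(deep=True).sum()
compact_bytes = compact_df.memory_usage(deep=True).sum()
print(f"default dtypes: {default_bytes} bytes, compact dtypes: {compact_bytes} bytes")
```

Only one chunk is held in memory at a time, so peak usage depends on `chunksize` rather than on the file size. The `category` dtype is left out of the chunked read here because each chunk can see a different set of categories. Applying it after aggregation, or on a full read as in the comparison, avoids that mismatch.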
To add Stata support:
- Place your Stata installation files in this directory
- Uncomment the Stata installation lines in Dockerfile
- Rebuild the image
- Copy the environment template:
  cp .env.example .env
- Add your Anthropic API key to .env:
  ANTHROPIC_API_KEY=your_api_key_here
- Start Claude Code:
  ./run.sh claude
Note: Get your API key from https://console.anthropic.com/
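Before starting Claude Code, it can help to check that the key was actually filled in. The sketch below is an illustration, not part of `run.sh`. It assumes `.env` holds plain `KEY=VALUE` lines, and the helper names are hypothetical.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_configured(env_path=".env"):
    """Return True if ANTHROPIC_API_KEY is set to something other than the placeholder."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key and Path(env_path).is_file():
        key = read_env_file(env_path).get("ANTHROPIC_API_KEY")
    return bool(key) and key != "your_api_key_here"


print("API key configured:", api_key_configured())
```

The check only confirms that a non-placeholder value is present. It does not validate the key against the Anthropic API.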
For GPU support, uncomment the nvidia runtime lines in docker-compose.yml and ensure nvidia-docker is installed.
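Once the container is rebuilt with the GPU lines enabled, a quick check from Python shows whether a device is visible. This is a minimal sketch using PyTorch, which the image lists among its frameworks. The function name is hypothetical, and the import is guarded so the script also runs where PyTorch is absent.

```python
def gpu_status():
    """Report whether PyTorch can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch found no CUDA device"
    count = torch.cuda.device_count()
    return f"PyTorch sees {count} GPU(s): {torch.cuda.get_device_name(0)}"


print(gpu_status())
```

If no CUDA device is reported, the usual suspects are the runtime lines in docker-compose.yml still being commented out or the NVIDIA container runtime missing on the host.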