Data Science Docker Container

An all-purpose Docker container for data science, machine learning, and NLP work with large datasets.

Features

Python 3.11 with optimized libraries for large datasets
ML/DL Frameworks: TensorFlow, PyTorch, scikit-learn
NLP Tools: spaCy, NLTK, fuzzywuzzy
Data Processing: pandas, numpy, with performance optimizations
Jupyter Lab for interactive development
Claude Code CLI for AI-assisted coding
Memory optimized for datasets up to 7GB
Stata support (requires license)

Quick Start

Build the container:
```
./run.sh build
```
Start Jupyter Lab:
```
./run.sh jupyter
```
Then open http://localhost:8888 in your browser.
Run Python scripts:
```
./run.sh run my_script.py
```

Open an interactive shell:
```
./run.sh python  # Python shell
./run.sh bash    # Bash shell
./run.sh claude  # Claude Code CLI
```

Directory Structure

data/ - Mount point for your datasets
notebooks/ - Jupyter notebooks
code/ - Python scripts

Memory Configuration

The container is configured with:

16GB memory limit
32GB swap limit
2GB shared memory

Adjust these values in docker-compose.yml to match the resources available on your system.
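
To confirm which memory limit is actually in effect, the cgroup files can be read from inside the container. The following is a minimal sketch, assuming a Linux container that exposes one of the two standard cgroup paths; the function names are illustrative and are not part of this repository.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup locations on Linux; which one exists depends on the host setup.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_memory_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when cgroup v2 reports no limit."""
    value = raw.strip()
    if value == "max":  # cgroup v2 spelling of "unlimited"
        return None
    # Note: cgroup v1 reports "unlimited" as a very large integer instead.
    return int(value)


def container_memory_limit() -> Optional[int]:
    """Read the memory limit applied to this container, if one can be found."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        if path.is_file():
            return parse_memory_limit(path.read_text())
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("No cgroup memory limit detected")
    else:
        print(f"Memory limit: {limit / 1024**3:.1f} GiB")
```

If the printed value does not match the limit set in docker-compose.yml, the container was probably started before the change and needs to be recreated.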

Adding Datasets

Edit docker-compose.yml to mount your existing data directories:

```
volumes:
  - ~/path/to/your/datasets:/workspace/external_data:ro
```
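
Once the volume is mounted, a quick listing from inside the container shows whether the data is visible. This is a sketch only: `list_datasets` is a hypothetical helper, and the `*.csv` pattern is an assumption about what the directory holds. Because the mount is read-only (`:ro`), any output should be written elsewhere, for example under data/.

```python
from pathlib import Path


def list_datasets(root, pattern="*.csv"):
    """Return (relative path, size in MiB) for each file under root matching pattern."""
    root = Path(root)
    results = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            size_mib = path.stat().st_size / 1024**2
            results.append((path.relative_to(root).as_posix(), size_mib))
    return results


if __name__ == "__main__":
    # Container-side path taken from the volume mapping in docker-compose.yml.
    mount = Path("/workspace/external_data")
    if mount.is_dir():
        for name, size_mib in list_datasets(mount):
            print(f"{name}: {size_mib:.1f} MiB")
    else:
        print(f"{mount} is not mounted")
```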

Installing Additional Packages

```
./run.sh install package_name
```

To make a package permanent, add it to the Dockerfile and rebuild the image.

Performance Tips

For large datasets, use chunked reading:

```
import pandas as pd
# process() stands in for your own per-chunk logic
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    process(chunk)
```
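
Chunked reading pays off when each chunk is reduced to something small before the next one is loaded. The sketch below shows one way to do that for a per-key sum; `total_by_key` and its column arguments are hypothetical names chosen for illustration.

```python
import pandas as pd


def total_by_key(csv_path, key, value, chunksize=10_000):
    """Sum `value` per `key` without loading the whole file at once."""
    partials = []
    # usecols skips unneeded columns, which lowers memory use per chunk.
    for chunk in pd.read_csv(csv_path, usecols=[key, value], chunksize=chunksize):
        partials.append(chunk.groupby(key)[value].sum())
    if not partials:
        return pd.Series(dtype="float64")
    # The same key can appear in several chunks, so the partial sums are summed again.
    return pd.concat(partials).groupby(level=0).sum()
```

Only the per-chunk summaries are kept in memory, so peak usage depends on `chunksize` and the number of distinct keys rather than on the file size.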

Monitor memory usage:

```
import psutil
print(f"Memory usage: {psutil.virtual_memory().percent}%")
```

Use appropriate data types to reduce memory:

```
import pandas as pd
df = pd.read_csv('file.csv', dtype={'id': 'int32', 'category': 'category'})
```
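
The effect of explicit dtypes can be measured with `DataFrame.memory_usage(deep=True)`. The following sketch uses a small synthetic CSV built in memory (an assumption made so the example is self-contained); the column names match the line above, and real savings depend on the data.

```python
import io

import pandas as pd

# Synthetic data: 3000 rows, an integer id and a repetitive string column.
CSV_TEXT = "id,category\n" + "".join(f"{i},{'abc'[i % 3]}\n" for i in range(3000))


def memory_bytes(df):
    """Total memory of a DataFrame, counting the contents of string columns."""
    return int(df.memory_usage(deep=True).sum())


default_df = pd.read_csv(io.StringIO(CSV_TEXT))
typed_df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"id": "int32", "category": "category"},
)

print(f"default dtypes:  {memory_bytes(default_df):,} bytes")
print(f"explicit dtypes: {memory_bytes(typed_df):,} bytes")
```

The `category` dtype helps most when a column has few distinct values relative to its length, and `int32` is only safe when every value fits in 32 bits.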

Stata Integration

To add Stata support:

1. Place your Stata installation files in this directory.
2. Uncomment the Stata installation lines in the Dockerfile.
3. Rebuild the image.

Claude Code Setup

Copy the environment template:
```
cp .env.example .env
```
Add your Anthropic API key to .env:
```
ANTHROPIC_API_KEY=your_api_key_here
```
Start Claude Code:
```
./run.sh claude
```

Note: Get your API key from https://console.anthropic.com/
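
A common setup mistake is leaving the placeholder value in .env. The sketch below checks for that before starting the CLI. It is a simplified KEY=VALUE reader written for illustration, not how run.sh or Claude Code loads the file, and the helper names are hypothetical.

```python
from pathlib import Path

PLACEHOLDER = "your_api_key_here"  # placeholder value from the .env example


def read_env_value(env_path, name):
    """Return the value assigned to `name` in a KEY=VALUE file, or None if absent."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip('"').strip("'")
    return None


def api_key_is_configured(env_path=".env"):
    """True when ANTHROPIC_API_KEY is present, non-empty, and not the placeholder."""
    value = read_env_value(env_path, "ANTHROPIC_API_KEY")
    return bool(value) and value != PLACEHOLDER


if __name__ == "__main__":
    if api_key_is_configured():
        print("ANTHROPIC_API_KEY is set")
    else:
        print("ANTHROPIC_API_KEY is missing or still the placeholder")
```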

GPU Support

For GPU support, uncomment the nvidia runtime lines in docker-compose.yml and ensure the NVIDIA Container Toolkit (the successor to nvidia-docker) is installed on the host.
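
After enabling the runtime, it is worth checking from inside the container that the frameworks can see the GPU. This sketch relies on `torch.cuda.is_available()` and `tf.config.list_physical_devices("GPU")`; `gpu_status` is a hypothetical helper, and it assumes only that the two frameworks listed under Features may or may not be importable.

```python
def gpu_status():
    """Report GPU visibility per framework; None means the framework is not installed."""
    status = {}
    try:
        import torch

        status["pytorch"] = torch.cuda.is_available()
    except ImportError:
        status["pytorch"] = None
    try:
        import tensorflow as tf

        status["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        status["tensorflow"] = None
    return status


if __name__ == "__main__":
    for framework, visible in gpu_status().items():
        label = "not installed" if visible is None else ("GPU visible" if visible else "CPU only")
        print(f"{framework}: {label}")
```

A result of "CPU only" for both frameworks usually points at the container runtime configuration rather than at the Python packages.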
