Ever wondered what actually happens when you hit "send" on ChatGPT?
This isn't just another wrapper around an API. It's a complete learning environment where you can peek under the hood, run experiments, and actually understand how modern language models work.
I built this while learning about LLMs myself. Figured others might find it useful too.
You know how everyone's using ChatGPT but nobody really knows what's happening inside? This project fixes that.
It's a full playground that runs entirely on your machine (no API costs, no data leaving your computer). You can:
- See exactly what happens when you change "temperature" from 0.7 to 1.5
- Compare zero-shot vs few-shot prompting side-by-side
- Watch how small prompt changes create wildly different outputs
- Track every single token, measure latency, estimate costs
Everything's logged. Everything's observable. No black boxes.
Quick Chat - Your testing ground
- Simple prompt → response interface
- Real-time parameter controls (temperature, max tokens, top-p)
- Instant metrics: tokens used, latency, cost per request (the cost arithmetic is sketched below)
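The cost math is simple token arithmetic. A hypothetical sketch (the `estimate_cost` helper and the rates are made up for illustration, not the app's actual code; local Ollama runs cost $0 regardless):

```python
# Illustrative only: the per-1K-token rates below are placeholders, not real pricing.
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate_per_1k: float = 0.0005,
                  output_rate_per_1k: float = 0.0015) -> float:
    """Estimate a request's cost in USD from token counts and per-1K-token rates."""
    return (prompt_tokens / 1000) * input_rate_per_1k \
         + (completion_tokens / 1000) * output_rate_per_1k

print(estimate_cost(120, 350))  # ~0.000585 USD
```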
Zero-Shot Experiments - No examples needed
- Test 8 different task types (sentiment, Q&A, code gen, etc.)
- See how well models handle tasks "cold"
- Compare accuracy across different prompts
Few-Shot Learning - Power of examples
- Side-by-side comparison: with vs without examples
- 4 pre-built scenarios (including sentiment, entity extraction, and classification)
- Watch accuracy jump when you add just 2-3 examples (see the prompt sketch below)
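To make the contrast concrete, here's roughly what the two prompt styles look like (illustrative prompts, not the app's built-in scenarios):

```python
# Zero-shot: the model gets the task cold.
zero_shot = (
    "Classify the sentiment of this review as Positive or Negative.\n"
    "Review: The battery died after two days.\nSentiment:"
)

# Few-shot: the same task with two worked examples prepended.
few_shot = (
    "Classify the sentiment of each review as Positive or Negative.\n"
    "Review: Absolutely loved it, would buy again.\nSentiment: Positive\n"
    "Review: Arrived broken and support never replied.\nSentiment: Negative\n"
    "Review: The battery died after two days.\nSentiment:"
)
```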
Temperature Testing - Creative vs Deterministic
- Run the same prompt at different temperatures simultaneously
- Visual comparison of outputs
- See diversity metrics in real time (the sampling math is sketched below)
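Under the hood, temperature just divides the model's logits before the softmax, so p_i ∝ exp(z_i / T). A minimal plain-Python sketch of that (not the app's code):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Low T sharpens the distribution (deterministic); high T flattens it (creative)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.1))   # near one-hot: almost always the top token
print(softmax_with_temperature(logits, 1.5))   # much flatter: more diverse sampling
```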
Prompt Sensitivity - Why wording matters
- Test variations like "explain simply" vs "explain technically"
- Observe how tiny changes create massive differences
- Learn what actually makes prompts work
Logs Viewer - Your experiment history
- Every interaction captured with full context
- Searchable, filterable results
- Token usage and cost tracking over time (an example record is sketched below)
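Since every record is structured, the history stays searchable. A record might look roughly like this (field names are illustrative; the real schema lives in logger.py):

```python
# Hypothetical record shape, for illustration only; see logger.py for the actual fields.
record = {
    "timestamp": "2024-01-15T10:32:07",
    "experiment": "temperature",
    "model": "llama2",
    "prompt": "Write a story",
    "temperature": 0.7,
    "prompt_tokens": 4,
    "completion_tokens": 212,
    "latency_s": 3.8,
    "cost_usd": 0.0,   # local models are free
}
```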
Documentation - Theory meets practice
- Full LLM concepts guide (transformers, attention, tokenization)
- Tutorials and examples
- All accessible without leaving the app
For when you want speed:
```bash
# Quick generation
python cli.py generate "Your prompt here"

# Temperature experiments
python cli.py temperature "Write a story" --temps 0.1 0.7 1.5

# Zero-shot testing
python cli.py zero-shot --task sentiment_analysis

# List available models
python cli.py list-models
```

Perfect for scripting and automation.
Prerequisites:
- macOS or Linux
- Python 3.8+
- 5 minutes of your time
```bash
# 1. Install Ollama (one-time setup)
brew install ollama   # macOS
# or visit https://ollama.com for other platforms

# 2. Start Ollama
ollama serve   # keep this running in a terminal

# 3. Clone and set up (in a new terminal)
git clone https://github.com/PyExtreme/llm_101.git
cd llm_101
./setup.sh

# 4. Launch!
streamlit run app.py
```

Your browser opens automatically. Start experimenting.
You'll learn:
- How transformers actually work (self-attention explained properly)
- What tokens are and why they matter
- Temperature, top-p, and when to use what
- Why prompt engineering isn't voodoo
- Zero-shot vs few-shot (with real metrics)
- Context window limits and how they bite you
- Cost vs quality tradeoffs
- Reading and analyzing logs
- Chain-of-thought prompting (example after this list)
- Building custom experiments
- Systematic prompt optimization
- Understanding model behavior patterns
All with hands-on experiments. No fluff.
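Chain-of-thought, for instance, is nothing magical: you ask the model to show its steps before answering. An illustrative prompt:

```python
prompt = (
    "Q: A cafe sells coffee for $3 and muffins for $2. "
    "I buy 2 coffees and 3 muffins. What do I pay?\n"
    "A: Let's think step by step."
)
# The model now tends to reason "2 x 3 = 6, 3 x 2 = 6, total $12"
# instead of blurting out a number directly.
```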
Look, you can use this without reading theory. But if you want to understand it:
CONCEPTS.md - 8,000+ words on:
- Transformer architecture (with actual math; a tiny attention sketch follows this list)
- Self-attention mechanism (explained visually)
- Tokenization (BPE, SentencePiece)
- Why context windows exist
- How generation actually works
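For a taste of the actual math: single-head self-attention is softmax(QK^T / sqrt(d_k))V. A minimal NumPy sketch (illustrative, written for this README rather than lifted from CONCEPTS.md):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # token-to-token affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)             # row-wise softmax
    return w @ V                                   # each output: weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                        # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # (4, 8)
```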
TUTORIAL.md - Guided experiments:
- Your first 5 minutes
- Temperature deep dive
- Prompt sensitivity analysis
- Context window stress tests
- 3 mini-projects to build
LEARNING_OUTCOMES.md - 12+ follow-up experiments when you're ready for more.
Ollama = Free + Private
- No API keys needed
- Your data never leaves your machine
- Run unlimited experiments
- No per-token costs
- Fast enough for learning
The playground also supports OpenAI (GPT-3.5, GPT-4) if you want to compare, but Ollama's llama2 or mistral work great for learning.
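If you've never used Ollama before, fetching a model is one command (standard Ollama CLI; downloads are a few GB):

```bash
ollama pull llama2    # one-time download
ollama pull mistral   # a good alternative
ollama list           # confirm what's installed
```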
```text
llm_101/
├── app.py                       # Streamlit UI (main interface)
├── cli.py                       # Command-line tool
├── models/                      # Model abstraction
│   ├── ollama_model.py          # Local models
│   └── openai_model.py          # Optional: GPT support
├── experiments/                 # 5 experiment types
│   ├── zero_shot.py
│   ├── few_shot.py
│   ├── sampling_params.py
│   ├── context_window.py
│   └── prompt_sensitivity.py
├── logger.py                    # Structured logging
└── logs/                        # Your experiment history
```
Clean, modular, easy to extend.
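If you're curious, the model abstraction is roughly an interface like this (a hypothetical sketch of the idea; check models/base.py for the real signatures):

```python
# Hypothetical sketch; see models/base.py for the actual interface.
from abc import ABC, abstractmethod

class BaseModel(ABC):
    @abstractmethod
    def generate(self, prompt: str, **params) -> str:
        """Return the model's completion for a prompt."""

# ollama_model.py and openai_model.py each implement this,
# so experiments can swap backends without changing their code.
```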
For Students:
- Understand how LLMs work (way beyond "it's just autocomplete")
- See theory applied to practice
- Build intuition through experimentation
For Developers:
- Test prompts before production
- Estimate costs and latency
- Compare models systematically
- Debug weird model behavior
For Researchers:
- Reproducible experiments
- Structured logging
- Easy to extend with new tests
- Compare different approaches
For Educators:
- Ready-to-use teaching tool
- Theory + practice in one place
- Observable behavior
- Students can run it themselves
vs Just Using ChatGPT:
- You see what's happening (tokens, probabilities, parameters)
- You control everything (local models, your rules)
- You learn the "why" not just the "how"
vs LangChain/LlamaIndex:
- This is for learning, not production
- Everything's explained, not abstracted
- Theory included, not assumed
- Simpler, focused scope
vs Research Papers:
- You can actually run the code
- Examples work out of the box
- Explanations in plain English
- Hands-on, not theoretical
All the experiment types are standalone modules. Want to add your own?
```python
# experiments/my_experiment.py
from models.base import BaseModel
from logger import Logger

def run_my_experiment(model: BaseModel, logger: Logger, **params):
    response = model.generate(...)   # call the model with your prompt and params
    logger.log_interaction(...)      # record the prompt, response, and metrics
    return response
```

Add it to the UI, done. The architecture's clean enough that you won't get lost.
Do I need a powerful computer? Nope. Ollama handles smaller models fine on a laptop. llama2 runs on 8GB of RAM.
Is it really free? Yes. Ollama is open source. No hidden costs.
Can I use GPT-4? Yes, if you have an OpenAI API key. But you'll pay per token.
How long to get started? 10 minutes if Ollama downloads fast. 20 if it's slow.
Do I need to know Python? For using it? No. For extending it? Basic Python helps.
Found a bug? Have an idea? PRs welcome.
The code's intentionally simple - no complex abstractions, clear naming, lots of comments. Should be easy to jump in.
Built while procrastinating on other projects. Turned out useful, so here we are.
Uses:
- Ollama for local model inference
- Streamlit for the web UI
- OpenAI SDK (optional) for GPT models
MIT - Do whatever you want with it.
```bash
./setup.sh
streamlit run app.py
```

Have fun. Learn something. Break things and see what happens.
That's the point.