LLM Playground

Ever wondered what actually happens when you hit "send" on ChatGPT?

This isn't just another wrapper around an API. It's a complete learning environment where you can peek under the hood, run experiments, and actually understand how modern language models work.

I built this while learning about LLMs myself. Figured others might find it useful too.


What's This About?

You know how everyone's using ChatGPT but nobody really knows what's happening inside? This project fixes that.

It's a full playground that runs entirely on your machine (no API costs, no data leaving your computer). You can:

  • See exactly what happens when you change "temperature" from 0.7 to 1.5
  • Compare zero-shot vs few-shot prompting side-by-side
  • Watch how small prompt changes create wildly different outputs
  • Track every single token, measure latency, estimate costs

Everything's logged. Everything's observable. No black boxes.


The Interface

Web UI (7 Interactive Tabs)

Quick Chat - Your testing ground

  • Simple prompt → response interface
  • Real-time parameter controls (temperature, max tokens, top-p)
  • Instant metrics: tokens used, latency, cost per request (see the sketch below)
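
Under the hood, a Quick Chat request boils down to one call against Ollama's local HTTP API. A minimal sketch of that call (the app's actual wiring lives in models/ollama_model.py and may differ):

# One generation request against Ollama's HTTP API
import requests

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama2",
    "prompt": "Explain top-p sampling in one sentence.",
    "stream": False,
    "options": {"temperature": 0.7, "top_p": 0.9, "num_predict": 128},
}).json()

print(r["response"])
print("tokens:", r.get("prompt_eval_count", 0) + r.get("eval_count", 0))
print("latency:", r["total_duration"] / 1e9, "s")  # Ollama reports nanoseconds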

Zero-Shot Experiments - No examples needed

  • Test 8 different task types (sentiment, Q&A, code gen, etc.)
  • See how well models handle tasks "cold"
  • Compare accuracy across different prompts

Few-Shot Learning - Power of examples

  • Side-by-side comparison: with vs without examples
  • 4 pre-built scenarios (sentiment, entity extraction, classification)
  • Watch accuracy jump when you add just 2-3 examples (a minimal sketch follows this list)
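
To make the comparison concrete: few-shot is just labeled examples prepended to the prompt. A minimal sketch (this scenario is made up, not one of the repo's pre-built four):

# Same query, with and without labeled examples
examples = [
    ("The battery died in two hours.", "negative"),
    ("Setup took thirty seconds. Love it.", "positive"),
]
query = "It's fine, I guess."

zero_shot = f"Classify the sentiment.\nText: {query}\nSentiment:"
few_shot = "".join(f"Text: {t}\nSentiment: {s}\n\n" for t, s in examples)
few_shot += f"Text: {query}\nSentiment:"
# Send both through the same model and compare -- that's the whole experiment.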

Temperature Testing - Creative vs Deterministic

  • Run the same prompt at different temperatures simultaneously
  • Visual comparison of outputs
  • See diversity metrics in real time (one common metric is sketched below)
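
Which diversity metric the app uses is its own business; distinct-n (unique n-grams over total n-grams) is a standard choice and easy to compute:

# distinct-n: higher means more varied output across samples
def distinct_n(text, n=2):
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

low_temp = "The cat sat. The cat sat. The cat sat."       # t=0.1 style output
high_temp = "A cat lounged. The dog ran. Cats nap here."  # t=1.5 style output
print(distinct_n(low_temp), distinct_n(high_temp))        # 0.375 vs 1.0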

Prompt Sensitivity - Why wording matters

  • Test variations like "explain simply" vs "explain technically"
  • Observe how tiny changes create massive differences
  • Learn what actually makes prompts work

Logs Viewer - Your experiment history

  • Every interaction captured with full context
  • Searchable, filterable results
  • Token usage and cost tracking over time (a possible record shape is sketched below)
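
The field names below are a guess at what one structured record might hold -- see logger.py for the real schema:

# Hypothetical shape of one JSONL log record
import json, time

record = {
    "timestamp": time.time(),
    "model": "llama2",
    "prompt": "Explain tokenization in one sentence.",
    "response": "...",
    "temperature": 0.7,
    "prompt_tokens": 9,
    "completion_tokens": 24,
    "latency_ms": 412,
}
with open("logs/interactions.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")  # one JSON object per line: trivially filterable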

Documentation - Theory meets practice

  • Full LLM concepts guide (transformers, attention, tokenization)
  • Tutorials and examples
  • All accessible without leaving the app

Command Line

For when you want speed:

# Quick generation
python cli.py generate "Your prompt here"

# Temperature experiments
python cli.py temperature "Write a story" --temps 0.1 0.7 1.5

# Zero-shot testing
python cli.py zero-shot --task sentiment_analysis

# List available models
python cli.py list-models

Perfect for scripting and automation.
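
For example, driving the CLI from a Python script (same flags as above):

# Batch a prompt through the CLI and capture the output
import subprocess

result = subprocess.run(
    ["python", "cli.py", "generate", "Summarize self-attention in one line"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)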


Quick Start

Prerequisites:

  • macOS or Linux
  • Python 3.8+
  • 5 minutes of your time

# 1. Install Ollama (one-time setup)
brew install ollama  # macOS
# or visit https://ollama.com for other platforms

# 2. Start Ollama
ollama serve  # Keep this running in a terminal

# 3. Clone and setup (in a new terminal)
git clone https://github.com/PyExtreme/llm_101.git
cd llm_101
./setup.sh

# 4. Launch!
streamlit run app.py

Your browser opens automatically. Start experimenting.
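
If setup.sh doesn't pull a model during setup, grab one first with ollama pull llama2 (mistral works too).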


What You Actually Learn

Week 1: The Basics

  • How transformers actually work (self-attention explained properly)
  • What tokens are and why they matter
  • Temperature, top-p, and when to use what (sketched in code after this list)
  • Why prompt engineering isn't voodoo
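
Here's that temperature/top-p sketch -- toy logits rather than a real model's, but the mechanics are the standard ones:

# Temperature scaling plus top-p (nucleus) filtering over toy logits
import numpy as np

def sample(logits, temperature=1.0, top_p=1.0, seed=None):
    rng = np.random.default_rng(seed)
    scaled = logits / temperature             # <1 sharpens, >1 flattens
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # tokens by descending probability
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]                     # smallest set reaching mass top_p
    return rng.choice(keep, p=probs[keep] / probs[keep].sum())

logits = np.array([2.0, 1.0, 0.5, -1.0])      # pretend vocabulary of 4 tokens
print([int(sample(logits, temperature=0.7, top_p=0.9, seed=s)) for s in range(5)])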

Week 2: Getting Good

  • Zero-shot vs few-shot (with real metrics)
  • Context window limits and how they bite you
  • Cost vs quality tradeoffs
  • Reading and analyzing logs

Month 1: Mastery

  • Chain-of-thought prompting (example after this list)
  • Building custom experiments
  • Systematic prompt optimization
  • Understanding model behavior patterns
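
The chain-of-thought example promised above -- the standard pattern, not a repo-specific prompt:

# Chain-of-thought: ask the model to show its reasoning before answering
direct = "Q: Coffee is $3, muffins are $2. I buy 2 coffees and 3 muffins. Total?\nA:"
cot = direct + " Let's think step by step."
# The step-by-step variant typically writes out 2*3 + 3*2 = 12 before answering;
# the direct one is more likely to jump (and sometimes slip) straight to a number.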

All with hands-on experiments. No fluff.


The Theory Part

Look, you can use this without reading theory. But if you want to understand it:

CONCEPTS.md - 8,000+ words on:

  • Transformer architecture (with actual math)
  • Self-attention mechanism (explained visually; a numeric sketch follows this list)
  • Tokenization (BPE, SentencePiece)
  • Why context windows exist
  • How generation actually works
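
And the numeric self-attention sketch promised above -- random stand-in vectors, real mechanics:

# Scaled dot-product attention in a few lines of numpy
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # query-key similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over keys
    return w @ V                                     # mix values by weight

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 token embeddings, dim 8
print(attention(X, X, X).shape)                      # (4, 8): one mixed vector per token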

TUTORIAL.md - Guided experiments:

  • Your first 5 minutes
  • Temperature deep dive
  • Prompt sensitivity analysis
  • Context window stress tests
  • 3 mini-projects to build

LEARNING_OUTCOMES.md - 12+ follow-up experiments when you're ready for more.


Why Local Models?

Ollama = Free + Private

  • No API keys needed
  • Your data never leaves your machine
  • Run unlimited experiments
  • No per-token costs
  • Fast enough for learning

The playground also supports OpenAI (GPT-3.5, GPT-4) if you want to compare, but Ollama's llama2 or mistral work great for learning.


Project Structure

llm_101/
├── app.py                    # Streamlit UI (main interface)
├── cli.py                    # Command-line tool
├── models/                   # Model abstraction
│   ├── ollama_model.py      # Local models
│   └── openai_model.py      # Optional: GPT support
├── experiments/              # 5 experiment types
│   ├── zero_shot.py
│   ├── few_shot.py
│   ├── sampling_params.py
│   ├── context_window.py
│   └── prompt_sensitivity.py
├── logger.py                 # Structured logging
└── logs/                     # Your experiment history

Clean, modular, easy to extend.


Real Use Cases

For Students:

  • Understand how LLMs work (way beyond "it's just autocomplete")
  • See theory applied to practice
  • Build intuition through experimentation

For Developers:

  • Test prompts before production
  • Estimate costs and latency
  • Compare models systematically
  • Debug weird model behavior

For Researchers:

  • Reproducible experiments
  • Structured logging
  • Easy to extend with new tests
  • Compare different approaches

For Educators:

  • Ready-to-use teaching tool
  • Theory + practice in one place
  • Observable behavior
  • Students can run it themselves

What Makes This Different

vs Just Using ChatGPT:

  • You see what's happening (tokens, probabilities, parameters)
  • You control everything (local models, your rules)
  • You learn the "why" not just the "how"

vs LangChain/LlamaIndex:

  • This is for learning, not production
  • Everything's explained, not abstracted
  • Theory included, not assumed
  • Simpler, focused scope

vs Research Papers:

  • You can actually run the code
  • Examples work out of the box
  • Explanations in plain English
  • Hands-on, not theoretical

Extending It

All the experiment types are standalone modules. Want to add your own?

# experiments/my_experiment.py
from models.base import BaseModel
from logger import Logger

def run_my_experiment(model: BaseModel, logger: Logger, **kwargs):
    # **kwargs stands in for your experiment's own parameters
    response = model.generate(...)   # your prompt and sampling params here
    logger.log_interaction(...)      # record prompt, response, and metrics
    return response

Add it to the UI, done. The architecture's clean enough that you won't get lost.


Common Questions

Do I need a powerful computer? Nope. Ollama handles smaller models fine on a laptop. llama2 runs on 8GB RAM.

Is it really free? Yes. Ollama is open source. No hidden costs.

Can I use GPT-4? Yes, if you have an OpenAI API key. But you'll pay per token.

How long to get started? 10 minutes if Ollama downloads fast. 20 if it's slow.

Do I need to know Python? For using it? No. For extending it? Basic Python helps.


Contributing

Found a bug? Have an idea? PRs welcome.

The code's intentionally simple - no complex abstractions, clear naming, lots of comments. Should be easy to jump in.


Credits

Built while procrastinating on other projects. Turned out useful, so here we are.

Uses:

  • Ollama for local model inference
  • Streamlit for the web UI
  • OpenAI SDK (optional) for GPT models

License

MIT - Do whatever you want with it.


Get Started

./setup.sh
streamlit run app.py

Have fun. Learn something. Break things and see what happens.

That's the point.
