Ever wondered what actually happens when you hit "send" on ChatGPT?
This isn't just another wrapper around an API. It's a complete learning environment where you can peek under the hood, run experiments, and actually understand how modern language models work.
I built this while learning about LLMs myself. Figured others might find it useful too.
You know how everyone's using ChatGPT but nobody really knows what's happening inside? This project fixes that.
It's a full playground that runs entirely on your machine (no API costs, no data leaving your computer). You can:
- See exactly what happens when you change "temperature" from 0.7 to 1.5
- Compare zero-shot vs few-shot prompting side-by-side
- Watch how small prompt changes create wildly different outputs
- Track every single token, measure latency, estimate costs
Everything's logged. Everything's observable. No black boxes.
Quick Chat - Your testing ground
- Simple prompt → response interface
- Real-time parameter controls (temperature, max tokens, top-p)
- Instant metrics: tokens used, latency, cost per request (the cost arithmetic is sketched below)
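The cost math is simple token arithmetic. A hypothetical sketch (the `estimate_cost` helper and the rates are made up for illustration, not the app's actual code; local Ollama runs cost $0 regardless):

```python
# Illustrative only: the per-1K-token rates below are placeholders, not real pricing.
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate_per_1k: float = 0.0005,
                  output_rate_per_1k: float = 0.0015) -> float:
    """Estimate a request's cost in USD from token counts and per-1K-token rates."""
    return (prompt_tokens / 1000) * input_rate_per_1k \
         + (completion_tokens / 1000) * output_rate_per_1k

print(estimate_cost(120, 350))  # ~0.000585 USD
```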
Zero-Shot Experiments - No examples needed
- Test 8 different task types (sentiment, Q&A, code gen, etc.)
- See how well models handle tasks "cold"
- Compare accuracy across different prompts
Few-Shot Learning - Power of examples
- Side-by-side comparison: with vs without examples
- 4 pre-built scenarios (including sentiment, entity extraction, and classification)
- Watch accuracy jump when you add just 2-3 examples (see the prompt sketch below)
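To make the contrast concrete, here's roughly what the two prompt styles look like (illustrative prompts, not the app's built-in scenarios):

```python
# Zero-shot: the model gets the task cold.
zero_shot = (
    "Classify the sentiment of this review as Positive or Negative.\n"
    "Review: The battery died after two days.\nSentiment:"
)

# Few-shot: the same task with two worked examples prepended.
few_shot = (
    "Classify the sentiment of each review as Positive or Negative.\n"
    "Review: Absolutely loved it, would buy again.\nSentiment: Positive\n"
    "Review: Arrived broken and support never replied.\nSentiment: Negative\n"
    "Review: The battery died after two days.\nSentiment:"
)
```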
Temperature Testing - Creative vs Deterministic
- Run the same prompt at different temperatures simultaneously
- Visual comparison of outputs
- See diversity metrics in real time (the sampling math is sketched below)
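Under the hood, temperature just divides the model's logits before the softmax, so p_i ∝ exp(z_i / T). A minimal plain-Python sketch of that (not the app's code):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Low T sharpens the distribution (deterministic); high T flattens it (creative)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.1))   # near one-hot: almost always the top token
print(softmax_with_temperature(logits, 1.5))   # much flatter: more diverse sampling
```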
Prompt Sensitivity - Why wording matters
- Test variations like "explain simply" vs "explain technically"
- Observe how tiny changes create massive differences
- Learn what actually makes prompts work
Logs Viewer - Your experiment history
- Every interaction captured with full context
- Searchable, filterable results
- Token usage and cost tracking over time (an example record is sketched below)
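Since every record is structured, the history stays searchable. A record might look roughly like this (field names are illustrative; the real schema lives in logger.py):

```python
# Hypothetical record shape, for illustration only; see logger.py for the actual fields.
record = {
    "timestamp": "2024-01-15T10:32:07",
    "experiment": "temperature",
    "model": "llama2",
    "prompt": "Write a story",
    "temperature": 0.7,
    "prompt_tokens": 4,
    "completion_tokens": 212,
    "latency_s": 3.8,
    "cost_usd": 0.0,   # local models are free
}
```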
Documentation - Theory meets practice
- Full LLM concepts guide (transformers, attention, tokenization)
- Tutorials and examples
- All accessible without leaving the app
For when you want speed:
```bash
# Quick generation
python cli.py generate "Your prompt here"

# Temperature experiments
python cli.py temperature "Write a story" --temps 0.1 0.7 1.5

# Zero-shot testing
python cli.py zero-shot --task sentiment_analysis

# List available models
python cli.py list-models
```

Perfect for scripting and automation.
Prerequisites:
- macOS or Linux
- Python 3.8+
- 5 minutes of your time
```bash
# 1. Install Ollama (one-time setup)
brew install ollama   # macOS
# or visit https://ollama.com for other platforms

# 2. Start Ollama
ollama serve   # keep this running in a terminal

# 3. Clone and set up (in a new terminal)
git clone https://github.com/PyExtreme/llm_101.git
cd llm_101
./setup.sh

# 4. Launch!
streamlit run app.py
```

Your browser opens automatically. Start experimenting.
You'll learn:
- How transformers actually work (self-attention explained properly)
- What tokens are and why they matter
- Temperature, top-p, and when to use what
- Why prompt engineering isn't voodoo
- Zero-shot vs few-shot (with real metrics)
- Context window limits and how they bite you
- Cost vs quality tradeoffs
- Reading and analyzing logs
- Chain-of-thought prompting (example after this list)
- Building custom experiments
- Systematic prompt optimization
- Understanding model behavior patterns
All with hands-on experiments. No fluff.
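Chain-of-thought, for instance, is nothing magical: you ask the model to show its steps before answering. An illustrative prompt:

```python
prompt = (
    "Q: A cafe sells coffee for $3 and muffins for $2. "
    "I buy 2 coffees and 3 muffins. What do I pay?\n"
    "A: Let's think step by step."
)
# The model now tends to reason "2 x 3 = 6, 3 x 2 = 6, total $12"
# instead of blurting out a number directly.
```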
Look, you can use this without reading theory. But if you want to understand it:
CONCEPTS.md - 8,000+ words on:
- Transformer architecture (with actual math; a tiny attention sketch follows this list)
- Self-attention mechanism (explained visually)
- Tokenization (BPE, SentencePiece)
- Why context windows exist
- How generation actually works
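For a taste of the actual math: single-head self-attention is softmax(QK^T / sqrt(d_k))V. A minimal NumPy sketch (illustrative, written for this README rather than lifted from CONCEPTS.md):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # token-to-token affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)             # row-wise softmax
    return w @ V                                   # each output: weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                        # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # (4, 8)
```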
TUTORIAL.md - Guided experiments:
- Your first 5 minutes
- Temperature deep dive
- Prompt sensitivity analysis
- Context window stress tests
- 3 mini-projects to build
LEARNING_OUTCOMES.md - 12+ follow-up experiments when you're ready for more.
Ollama = Free + Private
- No API keys needed
- Your data never leaves your machine
- Run unlimited experiments
- No per-token costs
- Fast enough for learning
The playground also supports OpenAI (GPT-3.5, GPT-4) if you want to compare, but Ollama's llama2 or mistral work great for learning.
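If you've never used Ollama before, fetching a model is one command (standard Ollama CLI; downloads are a few GB):

```bash
ollama pull llama2    # one-time download
ollama pull mistral   # a good alternative
ollama list           # confirm what's installed
```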
```text
llm_101/
├── app.py                       # Streamlit UI (main interface)
├── cli.py                       # Command-line tool
├── models/                      # Model abstraction
│   ├── ollama_model.py          # Local models
│   └── openai_model.py          # Optional: GPT support
├── experiments/                 # 5 experiment types
│   ├── zero_shot.py
│   ├── few_shot.py
│   ├── sampling_params.py
│   ├── context_window.py
│   └── prompt_sensitivity.py
├── logger.py                    # Structured logging
└── logs/                        # Your experiment history
```
Clean, modular, easy to extend.
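If you're curious, the model abstraction is roughly an interface like this (a hypothetical sketch of the idea; check models/base.py for the real signatures):

```python
# Hypothetical sketch; see models/base.py for the actual interface.
from abc import ABC, abstractmethod

class BaseModel(ABC):
    @abstractmethod
    def generate(self, prompt: str, **params) -> str:
        """Return the model's completion for a prompt."""

# ollama_model.py and openai_model.py each implement this,
# so experiments can swap backends without changing their code.
```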
For Students:
- Understand how LLMs work (way beyond "it's just autocomplete")
- See theory applied to practice
- Build intuition through experimentation
For Developers:
- Test prompts before production
- Estimate costs and latency
- Compare models systematically
- Debug weird model behavior
For Researchers:
- Reproducible experiments
- Structured logging
- Easy to extend with new tests
- Compare different approaches
For Educators:
- Ready-to-use teaching tool
- Theory + practice in one place
- Observable behavior
- Students can run it themselves
vs Just Using ChatGPT:
- You see what's happening (tokens, probabilities, parameters)
- You control everything (local models, your rules)
- You learn the "why" not just the "how"
vs LangChain/LlamaIndex:
- This is for learning, not production
- Everything's explained, not abstracted
- Theory included, not assumed
- Simpler, focused scope
vs Research Papers:
- You can actually run the code
- Examples work out of the box
- Explanations in plain English
- Hands-on, not theoretical
All the experiment types are standalone modules. Want to add your own?
```python
# experiments/my_experiment.py
from models.base import BaseModel
from logger import Logger

def run_my_experiment(model: BaseModel, logger: Logger, **params):
    response = model.generate(...)   # call the model with your prompt and params
    logger.log_interaction(...)      # record the prompt, response, and metrics
    return response
```

Add it to the UI, done. The architecture's clean enough that you won't get lost.
Do I need a powerful computer? Nope. Ollama handles smaller models fine on a laptop. llama2 runs on 8GB of RAM.
Is it really free? Yes. Ollama is open source. No hidden costs.
Can I use GPT-4? Yes, if you have an OpenAI API key. But you'll pay per token.
How long to get started? 10 minutes if Ollama downloads fast. 20 if it's slow.
Do I need to know Python? For using it? No. For extending it? Basic Python helps.
Found a bug? Have an idea? PRs welcome.
The code's intentionally simple - no complex abstractions, clear naming, lots of comments. Should be easy to jump in.
Built while procrastinating on other projects. Turned out useful, so here we are.
Uses:
- Ollama for local model inference
- Streamlit for the web UI
- OpenAI SDK (optional) for GPT models
MIT - Do whatever you want with it.
```bash
./setup.sh
streamlit run app.py
```

Have fun. Learn something. Break things and see what happens.
That's the point.