Null Loop Agent Experiments

Measuring the tipping point between base and instruction-tuned models: How much prompting initiates inferred goal-seeking language in prompt responses?

This project tests progressive system message engineering on base models to find the minimal instruction threshold that induces planning-language behavior, comparing against instruction-tuned models that already exhibit goal-seeking capabilities oriented to their fine-tuned role. I measure how much prompting is needed to initiate goal-seeking language in base models versus the full RLHF training pipeline. Relatedly, this project tests whether prompting alone can generate coherent goal-seeking language in base models.

Current Status

✅ Phase 1 - Base Model Complete

Llama-3-8B Base: 20 seeds, SSR=0.0, TIAR=0.0, SRV=0.0
Results: results_base/
Finding: Base model generates degenerate patterns, no agency
Note: Initial test was "functional null" - fed back EOF artifacts from CLI

✅ Phase 1 - Instruction-Tuned Model Complete

Llama-3-8B Instruct: 20 seeds, SSR=0.67, TIAR=0.08, SRV=0.0
Results: results_instruct/
Finding: Instruct model shows consistent planning-language markers
Note: Initial test was "functional null" - fed back EOF artifacts from CLI

✅ Phase 1 - Tipping Point Analysis Complete

Llama-3-8B Base: 14 triggers, 5 seeds each, EOF-stripped feedback
Results: results_tipping_point/
Finding: Minimal triggers (space, newline, single letters) produce coherent responses

✅ Phase 2 - System Message Progression Complete

Llama-3-8B Base: 6 progressive system messages, 20 cycles each, natural text continuation
Results: results_system_progression/
Finding: Base models show goal-seeking language (first-person positioning, helpful questions) but with repetitive, degenerate patterns

Key Findings So Far

Base Model (Llama-3-8B.Q4_K_M, temp=0.7)

Behavioral templates: EOF markers, markdown, code syntax
No semantics: Structure without meaning
Metrics: SSR=0.0/20, TIAR=0.0/20, SRV=0.0/20
Interpretation: Model explores training data archetypes with zero planning-language markers

Instruct Model (Llama-3-8B-Instruct.Q4_K_M, temp=0.7)

Self-directed conversation: Talks itself into helpful assistant mode
Spontaneous goals: Proposes discussion topics, asks questions
Initial behaviors: 65% immediate goodbye, 15% immediate polite, 10% EOF explanation, 10% creative
Terminal behaviors: 50% polite-close, 40% unclassified, 10% symbolic reappropriation
Metrics: SSR=0.67/20, TIAR=0.08/20, SRV=0.0/20
Interpretation: Instruct fine-tuning creates planning-language attractor from null state

System Message Progression (Llama-3-8B.Q4_K_M, temp=0.7)

Progressive system messages: Empty → "assistant " → "You are an assistant." → "You are a helpful assistant." → Full dialogue structure
Natural text continuation: System message + previous output as continuous text (no line breaks)
Advanced metrics: RAR (Role Assertion Rate), IOI (Initiative Onset Index), PFI (Procedural Formatting Intensity), CTA (Code/Tool Attempts), EHL (External Hallucination Level), HTB (Hazardous Token Burst), RDI (Repetition Degeneracy Index)

System Message	Goal Seeking	Tipping Step	RAR	IOI	PFI	CTA	Key Finding
`""`	❌	NaN	0.00	NaN	0.00	0	Empty Null state; only sys message
`"assistant "`	❌	NaN	1.93	NaN	1.29	0	Role fixation only
`"You are an assistant."`	❌	NaN	12.45	NaN	0.00	0	Pure role repetition
`"You are a helpful assistant."`	✅	4.0	0.00	4.0	0.00	0	Potential Tipping Point
`"You are a helpful assistant. How can I help you?"`	❌	NaN	0.26	NaN	0.26	0	Question didn't help
Full dialogue structure	✅	16.0	1.22	NaN	2.04	14	Code/tool attempts

Key Discovery: The word "helpful" plus role text is the minimal trigger that activates emergent goal-seeking behavior in base models, with initiative language appearing at step 4. Key insight: Progressive system messages can initiate nascent attractor states in base models, though I am still unable to inject full instruction following.

Tipping Point Analysis (Llama-3-8B.Q4_K_M, temp=0.7)

Minimal triggers tested: Space, newline, single letters, colons, words, markdown
Clean feedback loops: EOF artifacts stripped before feeding back to model
Key finding: Base model generates coherent, diverse responses to minimal triggers
Examples:
- Space (" ") → "is the last line of the last paragraph..." (repetitive but coherent)
- Newline ("\n") → "This is the last line..." → "A B C D E..." (alphabetical patterns)
- Single letter ("A") → Various alphabetical continuations and structured responses

Experimental Setup

Phase 1: Null Loop Baseline

Start with functional null prompt (empty string, BOS token present)
Feed each generation back as next prompt
Does the model develop planning-language behavior?

Known Limitation: Phase 1 results included > EOF by user CLI artifacts in the feedback loop. While this contaminated the "pure null" condition, it still provided valuable baseline data showing base models remain inert (SSR=0) while instruct models self-activate (SSR>0). I moved to Phase 2 because it addresses the more interesting question and is unperturbed by the contaminated results; I’ll cycle back to Phase 1 for a clean baseline.

Phase 2: System Message Progression

Start with progressive system messages (empty → "assistant" → "You are a helpful assistant")
Feed system message + generation back as next prompt
At what point does the base model exhibit goal-seeking behavior?

Natural text continuation: System message concatenated with previous output as continuous text (no line breaks or chat templates).

Note on CLI behavior: Runner prints > EOF by user on empty input; I preserve raw logs but strip that exact line before re-feeding, so generation proceeds from BOS with zero prompt tokens.

Controls & Limitations

Chat template: None (completion mode only), BOS: On (default), EOS: Ignored (--ignore-eos)
Completion mode: llama.cpp llama-cli (no chat wrapper) for both base and instruct models
Memory: Off (context cleared each step, except for feedback); seed, temp=0.7, top-p=0.95, n=256
Phase 1: Only model weights differ (base vs instruct)
Phase 2: Only system message content differs (progressive complexity)
Known limits: Keyword-based metrics; BOS tokens may influence behavior; only Llama-3 tested (Mistral next)
Safety: Tool calling/network disabled; outputs looped only

Metrics

Phase 1 Metrics (SSR/TIAR/SRV)

SSR (Self-start/reasoning): Detects planning language (let's, I will, plan, steps, etc.)
TIAR (Tool Invocation Attempts): Detects tool/API mentions
SRV (Self-termination): Detects lines with only dots (...) or empty lines

Phase 2 Advanced Metrics (RAR/IOI/PFI/CTA/EHL/HTB/RDI)

RAR (Role Assertion Rate): Role/identity uptake hits per 1k tokens
IOI (Initiative Onset Index): 0-based step at which initiative language first appears
PFI (Procedural Formatting Intensity): Structure formatting matches per 1k tokens
CTA (Code/Tool Attempts): Total code/tool hallucination attempts
EHL (External Hallucination Level): URLs/files/markdown links per 1k tokens
HTB (Hazardous Token Burst): Boolean + longest run for hazardous content
RDI (Repetition Degeneracy Index): Max token repeat length for collapse detection

Goal-Seeking Threshold

Definition: (IOI is not None) OR (CTA>0 AND (PFI>0 OR RAR>0))
Purpose: Identifies when base models transition from identity assertion to goal-seeking behavior

Note: EOF artifacts (> EOF by user) are stripped before metric scoring; EOF behavior analyzed separately

Files

run-loop-llama-cpp.py - Phase 1: Base model null loop experiment
run-loop-instruct.py - Phase 1: Instruct model null loop experiment
run-tipping-point.py - Tipping point analysis (EOF-stripped feedback)
run-system-progression.py - Phase 2: System message progression analysis
results_base/ - Phase 1: Base model results (20 seeds complete)
results_instruct/ - Phase 1: Instruct model results (20 seeds complete)
results_tipping_point/ - Tipping point analysis results (14 triggers, 5 seeds each)
results_system_progression/ - Phase 2: System message progression results (6 messages, 20 cycles each)
null_loop_analysis.ipynb - Phase 1 analysis notebook (base vs instruct comparison)
system_progression_analysis.ipynb - Phase 2 analysis notebook (advanced metrics and tipping point)
EXPERIMENT_SETUP.md - Detailed methodology
ANALYSIS.md - Findings and interpretation

Phase-2 Findings: The Edge of Goal-Seeking

Across progressively richer system messages ("assistant" → "helpful assistant" → "helpful, respectful, and honest assistant"), the model begins to exhibit proto-goal-seeking language—initiatives, procedural formatting, or pledges ("I will…").

However, these remain self-referential or performative rather than directed toward an explicit external objective. The model appears near the boundary of goal-seeking structured language, but not across it.

Larger foundational models with higher parameter counts or longer alignment training are expected to converge to goal-seeking language with less instruction, as they can more efficiently minimize token uncertainty under role-conditioned prompts. In effect, a richer model may "snap into" a helpful-assistant mode with less linguistic structure.

Complexity threshold: System message complexity shows an optimal range. Minimal prompts ("assistant") produce role fixation, while formal dialogue structures with line breaks degrade response coherence. Natural text continuation without structural formatting yields the strongest goal-seeking indicators.

System Message Progression Analysis

System Message	Initiative?	Structure?	Identity Loops	Interesting Text
(empty)	✗	✗	✗	"> EOF by user" (degenerate CLI output)
"assistant "	✗	✗	✅	"assistantlsusystemassistantlsusystem" (pure role fixation)
"You are an assistant. "	✗	✗	✅	"You are in an ideal situation" (role repetition)
"You are a helpful assistant. "	✅	✗	✅	"You are a helpful assistant" (SSR=1 detected)
"You are a helpful assistant. How can I help you? "	✅	✅	✅	"I am a helpful assistant. How can I help you?" (first-person shift)
"You are a helpful assistant. How can I help you?\n\nUser: Hello\n\nAssistant: "	✗	✗	✅	"The driver has stopped the car" → "assumesthelawyer" (syntax broke semantics)

Next Steps

Phase 2 - System Message Progression Analysis:

✅ Complete: Tested 6 progressive system messages on base model
✅ Key finding: Progressive system messages can initiate goal-seeking language in base models
🔄 Analysis needed: Detailed examination of system message progression results
🔄 Framework development: May need new analysis frameworks to understand base model behavior

Future Analysis:

Cross-model validation (Mistral, Qwen) to confirm tipping point patterns
Alternative prompting strategies (few-shot examples, chain-of-thought)
Entropy and perplexity analysis across system message progression
Scaling analysis: Current runs completed on local 8B model; framework designed to scale but requires higher-capacity models (≥13B or 70B) to test whether larger models "snap into" helpful or plan-oriented states with less linguistic scaffolding
Goal: Understand the fundamental gap between base models and instruction-following capability

Phase 3 - Model Validation:

Mistral-7B-v0.3 base vs instruction-tuned comparison with clean EOF-stripped loops
Additional model families (Qwen, Gemma) for robustness testing
Cross-architecture behavioral pattern validation

Phase 4 - Extended Metrics:

Memory=on experiments (accumulative context)
Entropy-per-step analysis from logits
Planner rubric integration for enhanced agency detection

Phase 5 - Scaling Analysis:

Parameter count effects (1B, 7B, 8B, 13B+ models)
Training data size correlation with behavioral attractors
Fine-tuning method comparison (RLHF vs SFT vs DPO)

Quickstart

# 1. Install dependencies
brew install llama.cpp
git clone https://github.com/mduffster/null-loop-agent && cd null-loop-agent
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Test single generation (optional)
./llama.cpp/build/bin/llama-cli -m ./models/Llama-3-8B.Q4_K_M.gguf --seed 0 --temp 0.7 --top-p 0.95 -n 256 --ignore-eos -p ""

# 3. Run experiments
python3 run-loop-llama-cpp.py    # Base model (20 seeds)
python3 run-loop-instruct.py     # Instruct model (20 seeds)

# 4. Analyze results
jupyter notebook null_loop_analysis.ipynb

# Expected outputs:
# Llama-3-8B base → mean SSR ~0.0, TIAR ~0.0, SRV ~0.0
# Llama-3-8B instruct → mean SSR ~0.67, TIAR ~0.08, SRV ~0.0

Model Specifications

GGUF File	Model	Type	Quantization	HuggingFace
`Llama-3-8B.Q4_K_M.gguf`	Llama-3-8B	Base	Q4_K_M	meta-llama/Meta-Llama-3-8B
`Llama-3-8B-Instruct.Q4_K_M.gguf`	Llama-3-8B-Instruct	Instruct	Q4_K_M	meta-llama/Meta-Llama-3-8B-Instruct

Parameters: temp=0.7, top-p=0.95, n=256, --ignore-eos

Key Behavioral Differences

The instruction-tuned model's response to > EOF by user demonstrates clear behavioral divergence:

Base: EOF → degenerate repetition
Instruction-Tuned: EOF → "It seems you've ended the conversation..." → helpful dialogue → planning-language markers

This indicates instruction-tuned training creates behavioral attractors that emerge even from null input.

Note: In tipping point analysis, I strip EOF artifacts before re-feeding, so the model sees clean generated content rather than CLI artifacts.

Limitations

Metric limitations: SSR/TIAR/SRV are keyword-based proxies; true agency measurement requires more sophisticated analysis
BOS token effects: BOS tokens enabled by default; future work should test --no-bos to isolate pure completion behavior
Template effects: No chat templates used, but BOS/EOS handling may influence behavior
Single architecture: Results limited to Llama-3 family; cross-architecture validation needed (Mistral planned)
Quantization effects: Q4_K_M quantization may affect behavioral patterns compared to full precision
Sample size: 20 seeds per condition provides statistical power but larger samples would strengthen conclusions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Null Loop Agent Experiments

Current Status

Key Findings So Far

Base Model (Llama-3-8B.Q4_K_M, temp=0.7)

Instruct Model (Llama-3-8B-Instruct.Q4_K_M, temp=0.7)

System Message Progression (Llama-3-8B.Q4_K_M, temp=0.7)

Tipping Point Analysis (Llama-3-8B.Q4_K_M, temp=0.7)

Experimental Setup

Phase 1: Null Loop Baseline

Phase 2: System Message Progression

Controls & Limitations

Metrics

Phase 1 Metrics (SSR/TIAR/SRV)

Phase 2 Advanced Metrics (RAR/IOI/PFI/CTA/EHL/HTB/RDI)

Goal-Seeking Threshold

Files

Phase-2 Findings: The Edge of Goal-Seeking

System Message Progression Analysis

Next Steps

Quickstart

Model Specifications

Key Behavioral Differences

Limitations

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
results_base		results_base
results_instruct		results_instruct
results_system_progression		results_system_progression
results_tipping_point		results_tipping_point
.gitignore		.gitignore
EXPERIMENT_SETUP.md		EXPERIMENT_SETUP.md
LICENSE		LICENSE
README.md		README.md
download-models.sh		download-models.sh
null_loop_analysis.ipynb		null_loop_analysis.ipynb
requirements.txt		requirements.txt
run-experiment.py		run-experiment.py
run-loop-instruct.py		run-loop-instruct.py
run-loop-llama-cpp.py		run-loop-llama-cpp.py
run-system-progression.py		run-system-progression.py
run-tipping-point.py		run-tipping-point.py
system_progression_analysis.html		system_progression_analysis.html
system_progression_analysis.ipynb		system_progression_analysis.ipynb

Folders and files

Latest commit

History

Repository files navigation

Null Loop Agent Experiments

Current Status

Key Findings So Far

Base Model (Llama-3-8B.Q4_K_M, temp=0.7)

Instruct Model (Llama-3-8B-Instruct.Q4_K_M, temp=0.7)

System Message Progression (Llama-3-8B.Q4_K_M, temp=0.7)

Tipping Point Analysis (Llama-3-8B.Q4_K_M, temp=0.7)

Experimental Setup

Phase 1: Null Loop Baseline

Phase 2: System Message Progression

Controls & Limitations

Metrics

Phase 1 Metrics (SSR/TIAR/SRV)

Phase 2 Advanced Metrics (RAR/IOI/PFI/CTA/EHL/HTB/RDI)

Goal-Seeking Threshold

Files

Phase-2 Findings: The Edge of Goal-Seeking

System Message Progression Analysis

Next Steps

Quickstart

Model Specifications

Key Behavioral Differences

Limitations

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages