[gemma3] Add text-only runner for gemma-3-1B-it model by seyeong-han · Pull Request #16885 · pytorch/executorch

seyeong-han · 2026-01-26T22:32:31Z

Summary

This PR adds support for running the Gemma-3-1B-IT text-only model on ExecuTorch with the CPU backend. The new gemma3_text_runner provides a lightweight alternative to the existing multimodal gemma3_e2e_runner and does not require image-processing dependencies.

Dependencies

⚠️ Required: This PR depends on huggingface/optimum-executorch#206, which adds proper EOS token handling for Gemma models.

The optimum-executorch PR modifies utils.py to include the <end_of_turn> token (ID 106) in get_eos_ids for Gemma models. Without this change, the text runner will not stop generation at <end_of_turn> and will continue until max_new_tokens is reached.
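
The effect of the missing EOS ID can be illustrated with a small sketch. This is a toy decode loop, not the ExecuTorch runner and not the optimum-executorch `get_eos_ids` implementation; `generate`, `scripted`, and the token stream are hypothetical names and data used only for illustration. Only the ID 106 for <end_of_turn> comes from the description above.

```python
from typing import Callable, Iterable, List, Set

END_OF_TURN_ID = 106  # <end_of_turn> ID quoted in this PR description


def generate(next_token: Callable[[], int], eos_ids: Set[int], max_new_tokens: int) -> List[int]:
    """Toy decode loop: stop on any EOS ID, or after max_new_tokens."""
    out: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token()
        out.append(tok)
        if tok in eos_ids:
            break
    return out


def scripted(tokens: Iterable[int]) -> Callable[[], int]:
    """Hypothetical stand-in for a model: replays a fixed token stream."""
    it = iter(tokens)
    return lambda: next(it)


# Made-up token stream: three content tokens, <end_of_turn>, then filler.
stream = [11, 12, 13, END_OF_TURN_ID, 14, 15, 16, 17]

# With <end_of_turn> registered as an EOS ID, generation stops at it.
with_fix = generate(scripted(stream), eos_ids={END_OF_TURN_ID}, max_new_tokens=8)
# Without it, the loop only ends when max_new_tokens is exhausted.
without_fix = generate(scripted(stream), eos_ids=set(), max_new_tokens=8)

print(len(with_fix), len(without_fix))
```

In this sketch the first call returns four tokens ending in 106, while the second consumes the full budget of eight, which mirrors the behavior described above.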

Changes

New Files

examples/models/gemma3/text_runner.cpp - Text-only inference runner with:
- gflags command-line interface (model_path, tokenizer_path, prompt, temperature, max_new_tokens, cpu_threads, warmup)
- Gemma3 chat template formatting (<start_of_turn>user\n...<end_of_turn>\n<start_of_turn>model\n)
- Integration with TextLLMRunner for text generation
- Threadpool configuration support
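
As an illustration of the chat template listed above, here is a minimal single-turn sketch in Python. The runner itself is C++ and its implementation may differ; `format_gemma3_prompt` is a hypothetical helper name, and the sketch assumes nothing beyond the template string shown in the list.

```python
def format_gemma3_prompt(user_prompt: str) -> str:
    """Wrap a single user message in the Gemma3 turn markers.

    Hypothetical helper: mirrors the template string from the PR
    description, not the actual code in text_runner.cpp.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_prompt}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


formatted = format_gemma3_prompt("What is the capital of France?")
print(formatted)
```

The printed text matches the prompt echoed at the start of the model output in the Result section.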

Modified Files

examples/models/gemma3/CMakeLists.txt - Added gemma3_text_runner executable target
examples/models/gemma3/CMakePresets.json - Added gemma3-text-cpu build and workflow presets
Makefile - Added gemma3-text-cpu target
examples/models/gemma3/README.md - Comprehensive documentation for both models

Test Plan

# Build
make gemma3-text-cpu

# Export (requires optimum-executorch with PR #206 merged)
optimum-cli export executorch \
  --model "google/gemma-3-1b-it" \
  --task "text-generation" \
  --recipe "xnnpack" \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --output_dir="gemma-3/gemma-3-1b-it"

# Download tokenizer
curl -L https://huggingface.co/google/gemma-3-1b-it/resolve/main/tokenizer.json -o gemma-3/tokenizer.json

# Run
./cmake-out/examples/models/gemma3/gemma3_text_runner \
  --model_path=gemma-3/gemma-3-1b-it/model.pte \
  --tokenizer_path=gemma-3/tokenizer.json \
  --prompt="What is the capital of France?" \
  --max_new_tokens=50

Result

./cmake-out/examples/models/gemma3/gemma3_text_runner \
    --model_path=/Users/younghan/project/executorch/gemma-3/gemma-3-1b-it/model.pte \
    --tokenizer_path=/Users/younghan/project/executorch/gemma-3/tokenizer.json \
    --prompt="What is the capital of France?" \
    --max_new_tokens=50 --warmup

I tokenizers:regex.cpp:27] Registering override fallback regex
I tokenizers:hf_tokenizer.cpp:142] Setting up normalizer...
I tokenizers:hf_tokenizer.cpp:146] Normalizer set up
I tokenizers:hf_tokenizer.cpp:160] Setting up pretokenizer...
I tokenizers:hf_tokenizer.cpp:164] Pretokenizer set up
I tokenizers:hf_tokenizer.cpp:180] Loading BPE merges...
I tokenizers:hf_tokenizer.cpp:240] Loaded 513511 BPE merge rules
I tokenizers:hf_tokenizer.cpp:252] Built merge ranks map with 236249 entries
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1769466105.443906 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.443994 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444053 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444080 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444089 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444107 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: '' -> ''
E0000 00:00:1769466105.444161 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444169 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444189 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444194 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444206 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: 'user' -> 'user'
E0000 00:00:1769466105.444228 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444249 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444748 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.444769 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: 'What is the capital of France?' -> 'What▁is▁the▁capital▁of▁France?'
E0000 00:00:1769466105.445614 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.445625 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.445639 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: '' -> ''
E0000 00:00:1769466105.445653 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466105.445667 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: '' -> ''
E0000 00:00:1769466105.445676 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: 'model' -> 'model'

E0000 00:00:1769466106.615110 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615139 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615147 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615168 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615173 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615185 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: '' -> ''
E0000 00:00:1769466106.615198 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615205 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615225 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615229 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615245 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: 'user' -> 'user'
E0000 00:00:1769466106.615256 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615281 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615287 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615300 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: 'What is the capital of France?' -> 'What▁is▁the▁capital▁of▁France?'
E0000 00:00:1769466106.615351 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615358 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615371 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: '' -> ''
E0000 00:00:1769466106.615381 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
E0000 00:00:1769466106.615391 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: '' -> ''
E0000 00:00:1769466106.615399 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
I tokenizers:hf_tokenizer.cpp:415] normalized input: 'model' -> 'model'
<start_of_turn>user
What is the capital of France?<end_of_turn>
<start_of_turn>model
The capital of France is **Paris**.

<end_of_turn>

PyTorchObserver {"prompt_tokens":15,"generated_tokens":9,"model_load_start_ms":0,"model_load_end_ms":0,"inference_start_ms":1769466106615,"inference_end_ms":1769466107233,"prompt_eval_end_ms":1769466106746,"first_token_ms":1769466106746,"aggregate_sampling_time_ms":4,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
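
The PyTorchObserver line above can be turned into latency and throughput figures with a short script. The sketch below assumes that prefill spans inference_start_ms to prompt_eval_end_ms and that decode spans prompt_eval_end_ms to inference_end_ms; that reading of the fields is an assumption, not something this PR states.

```python
import json

# JSON payload copied from the PyTorchObserver line in the Result section.
observer_json = (
    '{"prompt_tokens":15,"generated_tokens":9,"model_load_start_ms":0,'
    '"model_load_end_ms":0,"inference_start_ms":1769466106615,'
    '"inference_end_ms":1769466107233,"prompt_eval_end_ms":1769466106746,'
    '"first_token_ms":1769466106746,"aggregate_sampling_time_ms":4,'
    '"SCALING_FACTOR_UNITS_PER_SECOND":1000}'
)
stats = json.loads(observer_json)
scale = stats["SCALING_FACTOR_UNITS_PER_SECOND"]  # timestamp units per second

# Assumed phase boundaries (see the lead-in): prefill, then decode.
prefill_s = (stats["prompt_eval_end_ms"] - stats["inference_start_ms"]) / scale
decode_s = (stats["inference_end_ms"] - stats["prompt_eval_end_ms"]) / scale
ttft_s = (stats["first_token_ms"] - stats["inference_start_ms"]) / scale

prefill_tps = stats["prompt_tokens"] / prefill_s
decode_tps = stats["generated_tokens"] / decode_s

print(f"time to first token: {ttft_s:.3f} s")
print(f"prefill: {prefill_tps:.1f} tokens/s")
print(f"decode:  {decode_tps:.1f} tokens/s")
```

Under these assumptions, this run took about 131 ms to the first token, with roughly 114.5 tokens/s during prefill and 18.5 tokens/s during decode.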

pytorch-bot · 2026-01-26T22:32:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16885

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 690d2dd with merge base ecc7dd0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-01-26T22:33:19Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

seyeong-han · 2026-02-02T21:56:42Z

I need to update this runner to use the Jinja chat-template format, similar to this PR.

add text-only runner for gemma-3-1B-it model

690d2dd

seyeong-han requested review from kirklandsign, larryliu0820 and lucylq as code owners January 26, 2026 22:32

meta-cla bot added the CLA Signed label Jan 26, 2026

mergennachin self-requested a review January 30, 2026 20:52

seyeong-han marked this pull request as draft February 2, 2026 21:56

seyeong-han wants to merge 1 commit into pytorch:main from seyeong-han:gemma3-text-runner
