timing updates, updating the script for llama functionality by doquangg · Pull Request #39 · feifeibear/LLMSpeculativeSampling

doquangg · 2025-04-01T17:33:11Z

I tried to run main.py against my local Llama 2 models from Hugging Face, but it failed because of an incompatibility with the previous implementation, specifically:


line 62, in _forward_with_kvcache
    self._past_key_values = self._past_key_values + (outputs.past_key_values,)
                            ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for +: 'DynamicCache' and 'tuple'
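For context, the traceback shows a cache object being concatenated with a tuple, which only works when both sides are tuples. Below is a minimal sketch of one way to handle both cache styles when trimming the cache after rejected draft tokens. It is not the code in this PR. `StandInCache`, `rollback`, and `cached_length` are hypothetical names, plain lists stand in for tensors, and the sketch assumes the cache object exposes `crop` and `get_seq_length` methods, which should be checked against the installed transformers version.

```python
from typing import Any, List, Tuple

# Legacy layout: one (key, value) pair per layer. Plain lists stand in for
# tensors here, with one list element per cached token position.
LegacyCache = Tuple[Tuple[List[int], List[int]], ...]


class StandInCache:
    """Hypothetical stand-in for a cache object such as DynamicCache."""

    def __init__(self, positions: List[int]) -> None:
        self.positions = list(positions)

    def get_seq_length(self) -> int:
        return len(self.positions)

    def crop(self, max_length: int) -> None:
        # Drop every cached position at index max_length or later, in place.
        del self.positions[max_length:]


def rollback(cache: Any, end_pos: int) -> Any:
    """Keep only the first end_pos cached positions, whatever the cache type."""
    if hasattr(cache, "crop"):
        cache.crop(end_pos)  # cache objects are trimmed in place
        return cache
    # Legacy tuples are immutable, so build a trimmed copy layer by layer.
    return tuple((k[:end_pos], v[:end_pos]) for k, v in cache)


def cached_length(cache: Any) -> int:
    """Number of cached positions for either cache style."""
    if hasattr(cache, "get_seq_length"):
        return cache.get_seq_length()
    return len(cache[0][0])
```

With real tensors, the legacy branch would slice along the sequence axis instead of a list, and which axis that is depends on the model architecture.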

I've added DynamicCache support so that this works with Llama 2 while keeping the Bloom functionality intact. Additionally, I've added corrected timing measurements to the main.py script, so the end user gets quantitative numbers for the speedup from speculative decoding.
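As an illustration of the kind of measurement meant here, the following is a minimal sketch of timing a generation function and reporting a speedup. It is not the code added to main.py. `benchmark` and `speedup` are hypothetical helpers, and `generate` is assumed to be any zero-argument callable that returns the number of tokens it produced.

```python
import time
from typing import Callable, Tuple


def benchmark(generate: Callable[[], int], repeats: int = 3) -> Tuple[float, float]:
    """Return (mean seconds per run, tokens per second) over `repeats` runs."""
    total_seconds = 0.0
    total_tokens = 0
    for _ in range(repeats):
        # On a GPU, kernels run asynchronously, so a real script would
        # synchronize the device before reading the clock on both sides.
        start = time.perf_counter()
        total_tokens += generate()
        total_seconds += time.perf_counter() - start
    if total_seconds <= 0.0:
        raise ValueError("elapsed time too small to measure")
    return total_seconds / repeats, total_tokens / total_seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0.0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds
```

Calling `benchmark` once for autoregressive sampling and once for speculative sampling, then passing the two mean times to `speedup`, gives a single ratio to report. A warm-up run before timing helps keep one-time setup cost out of the numbers.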

Please let me know if you have any questions.

timing updates, updating the script for bloom functionality

7c8146a

doquangg wants to merge 1 commit into feifeibear:main from doquangg:bloom_and_timing_updates
