
timing updates, updating the script for llama functionality #39

Open
doquangg wants to merge 1 commit into feifeibear:main from doquangg:bloom_and_timing_updates


doquangg commented Apr 1, 2025

I tried to run main.py against my local Llama 2 models from Hugging Face, but it failed because the existing KV-cache handling is incompatible with them:


```
line 62, in _forward_with_kvcache
self._past_key_values = self._past_key_values + (outputs.past_key_values,)
~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for +: 'DynamicCache' and 'tuple'
```
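The traceback arises because recent transformers releases can return `past_key_values` as a `DynamicCache` object instead of the legacy tuple of per-layer `(key, value)` tensors, so tuple operations such as `+` no longer work on it. Below is a minimal sketch of one way to keep both formats working when rolling the cache back to a prefix (the core operation speculative decoding needs after rejected draft tokens). `rollback_kv_cache` and its `is_bloom` flag are hypothetical names, not code from this PR. The sketch assumes that a `DynamicCache` is recognized by its `crop(max_length)` method, that legacy caches use the `(batch, heads, seq, head_dim)` layout, and that BLOOM uses the older layout noted in the comments.

```python
from typing import Any


def rollback_kv_cache(cache: Any, end_pos: int, is_bloom: bool = False) -> Any:
    """Trim a KV cache so it only covers the first `end_pos` positions.

    Hypothetical helper (not from this PR). Assumes either:
      * a cache object with a `crop(max_length)` method that trims in place,
        as transformers' DynamicCache provides, or
      * a legacy tuple of per-layer (key, value) tensors.
    """
    if hasattr(cache, "crop"):
        # DynamicCache-style object: trim in place and keep the same object,
        # rather than concatenating tuples onto it (which raises TypeError).
        cache.crop(end_pos)
        return cache

    trimmed = []
    for key, value in cache:
        if is_bloom:
            # Assumed older BLOOM legacy layout:
            # key (batch*heads, head_dim, seq), value (batch*heads, seq, head_dim)
            key = key[:, :, :end_pos]
            value = value[:, :end_pos, :]
        else:
            # Assumed common legacy layout (e.g. Llama): (batch, heads, seq, head_dim)
            key = key[:, :, :end_pos, :]
            value = value[:, :, :end_pos, :]
        trimmed.append((key, value))
    return tuple(trimmed)
```

Dispatching on the presence of `crop` rather than on the model class keeps the helper independent of which transformers version produced the cache; an `isinstance` check against `transformers.DynamicCache` would work equally well if the dependency is acceptable.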

I've added support for DynamicCache so the code works with Llama 2 while preserving the BLOOM functionality. I've also corrected the timing measurements in the main.py script, so users can quantify how much speculative decoding reduces generation time.
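A minimal sketch of how such a before/after comparison could be measured with only the standard library. `best_time` and `measure_speedup` are hypothetical helpers, not the functions in this PR's main.py; each callable is assumed to run one complete generation with the same prompt and token budget, so the ratio of the two times is the speedup.

```python
import time
from typing import Callable, Tuple


def best_time(fn: Callable[[], object], warmup: int = 1, repeats: int = 3) -> float:
    """Return the fastest wall-clock time (seconds) of `fn` over `repeats` runs.

    Warm-up runs are discarded so one-time costs (e.g. lazy initialization)
    do not inflate the measurement.
    """
    for _ in range(warmup):
        fn()
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def measure_speedup(baseline: Callable[[], object],
                    speculative: Callable[[], object],
                    warmup: int = 1,
                    repeats: int = 3) -> Tuple[float, float, float]:
    """Time both decoders and return (baseline_s, speculative_s, speedup)."""
    base_s = best_time(baseline, warmup, repeats)
    spec_s = best_time(speculative, warmup, repeats)
    return base_s, spec_s, base_s / spec_s
```

To use it, wrap the plain autoregressive call and the speculative-sampling call in zero-argument lambdas. If the models run on a GPU, each callable should call `torch.cuda.synchronize()` before returning: CUDA kernels launch asynchronously, so without it the clock can stop before the work has actually finished.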

Please let me know if you have any questions.

