This repository was archived by the owner on Sep 18, 2025. It is now read-only.
Issue: Error when running calibration for the FP8 Quantization using INC notebook #145
Description
I'm trying to follow the example here: https://github.com/HabanaAI/Gaudi-tutorials/blob/main/PyTorch/vLLM_Tutorials/FP8_Quantization_using_INC/FP8_Quantization_using_INC.ipynb
but I get the error below when I run the calibration step:
```
./calibrate_model.sh -m $MODEL_NAME -d /root/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl -o g3 -b 128 -t 8 -l 1024
```
```
Processed prompts:   0%|          | 0/65 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/vllm-hpu-extension/calibration/step-2-measure-scales.py", line 81, in <module>
[rank0]:     generate_responses(llm, input_batch, args)
[rank0]:   File "/root/vllm-hpu-extension/calibration/step-2-measure-scales.py", line 25, in generate_responses
[rank0]:     responses = llm.generate(input_batch, sampling_params, use_tqdm=True)
[rank0]:   File "/root/vllm-fork/vllm/utils.py", line 1158, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/root/vllm-fork/vllm/entrypoints/llm.py", line 469, in generate
[rank0]:     outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]:   File "/root/vllm-fork/vllm/entrypoints/llm.py", line 1397, in _run_engine
[rank0]:     step_outputs = self.llm_engine.step()
[rank0]:   File "/root/vllm-fork/vllm/engine/llm_engine.py", line 1330, in step
[rank0]:     ) = self.scheduler[virtual_engine].schedule()
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 1392, in schedule
[rank0]:     scheduler_outputs: SchedulerOutputs = self._schedule()
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 1351, in _schedule
[rank0]:     return self._schedule_default()
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 1174, in _schedule_default
[rank0]:     prefills = self._schedule_prefills(budget,
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 1087, in _schedule_prefills
[rank0]:     or not budget.can_schedule(**can_schedule_kwargs)):
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 182, in can_schedule
[rank0]:     num_new_padded_tokens = padding_fn(new_batch_size, new_max_seq_len)
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 141, in _hpu_padding_fn
[rank0]:     return padded_bs * padded_seq
[rank0]: TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
```
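Reading the traceback, `_hpu_padding_fn` computes `padded_bs * padded_seq`, and the failure means `padded_bs` came back as `None`. A plausible cause (my assumption, not confirmed anywhere in this issue) is that the HPU bucket lookup found no batch-size bucket large enough for the requested batch, e.g. when the largest configured prompt batch-size bucket is smaller than the `-b 128` passed to the script. Here is a minimal, self-contained sketch of that failure mode; `find_bucket` and the bucket lists are hypothetical stand-ins, not vLLM's actual code:

```python
# Hypothetical sketch of how padded_bs can end up None (not vLLM code):
# a bucket lookup that returns the smallest bucket >= value, or None
# when the value exceeds every configured bucket.
from typing import Optional

def find_bucket(value: int, buckets: list[int]) -> Optional[int]:
    """Return the smallest bucket >= value, or None if none fits."""
    for b in sorted(buckets):
        if b >= value:
            return b
    return None  # value exceeds the largest configured bucket

bs_buckets = [1, 2, 4, 8, 16, 32, 64]    # assume a max batch bucket of 64
seq_buckets = [128, 256, 512, 1024]

padded_bs = find_bucket(128, bs_buckets)     # -> None, since 128 > 64
padded_seq = find_bucket(1024, seq_buckets)  # -> 1024

# Reproduces: TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
num_new_padded_tokens = padded_bs * padded_seq
```

If that is indeed the cause, lowering the calibration batch size (`-b`) or raising the batch-size bucket ceiling should let `padded_bs` resolve to a real bucket. The Gaudi vLLM fork configures bucket limits through environment variables (names like `VLLM_PROMPT_BS_BUCKET_MAX` appear in its docs, but verify against the fork version you are running).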