What?
(AFAIK) Some Hugging Face models are distributed with torch.bfloat16 weights, without the dtype being made explicit.
While running the Llama and Qwen examples, I hit this error:
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dayo/git/TICO/tico/quantization/wrapq/wrappers/nn/quant_linear.py", line 57, in forward
w = self.obs_weight.fake_quant(w)
File "/home/dayo/git/TICO/tico/quantization/wrapq/observers/affine_base.py", line 152, in fake_quant
return torch.fake_quantize_per_channel_affine(
RuntimeError: !needs_dynamic_casting<func_t>::check(iter) INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cpu/Loops.h":311, please report a bug to PyTorch. After changing dtypes for model/inputs into torch.float32, it passed.
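For context, a minimal sketch of the failure mode and the float32 workaround; the shapes, scales, zero points, axis, and quant range below are illustrative and not taken from the wrapq code:

```python
import torch

# bfloat16 weight, as loaded from a Hugging Face checkpoint (illustrative shape)
w_bf16 = torch.randn(8, 16, dtype=torch.bfloat16)

# Per-output-channel quantization parameters (illustrative values)
scale = torch.full((8,), 0.01)                  # float32 scale per output channel
zero_point = torch.zeros(8, dtype=torch.int32)  # zero point per output channel

# On CPU this call hits the INTERNAL ASSERT above when the input is bfloat16:
# torch.fake_quantize_per_channel_affine(w_bf16, scale, zero_point, 0, -128, 127)

# Workaround: cast to float32 before fake-quantizing (or convert the whole
# model/inputs with model.to(torch.float32), as described above).
w_fq = torch.fake_quantize_per_channel_affine(
    w_bf16.float(), scale, zero_point, 0, -128, 127
)
```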
/cc @mhs4670go
From this example: #485 (comment)