Support Qwen3-32B-AWQ?

Is AutoAWQ not supported? Or is the problem on the Qwen side?

```
 ➤  heretic Qwen/Qwen3-32B-AWQ
█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀  v1.1.0
█▀█░█▀▀░█▀▄░█▀▀░░█░░█░█░░
▀░▀░▀▀▀░▀░▀░▀▀▀░░▀░░▀░▀▀▀  https://github.com/p-e-w/heretic

GPU type: Tesla V100-SXM2-16GB

Loading model Qwen/Qwen3-32B-AWQ...
model.safetensors.index.json: 132kB [00:00, 117MB/s]
model-00004-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.41G/4.41G [20:20<00:00, 3.61MB/s]
model-00003-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.95G/4.95G [21:04<00:00, 3.91MB/s]
model-00002-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.00G/5.00G [21:15<00:00, 3.92MB/s]
model-00001-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.97G/4.97G [21:16<00:00, 3.89MB/s]
Fetching 4 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [21:18<00:00, 319.63s/it]
/home/debian/.pyvenv/python3.14/lib/python3.14/site-packages/awq/__init__.py:21: DeprecationWarning:
I have left this message as the final dev message to help you transition.

Important Notice:
- AutoAWQ is officially deprecated and will no longer be maintained.
- The last tested configuration used Torch 2.6.0 and Transformers 4.51.3.
- If future versions of Transformers break AutoAWQ compatibility, please report the issue to the Transformers project.

Alternative:
- AutoAWQ has been adopted by the vLLM Project: https://github.com/vllm-project/llm-compressor

For further inquiries, feel free to reach out:
- X: https://x.com/casper_hansen_
- LinkedIn: https://www.linkedin.com/in/casper-hansen-804005170/

  warnings.warn(_FINAL_DEV_MESSAGE, category=DeprecationWarning, stacklevel=1)
Failed (cannot import name 'PytorchGELUTanh' from 'transformers.activations' (/home/debian/.pyvenv/python3.14/lib/python3.14/site-packages/transformers/activations.py))
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:06<00:00,  1.69s/it]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 239/239 [00:00<00:00, 2.23MB/s]
Failed (at 70:4:

    offsets_k = pid_z * BLOCK_SIZE_K + tl.arange(0, BLOCK_SIZE_K)
    offsets_a = K * offsets_am[:, None] + offsets_k[None, :]
    offsets_b = (N // 8) * offsets_k[:, None] + offsets_bn[None, :]

    a_ptrs = a_ptr + offsets_a
    b_ptrs = b_ptr + offsets_b

    # NOTE: Use this in TRITON_INTERPRET=1 mode instead of tl.cdiv
    # block_offset = BLOCK_SIZE_K * SPLIT_K
    # for k in range(0, (K + block_offset - 1) // (block_offset)):
    for k in range(0, tl.cdiv(K, BLOCK_SIZE_K * SPLIT_K)):
    ^
AttributeError("module 'ast' has no attribute 'Num'"))
* Trying dtype bfloat16... /home/debian/.pyvenv/python3.14/lib/python3.14/site-packages/accelerate/utils/modeling.py:1566: UserWarning: Current model requires 8106147968 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.
  warnings.warn(
Failed (You are attempting to load an AWQ model with a device_map that contains a CPU or disk device. This is not supported. Please remove the CPU or disk device from the device_map.)
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:44<00:00, 26.15s/it]
Failed (at 70:4:

    offsets_k = pid_z * BLOCK_SIZE_K + tl.arange(0, BLOCK_SIZE_K)
    offsets_a = K * offsets_am[:, None] + offsets_k[None, :]
    offsets_b = (N // 8) * offsets_k[:, None] + offsets_bn[None, :]

    a_ptrs = a_ptr + offsets_a
    b_ptrs = b_ptr + offsets_b

    # NOTE: Use this in TRITON_INTERPRET=1 mode instead of tl.cdiv
    # block_offset = BLOCK_SIZE_K * SPLIT_K
    # for k in range(0, (K + block_offset - 1) // (block_offset)):
    for k in range(0, tl.cdiv(K, BLOCK_SIZE_K * SPLIT_K)):
    ^
AttributeError("module 'ast' has no attribute 'Num'"))
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/debian/.pyvenv/python3.14/bin/heretic:10 in <module>                                       │
│                                                                                                  │
│    7 │   │   sys.argv[0] = sys.argv[0][:-11]                                                     │
│    8 │   elif sys.argv[0].endswith(".exe"):                                                      │
│    9 │   │   sys.argv[0] = sys.argv[0][:-4]                                                      │
│ ❱ 10 │   sys.exit(main())                                                                        │
│   11                                                                                             │
│                                                                                                  │
│ /home/debian/.pyvenv/python3.14/lib/python3.14/site-packages/heretic/main.py:576 in main         │
│                                                                                                  │
│   573 │   install()                                                                              │
│   574 │                                                                                          │
│   575 │   try:                                                                                   │
│ ❱ 576 │   │   run()                                                                              │
│   577 │   except BaseException as error:                                                         │
│   578 │   │   # Transformers appears to handle KeyboardInterrupt (or BaseException)              │
│   579 │   │   # internally in some places, which can re-raise a different error in the handler   │
│                                                                                                  │
│ /home/debian/.pyvenv/python3.14/lib/python3.14/site-packages/heretic/main.py:133 in run          │
│                                                                                                  │
│   130 │   # Silence the warning about multivariate TPE being experimental.                       │
│   131 │   warnings.filterwarnings("ignore", category=ExperimentalWarning)                        │
│   132 │                                                                                          │
│ ❱ 133 │   model = Model(settings)                                                                │
│   134 │                                                                                          │
│   135 │   print()                                                                                │
│   136 │   print(f"Loading good prompts from [bold]{settings.good_prompts.dataset}[/]...")        │
│                                                                                                  │
│ /home/debian/.pyvenv/python3.14/lib/python3.14/site-packages/heretic/model.py:92 in __init__     │
│                                                                                                  │
│    89 │   │   │   break                                                                          │
│    90 │   │                                                                                      │
│    91 │   │   if self.model is None:                                                             │
│ ❱  92 │   │   │   raise Exception("Failed to load model with all configured dtypes.")            │
│    93 │   │                                                                                      │
│    94 │   │   print(f"* Transformer model with [bold]{len(self.get_layers())}[/] layers")        │
│    95 │   │   print("* Abliterable components:")                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: Failed to load model with all configured dtypes.
```
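Two of the failures in the log look like environment incompatibilities rather than anything specific to Qwen: the Triton kernel compile fails because `ast.Num` does not exist on this interpreter (it was removed in Python 3.14), and the `awq` package cannot import `PytorchGELUTanh` from `transformers.activations`. Below is a minimal diagnostic sketch, not part of heretic, that checks for both conditions. The function name `summarize` and the message wording are hypothetical; only the names `ast.Num` and `PytorchGELUTanh` come from the log.

```python
import ast
import importlib
import sys


def summarize(python_version, has_ast_num, has_gelu_tanh):
    """Turn raw environment checks into notes (hypothetical helper).

    has_gelu_tanh may be None when transformers is not installed,
    in which case no note is produced for it.
    """
    notes = []
    major, minor = python_version
    if not has_ast_num:
        # Matches: AttributeError("module 'ast' has no attribute 'Num'")
        notes.append(
            f"ast.Num is missing on Python {major}.{minor}; code that still "
            "references it fails at compile time."
        )
    if has_gelu_tanh is False:
        # Matches: cannot import name 'PytorchGELUTanh' from 'transformers.activations'
        notes.append(
            "transformers.activations has no PytorchGELUTanh; the installed "
            "transformers is likely newer than the version awq was tested with."
        )
    return notes


def probe_transformers():
    """Return True/False for the symbol, or None if transformers is unavailable."""
    try:
        activations = importlib.import_module("transformers.activations")
    except Exception:
        return None
    return hasattr(activations, "PytorchGELUTanh")


if __name__ == "__main__":
    found = summarize(sys.version_info[:2], hasattr(ast, "Num"), probe_transformers())
    for note in found or ["No known incompatibility detected by this sketch."]:
        print(note)
```

Running this inside the same virtual environment as heretic should show whether the interpreter and `transformers` versions are the likely cause. The deprecation notice in the log states that AutoAWQ was last tested with Torch 2.6.0 and Transformers 4.51.3, so an environment closer to that (on an older Python) may be worth trying.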

Support Qwen3-32B-AWQ? #102
