Description
Name and Version
/media/SSD_2T/Projects/llama_stepfun.cpp/build/bin/llama-cli --version
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
version: 8039 (e384c6f)
built with GNU 13.3.0 for Linux x86_64
Compiled with
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
rm -rf build && \
cmake -S . -B build -DGGML_HIP=ON -DLLAMA_BUILD_SERVER=ON -DLLAMA_BUILD_EXAMPLES=ON -DGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release \
&& cmake --build build --config Release -- -j 32
Run with
LLAMA_SET_ROWS=1 /media/SSD_2T/Projects/llama_stepfun.cpp/build/bin/llama-server \
--model /media/SSD_2T/models/Step3.5/Step-3.5-Flash-IQ4_XS-00001-of-00004.gguf \
--ctx-size 130000 \
--temp 1.0 \
--repeat-penalty 1.0 \
--min-p 0.01 \
--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 \
--batch-size 1928 \
--host 0.0.0.0 --port 1235
Operating systems
Linux
GGML backends
HIP
Hardware
MI50 32GB * 4, DDR4 512GB, Epyc 7532
Models
https://huggingface.co/ubergarm/Step-3.5-Flash-GGUF/tree/main/IQ4_XS
This mainline-compatible quant does not use an imatrix.
Problem description & steps to reproduce
The server crashes instantly on startup.
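Not part of the original report: a sketch of how a backtrace could be captured for a crash like this, assuming gdb is installed and the binary was built with debug symbols (e.g. -DCMAKE_BUILD_TYPE=RelWithDebInfo). The paths and flags are the ones quoted above; a trimmed-down flag set is used for illustration.

```shell
# Hypothetical debugging step (not from the original report): rerun the same
# server invocation under gdb so the instant crash yields a backtrace.
LLAMA_SET_ROWS=1 gdb --args /media/SSD_2T/Projects/llama_stepfun.cpp/build/bin/llama-server \
  --model /media/SSD_2T/models/Step3.5/Step-3.5-Flash-IQ4_XS-00001-of-00004.gguf \
  --ctx-size 130000 \
  --host 0.0.0.0 --port 1235
# Inside gdb: type "run"; after the crash, "bt full" prints the full backtrace.
```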
First Bad Commit
No response