Commit 079799c
Run sampler as a method if available (#16888)
With this PR: huggingface/optimum-executorch#207
we are adding a new method, "sampler", to ASR models, alongside
"encoder" and "text_decoder". The flow becomes: if temperature is 0 and
the sampler method is available, run that method; otherwise, fall back to
the old path. This change should substantially improve performance on CUDA,
since logits no longer have to be copied from device to CPU for
sampling.
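The dispatch rule described above can be sketched roughly as follows. This is a minimal illustration, not the runner code from this commit: `FakeAsrModel`, `SamplingPath`, `choose_sampling_path`, and `argmax_on_host` are hypothetical names, and the host-side argmax is only an assumed stand-in for the greedy case of the old path.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded model; NOT the ExecuTorch Module API.
struct FakeAsrModel {
  std::set<std::string> methods;  // e.g. "encoder", "text_decoder", "sampler"

  bool has_method(const std::string& name) const {
    return methods.count(name) > 0;
  }
};

enum class SamplingPath { kSamplerMethod, kHostFallback };

// Use the exported "sampler" method only when temperature is 0 and the
// model actually provides that method; otherwise keep the old path.
SamplingPath choose_sampling_path(const FakeAsrModel& model, float temperature) {
  if (temperature == 0.0f && model.has_method("sampler")) {
    return SamplingPath::kSamplerMethod;
  }
  return SamplingPath::kHostFallback;
}

// Assumed shape of the greedy host fallback: argmax over logits that have
// already been copied to CPU memory. Returns 0 for an empty vector.
int64_t argmax_on_host(const std::vector<float>& logits) {
  return std::distance(logits.begin(),
                       std::max_element(logits.begin(), logits.end()));
}
```

In this sketch, a model exported without "sampler", or any run with a non-zero temperature, takes the fallback branch, which matches the "otherwise" case in the description.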
Benchmark result on RTX 5080:
```
======================================================================
BENCHMARK SUMMARY
======================================================================
Total runs: 30
Generated tokens per run: 104
THROUGHPUT (tokens/sec):
Min: 793.89 t/s
Max: 845.53 t/s
Mean: 820.35 t/s
Stdev: 11.86 t/s
MODEL LOAD TIME (ms):
Min: 620 ms
Max: 2170 ms
Mean: 700 ms
Stdev: 279 ms
ENCODE TIME (ms, inference_start to prompt_eval_end):
Min: 36 ms
Max: 38 ms
Mean: 37 ms
Stdev: 1 ms
DECODE TIME (ms, prompt_eval_end to inference_end):
Min: 123 ms
Max: 131 ms
Mean: 127 ms
Stdev: 2 ms
======================================================================
```
1 parent 429f014 commit 079799c
4 files changed: +94 -21 lines changed
- .ci/docker/ci_commit_pins
- backends/cuda/runtime
- extension/asr/runner