Optimize AES-GCM-SIV AES CTR step on x86 #7
Summary
I used the generator of AES CTR assembly from the Go standard library and added the flag `-le` to it to generate assembly for SIV. SIV is little-endian and uses only the low 32-bit part of the 16-byte block as the counter. When the low 32 bits overflow, the counter simply wraps, per RFC 8452. This is very convenient for filling the registers with counters: in the SIV path I load the value for the first block (via the pointer `blockPtr`) into an XMM register and then produce the counter values +1, +2, ..., +7 using the `PADDD` instruction with constants such as [1, 0, 0, 0], [2, 0, 0, 0], ...
Speedup
Performance is substantially improved on amd64:
Notes
My plan is to contribute this improvement to the Go standard library as well, which is why I used the generator for AES CTR mode and added a flag to it instead of writing something completely new. It would be great if this change could be incorporated into the pending CL 538396.