@starius commented Oct 20, 2025

Summary

I took the AES-CTR assembly generator from the Go standard library and added a -le flag to it so that it can also generate assembly for AES-GCM-SIV.

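SIV is little-endian and uses only the low 32 bits (the first four bytes) of the 16-byte counter block as the counter. According to RFC 8452, when those 32 bits overflow they simply wrap around modulo 2^32, and the other 96 bits never change. This is actually very convenient for filling registers with counters. In SIV mode I just load the value for the first block (via the pointer blockPtr) into an XMM register and then produce the counter values +1, +2, ... up to +7 with the PADDD instruction and constants [1, 0, 0, 0], [2, 0, 0, 0], and so on. PADDD adds each 32-bit lane independently, wrapping on overflow and never carrying into the neighbouring lane, which is exactly the wrap-around behaviour RFC 8452 requires.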

Speedup

Throughput on amd64 improves by roughly 2.4x for 1 KiB messages and roughly 3.6x for 8 KiB messages (benchstat, n=20):

                          base B/s       │  new B/s       vs base
Seal1K_AES_GCM_SIV_128-2  588.3Mi ± 0%   1418.1Mi ± 1%  +141.05% (p=0.000 n=20)
Open1K_AES_GCM_SIV_128-2  589.4Mi ± 0%   1405.4Mi ± 0%  +138.43% (p=0.000 n=20)
Seal8K_AES_GCM_SIV_128-2  731.0Mi ± 0%   2679.0Mi ± 0%  +266.50% (p=0.000 n=20)
Open8K_AES_GCM_SIV_128-2  730.8Mi ± 0%   2669.4Mi ± 0%  +265.24% (p=0.000 n=20)
Seal1K_AES_GCM_SIV_256-2  473.1Mi ± 0%   1126.2Mi ± 0%  +138.04% (p=0.000 n=20)
Open1K_AES_GCM_SIV_256-2  474.2Mi ± 0%   1115.0Mi ± 0%  +135.12% (p=0.000 n=20)
Seal8K_AES_GCM_SIV_256-2  595.4Mi ± 1%   2134.7Mi ± 0%  +258.56% (p=0.000 n=20)
Open8K_AES_GCM_SIV_256-2  593.6Mi ± 1%   2133.6Mi ± 0%  +259.42% (p=0.000 n=20)
geomean                   590.0Mi         1.693Gi       +193.78%

Notes

I plan to contribute this improvement to the Go standard library as well, which is why I extended the existing AES-CTR generator with a flag instead of writing something completely new. It would be great if this change could be incorporated into the pending CL 538396.

Do not create a new cipher with aes.NewCipher on every call.
Go commit: a5f55a441ef497d8e2a12610f4ec2bd32fdc04b2
The generator in master is broken, so the fixed version was taken
from https://go.dev/cl/712920.
With the flag "-le", the generator produces assembly for SIV. Without that
flag, it produces assembly for the regular AES-CTR mode in the Go standard library.