Optimize AES-GCM-SIV AES CTR step on x86 #7
Summary
I used the generator of AES CTR assembly from the Go standard library and added the flag `-le` to it to generate assembly for SIV. SIV is little-endian and uses only the low 32-bit part of the 16-byte block as the counter. When the low 32 bits overflow, the counter simply wraps, per RFC 8452. This is very convenient for filling the registers with counters: in the SIV path I load the value for the first block (via the pointer `blockPtr`) into an XMM register and then produce the counter values +1, +2, ..., +7 using the `PADDD` instruction with constants such as [1, 0, 0, 0], [2, 0, 0, 0], ...
Speedup
Performance is substantially improved on amd64:
Notes
My plan is to contribute this improvement to the Go standard library as well, which is why I used the generator for AES CTR mode and added a flag to it instead of writing something completely new. It would be great if this change could be incorporated into the pending CL 538396.