Reason for the subpar performance of the lookup validation algorithm in Java [discussion] #10

@lemire

This Java code can be 'fast', but it does not come close to the speed of a C implementation.

I believe that the reason is this code section:

    ByteVector byte1High = prev1.lanewise(LSHR, 4).selectFrom(lut.byte1HighLookup());
    ByteVector byte1Low = prev1.and((byte) 0x0f).selectFrom(lut.byte1LowLookup());
    ByteVector byte2High = input.lanewise(LSHR, 4).selectFrom(lut.byte2HighLookup());

At a glance, selectFrom looks like a standard vectorized table lookup such as vpshufb (x64) or vtbl (ARM NEON). Unfortunately, I do not think it compiles to one: it appears to generate a long sequence of instructions instead.
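To make the intended semantics concrete, here is a minimal scalar sketch of what the nibble lookups above compute for each byte lane, assuming a 16-entry table. The class and method names are hypothetical and not part of this repository; the point is only that each output byte is a single table read indexed by a 4-bit value, which is the operation vpshufb or vtbl performs on a whole register at once.

```java
import java.util.Arrays;

// Scalar model of a 16-entry nibble table lookup.
// Hypothetical helper names, for illustration only.
final class NibbleLookupSketch {

    // Models prev1.lanewise(LSHR, 4).selectFrom(table):
    // out[i] = table[high nibble of input[i]].
    static byte[] lookupHighNibble(byte[] input, byte[] table) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            int index = (input[i] & 0xFF) >>> 4; // unsigned shift, yields 0..15
            out[i] = table[index];
        }
        return out;
    }

    // Models prev1.and((byte) 0x0f).selectFrom(table):
    // out[i] = table[low nibble of input[i]].
    static byte[] lookupLowNibble(byte[] input, byte[] table) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            int index = input[i] & 0x0F; // yields 0..15
            out[i] = table[index];
        }
        return out;
    }

    public static void main(String[] args) {
        // Arbitrary demo table (assumption): entry i holds 3 * i.
        byte[] table = new byte[16];
        for (int i = 0; i < 16; i++) {
            table[i] = (byte) (i * 3);
        }
        byte[] input = {(byte) 0x41, (byte) 0xC3, (byte) 0xF0};
        System.out.println(Arrays.toString(lookupHighNibble(input, table))); // [12, 36, 45]
        System.out.println(Arrays.toString(lookupLowNibble(input, table)));  // [3, 9, 0]
    }
}
```

Because the index is always in the range 0..15, a 16-byte table register is enough, which is why a single shuffle instruction per lookup would suffice in a C implementation.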

So the people working on the Java Vector API appear to have assumed that instructions like vpshufb or vtbl were of secondary importance and did not need to be exposed to the programmer. I believe that this is a mistake.

The C# approach is quite different. In .NET 8, you have Ssse3.Shuffle or AdvSimd.Arm64.VectorTableLookup, for example, which expose these instructions directly.

See https://mail.openjdk.org/pipermail/panama-dev/2024-June/020476.html
