Garuda: a CVXIF coprocessor that optimizes batch-1 attention microkernels, delivering 7.5-9× lower p99 latency. A RISC-V INT8 MAC accelerator for transformer inference.
machine-learning neural-network inference simd low-latency systemverilog attention-mechanism risc-v int8 systemverilog-hdl systolic-arrays edge-ai hardware-accelerator int8-quantization cva6 custom-instructions ai-hardware cvxif
Updated Jan 23, 2026 - SystemVerilog