https://arxiv.org/pdf/2501.10301
AraXL: A Physically Scalable, Ultra-Wide RISC-V Vector Processor Design for Fast and Efficient Computation on Long Vectors
Navaneeth Kunhi Purayil*¹, Matteo Perotti*¹, Tim Fischer¹ and Luca Benini¹˒²
¹ETH Zürich, Zürich, Switzerland, ²Università di Bologna, Bologna, Italy
{nkunhi,mperotti,fischeti,lbenini}@iis.ee.ethz.ch
Abstract—The ever-growing scale of data parallelism in today’s
HPC and ML applications poses a major challenge to the energy
efficiency and performance of computing architectures. Vector
processors address the scale-up challenge by decoupling Vector Register
File (VRF) and datapath widths, allowing the VRF to host long
vectors and increase register-stored data reuse while reducing the
relative cost of instruction fetch and decode. However, even the
largest vector processor designs today struggle to scale to more
than 8 vector lanes with double-precision Floating Point Units
(FPUs) and 256 64-bit elements per vector register. This limitation
is induced by difficulties in the physical implementation, which
becomes wire-dominated and inefficient.
In this work, we present AraXL, a modular and scalable 64-bit
RISC-V V vector architecture targeting long-vector applications
for HPC and ML. AraXL addresses the physical scalability
challenges of state-of-the-art vector processors with a distributed
and hierarchical interconnect, supporting up to 64 parallel vector
lanes and reaching the maximum Vector Register File size of
64 Kibit/vreg permitted by the RISC-V V 1.0 ISA specification.
Implemented in a 22-nm technology node, our 64-lane AraXL
achieves a peak performance of 146 GFLOPS on computation-intensive
HPC/ML kernels (>99% FPU utilization) and an energy
efficiency of 40.1 GFLOPS/W (1.15 GHz, TT, 0.8 V), with only
3.8× the area of a 16-lane instance.
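
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes that each lane retires one double-precision fused multiply-add (2 FLOPs) per cycle, which the abstract does not state, and it uses the RISC-V V 1.0 limits of VLEN ≤ 65536 bits and 32 architectural vector registers. All function and variable names are hypothetical.

```python
# Back-of-the-envelope check of the abstract's headline numbers.
# ASSUMPTION (not stated in the abstract): each lane completes one
# double-precision FMA per cycle, counted as 2 FLOPs.

LANES = 64                 # largest AraXL configuration in the abstract
FREQ_GHZ = 1.15            # reported clock frequency (TT, 0.8 V)
MEASURED_GFLOPS = 146.0    # reported peak performance
VLEN_BITS = 64 * 1024      # 64 Kibit per vector register (RVV 1.0 maximum)
SEW_BITS = 64              # double-precision element width
NUM_VREGS = 32             # architectural vector registers in RVV 1.0


def peak_gflops(lanes, freq_ghz, flops_per_lane_per_cycle=2):
    """Theoretical peak throughput in GFLOPS under the FMA assumption."""
    return lanes * flops_per_lane_per_cycle * freq_ghz


def elements_per_vreg(vlen_bits, sew_bits):
    """Number of elements of width sew_bits that fit in one vector register."""
    return vlen_bits // sew_bits


def vrf_size_kib(vlen_bits, num_vregs):
    """Total vector register file capacity in KiB."""
    return vlen_bits * num_vregs // 8 // 1024


theoretical = peak_gflops(LANES, FREQ_GHZ)
utilization = MEASURED_GFLOPS / theoretical
elems = elements_per_vreg(VLEN_BITS, SEW_BITS)
vrf_kib = vrf_size_kib(VLEN_BITS, NUM_VREGS)
# Area per lane of the 64-lane design relative to the 16-lane design.
relative_area_per_lane = 3.8 / (64 / 16)

print(f"theoretical peak: {theoretical:.1f} GFLOPS")
print(f"implied FPU utilization: {utilization:.1%}")
print(f"64-bit elements per vreg: {elems}")
print(f"total VRF capacity: {vrf_kib} KiB")
print(f"relative area per lane: {relative_area_per_lane:.2f}")
```

Under these assumptions the theoretical peak is about 147.2 GFLOPS, so the reported 146 GFLOPS is consistent with the stated >99% FPU utilization. A 64 Kibit register holds 1024 double-precision elements, four times the 256-element limit cited for earlier designs, and quadrupling the lane count for 3.8× the area corresponds to slightly less area per lane than in the 16-lane instance.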