CGO 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Wed 4 Feb 2026 10:10 - 10:30 at Bronte, Tensor Optimization session. Chair(s): Bastian Hagedorn

Sparsity is inherent in many applications, such as machine learning and graph analytics. However, achieving high efficiency in sparse computations requires specialized hardware, such as FPGA-based accelerators, because traditional accelerators typically cater to dense data. While high-level synthesis enables the automatic generation of FPGA-based accelerators, generic solutions produced via C-based synthesis flows often demand extensive development time, leading designers to prioritize broad applicability over fine-grained structural specialization. Consequently, these accelerators fail to fully exploit FPGAs' reconfigurability, leaving substantial performance and efficiency gains untapped.

This paper goes further by automatically generating specialized accelerators that match a given fixed sparse structure (e.g., in static graph analytics and pruned neural networks). It achieves this by leveraging functional abstractions within high-level synthesis, an approach that has already proven effective in automating the generation of specialized dense tensor accelerators. Tensor shapes are encoded directly in the type system, and specialized primitives for irregular data are introduced. Together, these innovations enable a concise specification of sparse accelerators and drive advanced optimizations, including dynamic partitioning and vector sharding, to produce hardware precisely tailored to the sparsity pattern of the underlying tensors.
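To make "hardware tailored to a fixed sparsity pattern" concrete, here is a minimal software analogy, not the authors' method: the paper generates FPGA hardware from functional HLS abstractions, whereas this Python sketch merely emits straight-line code for sparse matrix-vector multiplication (SpMV) with every column index of a fixed CSR pattern baked in as a constant, so only the nonzero values are read at run time. The function names `specialize_spmv`, `compile_spmv` and `partition_rows` are hypothetical. The row partitioning uses a generic greedy longest-processing-time heuristic to balance nonzeros across processing elements; it is a stand-in for illustration and is not claimed to be the paper's "dynamic partitioning".

```python
from typing import Callable, List, Tuple


def specialize_spmv(row_ptr: List[int], col_idx: List[int]) -> str:
    """Emit straight-line Python source for y = A @ x, where A's CSR pattern
    (row_ptr, col_idx) is fixed and only the nonzero values vary at run time.
    Hypothetical illustration, not the paper's code generator."""
    n_rows = len(row_ptr) - 1
    lines = ["def spmv(vals, x):", f"    y = [0.0] * {n_rows}"]
    for r in range(n_rows):
        # Every column index is baked into the generated code as a constant.
        terms = [f"vals[{k}] * x[{col_idx[k]}]"
                 for k in range(row_ptr[r], row_ptr[r + 1])]
        if terms:  # empty rows generate no code and stay 0.0
            lines.append(f"    y[{r}] = " + " + ".join(terms))
    lines.append("    return y")
    return "\n".join(lines)


def compile_spmv(row_ptr: List[int], col_idx: List[int]) -> Callable:
    """Turn the generated source into a callable kernel."""
    namespace: dict = {}
    exec(specialize_spmv(row_ptr, col_idx), namespace)
    return namespace["spmv"]


def partition_rows(row_ptr: List[int],
                   n_pe: int) -> Tuple[List[List[int]], List[int]]:
    """Greedy longest-processing-time assignment of rows to n_pe processing
    elements so that nonzero counts (work) are roughly balanced.
    Assumed generic heuristic, not necessarily the paper's partitioning."""
    nnz = [row_ptr[r + 1] - row_ptr[r] for r in range(len(row_ptr) - 1)]
    loads = [0] * n_pe
    assignment: List[List[int]] = [[] for _ in range(n_pe)]
    # Heaviest rows first; each goes to the currently least-loaded PE.
    for r in sorted(range(len(nnz)), key=lambda i: nnz[i], reverse=True):
        pe = min(range(n_pe), key=lambda p: loads[p])
        assignment[pe].append(r)
        loads[pe] += nnz[r]
    return assignment, loads


if __name__ == "__main__":
    # 3x3 pattern: row 0 -> cols {0, 2}, row 1 -> col {1}, row 2 -> cols {0, 2}
    row_ptr, col_idx = [0, 2, 3, 5], [0, 2, 1, 0, 2]
    print(specialize_spmv(row_ptr, col_idx))
    spmv = compile_spmv(row_ptr, col_idx)
    print(spmv([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))  # [7.0, 6.0, 19.0]
    print(partition_rows(row_ptr, 2))  # ([[0, 1], [2]], [3, 2])
```

Because the pattern is known ahead of time, the generated kernel contains no index arrays or data-dependent loop bounds; in hardware, the analogous benefit is that datapaths and memory banking can be fixed to the exact nonzero layout.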

Compared to state-of-the-art generic accelerators (HiSparse, HiSpMV and GraphLily), the approach achieves up to a 2.8x improvement in bandwidth efficiency for sparse matrix computations and a 1.8x speedup on graph algorithms. Against the hls4ml neural network acceleration framework, it achieves up to a 1.8x improvement in throughput with a 4x reduction in resource usage, enabling scaling to larger networks. These results establish this approach as a flexible, powerful, and rapid solution for designing high-performance specialized sparse accelerators.

Wed 4 Feb

Displayed time zone: Hobart

09:50 - 11:10
Tensor Optimization (Main Conference) at Bronte
Chair(s): Bastian Hagedorn NVIDIA
09:50
20m
Talk
Multidirectional Propagation of Sparsity Information across Tensor Slices
Main Conference
Kaio Henrique Andrade Ananias Universidade Federal de Minas Gerais, Danila Seliayeu University of Alberta, Jose Nelson Amaral University of Alberta, Fernando Magno Quintão Pereira Federal University of Minas Gerais
Pre-print Media Attached
10:10
20m
Talk
Synthesizing Specialized Sparse Tensor Accelerators for FPGAs via High-Level Functional Abstractions
Main Conference
Hamza Javed McGill University, Canada, Christophe Dubach McGill University
Pre-print
10:30
20m
Talk
Progressive Low-Precision Approximation of Tensor Operators on GPUs: Enabling Greater Trade-Offs between Performance and Accuracy
Main Conference
Fan Luo Institute of Computing Technology at Chinese Academy of Sciences, Guangli Li Institute of Computing Technology, Chinese Academy of Sciences, Zhaoyang Hao Institute of Computing Technology at Chinese Academy of Sciences, Xueying Wang Beijing University of Posts and Telecommunications, Xiaobing Feng ICT CAS, Huimin Cui Institute of Computing Technology, Chinese Academy of Sciences, Jingling Xue UNSW Sydney
Pre-print
10:50
20m
Talk
Tensor Program Superoptimization through Cost-Guided Symbolic Program Synthesis
Main Conference
Alexander Brauckmann Google, Aarsh Chaube University of Edinburgh, José Wesley De Souza Magalhães University of Edinburgh, Elizabeth Polgreen University of Edinburgh, Michael F. P. O'Boyle University of Edinburgh
Pre-print Media Attached