CGO 2026
Sat 31 January - Wed 4 February 2026, Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Mon 2 Feb 2026, 14:50 - 15:10, at Bronte (DSLs). Chair(s): Olivia Hsu

Tensor accelerators now account for a growing share of the compute resources in modern CPUs and GPUs. However, they are hard to program directly, so developers typically rely on vendor-provided kernel libraries. As a result, tensor accelerators are used only through those libraries' interfaces, which are designed mainly for traditional ML and scientific computing workloads.

In this paper, we show that tensor accelerators can improve the performance of applications well beyond simple variants of MatMul. For example, many image processing pipelines are linear transformations over matrices in disguise and can therefore exploit such specialized hardware. Doing so, however, is still hindered by how difficult tensor accelerators are to program. We tackle this problem with compiler-based techniques: operations are expressed succinctly as algorithms in the Halide user-schedulable language, and a flexible tensor instruction selector based on equality saturation maps them onto accelerator instructions. The instruction selector supports both CPU- and GPU-attached tensor accelerators and composes with existing scheduling operations (e.g., producer-consumer fusion). Together, this lets developers write diverse accelerator-leveraging applications in a few dozen lines of code.
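
As a rough illustration of the algorithm/schedule split described above (a minimal sketch of our own, not the paper's artifact), the Halide code below writes a 2x vertical downsample with $[\tfrac14, \tfrac12, \tfrac14]$ taps so that its update is a small matrix product; the names (`W`, `downsample_y`) and the plain CPU schedule are our assumptions, standing in for the tensor instructions the paper's selector would target.

```cpp
// A minimal sketch, not the paper's code: a 2x vertical downsample written as
// a Halide algorithm whose reduction is a matrix product out = W * in, the
// kind of "MatMul in disguise" the paper's equality-saturation-based
// instruction selector is said to recognize.
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam input(Float(32), 2, "input");   // input(x, y)
    Var x("x"), y("y"), i("i");
    RDom k(0, input.dim(1).extent(), "k");     // reduce over input rows

    // Hypothetical weight matrix W(i, y): row y holds the taps for output row y.
    Func W("W");
    W(i, y) = select(i == 2 * y,     0.25f,
              select(i == 2 * y + 1, 0.50f,
              select(i == 2 * y + 2, 0.25f, 0.0f)));

    // out(x, y) = sum_k W(k, y) * input(x, k): each output row is a linear
    // combination of input rows, i.e., a matrix multiplication.
    Func downsample_y("downsample_y");
    downsample_y(x, y) = 0.0f;
    downsample_y(x, y) += W(k, y) * input(x, k);

    // A plain CPU schedule for illustration; in the paper's system the same
    // algorithm would instead be lowered onto CPU- or GPU-attached tensor
    // instructions. Realization of the pipeline is omitted here.
    downsample_y.parallel(y).vectorize(x, 8);
    downsample_y.update().parallel(y).vectorize(x, 8);
    return 0;
}
```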

Using our system, we demonstrate the potential of tensor accelerators beyond their traditional domains. We implement several image processing pipelines (e.g., filtering, resampling, and denoising) and evaluate them against baselines that do not use the accelerators, showing that these pipelines can achieve significant speedups. For example, a downsampling routine runs $6.1\times$ faster when it utilizes the Tensor Cores on an Nvidia RTX 4070 GPU.
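
To make the "linear transformation in disguise" view of the downsampling example concrete (our own illustration, not a formula from the paper, and ignoring boundary handling), a 2x downsample of an $n$-row image $X$ with $[\tfrac14, \tfrac12, \tfrac14]$ taps is a product with a sparse, banded weight matrix, which is exactly the GEMM shape a tensor unit executes:

$$
O = W X, \qquad
W = \begin{pmatrix}
\tfrac14 & \tfrac12 & \tfrac14 & 0 & 0 & \cdots \\
0 & 0 & \tfrac14 & \tfrac12 & \tfrac14 & \cdots \\
\vdots & & & & \ddots &
\end{pmatrix} \in \mathbb{R}^{(n/2) \times n}.
$$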

Mon 2 Feb

Displayed time zone: Hobart

14:10 - 15:30
DSLs (Main Conference) at Bronte
Chair(s): Olivia Hsu (Stanford University)
14:10 (20m) Talk: FORTE: Online DataFrame Query Optimizer
Main Conference
Yoonho Choi (POSTECH), Kyoungtae Lee (Seoul National University), Minji Kim (Ewha Womans University), Hyungsoo Jung (Seoul National University), Hyojin Sung (Seoul National University)
Pre-print
14:30 (20m) Talk: LEGO: A Layout Expression Language for Code Generation of Hierarchical Mapping
Main Conference
Amir Mohammad Tavakkoli (University of Utah), Cosmin E. Oancea (University of Copenhagen, Denmark), Mary Hall (University of Utah)
Pre-print | Media Attached
14:50 (20m) Talk: Pushing Tensor Accelerators beyond MatMul in a User-Schedulable Language
Main Conference
Yihong Zhang (University of Washington), Derek Gerstmann (Adobe), Andrew Adams (Adobe Research), Maaz Bin Safeer Ahmad (University of Washington, Seattle)
Pre-print | Media Attached
15:10 (20m) Talk: Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References
Main Conference
Hongzheng Chen (Cornell University), Bin Fan (NVIDIA), Alexander Collins (NVIDIA), Bastian Hagedorn (NVIDIA), Evghenii Gaburov (NVIDIA), Masahiro Masuda (NVIDIA), Matthew Brookhart (NVIDIA), Chris Sullivan (NVIDIA), Jason Knight (NVIDIA), Zhiru Zhang (Cornell University, USA), Vinod Grover (NVIDIA)
Pre-print