CGO 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Wed 4 Feb 2026 10:50 - 11:10 at Bronte - Tensor Optimization Chair(s): Bastian Hagedorn

Modern tensor compiler frameworks like JAX and PyTorch accelerate numerical programs by compiling or mapping Domain-Specific Language (DSL) code to efficient executables. However, they rely on a fixed set of transformation rules and heuristics, which means they can miss profitable optimization opportunities. This leaves significant optimization potential unused for programs that fall outside these fixed patterns. This paper presents STENSO, a tensor DSL program superoptimizer that discovers such missing rewrites. STENSO's core is a symbolic program synthesis based search algorithm that systematically explores the space of equivalent programs. By combining symbolic execution and sketch-based program synthesis, it generates equivalent candidate implementations. To make the search computationally tractable, STENSO further integrates a cost model with a branch-and-bound algorithm scheme. This effectively prunes the search space, arriving at optimal solutions in a reasonable time. We evaluate STENSO on over 30 benchmarks. The discovered programs achieve geometric mean speedups of 3.8x over NumPy and 1.6x over state-of-the-art compilers like JAX and PyTorch-Inductor. These results underscore the limitations of heuristic-based compilation and demonstrate STENSO's effectiveness in finding such optimizations automatically.

Wed 4 Feb

Displayed time zone: Hobart change

09:50 - 11:10
Tensor OptimizationMain Conference at Bronte
Chair(s): Bastian Hagedorn NVIDIA
09:50
20m
Talk
Multidirectional Propagation of Sparsity Information across Tensor Slices
Main Conference
Kaio Henrique Andrade Ananias Universidade Federal de Minas Gerais, Danila Seliayeu University of Alberta, Jose Nelson Amaral University of Alberta, Fernando Magno Quintão Pereira Federal University of Minas Gerais
Pre-print Media Attached
10:10
20m
Talk
Synthesizing Specialized Sparse Tensor Accelerators for FPGAs via High-Level Functional Abstractions
Main Conference
Hamza Javed McGill University, Canada, Christophe Dubach McGill University
Pre-print
10:30
20m
Talk
Progressive Low-Precision Approximation of Tensor Operators on GPUs: Enabling Greater Trade-Offs between Performance and Accuracy
Main Conference
Fan Luo Institute of Computing Technology at Chinese Academy of Sciences, Guangli Li Institute of Computing Technology, Chinese Academy of Sciences, Zhaoyang Hao Institute of Computing Technology at Chinese Academy of Sciences, Xueying Wang Beijing University of Posts and Telecommunications, Xiaobing Feng ICT CAS, Huimin Cui Institute of Computing Technology, Chinese Academy of Sciences, Jingling Xue University of New South Wales
Pre-print
10:50
20m
Talk
Tensor Program Superoptimization through Cost-Guided Symbolic Program Synthesis
Main Conference
Alexander Brauckmann University of Edinburgh, Aarsh Chaube University of Edinburgh, José Wesley De Souza Magalhães University of Edinburgh, Elizabeth Polgreen University of Edinburgh, Michael F. P. O'Boyle University of Edinburgh
Pre-print Media Attached