CGO 2026
Sat 31 January - Wed 4 February 2026, Sydney, Australia
co-located with HPCA/PPoPP/CC 2026
Mon 2 Feb 2026 10:30 - 10:50 at Bronte - Compiling for ML 1 Chair(s): Albert Cohen

The rapid growth in the size of deep learning models strains the capabilities of dense computation paradigms. Leveraging sparse computation has become increasingly popular for training and deploying large-scale models, but existing deep learning frameworks lack extensive support for sparse operations. Current approaches either require manual scheduling expertise or rely on exhaustive search taking hours to days, both of which are incompatible with the interactive development essential to machine learning research. We present three algorithmic contributions that enable fast, automatic optimization of sparse tensor computations. First, we develop a heuristic-based loop ordering algorithm that avoids asymptotic performance cliffs while compiling in milliseconds rather than hours. Second, we introduce a tiling algorithm specialized for mixed sparse-dense computations that achieves performance comparable to hand-optimized kernels. Third, we present a format inference algorithm that automatically selects appropriate sparse tensor formats for intermediate and output tensors based on operation semantics. These algorithms are grounded in the computational properties of sparse tensor algebra, making them predictable and robust across diverse workloads. We implement these techniques in Scorch, a prototype sparse tensor compiler for PyTorch that demonstrates their practical effectiveness. With only minimal code changes, our approach achieves 1.05–5.80× speedups over PyTorch Sparse on end-to-end tasks including graph neural networks, sparse autoencoders, and sparse transformers, with compilation times fast enough for interactive ML development.
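To illustrate the kind of asymptotic cliff the abstract's loop-ordering and format-inference contributions guard against, here is a minimal pure-Python sketch (illustrative only, not Scorch's implementation): a sparse-matrix–vector multiply over a CSR-format matrix visits only the stored nonzeros, giving O(nnz) work, whereas a naive dense loop nest over the same operation would scale with the full m×n iteration space regardless of sparsity.

```python
# Minimal CSR SpMV sketch (illustrative only; not Scorch's code).
# CSR stores row pointers (indptr), column indices, and values, so the
# multiply touches each stored nonzero exactly once: O(nnz), not O(m*n).

def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A given in CSR form."""
    m = len(indptr) - 1          # number of rows
    y = [0.0] * m
    for i in range(m):           # outer loop over rows
        # inner loop over only the nonzeros stored for row i
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# Example: A = [[1, 0, 2],
#               [0, 3, 0]]
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]
print(csr_spmv(indptr, indices, data, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```

Choosing a loop order or intermediate format that does not match the stored layout (e.g., scanning columns of a row-major CSR matrix) forfeits exactly this O(nnz) bound, which is the class of performance cliff the heuristic ordering and format inference are described as avoiding.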

Mon 2 Feb

Displayed time zone: Hobart

09:50 - 11:10
Compiling for ML 1 (Main Conference) at Bronte
Chair(s): Albert Cohen Google DeepMind
09:50
20m
Talk
Enabling Spill-Free Compilation via Affine-Based Live Range Reduction Optimization
Main Conference
Pre-print
10:10
20m
Talk
GRANII: Selection and Ordering of Primitives in GRAph Neural Networks using Input Inspection
Main Conference
Damitha Lenadora University of Illinois at Urbana-Champaign, Vimarsh Sathia University of Illinois at Urbana-Champaign, Gerasimos Gerogiannis University of Illinois at Urbana-Champaign, Serif Yesil NVIDIA, Josep Torrellas University of Illinois at Urbana-Champaign, Charith Mendis University of Illinois at Urbana-Champaign
Pre-print
10:30
20m
Talk
Fast Autoscheduling for Sparse ML Frameworks
Main Conference
Bobby Yan Stanford University, Alexander J Root Stanford University, Trevor Gale Stanford University, David Broman KTH Royal Institute of Technology, Fredrik Kjolstad Stanford University
Pre-print
10:50
20m
Talk
Eliminating Redundancy: Ultra-compact Code Generation for Programmable Dataflow Accelerators
Main Conference
Prasanth Chatarasi IBM Research, Alex Gatea IBM, Bardia Mahjour IBM, Jintao Zhang Unaffiliated, Alberto Mannari IBM, Chris Bowler IBM, Shubham Jain IBM Research, Masoud Ataei Jaliseh IBM, Nicole Khoun IBM, Kamlesh Kumar Unaffiliated, Viji Srinivasan IBM Research, Swagath Venkataramani IBM Research
Pre-print