CGO 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/PPoPP/CC 2026
Mon 2 Feb 2026 09:50 - 10:10 at Bronte - Compiling for ML 1 Chair(s): Albert Cohen

AI accelerators employ dataflow architectures to achieve impressive peak compute performance (TOPS) and processing efficiency (TOPS/W). Typically, dataflow architectures use wide data paths to connect off-chip memory to dense compute arrays (via a hierarchy of on-chip memories and vector register files) for efficient, reuse-friendly data movement as well as compute. Such architectures often possess an independent lightweight control path for loading programs and initializing registers, and lack traditional architectural features such as instruction caches and execution stacks. This poses a unique challenge to the compiler, which must generate complex compute kernels that fit within an instruction buffer and allocate a limited set of scalar registers without support for spilling to memory.

This paper contributes a significant step towards spill-free compilation, proposing a live-range reduction optimization based on affine expression propagation analysis. Our solution performs a global, compiler-directed analysis that models variable values as affine expressions of in-scope variables, enabling safe symbolic rematerialization of values at their use sites from nearby variables without introducing new operations. This shortens variable lifetimes and significantly reduces register pressure without incurring binary-size or execution-time overhead. The static nature and regular memory access patterns of AI applications make them well suited for the proposed optimization. We demonstrate the effectiveness of the technique in the context of the IBM Spyre accelerator and its compiler. Our results over a range of AI workloads spanning transformer and CNN models demonstrate spill-free code generation, with most of the workloads requiring less than 50% of the available registers.
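To illustrate the core idea, here is a minimal sketch in Python over a hypothetical toy three-address IR (this is an assumption for exposition, not the Spyre compiler's actual IR or algorithm). The analysis propagates affine relations of the form `v = a*u + b` through constant-multiply and constant-add instructions, so a derived value such as an address can later be rematerialized at its use site from a still-live base variable, rather than kept live in a register across the intervening code:

```python
# Hedged sketch of affine expression propagation over a toy IR.
# Instruction format (assumed): (dst, op, operands).
prog = [
    ("i",    "const", (0,)),
    ("off",  "mulc",  ("i", 4)),    # off  = i * 4   -> affine: off  = 4*i + 0
    ("addr", "addc",  ("off", 8)),  # addr = off + 8 -> affine: addr = 4*i + 8
    # ... a long stretch of unrelated code would otherwise keep addr live ...
    ("x",    "load",  ("addr",)),
]

def propagate_affine(prog):
    """Map each derived variable v to a triple (a, u, b) meaning v = a*u + b,
    where u is an earlier in-scope variable."""
    env = {}
    for dst, op, ops in prog:
        if op == "mulc":                       # dst = src * c
            src, c = ops
            a, u, b = env.get(src, (1, src, 0))
            env[dst] = (a * c, u, b * c)
        elif op == "addc":                     # dst = src + c
            src, c = ops
            a, u, b = env.get(src, (1, src, 0))
            env[dst] = (a, u, b + c)
    return env

env = propagate_affine(prog)
print(env["addr"])  # → (4, 'i', 8), i.e. addr = 4*i + 8
```

Because `addr` is known to be an affine function of `i`, a use of `addr` could (hardware permitting, e.g. via a scaled-offset addressing mode) be expressed directly in terms of `i`, ending `addr`'s live range at its definition point and freeing its register. The real optimization described in the abstract is global and must also prove safety across control flow, which this toy fragment does not attempt.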

Mon 2 Feb

Displayed time zone: Hobart

09:50 - 11:10
Compiling for ML 1
Main Conference at Bronte
Chair(s): Albert Cohen Google DeepMind
09:50
20m
Talk
Enabling Spill-Free Compilation via Affine-Based Live Range Reduction Optimization
Main Conference
Pre-print
10:10
20m
Talk
GRANII: Selection and Ordering of Primitives in GRAph Neural Networks using Input Inspection
Main Conference
Damitha Lenadora University of Illinois at Urbana-Champaign, Vimarsh Sathia University of Illinois at Urbana-Champaign, Gerasimos Gerogiannis University of Illinois at Urbana-Champaign, Serif Yesil NVIDIA, Josep Torrellas University of Illinois at Urbana-Champaign, Charith Mendis University of Illinois at Urbana-Champaign
Pre-print
10:30
20m
Talk
Fast Autoscheduling for Sparse ML Frameworks
Main Conference
Bobby Yan Stanford University, Alexander J Root Stanford University, Trevor Gale Stanford University, David Broman KTH Royal Institute of Technology, Fredrik Kjolstad Stanford University
Pre-print
10:50
20m
Talk
Eliminating Redundancy: Ultra-compact Code Generation for Programmable Dataflow Accelerators
Main Conference
Prasanth Chatarasi IBM Research, Alex Gatea IBM, Bardia Mahjour IBM, Jintao Zhang Unaffiliated, Alberto Mannari IBM, Chris Bowler IBM, Shubham Jain IBM Research, Masoud Ataei Jaliseh IBM, Nicole Khoun IBM, Kamlesh Kumar Unaffiliated, Viji Srinivasan IBM Research, Swagath Venkataramani IBM Research
Pre-print