CGO 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Mon 2 Feb 2026 15:10 - 15:30 at Bronte - DSLs Chair(s): Olivia Hsu

Modern GPUs feature specialized hardware units that enable high-performance, asynchronous dataflow execution. However, the conventional SIMT programming model is fundamentally misaligned with this task-parallel hardware, creating a significant programmability gap. While hardware-level warp specialization is the key to unlocking peak performance, it forces developers to manually orchestrate complex, low-level communication and software pipelines, a process that is labor-intensive, error-prone, and unsustainable. To address this challenge, we present Tawa, an automated compiler that systematically generates high-performance, warp-specialized code from a high-level, tile-based program. Central to our approach is a novel IR abstraction, asynchronous references (aref), which expresses warp-level communication without exposing low-level hardware details. Using this abstraction, Tawa automatically partitions programs into producer-consumer roles and manages the intricate dataflow pipeline, relieving developers of invasive kernel rewriting. Evaluation on NVIDIA H100 GPUs across representative LLM kernels shows that Tawa delivers high hardware utilization, achieving up to 1.1x speedup over highly optimized cuBLAS GEMM kernels. For attention workloads, Tawa attains 1.2x speedup over Triton and matches the performance of the hand-optimized CUTLASS C++ FlashAttention-3 kernel with far less programming effort.

Mon 2 Feb

Displayed time zone: Hobart

14:10 - 15:30
DSLs (Main Conference) at Bronte
Chair(s): Olivia Hsu Stanford University
14:10
20m
Talk
FORTE: Online DataFrame Query Optimizer
Main Conference
Yoonho Choi POSTECH, Kyoungtae Lee Seoul National University, Minji Kim Ewha Womans University, Hyungsoo Jung Seoul National University, Hyojin Sung Seoul National University
Pre-print
14:30
20m
Talk
LEGO: A Layout Expression Language for Code Generation of Hierarchical Mapping
Main Conference
Amir Mohammad Tavakkoli University of Utah, Cosmin E. Oancea University of Copenhagen, Denmark, Mary Hall University of Utah
Pre-print Media Attached
14:50
20m
Talk
Pushing Tensor Accelerators beyond MatMul in a User-Schedulable Language
Main Conference
Yihong Zhang University of Washington, Derek Gerstmann Adobe, Andrew Adams Adobe Research, Maaz Bin Safeer Ahmad University of Washington, Seattle
Pre-print Media Attached
15:10
20m
Talk
Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References
Main Conference
Hongzheng Chen Cornell University, Bin Fan NVIDIA, Alexander Collins NVIDIA, Bastian Hagedorn NVIDIA, Evghenii Gaburov NVIDIA, Masahiro Masuda NVIDIA, Matthew Brookhart NVIDIA, Chris Sullivan NVIDIA, Jason Knight NVIDIA, Zhiru Zhang Cornell University, USA, Vinod Grover NVIDIA
Pre-print