CGO 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026

This program is tentative and subject to change.

Mon 2 Feb 2026 16:10 - 16:30 at Bronte - Parallelization / Vectorization Chair(s): V Krishna Nandivada

Python performs worse than traditional high-performance computing (HPC) languages such as C, C++, and Fortran. This gap is largely due to Python's interpreted nature and the Global Interpreter Lock (GIL), which restricts multithreading efficiency. However, the introduction of a GIL-free variant of the Python interpreter opens the door to more effective exploitation of multithreaded parallelism in Python. Building on this new feature, we introduce OMP4Py, which brings OpenMP's familiar directive-based parallelization paradigm to Python. Its dual-runtime architecture combines the benefits of a pure Python implementation with the performance and low-level capabilities required to maximize efficiency in compute-intensive tasks. In this way, OMP4Py offers both full Python support and the high performance required by HPC workloads.

Mon 2 Feb

Displayed time zone: Hobart

15:50 - 17:10
Parallelization / Vectorization, Main Conference, at Bronte
Chair(s): V Krishna Nandivada (IIT Madras)
15:50
20m
Talk
Enabling Automatic Compiler-Driven Vectorization of Transformers
Main Conference
Shreya Alladi (University of Murcia), Alberto Ros (University of Murcia), Alexandra Jimborean (University of Murcia)
16:10
20m
Talk
Unlocking Python Multithreading Capabilities using OpenMP-Based Programming with OMP4Py
Main Conference
César Piñeiro (University of Santiago de Compostela), Juan C. Pichel (University of Santiago de Compostela)
16:30
20m
Talk
The Parallel-Semantics Program Dependence Graph for Parallel Optimization
Main Conference
Yian Su (Northwestern University), Brian Homerding (Northwestern University), Haocheng Gao (Northwestern University), Federico Sossai (Northwestern University), Yebin Chon (Princeton University), David I. August (Princeton University), Simone Campanoni (Google / Northwestern University)
16:50
20m
Talk
From Threads to Tiles: T2T, a Compiler for CUDA-to-NPU Translation via 2D Vectorization
Main Conference
Shuaijiang Li (Institute of Computing Technology at Chinese Academy of Sciences), Jiacheng Zhao (Institute of Computing Technology at Chinese Academy of Sciences; University of Chinese Academy of Sciences; Zhongguancun Laboratory), Ying Liu (Institute of Computing Technology, Chinese Academy of Sciences), Shuoming Zhang (Institute of Computing Technology at Chinese Academy of Sciences), Lei Chen (University of Chinese Academy of Sciences), Yijin Li (Institute of Computing Technology at Chinese Academy of Sciences), Yangyu Zhang (Institute of Computing Technology, Chinese Academy of Sciences), lizhicheng (Institute of Computing Technology at Chinese Academy of Sciences), Runyu Zhou (Institute of Computing Technology at Chinese Academy of Sciences), Xiyu Shi (Institute of Computing Technology at Chinese Academy of Sciences), Chunwei Xia (University of Leeds), Yuan Wen (University of Aberdeen), Xiaobing Feng (ICT CAS), Huimin Cui (Institute of Computing Technology, Chinese Academy of Sciences)