The Parallel-Semantics Program Dependence Graph for Parallel Optimization (CGO 2026 - Main Conference)

Who

Yian Su, Brian Homerding, Haocheng Gao, Federico Sossai, Yebin Chon, David I. August, Simone Campanoni

Track

CGO 2026 Main Conference

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 2 Feb 2026, 16:30 - 16:50, at Bronte. Session: Parallelization / Vectorization. Chair(s): V Krishna Nandivada

Abstract

Modern shared-memory parallel programming models, such as OpenMP and Cilk, enable developers to encode a parallel execution plan within their code. Existing compilers, including Clang and GCC, either lower the developer's plan directly or add compatible parallelism on top of it. However, when a better parallel execution plan exists that is incompatible with the original plan, compilers cannot discard the original and replace it with the better one. To address this problem, this paper introduces the parallel-semantics program dependence graph (PS-PDG), an extension of the program dependence graph (PDG) abstraction that can simultaneously represent parallel semantics derived from both the developer's original plan and the compiler's own analysis. To demonstrate the power of PS-PDG, this paper also introduces GINO, an LLVM-based compiler capable of optimizing parallel execution plans using PS-PDG. By exploring, reasoning about, and implementing better parallel execution plans unlocked by PS-PDG, GINO outperforms the developer's original parallel execution plan by up to 46.6%, and by 15% on average over 56 cores across 8 benchmarks from the NAS benchmark suite.

Link to Preprint

https://www.conference-publishing.com/Proc/CGO26/cgo26/cgo26main-p58-p

Yian Su

Northwestern University

United States

Brian Homerding

Northwestern University

United States

Haocheng Gao

Northwestern University

United States

Federico Sossai

Northwestern University

United States

Yebin Chon

Princeton University

United States

David I. August

Princeton University

United States

Simone Campanoni

Google / Northwestern University

United States

Session Program

Mon 2 Feb
Displayed time zone: Hobart

15:50 - 17:10	Parallelization / Vectorization (Main Conference), at Bronte. Chair(s): V Krishna Nandivada (IIT Madras)

15:50 20m Talk	Enabling Automatic Compiler-Driven Vectorization of Transformers	Shreya Alladi (University of Murcia), Alberto Ros (University of Murcia), Alexandra Jimborean (University of Murcia)
16:10 20m Talk	Unlocking Python Multithreading Capabilities using OpenMP-Based Programming with OMP4Py	César Piñeiro (University of Santiago de Compostela), Juan C. Pichel (University of Santiago de Compostela)
16:30 20m Talk	The Parallel-Semantics Program Dependence Graph for Parallel Optimization	Yian Su (Northwestern University), Brian Homerding (Northwestern University), Haocheng Gao (Northwestern University), Federico Sossai (Northwestern University), Yebin Chon (Princeton University), David I. August (Princeton University), Simone Campanoni (Google / Northwestern University)
16:50 20m Talk	From Threads to Tiles: T2T, a Compiler for CUDA-to-NPU Translation via 2D Vectorization	Shuaijiang Li (Institute of Computing Technology at Chinese Academy of Sciences), Jiacheng Zhao (Institute of Computing Technology at Chinese Academy of Sciences; University of Chinese Academy of Sciences; Zhongguancun Laboratory), Ying Liu (Institute of Computing Technology, Chinese Academy of Sciences), Shuoming Zhang (Institute of Computing Technology at Chinese Academy of Sciences), Lei Chen (University of Chinese Academy of Sciences), Yijin Li (Institute of Computing Technology at Chinese Academy of Sciences), Yangyu Zhang (Institute of Computing Technology, Chinese Academy of Sciences), lizhicheng (Institute of Computing Technology at Chinese Academy of Sciences), Runyu Zhou (Institute of Computing Technology at Chinese Academy of Sciences), Xiyu Shi (Institute of Computing Technology at Chinese Academy of Sciences), Chunwei Xia (University of Leeds), Yuan Wen (University of Aberdeen), Xiaobing Feng (ICT CAS), Huimin Cui (Institute of Computing Technology, Chinese Academy of Sciences)