CGO 2026
Sat 31 January - Wed 4 February 2026, Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026

As usual, the main conferences will be preceded by CC and two days of workshops and tutorials. When you register, you may choose to attend one, both or neither of the pre-conference days. You will also be asked to nominate which activities you plan to attend on each day.

For each day of the conference for which you have registered, you are free to attend any of the running activities, as long as there is sufficient space. If space is short, priority will always be given to those who registered for the specific event.

Schedule

Jan 31 (Saturday) - Feb 01 (Sunday): CC, plus pre-conference workshops and tutorials
  • Workshops: Arch4Health, DISCC, EMC2, MCCSys, Prefetch (HPCA); LLVM, C4ML, LATHC (CGO); CACHP, ScaleDNN, DiffPP (PPoPP)
  • Tutorials: XiangShan, HPSmartNet, DDDLBA, PEPNAIA, Reduce, FIDESlib, Beethoven (HPCA); CLPE, SODA, XTC, Proteus (CGO); MLIR, DDRP (PPoPP)

Feb 02 (Monday) - Feb 04 (Wednesday): HPCA, CGO, and PPoPP main conferences

CC

The ACM SIGPLAN International Conference on Compiler Construction (CC) is an annual conference on processing programs in the most general sense: analyzing, transforming or executing input programs that describe how a system operates, including traditional compiler construction as a special case.

Workshops

HPCA

Arch4Health

Second Workshop on Architecture for Health (Arch4Health). This workshop focuses on identifying key computational challenges in health-related applications and discussing how computer architects can contribute to advancing healthcare by addressing these challenges.

DISCC

Workshop on Data Integrity and Secure Cloud Computing (DISCC). This workshop brings together aspects of data integrity and security in a single, unified forum. It will comprise a keynote speech, several contributed papers and, time permitting, a closing panel session involving leading-edge experts in data integrity and security in a hyper-scale cloud computing setting.

EMC2

11th Workshop on Energy Efficient Machine Learning & Cognitive Computing (EMC2). In this edition of the EMC2 workshop, we plan to facilitate conversation about the sustainability of the large-scale AI computing systems being developed to meet the ever-increasing demands of generative AI. This involves discussions spanning multiple interrelated areas.

MCCSys

4th Workshop + Tutorial on Memory-Centric Computing Systems (MCCSys). This combined tutorial and workshop will focus on the latest advances in processing-in-memory (PIM) technology, spanning both hardware and software.

Prefetch

4th Data Prefetching Championship (Prefetch). This workshop will bring together academic researchers, industry practitioners, and students to collectively advance this crucial area. By providing open-source infrastructures, standardized evaluation methodologies, and well-defined workload suites, the championship will not only highlight the best-performing designs but also cultivate a living benchmark ecosystem for future research.
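
For context, the simplest possible contender in such a championship is a next-line prefetcher. The sketch below is illustrative only; real entries plug into a simulator's prefetcher API, and the hook shown here is hypothetical.

    #include <cstdint>
    #include <vector>

    constexpr uint64_t BLOCK = 64;  // assumed cache-line size in bytes

    // Hypothetical hook: called on each demand access; returns prefetch candidates.
    std::vector<uint64_t> on_cache_access(uint64_t addr) {
        // Next-line prefetching: request the block immediately after this one.
        return { (addr / BLOCK + 1) * BLOCK };
    }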

CGO

LLVM

10th LLVM Performance Workshop at CGO. We invite speakers from academia and industry to present their work on the following topics (including but not limited to):

  • Improving performance and code-size of applications built by LLVM toolchains
  • Improving performance of LLVM’s runtime libraries
  • Improving the security of generated code
  • Any tools or products developed by using one of the libraries in LLVM infrastructure
  • Performance tracking over time
  • Compiler flags, annotations and remarks to understand and improve performance (a small example follows below)
  • Any other topic related to improving and maintaining the performance and quality of LLVM generated code

While the primary focus of the workshop is on these topics, we welcome any submission related to the LLVM project, its sub-projects (clang, mlir, lldb, Polly, lld, openmp, pstl, compiler-rt, etc.), as well as their use in industry and academia.
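
As a small illustration of the flags-and-remarks topic in the list above (the kernel and command line are our own example, not workshop material): optimization remarks let you ask clang what it did, and why, for a specific loop.

    // Compile with, for example:
    //   clang++ -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize sum.cpp -c
    // and clang reports whether the loop below was vectorized and, if not, why.
    #include <cstddef>

    float sum(const float* a, std::size_t n) {
        float s = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            s += a[i];  // a floating-point reduction; typically needs -ffast-math
                        // (reassociation) before the vectorizer will transform it
        return s;
    }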

C4ML

6th Compilers for Machine Learning Workshop (C4ML). Machine learning applications are ubiquitous in large-scale production systems. With that growth, and the scaling in data volume and model complexity, the focus on efficiently executing machine learning models has become even greater. The push for higher energy efficiency has led to the emergence of diverse heterogeneous system and accelerator architectures. In parallel, model complexity and diversity have pushed for higher-productivity systems, more powerful programming abstractions, type systems, language embeddings, frameworks and libraries.

Compilers have historically been the bridge between programmer efficiency and high-performance code, allowing the expression of code that remains understandable and productive to port and extend, while producing high-performance code for diverse architectures. As such, compiler techniques have been increasingly incorporated into machine learning frameworks. This goes both ways: given the broadening gap between high-level constructs and hardware accelerators, compilers in machine learning frameworks have also emerged as natural clients of machine learning techniques, from domain-specific heuristics to automatic optimization discovery and application.

LATHC

4th Languages, Architectures, and Tools for Heterogeneous Computing (LATHC) Workshop. The goal of this workshop is to gather researchers and developers to present and discuss recent developments in knowledge and technology that support heterogeneous computing systems. This one-day workshop invites presentations of relevant recent proposals, technology and systems evaluations, design concepts, and other developments that are expected to have an impact on the performance, usability, availability, design cycle, power consumption, or other aspects of a heterogeneous system.

PPoPP

CACHP

Fastcode: Workshop on Co-evolution of Algorithms, Compilers, and Hardware for Performance (CACHP). The CACHP workshop aims to bring together researchers and practitioners to explore this co-evolution of algorithms, compilers, and hardware. The central idea behind CACHP is that performance and efficient use of resources must remain a core focus of computing. Achieving these goals requires advances not only in algorithm design, but also in programming models, software construction, compilers, and hardware.

ScaleDNN

Principles And Practice Of Scalable And Distributed Deep Neural Networks Training And Inference (ScaleDNN). This workshop provides an overview of recent trends in DL and the role of cutting-edge hardware architectures and interconnects in moving the field forward. We will also present an overview of different DNN architectures and DL frameworks, and cover DL training and inference with a special focus on parallelization strategies for model training. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU/GPU architectures to efficiently support large-scale distributed training. We also highlight some of our co-design efforts to utilize MPI for large-scale DNN training on cutting-edge CPU and GPU architectures available on modern HPC clusters.
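
As a minimal sketch of the communication step at the heart of data-parallel training (our own illustration, not the presenters' code): each rank computes gradients on its local mini-batch, and an allreduce combines them so every rank applies the same averaged update.

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int world_size = 0;
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        // Stand-in for the gradients this rank computed on its local mini-batch.
        std::vector<float> grad(1 << 20, 1.0f);

        // Sum gradients across all ranks in place, then scale to the average.
        MPI_Allreduce(MPI_IN_PLACE, grad.data(), static_cast<int>(grad.size()),
                      MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
        for (float& g : grad) g /= world_size;

        MPI_Finalize();
        return 0;
    }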

DiffPP

3rd Workshop on Differentiable Parallel Programming (DiffPP). This workshop aims to bring together researchers interested in methods, tools and frameworks relying on automatic differentiation, and practitioners who need derivatives for parallel or HPC workloads, in application areas spanning applied mathematics, scientific computing, computational engineering, and machine learning.
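
To make the core idea concrete, here is a minimal forward-mode AD sketch using dual numbers (our own illustration; the frameworks discussed at the workshop are far richer than this):

    // A dual number carries a value and its derivative through arithmetic.
    struct Dual {
        double val;  // f(x)
        double dot;  // f'(x)
    };

    Dual operator+(Dual a, Dual b) { return { a.val + b.val, a.dot + b.dot }; }
    Dual operator*(Dual a, Dual b) {
        // Product rule: (fg)' = f'g + fg'
        return { a.val * b.val, a.dot * b.val + a.val * b.dot };
    }

    int main() {
        Dual x{3.0, 1.0};    // seed dx/dx = 1
        Dual y = x * x + x;  // y.val == 12.0, y.dot == 7.0 (= 2x + 1 at x = 3)
        return 0;
    }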

Tutorials

HPCA

XiangShan

XiangShan: An Open-Source High-Performance RISC-V Processor and Infrastructure for Architecture Research.

HPSmartNet

High Performance and Smart Networking Technologies for HPC and AI (HPSmartNet).

DDDLBA

A Deep Dive into Deep Learning Benchmarking and Analysis (DDDLBA).

PEPNAIA

Programming and Evaluating Performance of Novel AI Accelerators (PEPNAIA).

Reduce

The Hidden Impacts of AI: Exploiting Architectural Designs to Reduce Water and Health Footprints of Data Centers (Reduce).

FIDESlib

Accelerating Private Cloud Computing with FIDESlib.

Beethoven

Beethoven: Tutorial for An Open-Source Accelerator Composer.

CGO

CLPE

Core-Level Performance Engineering (CLPE). This tutorial conveys the knowledge required to develop a thorough understanding of the interactions between software and hardware at the level of a single CPU core and the lowest memory hierarchy level (the L1 cache). We also demonstrate performance analysis and performance engineering using the Open-Source Architecture Code Analyzer (OSACA) in combination with a dedicated instance of the well-known Compiler Explorer. Various hands-on exercises will allow attendees to conduct their own experiments and measurements and to identify in-core performance bottlenecks. Furthermore, we show real-life use cases from computational science (sparse solvers, lattice QCD) to emphasize how profitable in-core performance engineering can be.
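
For a flavour of the kind of kernel such an analysis starts from (our own example, not tutorial material): a STREAM-triad-style loop whose in-core bottleneck can be read off from its per-iteration instruction mix.

    #include <cstddef>

    // Per iteration: two loads, one store, one fused multiply-add. Compile with
    // -O3 -march=native, inspect the hot loop in Compiler Explorer, and a tool
    // like OSACA can estimate the throughput limit imposed by the core's ports.
    void triad(double* a, const double* b, const double* c, double s, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            a[i] = b[i] + s * c[i];
    }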

SODA

SODA Synthesizer: Accelerating Artificial Intelligence Applications with an End-to-End Silicon Compiler. The tutorial will provide hands-on experience with the SOftware Defined Accelerators (SODA) Synthesizer, an open-source compiler-based toolchain. It is composed of SODA-OPT, an MLIR-based front-end and optimizer that interfaces with productive Python data-science programming frameworks, and Bambu, the most advanced open-source HLS tool available, able to generate optimized accelerators for data-intensive kernels.

XTC

Chasing Peak Performance: Hands-on AI Operator Optimization with XTC. This half-day tutorial introduces the fundamentals of optimizing AI operators on CPUs. We will use the XTC research platform to optimize several linear algebra operators (matrix multiplication, convolutions) for an Intel microarchitecture; a hand-written tiling sketch follows the list below. Participants will learn how to:

  • Specify various loop transformations (tiling, splitting, unrolling, vectorization, etc.).
  • Measure the performance of the generated code and benchmark it against the target hardware’s peak performance.
  • Experiment with different backends (MLIR, TVM) and compare them.
  • Serialize the IR at different abstraction levels (down to assembly) to understand the transformations.
  • Iterate in order to reach peak performance.
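
As a hand-written illustration of the first transformation above (loop tiling), independent of XTC's actual API: a blocked matrix multiplication whose tile size is a placeholder one would tune against the cache hierarchy.

    // Hypothetical sizes; in practice these are tuned per microarchitecture.
    constexpr int N = 512;  // problem size
    constexpr int T = 64;   // tile size (must divide N in this sketch)

    void matmul_tiled(const float A[N][N], const float B[N][N], float C[N][N]) {
        // Outer loops walk tiles; the inner three perform a mini-matmul whose
        // working set stays cache-resident, which is the point of tiling.
        for (int ii = 0; ii < N; ii += T)
            for (int kk = 0; kk < N; kk += T)
                for (int jj = 0; jj < N; jj += T)
                    for (int i = ii; i < ii + T; ++i)
                        for (int k = kk; k < kk + T; ++k)
                            for (int j = jj; j < jj + T; ++j)
                                C[i][j] += A[i][k] * B[k][j];
    }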

Proteus

Proteus: Portable JIT compilation and optimization for C/C++ CPU and GPU codes. The first session will cover the Proteus JIT compiler concepts and integration with C/C++ codes. The second session will dive into runtime optimizations and advanced auto-tuning using Mneme. We will keep the sessions interactive by welcoming audience questions.

PPoPP

MLIR

Introduction to Multi-Level Intermediate Representation (MLIR). This tutorial seeks to demystify MLIR and provide an overview of how to use it to build a compiler targeting CPUs and GPUs. Starting with an overview of core MLIR concepts, such as Dialects, Operations, Types and Interfaces, the tutorial will give participants hands-on experience in building an optimizing compiler using MLIR. By the end of this tutorial, participants should have a good understanding of the power of transformations in MLIR and how to use them.
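
As a first taste of the C++ API (a minimal sketch, assuming a recent MLIR build; exact headers vary between versions): creating a context and an empty module, the starting point of any MLIR-based compiler.

    #include "mlir/IR/Builders.h"
    #include "mlir/IR/BuiltinOps.h"
    #include "mlir/IR/MLIRContext.h"

    int main() {
        mlir::MLIRContext context;          // owns loaded dialects and uniqued types
        mlir::OpBuilder builder(&context);  // helper for constructing operations
        // The builtin dialect, which provides ModuleOp, is always available.
        mlir::ModuleOp module = mlir::ModuleOp::create(builder.getUnknownLoc());
        module.dump();                      // prints the (empty) module
        return 0;
    }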

DDRP

Dynamic Data-Race Prediction: Fundamentals, Theory and Practice (DDRP). This tutorial will present the foundations of race prediction, concisely summarise the latest advances in the area, and present avenues for systems research. State-of-the-art predictive techniques will be explained from first principles, followed by a comparison of the soundness, completeness, and complexity guarantees each provides.
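
To fix ideas, here is the classic motivating example for prediction, rendered as a small C++ program (our own illustration): two unsynchronized writes to x that, in one observed schedule, happen to be separated by a lock release/acquire pair. A pure happens-before detector stays silent on that schedule; a predictive analysis reorders the trace, shows the two writes can execute unordered, and reports the race.

    #include <mutex>
    #include <thread>

    int x = 0;      // shared variable, never actually protected by the lock
    std::mutex m;

    void writer1() {
        x = 1;                                 // w(x)
        std::lock_guard<std::mutex> g(m);      // acq(m) ... rel(m)
    }

    void writer2() {
        { std::lock_guard<std::mutex> g(m); }  // acq(m) ... rel(m)
        x = 2;                                 // w(x): races with writer1's write
    }

    int main() {
        std::thread a(writer1), b(writer2);
        a.join();
        b.join();
        return 0;
    }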