Proton: Towards Multi-level, Adaptive Profiling for Triton
Domain-specific languages (DSLs) such as Triton enable developers to write high-performance GPU kernels in a Python-friendly manner; however, profiling these kernels with existing tools often incurs runtime and storage overhead while failing to deliver actionable insights to kernel authors and framework developers alike.
We present Proton, a multi-level, adaptive profiler tailored for the Triton programming language and compiler.
Proton provides frontend APIs to selectively profile relevant regions, aggregate results, capture custom metrics not available through hardware counters, and query profiles using a SQL-like language.
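To make the frontend concepts concrete, the following is a minimal, self-contained sketch of scoped profiling with custom metrics and aggregation. It is not Proton's actual API; the `Profiler`, `scope`, and `summary` names are illustrative stand-ins for the behavior the abstract describes.

```python
from collections import defaultdict
from contextlib import contextmanager
import time

# Hypothetical stand-in for a scoped profiler: selectively profile a
# region, attach custom metrics (e.g. FLOPs) that hardware counters do
# not expose, and aggregate results across repeated entries.
class Profiler:
    def __init__(self):
        self.metrics = defaultdict(lambda: defaultdict(float))
        self.counts = defaultdict(int)

    @contextmanager
    def scope(self, name, metrics=None):
        """Profile only the enclosed region and record custom metrics."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.metrics[name]["time_s"] += time.perf_counter() - start
            for key, value in (metrics or {}).items():
                self.metrics[name][key] += value
            self.counts[name] += 1

    def summary(self):
        # Aggregated view: per-scope metric totals plus a call count.
        return {name: {**dict(vals), "calls": self.counts[name]}
                for name, vals in self.metrics.items()}

prof = Profiler()
for _ in range(3):
    # A kernel launch would go inside the scope; here it is a no-op.
    with prof.scope("matmul", metrics={"flops": 2.0e9}):
        pass
print(prof.summary()["matmul"]["flops"])  # aggregated across 3 calls: 6.0e9
```

A query layer like the SQL-like language mentioned above would then operate over the aggregated records that `summary()` returns, filtering and grouping scopes by name or metric.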
Proton's backend design unifies vendor profiling APIs with instrumentation-based profiling, ensuring portability and extensibility.
Using Proton, users can query custom and hardware metrics across the relevant levels of abstraction—full end-to-end model execution, isolated neural network layers, language-specific Triton operators, and compiler intermediate representations.
We demonstrate the tool's effectiveness through case studies on production-grade kernel development, continuous integration, multi-GPU analysis, language model inference, and intra-kernel profiling.
Our evaluations on end-to-end workloads, as well as standalone Triton kernels, show that Proton imposes lower runtime overhead and significantly smaller profile sizes than existing framework and vendor profilers, while being fully open source.