Proton: Towards Multi-level, Adaptive Profiling for Triton
Domain-specific languages (DSLs) such as Triton enable developers to write high-performance GPU kernels in a Python-friendly manner; however, profiling these kernels with existing tools often incurs runtime and storage overhead while failing to deliver actionable insights to kernel authors and framework developers alike.
We present Proton, a multi-level, adaptive profiler tailored for the Triton programming language and compiler.
Proton provides frontend APIs to selectively profile relevant regions, aggregate results, capture custom metrics not available through hardware counters, and query profiles using a SQL-like language.
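To make the frontend concepts concrete, the following is a minimal, self-contained sketch of scoped profiling with custom metrics and aggregation. It is not Proton's actual API; the `Profiler`, `scope`, and `summary` names are illustrative stand-ins for the behavior the abstract describes.

```python
from collections import defaultdict
from contextlib import contextmanager
import time

# Hypothetical stand-in for a scoped profiler: selectively profile a
# region, attach custom metrics (e.g. FLOPs) that hardware counters do
# not expose, and aggregate results across repeated entries.
class Profiler:
    def __init__(self):
        self.metrics = defaultdict(lambda: defaultdict(float))
        self.counts = defaultdict(int)

    @contextmanager
    def scope(self, name, metrics=None):
        """Profile only the enclosed region and record custom metrics."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.metrics[name]["time_s"] += time.perf_counter() - start
            for key, value in (metrics or {}).items():
                self.metrics[name][key] += value
            self.counts[name] += 1

    def summary(self):
        # Aggregated view: per-scope metric totals plus a call count.
        return {name: {**dict(vals), "calls": self.counts[name]}
                for name, vals in self.metrics.items()}

prof = Profiler()
for _ in range(3):
    # A kernel launch would go inside the scope; here it is a no-op.
    with prof.scope("matmul", metrics={"flops": 2.0e9}):
        pass
print(prof.summary()["matmul"]["flops"])  # aggregated across 3 calls: 6.0e9
```

A query layer like the SQL-like language mentioned above would then operate over the aggregated records that `summary()` returns, filtering and grouping scopes by name or metric.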
Proton's backend design unifies vendor profiling APIs with instrumentation-based profiling, ensuring portability and extensibility.
Using Proton, users can query custom and hardware metrics across the relevant levels of abstraction—full end-to-end model execution, isolated neural network layers, language-specific Triton operators, and compiler intermediate representations.
We demonstrate the tool's effectiveness through case studies on production-grade kernel development, continuous integration, multi-GPU analysis, language model inference, and intra-kernel profiling.
Our evaluations on end-to-end workloads, as well as standalone Triton kernels, show that Proton imposes lower runtime overhead and significantly smaller profile sizes than existing framework and vendor profilers, while being fully open source.