Neural compiler optimizations are increasingly deployed in production compilers, yet debugging their failures remains a significant challenge. When a learned optimizer produces suboptimal code or incorrect transformations, compiler developers lack systematic methods for isolating the root cause. We present AdverTest, a framework that automatically synthesizes minimal failing test cases that expose weaknesses in neural code optimizers. Unlike traditional fuzzing, which generates random programs, AdverTest uses adversarial program synthesis guided by the optimizer’s learned model to discover semantically meaningful failure modes. Our approach generates programs that are (1) minimal in complexity, (2) maximally likely to trigger optimizer failures, and (3) human-interpretable for debugging. Preliminary results on LLVM’s ML-guided inliner show that AdverTest discovers 3.2× more optimizer failures than random fuzzing while generating programs that are, on average, 67% smaller.
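To make the search concrete, the sketch below illustrates one way such a model-guided adversarial loop could be structured: mutate candidate programs, score them against the optimizer's learned model, and shrink any confirmed failure to a minimal reproducer. Every interface here (score_failure_likelihood, mutate, is_failure, shrink) is a hypothetical placeholder for illustration, not AdverTest's actual API.

```python
# Illustrative sketch of a model-guided adversarial search loop.
# All callbacks are assumed interfaces, not the paper's implementation.

def advertest_search(seed_program, score_failure_likelihood, mutate,
                     is_failure, shrink, budget=1000):
    """Search for a minimal program that the learned optimizer mishandles.

    score_failure_likelihood: queries the optimizer's learned model and
        returns a scalar estimate of how likely the program is to
        trigger a misoptimization (higher = more likely).
    is_failure: compiles the program and checks whether the optimizer
        actually produced suboptimal or incorrect code.
    shrink: delta-debugs a failing program down to a minimal,
        human-interpretable reproducer while preserving the failure.
    """
    best = seed_program
    best_score = score_failure_likelihood(best)
    for _ in range(budget):
        candidate = mutate(best)
        score = score_failure_likelihood(candidate)
        # Greedy hill-climbing on the model-guided failure score;
        # a real implementation might use gradients or beam search.
        if score > best_score:
            best, best_score = candidate, score
        if is_failure(best):
            return shrink(best, is_failure)
    return None  # no failure found within the budget
```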
Ercüment Kaya (Technical University of Munich and Leibniz Supercomputing Centre), Hossam Ahmed (Technical University of Munich and Leibniz Supercomputing Centre), Martin Schulz (Technical University of Munich)