CGO 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Mon 2 Feb 2026 14:10 - 14:30 at Balmoral - Memory Chair(s): Christophe Guillon

As artificial intelligence (AI) applications grow in popularity, deep neural networks (DNNs) are increasingly in demand for on-device serving across many real-world domains.
Running DNN inference on a resource-constrained edge device requires aggressive memory optimization.
While several recent tiling-based techniques reduce peak memory usage by partitioning large tensors into micro-tensors, they are specialized for the MCU environment and do not provide scalability to various edge platforms.
Moreover, they greedily select tiling targets without considering how memory flows across the model during partitioning.
In this paper, we propose OKO, a compiler-based optimization technique that minimizes peak memory usage by considering both tiling and the corresponding operation rescheduling.
OKO estimates the memory savings from the tiling method based on the lifetime and dependencies of the tensor, and then algorithmically selects the optimal tiling strategy.
It further maximizes the reuse of memory spaces by efficiently reordering operations and immediately releasing unnecessary tensors.
Evaluations on various edge devices show that OKO reduces peak memory usage by up to 80%, and by 59% on average, with no loss of accuracy and negligible overhead, supporting memory-efficient inference across a broad range of target devices.
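
To make the idea of lifetime-aware rescheduling concrete, the following is a minimal sketch in Python. It is not OKO's algorithm: the graph, the op names, the tensor sizes, and the exhaustive search are all hypothetical and chosen only for illustration. It computes the peak memory of a schedule when each tensor is released right after its last use, and shows that reordering the same operations can lower the peak.

```python
from itertools import permutations

# Hypothetical toy graph: op name -> (input tensors, output tensor).
OPS = {
    "load": ((), "x"),
    "big1": (("x",), "b1"),
    "big2": (("x",), "b2"),
    "red1": (("b1",), "s1"),
    "red2": (("b2",), "s2"),
    "add": (("s1", "s2"), "out"),
}
# Made-up tensor sizes in arbitrary units.
SIZES = {"x": 4, "b1": 10, "b2": 10, "s1": 1, "s2": 1, "out": 1}


def is_valid(schedule):
    """True if every op runs after the ops that produce its inputs."""
    ready = set()
    for name in schedule:
        inputs, output = OPS[name]
        if any(t not in ready for t in inputs):
            return False
        ready.add(output)
    return True


def peak_memory(schedule):
    """Peak live size when each tensor is freed right after its last use."""
    # Record the last step at which each tensor is read.
    last_use = {}
    for step, name in enumerate(schedule):
        for t in OPS[name][0]:
            last_use[t] = step
    live, peak = set(), 0
    for step, name in enumerate(schedule):
        inputs, output = OPS[name]
        live.add(output)  # inputs and output coexist while the op runs
        peak = max(peak, sum(SIZES[t] for t in live))
        # Release inputs whose lifetime ends at this step.
        live -= {t for t in inputs if last_use[t] == step}
    return peak


def best_schedule():
    """Exhaustively search valid orders; feasible only for tiny graphs."""
    valid = (p for p in permutations(OPS) if is_valid(p))
    return min(valid, key=peak_memory)


breadth_first = ("load", "big1", "big2", "red1", "red2", "add")
depth_first = ("load", "big1", "red1", "big2", "red2", "add")
print(peak_memory(breadth_first))    # 24
print(peak_memory(depth_first))      # 15
print(peak_memory(best_schedule()))  # 15
```

In this toy graph, finishing one branch before starting the other keeps only one large intermediate tensor alive at a time, which is why the depth-first order peaks at 15 units rather than 24. Tiling is not modeled here; a real compiler would also need a search strategy that scales beyond brute force.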

Mon 2 Feb

Displayed time zone: Hobart

14:10 - 15:30
Memory, Main Conference at Balmoral
Chair(s): Christophe Guillon STMicroelectronics
14:10
20m
Talk
Flow-Graph-Aware Tiling and Rescheduling for Memory-Efficient On-Device Inference
Main Conference
Yeonoh Jeong Yonsei University, Taehyeong Park Yonsei University, Yongjun Park Yonsei University
Pre-print
14:30
20m
Talk
VFlatten: Selective Value-Object Flattening using Hybrid Static and Dynamic Analysis
Main Conference
Arjun H. Kumar IIT Mandi, Bhavya Hirani SVNIT, Surat, Hang Shao IBM, Tobi Ajila IBM, Vijay Sundaresan IBM Canada, Daryl Maier IBM Canada, Manas Thakur IIT Bombay
Pre-print Media Attached
14:50
20m
Talk
FRUGAL: Pushing GPU Applications beyond Memory Limits
Main Conference
Lingqi Zhang RIKEN RCCS, Tengfei Wang Google Cloud, Jiajun Huang University of California, Riverside, Chen Zhuang Tokyo Institute of Technology, Riken Center for Computational Science, Ivan Ivanov Institute of Science Tokyo, Peng Chen RIKEN RCCS, Toshio Endo, Mohamed Wahib RIKEN Center for Computational Science
Pre-print
15:10
20m
Talk
Automatic Data Enumeration for Fast Collections
Main Conference
Tommy McMichen Northwestern University, Simone Campanoni Google / Northwestern University
Pre-print Media Attached