FRUGAL: Pushing GPU Applications beyond Memory Limits
GPUs power modern scientific and AI applications, but their limited memory capacity restricts scalability. Buying GPUs with larger HBM is prohibitively expensive, and capacity remains bounded by what the market offers. Existing solutions either exploit application-specific knowledge through out-of-core techniques, which lack generality, or rely on system-level page faulting, which is transparent but inefficient. We propose FRUGAL, an application-agnostic framework and methodology that reduces GPU memory footprint while sustaining high performance. FRUGAL formulates memory management as an optimization problem over an application’s execution graph, jointly covering prefetching, kernel execution, and offloading. Using static analysis and profiling, FRUGAL applies a two-phase scheduling and migration strategy that solves this otherwise intractable optimization efficiently. Evaluations on Tiled Cholesky Decomposition, Tiled LU Decomposition, Tiny-CUDA-NN, and QuEST show that FRUGAL reduces peak GPU memory usage by 80.21%, 80.20%, 64.75%, and 60.86%, respectively, at a geometric-mean slowdown of only 28.31%. FRUGAL thus allows applications to exceed hardware-imposed memory limits and maintains strong performance scalability beyond existing GPU memory constraints, without additional hardware cost.
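To make the scheduling-and-migration idea concrete, here is a minimal, hypothetical sketch of how one might greedily schedule kernels over an execution graph under a GPU memory budget, inserting prefetch and offload operations as working sets exceed capacity. All names (`schedule_with_offload`, the LRU eviction policy, the toy kernel list) are illustrative assumptions, not FRUGAL's actual two-phase algorithm:

```python
from collections import OrderedDict

def schedule_with_offload(kernels, budget):
    """Greedy scheduling sketch (NOT FRUGAL's algorithm).

    kernels: list of (name, {tensor: size}) pairs, already in a valid
             topological order of the execution graph.
    budget:  GPU memory capacity in the same units as the tensor sizes.
    Returns (ops, peak): a flat op list of ('prefetch' | 'exec' |
    'offload', item) tuples and the peak resident memory observed.
    """
    resident = OrderedDict()  # tensor -> size, kept in LRU order
    ops, peak = [], 0
    for name, working_set in kernels:
        # Bring in the tensors this kernel touches (prefetch phase).
        for tensor, size in working_set.items():
            if tensor in resident:
                resident.move_to_end(tensor)  # refresh LRU position
                continue
            # Offload least-recently-used tensors until the new one fits.
            while sum(resident.values()) + size > budget and resident:
                victim, _ = resident.popitem(last=False)
                ops.append(("offload", victim))
            resident[tensor] = size
            ops.append(("prefetch", tensor))
        ops.append(("exec", name))
        peak = max(peak, sum(resident.values()))
    return ops, peak
```

For example, three kernels whose combined working sets exceed an 8-unit budget would execute in order while peak residency stays at or below the budget, at the cost of extra offload/prefetch traffic; FRUGAL's contribution is choosing where those migrations go so that the traffic overlaps with kernel execution.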
Mon 2 Feb (displayed time zone: Hobart)
14:10 - 15:30

14:10 (20m, Talk): Flow-Graph-Aware Tiling and Rescheduling for Memory-Efficient On-Device Inference. Main Conference. Pre-print.

14:30 (20m, Talk): VFlatten: Selective Value-Object Flattening using Hybrid Static and Dynamic Analysis. Main Conference. Arjun H. Kumar (IIT Mandi), Bhavya Hirani (SVNIT, Surat), Hang Shao (IBM), Tobi Ajila (IBM), Vijay Sundaresan (IBM Canada), Daryl Maier (IBM Canada), Manas Thakur (IIT Bombay). Pre-print; Media Attached.

14:50 (20m, Talk): FRUGAL: Pushing GPU Applications beyond Memory Limits. Main Conference. Lingqi Zhang (RIKEN RCCS), Tengfei Wang (Google Cloud), Jiajun Huang (University of California, Riverside), Chen Zhuang (Tokyo Institute of Technology / RIKEN Center for Computational Science), Ivan Ivanov (Institute of Science Tokyo), Peng Chen (RIKEN RCCS), Toshio Endo, Mohamed Wahib (RIKEN Center for Computational Science). Pre-print.

15:10 (20m, Talk): Automatic Data Enumeration for Fast Collections. Main Conference. Pre-print; Media Attached.