Decoupled Access-Execute (DAE) architectures separate memory accesses from computation into two specialized units. This design is becoming increasingly popular among hyperscalers to accelerate irregular embedding lookups in recommendation models. In this paper, we first broaden the scope by demonstrating the benefits of DAE architectures across a wider range of irregular embedding operations in several machine learning models. We then propose the Ember compiler to automatically compile all of these embedding operations to DAE architectures. Unlike other DAE compilers, Ember features multiple intermediate representations specifically designed for different levels of optimization. In this way, Ember can implement all the optimizations needed to match the performance of hand-written code, unlocking the full potential of DAE architectures at scale.