
Exploring Lossy Compression of Activation Data for Emerging AI Accelerators: A Case Study on the Graphcore IPU

  • Milan Shah
  • Xiaodong Yu
  • Sheng Di
  • Michela Becchi
  • Franck Cappello
  • North Carolina State University
  • Argonne National Laboratory

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-review

Abstract

The rapidly expanding size and computational complexity of AI models over the past several years have necessitated methods for managing their growing memory footprint and meeting their computational needs. Novel AI accelerators, such as the Graphcore IPU, attempt to address these needs by expanding on-chip memory and enabling massive data parallelism with hundreds of compute cores. Even with more on-chip memory than GPUs, the IPU can suffer from costly on-chip/off-chip memory transactions. Lossy compression can mitigate this bottleneck, improving IPU performance.

Our work explores the use of lossy compression of activation data as a means to improve memory utilization and reduce communication overheads within and across IPU devices. We make the following contributions: 1) We study limitations of the software stack of the IPU (and other AI accelerators) that complicate the integration of compression into DNN pipelines. 2) To address these issues, we propose a tool that unrolls the backward pass of a DNN, facilitating activation access and enabling compression. Our tool converts a text-based model representation to a PyTorch-level representation with compression/decompression calls and custom gradient operations. 3) We integrate a DCT-based compressor and quantization to reduce activation sizes on the IPU. 4) We model multi-IPU performance with and without compression. 5) We evaluate activation compression performance across single-IPU and multi-IPU configurations using different parallelism modes. Compared with a single IPU without compression, we observe 1.1-3.5X speedups on one IPU, 17-20X speedups using two pipelined IPUs, over 16X speedup using 16 data-parallel IPUs, and over 185X speedup using 16 IPUs with a mix of pipeline and data parallelism.
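To make the idea of "a DCT-based compressor and quantization" concrete, here is a minimal pure-Python sketch. It is not the authors' compressor. It assumes a 1D orthonormal DCT-II applied to blocks of a hypothetical length of 8 activations and uniform scalar quantization with a hypothetical step size. The helper names `dct_ii`, `idct_ii`, `compress`, and `decompress` are illustrative only.

```python
import math
from typing import List

# Hypothetical block length; a real activation compressor may use a different size.
BLOCK = 8


def _scale(k: int, n: int) -> float:
    """Normalization factor that makes the DCT-II matrix orthonormal."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(x: List[float]) -> List[float]:
    """Orthonormal 1D DCT-II of one block."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct_ii(c: List[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II (i.e., the orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def compress(acts: List[float], step: float) -> List[List[int]]:
    """Transform each block of activations and quantize its coefficients to integers."""
    blocks = []
    for start in range(0, len(acts), BLOCK):
        block = acts[start:start + BLOCK]
        block = block + [0.0] * (BLOCK - len(block))  # zero-pad the final partial block
        blocks.append([round(c / step) for c in dct_ii(block)])
    return blocks


def decompress(blocks: List[List[int]], step: float, length: int) -> List[float]:
    """Dequantize each block, invert the DCT, and drop the padding."""
    out: List[float] = []
    for q in blocks:
        out.extend(idct_ii([v * step for v in q]))
    return out[:length]


# Usage: round-trip a smooth, activation-like signal.
acts = [math.sin(0.1 * i) for i in range(20)]
step = 0.05
packed = compress(acts, step)
restored = decompress(packed, step, len(acts))
max_err = max(abs(a - b) for a, b in zip(acts, restored))
# Orthonormality bounds each element's error by step * sqrt(BLOCK) / 2.
print(f"max abs error = {max_err:.4f}, bound = {step * math.sqrt(BLOCK) / 2:.4f}")
```

The transform is orthonormal, and rounding keeps each coefficient within step/2 of its true value. Together these cap the per-block L2 error at step·√8/2, which also bounds the error of any single element. The sketch only quantizes. Any memory savings would come from packing the small integers (typically many zeros for smooth data) into fewer bits, a step omitted here.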

Original language: English
Title of host publication: Proceedings - 2025 IEEE International Symposium on Workload Characterization, IISWC 2025
Pages: 219-232
Number of pages: 14
ISBN (Electronic): 9798331549176
State: Published - 2025
Event: 28th IEEE International Symposium on Workload Characterization, IISWC 2025 - Irvine, United States
Duration: 12 Oct 2025 to 14 Oct 2025

Publication series

Name: Proceedings - 2025 IEEE International Symposium on Workload Characterization, IISWC 2025

Conference

Conference: 28th IEEE International Symposium on Workload Characterization, IISWC 2025
Country/Territory: United States
City: Irvine
Period: 12/10/25 to 14/10/25

Keywords

  • accelerators
  • compression
  • model training
