TY - GEN
T1 - Exploring Lossy Compression of Activation Data for Emerging AI Accelerators
T2 - 28th IEEE International Symposium on Workload Characterization, IISWC 2025
AU - Shah, Milan
AU - Yu, Xiaodong
AU - Di, Sheng
AU - Becchi, Michela
AU - Cappello, Franck
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The rapidly expanding size and computational complexity of AI models over the past several years have necessitated methods for managing their growing memory footprint and meeting their computational needs. Novel AI accelerators, such as the Graphcore IPU, attempt to address these needs by expanding on-chip memory and enabling massive data parallelism across hundreds of compute cores. Even with more on-chip memory than GPUs, the IPU can suffer from costly on-chip/off-chip memory transactions. Lossy compression can mitigate this bottleneck and improve IPU performance. Our work explores lossy compression of activation data as a means to improve memory utilization and reduce communication overheads within and across IPU devices. We make the following contributions: 1) We study limitations of the software stack of the IPU (and other AI accelerators) that complicate the integration of compression into DNN pipelines. 2) To address these issues, we propose a tool that unrolls the backward pass of a DNN, facilitating activation access and enabling compression. Our tool converts a text-based model representation into a PyTorch-level representation with compression/decompression calls and custom gradient operations. 3) We integrate a DCT-based compressor and quantization to reduce activation sizes on the IPU. 4) We model multi-IPU performance with and without compression. 5) We evaluate activation compression performance across single-IPU and multi-IPU configurations using different parallelism modes. Compared with a single IPU without compression, we observe 1.1-3.5X speedups on one IPU, 17-20X speedups using two pipelined IPUs, over 16X speedup using 16 data-parallel IPUs, and over 185X speedup using 16 IPUs with a mix of pipeline and data parallelism.
AB - The rapidly expanding size and computational complexity of AI models over the past several years have necessitated methods for managing their growing memory footprint and meeting their computational needs. Novel AI accelerators, such as the Graphcore IPU, attempt to address these needs by expanding on-chip memory and enabling massive data parallelism across hundreds of compute cores. Even with more on-chip memory than GPUs, the IPU can suffer from costly on-chip/off-chip memory transactions. Lossy compression can mitigate this bottleneck and improve IPU performance. Our work explores lossy compression of activation data as a means to improve memory utilization and reduce communication overheads within and across IPU devices. We make the following contributions: 1) We study limitations of the software stack of the IPU (and other AI accelerators) that complicate the integration of compression into DNN pipelines. 2) To address these issues, we propose a tool that unrolls the backward pass of a DNN, facilitating activation access and enabling compression. Our tool converts a text-based model representation into a PyTorch-level representation with compression/decompression calls and custom gradient operations. 3) We integrate a DCT-based compressor and quantization to reduce activation sizes on the IPU. 4) We model multi-IPU performance with and without compression. 5) We evaluate activation compression performance across single-IPU and multi-IPU configurations using different parallelism modes. Compared with a single IPU without compression, we observe 1.1-3.5X speedups on one IPU, 17-20X speedups using two pipelined IPUs, over 16X speedup using 16 data-parallel IPUs, and over 185X speedup using 16 IPUs with a mix of pipeline and data parallelism.
KW - accelerators
KW - compression
KW - model training
UR - https://www.scopus.com/pages/publications/105029026755
U2 - 10.1109/IISWC66894.2025.00027
DO - 10.1109/IISWC66894.2025.00027
M3 - Conference contribution
AN - SCOPUS:105029026755
T3 - Proceedings - 2025 IEEE International Symposium on Workload Characterization, IISWC 2025
SP - 219
EP - 232
BT - Proceedings - 2025 IEEE International Symposium on Workload Characterization, IISWC 2025
Y2 - 12 October 2025 through 14 October 2025
ER -