TY - GEN
T1 - A Portable, Fast, DCT-based Compressor for AI Accelerators
AU - Shah, Milan
AU - Yu, Xiaodong
AU - Di, Sheng
AU - Becchi, Michela
AU - Cappello, Franck
N1 - Publisher Copyright:
© 2024 held by the owner/author(s).
PY - 2024/6/3
Y1 - 2024/6/3
N2 - Lossy compression can be an effective tool in AI training and inference to reduce memory requirements, storage footprint, and, in some cases, execution time. With the rise of novel architectures designed to accelerate AI workloads, compression can continue to serve these purposes, but it must be adapted to the new accelerators. Due to programmability and architectural differences, existing lossy compressors cannot be directly ported to, and are not optimized for, any AI accelerator, thus requiring new compression designs. In this paper, we propose a novel, portable, DCT-based lossy compressor that can be used across a variety of AI accelerators. More specifically, we make the following contributions: 1) We propose a DCT-based lossy compressor design for training data that uses operators supported across four state-of-the-art AI accelerators: the Cerebras CS-2, SambaNova SN30, Groq GroqChip, and Graphcore IPU. 2) We design two optimization techniques that allow for higher-resolution compressed data on certain platforms and an improved compression ratio on the IPU. 3) We evaluate our compressor's ability to preserve accuracy on four benchmarks, three of which are AI-for-science benchmarks that go beyond image classification. Our experiments show that accuracy degradation can be limited to 3% or less, and in some cases compression even improves accuracy. 4) We study compression/decompression time as a function of resolution and batch size, finding that our compressor can achieve throughputs on the scale of tens of GB/s, depending on the platform.
KW - AI accelerator
KW - compression
KW - ML training
UR - http://www.scopus.com/inward/record.url?scp=85204939972&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85204939972&partnerID=8YFLogxK
U2 - 10.1145/3625549.3658662
DO - 10.1145/3625549.3658662
M3 - Conference contribution
AN - SCOPUS:85204939972
T3 - HPDC 2024 - Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
SP - 109
EP - 121
BT - HPDC 2024 - Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
T2 - 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024
Y2 - 3 June 2024 through 7 June 2024
ER -