TY - GEN
T1 - CereSZ
T2 - 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024
AU - Song, Shihui
AU - Huang, Yafan
AU - Jiang, Peng
AU - Yu, Xiaodong
AU - Zheng, Weijian
AU - Di, Sheng
AU - Cao, Qinglei
AU - Feng, Yunhe
AU - Xie, Zhen
AU - Cappello, Franck
N1 - Publisher Copyright:
© 2024 held by the owner/author(s).
PY - 2024/6/3
Y1 - 2024/6/3
N2 - Today's scientific applications running on supercomputers produce large volumes of data, leading to critical data storage and communication challenges. To tackle these challenges, error-bounded lossy compression is commonly adopted, since it can reduce data size drastically within a user-defined error threshold. Previous work has shown that compression techniques can significantly reduce storage and I/O overhead while retaining good data quality. However, existing compressors are mainly designed for CPUs and GPUs. As new AI chips are being incorporated into supercomputers and increasingly used to accelerate scientific computing, there is a growing demand for efficient data compression on these new architectures. In this paper, we propose an efficient lossy compressor, CereSZ, based on the Cerebras CS-2 system. The compression algorithm is mapped onto Cerebras using both data parallelism and pipeline parallelism. To achieve a balanced workload on each processing unit, we propose an algorithm that evenly distributes the pipeline stages. Our experiments with six scientific datasets demonstrate that CereSZ achieves a throughput of 227.93 GB/s to 773.8 GB/s, which is 2.43x to 10.98x faster than existing GPU compressors.
AB - Today's scientific applications running on supercomputers produce large volumes of data, leading to critical data storage and communication challenges. To tackle these challenges, error-bounded lossy compression is commonly adopted, since it can reduce data size drastically within a user-defined error threshold. Previous work has shown that compression techniques can significantly reduce storage and I/O overhead while retaining good data quality. However, existing compressors are mainly designed for CPUs and GPUs. As new AI chips are being incorporated into supercomputers and increasingly used to accelerate scientific computing, there is a growing demand for efficient data compression on these new architectures. In this paper, we propose an efficient lossy compressor, CereSZ, based on the Cerebras CS-2 system. The compression algorithm is mapped onto Cerebras using both data parallelism and pipeline parallelism. To achieve a balanced workload on each processing unit, we propose an algorithm that evenly distributes the pipeline stages. Our experiments with six scientific datasets demonstrate that CereSZ achieves a throughput of 227.93 GB/s to 773.8 GB/s, which is 2.43x to 10.98x faster than existing GPU compressors.
KW - AI-optimized architecture
KW - error-bounded lossy compression
KW - high-speed compressor
KW - parallel computing
KW - scientific simulation
UR - https://www.scopus.com/pages/publications/85204940610
UR - https://www.scopus.com/inward/citedby.url?scp=85204940610&partnerID=8YFLogxK
U2 - 10.1145/3625549.3658691
DO - 10.1145/3625549.3658691
M3 - Conference contribution
AN - SCOPUS:85204940610
T3 - HPDC 2024 - Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
SP - 309
EP - 321
BT - HPDC 2024 - Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
Y2 - 3 June 2024 through 7 June 2024
ER -