TY - GEN
T1 - An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression
AU - Huang, Jiajun
AU - Di, Sheng
AU - Yu, Xiaodong
AU - Zhai, Yujia
AU - Zhang, Zhaorui
AU - Liu, Jinyang
AU - Lu, Xiaoyi
AU - Raffenetti, Ken
AU - Zhou, Hui
AU - Zhao, Kai
AU - Chen, Zizhong
AU - Cappello, Franck
AU - Guo, Yanfei
AU - Thakur, Rajeev
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communications has become a critical bottleneck in large-scale distributed and parallel processing. The large message size in MPI collectives is particularly concerning because it can significantly degrade overall parallel performance. To address this issue, prior research simply applies off-the-shelf fixed-rate lossy compressors to MPI collectives, leading to suboptimal performance, limited generalizability, and unbounded errors. In this paper, we propose a novel solution, called C-Coll, which leverages error-bounded lossy compression to significantly reduce the message size, resulting in a substantial reduction in communication cost. The key contributions are three-fold. (1) We develop two general, optimized lossy-compression-based frameworks for both types of MPI collectives (collective data movement as well as collective computation), based on their particular characteristics. These frameworks not only reduce communication cost but also preserve data accuracy. (2) We customize SZx, an ultra-fast error-bounded lossy compressor, to meet the specific needs of collective communication. (3) We integrate C-Coll into multiple collectives, such as MPI_Allreduce, MPI_Scatter, and MPI_Bcast, and perform a comprehensive evaluation based on real-world scientific datasets. Experiments show that our solution outperforms the original MPI collectives as well as multiple baselines and related efforts by 1.8-2.7×.
AB - With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communications has become a critical bottleneck in large-scale distributed and parallel processing. The large message size in MPI collectives is particularly concerning because it can significantly degrade overall parallel performance. To address this issue, prior research simply applies off-the-shelf fixed-rate lossy compressors to MPI collectives, leading to suboptimal performance, limited generalizability, and unbounded errors. In this paper, we propose a novel solution, called C-Coll, which leverages error-bounded lossy compression to significantly reduce the message size, resulting in a substantial reduction in communication cost. The key contributions are three-fold. (1) We develop two general, optimized lossy-compression-based frameworks for both types of MPI collectives (collective data movement as well as collective computation), based on their particular characteristics. These frameworks not only reduce communication cost but also preserve data accuracy. (2) We customize SZx, an ultra-fast error-bounded lossy compressor, to meet the specific needs of collective communication. (3) We integrate C-Coll into multiple collectives, such as MPI_Allreduce, MPI_Scatter, and MPI_Bcast, and perform a comprehensive evaluation based on real-world scientific datasets. Experiments show that our solution outperforms the original MPI collectives as well as multiple baselines and related efforts by 1.8-2.7×.
KW - Distributed Systems
KW - Lossy Compression
KW - MPI Collective
UR - http://www.scopus.com/inward/record.url?scp=85198906371&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85198906371&partnerID=8YFLogxK
U2 - 10.1109/IPDPS57955.2024.00072
DO - 10.1109/IPDPS57955.2024.00072
M3 - Conference contribution
AN - SCOPUS:85198906371
T3 - Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
SP - 752
EP - 764
BT - Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
T2 - 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
Y2 - 27 May 2024 through 31 May 2024
ER -