TY - GEN
T1 - FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs
T2 - 32nd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2023
AU - Zhang, Boyuan
AU - Tian, Jiannan
AU - Di, Sheng
AU - Yu, Xiaodong
AU - Feng, Yunhe
AU - Liang, Xin
AU - Tao, Dingwen
AU - Cappello, Franck
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/8/7
Y1 - 2023/8/7
N2 - Today's large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, in this work, we develop a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data (called FZ-GPU). Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures. We propose a warp-level optimization to avoid data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2× over cuSZ and an average speedup of 37.0× over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3× and an average compression ratio improvement of 2.0× over cuZFP under the same data distortion.
AB - Today's large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, in this work, we develop a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data (called FZ-GPU). Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures. We propose a warp-level optimization to avoid data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2× over cuSZ and an average speedup of 37.0× over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3× and an average compression ratio improvement of 2.0× over cuZFP under the same data distortion.
KW - gpu
KW - lossy compression
KW - performance
KW - scientific data
UR - http://www.scopus.com/inward/record.url?scp=85169583701&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85169583701&partnerID=8YFLogxK
U2 - 10.1145/3588195.3592994
DO - 10.1145/3588195.3592994
M3 - Conference contribution
AN - SCOPUS:85169583701
T3 - HPDC 2023 - Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
SP - 129
EP - 142
BT - HPDC 2023 - Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
Y2 - 16 June 2023 through 23 June 2023
ER -