TY - GEN
T1 - cuSZp
T2 - 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
AU - Huang, Yafan
AU - Di, Sheng
AU - Yu, Xiaodong
AU - Li, Guanpeng
AU - Cappello, Franck
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023
Y1 - 2023
AB - Modern scientific applications and supercomputing systems are generating large amounts of data in various fields, leading to critical challenges in data storage footprints and communication times. To address this issue, error-bounded GPU lossy compression has been widely adopted, since it can reduce the volume of data within a customized threshold on data distortion. In this work, we propose cuSZp, an ultra-fast error-bounded GPU lossy compressor. Specifically, cuSZp computes the linear recurrences with hierarchical parallelism to fuse the massive computation into one kernel, drastically improving the end-to-end throughput. In addition, cuSZp adopts a block-wise design along with a lightweight fixed-length encoding and bit-shuffle inside each block such that it achieves high compression ratios and data quality. Our experiments on an NVIDIA A100 GPU with six representative scientific datasets demonstrate that cuSZp can achieve an ultra-fast end-to-end throughput (95.53x compared with cuSZ) along with a high compression ratio and high reconstructed data quality.
KW - CUDA
KW - Error-bounded Lossy Compression
KW - GPU
KW - High-speed Compressor
KW - Parallel Computing
KW - Scientific Simulation
UR - http://www.scopus.com/inward/record.url?scp=85190431739&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190431739&partnerID=8YFLogxK
U2 - 10.1145/3581784.3607048
DO - 10.1145/3581784.3607048
M3 - Conference contribution
AN - SCOPUS:85178137474
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - SC 2023 - International Conference for High Performance Computing, Networking, Storage and Analysis
Y2 - 12 November 2023 through 17 November 2023
ER -