TY - GEN
T1 - Parallel Implementation of Lossy Data Compression for Temporal Data Sets
AU - Yuan, Zheng
AU - Hendrix, William
AU - Son, Seung Woo
AU - Federrath, Christoph
AU - Agrawal, Ankit
AU - Liao, Wei Keng
AU - Choudhary, Alok
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/2/1
Y1 - 2017/2/1
AB - Many scientific data sets contain temporal dimensions; that is, they store information for the same spatial locations at different time stamps. Some of the largest temporal data sets are produced by parallel computing applications, such as simulations of climate change and fluid dynamics. Temporal data sets can be very large and take a long time to transfer between storage locations. Data compression allows such files to be transferred faster and to occupy less storage space. NUMARCK is a lossy data compression algorithm for temporal data sets that learns the emerging distributions of element-wise change ratios along the temporal dimension and encodes them into an index table so that they can be represented concisely. This paper presents a parallel implementation of NUMARCK. Evaluated on six data sets obtained from climate and astrophysics simulations, parallel NUMARCK achieved scalable speedups of up to 8788 when running 12800 MPI processes on a parallel computer. We also compare its compression ratios against those of two other lossy data compression algorithms, ISABELA and ZFP. The results show that NUMARCK achieved higher compression ratios than both ISABELA and ZFP.
KW - error-bound
KW - lossy data compression
KW - parallel data compression
KW - temporal change ratio
UR - http://www.scopus.com/inward/record.url?scp=85015155501&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015155501&partnerID=8YFLogxK
U2 - 10.1109/HiPC.2016.017
DO - 10.1109/HiPC.2016.017
M3 - Conference contribution
AN - SCOPUS:85015155501
T3 - Proceedings - 23rd IEEE International Conference on High Performance Computing, HiPC 2016
SP - 62
EP - 71
BT - Proceedings - 23rd IEEE International Conference on High Performance Computing, HiPC 2016
T2 - 23rd IEEE International Conference on High Performance Computing, HiPC 2016
Y2 - 19 December 2016 through 22 December 2016
ER -