COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers

Baixi Sun, Weijin Liu, J. Gregory Pauloski, Jiannan Tian, Jinda Jia, Daoce Wang, Boyuan Zhang, Mingkai Zheng, Sheng Di, Sian Jin, Zhao Zhang, Xiaodong Yu, Kamil A. Iskra, Pete Beckman, Guangming Tan, Dingwen Tao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Second-order optimization methods have been developed to enhance convergence and generalization in deep neural network (DNN) training compared to first-order methods like Stochastic Gradient Descent (SGD). However, these methods face challenges in distributed settings due to high communication overhead. Gradient compression, a technique commonly used to accelerate communication for first-order approaches, often results in low communication reduction ratios, decreased model accuracy, and/or high compression overhead when applied to second-order methods. To address these limitations, we introduce a novel gradient compression method for second-order optimizers called COMPSO. This method effectively reduces communication costs while preserving the advantages of second-order optimization. COMPSO employs stochastic rounding to maintain accuracy and filters out minor gradients to improve compression ratios. Additionally, we develop GPU optimizations to minimize compression overhead and performance modeling to ensure end-to-end performance gains across various systems. Evaluation of COMPSO on different DNN models shows that it achieves a compression ratio of 22.1×, reduces communication time by 14.2×, and improves overall performance by 1.9×, all without any drop in model accuracy.
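To make the abstract's two compression ingredients concrete, the sketch below illustrates stochastic rounding (which keeps the quantized gradient unbiased in expectation) and filtering of minor gradients (which raises the compression ratio). This is not COMPSO's implementation: the 8-bit uniform quantizer, the top-k style threshold, and the NumPy-only workflow are illustrative assumptions for a single gradient tensor.

    import numpy as np

    def stochastic_round(x, num_bits=8):
        """Stochastically round values onto a uniform grid with 2**num_bits levels.

        Illustrative quantizer only; the key property shown is that rounding up
        with probability equal to the fractional position keeps E[q(x)] == x.
        """
        levels = 2 ** num_bits - 1
        lo, hi = x.min(), x.max()
        scale = (hi - lo) / levels if hi > lo else 1.0
        pos = (x - lo) / scale            # position of each value on the grid
        floor = np.floor(pos)
        prob_up = pos - floor             # fractional part = probability of rounding up
        rounded = floor + (np.random.random(x.shape) < prob_up)
        return rounded * scale + lo

    def filter_minor_gradients(grad, keep_ratio=0.05):
        """Keep only the largest-magnitude fraction of gradient entries.

        Hypothetical top-k style filter standing in for "filtering out minor
        gradients"; the actual threshold selection in the paper may differ.
        """
        k = max(1, int(keep_ratio * grad.size))
        threshold = np.partition(np.abs(grad).ravel(), -k)[-k]
        mask = np.abs(grad) >= threshold
        return grad * mask, mask

    # Toy usage: compress one synthetic gradient tensor before communication.
    grad = np.random.randn(1024).astype(np.float32)
    sparse_grad, mask = filter_minor_gradients(grad, keep_ratio=0.05)
    compressed = stochastic_round(sparse_grad[mask], num_bits=8)
    print(f"kept {mask.sum()} of {grad.size} entries")

In a distributed setting, only the retained indices and their stochastically rounded values would be exchanged, which is the source of the communication reduction the abstract reports.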

Original language: English
Title of host publication: PPoPP 2025 - Proceedings of the 2025 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
Pages: 212-224
Number of pages: 13
ISBN (Electronic): 9798400714436
DOIs
State: Published - 28 Feb 2025
Event: 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2025 - Las Vegas, United States
Duration: 1 Mar 2025 - 5 Mar 2025

Publication series

Name: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP
ISSN (Print): 1542-0205

Conference

Conference: 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2025
Country/Territory: United States
City: Las Vegas
Period: 1/03/25 - 5/03/25

Keywords

  • data compression
  • Deep learning
  • distributed training
  • K-FAC
  • second-order optimization
