DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks

Zhuang Wang, Zhaozhuo Xu, Xinyu Crystal Wu, Anshumali Shrivastava, T. S. Eugene Ng

Research output: Contribution to journal › Conference article › peer-review

5 Scopus citations

Abstract

Data-parallel distributed training (DDT) has become the de facto standard for accelerating the training of most deep learning tasks on massively parallel hardware. In the DDT paradigm, the communication overhead of gradient synchronization is the major efficiency bottleneck. A widely adopted approach to this problem is gradient sparsification (GS). However, current GS methods introduce significant new overhead to compress the gradients, which outweighs the communication overhead and becomes the new efficiency bottleneck. In this paper, we propose DRAGONN, a randomized hashing algorithm for GS in DDT. DRAGONN reduces compression time by up to 70% compared to state-of-the-art GS approaches and achieves up to 3.52× speedup in total training throughput.
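The abstract contrasts the cost of conventional GS (e.g., exact top-k selection) with a hashing-based selection. The sketch below is only an illustration of that general idea, not the DRAGONN algorithm from the paper: it uses a cheap random hash to pick a sparse subset of gradient coordinates in O(n) time, avoiding a global sort. All function and parameter names (hash_sparsify, num_buckets, keep_ratio) are hypothetical.

```python
# Illustrative sketch of hash-based gradient sparsification (NOT DRAGONN itself).
import numpy as np

def hash_sparsify(grad, num_buckets=1024, keep_ratio=0.01, seed=0):
    """Pick a sparse subset of gradient coordinates via a random hash
    instead of an exact top-k sort; return (indices, values)."""
    rng = np.random.default_rng(seed)
    flat = grad.ravel()
    # Hash every coordinate into a bucket in O(n), with no sorting.
    buckets = rng.integers(0, num_buckets, size=flat.size)
    # Keep the coordinates that fall into a randomly chosen set of buckets,
    # so roughly keep_ratio of the entries survive in expectation.
    kept = rng.choice(num_buckets,
                      size=max(1, int(keep_ratio * num_buckets)),
                      replace=False)
    mask = np.isin(buckets, kept)
    idx = np.nonzero(mask)[0]
    return idx, flat[idx]

# Usage: compress a local gradient before synchronizing (idx, vals) across workers.
grad = np.random.randn(1_000_000).astype(np.float32)
idx, vals = hash_sparsify(grad)
print(f"kept {idx.size} of {grad.size} entries")
```

In a real GS pipeline the selected (index, value) pairs would be exchanged among workers and accumulated into the dense gradient; this sketch only shows why hashing keeps the compression step linear-time.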

Original language: English
Pages (from-to): 23274-23291
Number of pages: 18
Journal: Proceedings of Machine Learning Research
Volume: 162
State: Published - 2022
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: 17 Jul 2022 - 23 Jul 2022
