Abstract
Data-parallel distributed training (DDT) has become the de facto standard for accelerating the training of most deep learning tasks on massively parallel hardware. In the DDT paradigm, the communication overhead of gradient synchronization is the major efficiency bottleneck. A widely adopted approach to this problem is gradient sparsification (GS). However, current GS methods introduce significant new overhead when compressing the gradients, which outweighs the communication overhead and becomes the new efficiency bottleneck. In this paper, we propose DRAGONN, a randomized hashing algorithm for GS in DDT. DRAGONN reduces compression time by up to 70% compared to state-of-the-art GS approaches and achieves up to 3.52× speedup in total training throughput.
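To make the contrast in the abstract concrete, here is a minimal sketch of two ways to sparsify a gradient before synchronization. It is not DRAGONN's algorithm, whose details the abstract does not give. `topk_sparsify` shows the common top-k style of GS, which needs a selection pass over the whole gradient. `hashed_sparsify` is a hypothetical illustration of the general idea of replacing selection with a randomized hash into a fixed-size table. All function names, the `threshold` and `num_slots` parameters, and the overwrite-on-collision policy are assumptions made for this example.

```python
import heapq
import random

PRIME = 2**31 - 1  # modulus for the illustrative universal-style hash


def topk_sparsify(grad, k):
    """Baseline-style GS: keep the k largest-magnitude entries as (index, value)."""
    idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    return sorted((i, grad[i]) for i in idx)


def hashed_sparsify(grad, threshold, num_slots, seed=0):
    """Hypothetical hashing variant (an assumption, not the paper's method).

    Entries with |g| >= threshold are written into a fixed-size table at a
    pseudo-random slot. A later entry overwrites an earlier one on collision,
    so the result is approximate but never longer than num_slots.
    """
    rng = random.Random(seed)
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)
    table = [None] * num_slots
    for i, g in enumerate(grad):
        if abs(g) >= threshold:
            slot = ((a * i + b) % PRIME) % num_slots
            table[slot] = (i, g)
    return sorted(entry for entry in table if entry is not None)


def densify(pairs, n):
    """Rebuild a dense gradient of length n from (index, value) pairs."""
    dense = [0.0] * n
    for i, g in pairs:
        dense[i] = g
    return dense


grad = [0.1, -2.0, 0.3, 1.5, -0.05, 0.9]
top2 = topk_sparsify(grad, 2)
hashed = hashed_sparsify(grad, threshold=0.5, num_slots=4)
```

In this sketch the hashed variant makes a single pass with constant work per element and produces a bounded-size message, at the cost of occasionally dropping a large entry when two indices collide. The top-k variant is exact but pays for the selection step.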
| Original language | English |
|---|---|
| Pages (from-to) | 23274-23291 |
| Number of pages | 18 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 162 |
| State | Published - 2022 |
| Event | 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States. Duration: 17 Jul 2022 → 23 Jul 2022 |
| Title | DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks |