TY - GEN
T1 - Accelerating DNN Architecture Search at Scale Using Selective Weight Transfer
AU - Liu, Hongyuan
AU - Nicolae, Bogdan
AU - Di, Sheng
AU - Cappello, Franck
AU - Jog, Adwait
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
AB - Deep learning applications are rapidly gaining traction both in industry and in scientific computing. Unsurprisingly, there has been significant interest in adopting deep learning at a very large scale on supercomputing infrastructures for a variety of scientific applications. A key issue in this context is how to find a model architecture that is suitable for solving the problem. We call this the neural architecture search (NAS) problem. Over time, many automated approaches have been proposed that can explore a large number of candidate models. However, this remains a time-consuming and resource-expensive process: the candidates are often trained from scratch for a small number of epochs in order to obtain a set of top-K best performers, which are then fully trained in a second phase. To address this problem, we propose a novel method that leverages checkpoints of previously discovered candidates to accelerate NAS. Based on the observation that the candidates feature high structural similarity, we propose that new candidates need not be trained starting from random weights, but rather from the weights of similar layers of previously evaluated candidates. Thanks to this approach, the convergence of the candidate models is significantly accelerated, and the search produces candidates that are statistically better on the objective metrics. Furthermore, once the top-K models are identified, our approach provides a significant speed-up (1.4-1.5× on average) for the full training.
KW - Checkpointing
KW - Deep Learning
KW - Neural Architecture Search
UR - http://www.scopus.com/inward/record.url?scp=85118915124&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118915124&partnerID=8YFLogxK
U2 - 10.1109/Cluster48925.2021.00051
DO - 10.1109/Cluster48925.2021.00051
M3 - Conference contribution
AN - SCOPUS:85118915124
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 82
EP - 93
BT - Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
T2 - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Y2 - 7 September 2021 through 10 September 2021
ER -