TY - CONF
T1 - Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor
T2 - 2024 USENIX Annual Technical Conference, ATC 2024
AU - Xie, Zhen
AU - Emani, Murali
AU - Yu, Xiaodong
AU - Tao, Dingwen
AU - He, Xin
AU - Su, Pengfei
AU - Zhou, Keren
AU - Vishwanath, Venkatram
N1 - Publisher Copyright:
© 2024 Proceedings of the 2024 USENIX Annual Technical Conference, ATC 2024. All rights reserved.
PY - 2024
Y1 - 2024
N2 - For an extended period, graphics processing units (GPUs) have stood as the exclusive choice for training deep neural network (DNN) models. Over time, to serve the growing demands in a more targeted manner, various artificial intelligence-specific hardware, referred to as AI accelerators, has been vigorously developed, aiming to provide more efficient DNN acceleration solutions. However, these solutions are also heterogeneous and thus introduce complexity into accelerator selection. Given a DNN model and a training objective, such as throughput or price-performance ratio, it remains challenging to arrive at the optimal decision among many options due to high reimplementation costs and unpredictable performance. To tackle this challenge, we propose Centimani, a performance predictor that accurately and rapidly predicts DNN training throughput on various AI accelerators, thereby facilitating the accelerator selection process. To achieve this goal, we first analyze typical AI accelerators and draw observations that abstract AI accelerator designs and guide our performance modeling approach. In particular, we construct a memory estimation model and decoupled performance models to select the most appropriate batch size and predict the execution time of DNN training. We validate our approach by applying Centimani to six common DNN models on four typical AI accelerators. Results show that Centimani predicts throughput with an average accuracy of 93.1% on single-device training and 90.4% on multiple-device training; thus, the optimal accelerator for the user's training objective can be identified.
UR - http://www.scopus.com/inward/record.url?scp=85201221892&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85201221892&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85201221892
T3 - Proceedings of the 2024 USENIX Annual Technical Conference, ATC 2024
SP - 1203
EP - 1221
BT - Proceedings of the 2024 USENIX Annual Technical Conference, ATC 2024
Y2 - 10 July 2024 through 12 July 2024
ER -