TY - GEN
T1 - Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm
T2 - 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
AU - Huang, Shaoyi
AU - Xu, Dongkuan
AU - Yen, Ian En Hsu
AU - Wang, Yijue
AU - Chang, Sung En
AU - Li, Bingbing
AU - Chen, Shiyang
AU - Xie, Mimi
AU - Rajasekaran, Sanguthevar
AU - Liu, Hang
AU - Ding, Caiwen
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - Conventional wisdom in pruning Transformer-based language models holds that pruning reduces model expressiveness and is therefore more likely to cause underfitting than overfitting. However, under the trending pretrain-and-finetune paradigm, we postulate a counter-traditional hypothesis: pruning increases the risk of overfitting when performed at the fine-tuning phase. In this paper, we aim to address this overfitting problem and improve pruning performance via progressive knowledge distillation with error-bound properties. We show for the first time that reducing the risk of overfitting can improve the effectiveness of pruning under the pretrain-and-finetune paradigm. Ablation studies and experiments on the GLUE benchmark show that our method outperforms the leading competitors across different tasks.
UR - http://www.scopus.com/inward/record.url?scp=85138618135&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138618135&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.acl-long.16
DO - 10.18653/v1/2022.acl-long.16
M3 - Conference contribution
AN - SCOPUS:85138618135
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 190
EP - 200
BT - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
A2 - Muresan, Smaranda
A2 - Nakov, Preslav
A2 - Villavicencio, Aline
Y2 - 22 May 2022 through 27 May 2022
ER -