Accommodating Transformer onto FPGA: Coupling the Balanced Model Compression and FPGA-Implementation Optimization

Panjie Qi, Yuhong Song, Hongwu Peng, Shaoyi Huang, Qingfeng Zhuge, Edwin Hsing Mean Sha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

34 Scopus citations

Abstract

Recently, Transformers have gained popularity and perform outstandingly on many Natural Language Processing (NLP) tasks. However, Transformers suffer from heavy computation and a large memory footprint, making them difficult to deploy on embedded devices. The field-programmable gate array (FPGA) is widely used to accelerate deep learning algorithms because of its advantages. However, trained Transformer models are too large to fit onto an FPGA fabric. To accommodate Transformers on FPGAs and achieve efficient execution, we propose an acceleration framework coupling balanced model compression at the algorithm level with FPGA-implementation optimization at the hardware level. At the algorithm level, we adopt block-balanced pruning and propose an efficient sparse matrix storage format for this pruning technique, named Compressed Block Row (CBR). At the hardware level, we design an accelerator for the sparse model and abstract a performance analytic model to evaluate the accelerator's performance. Experiments show that our CBR format outperforms general formats and significantly reduces storage space, and our accelerator achieves $38\times$ and $1.93\times$ speedups compared to other works on CPU and GPU, respectively.
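The abstract names block-balanced pruning and the Compressed Block Row (CBR) storage format but does not spell out the layout. The sketch below is a minimal, assumed illustration in Python of how a block-balanced sparse weight matrix could be packed so that every row block keeps the same number of columns, yielding equal-length index and value arrays that are friendly to fixed-size hardware buffers. The function name `to_block_balanced` and the parameters `block_rows` and `keep_cols` are illustrative, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of a block-balanced sparse layout (illustrative only,
# not the paper's CBR implementation). The weight matrix is split into row
# blocks; each block keeps the same number of columns, so the format stores
# one fixed-size index array and one fixed-size value array per block.

def to_block_balanced(dense, block_rows=4, keep_cols=2):
    """Prune each row block to its `keep_cols` highest-scoring columns
    (scored by column L2 norm within the block) and pack indices + values."""
    rows, cols = dense.shape
    assert rows % block_rows == 0 and keep_cols <= cols
    col_idx, values = [], []
    for r0 in range(0, rows, block_rows):
        block = dense[r0:r0 + block_rows]
        norms = np.linalg.norm(block, axis=0)            # score each column
        keep = np.sort(np.argsort(norms)[-keep_cols:])   # balanced: same count per block
        col_idx.append(keep)
        values.append(block[:, keep])
    return np.array(col_idx), np.array(values)

if __name__ == "__main__":
    W = np.random.randn(8, 6)
    idx, vals = to_block_balanced(W)
    print(idx.shape, vals.shape)  # (2, 2) and (2, 4, 2): regular, fixed-size storage
```

Because every block contributes the same number of nonzero columns, the packed arrays have uniform shape, which is the property that lets a hardware accelerator stream them without per-row bookkeeping.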

Original language: English
Title of host publication: GLSVLSI 2021 - Proceedings of the 2021 Great Lakes Symposium on VLSI
Pages: 163-168
Number of pages: 6
ISBN (Electronic): 9781450383936
DOIs
State: Published - 22 Jun 2021
Event: 31st Great Lakes Symposium on VLSI, GLSVLSI 2021 - Virtual, Online, United States
Duration: 22 Jun 2021 - 25 Jun 2021

Publication series

Name: Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI

Conference

Conference: 31st Great Lakes Symposium on VLSI, GLSVLSI 2021
Country/Territory: United States
City: Virtual, Online
Period: 22/06/21 - 25/06/21

Keywords

  • fpga
  • model compression
  • nlp
  • transformer
