On-GPU thread-data remapping for branch divergence reduction

Huanxin Lin, Cho-Li Wang, Hongyuan Liu

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

General-purpose GPU computing (GPGPU) plays an increasingly vital role in high-performance computing and other areas such as deep learning. However, arising from the SIMD execution model, the branch divergence issue lowers the efficiency of conditional branching on GPUs and hinders the development of GPGPU. To achieve runtime, on-the-spot branch divergence reduction, we propose the first on-GPU thread-data remapping scheme. Before kernel launch, our solution inserts code into GPU kernels immediately before each target branch so as to acquire actual runtime divergence information. GPU software threads can be remapped to datasets multiple times during a single kernel execution. We propose two thread-data remapping algorithms that are tailored to the GPU architecture. Effective on two generations of GPUs from both NVIDIA and AMD, our solution achieves speedups of up to 2.718× with third-party benchmarks. We also implement three GPGPU frontier benchmarks from areas including computer vision, algorithmic trading, and data analytics. They are hindered by more complex divergence coupled with different memory access patterns, and our solution works better than the traditional thread-data remapping scheme in all cases. As a compiler-assisted runtime solution, it can better reduce divergence for divergent applications that currently gain little acceleration on GPUs.

Original language: English
Article number: 39
Journal: ACM Transactions on Architecture and Code Optimization
Volume: 15
Issue number: 3
DOIs
State: Published - Oct 2018

Keywords

  • Branch divergence
  • GPGPU
  • Parallel computing
  • SIMD
