Lightweight dependency checking for parallelizing loops with non-deterministic dependency on GPU

Hongyuan Liu, King Tin Lam, Huanxin Lin, Cho Li Wang, Junchao Ma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

General-purpose GPUs have been prevalent for a decade. Nevertheless, GPU programming remains an onerous job practically exclusive to veteran developers who must master both domain-specific knowledge and GPU architecture. Although current parallelizing compilers that automatically parallelize and offload sizable loops onto the GPU have helped unlock the power of the GPU with minimal programming effort, there is still a family of loops that carry statically non-deterministic data dependencies and cannot be parallelized. To tackle this issue, we propose two lightweight dependency checking schemes, very different from the approach of existing conservative compilers, to assist in parallelizing loops with non-deterministic data dependencies. Our schemes feature linear work complexity for memory operations, lower memory consumption than previous work, and minimal false positives by leveraging the lockstep execution on the GPU's SIMD lanes. Experiments using microbenchmarks and real-life applications on the latest AMD discrete GPUs show that our schemes achieve a 2.2× speedup over existing solutions in dependency-free cases, while taking only about 20% of the time of existing solutions in cases with statically unproven loop-carried dependencies.
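As a rough illustration of the problem the abstract describes, consider a loop of the form `a[w[i]] = a[r[i]] + 1`, where the index arrays `w` and `r` are known only at run time, so a compiler cannot prove the iterations independent. The sketch below is not the paper's scheme (which runs on the GPU and exploits lockstep SIMD execution); it is a minimal sequential check, written under the assumption that each iteration performs one read and one write. The function names `has_loop_carried_dependency` and `run_loop` are hypothetical.

```python
def has_loop_carried_dependency(write_idx, read_idx):
    """Return True if some address is written by one iteration and
    read or written by a different iteration.

    Illustrative only: one pass over the writes and one pass over the
    reads, so the work is linear in the number of memory operations.
    """
    writer = {}  # address -> iteration that writes it
    for i, addr in enumerate(write_idx):
        if addr in writer:
            # Two different iterations write the same address.
            return True
        writer[addr] = i
    for i, addr in enumerate(read_idx):
        j = writer.get(addr)
        if j is not None and j != i:
            # Iteration i reads an address that iteration j writes.
            return True
    return False


def run_loop(a, write_idx, read_idx):
    """Execute a[w[i]] = a[r[i]] + 1 for every i, on a copy of `a`.

    If the runtime check finds no cross-iteration dependency, the
    iterations are evaluated in an order-independent way (all reads,
    then all writes), standing in for a parallel launch. Otherwise the
    loop falls back to sequential execution.
    """
    a = list(a)
    if has_loop_carried_dependency(write_idx, read_idx):
        for w, r in zip(write_idx, read_idx):
            a[w] = a[r] + 1
    else:
        values = [a[r] + 1 for r in read_idx]
        for w, v in zip(write_idx, values):
            a[w] = v
    return a
```

With `write_idx = [1, 2, 3]` and `read_idx = [0, 1, 2]`, iteration 1 reads the element iteration 0 writes, so the check reports a dependency and the loop runs sequentially. With disjoint read and write sets the check passes and the iterations may run in any order.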

Original language: English
Title of host publication: Proceedings - 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
Editors: Xiaofei Liao, Robert Lovas, Xipeng Shen, Ran Zheng
Pages: 884-893
Number of pages: 10
ISBN (Electronic): 9781509044573
State: Published - 2 Jul 2016
Event: 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016 - Wuhan, Hubei, China
Duration: 13 Dec 2016 → 16 Dec 2016

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 0
ISSN (Print): 1521-9097

Conference

Conference: 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
Country/Territory: China
City: Wuhan, Hubei
Period: 13/12/16 → 16/12/16

Keywords

  • Code Generation
  • Dependency Checking
  • GPGPU
  • Loop Parallelization
