Analyzing and leveraging remote-core bandwidth for enhanced performance in GPUs

Mohamed Assem Ibrahim, Hongyuan Liu, Onur Kayiran, Adwait Jog

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Scopus citations

Abstract

Bandwidth achieved from local/shared caches and memory is a major performance determinant in Graphics Processing Units (GPUs). These existing sources of bandwidth are often not enough for optimal GPU performance. To enhance performance further, we focus on efficiently unlocking an additional potential source of bandwidth, which we call remote-core bandwidth. This source is based on the observation that a fraction of the data (i.e., L1 read misses) required by one GPU core can also be found in the local (L1) caches of other GPU cores. In this paper, we propose to efficiently coordinate data movement across cores in GPUs to exploit this remote-core bandwidth. However, we find that its efficient detection and utilization present several challenges. To this end, we specifically address: a) which data is shared across cores, b) which cores have the shared data, and c) how we can get the data as soon as possible. Our extensive evaluation across a wide set of GPGPU applications shows that significant performance improvement can be achieved at a modest hardware cost on account of the additional bandwidth received from the remote cores.

Original language: English
Title of host publication: Proceedings - 2019 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019
Pages: 257-270
Number of pages: 14
ISBN (Electronic): 9781728136134
DOIs
State: Published - Sep 2019
Event: 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019 - Seattle, United States
Duration: 21 Sep 2019 → 25 Sep 2019

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
Volume: 2019-September
ISSN (Print): 1089-795X

Conference

Conference: 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019
Country/Territory: United States
City: Seattle
Period: 21/09/19 → 25/09/19

Keywords

  • Bandwidth
  • GPUs
  • Network-on-Chip
