MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training

Hongwu Peng, Xi Xie, Kaustubh Shivdikar, Md Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David Kaeli, Caiwen Ding

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

9 Scopus citations

Abstract

Graphics processing units (GPUs) have become the mainstream platform for accelerating deep neural network training. However, GPUs face substantial challenges on Graph Neural Networks (GNNs), such as workload imbalance and irregular memory accesses, leading to underutilized hardware. Existing solutions such as PyG, DGL with cuSPARSE, and GNNAdvisor partially address these challenges, but the memory traffic involved in Sparse-Dense Matrix-Matrix Multiplication (SpMM) remains significant. We argue that drastic performance improvements can only be achieved by vertically optimizing the algorithm together with the system, rather than treating speedup as an afterthought (i.e., (i) given a GNN algorithm, designing an accelerator, or (ii) given hardware, mainly optimizing the GNN algorithm). In this paper, we present MaxK-GNN, an advanced high-performance GPU training system integrating algorithm and system innovations. (i) We introduce the MaxK nonlinearity, provide a theoretical analysis of MaxK as a universal approximator, and present the Compressed Balanced Sparse Row (CBSR) format for storing the data and indices of the feature matrix after the nonlinearity; (ii) We design a coalescing-enhanced forward computation with a row-wise product-based Sparse Matrix-Matrix Multiplication (SpGEMM) kernel that uses CBSR for input feature matrix fetching and strategically places a sparse output accumulation buffer in shared memory; (iii) We develop an optimized backward computation with an outer product-based Sampled Sparse Matrix Dense Matrix Multiplication (SSpMM) kernel. We conduct extensive evaluations of MaxK-GNN and report end-to-end system training times. Experiments show that the MaxK-GNN system approaches the speedup limit given by Amdahl's law. It achieves accuracy comparable to SOTA GNNs at significantly higher speed: a 3.22×/4.24× speedup (vs. the theoretical limits of 5.52×/7.27×) on Reddit compared to the DGL and GNNAdvisor implementations.
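The MaxK nonlinearity and the CBSR format named in item (i) can be illustrated with a minimal PyTorch sketch. This is not the paper's CUDA implementation: the function names maxk_nonlinearity and to_cbsr are hypothetical, and MaxK is assumed here to keep the k largest entries per row of the feature matrix.

import torch

def maxk_nonlinearity(x: torch.Tensor, k: int) -> torch.Tensor:
    # Keep only the k largest entries in each row; zero out the rest.
    vals, idx = x.topk(k, dim=1)
    out = torch.zeros_like(x)
    out.scatter_(1, idx, vals)
    return out

def to_cbsr(x: torch.Tensor, k: int):
    # Pack the post-MaxK rows into a CBSR-like layout: because every row keeps
    # exactly k nonzeros, the data and index arrays are dense (num_rows, k) tensors.
    vals, idx = x.topk(k, dim=1)
    return vals, idx

features = torch.randn(8, 32)                 # 8 nodes with 32-dimensional embeddings
activated = maxk_nonlinearity(features, k=4)
data, index = to_cbsr(features, k=4)
print(activated.count_nonzero(dim=1))         # each row holds exactly 4 nonzeros

Because every row carries the same number of nonzeros, a forward kernel only needs to fetch k values and k indices per row, which is the regularity the abstract's SpGEMM design exploits to cut memory traffic.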

Original language: English
Title of host publication: Summer Cycle
Pages: 683-698
Number of pages: 16
ISBN (Electronic): 9798400703850
DOIs
State: Published - 27 Apr 2024
Event: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024 - San Diego, United States
Duration: 27 Apr 2024 - 1 May 2024

Publication series

Name: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
Volume: 2

Conference

Conference: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024
Country/Territory: United States
City: San Diego
Period: 27/04/24 - 1/05/24

Keywords

  • GPUs
  • graph neural network
  • MaxK nonlinearity
  • parallel computing
  • sampled sparse matrix dense matrix multiplication (SSpMM)
  • SpGEMM
  • SpMM
