TY - JOUR
T1 - GHyPart
T2 - GPU-friendly End-to-End Hypergraph Partitioner
AU - Wu, Zhenlin
AU - Zhao, Haosong
AU - Liu, Hongyuan
AU - Wen, Wujie
AU - Li, Jiajia
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/3/21
Y1 - 2025/3/21
N2 - Hypergraph partitioning finds practical applications in various fields, such as high-performance computing and circuit partitioning in VLSI physical design, where high-performance solutions often demand substantial parallelism beyond what existing CPU-based solutions can offer. While GPUs are promising in this regard, their potential in hypergraph partitioning remains unexplored. In this work, we first develop an end-to-end deterministic hypergraph partitioner on GPUs, ported from state-of-the-art multi-threaded CPU work, and identify three major performance challenges by characterizing its performance. We propose the first end-to-end solution, gHyPart, to unleash the potentials of hypergraph partitioning on GPUs. To overcome the challenges of GPU thread underutilization due to imbalanced workload, long critical path, and high work complexity due to excessive operations, we redesign GPU algorithms with diverse parallelization strategies thus expanding optimization space; to address the challenge of no one-size-fits-all implementation for various input hypergraphs, we propose a decision tree-based strategy to choose a suitable parallelization strategy for each kernel. Evaluation on 500 hypergraphs shows up to 125.7× (17.5× on average), 640.0× (24.2× on average), and 171.6× (1.4× on average) speedups over two CPU partitioners and our GPU baseline gHyPart-B, respectively.
AB - Hypergraph partitioning finds practical applications in various fields, such as high-performance computing and circuit partitioning in VLSI physical design, where high-performance solutions often demand substantial parallelism beyond what existing CPU-based solutions can offer. While GPUs are promising in this regard, their potential in hypergraph partitioning remains unexplored. In this work, we first develop an end-to-end deterministic hypergraph partitioner on GPUs, ported from state-of-the-art multi-threaded CPU work, and identify three major performance challenges by characterizing its performance. We propose the first end-to-end solution, gHyPart, to unleash the potentials of hypergraph partitioning on GPUs. To overcome the challenges of GPU thread underutilization due to imbalanced workload, long critical path, and high work complexity due to excessive operations, we redesign GPU algorithms with diverse parallelization strategies thus expanding optimization space; to address the challenge of no one-size-fits-all implementation for various input hypergraphs, we propose a decision tree-based strategy to choose a suitable parallelization strategy for each kernel. Evaluation on 500 hypergraphs shows up to 125.7× (17.5× on average), 640.0× (24.2× on average), and 171.6× (1.4× on average) speedups over two CPU partitioners and our GPU baseline gHyPart-B, respectively.
KW - GPU
KW - Hypergraph partitioning
KW - parallelization strategies
UR - http://www.scopus.com/inward/record.url?scp=105003643670&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105003643670&partnerID=8YFLogxK
U2 - 10.1145/3711925
DO - 10.1145/3711925
M3 - Article
AN - SCOPUS:105003643670
SN - 1544-3566
VL - 22
JO - ACM Transactions on Architecture and Code Optimization
JF - ACM Transactions on Architecture and Code Optimization
IS - 1
M1 - 38
ER -