TY - GEN
T1 - Optimized Parallel Breadth-First Search with Adaptive Strategies
AU - Li, Chaoqun
AU - Hu, Runbang
AU - Du, Xiaojiang
AU - Ji, Yuede
N1 - Publisher Copyright:
© 2025 Copyright is held by the owner/author(s).
PY - 2025/5/2
Y1 - 2025/5/2
N2 - Breadth-First Search (BFS) is a fundamental graph traversal algorithm in a level-by-level pattern. It has been widely used in real-world applications, such as social network analysis, scientific computing, and web crawling. However, achieving high performance for BFS on large-scale graphs remains a challenging task due to irregular memory access patterns, diverse graph structures, and the necessity for efficient parallelization. This paper addresses these challenges by designing a highly optimized parallel BFS implementation based on the top-down and bottom-up traversal strategies. It further integrates several key innovations, including graph type-aware computation strategy selection, graph pruning, two-level bottom-up, and efficient parallel implementation. We evaluate our method on 11 diverse graphs in terms of size, diameter, and density. On a CPU server with 48 threads, our method achieves an average speedup of 9.5× over the serial BFS implementation. Also, on a synthetic dense graph, our method processes 9.3 billion edges per second, showing its efficiency in dense graph traversal.
AB - Breadth-First Search (BFS) is a fundamental graph traversal algorithm in a level-by-level pattern. It has been widely used in real-world applications, such as social network analysis, scientific computing, and web crawling. However, achieving high performance for BFS on large-scale graphs remains a challenging task due to irregular memory access patterns, diverse graph structures, and the necessity for efficient parallelization. This paper addresses these challenges by designing a highly optimized parallel BFS implementation based on the top-down and bottom-up traversal strategies. It further integrates several key innovations, including graph type-aware computation strategy selection, graph pruning, two-level bottom-up, and efficient parallel implementation. We evaluate our method on 11 diverse graphs in terms of size, diameter, and density. On a CPU server with 48 threads, our method achieves an average speedup of 9.5× over the serial BFS implementation. Also, on a synthetic dense graph, our method processes 9.3 billion edges per second, showing its efficiency in dense graph traversal.
UR - https://www.scopus.com/pages/publications/105008268265
UR - https://www.scopus.com/pages/publications/105008268265#tab=citedBy
U2 - 10.1145/3711708.3723449
DO - 10.1145/3711708.3723449
M3 - Conference contribution
AN - SCOPUS:105008268265
T3 - Proceedings of the 1st FastCode Programming Challenge, FCPC 2025
SP - 28
EP - 32
BT - Proceedings of the 1st FastCode Programming Challenge, FCPC 2025
T2 - 1st FastCode Programming Challenge, FCPC 2025
Y2 - 1 March 2025 through 5 March 2025
ER -