TY - GEN
T1 - Fast multipole method on GPU
T2 - 2011 48th ACM/EDAC/IEEE Design Automation Conference, DAC 2011
AU - Zhao, Xueqian
AU - Feng, Zhuo
PY - 2011
Y1 - 2011
AB - To facilitate full chip capacitance extraction, field solvers are typically deployed for characterizing capacitance libraries for various interconnect structures and configurations. In the past decades, various algorithms for accelerating boundary element methods (BEM) have been developed to improve the efficiency of field solvers for capacitance extraction. This paper presents the first massively parallel capacitance extraction algorithm FMMGpu that accelerates the well-known fast multipole methods (FMM) on modern Graphics Processing Units (GPUs). We propose GPU-friendly data structures and SIMD parallel algorithm flows to facilitate the FMM-based 3-D capacitance extraction on GPU. Effective GPU performance modeling methods are also proposed to properly balance the workload of each critical kernel in our FMMGpu implementation, by taking advantage of the latest Fermi GPU's concurrent kernel executions on streaming multiprocessors (SMs). Our experimental results show that FMMGpu brings 22X to 30X speedups in capacitance extractions for various test cases. We also show that even for small test cases that may not well utilize GPU's hardware resources, the proposed cube clustering and workload balancing techniques can bring 20% to 60% extra performance improvements.
KW - Capacitance extraction
KW - GPU
KW - parallel fast multipole method
UR - http://www.scopus.com/inward/record.url?scp=80052663049&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052663049&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80052663049
SN - 9781450306362
T3 - Proceedings - Design Automation Conference
SP - 558
EP - 563
BT - 2011 48th ACM/EDAC/IEEE Design Automation Conference, DAC 2011
Y2 - 5 June 2011 through 9 June 2011
ER -