TY - JOUR
T1 - Asynchronous Automata Processing on GPUs
AU - Liu, Hongyuan
AU - Pai, Sreepathi
AU - Jog, Adwait
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/6/19
Y1 - 2023/6/19
AB - Finite-state automata serve as compute kernels for application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1) input-stream level, 2) automaton level, and 3) state level. Among these, only state-level parallelism is intrinsic to automata, while the other two depend on the number of automata and input streams to be processed. As GPU resources increase, a parallelism-limited automata processing task can underutilize GPU compute resources. To overcome this, we propose AsyncAP, a low-overhead approach that improves scalability and throughput. Our insight is that most automata processing tasks have an additional source of parallelism, originating from the input symbols, that has not been leveraged before. By making the matching process asynchronous, with parallel GPU threads processing an input stream from different starting locations instead of processing it serially, AsyncAP significantly improves throughput and scales with input length. Detailed evaluation across 12 applications shows that AsyncAP achieves an average speedup of 58x over the state-of-the-art GPU automata processing engine when a task does not have enough parallelism to utilize all GPU cores. When tasks have enough parallelism to utilize the GPU cores, AsyncAP still achieves a 2.4x speedup.
KW - automata processing
KW - GPGPUs
KW - pattern matching
UR - http://www.scopus.com/inward/record.url?scp=85164271302&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85164271302&partnerID=8YFLogxK
U2 - 10.1145/3606376.3593524
DO - 10.1145/3606376.3593524
M3 - Article
AN - SCOPUS:85164271302
SN - 0163-5999
VL - 51
SP - 23
EP - 24
JO - Performance Evaluation Review
JF - Performance Evaluation Review
IS - 1
ER -