TY - GEN
T1 - Asynchronous Automata Processing on GPUs
AU - Liu, Hongyuan
AU - Pai, Sreepathi
AU - Jog, Adwait
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/6/19
Y1 - 2023/6/19
N2 - Finite-state automata serve as compute kernels for application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1) input stream level, 2) automaton level, and 3) state level. Among these, only state-level parallelism is intrinsic to automata, while the other two levels depend on the number of automata and input streams to be processed. As GPU resources increase, a parallelism-limited automata processing task can underutilize GPU compute resources. To overcome this, we propose AsyncAP, a low-overhead approach that optimizes scalability and throughput. Our insight is that most automata processing tasks have an additional source of parallelism originating from the input symbols, which has not been leveraged before. By making the matching process asynchronous, i.e., having parallel GPU threads process an input stream from different input locations instead of processing it serially, AsyncAP is able to significantly improve throughput and scale with input length. Detailed evaluation across 12 applications shows that AsyncAP achieves an average speedup of 58x over the state-of-the-art GPU automata processing engine when the task does not have enough parallelism to utilize all GPU cores. When tasks have enough parallelism to utilize all GPU cores, AsyncAP still achieves a 2.4x speedup.
AB - Finite-state automata serve as compute kernels for application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1) input stream level, 2) automaton level, and 3) state level. Among these, only state-level parallelism is intrinsic to automata, while the other two levels depend on the number of automata and input streams to be processed. As GPU resources increase, a parallelism-limited automata processing task can underutilize GPU compute resources. To overcome this, we propose AsyncAP, a low-overhead approach that optimizes scalability and throughput. Our insight is that most automata processing tasks have an additional source of parallelism originating from the input symbols, which has not been leveraged before. By making the matching process asynchronous, i.e., having parallel GPU threads process an input stream from different input locations instead of processing it serially, AsyncAP is able to significantly improve throughput and scale with input length. Detailed evaluation across 12 applications shows that AsyncAP achieves an average speedup of 58x over the state-of-the-art GPU automata processing engine when the task does not have enough parallelism to utilize all GPU cores. When tasks have enough parallelism to utilize all GPU cores, AsyncAP still achieves a 2.4x speedup.
KW - automata processing
KW - GPGPUs
KW - pattern matching
UR - http://www.scopus.com/inward/record.url?scp=85163738538&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163738538&partnerID=8YFLogxK
U2 - 10.1145/3578338.3593524
DO - 10.1145/3578338.3593524
M3 - Conference contribution
AN - SCOPUS:85163738538
T3 - SIGMETRICS 2023 - Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
SP - 23
EP - 24
BT - SIGMETRICS 2023 - Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
T2 - 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2023
Y2 - 19 June 2023 through 23 June 2023
ER -