TY - JOUR
T1 - Asynchronous Automata Processing on GPUs
AU - Liu, Hongyuan
AU - Pai, Sreepathi
AU - Jog, Adwait
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/2/28
Y1 - 2023/2/28
N2 - Finite-state automata serve as compute kernels for many application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1)∼input stream level, 2)∼automaton-level and 3)∼state-level. Among these, only state-level parallelism is intrinsic to automata while the other two levels of parallelism depend on the number of automata and input streams to be processed. As GPU resources increase, a parallelism-limited automata processing task can underutilize GPU compute resources. To this end, we propose AsyncAP, a low-overhead approach that optimizes for both scalability and throughput. Our insight is that most automata processing tasks have an additional source of parallelism originating from the input symbols which has not been leveraged before. Making the matching process associated with the automata tasks asynchronous, i.e., parallel GPU threads start processing an input stream from different input locations instead of processing it serially, improves throughput significantly and scales with input length. When the task does not have enough parallelism to utilize all the GPU cores, detailed evaluation across 12 evaluated applications shows that AsyncAP achieves up to 2.4× speedup on average over the state-of-the-art GPU automata processing engine. When the tasks have enough parallelism to utilize GPU cores, AsyncAP still achieves 2.4× speedup.
AB - Finite-state automata serve as compute kernels for many application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1)∼input stream level, 2)∼automaton-level and 3)∼state-level. Among these, only state-level parallelism is intrinsic to automata while the other two levels of parallelism depend on the number of automata and input streams to be processed. As GPU resources increase, a parallelism-limited automata processing task can underutilize GPU compute resources. To this end, we propose AsyncAP, a low-overhead approach that optimizes for both scalability and throughput. Our insight is that most automata processing tasks have an additional source of parallelism originating from the input symbols which has not been leveraged before. Making the matching process associated with the automata tasks asynchronous, i.e., parallel GPU threads start processing an input stream from different input locations instead of processing it serially, improves throughput significantly and scales with input length. When the task does not have enough parallelism to utilize all the GPU cores, detailed evaluation across 12 evaluated applications shows that AsyncAP achieves up to 2.4× speedup on average over the state-of-the-art GPU automata processing engine. When the tasks have enough parallelism to utilize GPU cores, AsyncAP still achieves 2.4× speedup.
KW - automata processing
KW - GPGPUs
KW - pattern matching
UR - http://www.scopus.com/inward/record.url?scp=85149856268&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149856268&partnerID=8YFLogxK
U2 - 10.1145/3579453
DO - 10.1145/3579453
M3 - Article
AN - SCOPUS:85149856268
SN - 2476-1249
VL - 7
JO - Proceedings of the ACM on Measurement and Analysis of Computing Systems
JF - Proceedings of the ACM on Measurement and Analysis of Computing Systems
IS - 1
M1 - 3579453
ER -