TY - GEN
T1 - Demystifying automata processing
T2 - 31st ACM International Conference on Supercomputing, ICS 2017
AU - Nourian, Marziyeh
AU - Wang, Xiang
AU - Yu, Xiaodong
AU - Feng, Wu Chun
AU - Becchi, Michela
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/6/14
Y1 - 2017/6/14
N2 - Many established and emerging applications perform at their core some form of pattern matching, a computation that maps naturally onto finite automata abstractions. As a consequence, in recent years there has been a substantial amount of work on high-speed automata processing, which has led to a number of implementations targeting a variety of parallel platforms: CPUs, GPUs, FPGAs, ASICs, and Network Processors. More recently, Micron has announced its Automata Processor (AP), a DRAMbased accelerator of non-deterministic finite automata (NFA). Despite the abundance of work in this domain, the advantages and disadvantages of different automata processing accelerators and the innovation space in this area are still unclear. In this work we target this problem and propose a toolchain to allow an apples-to-apples comparison of NFA acceleration engines on three platforms: GPUs, FPGAs and Micron's AP. We discuss the automata optimizations that are applicable to these three platforms. We perform an evaluation on large-scale datasets: to this end, we propose an NFA partitioning algorithm that minimizes the number of state replications required to maintain functional equivalence with an unpartitioned NFA, and we evaluate the scalability of each implementation to both large NFAs and large numbers of input streams. Our experimental evaluation covers resource utilization, traversal throughput, and preprocessing overhead and shows that the FPGA provides the best traversal throughputs (on the order of Gbps) at the cost of significant preprocessing times (on the order of hours); GPUs deliver modest traversal throughputs (on the order of Mbps), but offer low preprocessing times (on the order of seconds or minutes) and good pattern densities (they can accommodate large datasets on a single device); Micron's AP delivers throughputs, pattern densities, and preprocessing times that are intermediate between those of FPGAs and GPUs, and it is most suited for applications that use datasets consisting of many small NFAs with a topology that is fixed and known a priori.
AB - Many established and emerging applications perform at their core some form of pattern matching, a computation that maps naturally onto finite automata abstractions. As a consequence, in recent years there has been a substantial amount of work on high-speed automata processing, which has led to a number of implementations targeting a variety of parallel platforms: CPUs, GPUs, FPGAs, ASICs, and Network Processors. More recently, Micron has announced its Automata Processor (AP), a DRAMbased accelerator of non-deterministic finite automata (NFA). Despite the abundance of work in this domain, the advantages and disadvantages of different automata processing accelerators and the innovation space in this area are still unclear. In this work we target this problem and propose a toolchain to allow an apples-to-apples comparison of NFA acceleration engines on three platforms: GPUs, FPGAs and Micron's AP. We discuss the automata optimizations that are applicable to these three platforms. We perform an evaluation on large-scale datasets: to this end, we propose an NFA partitioning algorithm that minimizes the number of state replications required to maintain functional equivalence with an unpartitioned NFA, and we evaluate the scalability of each implementation to both large NFAs and large numbers of input streams. Our experimental evaluation covers resource utilization, traversal throughput, and preprocessing overhead and shows that the FPGA provides the best traversal throughputs (on the order of Gbps) at the cost of significant preprocessing times (on the order of hours); GPUs deliver modest traversal throughputs (on the order of Mbps), but offer low preprocessing times (on the order of seconds or minutes) and good pattern densities (they can accommodate large datasets on a single device); Micron's AP delivers throughputs, pattern densities, and preprocessing times that are intermediate between those of FPGAs and GPUs, and it is most suited for applications that use datasets consisting of many small NFAs with a topology that is fixed and known a priori.
KW - Automata Processing
KW - FPGA
KW - GPU
KW - Micron's AP
KW - Pattern Matching
KW - Reconfigurable Computing
UR - http://www.scopus.com/inward/record.url?scp=85023779133&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85023779133&partnerID=8YFLogxK
U2 - 10.1145/3079079.3079100
DO - 10.1145/3079079.3079100
M3 - Conference contribution
AN - SCOPUS:85023779133
T3 - Proceedings of the International Conference on Supercomputing
BT - ICS 2017
Y2 - 13 June 2017 through 16 June 2017
ER -