Why GPUs are slow at executing NFAs and how to make them faster

Hongyuan Liu, Sreepathi Pai, Adwait Jog

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

17 Scopus citations

Abstract

Non-deterministic Finite Automata (NFAs) are space-efficient finite state machines that have significant applications in domains such as pattern matching and data analytics. In this paper, we investigate why the Graphics Processing Unit (GPU), a massively parallel computational device with the highest memory bandwidth available among general-purpose processors, cannot efficiently execute NFAs. First, we identify excessive data movement in the GPU memory hierarchy and describe how to effectively privatize reads using the GPU's on-chip memory hierarchy to reduce this data movement. We also show that, in several cases, indirect table lookups in NFAs can be eliminated by converting memory reads into computation, further reducing the number of memory reads. Although our optimization techniques significantly alleviate these memory-related bottlenecks, a side effect is the static assignment of work to cores, which leads to poor compute utilization: GPU cores are wasted on idle NFA states. We therefore propose a new dynamic scheme that effectively balances compute utilization with reduced memory usage. Our combined optimizations provide a significant improvement over the previous state-of-the-art GPU implementations of NFAs. Moreover, they enable current GPUs to outperform the domain-specific accelerator for NFAs (i.e., the Automata Processor) across several applications while performing within an order of magnitude of it on the rest.
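
To make the first optimization concrete, the sketch below (hypothetical CUDA, not code from the paper) shows one way to privatize NFA transition-table reads: each thread block stages a small transition table into on-chip shared memory once, so the per-symbol lookups in the matching loop hit fast on-chip storage rather than global memory. The sizes, identifiers (nfa_match, NUM_STATES, and so on), the bitmask state representation, and the single-thread matching loop are all simplifying assumptions.

    // Hypothetical CUDA sketch, not the paper's implementation: "privatize"
    // an NFA transition table into shared memory so that per-symbol lookups
    // read on-chip storage instead of DRAM.
    #include <cuda_runtime.h>

    #define NUM_STATES 32          // assumed: NFA small enough to fit on chip
    #define ALPHABET   256         // one transition row per input byte
    typedef unsigned int StateSet; // bitmask of active states (<= 32 states)

    // table[sym * NUM_STATES + s] = set of states reachable from s on symbol sym
    __global__ void nfa_match(const unsigned char *input, int len,
                              const StateSet *table, StateSet *result)
    {
        // 256 * 32 * 4 B = 32 KB: fits in the static shared-memory budget.
        __shared__ StateSet s_table[ALPHABET * NUM_STATES];

        // Cooperative copy of the transition table into shared memory.
        for (int i = threadIdx.x; i < ALPHABET * NUM_STATES; i += blockDim.x)
            s_table[i] = table[i];
        __syncthreads();

        // For brevity a single thread walks the input; the point being
        // illustrated is only that the lookups below now hit shared memory.
        if (threadIdx.x == 0) {
            StateSet active = 1u;                    // start in state 0
            for (int i = 0; i < len; ++i) {
                const StateSet *row = &s_table[input[i] * NUM_STATES];
                StateSet next = 0;
                for (int s = 0; s < NUM_STATES; ++s) // follow transitions of active states
                    if (active & (1u << s))
                        next |= row[s];
                active = next;
            }
            result[blockIdx.x] = active;
        }
    }

The abstract's second idea, converting memory reads into computation, would correspond in this sketch to replacing the row[s] reads for simple states (for example, character-class checks such as input[i] >= 'a' && input[i] <= 'z') with arithmetic on the symbol itself, eliminating the indirect table lookup entirely.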

Original language: English
Title of host publication: ASPLOS 2020 - 25th International Conference on Architectural Support for Programming Languages and Operating Systems
Pages: 251-265
Number of pages: 15
ISBN (Electronic): 9781450371025
DOIs
State: Published - 9 Mar 2020
Event: 25th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2020 - Lausanne, Switzerland
Duration: 16 Mar 2020 - 20 Mar 2020

Publication series

Name: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Conference

Conference: 25th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2020
Country/Territory: Switzerland
City: Lausanne
Period: 16/03/20 - 20/03/20

Keywords

  • Finite State Machine
  • GPU
  • Parallel Computing
