Skip to main navigation Skip to search Skip to main content

Advancing Matrix Operations for High-Performance and Memory-Efficient Automata Processing on GPUs

  • The Hong Kong University of Science and Technology (Guangzhou)
  • North Carolina State University

Research output: Contribution to journalArticlepeer-review

Abstract

Finite state automata are essential in various domains, such as pattern matching and data analytics, where high throughput is critical. Recent work has explored representing automata execution as matrix algebra and leveraging CPU Basic Linear Algebra Subprograms (BLAS) libraries. While promising, this approach faces bottlenecks in memory usage, data locality, and redundant computation. This work systematically identifies these bottlenecks and develops automata-specific optimizations to address them. We focus on GPUs due to their high compute capabilities and widespread availability. To overcome these challenges, we propose three key techniques to enhance computational and memory efficiency: (1) transition matrix deduplication to reduce memory usage, (2) interleaved state renumbering to improve GPU thread utilization, and (3) state vector caching to eliminate redundant computations. Detailed evaluation shows that the proposed solution matches GPU-based automata engines in performance (up to 6.54× speedup) while using less than 2% of their memory footprint, and outperforms state-of-the-art domain-specific accelerators (up to 965×) for automata workloads.

Original languageEnglish
Article number156
JournalACM Transactions on Architecture and Code Optimization
Volume22
Issue number4
DOIs
StatePublished - 16 Dec 2025

Keywords

  • Automata processing
  • GPU
  • sparse matrix-vector multiplication (SpMV)

Fingerprint

Dive into the research topics of 'Advancing Matrix Operations for High-Performance and Memory-Efficient Automata Processing on GPUs'. Together they form a unique fingerprint.

Cite this