Abstract
Finite state automata are essential in various domains, such as pattern matching and data analytics, where high throughput is critical. Recent work has explored representing automata execution as matrix algebra and leveraging CPU Basic Linear Algebra Subprograms (BLAS) libraries. While promising, this approach faces bottlenecks in memory usage, data locality, and redundant computation. This work systematically identifies these bottlenecks and develops automata-specific optimizations to address them. We focus on GPUs due to their high compute capabilities and widespread availability. To overcome these challenges, we propose three key techniques to enhance computational and memory efficiency: (1) transition matrix deduplication to reduce memory usage, (2) interleaved state renumbering to improve GPU thread utilization, and (3) state vector caching to eliminate redundant computations. Detailed evaluation shows that the proposed solution matches GPU-based automata engines in performance (up to 6.54× speedup) while using less than 2% of their memory footprint, and outperforms state-of-the-art domain-specific accelerators (up to 965×) for automata workloads.
| Original language | English |
|---|---|
| Article number | 156 |
| Journal | ACM Transactions on Architecture and Code Optimization |
| Volume | 22 |
| Issue number | 4 |
| DOIs | |
| State | Published - 16 Dec 2025 |
Keywords
- Automata processing
- GPU
- sparse matrix-vector multiplication (SpMV)
Fingerprint
Dive into the research topics of 'Advancing Matrix Operations for High-Performance and Memory-Efficient Automata Processing on GPUs'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver