TY - GEN
T1 - A framework for fast and fair evaluation of automata processing hardware
AU - Yu, Xiaodong
AU - Hou, Kaixi
AU - Wang, Hao
AU - Feng, Wu Chun
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/5
Y1 - 2017/12/5
AB - Programming Micron's Automata Processor (AP) requires expertise in both automata theory and the AP architecture, as programmers have to manually manipulate state transition elements (STEs) and their transitions with a low-level Automata Network Markup Language (ANML). When the required STEs of an application exceed the hardware capacity, multiple reconfigurations are needed. However, most previous AP-based designs limit the dataset size to fit into a single AP board and simply neglect the costly overhead of reconfiguration. This results in unfair performance comparisons between the AP and other processors. To address this issue, we propose a framework for the fast and fair evaluation of AP devices. Our framework provides a hierarchical approach that automatically generates automata for large datasets through user-defined paradigms and allows the use of cascadable macros to achieve highly optimized reconfigurations. We highlight the importance of counting the configuration time in the overall AP performance, which in turn, can provide better insight into identifying essential hardware features, specifically for large-scale problem sizes. Our framework shows that the AP can achieve up to 461x overall speedup fairly compared to CPU counterparts.
UR - http://www.scopus.com/inward/record.url?scp=85046552585&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046552585&partnerID=8YFLogxK
U2 - 10.1109/IISWC.2017.8167767
DO - 10.1109/IISWC.2017.8167767
M3 - Conference contribution
AN - SCOPUS:85046552585
T3 - Proceedings of the 2017 IEEE International Symposium on Workload Characterization, IISWC 2017
SP - 120
EP - 121
BT - Proceedings of the 2017 IEEE International Symposium on Workload Characterization, IISWC 2017
T2 - 2017 IEEE International Symposium on Workload Characterization, IISWC 2017
Y2 - 1 October 2017 through 3 October 2017
ER -