TY - JOUR
T1 - Mitigating Bottlenecks in Wide Area Data Analytics via Machine Learning
AU - Wang, Hao
AU - Li, Baochun
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Over the past decade, we have witnessed exponential growth in the density (petabyte-level) and breadth (across geo-distributed datacenters) of data distribution. It becomes increasingly challenging but imperative to minimize the response times of data analytic queries over multiple geo-distributed datacenters. However, existing scheduling-based solutions have largely been motivated by pre-established mantras (e.g., bandwidth scarcity). Without data-driven insights into performance bottlenecks at runtime, schedulers might blindly assign tasks to workers that are suffering from unidentified bottlenecks. In this paper, we present Lube, a system framework that minimizes query response times by detecting and mitigating bottlenecks at runtime. Lube monitors geo-distributed data analytic queries in real time, detects potential bottlenecks, and mitigates them with a bottleneck-aware scheduling policy. Our preliminary experiments on a real-world prototype across Amazon EC2 regions have shown that Lube can detect bottlenecks with over 90 percent accuracy, and reduce the median query response time by up to 33 percent compared to Spark's built-in locality-based scheduler.
KW - bottleneck detection
KW - data analytics
KW - machine learning
KW - performance prediction
KW - task scheduling
KW - Wide area
UR - http://www.scopus.com/inward/record.url?scp=85044068855&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85044068855&partnerID=8YFLogxK
U2 - 10.1109/TNSE.2018.2816951
DO - 10.1109/TNSE.2018.2816951
M3 - Article
AN - SCOPUS:85044068855
VL - 7
SP - 155
EP - 166
JO - IEEE Transactions on Network Science and Engineering
JF - IEEE Transactions on Network Science and Engineering
IS - 1
M1 - 8319505
ER -