TY - GEN
T1 - CoFlux
T2 - 2019 International Symposium on Quality of Service, IWQoS 2019
AU - Su, Ya
AU - Zhao, Youjian
AU - Xia, Wentao
AU - Liu, Rong
AU - Bu, Jiahao
AU - Zhu, Jing
AU - Cao, Yuanpu
AU - Li, Haibin
AU - Niu, Chenhao
AU - Zhang, Yiyin
AU - Wang, Zhaogang
AU - Pei, Dan
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/6/24
Y1 - 2019/6/24
N2 - Internet-based service companies monitor a large number of KPIs (Key Performance Indicators) to ensure their service quality and reliability. Correlating KPIs by fluctuations reveals interactions between KPIs under anomalous situations and can be extremely useful for service troubleshooting. However, such a KPI flux-correlation has been little studied so far in the domain of Internet service operations management. A major challenge is how to automatically and accurately separate fluctuations from normal variations in KPIs with different structural characteristics (such as seasonal, trend and stationary) for a large number of KPIs. In this paper, we propose CoFlux, an unsupervised approach, to automatically (without manual selection of algorithm fitting and parameter tuning) determine whether two KPIs are correlated by fluctuations, in what temporal order they fluctuate, and whether they fluctuate in the same direction. CoFlux's robust feature engineering and robust correlation score computation enable it to work well against the diverse KPI characteristics. Our extensive experiments have demonstrated that CoFlux achieves the best F1-Scores of 0.84 (0.90), 0.92 (0.95), 0.95 (0.99), in answering these three questions, in the two real datasets from a top global Internet company, respectively. Moreover, we showed that CoFlux is effective in assisting service troubleshooting through the applications of alert compression, recommending Top N causes, and constructing fluctuation propagation chains.
AB - Internet-based service companies monitor a large number of KPIs (Key Performance Indicators) to ensure their service quality and reliability. Correlating KPIs by fluctuations reveals interactions between KPIs under anomalous situations and can be extremely useful for service troubleshooting. However, such a KPI flux-correlation has been little studied so far in the domain of Internet service operations management. A major challenge is how to automatically and accurately separate fluctuations from normal variations in KPIs with different structural characteristics (such as seasonal, trend and stationary) for a large number of KPIs. In this paper, we propose CoFlux, an unsupervised approach, to automatically (without manual selection of algorithm fitting and parameter tuning) determine whether two KPIs are correlated by fluctuations, in what temporal order they fluctuate, and whether they fluctuate in the same direction. CoFlux's robust feature engineering and robust correlation score computation enable it to work well against the diverse KPI characteristics. Our extensive experiments have demonstrated that CoFlux achieves the best F1-Scores of 0.84 (0.90), 0.92 (0.95), 0.95 (0.99), in answering these three questions, in the two real datasets from a top global Internet company, respectively. Moreover, we showed that CoFlux is effective in assisting service troubleshooting through the applications of alert compression, recommending Top N causes, and constructing fluctuation propagation chains.
KW - Fluctuation correlation
KW - Key performance indicator
KW - Service operation and management
KW - Service troubleshooting
KW - Time series
UR - http://www.scopus.com/inward/record.url?scp=85069170467&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069170467&partnerID=8YFLogxK
U2 - 10.1145/3326285.3329048
DO - 10.1145/3326285.3329048
M3 - Conference contribution
AN - SCOPUS:85069170467
T3 - Proceedings of the International Symposium on Quality of Service, IWQoS 2019
BT - Proceedings of the International Symposium on Quality of Service, IWQoS 2019
Y2 - 24 June 2019 through 25 June 2019
ER -