TY - JOUR
T1 - Quantitative approaches for optimization of user experience based on network resilience for wireless service provider networks
AU - Kakadia, Deepak
AU - Ramirez-Marquez, Jose Emmanuel
N1 - Publisher Copyright:
© 2019 Elsevier Ltd
PY - 2020/1
Y1 - 2020/1
N2 - Since the 1980s, and in particular since 1996, telecom operators and, more recently, mobile operators have faced increasingly fierce competition combined with flat subscriber growth and increased data usage. The result is tremendous downward pressure on profitability, which forces operators to differentiate themselves by offering network services with a better customer experience at lower operational cost. Wireless operators are challenged to measure user experience, which is itself subjective, in a manner that accurately reflects the functional and emotional aspects of perceived quality, and to link it to network resiliency, which characterizes how the network behaves as it responds to disruptions. Current network faults and alarms consider only device failures, not the actual impact on user experience. For instance, a failed router may not affect the user's experience because of redundancies built into the network. Studies to date have proposed methods and models that focus on specific aspects of user experience in wired and cellular networks. However, to the best of our knowledge, very little research links poor user network experience to its root cause. Recent work in this area focuses on identifying which measurements to take, and where, to gauge subscriber QoE, and on modeling and high-level concepts, but does not address realistic challenges or approaches that can be automated to materially improve customer experience at lower operational expense. There is a gap in how operators can automatically associate poor user experience, relevant network metrics, and root causes within a suitable model that can be analyzed and optimized. We propose a general framework that links these entities together, with a quantitative approach that optimizes user network experience by optimizing network resilience, using a model that can be analyzed and optimized with machine learning methods.
Directly applying existing machine learning algorithms to network telemetry data to identify root causes has proven ineffective in practice, because these algorithms are designed for prediction, classification, and ranking rather than for identifying causal relationships. The problem is compounded because the algorithms make assumptions about the data, whereas in reality network data distributions vary wildly during network disturbances. The proposed general framework combines existing methods for anomaly detection with machine learning algorithms; its novel contribution is improving the accuracy of root cause identification by dynamically selecting the optimal machine learning algorithm based on network telemetry data features that are recomputed before, during, and after network disturbances. This approach allows us to automate the time-consuming manual tasks of network engineers, who proactively monitor key performance metrics for anomalies and correlate them with other data sources to derive actionable insights that maintain an acceptable level of user experience. We describe an example case study specific to a wireless provider environment, illustrating the potential viability of the approach with promising results from actual wireless operations data (approximately 8 million monthly subscribers). The prototype implementation was able to programmatically detect anomalies and identify potential root causes using different algorithms suited to the given data and time frame, which dramatically increased the accuracy and efficiency of the small network engineering team and hence improved user experience by improving network resiliency.
UR - http://www.scopus.com/inward/record.url?scp=85071744134&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071744134&partnerID=8YFLogxK
U2 - 10.1016/j.ress.2019.106606
DO - 10.1016/j.ress.2019.106606
M3 - Article
AN - SCOPUS:85071744134
SN - 0951-8320
VL - 193
JO - Reliability Engineering and System Safety
JF - Reliability Engineering and System Safety
M1 - 106606
ER -