Machine learning approaches for network resiliency optimization for service provider networks

    Research output: Contribution to journal, Article, peer-review

    3 Scopus citations

    Abstract

    A Network Service Provider (NSP) is loosely defined as an organization that provides IP network transport as a service, either directly to consumers or to other value-added businesses. NSPs have struggled to reduce subscriber churn, which we define as customers switching from their current NSP to a competitor because of dissatisfaction; for our purposes, specifically dissatisfaction with network performance, such as excess latency or downtime. The focus of this paper is reliability and maintenance, in particular network resiliency and operations. In the context of this paper, network resiliency is defined as the rate of taking corrective action in response to an exogenous network disturbance or event that materially impacts the network service level as experienced by users. Operators want not only to mitigate this period of unsatisfactory network service but to avoid it altogether, at the lowest possible operational cost, by proactively monitoring the user network experience to detect anomalies, resolving them through automatic root cause determination, and ultimately restoring satisfactory network service levels. Today, in contrast, NSPs operate reactively: they employ teams of expensive network engineers who manually sift through massive amounts of data to determine root causes, prompted either by subscribers complaining about poor service (after customer impact) or by network alarms that may be a symptom of a more complex underlying root cause or, often, noise that does not materially impact users. In this paper we evaluate standard machine learning approaches for extracting root causes and explain a key underlying reason for their poor accuracy. To improve accuracy, we propose a novel multi-tier ensemble machine learning approach that dynamically adapts to changing network data feature sets, or combinations of characteristics, to yield accurate causal estimations. Different algorithms yield different accuracies because of the complex interactions among these combinations of characteristics. Results show that our approach improves customer experience and network operations by automatically detecting customer-impacting network anomalies and identifying root causes with an accuracy increase of 65.3% over any single machine learning approach.
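
The abstract does not describe the architecture of the multi-tier ensemble, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a first tier of simple rule-based learners and a second tier that, for each combination of features present in a sample, routes the prediction to whichever first-tier learner has been most accurate on labeled history. All names, thresholds, and labels here (`latency_rule`, `TieredSelector`, `"link_fault"`, and so on) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical tier-1 learners: each maps a feature dict to a root-cause label.
# The thresholds are made up for illustration only.
def latency_rule(sample):
    return "congestion" if sample.get("latency_ms", 0) > 100 else "none"

def loss_rule(sample):
    return "link_fault" if sample.get("loss_pct", 0) > 1.0 else "none"

def cpu_rule(sample):
    return "device_overload" if sample.get("cpu_pct", 0) > 90 else "none"

BASE_LEARNERS = {"latency": latency_rule, "loss": loss_rule, "cpu": cpu_rule}


class TieredSelector:
    """Tier 2 (assumed design): choose a tier-1 learner per feature combination."""

    def __init__(self, learners):
        self.learners = learners
        # hits[combo][learner_name] = number of correct predictions seen so far
        self.hits = defaultdict(Counter)

    @staticmethod
    def combo(sample):
        # A "characteristic combination" is modeled as the set of feature names present.
        return frozenset(sample)

    def update(self, sample, true_label):
        """Credit every tier-1 learner that labels this sample correctly."""
        key = self.combo(sample)
        for name, learner in self.learners.items():
            if learner(sample) == true_label:
                self.hits[key][name] += 1

    def predict(self, sample):
        """Route to the best learner for this combination, else majority vote."""
        scores = self.hits.get(self.combo(sample))
        if scores:
            best = scores.most_common(1)[0][0]
            return self.learners[best](sample)
        votes = Counter(learner(sample) for learner in self.learners.values())
        return votes.most_common(1)[0][0]


# Made-up labeled history covering two different feature combinations.
history = [
    ({"latency_ms": 150, "loss_pct": 2.5}, "link_fault"),
    ({"latency_ms": 40, "loss_pct": 3.0}, "link_fault"),
    ({"latency_ms": 160, "cpu_pct": 95}, "device_overload"),
    ({"latency_ms": 30, "cpu_pct": 97}, "device_overload"),
]

selector = TieredSelector(BASE_LEARNERS)
for sample, label in history:
    selector.update(sample, label)

print(selector.predict({"latency_ms": 200, "loss_pct": 4.0}))
print(selector.predict({"latency_ms": 200, "cpu_pct": 99}))
```

In this toy setup no single rule is right for both feature combinations, which is the kind of situation the abstract points to when it says different algorithms perform differently depending on the combination of characteristics. A realistic system would replace the rules with trained models and the hit counter with a proper validation procedure.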

    Original language: English
    Article number: 106519
    Journal: Computers and Industrial Engineering
    Volume: 146
    State: Published - Aug 2020

    Keywords

    • Causal inference
    • Machine learning
    • Networks
    • Reliability
    • Resilience
