AI-Driven Self-Healing Systems for Enterprise Devices Using Telemetry Analytics
Abstract
Artificial intelligence (AI) is becoming an important tool in enterprise device management, where self-healing capabilities reduce downtime and sustain performance. This paper presents an independent study of an AI-based self-healing system that uses real-time telemetry from enterprise devices to forecast failures and remediate them automatically. The approach combines machine-learning anomaly detection over metrics and logs with automated recovery workflows, aiming to improve reliability without human intervention. A deployment across a fleet of enterprise devices is evaluated using two operational metrics: system downtime and CPU utilization patterns. The findings show that unplanned downtime falls by more than 60 percent relative to a manually operated baseline, and that CPU usage patterns become more predictable during incidents. The self-healing system achieved high availability by detecting performance anomalies (e.g., abnormal CPU spikes) before they led to failures and by executing corrective actions with minimal overhead. The article describes the design and effectiveness of telemetry-driven AI remediation in enterprise IT contexts.
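As an illustration of the detect-then-remediate loop described in the abstract, the following is a minimal sketch and not the paper's implementation. It substitutes a simple rolling z-score for the machine-learning detector, and the names CpuAnomalyDetector, restart_service, and process_stream, along with the window, threshold, and sample readings, are assumptions made up for this example.

```python
from collections import deque
from statistics import mean, stdev


class CpuAnomalyDetector:
    """Flags CPU samples that spike far above a rolling baseline.

    A rolling z-score stands in here for the paper's ML-based detector.
    """

    def __init__(self, window=30, threshold=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # recent non-anomalous readings
        self.threshold = threshold           # z-score above which we flag
        self.min_samples = min_samples       # warm-up before flagging starts

    def observe(self, cpu_percent):
        """Return True if the sample is anomalous relative to the baseline."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            # Floor sigma at 1.0 so a flat baseline cannot cause a
            # division by zero or flag trivial fluctuations.
            sigma = max(stdev(self.samples), 1.0)
            anomalous = (cpu_percent - mu) / sigma > self.threshold
        if not anomalous:
            # Keep spikes out of the baseline so they do not mask later ones.
            self.samples.append(cpu_percent)
        return anomalous


def restart_service(device_id):
    """Hypothetical remediation; a real system would call its management API."""
    return f"restart issued for {device_id}"


def process_stream(device_id, readings, detector, remediate=restart_service):
    """Run each reading through the detector and remediate on anomalies."""
    actions = []
    for index, cpu_percent in enumerate(readings):
        if detector.observe(cpu_percent):
            actions.append((index, remediate(device_id)))
    return actions


# Illustrative telemetry: a steady baseline with one spike at index 11.
readings = [20, 22, 21, 19, 23, 20, 21, 22, 20, 21, 22, 95, 21, 20]
actions = process_stream("device-001", readings, CpuAnomalyDetector())
```

With these sample readings, only the 95 percent spike at index 11 triggers the hypothetical remediation. Excluding flagged samples from the baseline is a design choice that keeps one spike from inflating the statistics used to judge the next reading.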

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.