Observability in Kubernetes Cluster: Automatic Anomalies Detection using Prometheus

Castiglione A.
2020-01-01

Abstract

Kubernetes is a portable, extensible, open-source platform for managing containers. It provides features such as automatic scaling, service discovery, load balancing, and fault tolerance. Because it is a complex system that runs many internal services and can manage many more user services, Kubernetes comes with a monitoring system that provides metrics and logs for every service in the cluster. However, most of the time the monitoring system relies on human intervention to detect and troubleshoot defects, and that intervention usually comes too late, after a defect has already appeared. We think that detecting anomalies in the metrics provided by the monitoring system will help to prevent defects. In this paper, we analyze current solutions for automatic anomaly detection and alerting, and we propose a new solution that helps system administrators catch and predict, at an earlier stage, anomalies that may lead to defects. Our solution is a technical one, developed around Prometheus, an open-source monitoring system for metrics.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4810856