Concept Drift Detection in Machine Learning Systems by Exploiting Relaxed Functional Dependencies

Caruccio L.;Cirillo S.;Polese G.;Stanzione R.
2025

Abstract

Machine Learning approaches rely entirely on data to train predictive models, yet that data can evolve over time. The resulting data shifts can make predictive models outdated and reduce their prediction accuracy. Concept drift detection techniques aim to detect such shifts so that countermeasures can be adopted and predictive performance maintained over time. To this end, drift detection methods monitor shifts in the data distribution, trying to identify changes without evaluating model predictions. In this discussion paper, we present a profiling metadata-driven approach for quantifying concept drift. Specifically, we focus on Relaxed Functional Dependencies (rfds) and formalize the relationship between changes in metadata and the performance trends of predictive models over time. Moreover, we define a suite of rfd-based metrics that measure the distance between two sets of data. To evaluate the proposed approach, we compared it with other distribution-based metrics on datasets with both known and unknown drift. The results show that the trends of the proposed metrics are strongly correlated with the model's performance. Moreover, the defined suite of metrics captures concept drift more effectively than traditional distribution-based approaches.
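The idea of measuring the distance between two sets of data through their dependencies can be illustrated with a minimal sketch. It is not the rfd-based metric suite defined in the paper, which this abstract does not specify. As a stand-in, it assumes single-attribute dependencies relaxed only on their extent (a tolerated fraction of violating rows) and compares the dependency sets of two data windows with a Jaccard distance. The names `holds`, `dependencies`, `dependency_distance`, and `max_error` are hypothetical.

```python
from itertools import permutations


def holds(rows, lhs, rhs, max_error=0.1):
    """True if lhs -> rhs holds on rows, tolerating a fraction of violating rows.

    Assumes rows is a non-empty list of dicts sharing the same keys.
    """
    groups = {}
    for row in rows:
        counts = groups.setdefault(row[lhs], {})
        counts[row[rhs]] = counts.get(row[rhs], 0) + 1
    # For each lhs value keep only the rows with its most frequent rhs value;
    # the remaining rows are the ones violating the dependency.
    kept = sum(max(counts.values()) for counts in groups.values())
    return (len(rows) - kept) / len(rows) <= max_error


def dependencies(rows, max_error=0.1):
    """Set of (lhs, rhs) attribute pairs whose dependency holds on rows."""
    attrs = sorted(rows[0])
    return {(a, b) for a, b in permutations(attrs, 2) if holds(rows, a, b, max_error)}


def dependency_distance(reference, current, max_error=0.1):
    """Jaccard distance between the dependency sets of two data windows."""
    ref = dependencies(reference, max_error)
    cur = dependencies(current, max_error)
    union = ref | cur
    return 0.0 if not union else 1 - len(ref & cur) / len(union)


# Illustrative data (invented): plan determines price in the reference
# window, but no longer does in the current window.
reference = [
    {"plan": "a", "price": 10},
    {"plan": "a", "price": 10},
    {"plan": "b", "price": 20},
    {"plan": "b", "price": 20},
]
current = [
    {"plan": "a", "price": 10},
    {"plan": "a", "price": 20},
    {"plan": "b", "price": 10},
    {"plan": "b", "price": 20},
]

print(dependency_distance(reference, reference))
print(dependency_distance(reference, current))
```

A distance near 0 means the two windows satisfy roughly the same dependencies, while a distance near 1 means the dependency structure has changed, which is the kind of metadata change the approach relates to model performance. Full rfds also relax how attribute values are compared, which this sketch leaves out.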

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4941976