How to Detect Concept Drift Without Labels

Unsupervised change detection using reference windows, with a Python example

Vitor Cerqueira

Published in

Towards Data Science

14 hours ago

In a previous article, we explored the basics of concept drift. Concept drift occurs when the distribution of a dataset changes.

This post continues to explore this topic. Here, you’ll learn how to detect concept drift in problems where you don’t have access to labels. This task is challenging because without labels we can’t evaluate models’ performance.

Let’s dive in.

Introduction

Datasets that evolve over time are susceptible to concept drift. Changes in the underlying distributions can undermine models and the accuracy of their predictions. So, it’s important to detect these changes and adapt to them to keep models up to date.

Most change detection approaches rely on tracking the model’s error. The idea is to trigger an alarm when this error increases significantly. Then, some adaptation mechanism kicks in, such as retraining the model.
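To make the error-tracking idea concrete, here is a minimal sketch, not a specific published detector. It assumes you receive one error value per observation, and the function name `error_alarm` and its parameters (`baseline_size`, `window_size`, `n_sigmas`) are hypothetical choices for illustration. The alarm fires when the average of the most recent errors rises well above the level seen in an initial baseline period.

```python
import random
from collections import deque
from statistics import mean, stdev


def error_alarm(errors, baseline_size=100, window_size=30, n_sigmas=3.0):
    """Return the index of the first alarm, or None if no alarm is raised.

    Assumption: the first `baseline_size` errors come from a period in
    which the model was working as expected.
    """
    baseline = errors[:baseline_size]
    mu, sigma = mean(baseline), stdev(baseline)
    # Threshold for the mean of `window_size` errors: baseline mean
    # plus `n_sigmas` standard errors.
    threshold = mu + n_sigmas * sigma / window_size ** 0.5

    window = deque(maxlen=window_size)  # most recent errors only
    for i, err in enumerate(errors[baseline_size:], start=baseline_size):
        window.append(err)
        if len(window) == window_size and mean(window) > threshold:
            return i
    return None


# Synthetic example: the error level jumps after observation 300.
random.seed(1)
errors = [random.gauss(1.0, 0.1) for _ in range(300)]
errors += [random.gauss(2.0, 0.1) for _ in range(100)]
print(error_alarm(errors))
```

Note that every value in `errors` requires a label, which is exactly what is missing in the setting discussed next.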

In the previous article, we argued that labels may be difficult to obtain in some cases. Examples appear in many domains, such as fraud detection or credit risk assessment. In the latter, it can take up to several years for a person to default (and thereby provide a label for their assessment).

In these cases, you have to detect changes using approaches that do not depend on the model’s performance.

Change detection without labels

In general, you have two options to detect changes without labels:

Track the model’s predictions.
Track the input data (explanatory variables).

In both cases, a change is flagged when the monitored distribution shifts significantly.

How does this work exactly?

Change detection without labels is done by comparing two samples of data. One sample represents the most recent data, also referred to as the detection window. The other contains data from the original distribution (the reference window).
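Here is a minimal sketch of such a comparison for a single numeric variable (either one input feature or the model’s predictions). It uses the two-sample Kolmogorov-Smirnov test, which is one common choice for comparing two samples and an assumption of this example. The function names, the window sizes, and the significance level `alpha` are also illustrative, and the critical value is the standard large-sample approximation.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, detection):
    """Largest gap between the empirical CDFs of the two windows."""
    ref, det = sorted(reference), sorted(detection)
    n, m = len(ref), len(det)
    gap = 0.0
    # The gap can only change at observed values, so check each of them.
    for x in ref + det:
        cdf_ref = bisect_right(ref, x) / n
        cdf_det = bisect_right(det, x) / m
        gap = max(gap, abs(cdf_ref - cdf_det))
    return gap


def drift_detected(reference, detection, alpha=0.05):
    """Flag a change when the KS statistic exceeds its critical value."""
    n, m = len(reference), len(detection)
    # Large-sample approximation of the two-sample critical value.
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, detection) > critical


# Synthetic example: one window from the original distribution,
# one window whose mean has shifted.
random.seed(1)
reference = [random.gauss(0.0, 1.0) for _ in range(500)]
same = [random.gauss(0.0, 1.0) for _ in range(200)]
shifted = [random.gauss(1.0, 1.0) for _ in range(200)]
print(drift_detected(reference, same), drift_detected(reference, shifted))
```

With a shift this large, the second comparison should raise an alarm, while the first usually should not. At `alpha=0.05`, roughly one in twenty comparisons of windows from the same distribution will still trigger a false alarm, so repeated testing calls for a stricter threshold.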

So, the detection process is split into two parts:

Building the two samples
