Creating SMOTE Oversampling from Scratch

A Python tutorial on how to implement oversampling and how to make custom variations

Hari Devanathan

Published in

Towards Data Science

13 hours ago

Synthetic Minority Oversampling Technique (SMOTE) is commonly used to handle class imbalances in datasets. Suppose there are two classes and one class has far more samples (majority class) than the other (minority class). In that case, SMOTE will generate more synthetic samples in the minority class so that it’s on par with the majority class.
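To make the idea concrete, here is a minimal sketch of the core SMOTE step: pick a minority sample, pick one of its nearest minority-class neighbors, and place a new point at a random position on the line segment between them. The function name `smote_sketch`, its parameters, and the toy data are illustrative assumptions, not the implementation developed in this article or the API of any library.

```python
import numpy as np

def smote_sketch(X_minority, n_synthetic, k=5, seed=0):
    """Illustrative SMOTE-style interpolation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("Need at least two minority samples to interpolate.")
    k = min(k, n - 1)  # a sample cannot have more than n - 1 neighbors

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is never its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # For each synthetic point: a random base sample and one of its neighbors.
    base_idx = rng.integers(0, n, size=n_synthetic)
    nb_idx = neighbors[base_idx, rng.integers(0, k, size=n_synthetic)]

    # Random position in [0, 1) along the segment from base to neighbor.
    gap = rng.random((n_synthetic, 1))
    return X[base_idx] + gap * (X[nb_idx] - X[base_idx])

# Toy minority class with four 2-D samples (made-up values).
X_min = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]])
X_new = smote_sketch(X_min, n_synthetic=6, k=2)
print(X_new.shape)  # (6, 2)
```

Because every synthetic point lies between two existing minority samples, the new data stays inside the region the minority class already occupies instead of simply duplicating rows.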

In the real world, classification datasets are rarely balanced. Take, for example, a classifier that predicts whether a patient has sickle cell disease. If a patient has abnormal hemoglobin levels (6–11 g/dL), then that’s a strong predictor of sickle cell disease. If a patient has normal hemoglobin levels (12 g/dL), then that predictor alone doesn’t indicate whether the patient has sickle cell disease.

However, only about 100,000 people in the USA have been diagnosed with sickle cell disease, out of a current population of roughly 334.9 million. If we had a dataset covering every person in the US, labeled by whether or not they have sickle cell disease, only about 0.03% of the rows would belong to the positive class. That is a major class imbalance, and a model trained on such data will struggle to pick up meaningful features for predicting the rare class.
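The arithmetic behind this imbalance is worth spelling out. The short sketch below uses only the two figures quoted above (the variable names are illustrative). It computes the share of positive cases, which comes out to roughly 0.03%, and shows why plain accuracy is misleading here: a model that always predicts "no disease" would still be right almost every time.

```python
# Figures quoted in the text above.
sickle_cell_cases = 100_000
us_population = 334_900_000

# Share of the dataset that belongs to the minority (positive) class.
prevalence = sickle_cell_cases / us_population
print(f"Positive class share: {prevalence:.4%}")

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = 1 - prevalence
print(f"Accuracy of always predicting 'no disease': {baseline_accuracy:.4%}")

# Number of negative samples for every positive sample.
imbalance_ratio = (us_population - sickle_cell_cases) / sickle_cell_cases
print(f"Negatives per positive: {imbalance_ratio:.0f}")
```

With thousands of negatives for every positive, a classifier can score near-perfect accuracy while never identifying a single patient, which is the motivation for rebalancing the classes before training.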
