W Brett Kennedy

AI

Solving the classic Betting on the World Series problem using hill climbing

A simple example of hill climbing, and solving a problem that’s difficult to solve without optimization techniques

W Brett Kennedy · Published in Towards Data Science · 15 min read · 15 hours ago

Betting on the World Series is an old, interesting, and challenging puzzle. It’s also a nice problem to demonstrate an optimization technique called hill climbing, which I’ll cover in this article. Hill climbing is a well-established, and relatively

AI

A Simple Example Using PCA for Outlier Detection

Improve accuracy, speed, and memory usage by performing PCA transformation before outlier detection

W Brett Kennedy · Published in Towards Data Science · 19 min read · 13 hours ago

This article continues a series related to applications of PCA (principal component analysis) for outlier detection, following Using PCA for Outlier Detection. That article described PCA itself, and introduced the two main ways we can use PCA for outlier detection: evaluating the reconstruction

AI

Using PCA for Outlier Detection

A surprisingly effective means to identify outliers in numeric data

W Brett Kennedy · Published in Towards Data Science · 13 min read · 3 hours ago

PCA (principal component analysis) is commonly used in data science, generally for dimensionality reduction (and often for visualization), but it is also very useful for outlier detection, which I’ll describe in this article. This article continues my series on outlier detection, which also includes articles

AI

FormulaFeatures: A Tool to Generate Highly Predictive Features for Interpretable Models

Create more interpretable models by using concise, highly predictive features, automatically engineered based on arithmetic combinations of numeric features

W Brett Kennedy · Published in Towards Data Science · 32 min read · 16 hours ago

In this article, we examine a tool called FormulaFeatures. This is intended for use primarily with interpretable models, such as shallow decision trees, where having a small number of concise and highly predictive features can aid greatly

AI

Shared Nearest Neighbors: A More Robust Distance Metric

A distance metric that can improve prediction, clustering, and outlier detection in datasets with many dimensions and with varying densities

W Brett Kennedy · Published in Towards Data Science · 28 min read · 7 hours ago

In this article I describe a distance metric called Shared Nearest Neighbors (SNN) and describe its application to outlier detection. I’ll also briefly cover its application to prediction and clustering, but will focus on outlier detection,

AI

Achieve Better Classification Results with ClassificationThresholdTuner

A Python tool to tune and visualize the threshold choices for binary and multi-class classification problems

W Brett Kennedy · Published in Towards Data Science · 30 min read · 2 hours ago

Adjusting the thresholds used in classification problems (that is, adjusting the cut-offs in the probabilities used to decide between predicting one class or another) is a step that’s sometimes forgotten, but is quite easy to do and can significantly improve

AI

Create Stronger Decision Trees with bootstrapping and genetic algorithms

A technique to better allow decision trees to be used as interpretable models

W Brett Kennedy · Published in Towards Data Science · 24 min read · 2 hours ago

While decision trees can often be effective as interpretable models (they are quite comprehensible), they rely on a greedy approach to construction that can result in sub-optimal trees. In this article, we show how to generate classification decision trees of the same (small)

AI

Counts Outlier Detector: Interpretable Outlier Detection

An interpretable outlier detector based on multi-dimensional histograms.

W Brett Kennedy · Published in Towards Data Science · 18 min read · 20 hours ago

This article continues a series on interpretable outlier detection. The previous article (Interpretable Outlier Detection: Frequent Patterns Outlier Factor (FPOF)) covered the FPOF algorithm, as well as some of the basics of outlier detection and interpretability. This builds on that, and presents Counts Outlier Detector, another interpretable

AI

PRISM-Rules in Python

A simple Python rules-induction system

W Brett Kennedy · Published in Towards Data Science · 12 min read · 1 day ago

This article is part of a series covering interpretable predictive models. Previous articles covered ikNN and Additive Decision Trees. PRISM is an existing algorithm (though I did create a Python implementation), and the focus in this series is on original algorithms, but I felt it was useful enough to warrant its

AI

Interpretable Outlier Detection: Frequent Patterns Outlier Factor (FPOF)

An outlier detection method that supports categorical data and provides explanations for the outliers flagged

W Brett Kennedy · Published in Towards Data Science · 10 min read · 12 hours ago

Outlier detection is a common task in machine learning. Specifically, it’s a form of unsupervised machine learning: analyzing data where there are no labels. It’s the act of finding items in a dataset that are unusual relative to the others in
