… No, the solution is not “Adjusted R-Squared”
R-Squared is one of the most popular metrics for evaluating regression models. It’s taught in virtually every statistics class, and it’s one of the metrics implemented in Scikit-learn.
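As a quick refresher, here is a minimal sketch of the standard textbook computation: one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The numbers are made up purely for illustration and the helper name r_squared is my own. Scikit-learn exposes the same quantity as sklearn.metrics.r2_score.

```python
def r_squared(y_true, y_pred):
    """Textbook R-Squared: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared errors of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared errors of a baseline that always predicts the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and predictions (illustration only)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

score = r_squared(y_true, y_pred)
print(round(score, 4))
```

Note that the baseline in the denominator is the mean of the very same values the model is being scored on.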
However, some doubts have been raised about the reliability of this metric. In the notes for his course at Carnegie Mellon University, Professor Cosma Shalizi claims that R-Squared is useless.
So, should we completely dismiss R-Squared?
I don’t think so.
I admit that this metric has one major flaw, but I also think we shouldn’t lose sight of the positives. In this article, I will explain what is wrong with R-Squared, and suggest a modification that makes it fully reliable.
What is the deeper meaning of R-Squared?
To grasp what the problem with R-Squared is, we first need to understand its meaning. And I mean the deeper meaning, not the sloppy definitions found in most resources.
Let’s start with an example. Suppose we have a predictive model (“model A”) designed to forecast the selling price of a house.
Imagine that our test set consists of four houses. We can visually check the…