Machine Learning unknowns that researchers struggle to understand — from Batch Norm to what SGD hides
It is surprising how many of the basic subjects in machine learning remain poorly understood by researchers; despite being fundamental and widely used, they still seem mysterious. It's one of the fun things about machine learning: we build things that work first, and only later figure out why they work at all!
Here, I investigate the unknown territory behind some machine learning concepts to show that, while these ideas can seem basic, they are in fact built from layers upon layers of abstraction. Doing so helps us practice questioning the depth of our own knowledge.
In this article, we explore several key phenomena in deep learning that challenge our traditional understanding of neural networks.
- We start with Batch Normalization, whose underlying mechanisms remain not fully understood.
- We examine the counterintuitive observation that overparameterized models often generalize better, contradicting classical machine learning theory.
- We explore the implicit regularization effects of gradient descent, which seem to naturally bias neural networks towards simpler, more generalizable solutions.