Introduction to Reinforcement Learning and Solving the Multi-armed Bandit Problem

Dissecting “Reinforcement Learning” by Richard S. Sutton with Custom Python Implementations, Episode I

Reinforcement Learning (RL) is a fascinating subfield of Machine Learning. You might already know it from applications such as playing Go [1], autonomous driving [2], and more.

Equally fascinating, in my opinion, is Sutton and Barto’s famous book, “Reinforcement Learning” [3]. I think it is a great introduction to the topic that also goes deep and covers all the important theoretical topics of the field. It can be a lot to read, though, and on a first read it might look a bit math-heavy.

Image by Carl Raw on Unsplash

Thus, I decided to start a series of posts summarizing the book chapter by chapter. I believe that having the contents explained in different words greatly helps understanding. I will also implement all (or at least most) of the algorithms from the book in Python and apply them to problems and environments modeled with the Gymnasium framework (formerly OpenAI’s Gym) [4]. As far as I know, these two points are novel and make this series unique.

This post is the first in the series, and will briefly introduce RL in general, then give a quick overview of how Sutton’s book is structured — and how…