Exploring the AI Alignment Problem with GridWorlds
It’s difficult to build capable AI agents without encountering orthogonal goals.

By Tarik Dzekman · Published in Towards Data Science

[Figure: Design of a “Gridworld” which is hard for an AI agent to learn without encouraging bad behaviour. Image by the author.]

This is the essence of the AI alignment problem: an advanced AI model with powerful capabilities may have goals that are not aligned with our best interests.