
Sparse AutoEncoder: from Superposition to interpretable features
Disentangling features in complex neural networks with superposition

Shuyang Xiang · Published in Towards Data Science · 5 min read

Complex neural networks, such as Large Language Models (LLMs), often suffer from interpretability challenges. One of the most important reasons for this difficulty is superposition: a phenomenon in which a neural network represents more features than it has dimensions. For example, a toy LLM