
Superposition: What Makes It Difficult to Explain Neural Networks
When there are more features than model dimensions

Shuyang Xiang · Published in Towards Data Science

Introduction

It would be ideal if the world of neural networks were one of one-to-one relationships: each neuron activating on one and only one feature. In such a world, interpreting a model would be straightforward: this neuron fires for the dog-ear feature, and that neuron fires for the wheel