
Linear Attention Is All You Need
Self-attention at a fraction of the cost?

Sam Maddrell-Mander · Towards Data Science

Photo by Guillaume Jaillet on Unsplash

"Attention scales badly with long sequence lengths."

This is the kind of thing anyone who's spent much time working with transformers and self-attention will have heard a hundred times. It's both absolutely true, we've all experienced this as you try to increase the context