It all started with GPT having an input context window of just 512 tokens. Only about five years later, the newest LLMs can handle context windows of 1M+ tokens. Where's the limit?
I like to think of LLMs (specifically, of a model's parameters, i.e., the weights of its neural network layers and…