There is a race towards language models with longer context windows. But how good are they, and how can we know?
This article was originally published on Art Fish Intelligence.
Introduction
The context window of large language models — the amount of text they can process at once — has been increasing at an exponential rate.
In 2018, language models like BERT and GPT-1 could take up to 512 tokens as input (T5, released the following year, had the same limit). Now, in the summer of 2024, this number has jumped to 2 million tokens (in publicly available LLMs). But what does this mean for us, and how do we evaluate these increasingly capable models?
What does a large context window mean?
The recently released Gemini 1.5 Pro model can take in up to 2 million tokens. But what does 2 million tokens even mean?
If we estimate 3 words to roughly equal about 4 tokens, it means that 2 million tokens can (almost) fit the entire Harry Potter and Lord of the Rings series combined.
(The total word count of all seven books in the Harry Potter series is 1,084,625. The total word count of all three books in the Lord of the Rings series is 481,103. That's 1,084,625 + 481,103 = 1,565,728 words, or roughly 2.09 million tokens at 4 tokens per 3 words — just over the 2-million-token window.)
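To make the arithmetic concrete, here is a minimal Python sketch of the same back-of-the-envelope estimate. The word counts are the published totals cited above, and the 3-words-to-4-tokens ratio is a rough rule of thumb, not the output of any particular tokenizer:

```python
# Back-of-the-envelope check: do both series fit in a 2M-token context window?
# Word counts are the published totals cited above; the 4/3 tokens-per-word
# ratio is a rough heuristic, not an exact tokenizer measurement.

HARRY_POTTER_WORDS = 1_084_625   # all seven Harry Potter books
LOTR_WORDS = 481_103             # all three Lord of the Rings volumes
TOKENS_PER_WORD = 4 / 3          # ~3 words is roughly 4 tokens

total_words = HARRY_POTTER_WORDS + LOTR_WORDS
estimated_tokens = total_words * TOKENS_PER_WORD

print(f"Total words:       {total_words:,}")          # 1,565,728
print(f"Estimated tokens:  {estimated_tokens:,.0f}")  # ~2,087,637
print(f"Fits in 2M tokens? {estimated_tokens <= 2_000_000}")  # False (just over)
```

For an exact figure you would run the actual text through the model's tokenizer; the rule of thumb only gives a ballpark.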