Going into Google DeepMind’s “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters”
OpenAI recently unveiled its newest model, o1. Rather than highlight the model’s parameter count, OpenAI showcased that it performs significantly better because it takes more time to answer. When you ask the model a question, it will often take multiple seconds to respond, a far cry from the millisecond speed most people now expect from Large Language Models (LLMs). Nevertheless, this extra time appears to pay off, as o1 scores substantially higher than other models on the LMSYS Chatbot Arena.
Given this leap in performance, the question everyone is asking is: how did they do it?
While OpenAI has not publicly stated how it achieved these results, a few recent papers are good candidates for what is happening behind the scenes. One such paper is “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters”. This paper goes into how you can leverage…