OpenAI’s Groundbreaking GPT-4o Teaches ChatGPT Intimate Real-Time Conversation

It would be easy to assume, based on the name “GPT-4o,” that OpenAI’s new model is merely an iteration on its previous work, and in the strictest sense, that’s sort of true. If you look only at benchmarks, GPT-4o is an extremely minor improvement over GPT-4 Turbo. This release isn’t about benchmarks, though. This release is about capabilities, and specifically, one capability: native multimodality.

The “o” in “GPT-4o” stands for “omni,” and it refers to GPT-4o’s ability to handle nearly any kind of input “natively.” You see, other multimodal AI systems, including those built around GPT-4 Turbo, rely on separate helper models to convert what they see and hear into text before the language model can work with that data. GPT-4o doesn’t work that way. For the first time, the same model handles all of the text, image, audio, and video processing itself, and it generates text, images, and audio directly.
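If you’re curious what “one model for everything” looks like from a developer’s chair, here’s a rough sketch using OpenAI’s Python SDK of a single request that mixes text and an image. The “gpt-4o” model name and the example image URL are placeholders for illustration, and the real-time voice side of the model isn’t exposed through this call; the point is simply that text and pixels go to the same model in the same request, with no captioning or transcription step in between.

```python
# Sketch: one multimodal request to GPT-4o via OpenAI's Chat Completions API.
# The image URL below is a hypothetical example, not a real asset.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # Text and an image travel together in a single message;
                # the same model interprets both.
                {"type": "text", "text": "What mood does this photo convey?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```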

[Image: GPT-4o benchmark results. Benchmarks didn’t change much, but that’s not the point.]

This means that when you share an image or speak to GPT-4o, it has access to all of the same information that a human does. It can pick up on qualities in images that are hard to capture in text, like themes or moods. Likewise, when you speak to GPT-4o, it can make inferences and judgments based on how you say things and how your voice sounds, as well as the content of what you say.

If you’re not putting A and B together to get C yet, let us spell it out for you: GPT-4o can interact with you in a radically more natural fashion than any previous AI model. You can speak to it in natural language and it will respond almost instantaneously. The average latency for a voice response from GPT-4 Turbo was over five seconds, while GPT-4o’s average response time is apparently 320 milliseconds, roughly on par with human response time in a conversation.

[Image: OpenAI CEO Sam Altman’s comments on interacting with GPT-4o.]

The model responds so quickly and so naturally that, in OpenAI’s livestreamed stage demo, GPT-4o actually talks over the presenters a couple of times. We’ve embedded the demo below, where you can watch OpenAI employees chat with GPT-4o running on a smartphone. The demo really has to be seen to be believed, so we won’t waste your time describing it. Just click the video, skip to around the ten-minute mark, and watch.
If you enjoyed that, there are numerous other short demo videos on OpenAI’s announcement page for the new model. If you’re in a hurry and don’t want to watch them, suffice it to say that the new model’s capabilities go rather far beyond “impressive” and well into concerning territory. We’re not AI alarmists here at HotHardware; we applaud every achievement of artificial intelligence. Still, the possibilities for how humans could misuse GPT-4o are staggering.

It’s largely for this reason that you can’t actually play with GPT-4o like they do in the video just yet. OpenAI says that GPT-4o’s text and image capabilities are “rolling out” in ChatGPT to both free-tier and Plus users, but the extremely impressive voice mode will require a Plus subscription when it becomes available “in the coming weeks.”