Ben Dickson

1X releases generative world models to train robots

Robotics startup 1X Technologies has developed a new generative model that can make it much more efficient to train robotics systems in simulation. The model, which the company announced in a new blog post, addresses one of the central challenges of robotics: learning “world models” that can predict how the world changes in response … (a minimal sketch of the idea follows below).

Read More »
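To make the “world model” idea concrete, here is a minimal PyTorch sketch of an action-conditioned dynamics model: given a latent encoding of the current camera frame and a robot command, it predicts the latent of the next frame. This is a generic illustration with made-up module names and dimensions, not 1X's architecture.

```python
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    """Toy action-conditioned world model: predict the latent state of the
    next frame from the current latent state and the action taken."""

    def __init__(self, state_dim=256, action_dim=16):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(state_dim + action_dim, 512),
            nn.ReLU(),
            nn.Linear(512, state_dim),
        )

    def forward(self, state, action):
        return self.dynamics(torch.cat([state, action], dim=-1))

# Training signal: compare the predicted next latent against the encoder's
# latent of the frame that was actually observed after taking the action.
model = TinyWorldModel()
state = torch.randn(8, 256)        # latents of current frames (batch of 8)
action = torch.randn(8, 16)        # robot commands
next_state_true = torch.randn(8, 256)
loss = nn.functional.mse_loss(model(state, action), next_state_true)
loss.backward()
```

Predicting in a learned latent space rather than raw pixels is a common way to keep such models tractable; a real system would add an encoder and decoder and train on large volumes of robot video.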

Apple aims for on-device user intent understanding with UI-JEPA models

Understanding user intentions based on user interface (UI) interactions is a critical challenge in creating intuitive and helpful AI applications. In a new paper, researchers from Apple introduce UI-JEPA, an architecture that significantly reduces the computational requirements of UI understanding while maintaining high performance. UI-JEPA aims to enable lightweight, on-device UI understanding, paving the way for … (a toy sketch of the JEPA setup follows below).

Read More »
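JEPA-style architectures predict in representation space instead of reconstructing raw inputs, which is what keeps them comparatively cheap. The toy PyTorch sketch below assumes made-up feature dimensions and a frozen target encoder standing in for the usual moving-average copy; it illustrates the general pattern, not Apple's UI-JEPA implementation.

```python
import torch
import torch.nn as nn

embed_dim = 128
context_encoder = nn.Sequential(nn.Linear(64, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))
target_encoder = nn.Sequential(nn.Linear(64, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))
predictor = nn.Linear(embed_dim, embed_dim)

# In JEPA-style setups the target encoder is not trained by backprop; it is
# typically an exponential moving average of the context encoder. Frozen here.
for p in target_encoder.parameters():
    p.requires_grad_(False)

context_patches = torch.randn(32, 64)  # visible parts of a UI recording (toy features)
masked_patches = torch.randn(32, 64)   # held-out parts whose embeddings must be predicted

pred = predictor(context_encoder(context_patches))
with torch.no_grad():
    target = target_encoder(masked_patches)

# The loss lives in representation space, not pixel space.
loss = nn.functional.mse_loss(pred, target)
loss.backward()
```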

DeepMind and UC Berkeley show how to make the most of LLM inference-time compute

Given the high costs and slow speed of training large language models (LLMs), there is an ongoing discussion about whether spending more compute cycles on inference can improve the performance of LLMs without retraining them. In a new study, researchers at DeepMind and the University of California, Berkeley explore ways to improve … (a best-of-N sketch follows below).

Read More »
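One of the simplest ways to spend extra compute at inference time, and a common baseline in this line of work, is best-of-N sampling with a verifier. The sketch below uses hypothetical `generate` and `score` callables rather than any real serving API; it illustrates the general idea, not the paper's specific strategies.

```python
# `generate` samples one candidate answer from an LLM and `score` is a
# verifier or reward model that rates a candidate; both are placeholders.
def best_of_n(prompt, generate, score, n=16):
    candidates = [generate(prompt, temperature=0.8) for _ in range(n)]
    return max(candidates, key=score)
```

Extra compute can then be traded between sampling more candidates and running a stronger verifier, which is roughly the trade-off space such studies explore.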

Nvidia’s Llama-3.1-Minitron 4B is a small language model that punches above its weight

As tech companies race to deliver on-device AI, we are seeing a growing body of research and techniques for creating small language models (SLMs) that can run on resource-constrained devices. The latest models, created by a research team at Nvidia, leverage recent advances in pruning and distillation to create Llama-3.1-Minitron 4B, a compressed version of the … (a distillation-loss sketch follows below).

Read More »
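The distillation half of such a recipe boils down to training a pruned student to match its larger teacher's output distribution. Below is the textbook temperature-scaled KL distillation loss in PyTorch, offered as a generic illustration rather than Nvidia's exact training objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard knowledge-distillation objective: KL divergence between the
    softened teacher and student token distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy usage: a vocabulary of 10 tokens and a batch of 4 positions.
loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10))
```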

Meta’s Self-Taught Evaluator enables LLMs to create their own training data

Human evaluation has been the gold standard for assessing the quality and accuracy of large language models (LLMs), especially for open-ended tasks such as creative writing and coding. However, human evaluation is slow, expensive, and often requires specialized expertise. Researchers at Meta FAIR have introduced a novel approach called the Self-Taught Evaluator, which leverages synthetic data … (a sketch of the training loop follows below).

Read More »
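The broad shape of such a self-training loop is: have the current model write judgments with reasoning, keep only the judgments that land on the known-better answer, and fine-tune on them. The sketch below uses hypothetical `judge` and `finetune` helpers and a toy filtering rule; it illustrates the pattern, not Meta's exact recipe.

```python
# `judge` asks the current model to compare two answers and return a
# (reasoning_trace, verdict) pair; `finetune` updates the model on the kept
# traces. Both are placeholders for whatever stack you use.
def self_taught_evaluator_round(model, pairs, judge, finetune):
    training_data = []
    for prompt, better_answer, worse_answer in pairs:
        reasoning, verdict = judge(model, prompt, better_answer, worse_answer)
        if verdict == "first":  # keep only judgments that pick the known-better answer
            training_data.append((prompt, better_answer, worse_answer, reasoning))
    return finetune(model, training_data)
```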

LLMs excel at inductive reasoning but struggle with deductive tasks, new research shows

Large language models (LLMs) have shown impressive performance on various reasoning and problem-solving tasks. However, there are open questions about how these reasoning abilities work and what their limits are. In a new study, researchers at the University of California, Los Angeles, and Amazon comprehensively examine the capabilities of LLMs at deductive and inductive reasoning.

Read More »

FlashAttention-3 unleashes the power of H100 GPUs for LLMs

Attention is a core component of the transformer architecture used in large language models (LLMs). But as LLMs grow larger and handle longer input sequences, the computational cost of attention becomes a bottleneck. To address this challenge, researchers from Colfax Research, Meta, Nvidia, Georgia Tech, Princeton University, and Together AI have introduced FlashAttention-3, a new technique … (a naive-attention sketch follows below).

Read More »
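For context, the bottleneck comes from the full attention score matrix, which grows quadratically with sequence length. The snippet below is the textbook scaled dot-product attention, not FlashAttention-3 itself; FlashAttention-style kernels compute the same result in tiles without ever materializing that matrix in GPU memory.

```python
import torch

def naive_attention(q, k, v):
    """Textbook scaled dot-product attention. The (seq_len x seq_len) score
    matrix is what makes memory and compute grow quadratically with length."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d**0.5  # (batch, seq_len, seq_len)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 1024, 64)
out = naive_attention(q, k, v)
```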

Meta researchers distill System 2 thinking into LLMs, improving performance on complex reasoning

Large language models (LLMs) are very good at answering simple questions but require special prompting techniques to handle complex tasks that call for reasoning and planning. Often referred to as “System 2” techniques, these prompting schemes enhance the reasoning capabilities of LLMs by forcing them to generate intermediate steps toward solving a problem. While effective, System 2 … (a sketch of the distillation step follows below).

Read More »
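One way to picture the distillation step: run the expensive System 2 prompting scheme several times, keep only self-consistent answers, and train the model on plain question-answer pairs with the intermediate reasoning dropped. The sketch below assumes hypothetical `system2_answer` and `finetune` helpers and illustrates that pattern rather than the paper's exact procedure.

```python
from collections import Counter

# `system2_answer` runs an expensive prompting scheme (e.g. chain of thought)
# and returns only the final answer; `finetune` trains on (question, answer)
# pairs. Both are placeholders.
def distill_system2(model, questions, system2_answer, finetune, samples=8):
    data = []
    for q in questions:
        answers = [system2_answer(model, q) for _ in range(samples)]
        answer, count = Counter(answers).most_common(1)[0]
        if count > samples // 2:      # keep only self-consistent answers
            data.append((q, answer))  # the reasoning trace is not part of the target
    return finetune(model, data)
```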

DeepMind’s PEER scales language models with millions of tiny experts

Mixture-of-Experts (MoE) has become a popular technique for scaling large language models (LLMs) without exploding computational costs. Instead of using the entire model capacity for every input, MoE architectures route the data to small but specialized “expert” modules. MoE enables LLMs to increase their parameter count while keeping inference costs low. MoE is used in several popular … (a toy routing layer follows below).

Read More »
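The toy PyTorch routing layer below, with made-up sizes, shows the basic MoE mechanics the excerpt describes: a router scores the experts for each token and only the top-k are run. PEER's twist, scaling this to millions of single-neuron experts via a product-key lookup, is not implemented here.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer with dense routing over a handful of experts."""

    def __init__(self, dim=64, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x):                       # x: (tokens, dim)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Run only the selected experts for each token and mix their outputs.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TinyMoE()
out = layer(torch.randn(16, 64))  # 16 tokens, each routed to its top-2 experts
```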