Chaim Rand

AI

Optimizing Transformer Models for Variable-Length Input Sequences

How PyTorch NestedTensors, FlashAttention2, and xFormers Can Boost Performance and Reduce AI Costs

Chaim Rand · Published in Towards Data Science · 14 min read · 12 hours ago

As generative AI (genAI) models grow in both popularity and scale, so do the computational demands and costs associated with their training and deployment. Optimizing these models is crucial for enhancing their runtime performance and reducing their operational …

Read More »
AI

On the Programmability of AWS Trainium and Inferentia

Accelerating AI/ML Model Training with Custom Operators — Part 4

Chaim Rand · Published in Towards Data Science · 12 min read · 18 hours ago

In this post we continue our exploration of the opportunities for runtime optimization of machine learning (ML) workloads through custom operator development. This time, we focus on the tools provided by the AWS Neuron SDK for developing and running new kernels …

Read More »
AI

Training AI Models on CPU

Revisiting CPU for ML in an Era of GPU Scarcity

Chaim Rand · Published in Towards Data Science · 13 min read · 1 day ago

The recent successes in AI are often attributed to the emergence and evolution of the GPU. The GPU's architecture, which typically includes thousands of multi-processors, high-speed memory, dedicated tensor cores, and more, is particularly well-suited to meet the intensive demands of …

Read More »