Het Trivedi

AI

Boosting LLM Inference Speed Using Speculative Decoding

A practical guide on using cutting-edge optimization techniques to speed up inference

Het Trivedi · Published in Towards Data Science · 6 min read

[Image generated using Flux Schnell]

Intro

Large language models are extremely power-hungry and require significant GPU resources to perform well. However, the transformer architecture does not take full advantage of the GPU. GPUs, by design, can process things in parallel, but the …
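The teaser names speculative decoding but is cut off before explaining it. As a rough, non-authoritative illustration of the core idea — a cheap draft model proposes several tokens and the large target model verifies them in a single pass, keeping the agreed prefix — here is a toy sketch. `draft_model` and `target_model` are hypothetical stand-ins built on simple integer rules, not real language models:

```python
def draft_model(tokens):
    # Toy "cheap" model: greedily predicts the next integer token.
    return tokens[-1] + 1

def target_model(tokens):
    # Toy "expensive" model: same rule, but it wraps tokens above 9 to 0,
    # so it occasionally disagrees with the draft and rejects a proposal.
    nxt = tokens[-1] + 1
    return nxt if nxt <= 9 else 0

def speculative_decode(prompt, n_new, k=4):
    """Generate n_new tokens. Each round, the draft model proposes k tokens
    autoregressively; the target model verifies them and keeps the longest
    prefix it agrees with, substituting its own token at the first mismatch."""
    tokens = list(prompt)
    target_len = len(prompt) + n_new
    while len(tokens) < target_len:
        # 1) Draft proposes k tokens (the cheap, sequential part).
        proposal = list(tokens)
        for _ in range(k):
            proposal.append(draft_model(proposal))
        # 2) Target verifies all k proposals (parallelizable in a real model).
        accepted = 0
        for i in range(len(tokens), len(proposal)):
            expected = target_model(proposal[:i])
            if proposal[i] == expected:
                accepted += 1
            else:
                # First mismatch: keep the target's token and stop this round.
                proposal[i] = expected
                accepted += 1
                break
        tokens = proposal[: len(tokens) + accepted]
    return tokens[:target_len]
```

In a real system, step 2 is why this helps: verifying k drafted tokens is one forward pass of the large model, which uses the GPU's parallelism far better than generating them one at a time.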

AI

Improving RAG Performance Using Rerankers

A tutorial on using rerankers to improve your RAG pipeline

Het Trivedi · Published in Towards Data Science · 10 min read

[Image created by the author using Stable Diffusion XL]

Introduction

RAG is one of the first tools an engineer will try out when building an LLM application. It’s easy enough to understand and simple to use. The primary motive when using vector search is to gather enough relevant context …
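The teaser describes a two-stage pattern: vector search gathers candidate context, then a reranker reorders it. As a minimal sketch of that retrieve-then-rerank shape — with hypothetical toy scorers standing in for a real embedding model and cross-encoder — the pipeline might look like this:

```python
def embed(text):
    # Toy "embedding": a bag of lowercase words (stand-in for a real encoder).
    return set(text.lower().split())

def vector_score(query, doc):
    # Stage-1 score: word overlap (Jaccard), a stand-in for cosine similarity.
    q, d = embed(query), embed(doc)
    return len(q & d) / (len(q | d) or 1)

def rerank_score(query, doc):
    # Stage-2 score: a "cross-encoder" stand-in that sees query and document
    # together, here rewarding exact phrase containment that overlap misses.
    bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return vector_score(query, doc) + bonus

def retrieve_and_rerank(query, docs, k=3, top_n=2):
    # Stage 1: cheap vector search narrows the corpus to k candidates.
    candidates = sorted(docs, key=lambda d: vector_score(query, d),
                        reverse=True)[:k]
    # Stage 2: the expensive reranker reorders only those k, keeping top_n.
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)[:top_n]
```

The design point is the asymmetry: the cheap first stage runs over the whole corpus, while the accurate but slow second stage only ever scores k documents.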
