Boosting LLM Inference Speed Using Speculative Decoding
A practical guide on using cutting-edge optimization techniques to speed up inference

Het Trivedi · Towards Data Science

Intro

Large language models are extremely power-hungry and require a significant amount of GPU resources to perform well. However, the transformer architecture does not take full advantage of the GPU. GPUs, by design, can process things in parallel, but the transformer generates text autoregressively, producing one token at a time, so each step must wait for the previous one to finish.
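To make this sequential bottleneck concrete, here is a minimal sketch of a plain autoregressive decoding loop (not from the article itself) using the Hugging Face transformers library; the model name "gpt2", the prompt, and the 20-token budget are illustrative assumptions. Every new token requires a full forward pass that depends on the token before it, which is exactly the dependency speculative decoding tries to relax.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: greedy autoregressive decoding, one token per step.
model_name = "gpt2"  # small model chosen only for demonstration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

input_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):  # generate 20 tokens, strictly one at a time
        logits = model(input_ids).logits           # full forward pass over the prefix
        next_id = logits[:, -1, :].argmax(dim=-1)  # greedy choice of the next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Each iteration launches a forward pass whose input includes the token produced by the previous iteration, so the GPU's parallel compute is spent on a strictly serial chain of small steps.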