How to Achieve Near Human-Level Performance in Chunking for RAGs

The costly yet powerful splitting technique for superior RAG retrieval


Good chunks make good RAGs.

Chunking, embedding, and indexing are critical stages of a RAG pipeline. A RAG app that chunks its documents well retrieves more relevant context, which improves both output quality and speed.

When engineering an LLM pipeline, we use different strategies to split the text. Recursive character splitting is the most popular technique. It uses a sliding-window approach with a fixed token length. However, a fixed window offers no guarantee that a complete theme fits inside it, and there's a risk that parts of the same context end up in different chunks.
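To make the sliding-window idea concrete, here is a minimal sketch of fixed-length splitting with overlap. It uses whitespace-separated words as a stand-in for tokens; a real pipeline would split with the embedding model's tokenizer, and the window and overlap sizes here are arbitrary illustrative values.

```python
def sliding_window_chunks(text: str, window_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks, where consecutive chunks
    share `overlap` tokens so context is less likely to be cut mid-thought.

    Words stand in for tokens here; swap in a real tokenizer in practice.
    """
    tokens = text.split()
    step = window_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + window_size]))
        if start + window_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks
```

Note that even with the overlap, a theme longer than `window_size` tokens is still guaranteed to be cut, which is exactly the weakness described above.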

The other technique I love is semantic splitting. Semantic splitting breaks the text wherever the meaning shifts significantly between two consecutive sentences. It has no length constraint, so a chunk may contain many sentences or only a few, but it is more likely to capture distinct themes accurately.

Even the semantic splitting approach has a problem.

What if sentences far from each other are closer in their meaning?