How to Achieve Near Human-Level Performance in Chunking for RAGs

The costly yet powerful splitting technique for superior RAG retrieval


Good chunks make good RAGs.

Chunking, embedding, and indexing are critical stages of a RAG pipeline. A RAG app that chunks its documents well retrieves more relevant context, which improves both output quality and speed.

When engineering an LLM pipeline, we use different strategies to split the text. Recursive character splitting is the most popular technique. It uses a sliding-window approach with a fixed token length. However, a fixed window offers no guarantee that a complete theme fits inside it, and there's a risk that parts of the same context end up in different chunks.
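To make the sliding-window idea concrete, here is a minimal sketch of fixed-length splitting with overlap. It uses whitespace-separated words as a stand-in for tokens; a real pipeline would split with the embedding model's tokenizer, and the window and overlap sizes here are arbitrary illustrative values.

```python
def sliding_window_chunks(text: str, window_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks, where consecutive chunks
    share `overlap` tokens so context is less likely to be cut mid-thought.

    Words stand in for tokens here; swap in a real tokenizer in practice.
    """
    tokens = text.split()
    step = window_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + window_size]))
        if start + window_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks
```

Note that even with the overlap, a theme longer than `window_size` tokens is still guaranteed to be cut, which is exactly the weakness described above.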

The other technique I love is semantic splitting. Semantic splitting breaks the text wherever the meaning shifts significantly between two consecutive sentences. It has no length constraint, so a chunk may contain many sentences or only a few, but it is more likely to capture distinct themes accurately.

Even the semantic splitting approach has a problem.

What if sentences far from each other are closer in their meaning?