How to Use HyDE for Better LLM RAG Retrieval

Building an advanced local LLM RAG pipeline with hypothetical document embeddings

Implementing HyDE is very simple in Python. Image by the author

Large Language Models (LLMs) can be improved by giving them access to external knowledge through documents.

The basic Retrieval-Augmented Generation (RAG) pipeline consists of four parts [1]:

- a user query,
- an embedding model that converts text into embeddings (high-dimensional numerical vectors),
- a retrieval step that searches the document corpus for documents similar to the user query in the embedding space, and
- a generator LLM that uses the retrieved documents to generate an answer.
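
To make these steps concrete, here is a minimal sketch of the embedding and retrieval steps, assuming the sentence-transformers library; the model name and the tiny three-document corpus are illustrative placeholders, not part of the pipeline we build later:

```python
# Minimal sketch of basic RAG retrieval with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

# Illustrative document corpus (placeholder content)
documents = [
    "HyDE generates a hypothetical document from the query before retrieval.",
    "Embeddings map text into a high-dimensional vector space.",
    "The generator LLM answers using the retrieved documents as context.",
]

# Embedding model (placeholder choice of model)
model = SentenceTransformer("all-MiniLM-L6-v2")

# Convert the corpus into embeddings once, up front
doc_embeddings = model.encode(documents, convert_to_tensor=True)

# Embed the user query and find the most similar document
query = "what is hyde rag"
query_embedding = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best_doc = documents[int(scores.argmax())]

print(best_doc)  # this document would be passed to the generator LLM
```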

In practice, the RAG retrieval part is crucial. If the retriever does not find the correct document in the document corpus, the LLM has no chance to generate a solid answer.

One problem in the retrieval step is that the user query is often a very short question with imperfect grammar, spelling, and punctuation, while the corresponding document is a long passage of well-written text that contains the information we want.

A query and the corresponding passage from the MS MARCO dataset, illustrating that typically query and document have different lengths and formats. Image by the author

HyDE is a proposed technique to improve the RAG retrieval step by converting the user question into a hypothetical document that answers it, and then using the embedding of this hypothetical document, rather than the embedding of the raw question, for retrieval.
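
Conceptually, a HyDE retrieval step can be sketched as follows, reusing the embedding model and corpus from the snippet above and assuming a local LLM served behind an OpenAI-compatible API (for example llama.cpp or Ollama); the base URL and model name are placeholders:

```python
# A HyDE retrieval sketch: generate a hypothetical document with an LLM,
# then retrieve with its embedding instead of the raw question's.
from openai import OpenAI
from sentence_transformers import util

# Placeholder: a local OpenAI-compatible server, not a specific product
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def hyde_retrieve(query: str) -> str:
    # 1. Ask the LLM to write a hypothetical passage answering the query
    response = client.chat.completions.create(
        model="local-model",  # placeholder model name
        messages=[{
            "role": "user",
            "content": f"Write a short passage that answers this question: {query}",
        }],
    )
    hypothetical_doc = response.choices[0].message.content

    # 2. Embed the hypothetical document instead of the raw question
    hyde_embedding = model.encode(hypothetical_doc, convert_to_tensor=True)

    # 3. Retrieve the real document closest to the hypothetical one
    scores = util.cos_sim(hyde_embedding, doc_embeddings)[0]
    return documents[int(scores.argmax())]
```

The intuition behind this design is that the generated passage is written in the same register as the real documents, so its embedding tends to land closer to the relevant passage than the embedding of a short, noisy question would.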