Authors:
(1) Soyeong Jeong, School of Computing;
(2) Jinheon Baek, Graduate School of AI;
(3) Sukmin Cho, School of Computing;
(4) Sung Ju Hwang, Korea Advanced Institute of Science and Technology;
(5) Jong C. Park, School of Computing.
Table of Links
3 Method and 3.1 Preliminaries
3.2 Adaptive-RAG: Adaptive Retrieval-Augmented Generation
4 Experimental Setups and 4.1 Datasets
4.2 Models and 4.3 Evaluation Metrics
5 Experimental Results and Analyses
6 Conclusion, Limitations, Ethics Statement, Acknowledgements, and References
A Additional Experimental Setups
B Additional Experimental Results
Open-domain QA Open-domain QA is the task of accurately answering a query by sourcing for query-relevant documents, and then interpreting them to provide answers (Chen et al., 2017; Zhu et al., 2021), which, thus, generally involves two modules: a retriever (Karpukhin et al., 2020; Xiong et al., 2021) and a reader (Yang et al., 2019; Izacard and Grave, 2021; Jeong et al., 2023). Along with the emergence of LLMs with superior reasoning capabilities thanks to their billion-sized parameters (Wei et al., 2022a), a synergy between LLMs and retrievers has led to significant advancements (Lazaridou et al., 2022; Ram et al., 2023). Specifically, this integration has been shown to enhance Open-domain QA by mitigating the hallucination problem from LLMs through strengthened reasoning abilities of the reader, as well as utilizing the retrieved, external documents (Cho et al., 2023). Despite these advancements for single-hop retrieval-augmented LLMs, however, the complexity of some queries needs a more complex strategy.
Multi-hop QA Multi-hop QA is an extension of conventional Open-domain QA, which additionally requires the system to comprehensively gather and contextualize information from multiple documents (often iteratively), to answer more complex queries (Trivedi et al., 2022a; Yang et al., 2018). In the realm of multi-hop QA, the approach to iteratively access both LLMs and the retrieval module is generally employed. Specifically, Khattab et al. (2022), Press et al. (2023), Pereira et al. (2023) and Khot et al. (2023) proposed to first decompose the multi-hop queries into simpler single-hop queries, repeatedly access the LLMs and retriever to solve these sub-queries, and merge their solutions to formulate a complete answer. In contrast to this decomposition-based approach, other recent studies, such as Yao et al. (2023) and Trivedi et al. (2023), explored the interleaving of Chain-ofThought reasoning (Wei et al., 2022b) — a method where a logical sequence of thoughts is generated — with document retrieval, repeatedly applying this process until the reasoning chain generates the answer. In addition, Jiang et al. (2023) introduced an approach to repeatedly retrieving new documents if the tokens within generated sentences have low confidence. However, the aforementioned methods overlooked the fact that, in real-world scenarios, queries are of a wide variety of complexities. Therefore, it would be largely inefficient to iteratively access LLMs and retrievers for every query, which might be simple enough with a single retrieval step or even only with an LLM itself.
Adaptive Retrieval To handle queries of varying complexities, the adaptive retrieval strategy aims to dynamically decide whether to retrieve documents or not, based on each query’s complexity. In this vein, Mallen et al. (2023) proposed to decide the query’s complexity level based on the frequency of its entities and suggested using the retrieval modules only when the frequency falls below a certain threshold. However, this approach, focusing solely on the binary decision of whether to retrieve or not, may not be sufficient for more complex queries that require multiple reasoning steps. Additionally, Qi et al. (2021) proposed an approach that performs a fixed set of operations (retrieving, reading, and reranking) multiple times until the answer is derived for the given query, which is built upon traditional BERT-like LMs. However, unlike our Adaptive-RAG which pre-determines the query complexity and adapts the operational behavior of any off-the-shelf LLMs accordingly, this approach applies the same fixed operations to every query regardless of its complexity but also necessitates additional specific training to LMs. Concurrent to our work, Asai et al. (2024) suggested training a sophisticated model to dynamically retrieve, critique, and generate the text. Nevertheless, we argue that all the aforementioned adaptive retrieval methods that rely on a single model might be suboptimal in handling a variety of queries of a range of different complexities since they tend to be either overly simple or complex for all the input queries, which demands a new approach that can select the most suitable strategy of retrieval-augmented LLMs tailored to the query complexity.
3 Method
In this section, we describe our approach to adapting retrieval-augmented LLMs, by pre-determining the query complexity and then selecting the most fitting strategies for retrieval-augmented LLMs.
3.1 Preliminaries
We begin with preliminaries, formally introducing different strategies of retrieval-augmented LLMs.

Single-step Approach for QA To address the aforementioned scenarios where LLM may struggle with queries that are not answerable by LLM itself, we can utilize the external knowledge d, which includes useful information for queries, retrieved from the external knowledge source D that could be an encyclopedia (e.g., Wikipedia) consisting of millions of documents. Specifically, to obtain such d from D, a specific retrieval model is necessary, which returns documents based on their relevance with the given query. This process can be formulated as follows: d = Retriever(q; D), where Retriever is the retrieval model, with d ∈ D. Here, we can use any off-the-shelf retriever (Robertson et al., 1994; Karpukhin et al., 2020).

This process allows LLMs to gain access to external information contained in d, which can provide the supplementary context that the internal knowledge of LLM lacks, which can subsequently improve the accuracy and concurrency of LLMs for QA.
Multi-step Approach for QA Even though the aforementioned single-step approach offers significant improvements over non-retrieval for q that requires external knowledge, it encounters notable limitations, particularly when dealing with complex queries that necessitate synthesizing information from multiple source documents and reasoning over them. This is where a multi-step approach and reasoning for QA become essential.

[1] It is worth noting that implementations of the LLM and retriever vary across different multi-step retrieval-augmented LLM approaches (Trivedi et al., 2023; Press et al., 2023; Yao et al., 2023); therefore, the context ci may incorporate none, some, or all of the previous documents and answers.
