Getting Started with Retrieval Augmented Generation
Subhra
2 min read


AI has come a long way from its early days of rule-based processing to today's sophisticated neural networks, which can process, understand, and reason with most forms of data. This leap forward has been significantly bolstered by the emergence of Large Language Models (LLMs). But the real game-changer came with the introduction of a framework called Retrieval Augmented Generation, or RAG, which enables these LLMs to leverage external knowledge, thereby expanding the boundaries of what AI can achieve. In this post, we will delve into the core concept of RAG, how it works, and why it paves the way for more accurate, trustworthy, and context-aware AI systems.
Why do we need RAG?
Powerful as they are, all current LLMs suffer from several critical drawbacks because their capabilities are constrained by the data they were originally trained on. Let's look at these drawbacks one by one:
Static knowledge / knowledge cutoff: An LLM's knowledge base is frozen at training time, so on its own it cannot provide up-to-date answers. RAG lets the model pull in the most recent data, enabling more precise and current responses.
Lack of Context: LLMs have no access to private or domain-specific data, so they can produce generic or erroneous results on specialized questions. With the RAG framework, the model can draw on proprietary and domain-specific sources, mitigating this risk.
Hallucination: When an LLM lacks the relevant knowledge, it often produces fluent, confident answers that are not grounded in reality. By anchoring responses in retrieved facts, RAG reduces the risk of such hallucinations.
Functions as a Black Box: Because LLMs are trained on massive amounts of disparate data, pinpointing the precise source behind any given response is quite challenging. RAG enables citing of sources, which makes responses more auditable.
Cost & Complexity: Building or fine-tuning an LLM for domain-specific tasks is technically demanding and expensive, which makes RAG an attractive, lower-cost alternative.
What is Retrieval Augmented Generation (RAG)?
RAG is an AI framework that enhances the quality and accuracy of responses generated by LLMs by integrating up-to-date and contextually relevant information from external sources. In other words, LLMs can be directed to apply their reasoning ability to answers grounded in externally supplied facts. The model is no longer constrained to its static training material and can now access an updatable, expandable knowledge base, a library of information fed by external sources.
Key Components of RAG
RAG combines a Retriever system, which searches an external knowledge repository (like a library of information) based on the query and pulls in relevant, up-to-date information, with a Generator system, which weaves the retrieved information into a coherent and contextually appropriate response.
Retriever: Acting as a research assistant, the Retriever finds the information most relevant to a query and passes it on to the Generator. It searches the external repository's massive store of data and picks out the pieces most likely to help in generating a response. The Retriever's speed, accuracy, and efficiency directly impact the relevance and timeliness of the information provided to the Generator.
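To make the Retriever's job concrete, here is a minimal, self-contained sketch. Real systems use dense vector embeddings and a vector database; this toy version stands in with bag-of-words cosine similarity, and the `vectorize` and `retrieve` names are illustrative, not from any particular library.

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    # Crude term-frequency "embedding": lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Standard cosine similarity over the two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    # Score every document against the query; return the top_k best matches.
    q = vectorize(query)
    ranked = sorted(documents,
                    key=lambda d: cosine_similarity(q, vectorize(d)),
                    reverse=True)
    return ranked[:top_k]

docs = [
    "RAG couples a retriever with a generator.",
    "The retriever searches an external knowledge repository.",
    "Bananas are rich in potassium.",
]
print(retrieve("How does the retriever search the repository?", docs))
```

Swapping the scoring function for learned embeddings (and the list for an indexed vector store) is what turns this sketch into a production retriever, but the contract stays the same: query in, ranked relevant passages out.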
Generator: The Generator takes over once the Retriever has found the relevant information. It acts as a storyteller, turning the retrieved facts into a coherent, informative, and contextually appropriate response.
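The hand-off between the two components can be sketched as follows: the retrieved passages are assembled into a grounded prompt that a real system would then send to an LLM. The `build_prompt` function and the example passages here are hypothetical, and no actual model call is made; the numbered passages also illustrate how RAG supports source citation.

```python
def build_prompt(query: str, passages: list[str]) -> str:
    # Instruct the model to ground its answer only in the retrieved context,
    # and to cite the numbered passages it used (this is what makes RAG auditable).
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the numbered context passages below, "
        "and cite the passage numbers you used. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Hypothetical passages a Retriever might have returned for this query.
passages = [
    "RAG pairs a retriever with a generator.",
    "The retriever pulls relevant passages from an external repository.",
]
prompt = build_prompt("What does the retriever do?", passages)
print(prompt)
```

In a full pipeline, `prompt` would be passed to whichever LLM you use as the Generator; the prompt's grounding instruction is what steers the model toward the retrieved facts rather than its static training data.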