Should I use RAG to fine-tune my GenAI Application? 

Creating and training large language models (LLMs) is an expensive and challenging process that most organizations don't have the resources or capabilities to undertake. So, how do I get an LLM to provide information that is relevant and accurate enough to automate a specific business process?

The solution is to enhance the LLM’s capabilities by incorporating additional authoritative data sources and leveraging specific features within the LLM during response generation. This is where RAG comes in.  

The decision to add data to an LLM is complicated as it impacts many essential aspects of an AI application, including accuracy, relevance, scope, and ethics.  

We've already discussed prompt engineering and prompt transformation as methods for managing that interaction, along with some of the advantages and pitfalls associated with their use.


What about RAG? 

RAG stands for Retrieval Augmented Generation. This technique intercepts the user’s query and performs a directed search for additional data. Then, it takes the search results and inserts them along with the query into the LLM’s prompt. 
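To make that flow concrete, here is a minimal, self-contained sketch of the RAG pattern described above. The tiny in-memory "knowledge base" and naive keyword search stand in for a real search backend, and call_llm is a hypothetical placeholder for whatever model client your application uses; none of these names come from a specific product.

```python
import re

# Illustrative application-specific data; in practice this would be a curated
# document store or vector index.
KNOWLEDGE_BASE = [
    "Refund requests must be submitted within 30 days of purchase.",
    "Support tickets are triaged within one business day.",
    "Enterprise customers receive dedicated account management.",
]

def search_knowledge_base(query: str, top_k: int = 2) -> list[str]:
    # Naive keyword-overlap scoring; a production system would use a search
    # engine or vector index tuned to the data.
    terms = set(re.findall(r"\w+", query.lower()))
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(terms & set(re.findall(r"\w+", doc.lower()))),
        reverse=True,
    )
    return scored[:top_k]

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your actual LLM client here.
    return f"[LLM response to a prompt of {len(prompt)} characters]"

def answer_with_rag(user_query: str) -> str:
    # 1. Intercept the user's query and perform a directed search.
    passages = search_knowledge_base(user_query)
    # 2. Insert the search results along with the query into the prompt.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_query}"
    )
    # 3. Generate the response from the augmented prompt.
    return call_llm(prompt)

print(answer_with_rag("How long do I have to request a refund?"))
```

The important point is that the model itself is untouched: all of the business-specific knowledge enters through the prompt at query time.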

RAG allows you to add a layer of data to the AI application without training or retraining the model, significantly reducing the cost of creating a business process-specific AI application.  


What are the risks of RAG?

The most significant risk introduced by RAG is the potential that the search and integration of the application-specific data into the LLM query process produces a result that is either unreliable or dangerous. You are also adding more business logic to the application, which can have quality and security issues.  

The search process needs to produce results that can be formatted to enrich the query, and both the results and the underlying data need to be relevant. Also, the response generated from the augmented prompt needs to be inspected to ensure that proprietary data is not unintentionally leaked.
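A hedged sketch of that post-generation check might look like the following: scan the model's response for strings that should never leave the system before returning it to the user. The SENSITIVE_PATTERNS list is purely illustrative, not a complete or recommended policy.

```python
import re

SENSITIVE_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",    # US SSN-like pattern
    r"(?i)internal use only",    # document classification marker
    r"(?i)project\s+atlas",      # hypothetical internal codename
]

def response_leaks_data(response: str) -> bool:
    # True if any sensitive pattern appears in the generated response.
    return any(re.search(pattern, response) for pattern in SENSITIVE_PATTERNS)

def guarded_answer(response: str) -> str:
    if response_leaks_data(response):
        return "I can't share that information."
    return response
```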

There is also the risk that RAG will significantly increase the cost of interacting with the LLM without improving output quality. The more data you insert into the prompt, the more processing power the LLM may have to deploy to generate a response.  
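A rough illustration of how prompt size drives cost is sketched below. The "roughly four characters per token" heuristic and the per-token price are placeholder assumptions; use your provider's tokenizer and price sheet for real numbers.

```python
def estimate_prompt_cost(prompt: str, price_per_1k_tokens: float = 0.01) -> float:
    approx_tokens = len(prompt) / 4      # crude heuristic, not a real tokenizer
    return approx_tokens / 1000 * price_per_1k_tokens

bare_query = "How long do I have to request a refund?"
augmented = bare_query + "\n\nContext:\n" + ("retrieved passage text " * 200)

print(f"bare prompt:      ${estimate_prompt_cost(bare_query):.6f}")
print(f"augmented prompt: ${estimate_prompt_cost(augmented):.6f}")
```

The retrieved context can easily dwarf the original question, so every passage added to the prompt should be earning its place.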


How do I mitigate these risks? 

The key is to factor the risks into the design of the primary components, including data, search, prompt construction, and response processing. Each layer of the RAG delivery architecture needs to be hardened. Data should be rationalized to eliminate unnecessary elements, reducing the risk of leakage. The search process needs to be tailored to the data to produce usable responses, and the results must be analyzed before submission to a prompt. 
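One way to picture the "analyze results before submission" step is a simple selection filter: drop weakly related passages and cap the total amount of context added to the prompt. The thresholds here are illustrative assumptions, not recommended values.

```python
def select_context(scored_passages: list[tuple[float, str]],
                   min_score: float = 0.75,
                   max_chars: int = 4000) -> list[str]:
    # scored_passages: (relevance score, passage text) pairs from the search layer.
    selected, used = [], 0
    for score, passage in sorted(scored_passages, reverse=True):
        if score < min_score:
            break    # ignore weakly related material
        if used + len(passage) > max_chars:
            break    # keep prompt size (and cost) bounded
        selected.append(passage)
        used += len(passage)
    return selected
```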

The tremendous potential of generative AI is easy to see when interacting with an LLM via a chat window. The challenge is figuring out how to harness that general-purpose technology to perform specific tasks reliably at the right level of cost and risk.
