Benchmark Six Sigma Forum

Message added by Mayank Gupta,

Retrieval Augmented Generation (RAG) is a powerful hybrid model that merges the best of two worlds: retrieval-based systems and generative models. In RAG, the system first retrieves relevant information from a knowledge base, and then uses this data to generate more accurate and contextually relevant responses. This allows the system to provide fact-based, enriched answers that go beyond standard generative models, which rely only on their training data.

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Deep Dave on 1st Nov 2024.

 

Applause for all the respondents - Sachin Tanwar, Deep Dave.

Featured Replies

Q 716. Compare RAG with fine-tuning for an LLM-powered Agent. When will you use RAG? When will you use fine-tuning? When would you like to use a combination?

 


Solved by Deep Dave

RAG and fine-tuning are two distinct approaches to extending what a large language model can do, and each gives the model different capabilities.

 

RAG (Retrieval-Augmented Generation)

  • What it is: RAG combines a language model with a retrieval system. Before answering, the system pulls reference material from a database or document collection, so its responses can be accurate and up to date.
  • When to use it: RAG works best when the model needs access to a large body of data that changes over time, such as company policies or product details. It suits tasks where the reference data changes quickly.
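
The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sample documents, the `retrieve` and `build_prompt` helpers, and the word-overlap scoring are all made up for the example. Real systems typically use embedding-based vector search, and the assembled prompt would then be sent to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; in practice this would be a document store.
DOCUMENTS = [
    "Refund policy: customers may return products within 30 days of purchase.",
    "Shipping policy: standard delivery takes 5 business days.",
    "Warranty: all devices carry a 2 year limited warranty.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query, documents):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How many days do I have to return a product?", DOCUMENTS))
```

Because the knowledge lives in `DOCUMENTS` rather than in the model, changing a policy only requires editing the stored text.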

Fine-Tuning

  • What it is: Fine-tuning improves a language model on specific tasks by training it further on a targeted dataset. The model's parameters are adjusted to fit the new data.
  • When to use it: Fine-tuning is the better choice when you need high performance on a specific task or set of tasks, for example customer service for a particular product or interpreting medical terminology. It helps most when the underlying data does not change often and you need high precision.
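
The idea of adjusting existing parameters to fit new data can be shown with a deliberately tiny stand-in. The sketch below is not an LLM: it assumes a two-parameter linear model, a hypothetical `pretrained` starting point, and made-up task data, purely to illustrate that fine-tuning continues training from existing parameters instead of starting from scratch.

```python
# Toy stand-in for fine-tuning: a linear model y = w * x + b.

def predict(params, x):
    w, b = params
    return w * x + b

def mean_squared_error(params, data):
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(params, data, learning_rate=0.05, epochs=500):
    """Continue training from existing parameters with plain gradient descent."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return (w, b)

pretrained = (1.0, 0.0)  # hypothetical "general-purpose" parameters
task_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # follows y = 2x + 1

tuned = fine_tune(pretrained, task_data)
print("loss before:", mean_squared_error(pretrained, task_data))
print("loss after: ", mean_squared_error(tuned, task_data))
```

The specialization ends up inside the parameters themselves, which is why this approach fits stable data better than data that changes often: every change would mean another training run.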

Using Both Together

  • When to combine: Sometimes you will want to use RAG and fine-tuning together. For instance, you might fine-tune a model on your organization’s customer service style and then use RAG to supply the latest product information. The model is then both specialized and current.
Solution

Retrieval-Augmented Generation (RAG) and fine-tuning of an LLM-powered agent (LLM: Large Language Model) are both Natural Language Processing (NLP) techniques for improving AI language generation, but they differ fundamentally in how responses are produced. Let's look at each:

 

In RAG, response generation draws on an external knowledge source that can be updated in real time, and the retrieved information is used to augment the model's response. Instead of relying only on the internal knowledge acquired during training, RAG takes an "open-book" approach in which additional or up-to-date information is looked up from external sources.

 

In fine-tuning, by contrast, an already trained model is trained further on a specific dataset to specialize it for a particular task or domain, so the new knowledge becomes part of the model itself. For example, we might train the model on domain-specific books so that it generates more advanced, niche responses for a given industry or domain.

 

Application for RAG:

  • Dynamic and real-time knowledge requirements (e.g. stock prices, latest news, latest research)
  • Smaller model footprint (facts can live in the external knowledge base rather than having to be trained into the model)
  • Cost and efficiency (updating the knowledge base is cheaper than retraining the model whenever information changes)

Application for Fine-Tuning with LLM Powered Agent:

  • Nuanced and specialized tasks (e.g. applications with domain-specific language, knowledge and jargon, such as medical or legal)
  • Consistent tone and style (aligned with what the prompt asks for)

Hybrid Use Case (RAG & Fine-Tuning with LLM Powered-Agent)

 

When we need responses that use specialized language and also reflect real-time or recent information, we need both internal and external sources for response generation. In this scenario, the hybrid model is worth exploring.

 

Example: Suppose we build a financial advisor chatbot. Fine-tuning gives it finance-related language and knowledge of tax laws, while RAG lets it give advice that reflects the latest market dynamics (stocks, bonds, investment avenues, etc.). In this case, it is best to use both internal and external sources to generate the ideal response.
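
The financial advisor scenario could be wired together roughly as follows. Everything here is a hypothetical sketch: `FineTunedAdvisor` is a stub standing in for a model fine-tuned on finance language, `MARKET_DATA` holds made-up figures standing in for a live data source, and the keyword lookup is a placeholder for a real retrieval system.

```python
from dataclasses import dataclass

# Hypothetical, frequently refreshed market snapshot (the RAG side).
# The figures are invented for illustration only.
MARKET_DATA = {
    "bonds": "10-year government bond yield: 7.1% (illustrative figure)",
    "stocks": "Large-cap index up 1.2% this week (illustrative figure)",
}

def retrieve_market_facts(question):
    """Return the snapshot entries whose topic keyword appears in the question."""
    q = question.lower()
    return [fact for topic, fact in MARKET_DATA.items() if topic in q]

@dataclass
class FineTunedAdvisor:
    """Stand-in for a model fine-tuned on finance language and tax rules."""
    style_prefix: str = "From a tax-planning perspective:"

    def generate(self, question, facts):
        # A real model would produce text; here we only show what it receives.
        context = "; ".join(facts) if facts else "no current market data found"
        return f"{self.style_prefix} [{context}] -> answer to: {question}"

def answer(question, model):
    """Hybrid flow: retrieve fresh facts, then let the specialized model respond."""
    facts = retrieve_market_facts(question)
    return model.generate(question, facts)

print(answer("Should I move savings into bonds?", FineTunedAdvisor()))
```

The split mirrors the example: the specialized tone and domain knowledge come from the model, while the time-sensitive facts come from the retrieved data and can be refreshed without retraining.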

 

To summarize, the hybrid model is the ideal choice when responses must be both specialized and up to date.
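
The guidance in this thread can be condensed into a rough rule of thumb. The function below is only a sketch of that reasoning: the name `recommend_approach`, its two yes/no inputs, and the "base model" fallback are assumptions added for illustration, and a real decision would also weigh cost, data volume, and latency.

```python
def recommend_approach(needs_fresh_data, needs_specialized_behavior):
    """Map two yes/no requirements to the approach suggested in the answers."""
    if needs_fresh_data and needs_specialized_behavior:
        return "hybrid"       # specialized language plus current information
    if needs_fresh_data:
        return "RAG"          # information changes quickly
    if needs_specialized_behavior:
        return "fine-tuning"  # stable data, domain-specific tone or jargon
    return "base model"       # assumption: neither technique is needed

# Financial advisor chatbot: domain language and live market data.
print(recommend_approach(needs_fresh_data=True, needs_specialized_behavior=True))
```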

 

 

Deep Dave has provided the best answer to this question. Well done!
