Lesson 5: Talk to your data with Retrieval-Augmented Generation (RAG)

In this chapter you will learn:

  • The basics of Retrieval-Augmented Generation (RAG) and how it can be used to enhance the responses of generative AI models.
  • How to integrate external data sources into your AI application.
  • How to leverage your data to improve the relevance and accuracy of the AI-generated content.

Setup

If you haven't already, set up your development environment. Here's how you can do it: Setup your environment.

Related resources

Watch a short video about RAG

This video explains Retrieval Augmented Generation (RAG), a method that helps the AI use your content alongside its training data for improved results.

🎥 Click on the image above to watch a short video about retrieval augmented generation, RAG

💼 Slides: Retrieval augmented generation, RAG

Narrative - Genesis

Note

Our story so far. You are a mechanic from 1860s London. You were working on your automaton and received a letter from Charles Babbage that ended up taking you to a library, where you picked up a time travel device. Throughout your travels in time you've ended up in Florence, where you met Leonardo Da Vinci. You also went to the Aztec Empire, and this is where the story continues.

See Lesson 1 if you want to catch up with the story from the beginning.

Note

While we recommend going through the story (it's fun!), click here if you'd prefer to jump straight to the technical content.

You: "Leonardo, it's time to go," you said, pressing the button. The device whirred to life, and a mechanical voice echoed, "It's time to go home, it's time for 'genesis'."

Leonardo: "Genesis? Che cosa significa?" Leonardo asked, confused. Before you could respond, the world dissolved into a blur of colors and sounds, the temple fading away as you were pulled through time

You land in the garden. It's late at night, with a thick fog, and eerie lights flicker in the distance. The mansion looms before you. Leonardo looks around, his eyes wide with wonder.

Old mansion shown in a deep fog

Running from the Dogs

You hear barking and the sound of dogs running towards you. You turn to Leonardo, "We need to get inside, now!"

Running from the dogs

As you reach the mansion's door it swings open and a pair of attendants hurry out. After sizing you up, they motion for you to follow them.

You come face to face with Ada Lovelace, her eyes gleaming with curiosity.

Meeting Ada and Charles

Ada: "Ah, it's about time you arrived," she said warmly. "We need you to run an errand."

You: "About time", you keep saying that. Dinocrates said the same, but I'm not sure what you mean?

Ada: Hush, no time for that now, we need to talk about the device you're holding. Charles, fill them in..

You: But..

Ada Lovelace and Charles Babbage working on a device

Charles Babbage steps forward, examining the Time Beetle in your hand. "This device is remarkable, but it's a bit faulty, isn't it? You've noticed, I'm sure."

Leonardo nodded, "Sì, it has been acting strangely."

Ada: The device isn't quite ready; we need to give it more capabilities. We need to make it smarter, more aware of the world around it. The idea is for it to be able to retrieve information from different time periods and use it to generate responses that are accurate and relevant. Can you help with that?

You: Of course, sounds like we need to augment the responses of the device with data, makes sense.

Ada: Let's talk about a concept I'd like to call RAG, or Retrieval-Augmented Generation.

Interact with Ada Lovelace

If you want to interact with Ada, run the Characters app.

Important

This is entirely fictional; the responses are generated by AI. Responsible AI disclaimer

Ada Lovelace

Steps:

  1. Start a GitHub Codespace
  2. Navigate to /app in the repo root.
  3. Locate the console and run npm install followed by npm start.
  4. Once it appears, select the "Open in Browser" button.
  5. Chat with Ada.

For a more detailed explanation of the app, see Detailed app explanation.

Note

If you're running the project locally on your machine, please review the QuickStart guide to get a GitHub personal access token set up and replace the key in the code.

Known challenges with large language models, LLMs

Ada: Let’s start by discussing the AI we’ll use to power the device. We’ll rely on “AI models” paired with a data retrieval system to boost response quality.

But first, there are some challenges to address before diving into the details of RAG. These models, trained on vast amounts of text data, can produce relevant and correct responses. But, like any data source, their output can be inaccurate, incomplete, or misleading due to various factors:

  • Out-of-date sources: The data used to train the model may be outdated and no longer accurate.
  • Wrong or inaccurate information: The sources used to train the model may contain incorrect or misleading information, like fake news or biased opinions.
  • Non-authoritative sources: The model may not be able to distinguish between authoritative and non-authoritative sources in its training data, leading to unreliable information.

This makes it difficult to tell if the information generated by an LLM is correct or not. This is where RAG comes in.

You: So I need to make sure the device can provide accurate information, even when it's not sure about the answer?

Ada: Yes, that's the idea. By combining the strengths of retrieval-based methods and generative models, we get a better AI system.

Retrieval augmented generation, RAG core concepts

Ada: Ah yes, time to discuss RAG specifically. Let's start with some basics:

Retrieval-Augmented Generation (RAG) is a powerful technique that combines the strengths of two different approaches in natural language processing: retrieval-based methods and generative models. This hybrid approach allows for the generation of responses that are both contextually relevant and rich in content, helping to alleviate some of the known challenges with LLMs.

At its core, RAG involves two main components: a retriever and a generator.

  • The retriever: like a search engine, it's responsible for finding relevant information from external data sources that can be used to enhance the AI-generated responses. This information can be text, images, or any other type of data relevant to the context of the conversation, although text is the most common.

  • The generator: it takes the retrieved information and uses it to generate a response that is contextually relevant and informative.

Here's a schema illustrating how a RAG system works:

Schema of a RAG system

  1. User input: The user asks a question.
  2. Retriever: The retriever component searches for relevant information using one or more knowledge bases.
  3. Augmented prompt: The retrieved information is combined with the user question and context, to create an augmented prompt.
  4. Generator: The LLM uses the augmented prompt to generate a response.

This combination allows for more precise and relevant answers, by using data that you provide instead of relying on the model’s training data.
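
To make the flow concrete before the full example later in this lesson, here's a minimal sketch of that pipeline in JavaScript. The knowledgeBase array, the retrieve and augment helpers, and the keyword matching are illustrative placeholders; the model and endpoint mirror the ones used in the code example further down.

// Minimal sketch of the RAG flow: retrieve -> augment -> generate.
// The in-memory knowledge base and naive keyword matching stand in for a real data source.
import process from "node:process";
import { OpenAI } from "openai";

const knowledgeBase = [
  "The Analytical Engine was designed by Charles Babbage in the 1830s.",
  "Ada Lovelace wrote what is considered the first computer program.",
];

// 1. Retriever: find entries related to the user question (naive keyword match)
function retrieve(question) {
  const words = question.toLowerCase().split(/\W+/).filter((word) => word.length > 2);
  return knowledgeBase.filter((entry) => words.some((word) => entry.toLowerCase().includes(word)));
}

// 2. Augmented prompt: combine the retrieved information with the question
function augment(question, sources) {
  return `## Instructions
Answer using only the sources below. If they are not enough, say that you don't know.

## Sources
${sources.join("\n")}

## Question
${question}`;
}

// 3. Generator: the LLM answers based on the augmented prompt
const openai = new OpenAI({
  baseURL: "https://models.inference.ai.azure.com",
  apiKey: process.env.GITHUB_TOKEN,
});

const question = "Who designed the Analytical Engine?";
const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: augment(question, retrieve(question)) }],
});

console.log(completion.choices[0].message.content);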

Ada: Questions?

You: So the retriever finds the information and the generator uses it to generate a response?

Ada: Exactly, you're getting the hang of it.

Integrating external data sources

Ada: Now that we've covered the basics of RAG, let's talk about how you can integrate external data sources into your AI application.

Integrating external data sources into your AI application can be done in a variety of ways, depending on the type of data you want to use and the complexity of the retrieval mechanism. Here are a few common methods:

  • APIs: Many external data sources provide APIs that allow you to access their data programmatically. You can use these APIs to retrieve information in real-time and use it to enhance the AI-generated responses.

  • Databases: If you have a large amount of data that you want to use for retrieval, you can store it in a database and query it as needed. This can be useful for structured data that needs to be accessed quickly.

Once you've settled on a method for integrating external data sources, you may also need to consider how to preprocess and format the data so that it can be easily used by the AI model. This can involve cleaning the data, converting it to a suitable format (such as plain text or Markdown), or splitting it into smaller chunks for easier retrieval.
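
As a rough illustration of that preprocessing step, here's a small sketch that splits a long plain-text document into fixed-size chunks with a bit of overlap. The chunk size and overlap values are arbitrary assumptions; real applications often split on paragraphs or sentences instead.

// Sketch: split a long document into overlapping chunks for easier retrieval.
// The chunkSize and overlap values are arbitrary and chosen for illustration only.
function chunkText(text, chunkSize = 500, overlap = 50) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// Example: a cleaned-up plain-text document becomes an array of smaller pieces
const longDocument = "Some cleaned-up plain text extracted from your data source. ".repeat(50);
console.log(`Split into ${chunkText(longDocument).length} chunks`);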

Note

When integrating external data sources into your AI application, it's important to consider the privacy and security implications of accessing and storing data. Make sure you have the necessary permissions and safeguards in place to protect the data and comply with any relevant regulations.

If you're using a database, you also want to think about how you search your data to retrieve the most relevant information. This can be done using keyword search, full-text search, or more advanced techniques like semantic or vector search, which may require specific indexing. We'll cover advanced search techniques in a future lesson.
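
As a quick preview of what semantic search can look like, here's a minimal sketch that ranks documents by their similarity to a question using embeddings and cosine similarity. The embedding model name (text-embedding-3-small) is an assumption; the endpoint and client mirror the ones used in the code example later in this lesson, and a later lesson covers these techniques in more depth.

// Sketch: semantic search by comparing embedding vectors with cosine similarity.
import process from "node:process";
import { OpenAI } from "openai";

const openai = new OpenAI({
  baseURL: "https://models.inference.ai.azure.com",
  apiKey: process.env.GITHUB_TOKEN,
});

const documents = [
  "The Analytical Engine was a proposed mechanical general-purpose computer.",
  "Hybrid cars combine a gasoline engine with an electric motor.",
];
const question = "Which cars use both gasoline and electricity?";

// Embed the question and the documents in a single call
const { data } = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: [question, ...documents],
});
const [questionVector, ...documentVectors] = data.map((item) => item.embedding);

// Cosine similarity between two vectors of the same length
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents by how close their meaning is to the question
const ranked = documents
  .map((doc, i) => ({ doc, score: cosineSimilarity(questionVector, documentVectors[i]) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked);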

You: Can you explain things like APIs and databases in 1860s terms?

Ada: Of course, an API is like a messenger that delivers a message from one place to another, and a database is like a library where you store all your books.

You: Ah, I see, that makes sense.

Augmenting the prompt

Ada: Are you still with me? Good. Let's move on to the next step: improving the prompt sent to the AI model.

Ada: Once you’ve set up a way to pull info from your data, you can add it to the AI model’s prompt. Just mix the retrieved info into the input text with some extra context or guidance to steer the AI’s response.

For example, if you're building an app to answer questions about cars, you could have a prompt like the following:


## Instructions
Answer questions about cars using only the sources below.
If there's not enough data in provided sources, say that you don't know.
Be brief and straight to the point.

## Sources
<insert the retrieved information here>

## Question
<insert the question here>

By providing the AI model with additional context and information, you can help guide the generation process and ensure that the responses are accurate and relevant to the topic at hand.

Tip

Note this part of the prompt: "If there's not enough data in provided sources, say that you don't know." This is important to avoid the AI generating incorrect information when there's not enough data to provide a reliable answer. This technique is called an escape hatch and is a good practice to ensure the quality of the generated content.

RAG can be considered an advanced form of prompt engineering.

Code example

Ada: Practice makes perfect, so let’s apply what we’ve learned with an example. We’ll build a simple retrieval system into a JavaScript app using a CSV file of hybrid car data and a basic search algorithm to pull relevant info based on a user’s question.

// This example demonstrates how to use the Retrieval Augmented Generation (RAG)
// to answer questions based on a hybrid car data set.
// The code below reads the CSV file, searches for matches to the user question,
// and then generates a response based on the information found.

import { fileURLToPath } from 'node:url';
import { dirname } from 'node:path';
import process from "node:process";
import fs from "node:fs";
import { OpenAI } from "openai";

// Change the current working directory to the directory of the script
const __dirname = dirname(fileURLToPath(import.meta.url));
process.chdir(__dirname);

// 1. Ask a question about hybrid cars
// -----------------------------------

const question = `what's the fastest prius`;

// 2. Retriever component: search the data for relevant information
// ----------------------------------------------------------------

// Load CSV data as an array of objects
const rows = fs.readFileSync("./hybrid.csv", "utf8").split("\n");
const columns = rows[0].split(",");

// Search the data using a very naive search
const words = question
  .toLowerCase()
  .replaceAll(/[.?!()'":,]/g, "")
  .split(" ")
  .filter((word) => word.length > 2);
const matches = rows.slice(1).filter((row) => words.some((word) => row.toLowerCase().includes(word)));

// Format as a markdown table, since language models understand markdown
const table =
  `| ${columns.join(" | ")} |\n` +
  `|${columns.map(() => "---").join(" | ")}|\n` +
  matches.map((row) => `| ${row.replaceAll(",", " | ")} |\n`).join("");

console.log(`Found ${matches.length} matches:`);
console.log(table);

// 3. Context augmentation: create a combined prompt with the search results
// --------------------------------------------------------------------------

const augmentedPrompt = `
## Instructions
Answer questions about cars using only the sources below.
If there's not enough data in provided sources, say that you don't know.
Be brief and straight to the point.

## Sources
${table}

## Question
${question}
`;

// 4. Generator component: use the search results to generate a response
// ---------------------------------------------------------------------

const openai = new OpenAI({
  baseURL: "https://models.inference.ai.azure.com",
  apiKey: process.env.GITHUB_TOKEN,
});

const chunks = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: augmentedPrompt }],
  stream: true,
});

console.log(`Answer for "${question}":`);

for await (const chunk of chunks) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}

You can find this code in the example/rag-cars.js file along with the hybrid.csv file containing the data used for the retrieval.

Ada: Once you run this code, you should see the data found in the CSV file by the retriever, formatted as a markdown table, followed by the AI-generated response to the question. Try changing the question to see how the retrieved data and response change. You can also try asking questions about unrelated topics to see how the AI model handles them.

Example of the output:

Found 1 matches:
| Person | Time Period | Description |
|---|---|---|
| Leonardo Da Vinci | 15th century | Italian polymath known for his art and inventions. |
| Isaac Newton | 17th century | English mathematician and physicist who formulated the laws of motion and universal gravitation. |

You: This is great, I can see how this can be useful when using the device, or rather how it has been already or will be. Time travel is confusing, sigh.

Ada: There there, you're doing great. Let's move on to the next step.

Assignment - Helping Ada and Charles

Having learned about RAG, you're now ready to help Ada and Charles with their device. However, upon closer inspection, the device looks familiar.

You: Time Beetle, do you know what this is?

Time Beetle: Of course, it's me, or it will be. I'm missing a few parts though. Come to think of it, I'm missing a lot of parts, I don't even have a shell yet.

Ada: The Time Beetle is a device that allows you to travel through time and space, that is, once we get it to work properly. As I was saying, we need to add a new feature to it: a retrieval-augmented generation (RAG) module. This will help us retrieve information and needed context from different time periods as you're traveling. We want to make sure we refer to all sorts of sources; Wikipedia is a good start.

You: What do you need me to do?

Ada: Here's example code that retrieves text information about Tim Berners-Lee from Wikipedia; Tim will be very important one day.

// Query the Wikipedia API for a plain-text extract of the "Tim Berners-Lee" article
const response = await fetch('https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&redirects=true&explaintext&titles=Tim%20Berners-Lee');
const data = await response.json();
// The extract lives under query.pages, keyed by page ID
const text = Object.values(data.query.pages)[0]?.extract;
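
One possible way to continue from here (a hint, not the full solution) is to split the extract into paragraph chunks so that a simple retriever, like the one in the car example above, can search them:

// Continuing from the snippet above: split the Wikipedia extract into
// paragraph chunks that a simple keyword-based retriever can search.
const chunks = (text ?? "")
  .split("\n")
  .map((paragraph) => paragraph.trim())
  .filter((paragraph) => paragraph.length > 0);

console.log(`Prepared ${chunks.length} text chunks for the retriever`);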

You: I take it I'm not the only one who's been to the future?

Ada: ...

Solution

Solution

Knowledge check

Question: What is the role of the retriever in a RAG system?

A. The retriever generates responses based on the input data.

B. The retriever generates relevant information based on the model's training data.

C. The retriever finds relevant information from external data sources.

Quiz solution

Self-Study resources