Commit

clean up RCA
epec254 committed Jun 8, 2024
1 parent 04c0344 commit 5c35d54
Showing 55 changed files with 4,511 additions and 863 deletions.
Binary file modified genai_cookbook/_build/.doctrees/environment.pickle (not shown)
Binary file modified genai_cookbook/_build/.doctrees/nbs/1-introduction-to-rag.doctree (not shown)
Six additional modified binary files (not shown)
Binary file modified genai_cookbook/_build/.doctrees/nbs/5-hands-on-requirements.doctree (not shown)
@@ -4,7 +4,7 @@ This section provides an overview of Retrieval-augmented generation (RAG): what

## What is retrieval-augmented generation?

Retrieval-augmented generation (RAG) is a technique that enables a large language model (LLM) to generate enriched responses by augmenting a user’s prompt with supporting data retrieved from an outside information source. By incorporating this retrieved information, RAG enables the LLM to generate more accurate, higher quality responses compared to using the prompt alone.
Retrieval-augmented generation (RAG) is a technique that enables a large language model (LLM) to generate enriched responses by augmenting a user’s prompt with supporting data retrieved from an outside information source. By incorporating this retrieved information, RAG enables the LLM to generate more accurate, higher quality responses compared to not augmenting the prompt with additional context.

For example, suppose you are building a question-and-answer chatbot to help employees answer questions about your company’s proprietary documents. A standalone LLM won’t be able to accurately answer questions about the content of these documents if it was not specifically trained on them. The LLM might refuse to answer due to a lack of information or, even worse, it might generate an incorrect response.
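To make this concrete, the sketch below shows how a prompt might be augmented with retrieved text before calling an LLM. This is a minimal illustration, not the cookbook's implementation; `retrieve_docs` and `call_llm` are hypothetical placeholders for whatever retriever and LLM client an application uses.

```python
def build_augmented_prompt(question: str, retrieved_docs: list[str]) -> str:
    # Concatenate the retrieved supporting data into the prompt so the LLM can
    # ground its answer in that data rather than relying only on its training data.
    context = "\n\n".join(retrieved_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical usage -- retrieve_docs() and call_llm() stand in for your own
# retriever and LLM client:
# docs = retrieve_docs("What is our parental leave policy?")
# answer = call_llm(build_augmented_prompt("What is our parental leave policy?", docs))
```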

@@ -6,22 +6,24 @@ Unstructured data lacks a predefined data model or schema, making it impossible

During data preparation, the RAG application's data pipeline takes raw unstructured data and transforms it into discrete chunks that can be queried based on their relevance to a user's query. The key steps in data preprocessing are outlined below. Each step has a variety of knobs that can be tuned; for a deeper discussion of these knobs, please refer to the [deep dive into RAG section](/nbs/3-deep-dive).

```{image} ../images/2-fundamentals-unstructured/2_img.png
:align: center
```
<br/>

In the remainder of this section, we describe the process of preparing unstructured data for retrieval using *semantic search*. Semantic search understands the contextual meaning and intent of a user query to provide more relevant search results.

Semantic search is one of several approaches that can be taken when implementing the retrieval component of a RAG application over unstructured data. We cover alternate retrieval strategies in the [retrieval deep dive section](/nbs/3-deep-dive).

```{image} ../images/2-fundamentals-unstructured/2_img.png
:align: center
```

<br/>

The following are the typical steps of a data pipeline in a RAG application using unstructured data:

1. **Parse the raw documents:** The initial step involves transforming raw data into a usable format. This can include extracting text, tables, and images from a collection of PDFs or employing optical character recognition (OCR) techniques to extract text from images.

2. **Extract document metadata (optional):** In some cases, extracting and using document metadata, such as document titles, page numbers, URLs, or other information can help the retrieval step more precisely query the correct data.

3. **Chunk documents:** To ensure the parsed documents can fit into the embedding model and the LLM's context window, we break the parsed documents into smaller, discrete chunks. Retrieving these focused chunks, rather than entire documents, gives the LLM more targeted content from which to generate its responses.
3. **Chunk documents:** To ensure the parsed documents can fit into the embedding model and the LLM's context window, we break the parsed documents into smaller, discrete chunks. Retrieving these focused chunks, rather than entire documents, gives the LLM more targeted context from which to generate its responses.

4. **Embedding chunks:** In a RAG application that uses semantic search, a special type of language model called an *embedding model* transforms each of the chunks from the previous step into numeric vectors, or lists of numbers, that encapsulate the meaning of each piece of content. Crucially, these vectors represent the semantic meaning of the text, not just surface-level keywords. This will later enable searching based on meaning rather than literal text matches.
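To illustrate steps 3 and 4, the sketch below embeds a handful of chunks and answers a query by cosine similarity. It assumes the `sentence-transformers` package and the `all-MiniLM-L6-v2` model, which are illustrative choices rather than the cookbook's actual pipeline components.

```python
from sentence_transformers import SentenceTransformer

# Example chunks produced by the parsing and chunking steps above.
chunks = [
    "Employees accrue 1.5 vacation days per month of service.",
    "Expense reports must be submitted within 30 days of purchase.",
]

# The embedding model turns each chunk into a numeric vector that captures its meaning.
model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vectors = model.encode(chunks, normalize_embeddings=True)

# A user query is embedded the same way; cosine similarity (a dot product of
# normalized vectors) finds the chunk whose meaning is closest to the query.
query_vector = model.encode(["How many vacation days do I get?"], normalize_embeddings=True)[0]
scores = chunk_vectors @ query_vector
print(chunks[int(scores.argmax())])
```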


Large diffs are not rendered by default.

@@ -0,0 +1,63 @@
#### Generation quality

##### Debugging generation quality

Even with optimal retrieval, if the LLM component of a RAG chain cannot effectively utilize the retrieved context to generate accurate, coherent, and relevant responses, the final output quality will suffer. Issues with generation quality can manifest as hallucinations, inconsistencies, or a failure to concisely address the user's query, to name a few.

To identify generation quality issues, you can use the approach outlined in the [Evaluation section](#section-4-evaluation). If evaluation results indicate poor generation quality (e.g., low accuracy, coherence, or relevance scores), you'll need to investigate further to identify the root cause.

The following is a step-by-step process to address **generation quality** issues:

1. Identify a set of test queries with low generation quality metrics.

2. For each query, manually examine the generated response and compare it to the retrieved context and the ground-truth response (see the sketch after this list for one way to view these side by side).

3. Look for patterns or common issues among the queries with low generation quality. Some examples:
    - Generating information not present in the retrieved context, or contradicting the retrieved context (i.e., hallucination)
    - Failure to directly address the user's query given the provided retrieved context
    - Generating responses that are overly verbose, difficult to understand, or lacking logical coherence

4. Based on the identified issues, hypothesize potential root causes and corresponding fixes. See the "[Common reasons for poor generation quality](#common-reasons-for-poor-generation-quality)" table below for guidance.

5. Implement the proposed fix for the most promising or impactful root cause. This may involve modifying the RAG chain (e.g., adjusting the prompt template, trying a different LLM) or the data pipeline (e.g., adjusting the chunking strategy to provide more context).

6. Re-run evals on the updated system and compare generation quality metrics to the previous version. If there is significant improvement, consider deploying the updated RAG application for further testing with end-users (see the [Deployment](#deployment) section).

7. If the generation quality is still not satisfactory, repeat steps 4-6 for the next most promising fix until the desired performance is achieved.
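For steps 1 and 2, a small helper like the one below can pull up each low-scoring query next to its retrieved context, generated response, and ground truth. This is a minimal sketch that assumes your evaluation results are already in a pandas DataFrame; the column names are hypothetical.

```python
import pandas as pd

def review_low_quality_responses(eval_results: pd.DataFrame, threshold: float = 0.5) -> None:
    """Print low-scoring queries alongside their context, response, and ground truth.

    Assumes hypothetical columns: 'query', 'retrieved_context', 'generated_response',
    'ground_truth', and a 'generation_quality' score between 0 and 1.
    """
    low_quality = eval_results[eval_results["generation_quality"] < threshold]
    for _, row in low_quality.iterrows():
        print("=" * 80)
        print(f"QUERY:              {row['query']}")
        print(f"RETRIEVED CONTEXT:  {row['retrieved_context']}")
        print(f"GENERATED RESPONSE: {row['generated_response']}")
        print(f"GROUND TRUTH:       {row['ground_truth']}")
```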

##### Common reasons for poor generation quality

Each of these potential fixes is tagged as one of three types. Based on the type of change, you will follow different steps in section 3.


<table>
<thead>
<tr>
<th>Generation Issue</th>
<th>Debugging Steps</th>
<th>Potential Fix</th>
</tr>
</thead>
<tbody>
<tr>
<td>Generating information not present in the retrieved context (e.g., hallucinations)</td>
<td><ul><li>Compare generated responses to retrieved context to identify hallucinated information</li><li>Assess if certain types of queries or retrieved context are more prone to hallucinations</li></ul></td>
<td><ul><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Update prompt template to emphasize reliance on retrieved context</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Use a more capable LLM</li><li><img src="../_images/chain_code.png" alt="chain-code" height="20"/> Implement a fact-checking or verification step post-generation</li></ul></td>
</tr>
<tr>
<td>Failure to directly address the user&#39;s query or providing overly generic responses</td>
<td><ul><li>Compare generated responses to user queries to assess relevance and specificity</li><li>Check if certain types of queries result in the correct context being retrieved, but the LLM producing low quality output</li></ul></td>
<td><ul><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Improve prompt template to encourage direct, specific responses</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Retrieve more targeted context by improving the retrieval process</li><li><img src="../_images/chain_code.png" alt="chain-code" height="20"/> Re-rank retrieval results to put most relevant chunks first, only provide these to the LLM</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Use a more capable LLM</li></ul></td>
</tr>
<tr>
<td>Generating responses that are difficult to understand or lack logical flow</td>
<td><ul><li>Assess output for logical flow, grammatical correctness, and understandability</li><li>Analyze if incoherence occurs more often with certain types of queries or when certain types of context are retrieved</li></ul></td>
<td><ul><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Change prompt template to encourage coherent, well-structured responses</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Provide more context to the LLM by retrieving additional relevant chunks</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Use a more capable LLM</li></ul></td>
</tr>
<tr>
<td>Generated responses are not in the desired format or style</td>
<td><ul><li>Compare output to expected format and style guidelines</li><li>Assess if certain types of queries or retrieved context are more likely to result in format/style deviations</li></ul></td>
<td><ul><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Update prompt template to specify the desired output format and style</li><li><img src="../_images/chain_code.png" alt="chain-code" height="20"/> Implement a post-processing step to convert the generated response into the desired format</li><li><img src="../_images/chain_code.png" alt="chain-code" height="20"/> Add a step to validate output structure/style, and output a fallback answer if needed</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Use an LLM fine-tuned to provide outputs in a specific format or style</li></ul></td>
</tr>
</tbody>
</table>
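As one example of the chain-config fixes above, a prompt template that explicitly emphasizes reliance on the retrieved context might look like the following sketch. The wording and the `{context}`/`{question}` placeholders are illustrative assumptions, not the cookbook's actual configuration.

```python
# Illustrative prompt template that tells the LLM to rely only on retrieved context.
# The {context} and {question} placeholders are filled in by the RAG chain at query time.
GROUNDED_PROMPT_TEMPLATE = """You are a helpful assistant. Answer the user's question
using ONLY the information in the context below. If the context does not contain
the answer, say "I don't know" instead of guessing.

Context:
{context}

Question: {question}
Answer:"""

prompt = GROUNDED_PROMPT_TEMPLATE.format(
    context="<retrieved chunks go here>",
    question="<user question goes here>",
)
```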
@@ -0,0 +1,65 @@
#### Generation quality

##### Debugging generation quality

**You are on this page because your [root cause analysis](./5-hands-on-improve-quality-step-1.md) said that LLM generation quality was the issue to focus on.**

Even with optimal retrieval, if the LLM component of a RAG chain cannot effectively utilize the retrieved context to generate accurate, coherent, and relevant responses, the final output quality will suffer. Issues with generation quality can manifest as hallucinations, inconsistencies, or a failure to concisely address the user's query, to name a few.

The following is a step-by-step process to address **generation quality** issues:

1. Identify a set of test queries with low generation quality metrics.

2. For each query, manually examine the generated response and compare it to the retrieved context and the ground-truth response.

3. Look for patterns or common issues among the queries with low generation quality. Some examples:
    - Generating information not present in the retrieved context, or contradicting the retrieved context (i.e., hallucination)
    - Failure to directly address the user's query given the provided retrieved context
    - Generating responses that are overly verbose, difficult to understand, or lacking logical coherence

4. Based on the identified issues, hypothesize potential root causes and corresponding fixes. See the "[Common reasons for poor generation quality](#common-reasons-for-poor-generation-quality)" table below for guidance.

5. Implement the proposed fix for the most promising or impactful root cause by following [step 4.2](./5-hands-on-improve-quality-step-2.md). This may involve modifying the RAG chain (e.g., adjusting the prompt template, trying a different LLM) or the data pipeline (e.g., adjusting the chunking strategy to provide more context).

6. Re-run evaluation on the updated system and compare generation quality metrics to the previous version (see the sketch after this list). Once generation quality is at a desired level, re-run the [root cause analysis](./5-hands-on-improve-quality-step-1.md) to determine if the overall chain has any additional issues that should be addressed.

7. If the generation quality is still not satisfactory, repeat steps 4-6 for the next most promising fix until the desired performance is achieved.
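For step 6, comparing the updated chain's metrics against the previous version can be as simple as the sketch below, which again assumes evaluation results live in pandas DataFrames with a hypothetical 'generation_quality' column.

```python
import pandas as pd

def compare_generation_quality(before: pd.DataFrame, after: pd.DataFrame) -> float:
    """Return the change in mean generation-quality score between two evaluation runs.

    Both DataFrames are assumed to contain a numeric 'generation_quality' column.
    """
    before_mean = before["generation_quality"].mean()
    after_mean = after["generation_quality"].mean()
    print(f"Mean generation quality: {before_mean:.3f} -> {after_mean:.3f} "
          f"(delta {after_mean - before_mean:+.3f})")
    return after_mean - before_mean
```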

##### Common reasons for poor generation quality

Each of these potential fixes is tagged as one of three types. Based on the type of change, you will follow different steps in [step 4.2](./5-hands-on-improve-quality-step-2.md).

<table>
<thead>
<tr>
<th>Generation Issue</th>
<th>Debugging Steps</th>
<th>Potential Fix</th>
</tr>
</thead>
<tbody>
<tr>
<td>Generating information not present in the retrieved context (e.g., hallucinations)</td>
<td><ul><li>Compare generated responses to retrieved context to identify hallucinated information</li><li>Assess if certain types of queries or retrieved context are more prone to hallucinations</li></ul></td>
<td><ul><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Update prompt template to emphasize reliance on retrieved context</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Use a more capable LLM</li><li><img src="../_images/chain_code.png" alt="chain-code" height="20"/> Implement a fact-checking or verification step post-generation</li></ul></td>
</tr>
<tr>
<td>Failure to directly address the user&#39;s query or providing overly generic responses</td>
<td><ul><li>Compare generated responses to user queries to assess relevance and specificity</li><li>Check if certain types of queries result in the correct context being retrieved, but the LLM producing low quality output</li></ul></td>
<td><ul><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Improve prompt template to encourage direct, specific responses</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Retrieve more targeted context by improving the retrieval process</li><li><img src="../_images/chain_code.png" alt="chain-code" height="20"/> Re-rank retrieval results to put most relevant chunks first, only provide these to the LLM</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Use a more capable LLM</li></ul></td>
</tr>
<tr>
<td>Generating responses that are difficult to understand or lack logical flow</td>
<td><ul><li>Assess output for logical flow, grammatical correctness, and understandability</li><li>Analyze if incoherence occurs more often with certain types of queries or when certain types of context are retrieved</li></ul></td>
<td><ul><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Change prompt template to encourage coherent, well-structured responses</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Provide more context to the LLM by retrieving additional relevant chunks</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Use a more capable LLM</li></ul></td>
</tr>
<tr>
<td>Generated responses are not in the desired format or style</td>
<td><ul><li>Compare output to expected format and style guidelines</li><li>Assess if certain types of queries or retrieved context are more likely to result in format/style deviations</li></ul></td>
<td><ul><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Update prompt template to specify the desired output format and style</li><li><img src="../_images/chain_code.png" alt="chain-code" height="20"/> Implement a post-processing step to convert the generated response into the desired format</li><li><img src="../_images/chain_code.png" alt="chain-code" height="20"/> Add a step to validate output structure/style, and output a fallback answer if needed</li><li><img src="../_images/chain_config.png" alt="chain-config" height="20"/> Use an LLM fine-tuned to provide outputs in a specific format or style</li></ul></td>
</tr>
</tbody>
</table>
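As a sketch of the chain-code fixes in the last row above (validating output structure and falling back when validation fails), a post-processing step might look like the following. The JSON output format and the fallback message are illustrative assumptions; adapt the check to whatever format your chain requires.

```python
import json

FALLBACK_ANSWER = "Sorry, I could not produce a well-formed answer. Please try rephrasing your question."

def validate_or_fallback(generated_response: str) -> str:
    """Return the answer if the response matches the expected format, else a fallback.

    For illustration, the expected format is assumed to be a JSON object with an
    'answer' key.
    """
    try:
        parsed = json.loads(generated_response)
        if isinstance(parsed, dict) and "answer" in parsed:
            return parsed["answer"]
    except json.JSONDecodeError:
        pass
    return FALLBACK_ANSWER
```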

<br/>
<br/>