
Commit

Clean up of dev workflow and fundamentals
epec254 committed Jun 8, 2024
1 parent 7e1be4f commit 7894718
Showing 5 changed files with 11 additions and 13 deletions.
@@ -13,7 +13,7 @@ During data preparation, the RAG application's data pipeline takes raw unstructu

In the remainder of this section, we describe the process of preparing unstructured data for retrieval using *semantic search*. Semantic search understands the contextual meaning and intent of a user query to provide more relevant search results.

Semantic search is one of several approaches that can be taken when implementing the retrieval component of a RAG application over unstructured data. We cover alternate retrieval strategies in the [retrieval deep dive section](/nbs/3-deep-dive).
Semantic search is one of several approaches that can be taken when implementing the retrieval component of a RAG application over unstructured data. We cover alternate retrieval strategies in the [retrieval knobs section](/nbs/3-deep-dive).
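For a rough sense of how semantic-search retrieval works in practice — a minimal sketch, not the cookbook's implementation; the model name, chunk texts, and query below are placeholders — embed the chunks and the query, then rank by cosine similarity:

```python
# Minimal sketch of semantic search over document chunks (illustrative only).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model; any embedding model works

chunks = [
    "Our premium plan includes 24/7 phone support.",
    "Invoices are emailed on the first business day of each month.",
    "Password resets can be done from the account settings page.",
]

# Embed chunks once at indexing time; embed the query at request time.
chunk_vecs = model.encode(chunks, normalize_embeddings=True)
query_vec = model.encode(["How do I reset my password?"], normalize_embeddings=True)[0]

# On normalized vectors, cosine similarity reduces to a dot product.
scores = chunk_vecs @ query_vec
for i in np.argsort(scores)[::-1][:2]:
    print(f"{scores[i]:.3f}  {chunks[i]}")
```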



5 changes: 3 additions & 2 deletions genai_cookbook/nbs/2-fundamentals-unstructured-eval.md
@@ -11,13 +11,14 @@ Evaluation and monitoring of Generative AI applications, including RAG, differs
| **Metrics** | Metrics evaluate the __inputs & outputs__ of the component e.g., feature drift, precision/recall, latency, etc <br/><br/> Since there is only one component, overall metrics == component metrics. | __Component metrics__ evaluate the __inputs & outputs__ of each component e.g., precision @ K, nDCG, latency, toxicity, etc <br/><br/>__Compound metrics__ evaluate how multiple components interact e.g., faithfulness measures the generator’s adherence to the knowledge from a retriever which requires the chain input, chain output, and output of the internal retriever<br/><br/>__Overall metrics__ evaluate the overall input & output of the system e.g., answer correctness, latency |
| **Evaluation** | Answer is __deterministically__ “right” or “wrong” <br/><br/> → __Deterministic metrics__ work | Answer is “right” or “wrong” but: <br/><ul><li>Many right answers (non deterministic)</li><li>Some right answers are more right</li></ul><br/>→ Need __human feedback__ to be confident<br/>→ Need __LLM-judged metrics__ to scale evaluation<br/> |
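To make the component-level retrieval metrics named in the table concrete, here is a minimal, library-agnostic sketch of precision@K and nDCG@K computed from binary relevance labels (the example labels are invented):

```python
# Sketch of two retrieval metrics from the table: precision@K and nDCG@K.
# `relevances` is the judged relevance (1/0) of each retrieved chunk, in ranked order.
import math

def precision_at_k(relevances: list[int], k: int) -> float:
    return sum(relevances[:k]) / k

def ndcg_at_k(relevances: list[int], k: int) -> float:
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

retrieved = [1, 0, 1, 0, 0]              # relevance of the 5 retrieved chunks
print(precision_at_k(retrieved, 5))      # 0.4
print(round(ndcg_at_k(retrieved, 5), 3))
```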

Effectively evaluating and monitoring application quality, cost and latency requires several components:


```{image} ../images/2-fundamentals-unstructured/4_img.png
:align: center
```
<br/>

Effectively evaluating and monitoring application quality, cost and latency requires several components:

- **Evaluation set:** To rigorously evaluate your RAG application, you need a curated set of evaluation queries (and ideally outputs) that are representative of the application's intended use. These evaluation examples should be challenging, diverse, and updated to reflect changing usage and requirements.

- **Metric definitions**: You can't manage what you don't measure. In order to improve RAG quality, it is essential to define what quality means for your use case. Depending on the application, important metrics might include response accuracy, latency, cost, or ratings from key stakeholders. You'll need metrics that measure each component, how the components interact with each other, and the overall system.
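As a purely illustrative sketch of metric definitions at the three levels described above (component, compound, overall) — the field names here are invented and not part of any Databricks API — a per-request metric record might look like:

```python
# Hypothetical per-request metric record; field names are illustrative only.
from dataclasses import dataclass

@dataclass
class RequestMetrics:
    # Component metrics: inputs & outputs of a single component.
    retrieval_precision_at_5: float
    retrieval_latency_ms: float
    generation_latency_ms: float
    # Compound metric: how components interact (generator vs. retrieved context).
    faithfulness: float
    # Overall metrics: the end-to-end input & output.
    answer_correct: bool
    total_latency_ms: float
    total_cost_usd: float

m = RequestMetrics(
    retrieval_precision_at_5=0.6,
    retrieval_latency_ms=120.0,
    generation_latency_ms=900.0,
    faithfulness=0.9,
    answer_correct=True,
    total_latency_ms=1050.0,
    total_cost_usd=0.004,
)
```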
3 changes: 1 addition & 2 deletions genai_cookbook/nbs/2-fundamentals-unstructured.md
@@ -16,5 +16,4 @@ This section will introduce the key components and principles behind developing
:alt: Major components of RAG over unstructured data
:align: center
```

The [next section](/nbs/3-deep-dive) of this guide will unpack the finer details of the typical components that make up the data pipeline and RAG chain of a RAG application using unstructured data.
<br/>
2 changes: 1 addition & 1 deletion genai_cookbook/nbs/3-deep-dive-chain.md
@@ -50,7 +50,7 @@ Using the user query directly as a retrieval query can work for some queries. Ho
```{eval-rst}
.. note::
Filter extraction must be done in conjunction with changes to both metadata extraction [data pipeline] and retrieval [RAG chain] components. The metadata extraction step should ensure that the relevant metadata fields are available for each document/chunk, and the retrieval step should be implemented to accept and apply extracted filters.
Filter extraction must be done in conjunction with changes to both metadata extraction [data pipeline](./3-deep-dive-data-pipeline.md) and [retriever chain](#retrieval) components. The metadata extraction step should ensure that the relevant metadata fields are available for each document/chunk, and the retrieval step should be implemented to accept and apply extracted filters.
.. include:: ./include-rst.rst
```
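As a toy illustration of the coupling the note describes — real chains typically use an LLM for filter extraction and a vector index's native filter syntax rather than this in-memory approach — the sketch below pulls a year filter out of the query and applies it to chunk metadata before ranking:

```python
# Sketch only: extract a metadata filter from the query, then restrict retrieval to
# chunks whose metadata matches. The chunk data and filter logic are illustrative.
import re

chunks = [
    {"text": "FY2023 support policy ...", "metadata": {"year": 2023}},
    {"text": "FY2024 support policy ...", "metadata": {"year": 2024}},
]

def extract_filters(query: str) -> dict:
    match = re.search(r"\b(20\d{2})\b", query)
    return {"year": int(match.group(1))} if match else {}

def retrieve(query: str, chunks: list[dict], k: int = 3) -> list[dict]:
    filters = extract_filters(query)
    candidates = [
        c for c in chunks
        if all(c["metadata"].get(field) == value for field, value in filters.items())
    ]
    # A real retriever would rank `candidates` by semantic similarity here.
    return candidates[:k]

print(retrieve("What was the support policy in 2024?", chunks))
```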
12 changes: 5 additions & 7 deletions genai_cookbook/nbs/5-rag-development-workflow.md
@@ -14,18 +14,16 @@ This section walks you through Databricks recommended development workflow for b
```{image} ../images/5-hands-on/1_img.png
:align: center
```

Mapping to this workflow, this section provides ready-to-run sample code for every step and every suggestion to improve quality.

Throughout, we will demonstrate evaluation-driven development using one of Databricks' internal generative AI use cases: using a RAG bot to help answer customer support questions in order to [1] reduce support costs [2] improve the customer experience.
<br/>
The [implement](./5-hands-on-requirements.md) section of this cookbook provides a guided implementation of this workflow with sample code.

There are two core concepts in **evaluation-driven development:**

1. **Metrics:** Defining high-quality
1. [**Metrics:**](./4-evaluation-metrics.md) Defining what high-quality means

*Similar to how you set business goals each year, you need to define what high-quality means for your use case. Databricks' Quality Lab provides a suggested set of N metrics to use, the most important of which is answer accuracy or correctness - is the RAG application providing the right answer?*

2. **Evaluation:** Objectively measuring the metrics
2. [**Evaluation set:**](./4-evaluation-eval-sets.md) Objectively measuring the metrics

*To objectively measure quality, you need an evaluation set, which contains questions with known-good answers validated by humans. While this may seem scary at first - you probably don't have an evaluation set sitting ready to go - this guide walks you through the process of developing and iteratively refining this evaluation set.*
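A rough sketch of what such an evaluation set and an overall answer-correctness metric can look like (the schema is invented for illustration, and exact-match grading stands in for the LLM judge or human review you would use in practice):

```python
# Illustrative evaluation set and accuracy calculation; schema and grading are placeholders.
eval_set = [
    {"question": "How do I reset my password?",
     "expected_answer": "Use the reset link on the account settings page."},
    {"question": "When are invoices sent?",
     "expected_answer": "On the first business day of each month."},
]

def my_rag_app(question: str) -> str:
    # Placeholder for the real RAG chain being evaluated.
    return "Use the reset link on the account settings page."

def is_correct(generated: str, expected: str) -> bool:
    # Exact match stands in for an LLM judge here.
    return generated.strip().lower() == expected.strip().lower()

results = [is_correct(my_rag_app(ex["question"]), ex["expected_answer"]) for ex in eval_set]
print(f"answer correctness: {sum(results) / len(results):.0%}")  # 50% for this toy run
```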

@@ -35,4 +33,4 @@ Anchoring against metrics and an evaluation set provides the following benefits:

2. Getting alignment with business stakeholders on the readiness of the application for production becomes more straightforward when you can confidently state, *"we know our application answers the most critical questions to our business correctly and doesn't hallucinate."*

*>> Evaluation-driven development is known in the academic research community as "hill climbing" akin to climbing a hill to reach the peak - where the hill is your metric and the peak is 100% accuracy on your evaluation set.*
> Evaluation-driven development is known in the academic research community as ["hill climbing"](https://en.wikipedia.org/wiki/Hill_climbing) akin to climbing a hill to reach the peak - where the hill is your metric and the peak is 100% accuracy on your evaluation set.
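A toy sketch of that hill-climbing loop (the configuration knobs and the `evaluate` placeholder are invented for illustration): score each candidate configuration against the evaluation set and keep whichever does best on the metric.

```python
# Toy evaluation-driven "hill climbing": try candidate configurations, keep the best
# scorer on the evaluation set. The knobs and `evaluate` body are illustrative only.
candidates = [
    {"chunk_size": 512, "top_k": 3},
    {"chunk_size": 1024, "top_k": 5},
    {"chunk_size": 256, "top_k": 10},
]

def evaluate(config: dict) -> float:
    # Placeholder: run the RAG chain with `config` over the evaluation set and
    # return the fraction of correct answers. A fake formula stands in here.
    return 0.5 + 0.0001 * config["chunk_size"] / config["top_k"]

best = max(candidates, key=evaluate)
print("best config so far:", best, "score:", round(evaluate(best), 3))
```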
