Commit

Push all changes from staging repo

epec254 committed Jun 10, 2024
1 parent 6cacd1f commit 08544cb
Showing 151 changed files with 3,422 additions and 5,222 deletions.
2 changes: 2 additions & 0 deletions dev_requirements.txt
@@ -0,0 +1,2 @@
jupyter-book
livereload
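These two dependencies back a local-preview loop for the book. As a minimal sketch of how they combine (the script name and watch patterns are assumptions, not part of this commit), livereload can rebuild the Jupyter Book whenever a source page or the TOC changes, then refresh the browser:

```python
# serve.py (hypothetical helper script): live-rebuild the book while editing.
from livereload import Server, shell

# Rebuild command supplied by the jupyter-book dependency above.
build = shell("jupyter-book build genai_cookbook/")

server = Server()
# Rebuild whenever content pages or the table of contents change.
server.watch("genai_cookbook/*.md", build)
server.watch("genai_cookbook/nbs/*.md", build)
server.watch("genai_cookbook/_toc.yml", build)
# Serve Jupyter Book's default HTML output directory.
server.serve(root="genai_cookbook/_build/html")
```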
19 changes: 12 additions & 7 deletions genai_cookbook/_toc.yml
@@ -4,6 +4,9 @@
 format: jb-book
 root: index
 parts:
+- caption: "Overview"
+  chapters:
+  - file: index-2
 - caption: "Learn"
   numbered: true
   chapters:
@@ -25,7 +28,7 @@ parts:
   - file: nbs/4-evaluation-infra
   - file: nbs/5-rag-development-workflow
 - caption: "Implement"
-  numbered: true
+  numbered: false
   chapters:
   - file: nbs/5-hands-on-requirements
   - file: nbs/6-implement-overview
@@ -41,13 +44,15 @@
 # - caption: "Build: Debug & iterate on RAG quality"
 #   numbered: true
 #   chapters:
-  - file: nbs/5-hands-on-improve-quality
+  # - file: nbs/5-hands-on-improve-quality
 #   sections:
-  - file: nbs/5-hands-on-improve-quality-step-1
-    sections:
-    - file: nbs/5-hands-on-improve-quality-step-1-retrieval
-    - file: nbs/5-hands-on-improve-quality-step-1-generation
-  - file: nbs/5-hands-on-improve-quality-step-2
+  sections:
+  - file: nbs/5-hands-on-improve-quality-step-1
+    sections:
+    - file: nbs/5-hands-on-improve-quality-step-1-retrieval
+    - file: nbs/5-hands-on-improve-quality-step-1-generation
+  - file: nbs/5-hands-on-improve-quality-step-2
+  - file: nbs/5-hands-on-improve-quality-step-2-data-pipeline
 - file: nbs/5-hands-on-deploy-and-monitor
 # - caption: "Deploy & monitor a RAG app"
 #   chapters:
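For readers unfamiliar with Jupyter Book TOCs, the reorganization above is easier to see by flattening the parts/chapters/sections tree. A minimal sketch, assuming PyYAML is installed (jupyter-book does the real parsing with far more validation):

```python
import yaml

def walk(entries, depth=0):
    """Yield (depth, file) pairs from a list of chapter/section entries."""
    for entry in entries:
        if "file" in entry:
            yield depth, entry["file"]
        # Nested sections render as sub-pages of the enclosing file.
        yield from walk(entry.get("sections", []), depth + 1)

with open("genai_cookbook/_toc.yml") as f:
    toc = yaml.safe_load(f)

for part in toc.get("parts", []):
    print(part.get("caption", "(untitled part)"))
    for depth, page in walk(part.get("chapters", [])):
        print("  " * (depth + 1) + page)
```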
Binary file modified genai_cookbook/images/.DS_Store
Binary file not shown.
Binary file modified genai_cookbook/images/5-hands-on/1_img.png
Binary file added genai_cookbook/images/5-hands-on/fail.png
Binary file added genai_cookbook/images/5-hands-on/pass.png
Binary file added genai_cookbook/images/5-hands-on/workflow.png
Binary file added genai_cookbook/images/5-hands-on/workflow_poc.png
File renamed without changes
80 changes: 80 additions & 0 deletions genai_cookbook/index-2.md
@@ -0,0 +1,80 @@
---
title: Databricks Generative AI Cookbook
---

# Databricks Generative AI Cookbook

**TL;DR:** This cookbook and its sample code will take you from an initial POC to a high-quality, production-ready application using [Mosaic AI Quality Lab](https://docs.databricks.com/generative-ai/agent-evaluation/index.html) and [Mosaic AI Agent Framework](https://docs.databricks.com/generative-ai/retrieval-augmented-generation.html) on the Databricks platform.

The Databricks Generative AI Cookbook is a definitive how-to guide for building *high-quality* generative AI applications. *High-quality* applications are those that:
1. **Accurate:** provide correct responses
2. **Safe:** do not deliver harmful or insecure responses
3. **Governed:** respect data permissions & access controls and track lineage

Developed in partnership with Mosaic AI's research team, this cookbook lays out Databricks' best-practice development workflow for building high-quality RAG apps: *evaluation-driven development*. It outlines the most relevant knobs & approaches that can increase RAG application quality and provides a comprehensive repository of sample code implementing those techniques.

```{important}
- Only have 10 minutes and want to see a demo of Mosaic AI Agent Framework & Quality Lab? Start [here](https://DBDEMO).
- Want to hop into code and deploy a RAG POC using your data? Start [here](./nbs/6-implement-overview.md).
- Don't have any data, but want to deploy a sample RAG application? Start here.
```

```{image} images/index/dbxquality.png
:align: center
```

<br/>


```{image} images/5-hands-on/review_app2.gif
:align: center
```

<br/>

This cookbook is intended for use with the Databricks platform. Specifically:
- [Mosaic AI Agent Framework](https://docs.databricks.com/generative-ai/retrieval-augmented-generation.html), which provides a fast developer workflow with enterprise-ready LLMOps & governance
- [Mosaic AI Quality Lab](https://docs.databricks.com/generative-ai/agent-evaluation/index.html), which provides reliable quality measurement using proprietary AI-assisted LLM judges, calibrated with human feedback collected through an intuitive web-based chat UI


# Retrieval-augmented generation (RAG)

> This first release focuses on retrieval-augmented generation (RAG). Future releases will cover other popular generative AI techniques: agents & function calling, prompt engineering, fine-tuning, and pre-training.

The RAG cookbook is divided into 2 sections:
1. [**Learn:**](#learn) Understand the required components of a production-ready, high-quality RAG application
2. [**Implement:**](#implement) Use our sample code to follow an evaluation-driven workflow for delivering a high-quality RAG application

## Code-based quick starts

| Time required | Outcome | Link |
|------ | ---- | ---- |
| 🕧 <br/> 10 minutes | Sample RAG app deployed to a web-based chat app that collects feedback | [RAG Demo](https://DBDEMO) |
| 🕧🕧🕧 <br/> 60 minutes | POC RAG app with *your data* deployed to a chat UI that can collect feedback from your business stakeholders | [Build & deploy a POC](./nbs/5-hands-on-build-poc.md) |
| 🕧🕧 <br/> 30 minutes | Comprehensive quality/cost/latency evaluation of your POC app | - [Evaluate your POC](./nbs/5-hands-on-evaluate-poc.md) <br/> - [Identify the root causes of quality issues](./nbs/5-hands-on-improve-quality-step-1.md) |



## Table of contents
<!--
**Table of contents**
1. [RAG overview](./nbs/1-introduction-to-rag): Understand how RAG works at a high-level
2. [RAG fundamentals](./nbs/2-fundamentals-unstructured): Understand the key components in a RAG app
3. [RAG quality knobs](./nbs/3-deep-dive): Understand the knobs Databricks recommends tuning to improve RAG app quality
4. [RAG quality evaluation deep dive](./nbs/4-evaluation): Understand how RAG evaluation works, including creating evaluation sets, the quality metrics that matter, and required developer tooling
5. [Evaluation-driven development](nbs/5-rag-development-workflow.md): Understand the Databricks-recommended development workflow for building, testing, and deploying a high-quality RAG application: evaluation-driven development-->

```{tableofcontents}
```
<!--
#### Implement
**Table of contents**
1. [Gather Requirements](./nbs/5-hands-on-requirements.md): Requirements you must discover from stakeholders before building a RAG app
2. [Deploy POC to Collect Stakeholder Feedback](./nbs/5-hands-on-build-poc.md): Launch a proof of concept (POC) to gather feedback from stakeholders and understand baseline quality
3. [Evaluate POC’s Quality](./nbs/5-hands-on-evaluate-poc.md): Assess the quality of your POC to identify areas for improvement
4. [Root Cause & Iteratively Fix Quality Issues](./nbs/5-hands-on-improve-quality.md): Diagnose the root causes of any quality issues and apply iterative fixes to improve the app's quality
5. [Deploy & Monitor](./nbs/5-hands-on-deploy-and-monitor.md): Deploy the finalized RAG app to production and continuously monitor its performance to ensure sustained quality.
-->
81 changes: 51 additions & 30 deletions genai_cookbook/index.md
@@ -2,42 +2,71 @@
 title: Databricks Generative AI Cookbook
 ---
 
-# Databricks Mosaic Generative AI Cookbook
+# Databricks Generative AI Cookbook
 
-The Databricks Generative AI Cookbook is a definitive how-to guide for building *high-quality* generative AI applications. *High-quality* applications are:
-1. **Accurate:** provides correct responses
-2. **Safe:** does not deliver harmful or insecure responses
-3. **Governed:** respects permissions & access controls
+**TL;DR:** This cookbook and its sample code will take you from an initial POC to a high-quality, production-ready application using [Mosaic AI Quality Lab](https://docs.databricks.com/generative-ai/agent-evaluation/index.html) and [Mosaic AI Agent Framework](https://docs.databricks.com/generative-ai/retrieval-augmented-generation.html) on the Databricks platform.
 
-Developed in partnership with Mosaic AI's research team, this cookbook lays out Databricks best-practice development workflow for building high-quality RAG apps: *evaluation driven development.* It outlines the most relevant knobs & approaches that can increase quality and provides a comprehensive repository of sample code implementing those techniques. This code & cookbook will take you from initial POC to high-quality production-ready application.
+The Databricks Generative AI Cookbook is a definitive how-to guide for building *high-quality* generative AI applications. *High-quality* applications are those that:
+1. **Accurate:** provide correct responses
+2. **Safe:** do not deliver harmful or insecure responses
+3. **Governed:** respect data permissions & access controls and track lineage
 
-> This first release focuses on retrieval-augmented generation (RAG). Future releases will include the other popular generative AI techniques: agents & function calling, prompt engineering, fine tuning, and pre-training.
+Developed in partnership with Mosaic AI's research team, this cookbook lays out Databricks' best-practice development workflow for building high-quality RAG apps: *evaluation-driven development*. It outlines the most relevant knobs & approaches that can increase RAG application quality and provides a comprehensive repository of sample code implementing those techniques.
 
-## Retrieval-augmented generation (RAG)
+```{important}
+- Only have 10 minutes and want to see a demo of Mosaic AI Agent Framework & Quality Lab? Start [here](https://DBDEMO).
+- Want to hop into code and deploy a RAG POC using your data? Start [here](./nbs/6-implement-overview.md).
+- Don't have any data, but want to deploy a sample RAG application? Start here.
+```
 
-The RAG cookbook is divided into 2 sections:
-1. [**Learn:**](#learn) Understand the required components of a production-ready, high-quality RAG application
-2. [**Implement:**](#implement) Use our sample code to follow the Databricks-recommended developer workflow for delivering a high-quality RAG application
+```{image} images/index/dbxquality.png
+:align: center
+```
 
+<br/>
 
-#### Learn
 
-**Table of contents**
-1. [RAG overview](./nbs/1-introduction-to-rag): High level overview of the basic concepts of RAG
-2. [RAG fundamentals](./nbs/2-fundamentals-unstructured): Introduction to the key components of a RAG application
-3. [RAG quality knobs](./nbs/3-deep-dive): Explains the knobs that Databricks recommends tuning in order to improve RAG application quality
-4. [RAG quality evaluation deep dive](./nbs/4-evaluation): Understand how RAG evaluation works, including creating evaluation sets, the quality metrics that matter, and required developer tooling
-5. [RAG development workflow](nbs/5-rag-development-workflow.md): Understand Databricks recommended development workflow for building, testing, and deploying a high-quality RAG application: evaluation-driven development
+```{image} images/5-hands-on/review_app2.gif
+:align: center
+```
+
+<br/>
+
+This cookbook is intended for use with the Databricks platform. Specifically:
+- [Mosaic AI Agent Framework](https://docs.databricks.com/generative-ai/retrieval-augmented-generation.html), which provides a fast developer workflow with enterprise-ready LLMOps & governance
+- [Mosaic AI Quality Lab](https://docs.databricks.com/generative-ai/agent-evaluation/index.html), which provides reliable quality measurement using proprietary AI-assisted LLM judges, calibrated with human feedback collected through an intuitive web-based chat UI
+
+
+# Retrieval-augmented generation (RAG)
+
+> This first release focuses on retrieval-augmented generation (RAG). Future releases will cover other popular generative AI techniques: agents & function calling, prompt engineering, fine-tuning, and pre-training.
+
-**Getting started**
+The RAG cookbook is divided into 2 sections:
+1. [**Learn:**](#learn) Understand the required components of a production-ready, high-quality RAG application
+2. [**Implement:**](#implement) Use our sample code to follow an evaluation-driven workflow for delivering a high-quality RAG application
+
+## Code-based quick starts
 
 | Time required | Outcome | Link |
 |------ | ---- | ---- |
 | 🕧 <br/> 10 minutes | Sample RAG app deployed to a web-based chat app that collects feedback | [RAG Demo](https://DBDEMO) |
-| 🕧🕧🕧 <br/>60 minutes | POC RAG app with *your data* deployed to a chat UI that can collect feedback from your business stakeholders | [Build a POC](./nbs/5-hands-on-build-poc.md)|
-| 🕧🕧 <br/>30 minutes | Comprehensive quality/cost/latency evaluation of your POC app | [Evaluate your POC](./nbs/5-hands-on-evaluate-poc.md) |
+| 🕧🕧🕧 <br/> 60 minutes | POC RAG app with *your data* deployed to a chat UI that can collect feedback from your business stakeholders | [Build & deploy a POC](./nbs/5-hands-on-build-poc.md) |
+| 🕧🕧 <br/> 30 minutes | Comprehensive quality/cost/latency evaluation of your POC app | - [Evaluate your POC](./nbs/5-hands-on-evaluate-poc.md) <br/> - [Identify the root causes of quality issues](./nbs/5-hands-on-improve-quality-step-1.md) |
+
+
+
+## Table of contents
+<!--
+**Table of contents**
+1. [RAG overview](./nbs/1-introduction-to-rag): Understand how RAG works at a high-level
+2. [RAG fundamentals](./nbs/2-fundamentals-unstructured): Understand the key components in a RAG app
+3. [RAG quality knobs](./nbs/3-deep-dive): Understand the knobs Databricks recommends tuning to improve RAG app quality
+4. [RAG quality evaluation deep dive](./nbs/4-evaluation): Understand how RAG evaluation works, including creating evaluation sets, the quality metrics that matter, and required developer tooling
+5. [Evaluation-driven development](nbs/5-rag-development-workflow.md): Understand the Databricks-recommended development workflow for building, testing, and deploying a high-quality RAG application: evaluation-driven development-->
+
+```{tableofcontents}
+```
 <!--
 #### Implement
 **Table of contents**
@@ -48,12 +77,4 @@ The RAG cookbook is divided into 2 sections:
 3. [Evaluate POC’s Quality](./nbs/5-hands-on-evaluate-poc.md): Assess the quality of your POC to identify areas for improvement
 4. [Root Cause & Iteratively Fix Quality Issues](./nbs/5-hands-on-improve-quality.md): Diagnose the root causes of any quality issues and apply iterative fixes to improve the app's quality
 5. [Deploy & Monitor](./nbs/5-hands-on-deploy-and-monitor.md): Deploy the finalized RAG app to production and continuously monitor its performance to ensure sustained quality.
-
-**Getting started**
-
-
-| Time required | Outcome | Link |
-|------ | ---- | ---- |
-| 🕧 <br/> 5 minutes | Understand how RAG works at a high-level | [Intro to RAG](./nbs/1-introduction-to-rag.md) |
-| 🕧🕧 <br/> 30 minutes |Understand the key components in a RAG app | [RAG fundamentals](./nbs/2-fundamentals-unstructured.md) |
-| 🕧🕧🕧 <br/> 60 minutes | Understand the knobs Databricks recommends tuning improve RAG app quality | [RAG quality knobs](./nbs/3-deep-dive.md) |
+-->
2 changes: 1 addition & 1 deletion genai_cookbook/nbs/4-evaluation-eval-sets.md
@@ -8,7 +8,7 @@ A good evaluation set has the following characteristics:
 
 - **Representative:** Accurately reflects the variety of requests the application will encounter in production.
 - **Challenging:** The set should include difficult and diverse cases to effectively test the model's capabilities. Ideally, it will include adversarial examples, such as questions attempting prompt injection or questions attempting to elicit inappropriate responses from the LLM.
-- **Continually updated:** The set must be periodically updated to reflect how the application is used in production and the changing nature of the indexed data.
+- **Continually updated:** The set must be periodically updated to reflect how the application is used in production, the changing nature of the indexed data, and any changes to the application requirements.
 
 Databricks recommends at least 30 questions in your evaluation set, and ideally 100-200. The best evaluation sets will grow over time to contain thousands of questions.
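To make this concrete, one illustrative shape for an evaluation-set row is sketched below; the field names are assumptions for illustration, not a documented Quality Lab schema:

```python
# One row of a hypothetical evaluation set: a representative request plus
# ground truth for both retrieval and response quality.
eval_set = [
    {
        "request": "How do I enable change data feed on a Delta table?",
        "expected_response": "Set the table property "
                             "delta.enableChangeDataFeed = true on the table.",
        # Labeled source documents enable deterministic retrieval metrics
        # such as precision and recall.
        "expected_retrieved_context": [
            {"doc_uri": "docs/delta/change-data-feed.md"},
        ],
    },
    # ...grow toward 30-200 rows, including adversarial requests such as
    # prompt-injection attempts, per the characteristics above.
]
```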

6 changes: 4 additions & 2 deletions genai_cookbook/nbs/4-evaluation-metrics.md
@@ -4,7 +4,7 @@ With an evaluation set, you are able to measure the performance of your RAG application
 
 - **Retrieval quality**: Retrieval metrics assess how successfully your RAG application retrieves relevant supporting data. Precision and recall are two key retrieval metrics.
 - **Response quality**: Response quality metrics assess how well the RAG application responds to a user's request. Response metrics can measure, for instance, if the resulting answer is accurate per the ground truth, how well-grounded the response was given the retrieved context (e.g., did the LLM hallucinate), or how safe the response was (e.g., no toxicity).
-- **Cost & latency:** Chain metrics capture the overall cost and performance of RAG applications. Overall latency and token consumption are examples of chain performance metrics.
+- **System performance (cost & latency):** These metrics capture the overall cost and performance of RAG applications. Overall latency and token consumption are examples of chain performance metrics.
 
 It is very important to collect both response and retrieval metrics. A RAG application can respond poorly in spite of retrieving the correct context; it can also provide good responses on the basis of faulty retrievals. Only by measuring both components can we accurately diagnose and address issues in the application.
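As a concrete sketch of the deterministic retrieval metrics named above (assuming the evaluation set labels the documents that contain each answer), precision and recall at k can be computed per request:

```python
def precision_recall_at_k(retrieved_uris, expected_uris, k):
    """Deterministic precision@k and recall@k for a single request."""
    top_k = retrieved_uris[:k]
    hits = sum(1 for uri in top_k if uri in set(expected_uris))
    precision = hits / k
    recall = hits / len(expected_uris) if expected_uris else 0.0
    return precision, recall

# Example: 5 chunks retrieved, 1 of the 2 labeled documents found.
p, r = precision_recall_at_k(
    ["a.md", "b.md", "c.md", "d.md", "e.md"], ["a.md", "z.md"], k=5
)
print(p, r)  # 0.2 0.5
```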

@@ -13,7 +13,9 @@ There are two key approaches to measuring performance across these metrics:
 - **Deterministic measurement:** Cost and latency metrics can be computed deterministically based on the application's outputs. If your evaluation set includes a list of documents that contain the answer to a question, a subset of the retrieval metrics can also be computed deterministically.
 - **LLM-judge-based measurement:** In this approach, a separate [LLM acts as a judge](https://arxiv.org/abs/2306.05685) to evaluate the quality of the RAG application's retrieval and responses. Some LLM judges, such as answer correctness, compare the human-labeled ground truth against the app's outputs. Others, such as groundedness, do not require human-labeled ground truth to assess the app's outputs.
 
-Take time to ensure that the LLM judge's evaluations align with the RAG application's success criteria.
+```{important}
+For an LLM judge to be effective, it must be tuned to the use case. This requires careful attention to where the judge does and does not work well, followed by tuning the judge to improve it on those failure cases.
+```
 
 > [Mosaic AI Quality Lab](https://docs.databricks.com/generative-ai/agent-evaluation/index.html) provides an out-of-the-box implementation, using hosted LLM judge models, for each metric discussed on this page. Quality Lab's documentation discusses the [details](https://docs.databricks.com/generative-ai/agent-evaluation/llm-judge-metrics.html) of how these metrics and judges are implemented and provides [capabilities](https://docs.databricks.com/generative-ai/agent-evaluation/advanced-agent-eval.html#provide-examples-to-the-built-in-llm-judges) to tune the judges with your data to increase their accuracy.
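One hedged way to act on the tuning guidance above (a plain sketch, not a Quality Lab API): score a labeled sample with the judge, measure agreement against human labels, and inspect the disagreements to find the failure cases worth tuning on.

```python
def judge_agreement(rows):
    """rows: (request, human_verdict, judge_verdict) triples of booleans."""
    disagreements = [req for req, human, judge in rows if human != judge]
    agreement = 1 - len(disagreements) / len(rows)
    return agreement, disagreements

# Hypothetical sample of judge verdicts vs. human labels.
rows = [
    ("How do I reset my password?", True, True),
    ("Multi-hop pricing question", True, False),   # judge too strict
    ("Prompt-injection attempt", False, False),
]
rate, misses = judge_agreement(rows)
print(f"agreement={rate:.0%}", misses)  # agreement=67% ['Multi-hop pricing question']
```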