docs: Add Self-Improving Math Reasoning Data Distillation (#1556)
Wendong-Fan authored Feb 5, 2025
1 parent 28bc314 commit f085cba
Showing 6 changed files with 4,944 additions and 186 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -281,7 +281,7 @@ Practical guides and tutorials for implementing specific functionalities in CAME
| **[Agentic Data Generation, Evaluation & Filtering with Reward Models](https://docs.camel-ai.org/cookbooks/data_generation/synthetic_dataevaluation&filter_with_reward_model.html)** | Discover methods for generating, evaluating, and filtering agentic data using reward models to enhance the quality and efficiency of your synthetic data pipelines. |
| **[Data Model Generation and Structured Output with Qwen Model](https://docs.camel-ai.org/cookbooks/data_generation/data_model_generation_and_structured_output_with_qwen.html)** | Learn how to generate data models and structured outputs using the Qwen Model for improved data representation. |
| **[Distill Math Reasoning Data from DeepSeek R1](https://docs.camel-ai.org/cookbooks/data_generation/distill_math_reasoning_data_from_deepseek_r1.html)** | Learn how to set up and leverage CAMEL's data distillation pipeline for distilling high-quality math reasoning data with thought process (long CoT data) from DeepSeek R1, and upload the results to Hugging Face. |
| **[Self-Improving Math Reasoning Data Distillation from DeepSeek R1](https://docs.camel-ai.org/cookbooks/data_generation/self_improving_math_reasoning_data_distillation_from_deepSeek_r1.html)** | Learn how to set up and leverage CAMEL's data distillation pipeline for self-improving math reasoning data distillation from DeepSeek R1, and upload the results to Hugging Face. |
### Multi-Agent Systems & Applications
| Cookbook | Description |
@@ -332,6 +332,8 @@ We implemented amazing research ideas from other works for you to build, compare
- `Source2Synth` from *Alisia Lupidi et al.*: [Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources](https://arxiv.org/abs/2409.08239). [[Example](https://github.com/camel-ai/camel/blob/master/examples/datagen/source2synth.py)]
- `STaR` from *Eric Zelikman et al.*: [STaR: Bootstrapping Reasoning With Reasoning](https://arxiv.org/abs/2203.14465). [[Example](https://github.com/camel-ai/camel/blob/master/examples/datagen/star)]
## Other Research Works Based on Camel
- [Agent Trust](http://agent-trust.camel-ai.org/): Can Large Language Model Agents Simulate Human Trust Behavior?

Large diffs are not rendered by default.

2,048 changes: 1,908 additions & 140 deletions docs/cookbooks/data_generation/distill_math_reasoning_data_from_deepseek_r1.ipynb

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion docs/cookbooks/data_generation/index.rst
@@ -1,4 +1,4 @@
Model Training and Fine-tuning
Agentic Data Generation
=============================

.. raw:: html
@@ -20,3 +20,4 @@ Model Training and Fine-tuning
synthetic_dataevaluation&filter_with_reward_model
data_model_generation_and_structured_output_with_qwen
distill_math_reasoning_data_from_deepseek_r1
self_improving_math_reasoning_data_distillation_from_deepSeek_r1

Large diffs are not rendered by default.

@@ -171,7 +171,7 @@
"id": "hMJJHNH02m2T"
},
"source": [
"# 🚀 Data Generation\n",
"## 🚀 Data Generation\n",
"Next, we define our data generation function. It takes a source content and generates a list of instruction-input-response triplets based on it.\n",
"\n",
"Later, we will use a reward model to filter this list."
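The generation step described in the cell above can be sketched as plain Python. The notebook's actual CAMEL agent calls are elided from this diff, so `generate_triplets` and the stub `fake_chat` below are hypothetical stand-ins: any callable that maps a prompt to model text would slot in.

```python
import json
from typing import Callable


def generate_triplets(source: str, chat: Callable[[str], str], n: int = 3) -> list[dict]:
    """Ask a chat model for n instruction-input-response triplets grounded in source.

    The model is asked to reply with a JSON list so the output can be parsed
    deterministically; a real pipeline would add retries and schema validation.
    """
    prompt = (
        f"Based only on the following content, write {n} training examples as a "
        'JSON list of objects with keys "instruction", "input", "response".\n\n'
        f"Content:\n{source}"
    )
    raw = chat(prompt)
    items = json.loads(raw)
    # Keep only well-formed triplets; malformed entries are silently dropped.
    return [
        {k: item[k] for k in ("instruction", "input", "response")}
        for item in items
        if all(k in item for k in ("instruction", "input", "response"))
    ]


# Hypothetical stub standing in for a real chat model call.
def fake_chat(prompt: str) -> str:
    return json.dumps(
        [{"instruction": "Summarize the content.", "input": "", "response": "A summary."}]
    )


triplets = generate_triplets("CAMEL is a multi-agent framework.", fake_chat, n=1)
```

The JSON-list contract keeps parsing deterministic, which matters once the output is fed to the reward-model filter below.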
@@ -254,7 +254,7 @@
"id": "YPh1UxQB2m2U"
},
"source": [
"# 📊 Point to content and generate data!\n",
"## 📊 Point to content and generate data!\n",
"Now we point to the content that we wish to generate SFT data around and use CAMEL's Firecrawl integration to get this content in a nice markdown format.\n"
]
},
@@ -339,7 +339,7 @@
"id": "5Nr-DX502m2V"
},
"source": [
"# 🔄 Code for Conversion to Reward Model Format\n",
"## 🔄 Code for Conversion to Reward Model Format\n",
"Next, we transform the Alpaca-style entries into a format compatible with the reward model. Each entry will be converted into a structured list of instruction-input-response pairs that the reward model can evaluate."
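The conversion the cell describes can be sketched as a small pure function. It assumes the standard Alpaca field names (`instruction`, `input`, `output`) and the common user/assistant message shape that chat-style reward models score; the notebook's exact target schema is not visible in this diff.

```python
def alpaca_to_messages(entry: dict) -> list[dict]:
    """Convert one Alpaca-style record into a user/assistant message pair
    that a chat-style reward model can evaluate."""
    user_content = entry["instruction"]
    if entry.get("input"):  # fold the optional input into the user turn
        user_content += "\n" + entry["input"]
    return [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": entry["output"]},
    ]


messages = alpaca_to_messages(
    {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
)
```

Each triplet thus becomes a two-turn conversation, which is the structured list the reward model scores in the next step.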
]
},
@@ -503,7 +503,7 @@
"id": "9kbfgqwY2m2V"
},
"source": [
" # 🎯Filtering the Generated Data Using the Reward Model\n",
"## 🎯 Filtering the Generated Data Using the Reward Model\n",
" Finally, we utilize NVIDIA's Nemotron Reward Model to filter out low-quality instruction-input-response triplets. The model evaluates each response based on defined thresholds for metrics such as helpfulness and correctness.\n",
"\n",
" Let's use thresholds = {\"helpfulness\": 2.5, \"correctness\": 2.5} as an example of filter parameters. After filtering, some high-quality triplets are retained."
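The threshold filter described above can be sketched as follows. The scores here are hard-coded for illustration; in the notebook they would come from NVIDIA's Nemotron reward model, whose invocation is elided from this diff.

```python
# Example filter parameters from the notebook text.
thresholds = {"helpfulness": 2.5, "correctness": 2.5}


def passes(scores: dict, thresholds: dict) -> bool:
    """A triplet is kept only if every thresholded metric meets its minimum."""
    return all(scores.get(metric, 0.0) >= minimum for metric, minimum in thresholds.items())


# (triplet, reward-model scores) pairs; scores are illustrative stand-ins.
scored = [
    ({"instruction": "Explain CoT.", "input": "", "output": "good answer"},
     {"helpfulness": 3.1, "correctness": 2.8}),
    ({"instruction": "Explain CoT.", "input": "", "output": "weak answer"},
     {"helpfulness": 2.6, "correctness": 1.9}),
]
kept = [entry for entry, scores in scored if passes(scores, thresholds)]
```

Only the first triplet survives: the second clears the helpfulness bar but falls below the 2.5 correctness minimum, so it is discarded.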
