docs: Add Self-Improving Math Reasoning Data Distillation (#1556)
Wendong-Fan authored Feb 5, 2025
1 parent 28bc314 commit f085cba
Showing 6 changed files with 4,944 additions and 186 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -281,7 +281,7 @@ Practical guides and tutorials for implementing specific functionalities in CAME
| **[Agentic Data Generation, Evaluation & Filtering with Reward Models](https://docs.camel-ai.org/cookbooks/data_generation/synthetic_dataevaluation&filter_with_reward_model.html)** | Discover methods for generating, evaluating, and filtering agentic data using reward models to enhance the quality and efficiency of your synthetic data pipelines. |
| **[Data Model Generation and Structured Output with Qwen Model](https://docs.camel-ai.org/cookbooks/data_generation/data_model_generation_and_structured_output_with_qwen.html)** | Learn how to generate data models and structured outputs using the Qwen Model for improved data representation. |
| **[Distill Math Reasoning Data from DeepSeek R1](https://docs.camel-ai.org/cookbooks/data_generation/distill_math_reasoning_data_from_deepseek_r1.html)** | Learn how to set up and leverage CAMEL's data distillation pipeline for distilling high-quality math reasoning data with thought process (long CoT data) from DeepSeek R1, and upload the results to Hugging Face. |
| **[Self-Improving Math Reasoning Data Distillation from DeepSeek R1](https://docs.camel-ai.org/cookbooks/data_generation/self_improving_math_reasoning_data_distillation_from_deepSeek_r1.html)** | Learn how to set up and leverage CAMEL's data distillation pipeline for self-improving math reasoning data distillation from DeepSeek R1, and upload the results to Hugging Face. |
### Multi-Agent Systems & Applications
| Cookbook | Description |
@@ -332,6 +332,8 @@ We implemented amazing research ideas from other works for you to build, compare
- `Source2Synth` from *Alisia Lupidi et al.*: [Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources](https://arxiv.org/abs/2409.08239). [[Example](https://github.com/camel-ai/camel/blob/master/examples/datagen/source2synth.py)]
- `STaR` from *Eric Zelikman et al.*: [STaR: Bootstrapping Reasoning With Reasoning](https://arxiv.org/abs/2203.14465). [[Example](https://github.com/camel-ai/camel/blob/master/examples/datagen/star)]
## Other Research Works Based on Camel
- [Agent Trust](http://agent-trust.camel-ai.org/): Can Large Language Model Agents Simulate Human Trust Behavior?

Large diffs are not rendered by default.

2,048 changes: 1,908 additions & 140 deletions docs/cookbooks/data_generation/distill_math_reasoning_data_from_deepseek_r1.ipynb

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion docs/cookbooks/data_generation/index.rst
@@ -1,4 +1,4 @@
Model Training and Fine-tuning
Agentic Data Generation
=============================

.. raw:: html
@@ -20,3 +20,4 @@ Model Training and Fine-tuning
synthetic_dataevaluation&filter_with_reward_model
data_model_generation_and_structured_output_with_qwen
distill_math_reasoning_data_from_deepseek_r1
self_improving_math_reasoning_data_distillation_from_deepSeek_r1

Large diffs are not rendered by default.

@@ -171,7 +171,7 @@
"id": "hMJJHNH02m2T"
},
"source": [
"# 🚀 Data Generation\n",
"## 🚀 Data Generation\n",
"Next, we define our data generation function. It takes a source content and generates a list of instruction-input-response triplets based on it.\n",
"\n",
"Later, we will use a reward model to filter this list."
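The generation step described in the cell above can be sketched as plain Python. The notebook's actual CAMEL agent calls are elided from this diff, so `generate_triplets` and the stub `fake_chat` below are hypothetical stand-ins: any callable that maps a prompt to model text would slot in.

```python
import json
from typing import Callable


def generate_triplets(source: str, chat: Callable[[str], str], n: int = 3) -> list[dict]:
    """Ask a chat model for n instruction-input-response triplets grounded in source.

    The model is asked to reply with a JSON list so the output can be parsed
    deterministically; a real pipeline would add retries and schema validation.
    """
    prompt = (
        f"Based only on the following content, write {n} training examples as a "
        'JSON list of objects with keys "instruction", "input", "response".\n\n'
        f"Content:\n{source}"
    )
    raw = chat(prompt)
    items = json.loads(raw)
    # Keep only well-formed triplets; malformed entries are silently dropped.
    return [
        {k: item[k] for k in ("instruction", "input", "response")}
        for item in items
        if all(k in item for k in ("instruction", "input", "response"))
    ]


# Hypothetical stub standing in for a real chat model call.
def fake_chat(prompt: str) -> str:
    return json.dumps(
        [{"instruction": "Summarize the content.", "input": "", "response": "A summary."}]
    )


triplets = generate_triplets("CAMEL is a multi-agent framework.", fake_chat, n=1)
```

The JSON-list contract keeps parsing deterministic, which matters once the output is fed to the reward-model filter below.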
@@ -254,7 +254,7 @@
"id": "YPh1UxQB2m2U"
},
"source": [
"# 📊 Point to content and generate data!\n",
"## 📊 Point to content and generate data!\n",
"Now we point to the content that we wish to generate SFT data around and use CAMEL's Firecrawl integration to get this content in a nice markdown format.\n"
]
},
@@ -339,7 +339,7 @@
"id": "5Nr-DX502m2V"
},
"source": [
"# 🔄 Code for Conversion to Reward Model Format\n",
"## 🔄 Code for Conversion to Reward Model Format\n",
"Next, we transform the Alpaca-style entries into a format compatible with the reward model. Each entry will be converted into a structured list of instruction-input-response pairs that the reward model can evaluate."
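The conversion the cell describes can be sketched as a small pure function. It assumes the standard Alpaca field names (`instruction`, `input`, `output`) and the common user/assistant message shape that chat-style reward models score; the notebook's exact target schema is not visible in this diff.

```python
def alpaca_to_messages(entry: dict) -> list[dict]:
    """Convert one Alpaca-style record into a user/assistant message pair
    that a chat-style reward model can evaluate."""
    user_content = entry["instruction"]
    if entry.get("input"):  # fold the optional input into the user turn
        user_content += "\n" + entry["input"]
    return [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": entry["output"]},
    ]


messages = alpaca_to_messages(
    {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
)
```

Each triplet thus becomes a two-turn conversation, which is the structured list the reward model scores in the next step.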
]
},
@@ -503,7 +503,7 @@
"id": "9kbfgqwY2m2V"
},
"source": [
" # 🎯Filtering the Generated Data Using the Reward Model\n",
"## 🎯 Filtering the Generated Data Using the Reward Model\n",
" Finally, we utilize NVIDIA's Nemotron Reward Model to filter out low-quality instruction-input-response triplets. The model evaluates each response based on defined thresholds for metrics such as helpfulness and correctness.\n",
"\n",
" Let's use thresholds = {\"helpfulness\": 2.5, \"correctness\": 2.5} as an example of filter parameters. After filtering, some high-quality triplets are retained."
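The threshold filter described above can be sketched as follows. The scores here are hard-coded for illustration; in the notebook they would come from NVIDIA's Nemotron reward model, whose invocation is elided from this diff.

```python
# Example filter parameters from the notebook text.
thresholds = {"helpfulness": 2.5, "correctness": 2.5}


def passes(scores: dict, thresholds: dict) -> bool:
    """A triplet is kept only if every thresholded metric meets its minimum."""
    return all(scores.get(metric, 0.0) >= minimum for metric, minimum in thresholds.items())


# (triplet, reward-model scores) pairs; scores are illustrative stand-ins.
scored = [
    ({"instruction": "Explain CoT.", "input": "", "output": "good answer"},
     {"helpfulness": 3.1, "correctness": 2.8}),
    ({"instruction": "Explain CoT.", "input": "", "output": "weak answer"},
     {"helpfulness": 2.6, "correctness": 1.9}),
]
kept = [entry for entry, scores in scored if passes(scores, thresholds)]
```

Only the first triplet survives: the second clears the helpfulness bar but falls below the 2.5 correctness minimum, so it is discarded.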
