CU-8694w2cmw: Add example of k-fold metrics (#25)
mart-r authored Jul 4, 2024
1 parent 2282a67 commit 6b19e9f
Showing 2 changed files with 64 additions and 18 deletions.
@@ -13859,10 +13859,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla



<div id="f7f355c5-69e7-4db0-b862-0108bbf9a1a5"></div>
<div id="ceac96c5-9657-4da0-8345-7fa2de57788b"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#f7f355c5-69e7-4db0-b862-0108bbf9a1a5');
var element = $('#ceac96c5-9657-4da0-8345-7fa2de57788b');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "6fd10f1692234019836a7b40e83b56dd", "version_major": 2, "version_minor": 0}
@@ -13881,10 +13881,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla



<div id="1cd864ed-2f26-414a-9e7b-cca661359203"></div>
<div id="92e06761-c3ed-4ca7-80a8-f1e70848b7f6"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#1cd864ed-2f26-414a-9e7b-cca661359203');
var element = $('#92e06761-c3ed-4ca7-80a8-f1e70848b7f6');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "9a5ab9cfecc242b7aaf0f140e87bdde6", "version_major": 2, "version_minor": 0}
@@ -13963,10 +13963,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla



<div id="6960f9a5-5688-4564-873d-9adbd34be108"></div>
<div id="894a81fa-daca-4461-b9cf-8c8b2f318695"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#6960f9a5-5688-4564-873d-9adbd34be108');
var element = $('#894a81fa-daca-4461-b9cf-8c8b2f318695');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "434496e448984f55925d22fad0349ada", "version_major": 2, "version_minor": 0}
@@ -13985,10 +13985,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla



<div id="63d7255b-e667-4bea-af37-72e5372a0883"></div>
<div id="f0c0c808-7ff5-4702-83c5-2526a1f39a68"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#63d7255b-e667-4bea-af37-72e5372a0883');
var element = $('#f0c0c808-7ff5-4702-83c5-2526a1f39a68');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "f7d1803b3c6c4197b6612c5fdf189746", "version_major": 2, "version_minor": 0}
@@ -14007,10 +14007,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla



<div id="dd55a253-1bc0-4801-a02f-de6f7145ad2f"></div>
<div id="40209f7f-f501-410b-b86c-1dff1f4e15e8"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#dd55a253-1bc0-4801-a02f-de6f7145ad2f');
var element = $('#40209f7f-f501-410b-b86c-1dff1f4e15e8');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "c8d633f579de438a916d9ef3de9d8fe0", "version_major": 2, "version_minor": 0}
@@ -14029,10 +14029,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla



<div id="8dcdb0a2-c5bc-47ba-b4d8-98752ad7d19c"></div>
<div id="ba9f83fa-677a-4478-977a-84f6680e1016"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#8dcdb0a2-c5bc-47ba-b4d8-98752ad7d19c');
var element = $('#ba9f83fa-677a-4478-977a-84f6680e1016');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "de6c01c6983041e2b972f6008caefaea", "version_major": 2, "version_minor": 0}
@@ -14051,10 +14051,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla



<div id="276a076a-ba10-46e9-bfe6-4b148d823c15"></div>
<div id="2927e9a1-6999-48b3-bfe3-9cb426add119"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#276a076a-ba10-46e9-bfe6-4b148d823c15');
var element = $('#2927e9a1-6999-48b3-bfe3-9cb426add119');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "05132c907a874fe2a2eb9cb6c81da3b3", "version_major": 2, "version_minor": 0}
@@ -17502,10 +17502,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla



<div id="0345720f-a2f0-4660-a340-9c6e7ef44710"></div>
<div id="4471f788-129e-42f5-b40b-fe386231b101"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#0345720f-a2f0-4660-a340-9c6e7ef44710');
var element = $('#4471f788-129e-42f5-b40b-fe386231b101');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "00325922360c45009329d82ed6420f16", "version_major": 2, "version_minor": 0}
@@ -17524,10 +17524,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla



<div id="293c269d-d3e1-472b-8a2d-c983f9bd3529"></div>
<div id="ea307193-bdef-4dd8-970e-5fca075d9c90"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#293c269d-d3e1-472b-8a2d-c983f9bd3529');
var element = $('#ea307193-bdef-4dd8-970e-5fca075d9c90');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "d48e2f4d6dd3467fb3f17e0244b0e361", "version_major": 2, "version_minor": 0}
@@ -17599,6 +17599,31 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h4 id="K-fold-metrics">K-fold metrics<a class="anchor-link" href="#K-fold-metrics">&#182;</a></h4><p>K-fold cross-validation offers a more robust evaluation of your model's performance by dividing your dataset into k subsets, or folds.
Unlike a single evaluation on the entire dataset (like <code>cat._print_stats</code>), the k-fold approach ensures that every data point is used for both training and validation, thereby reducing the risk of bias and providing a more reliable estimate of the model's generalization capabilities.
This method is particularly beneficial for assessing the fine-tuned performance of your model on specific datasets, as it accounts for variability and offers a comprehensive understanding of how the model might perform on unseen data.</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># you need to import the module to use it</span>
<span class="kn">from</span> <span class="nn">medcat.stats.kfold</span> <span class="kn">import</span> <span class="n">get_k_fold_stats</span>
<span class="n">fps</span><span class="p">,</span> <span class="n">fns</span><span class="p">,</span> <span class="n">tps</span><span class="p">,</span> <span class="n">cui_prec</span><span class="p">,</span> <span class="n">cui_rec</span><span class="p">,</span> <span class="n">cui_f1</span><span class="p">,</span> <span class="n">cui_counts</span><span class="p">,</span> <span class="n">examples</span> <span class="o">=</span> <span class="n">get_k_fold_stats</span><span class="p">(</span><span class="n">cat</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
@@ -4487,6 +4487,27 @@
"fps, fns, tps, cui_prec, cui_rec, cui_f1, cui_counts, examples = cat._print_stats(data, extra_cui_filter=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### K-fold metrics\n",
"K-fold cross-validation offers a more robust evaluation of your model's performance by dividing your dataset into k subsets, or folds.\n",
"Unlike a single evaluation on the entire dataset (like `cat._print_stats`), the k-fold approach ensures that every data point is used for both training and validation, thereby reducing the risk of bias and providing a more reliable estimate of the model's generalization capabilities.\n",
"This method is particularly beneficial for assessing the fine-tuned performance of your model on specific datasets, as it accounts for variability and offers a comprehensive understanding of how the model might perform on unseen data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# you need to import the module to use it\n",
"from medcat.stats.kfold import get_k_fold_stats\n",
"fps, fns, tps, cui_prec, cui_rec, cui_f1, cui_counts, examples = get_k_fold_stats(cat, data)"
]
},
{
"cell_type": "code",
"execution_count": 13,
Expand Down
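
The markdown cell added in this commit describes k-fold cross-validation in general terms. As a rough illustration of the splitting step only, the sketch below partitions a list of items into k folds using plain Python. It is not MedCAT's implementation and makes no claim about how `get_k_fold_stats` works internally; the helper `k_fold_splits` and the `documents` list are hypothetical names introduced for this example.

```python
from typing import List, Sequence, Tuple


def k_fold_splits(items: Sequence, k: int) -> List[Tuple[list, list]]:
    """Return k (train, test) pairs; each item lands in exactly one test fold.

    Hypothetical helper for illustration, not part of MedCAT.
    """
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    # Deal items round-robin so fold sizes differ by at most one.
    folds = [list(items[i::k]) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training data is every fold except the one held out for testing.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


# Placeholder data standing in for annotated documents.
documents = [f"doc{i}" for i in range(10)]
splits = k_fold_splits(documents, k=3)
for train, test in splits:
    print(len(train), len(test))
```

Each item appears in exactly one test fold and in the training set of every other fold, which is the property the markdown cell refers to when it says every data point is used for both training and validation.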