updated New-EVAL folder and evaluation notebook

Nkluge-correa · Apr 23, 2024 · 6b0b3d7
1 parent 0d4e15f
Showing 13 changed files with 2 additions and 762 deletions.
diff --git a/Evaluation/New-EVAL/README.md b/Evaluation/New-EVAL/README.md
@@ -16,7 +16,7 @@ We performed the following evaluations using a [Portuguese implementation of the
 
 - [HateBR](https://arxiv.org/abs/2103.14972) (25-shot) - HateBR is the first large-scale expert annotated dataset of Brazilian Instagram comments for abusive language detection on the web and social media. The HateBR was collected from politicians' Brazilian Instagram comments and manually annotated by specialists. It comprises 7,000 documents annotated with a binary classification (offensive versus non-offensive comments). - Data sources:  [[1]](https://huggingface.co/datasets/eduagarcia/portuguese_benchmark),  [[2]](https://github.com/franciellevargas/HateBR),  [[3]](https://huggingface.co/datasets/ruanchaves/hatebr). Metric - F1-macro.
 
-The notebook used to run these evaluations is the [`lm-evaluation-harness-pt-br.ipynb`](./lm-evaluation-harness-pt-br.ipynb). Available on Colab. Full results are stored in the [results folder](./results/).
+The notebook used to run these evaluations is the [`lm-evaluation-harness-pt-br.ipynb`](./lm-evaluation-harness-pt-br.ipynb). Available on Colab.
 
 <a href="https://colab.research.google.com/drive/1m6Oqey4P9ShYTO62yRq7wrM_eEsvFJ9D" target="_blank">
   <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">

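Note on the README hunk: the HateBR entry reports F1-macro, which is the unweighted mean of the per-class F1 scores. A minimal sketch of that computation follows, assuming plain Python lists of labels; the function name `macro_f1` and the sample labels are illustrative only, and the evaluation harness may compute the metric through a library rather than this way.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: 1 = offensive, 0 = non-offensive.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(round(macro_f1(y_true, y_pred), 4))
```

Because each class contributes equally to the mean, a model that ignores one of the two classes is penalized even if that class is rare.
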
diff --git a/Evaluation/New-EVAL/lm-evaluation-harness-pt-br.ipynb b/Evaluation/New-EVAL/lm-evaluation-harness-pt-br.ipynb
@@ -50,8 +50,7 @@
         "!cd lm-evaluation-harness-pt && python lm_eval \\\n",
         "    --model huggingface \\\n",
         "    --model_args pretrained=\"nicholasKluge/TeenyTinyLlama-160m\",revision=\"main\" \\\n",
-        "    --tasks \"assin2_rte,assin2_sts,bluex,enem_challenge,faquad_nli,hatebr_offensive,oab_exams\" \\\n",
-        "    --num_fewshot \"15,15,3,3,15,25,3\" \\\n",
+        "    --tasks \"assin2_rte,assin2_sts,bluex,enem_challenge,faquad_nli,hatebr_offensive,oab_exams,portuguese_hate_speech,tweetsentbr\" \\\n",
         "    --batch_size \"auto\"\n",
         "    --device cuda:0 \\\n",
         "    --output_path \"./\""

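Note on the notebook hunk: the unchanged `--batch_size "auto"` line ends without a trailing backslash, so the `--device` and `--output_path` lines that follow appear not to be part of the same shell command. One way to sidestep continuation characters is to assemble the arguments as a list. The sketch below uses only the flags visible in this diff; the helper name `build_eval_command` and its defaults are assumptions for illustration, not part of the repository.

```python
import shlex


def build_eval_command(model, tasks, revision="main",
                       device="cuda:0", output_path="./"):
    """Return the lm_eval invocation as an argument list (illustrative helper)."""
    return [
        "python", "lm_eval",
        "--model", "huggingface",
        "--model_args", f"pretrained={model},revision={revision}",
        "--tasks", ",".join(tasks),
        "--batch_size", "auto",
        "--device", device,
        "--output_path", output_path,
    ]


tasks = [
    "assin2_rte", "assin2_sts", "bluex", "enem_challenge", "faquad_nli",
    "hatebr_offensive", "oab_exams", "portuguese_hate_speech", "tweetsentbr",
]
cmd = build_eval_command("nicholasKluge/TeenyTinyLlama-160m", tasks)

# Print a single-line, shell-quoted form for inspection.
print(shlex.join(cmd))
```

The list can then be passed to `subprocess.run(cmd, cwd="lm-evaluation-harness-pt", check=True)`, which mirrors the `cd lm-evaluation-harness-pt &&` prefix in the notebook cell without relying on line continuations.
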
diff --git a/Evaluation/New-EVAL/results/Bloom-560m.md b/Evaluation/New-EVAL/results/Bloom-560m.md
diff --git a/Evaluation/New-EVAL/results/GPT-2.md b/Evaluation/New-EVAL/results/GPT-2.md