Multilingual Bias Detection and Mitigation for Indian Languages #242

anoopkunchukuttan · 2023-12-31T06:38:43Z

Paper: https://arxiv.org/abs/2312.15181
Dataset: https://drive.google.com/file/d/169-yw7fKC-qB_wJ8Cwv8RusN67Uv7R7j/view

Translated from the English Wiki Neutrality Corpus (WNC) and WikiBias corpora into Indian languages using IndicTrans.
Contains parallel (biased, unbiased) sentence pairs.

8 languages: English (en) plus 7 Indian languages: Hindi (hi), Marathi (mr), Bengali (bn), Gujarati (gu), Tamil (ta), Telugu (te) and Kannada (kn).

Overall, the total number of samples for classification is 287.6K and 280.0K for mWikiBias and mWNC, respectively. To reduce training compute, we took a random sample from the overall bias mitigation data, leading to 39.4K and 39.0K paired samples in mWikiBias and mWNC, respectively.
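
As a rough illustration of how the paired data relates to the two tasks, the sketch below expands (biased, unbiased) pairs into labeled classification examples and draws a random subset of pairs for mitigation training. The function names, the label convention (1 = biased, 0 = unbiased), and the fixed seed are assumptions made for this example; they are not taken from the paper or the released dataset.

```python
import random


def pairs_to_classification(pairs):
    """Expand (biased, unbiased) pairs into (sentence, label) examples.

    Assumed label convention: 1 = biased, 0 = unbiased.
    """
    examples = []
    for biased, unbiased in pairs:
        examples.append((biased, 1))
        examples.append((unbiased, 0))
    return examples


def sample_pairs(pairs, k, seed=0):
    """Draw up to k pairs without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(pairs, min(k, len(pairs)))


# Toy placeholder pairs; real data would come from the dataset files.
toy_pairs = [
    ("biased sentence A", "neutral sentence A"),
    ("biased sentence B", "neutral sentence B"),
    ("biased sentence C", "neutral sentence C"),
]

classification_examples = pairs_to_classification(toy_pairs)
mitigation_subset = sample_pairs(toy_pairs, k=2)
```

Each pair contributes two classification examples, while the mitigation subset keeps pairs intact so a model can learn the biased-to-unbiased rewrite.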

Has the entire dataset been released?
