Translated from Wiki Neutrality Corpus and WikiBias English corpora to Indian languages using IndicTrans
Contains parallel (biased, unbiased) sentence pairs
8 languages: English (en) plus seven Indian languages: Hindi (hi), Marathi (mr), Bengali (bn), Gujarati (gu), Tamil (ta), Telugu (te) and Kannada (kn).
Overall, the total number of samples for classification is 287.6K for mWikiBias and 280.0K for mWNC. To reduce training compute, we took a random sample from the overall bias mitigation data, leading to 39.4K and 39.0K paired samples in mWikiBias and mWNC, respectively.
Has the entire dataset been released?
Paper: https://arxiv.org/abs/2312.15181
Dataset: https://drive.google.com/file/d/169-yw7fKC-qB_wJ8Cwv8RusN67Uv7R7j/view