Add GLUE datasets #26
Hi, I am a Belgian student in computer engineering, and I am taking an introductory course about open source. One of my goals this semester is to contribute to a project. My master's thesis will be related to NLP, which is why this project interests me. Is there a way I could help fix this issue (or maybe another issue related to this project)?
Hi there! Yeah, please fix this issue! GLUE datasets are a popular suite of datasets for evaluating NLP models, and it'd be nice if there was support for them. This issue should be an easy one to get started with. Recently, I was in Belgium for EMNLP 2018, one of the best NLP conferences in the world.
Hey, too bad I missed EMNLP! This is my first year working on NLP, and I had never heard of these conferences. I hope I'll be able to go next year.
Yeah, that'd work!
Hi, the first line of each downloaded file lists the names of the features (columns) in the TSV file. In SNLI's 'train.tsv' file, for example, there should be 11 features per line. However, many lines (38,656 in total) contain more than 10 tabs, and therefore more than 11 features. For the moment I decided not to add those lines to the Dataset object, but I know this is not the right approach. I've searched online for an explanation of those lines, but there is not much documentation about QQP and SNLI. Do you know what I should do? Or should I add my file to the project and open a new issue? Someone who has already worked with these datasets should be able to fix it easily. Thanks.
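Rather than silently dropping the rows with too many tabs, one option is to set them aside so they can be counted and inspected later. Below is a minimal sketch using Python's standard `csv` module. It assumes the first line of the file is the header (as described above). `read_tsv_strict` is a hypothetical helper name and is not part of this project's API. When reading a real file, open it with `open(path, newline="", encoding="utf-8")`, as the `csv` documentation recommends.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def read_tsv_strict(lines: Iterable[str]) -> Tuple[List[Dict[str, str]], List[List[str]]]:
    """Parse GLUE-style TSV lines (hypothetical helper, for illustration only).

    Returns (rows, malformed): `rows` are dicts keyed by the header names,
    and `malformed` holds the raw field lists whose length differs from
    the header, so they can be inspected instead of being lost.
    """
    # QUOTE_NONE makes the reader treat quote characters as plain text,
    # so a stray '"' inside a sentence is not parsed as a quoted field.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)  # assumption: first line holds the feature names
    rows: List[Dict[str, str]] = []
    malformed: List[List[str]] = []
    for fields in reader:
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
        else:
            malformed.append(fields)  # keep the bad row for later inspection
    return rows, malformed


if __name__ == "__main__":
    # Tiny in-memory sample: the last data row has one tab too many.
    sample = io.StringIO(
        "id\tsentence1\tsentence2\tlabel\n"
        "0\tA man sleeps.\tSomeone rests.\tentailment\n"
        "1\tA dog\truns.\tA cat sits.\tneutral\n"
    )
    rows, malformed = read_tsv_strict(sample)
    print(len(rows), len(malformed))  # 1 1
```

Returning the malformed rows, rather than raising or discarding them, makes it easy to report how many lines were affected (like the 38,656 mentioned above) and to examine a few of them before deciding how to handle them.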
Thanks for your attempt at contributing this function: #60 :)
Hey! I want to give this a try. Is there still a way I can do it? It seems like it's too late to contribute to this project.
GLUE datasets are standard for evaluating NLU tasks.