empty lines #1
Comments
I had the same question when I started playing around with the dataset. The empty lines just mean that those pairs were not annotated by the STS organizers; not all sentence pairs are used in the evaluation. Daniel Cer explains this at https://groups.google.com/d/msg/sts-semeval/js-Y0e92YuM/jJUi5beJBwAJ
To slurp the STS data into an SFrame:
import sframe
# Reads STS2012-2015 dataset.
sts_train = sframe.SFrame.read_csv('sts.csv', delimiter='\t', column_type_hints=[str, str, float, str, str], quote_char='\0')
# Drop the sentence pairs with empty annotations.
sts_train = sts_train.dropna(columns=['Score'])
Take a look at https://github.com/alvations/stasis/blob/master/notebooks/SWORD.ipynb and https://github.com/alvations/stasis/blob/master/notebooks/SHIELD.ipynb for more details =)
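If you don't have sframe installed, the same filtering can be sketched in plain Python. This is only a rough illustration, and it assumes that each gs file holds one score per line, aligned line by line with a tab-separated input file of sentence pairs. The function name load_scored_pairs and the sample sentences and scores are made up for the example; they are not taken from the dataset.

```python
def load_scored_pairs(input_lines, gs_lines):
    """Pair each sentence pair with its gold score, skipping unannotated pairs."""
    pairs = []
    for pair_line, score_line in zip(input_lines, gs_lines):
        score_text = score_line.strip()
        if not score_text:
            # An empty gold-standard line means this pair was not annotated.
            continue
        # Assumed layout: sentence 1 <TAB> sentence 2.
        sent1, sent2 = pair_line.rstrip('\n').split('\t')[:2]
        pairs.append((sent1, sent2, float(score_text)))
    return pairs


# Hypothetical in-memory data standing in for an input file and its gs file.
input_lines = [
    "A man is playing a guitar.\tA man plays the guitar.\n",
    "A dog runs.\tThe stock market fell.\n",
    "Two kids are singing.\tChildren sing.\n",
]
gs_lines = ["4.8\n", "\n", "3.6\n"]

scored = load_scored_pairs(input_lines, gs_lines)
print(len(scored))
```

With real files you would pass the two open file handles instead of the lists, since iterating over a file yields its lines in order.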
There are a lot of empty lines in the gs files, /STS2015-gold/STS.gs.headlines.txt for example.
Do they mean something, or is the label just missing?