Hello, I was looking into the human annotations for the USS dataset and noticed that different conversations are annotated by different numbers of annotators. May I know the reason for this, and how the number of utterances with each annotation rating in the table was calculated?
The final score of an utterance is determined by the majority annotation. For example, a data sample with annotations (3, 3, 4) translates to a score of 3, as 3 is the majority annotation.
We collect additional labels for an entire conversation when the initial annotators are inconsistent (i.e., no majority can be determined). As a result, some data samples may receive more than 3 annotations.
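For readers who want to reproduce the scoring, here is a minimal sketch of the majority rule described above. It is not the authors' script: the function name `majority_score` is hypothetical, and it assumes "majority" means the single most frequent rating, with a tie for first place treated as "unable to determine majority".

```python
from collections import Counter
from typing import Optional, Sequence


def majority_score(annotations: Sequence[int]) -> Optional[int]:
    """Return the most frequent rating, or None if the top count is tied.

    Assumption: "majority" is read as the unique most frequent rating.
    """
    if not annotations:
        return None
    counts = Counter(annotations).most_common()
    # A tie between the two most frequent ratings means no majority exists,
    # which is the case that would call for additional annotation.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


print(majority_score([3, 3, 4]))  # 3
print(majority_score([1, 2, 3]))  # None
```

Under this reading, a `None` result marks a sample whose conversation would need extra labels before a final score can be assigned.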
Thanks for the clarification! In that case, take the first conversation from MWOZ, which has labels 3, 3, 2: why did you add another annotator even though the majority vote was already clear?
Thanks for your question! We added some additional annotations to the data labeled by a select group of outlier annotators, i.e., those whose final annotation score distribution seems inconsistent with that of most annotators, for example by being more likely to give high scores. We did not remove these outlier annotators because dialogue evaluation involves some degree of subjectivity.