I've been training a tagger and parser on heavily augmented data and was surprised by the poor evaluation scores (compared with what I calculated manually on the test dataset). I narrowed it down to `Scorer.score_token_attr`.
## How to reproduce the behaviour
Sorry, I don't have my original code handy, but it's pretty straightforward. In this function, every gold token is included in the evaluation (except those with a missing attribute), while on the predicted side whitespace tokens are excluded. If the gold dataset contains whitespace tokens (as mine does), we are not comparing apples to apples, which inflates the error rate. A minimal sketch of what I mean is below.
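Something along these lines should show the mismatch (an untested sketch; the doc construction and tag values are just for illustration):

```python
# Untested reproduction sketch: gold and predicted tags are identical,
# so the score should be 1.0, but the whitespace token is counted on
# the gold side only and drags the score down.
import spacy
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")
words = ["Hello", " ", "world"]
gold = Doc(nlp.vocab, words=words, tags=["UH", "_SP", "NN"])
pred = Doc(nlp.vocab, words=words, tags=["UH", "_SP", "NN"])

example = Example(pred, gold)
print(Scorer.score_token_attr([example], "tag"))
# expected {'tag_acc': 1.0}; per the code below it comes out lower,
# because the gold set contains the whitespace token but the predicted
# set does not
```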
I've just written my own scorer for now, but this behavior is unexpected. Let me know if you want me to change it, and I will open a PR. My fix was to add an `exclude_spaces` parameter, defaulting to the current behavior (i.e. `True`), that either includes or excludes whitespace tokens in both datasets; a sketch of it follows the snippet below.
```python
# line 240 of scorer.py
gold_tags = set()
missing_indices = set()
for gold_i, token in enumerate(gold_doc):
    # no whitespace check here: gold keeps whitespace tokens
    value = getter(token, attr)
    if value not in missing_values:
        gold_tags.add((gold_i, getter(token, attr)))
    else:
        missing_indices.add(gold_i)
pred_tags = set()
for token in pred_doc:
    if token.orth_.isspace():  # HERE: whitespace tokens excluded from pred only
        continue
    if align.x2y.lengths[token.i] == 1:
        gold_i = align.x2y[token.i][0]
        if gold_i not in missing_indices:
            pred_tags.add((gold_i, getter(token, attr)))
```
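For reference, here's roughly what my fix looks like, rewritten against the loop above (`exclude_spaces` is my own name, not an existing spaCy parameter):

```python
# Sketch of the proposed fix; exclude_spaces is a hypothetical parameter.
# True matches today's intent but applies it symmetrically; False keeps
# whitespace tokens in both the gold and predicted docs.
gold_tags = set()
missing_indices = set()
for gold_i, token in enumerate(gold_doc):
    if exclude_spaces and token.orth_.isspace():
        missing_indices.add(gold_i)  # skip it on the pred side as well
        continue
    value = getter(token, attr)
    if value not in missing_values:
        gold_tags.add((gold_i, value))
    else:
        missing_indices.add(gold_i)
pred_tags = set()
for token in pred_doc:
    if exclude_spaces and token.orth_.isspace():
        continue
    if align.x2y.lengths[token.i] == 1:
        gold_i = align.x2y[token.i][0]
        if gold_i not in missing_indices:
            pred_tags.add((gold_i, getter(token, attr)))
```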