I've been training a tagger and parser on heavily augmented data and was surprised by the poor evaluation scores (compared with what I calculated manually on the test dataset). I narrowed it down to `Scorer.score_token_attr`.
## How to reproduce the behaviour
Sorry, I don't have my original code handy, but it's pretty straightforward. In this function, every gold token is included in the evaluation (except those with a missing attribute), while on the predicted side whitespace tokens are excluded. If the gold dataset contains whitespace tokens (as mine does), we are not comparing apples to apples, which inflates the error rate. A minimal sketch of what I mean is below.
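Something along these lines should show the mismatch (an untested sketch; the doc construction and tag values are just for illustration):

```python
# Untested reproduction sketch: gold and predicted tags are identical,
# so the score should be 1.0, but the whitespace token is counted on
# the gold side only and drags the score down.
import spacy
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")
words = ["Hello", " ", "world"]
gold = Doc(nlp.vocab, words=words, tags=["UH", "_SP", "NN"])
pred = Doc(nlp.vocab, words=words, tags=["UH", "_SP", "NN"])

example = Example(pred, gold)
print(Scorer.score_token_attr([example], "tag"))
# expected {'tag_acc': 1.0}; per the code below it comes out lower,
# because the gold set contains the whitespace token but the predicted
# set does not
```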
I've just written my own scorer for now, but this behavior is unexpected. Let me know if you want me to change it, and I will open a PR. My fix was to add an `exclude_spaces` parameter, defaulting to the current behavior (i.e. `True`), that either includes or excludes whitespace tokens in both datasets; a sketch of it follows the snippet below.
```python
# line 240 of scorer.py
gold_tags = set()
missing_indices = set()
for gold_i, token in enumerate(gold_doc):
    # no whitespace check here: gold keeps whitespace tokens
    value = getter(token, attr)
    if value not in missing_values:
        gold_tags.add((gold_i, getter(token, attr)))
    else:
        missing_indices.add(gold_i)
pred_tags = set()
for token in pred_doc:
    if token.orth_.isspace():  # HERE: whitespace tokens excluded from pred only
        continue
    if align.x2y.lengths[token.i] == 1:
        gold_i = align.x2y[token.i][0]
        if gold_i not in missing_indices:
            pred_tags.add((gold_i, getter(token, attr)))
```
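For reference, here's roughly what my fix looks like, rewritten against the loop above (`exclude_spaces` is my own name, not an existing spaCy parameter):

```python
# Sketch of the proposed fix; exclude_spaces is a hypothetical parameter.
# True matches today's intent but applies it symmetrically; False keeps
# whitespace tokens in both the gold and predicted docs.
gold_tags = set()
missing_indices = set()
for gold_i, token in enumerate(gold_doc):
    if exclude_spaces and token.orth_.isspace():
        missing_indices.add(gold_i)  # skip it on the pred side as well
        continue
    value = getter(token, attr)
    if value not in missing_values:
        gold_tags.add((gold_i, value))
    else:
        missing_indices.add(gold_i)
pred_tags = set()
for token in pred_doc:
    if exclude_spaces and token.orth_.isspace():
        continue
    if align.x2y.lengths[token.i] == 1:
        gold_i = align.x2y[token.i][0]
        if gold_i not in missing_indices:
            pred_tags.add((gold_i, getter(token, attr)))
```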