Evaluate using SQuAD 2.0's official evaluation script
Hi,

I loaded deepset/roberta-base-squad2 on SQuAD 2.0 and got really poor performance on no-answer questions. Here's what I did:
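For context, a minimal sketch of how I assume the model is loaded and run (the question/context strings here are made-up examples). One common cause of poor no-answer scores with this setup is that the Hugging Face question-answering pipeline defaults to handle_impossible_answer=False, so it always returns a text span even when the model prefers "no answer":

```python
# Sketch of loading deepset/roberta-base-squad2 with the transformers
# question-answering pipeline (assumed setup, not necessarily what the
# original poster did).
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

result = qa(
    question="Who wrote Hamlet?",
    context="This sentence is about something else entirely.",
    # Allow the empty-string "no answer" prediction; without this flag the
    # pipeline always extracts some span, so every unanswerable question in
    # SQuAD 2.0 is scored as wrong.
    handle_impossible_answer=True,
)
print(result)
```

When the model prefers "no answer", result["answer"] comes back as an empty string, which is exactly what the official SQuAD 2.0 script expects for unanswerable questions.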
I'm quite new to this and am unsure where I might have gone wrong.
I'm wondering whether I should add an extra binary classifier. However, the model card on Hugging Face indicates that deepset/roberta-base-squad2 is already a fine-tuned model, so I assume that simply loading and using it should suffice.

Alternatively, should I also save the prediction scores in step 2, pass them to the evaluation script using the --na-prob-file param, and then set a --na-prob-thresh? If so, what would be the appropriate threshold? I'm trying to determine the threshold that replicates the performance metrics reported on Hugging Face.

I've tried searching for papers, documentation, and issues related to this but haven't found anything conclusive. I feel like I might be missing some basic understanding here. Could anyone offer some guidance?
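As I understand it, the official evaluation script's --na-prob-thresh (default 1.0) works by treating any question whose no-answer probability exceeds the threshold as unanswered; when you pass --na-prob-file, the script also reports best_exact_thresh and best_f1_thresh, which may be the threshold values being asked about. A minimal sketch of that thresholding logic (qids and scores below are hypothetical examples, not real SQuAD data):

```python
# Sketch of how --na-prob-thresh is applied: predictions whose no-answer
# probability exceeds the threshold are blanked out (empty string = "no
# answer" in SQuAD 2.0 format).
import json

def apply_na_threshold(preds, na_probs, na_prob_thresh):
    """Blank out predictions whose no-answer probability exceeds the threshold."""
    return {
        qid: ("" if na_probs[qid] > na_prob_thresh else answer)
        for qid, answer in preds.items()
    }

# Hypothetical predictions and no-answer scores (e.g. saved in step 2).
preds = {"q1": "Shakespeare", "q2": "Paris"}
na_probs = {"q1": 0.1, "q2": 0.9}

# With a 0.5 threshold, q2 is treated as a no-answer question.
adjusted = apply_na_threshold(preds, na_probs, na_prob_thresh=0.5)
print(adjusted)  # {'q1': 'Shakespeare', 'q2': ''}

# The two JSON files the script consumes: the predictions file and the
# file passed via --na-prob-file.
with open("predictions.json", "w") as f:
    json.dump(preds, f)
with open("na_probs.json", "w") as f:
    json.dump(na_probs, f)
```

Note the default threshold of 1.0 means no prediction is ever blanked unless you lower it, which matches the symptom of poor no-answer performance when scores aren't thresholded.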