Human in The Loop: Making the Most Out of Automated Question Answering Annotation #3176
-
Hello! In this article https://www.deepset.ai/blog/generate-questions-automatically-for-faster-annotation, in the section "Human in The Loop: Making the Most Out of Automated Question Answering Annotation", there is a quote: "You can then retrain your language model with the manually amended dataset, to make sure the model doesn't repeat the same mistakes."

I want to first get a list of questions for my text using QuestionGenerationPipeline. After that, I correct this list of questions manually (1. keep the valuable questions, 2. delete the bad questions, 3. add other valuable questions), add the resulting list of questions to the training dataset, and then retrain the question generation model valhalla/t5-base-e2e-qg, which is listed here: https://haystack.deepset.ai/reference/question-generator

What is the best way to train the valhalla/t5-base-e2e-qg model used by your question_generator module with a list of questions produced by the QuestionGenerationPipeline and corrected manually?

Please also tell me if I understood correctly: does the squad_multitask dataset contain question-answer pairs?

To summarize: I have a text. The model generated a set of questions for me. I manually corrected that set and now want to retrain the valhalla/t5-small-e2e-qg model. I have the first list of questions that the model produced and the new list that I got by manually improving the first one. How can I combine these two lists of questions (without answers) and use them as a dataset to retrain the valhalla/t5-small-e2e-qg model? I didn't find such an example in your tutorial https://haystack.deepset.ai/tutorials/question-generation

Please show some example code. Thanks a lot in advance.
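(For reference, a minimal sketch of the first step described above, generating the initial question list and dumping it to a file for manual correction. It assumes the Haystack v1 API from the question generation tutorial; the text and file name are placeholders.)

```python
# Minimal sketch (assumed Haystack v1 API): generate an initial question list for a text
# and write it to a file for manual review (keep good questions, delete bad ones, add new ones).
import json

from haystack import Document
from haystack.nodes import QuestionGenerator
from haystack.pipelines import QuestionGenerationPipeline

text = "Python is a programming language created by Guido van Rossum."  # placeholder text

question_generator = QuestionGenerator(model_name_or_path="valhalla/t5-base-e2e-qg")
pipeline = QuestionGenerationPipeline(question_generator)

result = pipeline.run(documents=[Document(content=text)])

# The pipeline output carries the generated questions per document under
# "generated_questions"; write them out for manual correction.
with open("questions_to_review.json", "w") as f:
    json.dump(result["generated_questions"], f, indent=2)
```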
-
Hey @SergiyBarskyy,
If I understood you correctly, you want to adapt/fine-tune the valhalla/t5-small-e2e-qg question generation model on your data?
If so, the author of the model provided the details at https://github.com/patil-suraj/question_generation#fine-tuning
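For concreteness, here is a rough sketch of what that fine-tuning could look like with plain Hugging Face transformers, outside of Haystack. The "generate questions: " input prefix and the "<sep>" separator are assumptions based on the format described in the patil-suraj/question_generation repo, so please verify them there; the example data and output directory are placeholders. Note that a seq2seq question generator maps a passage to its questions, so the training set needs the source passages paired with your corrected questions, not just the two question lists on their own.

```python
# Hedged sketch: fine-tune valhalla/t5-small-e2e-qg on (context, corrected questions) pairs.
# The "generate questions: " prefix and "<sep>" separator are assumed from the
# patil-suraj/question_generation repo -- double-check there before training.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "valhalla/t5-small-e2e-qg"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One entry per passage: the passage and the manually corrected questions you want the
# model to generate for it (placeholder data).
examples = [
    {
        "context": "Python is a programming language created by Guido van Rossum.",
        "questions": ["Who created Python?", "What is Python?"],
    },
]

def to_features(example):
    source = "generate questions: " + example["context"]
    target = " <sep> ".join(example["questions"])
    model_inputs = tokenizer(source, max_length=512, truncation=True)
    labels = tokenizer(text_target=target, max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_dataset = Dataset.from_list(examples).map(
    to_features, remove_columns=["context", "questions"]
)

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-e2e-qg-finetuned",  # placeholder directory
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=1e-4,
    logging_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model("t5-small-e2e-qg-finetuned")
```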
Fine-tuning such a model is out of scope for Haystack, but once you have trained it, you can use it in Haystack 👍
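As a short sketch of that last step (assuming the Haystack v1 QuestionGenerator API and the placeholder output directory used above):

```python
# Point Haystack's QuestionGenerator at the fine-tuned model directory (name is a placeholder).
from haystack.nodes import QuestionGenerator
from haystack.pipelines import QuestionGenerationPipeline

question_generator = QuestionGenerator(model_name_or_path="t5-small-e2e-qg-finetuned")
pipeline = QuestionGenerationPipeline(question_generator)
```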