Fine-Tuning PromptNode for question generation #6511

demongolem-biz2 · 2023-12-07T17:04:43Z

So I have code to fine-tune google/flan-t5. I just have to define my dataset now to fine-tune for the purpose of question generation.

Before I start a fine-tuning run that may take hours, I want to be 100% clear on one point: what interface must the dataset I supply follow? The normal question generation task takes a collection of Document objects as input and outputs one or more questions for each Document, depending on parameters.

My intuition tells me my dataset must have one column for content (the text of the Document) and one column for the question, the gold output. Perhaps the content column should hold an instruction describing the task, concatenated with the document's content. But what should I name these columns so that PromptNode can work with them? I have seen an example (outside of Haystack) of fine-tuning on the samsum dataset, a summarization problem. It has a conversation input and a summary output, and I get the sense that the column names matter in whatever dataset is used.

julian-risch (Maintainer) · 2023-12-07T17:55:24Z

Hello @demongolem-biz2, I suggest that you have a look at existing training datasets for the question generation task. For example, you can find an overview here: https://link.springer.com/article/10.1007/s13748-023-00295-9#Sec18 and a table here: https://link.springer.com/article/10.1007/s13748-023-00295-9/tables/9
A typical training data sample is a triple of context (the text of the document), prompt (instructions such as how hard the question should be), and the question.
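
To make the triple concrete, here is a minimal sketch of one such sample and one way to flatten it into an input/target pair for a sequence-to-sequence model such as flan-t5. The field names (`context`, `prompt`, `question`, `input_text`, `target_text`), the helper `to_seq2seq_pair`, and the example text are all assumptions made for illustration. Nothing in this thread states which names PromptNode or a given fine-tuning script expects, so adapt them to whatever your training code reads.

```python
import json

# Hypothetical field names; the thread does not prescribe any.
samples = [
    {
        "context": "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
        "prompt": "Generate an easy question that the text answers.",
        "question": "When was the Eiffel Tower completed?",
    },
]


def to_seq2seq_pair(sample):
    """Join instruction and document text into one input; the gold question is the target."""
    source = f"{sample['prompt']}\n\nText: {sample['context']}"
    return {"input_text": source, "target_text": sample["question"]}


pairs = [to_seq2seq_pair(s) for s in samples]

# One JSON object per line (JSONL) is a common on-disk layout for fine-tuning data.
jsonl = "\n".join(json.dumps(p) for p in pairs)
print(jsonl)
```

Keeping the instruction in its own field until the last step makes it easy to vary the prompt (for example, question difficulty) without touching the contexts or gold questions.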
