Fine-Tuning PromptNode for question generation #6511
Unanswered
demongolem-biz2 asked this question in Questions
Replies: 1 comment
-
Hello @demongolem-biz2, I suggest that you have a look at existing training datasets for the question generation task. For example, you can find an overview here: https://link.springer.com/article/10.1007/s13748-023-00295-9#Sec18 and a table here: https://link.springer.com/article/10.1007/s13748-023-00295-9/tables/9
-
So I have code to fine-tune google/flan-t5. All that is left is to define the dataset I will use to fine-tune it for question generation.
Before I start a fine-tuning run that may take hours, I want to be 100% clear on one point: what format must the dataset I supply follow? The normal question generation task takes a collection of Document objects as input and outputs one or more questions for each Document, depending on the parameters.
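The input/output shape described here can be sketched as a flattening step: each Document with one or more gold questions becomes one training row per question. This is a minimal sketch, not a Haystack API; `SourceDoc`, `flatten`, and the column names `context` and `question` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceDoc:
    # Hypothetical stand-in for a Haystack Document: only the text is used here.
    content: str
    # Gold questions written for this document (one or more).
    questions: List[str] = field(default_factory=list)


def flatten(docs: List[SourceDoc]) -> List[Dict[str, str]]:
    """Turn each (document, [questions]) pair into one row per gold question."""
    rows = []
    for doc in docs:
        for q in doc.questions:
            rows.append({"context": doc.content, "question": q})
    return rows


docs = [
    SourceDoc("Paris is the capital of France.", ["What is the capital of France?"]),
    SourceDoc(
        "The Nile flows north into the Mediterranean Sea.",
        ["Which direction does the Nile flow?", "Where does the Nile end?"],
    ),
]
rows = flatten(docs)
print(len(rows))  # 3
```

Repeating the document text once per question keeps every row a simple (input, target) pair, which is what a seq2seq model like flan-t5 is trained on.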
My intuition tells me my dataset needs one column for the content (the text of the Document) and one column for a question, the gold output. Perhaps the content column should hold an instruction describing the task, concatenated with the document's content. But what should I name these columns so that PromptNode can work with them? I have seen an example (outside of Haystack) of fine-tuning on the samsum dataset, a summarization problem. There, the input is a conversation and the output is a summary, and I get the sense that the column names matter in whatever dataset is used.
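In the usual Hugging Face seq2seq recipe (like the samsum example), the column names are read only by your own preprocessing function, which turns each row into a source string for the encoder and a target string for the labels. As far as I can tell, PromptNode only runs inference and never reads a training dataset, so the names matter only to the fine-tuning script. The sketch below assumes a hypothetical instruction wording and hypothetical column names; whatever instruction you train with should presumably be reused verbatim in the prompt you later give PromptNode.

```python
import json
from typing import Dict, List

# Hypothetical instruction; reuse the exact same wording at inference time.
INSTRUCTION = "Generate a question about the following text:\n\n"


def to_seq2seq(rows: List[Dict[str, str]], input_col: str = "context",
               target_col: str = "question") -> List[Dict[str, str]]:
    """Map arbitrary column names to the (source, target) strings the model sees."""
    return [
        {"source": INSTRUCTION + row[input_col], "target": row[target_col]}
        for row in rows
    ]


def to_jsonl(rows: List[Dict[str, str]]) -> str:
    """Serialize rows as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)


# Column names here are deliberately different from the defaults:
# they only need to match what the preprocessing function is told to read.
rows = [{"passage": "Paris is the capital of France.",
         "gold": "What is the capital of France?"}]
pairs = to_seq2seq(rows, input_col="passage", target_col="gold")
print(to_jsonl(pairs))
```

In a transformers-based training script, `source` would typically be passed to `tokenizer(...)` to build the encoder inputs and `target` to `tokenizer(text_target=...)` to build the labels.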