Google QUEST Q&A Labeling

Improving automated understanding of complex question answer content

To run the code, first install the experiment-tracking library described as 'A lightweight python library that helps to keep track of numerical experiments'.
You can find the competition data here.

Example of the default bert-base training command from the master branch:

run.py --epochs=5 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=1 --batch_size=8 --warmup=300 --lr=1e-5 --bert_model=bert-base-uncased
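
The length flags in this command appear to be chosen so that they add up: 26 + 260 + 210 = 496 tokens, which leaves 4 positions in the 500-token sequence. That would fit one `[CLS]` and three `[SEP]` tokens, but this is an assumption about how `run.py` packs title, question and answer, not something stated here. A minimal sketch of the arithmetic, using a hypothetical helper name:

```python
# Hypothetical helper (not part of the repository): checks whether the
# per-field token limits fit into the total sequence length.
# ASSUMPTION: 4 special tokens ([CLS] + 3x [SEP]) are added around the fields.
def check_token_budget(max_sequence_length, max_title_length,
                       max_question_length, max_answer_length,
                       num_special_tokens=4):
    used = (max_title_length + max_question_length
            + max_answer_length + num_special_tokens)
    # Return whether the fields fit and how many positions remain unused.
    return used <= max_sequence_length, max_sequence_length - used


fits, spare = check_token_budget(500, 26, 260, 210)
print(fits, spare)  # True 0
```

If you change one of the field limits, a check like this makes it easy to see whether `--max_sequence_length` needs to change too.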

Example of the BART training command from the bart branch:

run.py --epochs=4 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=4 --batch_size=2 --warmup=250 --lr=2e-5 --bert_model=./bart.large
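
Compared with the bert-base command, this one uses a smaller `--batch_size` (2 instead of 8) and a larger `--batch_accumulation` (4 instead of 1). If `--batch_accumulation` is the number of gradient accumulation steps, which is an assumption based on the flag name, both commands have the same effective batch size of 8. The sketch below illustrates the idea on a toy one-parameter model in plain Python; the function names and data are made up for illustration and do not come from the repository:

```python
import math


def grad_mse(w, xs, ys):
    # Gradient of mean((w * x - y) ** 2) with respect to w over one batch.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, batch_size, batch_accumulation):
    # Sum the gradients of several micro-batches, scaling each by
    # 1 / batch_accumulation so the result matches one large batch.
    total = 0.0
    for step in range(batch_accumulation):
        lo = step * batch_size
        hi = lo + batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / batch_accumulation
    return total


# Toy data: 8 samples, i.e. one effective batch.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1]

full = grad_mse(0.3, xs, ys)  # batch_size=8, batch_accumulation=1
accum = accumulated_grad(0.3, xs, ys, batch_size=2, batch_accumulation=4)
print(math.isclose(full, accum))
```

Accumulation trades memory for time: the larger BART model presumably does not fit 8 samples per step, so the gradient is built up over 4 smaller steps before each optimizer update.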

After adding a pseudo-label set (we used a 100k subset from the archive), run:

run.py --epochs=4 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=4 --batch_size=2 --warmup=250 --lr=2e-5 --bert_model=./bart.large --pseudo_file ../input/leak-free-pseudo-100k/pseudo-100k-4x-blend-no-leak-fold-{}.csv.gz --split_pseudo --leak_free_pseudo
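
The `{}` in the `--pseudo_file` path looks like a placeholder for the fold index, so that each fold reads its own pseudo-label file. How `run.py` fills it in is not shown here; the sketch below assumes Python's `str.format`, five folds, and 0-based fold numbering, all of which are guesses for illustration:

```python
# Path template copied from the command above; "{}" is assumed to be
# replaced with the fold index.
pseudo_template = (
    "../input/leak-free-pseudo-100k/"
    "pseudo-100k-4x-blend-no-leak-fold-{}.csv.gz"
)

n_folds = 5  # ASSUMPTION: the real number of folds is not stated here.

# One pseudo-label file per fold (0-based numbering is also an assumption).
paths = [pseudo_template.format(fold) for fold in range(n_folds)]

for path in paths:
    print(path)
```

Keeping a separate file per fold fits the `--leak_free_pseudo` flag name: presumably each fold's pseudo labels come from models that did not see that fold's validation data.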

The monty branch contains code for LM pretraining on StackExchange data.

Read our solution and explanation here.
To be done.