Why Machine Reading Comprehension Models Learn Shortcuts?

Repository for the paper 'Why Machine Reading Comprehension Models Learn Shortcuts?', published in Findings of ACL 2021.

arXiv preprint: https://arxiv.org/abs/2106.01024

Data

The synthetic datasets proposed in our paper are in the dataset.zip file.

Size

Dataset	Train (Pairs)	Dev (Pairs)
Question Word Matching	6306	766
Simple Matching	7562	952

Format

Each dataset follows the SQuAD format, with one extra key per question: qtype, which records the version of the question (challenging or shortcut).

{
  "version": "simple_matching_dev_challenging",  		// dataset version
  "data": [
    {
      "paragraphs": [
        {
          "cid": "squad1.1-dev-1_h", 				// context id
          "context": "As this was the 50th Super Bowl...",
          "qas": [
            {
              "id": "56be8e613aeaaa14008c90d2_h", // question id
              "question": "What day was the game played on?",
              "answers": [
                {
                  "text": "February 7, 2016",
                  "answer_start": 505 		// the start position of the answer in the context
                }
              ],
              "qtype": "challenging" 			// question type (challenging/shortcut)
            }
          ]
        }
      ]
    }
  ]
}
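As a minimal sketch of how this format can be consumed, the Python snippet below walks the data / paragraphs / qas nesting, counts questions per qtype, and checks that each answer_start points at the answer text in the context. The helper names (iter_questions, count_qtypes, answer_spans_match), the in-memory sample, and the file path in the comment are illustrative assumptions, not part of this repository. It also assumes that the files in dataset.zip are plain JSON, without the explanatory // comments shown above.

```python
import json
from collections import Counter


def iter_questions(dataset):
    """Yield (context, qa) pairs from a SQuAD-style dictionary."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa


def count_qtypes(dataset):
    """Count questions per qtype value (challenging / shortcut)."""
    return Counter(qa["qtype"] for _, qa in iter_questions(dataset))


def answer_spans_match(dataset):
    """Return True if every answer_start indexes its answer text in the context."""
    for context, qa in iter_questions(dataset):
        for answer in qa["answers"]:
            start = answer["answer_start"]
            end = start + len(answer["text"])
            if context[start:end] != answer["text"]:
                return False
    return True


# Tiny hand-made sample in the documented format (not taken from dataset.zip).
sample = {
    "version": "sample",
    "data": [
        {
            "paragraphs": [
                {
                    "cid": "sample-context-1",
                    "context": "The game was played on February 7, 2016.",
                    "qas": [
                        {
                            "id": "sample-question-1",
                            "question": "What day was the game played on?",
                            "answers": [
                                {"text": "February 7, 2016", "answer_start": 23}
                            ],
                            "qtype": "challenging",
                        }
                    ],
                }
            ]
        }
    ],
}

# To use a real file, replace the sample with something like the following
# (the path is hypothetical; use the actual file name from dataset.zip):
#     with open("path/to/extracted_file.json", encoding="utf-8") as f:
#         sample = json.load(f)

print(count_qtypes(sample))
print(answer_spans_match(sample))
```

Keeping the traversal in a single generator means the qtype count and the span check share the same walk over the nested structure.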
