The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With over 100k question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.
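Because every answer is a span of the passage, an answer can be represented by a character offset plus the answer text. The sketch below illustrates that idea; the `SpanAnswer` class, its field names, and the sample passage are hypothetical and are not taken from the released files.

```python
from dataclasses import dataclass


@dataclass
class SpanAnswer:
    """Hypothetical record for a span answer (names are illustrative)."""
    start: int  # character offset of the answer within the passage
    text: str   # the answer string itself


def extract_span(passage: str, answer: SpanAnswer) -> str:
    """Return the slice of the passage that the answer points at."""
    end = answer.start + len(answer.text)
    return passage[answer.start:end]


def is_valid_span(passage: str, answer: SpanAnswer) -> bool:
    """Check that the stored text really occurs at the stored offset."""
    return extract_span(passage, answer) == answer.text


# Illustrative passage, not an actual dataset entry.
passage = (
    "SQuAD consists of questions posed by crowdworkers "
    "on a set of Wikipedia articles."
)
answer_text = "crowdworkers"
answer = SpanAnswer(start=passage.index(answer_text), text=answer_text)

print(is_valid_span(passage, answer))
```

A check like `is_valid_span` is a cheap sanity test to run over a dataset of this kind before training, since a misaligned offset silently corrupts the supervision signal.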
You can browse through v1.1 of the development and training sets.
You can download v1.1 of the training and development sets.
We also have a hidden test set. To evaluate your models on the test set, please contact us.
Our best model (detailed in our paper) achieves an F1 score of 51.0%. We expect future models to close the gap to the human performance of 86.8%. Note that these results were obtained on v1.0 of the dataset.
| Model | Dev F1 | Test F1 |
|---|---|---|
| Rajpurkar et al. '16 Logistic Regression | 51.0% | 51.0% |
| Beat our model? Contact us to get on the leaderboard. | ? | ? |
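The F1 figures above are not defined on this page. A common reading, assumed here, is token-overlap F1 between a predicted answer and a reference answer. The following is a minimal sketch under that assumption; the function name `token_f1` and the simple lowercase-and-split tokenization are illustrative choices, and the official evaluation may normalize text differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two answer strings.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("in the park", "the park"))
```

In this example the prediction shares two tokens with the reference, giving precision 2/3 and recall 1, so partial matches earn partial credit rather than zero.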
Because SQuAD is an ongoing effort, we expect the dataset to evolve.
To keep up to date with major changes to the dataset, please subscribe:
The dataset is distributed under the CC BY-SA 4.0 license.