The Stanford Question Answering Dataset

What is SQuAD?

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With over 100k question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.
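
To make the idea of a span answer concrete, here is a minimal sketch of how one question-answer pair could be represented. The passage, question, class name, and field names are all hypothetical and chosen for illustration; they are not taken from the dataset or its file format.

```python
from dataclasses import dataclass


@dataclass
class SpanAnswer:
    """A hypothetical record: the answer text plus where it starts in the passage."""
    text: str
    start: int  # character offset into the passage (an assumed convention)


def span_matches(passage: str, answer: SpanAnswer) -> bool:
    """Check that the answer really is a contiguous segment of the passage."""
    end = answer.start + len(answer.text)
    return passage[answer.start:end] == answer.text


# Illustrative passage and question, not real SQuAD data.
passage = "The quick brown fox jumps over the lazy dog."
question = "What does the fox jump over?"

answer_text = "the lazy dog"
answer = SpanAnswer(text=answer_text, start=passage.find(answer_text))

is_valid_span = span_matches(passage, answer)
```

Storing an offset alongside the text is one way to keep the answer unambiguous when the same words occur more than once in a passage.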

What does SQuAD look like?

How do I download the dataset?

We also have a hidden test set. To evaluate your models on the test set, please get in contact with us.

What is the best model performance?

Our best model (detailed in our paper) achieves an F1 score of 51.0%. We expect future models to close the gap to the human performance of 86.8%. Note that these results are on v1.0 of the dataset.

Model                                       Dev F1    Test F1
Human Performance                           90.5%     86.8%
Rajpurkar et al. '16 Logistic Regression    51.0%     51.0%

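
This page does not define how F1 is computed. As a rough sketch, a common formulation for span answers treats the prediction and the reference as bags of tokens and takes the harmonic mean of token precision and recall. The function name and the lowercase, whitespace-split tokenization below are assumptions for illustration and may differ from the evaluation actually used for the numbers above.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span.

    Assumes simple lowercasing and whitespace tokenization.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


exact = token_f1("the lazy dog", "the lazy dog")
partial = token_f1("lazy dog", "the lazy dog")
miss = token_f1("a cat", "the lazy dog")
```

Under these assumptions an exact match scores 1.0, a prediction that drops one of three reference tokens scores 0.8, and a prediction with no shared tokens scores 0.0, so partially correct spans earn partial credit.
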
Have Questions?

Because SQuAD is an ongoing effort, we expect the dataset to evolve.

The dataset is distributed under the CC BY-SA 4.0 license.

Ask us questions through Google Groups or at pranavsr@stanford.edu.

