The Stanford Question Answering Dataset

What is SQuAD?

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With over 100k question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.
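To make the span idea concrete, here is a minimal sketch of how one question-answer pair could be represented and checked. The field names (`context`, `question`, `text`, `answer_start`) are assumed for illustration and may not match the released files exactly. The sample passage and the helper `extract_span` are hypothetical.

```python
# Hypothetical record: the field names are assumptions and the passage is made up.
record = {
    "context": "SQuAD was built from Wikipedia articles by crowdworkers.",
    "question": "Who posed the questions in SQuAD?",
    "answer": {"text": "crowdworkers", "answer_start": 43},
}


def extract_span(context: str, answer: dict) -> str:
    """Return the slice of the passage that the answer offset points to."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    return context[start:end]


# The answer text must appear verbatim in the passage at the stored offset.
span = extract_span(record["context"], record["answer"])
assert span == record["answer"]["text"]
```

Storing a character offset alongside the answer text is one way to disambiguate an answer string that occurs more than once in the passage.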

What does SQuAD look like?

You can browse through v1.1 of the development and training sets.

How do I download the dataset?

You can download v1.1 of the training and development sets.

We also have a hidden test set. To evaluate your models on the test set, please contact us.

What is the best model performance?

Our best model (detailed in our paper) achieves an F1 score of 51.0%. We expect future models to close the gap to the human performance of 86.8%. Note that these results are on v1.0 of the dataset.

Model                                       Dev F1    Test F1
Human Performance                           90.5%     86.8%
Rajpurkar et al. '16 Logistic Regression    51.0%     51.0%
Beat our model? Contact us to get on the leaderboard.
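
The F1 figures above measure the overlap between a predicted answer and a reference answer. A minimal sketch of one common way to compute such a score, token-level F1 over lowercased, whitespace-split tokens, follows. The evaluation behind the reported numbers may normalize text differently (for example, in how it treats punctuation or multiple reference answers), so `token_f1` is a hypothetical helper and not the official scoring script.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two answer strings (assumed, simplified metric)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# A prediction with one extra token gets full recall but only half precision.
score = token_f1("the crowdworkers", "crowdworkers")
```

Unlike exact match, this kind of score gives partial credit when a predicted span overlaps the reference without matching it exactly.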

Have Questions?

Because SQuAD is an ongoing effort, we expect the dataset to evolve.

To keep up to date with major changes to the dataset, please subscribe:

The dataset is distributed under the CC BY-SA 4.0 license.

Ask us questions through Google Groups or at pranavsr@stanford.edu.
