SQuAD

The Stanford Question Answering Dataset

What is SQuAD?

Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.

Explore SQuAD and model predictions
Read the paper (Rajpurkar et al., 2016)

Getting Started

We've built a few resources to help you get started with the dataset.

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):
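As a quick sanity check after downloading, you can load the JSON and inspect one example. The snippet below is a minimal sketch that assumes the dev set was saved locally as dev-v1.1.json; articles nest paragraphs, each paragraph has a context passage and question-answer pairs, and every answer is a span of the context.

    import json

    # Load the dev set (assumes the file was downloaded as dev-v1.1.json).
    with open("dev-v1.1.json") as f:
        squad = json.load(f)

    # Each article contains paragraphs; each paragraph has a context passage
    # and a list of question-answer pairs whose answers are spans of that context.
    article = squad["data"][0]
    paragraph = article["paragraphs"][0]
    qa = paragraph["qas"][0]
    answer = qa["answers"][0]

    print("Title:   ", article["title"])
    print("Question:", qa["question"])
    print("Answer:  ", answer["text"])
    # answer_start is the character offset of the answer span within the context.
    start = answer["answer_start"]
    print("Span:    ", paragraph["context"][start:start + len(answer["text"])])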

To evaluate your models, we have also made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script will take as input. To run the evaluation, use python evaluate-v1.1.py <path_to_dev-v1.1> <path_to_predictions>.
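The prediction file is a JSON object mapping each question id in the dev set to a single predicted answer string. As a rough sketch, assuming your model is wrapped in a hypothetical predict(question, context) function, you could generate such a file like this:

    import json

    # Build {question_id: predicted_answer_string} for every question in the dev set.
    # predict(question, context) is a placeholder for your own model.
    with open("dev-v1.1.json") as f:
        dataset = json.load(f)["data"]

    predictions = {}
    for article in dataset:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                predictions[qa["id"]] = predict(qa["question"], paragraph["context"])

    with open("predictions.json", "w") as f:
        json.dump(predictions, f)

Running python evaluate-v1.1.py dev-v1.1.json predictions.json should then print the Exact Match and F1 scores as a small JSON object.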

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through official evaluation of your model:

Submission Tutorial

Because SQuAD is an ongoing effort, we expect the dataset to evolve.

To keep up to date with major changes to the dataset, please subscribe.

Have Questions?

Ask us questions at our Google group or at pranavsr@stanford.edu.


Leaderboard

Since the release of our dataset, the community has made rapid progress! Here are the Exact Match (EM) and F1 scores of the best models evaluated on the test and development sets of v1.1. Will your model outperform humans on the QA task?
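For reference, both metrics are computed per question against all reference answers (taking the maximum) after light string normalization. The sketch below mirrors the logic of the official evaluation script: Exact Match checks string equality after normalization, and F1 measures token-level overlap between prediction and reference.

    import re, string
    from collections import Counter

    def normalize_answer(s):
        # Lowercase, strip punctuation and articles (a/an/the), and collapse whitespace.
        s = s.lower()
        s = "".join(ch for ch in s if ch not in set(string.punctuation))
        s = re.sub(r"\b(a|an|the)\b", " ", s)
        return " ".join(s.split())

    def exact_match(prediction, ground_truth):
        return normalize_answer(prediction) == normalize_answer(ground_truth)

    def f1_score(prediction, ground_truth):
        pred_tokens = normalize_answer(prediction).split()
        gold_tokens = normalize_answer(ground_truth).split()
        common = Counter(pred_tokens) & Counter(gold_tokens)
        num_same = sum(common.values())
        if num_same == 0:
            return 0.0
        precision = num_same / len(pred_tokens)
        recall = num_same / len(gold_tokens)
        return 2 * precision * recall / (precision + recall)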

Rank | Model | Institution | EM | F1
1 | r-net (ensemble) | Microsoft Research Asia | 75.863 | 82.947
2 | ReasoNet (ensemble) | MSR Redmond | 73.419 | 81.752
2 | Multi-Perspective Matching (diversity-ensemble) | IBM Research (https://arxiv.org/abs/1612.04211) | 73.765 | 81.257
3 | BiDAF (ensemble) | Allen Institute for AI & University of Washington (https://arxiv.org/abs/1611.01603) | 73.314 | 81.089
4 | Dynamic Coattention Networks (ensemble) | Salesforce Research (https://arxiv.org/abs/1611.01604) | 71.625 | 80.383
5 | r-net (single model) | Microsoft Research Asia | 71.258 | 79.66
6 | Document Reader (single model) | Facebook AI Research | 69.967 | 78.974
7 | ReasoNet (single model) | MSR Redmond | 69.107 | 78.895
7 | FastQAExt | German Research Center for Artificial Intelligence | 70.849 | 78.857
8 | Multi-Perspective Matching (single model) | IBM Research (https://arxiv.org/abs/1612.04211) | 68.877 | 77.771
9 | jNet (single model) | USTC & National Research Council Canada & York University | 68.73 | 77.393
10 | BiDAF (single model) | Allen Institute for AI & University of Washington (https://arxiv.org/abs/1611.01603) | 67.974 | 77.323
10 | FastQA | German Research Center for Artificial Intelligence | 68.436 | 77.07
11 | Match-LSTM with Ans-Ptr (Boundary+Ensemble) | Singapore Management University (https://arxiv.org/abs/1608.07905) | 67.901 | 77.022
12 | Iterative Co-attention Network | Fudan University | 67.502 | 76.786
13 | Dynamic Coattention Networks (single model) | Salesforce Research (https://arxiv.org/abs/1611.01604) | 66.233 | 75.896
13 | RaSoR | Google NY, Tel-Aviv University (https://arxiv.org/abs/1611.01436) | 67.387 | 75.543
14 | Match-LSTM with Bi-Ans-Ptr (Boundary) | Singapore Management University (https://arxiv.org/abs/1608.07905) | 64.744 | 73.743
15 | Attentive CNN context with LSTM | NLPR, CASIA | 63.306 | 73.463
16 | Fine-Grained Gating | Carnegie Mellon University (https://arxiv.org/abs/1611.01724) | 62.446 | 73.327
16 | Dynamic Chunk Reader | IBM (https://arxiv.org/abs/1610.09996) | 62.499 | 70.956
17 | Match-LSTM with Ans-Ptr (Boundary) | Singapore Management University (https://arxiv.org/abs/1608.07905) | 60.474 | 70.695
18 | Match-LSTM with Ans-Ptr (Sentence) | Singapore Management University (https://arxiv.org/abs/1608.07905) | 54.505 | 67.748
- | Human Performance | Stanford University (Rajpurkar et al. '16) | 82.304 | 91.221