SQuAD

The Stanford Question Answering Dataset

What is SQuAD?

Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.
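Concretely, the dataset is distributed as JSON: articles contain paragraphs, each paragraph carries a passage of text and a list of question-answer pairs, and each answer is given as a character-offset span into the passage. The snippet below is a minimal sketch of that layout using a made-up record (the field names match the real v1.1 files; the content here is illustrative only):

```python
# A minimal record in SQuAD v1.1's JSON layout. The real files follow the
# same schema, just with 100,000+ QA pairs over 500+ articles.
squad_like = {
    "version": "1.1",
    "data": [{
        "title": "Example_Article",
        "paragraphs": [{
            "context": "SQuAD was released by Stanford in 2016.",
            "qas": [{
                "id": "0001",
                "question": "Who released SQuAD?",
                # Answers are spans: a text string plus its character
                # offset into the paragraph's context.
                "answers": [{"text": "Stanford", "answer_start": 22}],
            }],
        }],
    }],
}

# Recover the answer span from the passage by offset.
para = squad_like["data"][0]["paragraphs"][0]
ans = para["qas"][0]["answers"][0]
start = ans["answer_start"]
span = para["context"][start: start + len(ans["text"])]
print(span)  # Stanford
```

Because every answer is a span of the passage, models can be trained and evaluated by predicting start/end positions rather than generating free-form text.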

Explore SQuAD and model predictions, or read the paper (Rajpurkar et al. '16).


Getting Started

We've built a few resources to help you get started with the dataset.

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

To evaluate your models, we have also made available the evaluation script we use for official evaluation, along with a sample prediction file that the script takes as input. To run the evaluation, use python evaluate-v1.1.py <path_to_dev-v1.1> <path_to_predictions>.
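The two metrics the script reports can be sketched compactly. Exact Match (EM) checks whether a prediction equals a reference answer after normalization (lowercasing and stripping punctuation, articles, and extra whitespace); F1 is the token-level overlap between prediction and reference. This is a rough re-implementation for illustration, not the official script itself:

```python
import re
import string
from collections import Counter

def normalize(s):
    """Lowercase; strip punctuation, articles (a/an/the), and extra whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """EM: 1 if the normalized strings are identical, else 0."""
    return normalize(prediction) == normalize(ground_truth)

def f1(prediction, ground_truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    gt_tokens = normalize(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Normans", "Normans"))  # True: articles are stripped
print(round(f1("Norman conquest", "the Norman conquest of England"), 3))
```

The prediction file the script consumes is a JSON object mapping each question id to the predicted answer string, so generating one from your model's output is a single `json.dump` call.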

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through official evaluation of your model:

Submission Tutorial

Because SQuAD is an ongoing effort, we expect the dataset to evolve.

To keep up to date with major changes to the dataset, please subscribe.

Have Questions?

Ask us questions at our Google group or at pranavsr@stanford.edu.

Leaderboard

Since the release of our dataset, the community has made rapid progress! Here are the Exact Match (EM) and F1 scores of the best models evaluated on the test and development sets of v1.1. Will your model outperform humans on the QA task?
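One detail behind the leaderboard numbers: each dev/test question has several human-written reference answers, a prediction is scored against the best-matching reference, and the per-question scores are averaged and reported as percentages. A minimal sketch of that aggregation, using a simple case-insensitive exact match as a stand-in metric (the real scoring also normalizes punctuation and articles):

```python
def best_over_refs(metric, prediction, references):
    """Score a prediction against its best-matching reference answer."""
    return max(metric(prediction, ref) for ref in references)

# Stand-in metric for illustration: case-insensitive exact match.
def em(pred, ref):
    return float(pred.strip().lower() == ref.strip().lower())

# Hypothetical (prediction, references) pairs, not real SQuAD data.
examples = [
    ("Stanford", ["Stanford", "Stanford University"]),
    ("1912", ["in 1912", "1912"]),
    ("blue", ["red"]),
]

# Average over questions, reported as a percentage (as on the leaderboard).
score = 100.0 * sum(best_over_refs(em, p, refs) for p, refs in examples) / len(examples)
print(score)  # 66.66666666666667
```

Scoring against the best reference is what makes the metrics robust to legitimate variation in how annotators phrased the same answer.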

Rank | Submitted | Model | Institution | Paper | EM | F1
---- | --------- | ----- | ----------- | ----- | ------ | ------
1 | 14 days ago | r-net (ensemble) | Microsoft Research Asia | – | 76.922 | 84.006
2 | 17 days ago | ReasoNet (ensemble) | MSR Redmond | – | 75.034 | 82.552
3 | a month ago | BiDAF (ensemble) | Allen Institute for AI & University of Washington | https://arxiv.org/abs/1611.01603 | 73.744 | 81.525
3 | 2 months ago | Multi-Perspective Matching (diversity-ensemble) | IBM Research | https://arxiv.org/abs/1612.04211 | 73.765 | 81.257
4 | 16 days ago | r-net (single model) | Microsoft Research Asia | – | 72.401 | 80.751
5 | 5 months ago | Dynamic Coattention Networks (ensemble) | Salesforce Research | https://arxiv.org/abs/1611.01604 | 71.625 | 80.383
6 | 15 hours ago | jNet (single model) | USTC & National Research Council Canada & York University | https://arxiv.org/abs/1703.04617 | 70.607 | 79.821
7 | 10 days ago | Ruminate Reader (single model) | New York University | – | 70.586 | 79.492
8 | 18 days ago | ReasoNet (single model) | MSR Redmond | – | 70.555 | 79.364
8 | 12 days ago | Document Reader (single model) | Facebook AI Research | – | 70.733 | 79.353
8 | 3 months ago | FastQAExt | German Research Center for Artificial Intelligence | https://arxiv.org/abs/1703.04816 | 70.849 | 78.857
9 | 2 months ago | Multi-Perspective Matching (single model) | IBM Research | https://arxiv.org/abs/1612.04211 | 68.877 | 77.771
9 | 11 days ago | RaSoR (single model) | Google NY, Tel-Aviv University | https://arxiv.org/abs/1611.01436 | 69.642 | 77.696
10 | 4 months ago | BiDAF (single model) | Allen Institute for AI & University of Washington | https://arxiv.org/abs/1611.01603 | 67.974 | 77.323
10 | 3 months ago | FastQA | German Research Center for Artificial Intelligence | https://arxiv.org/abs/1703.04816 | 68.436 | 77.070
11 | 5 months ago | Match-LSTM with Ans-Ptr (Boundary+Ensemble) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 67.901 | 77.022
12 | 2 months ago | Iterative Co-attention Network | Fudan University | – | 67.502 | 76.786
13 | 5 months ago | Dynamic Coattention Networks (single model) | Salesforce Research | https://arxiv.org/abs/1611.01604 | 66.233 | 75.896
14 | 5 months ago | Match-LSTM with Bi-Ans-Ptr (Boundary) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 64.744 | 73.743
15 | a month ago | Attentive CNN context with LSTM | NLPR, CASIA | – | 63.306 | 73.463
16 | 5 months ago | Fine-Grained Gating | Carnegie Mellon University | https://arxiv.org/abs/1611.01724 | 62.446 | 73.327
16 | 6 months ago | Dynamic Chunk Reader | IBM | https://arxiv.org/abs/1610.09996 | 62.499 | 70.956
17 | 7 months ago | Match-LSTM with Ans-Ptr (Boundary) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 60.474 | 70.695
18 | 7 months ago | Match-LSTM with Ans-Ptr (Sentence) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 54.505 | 67.748
– | – | Human Performance | Stanford University | (Rajpurkar et al. '16) | 82.304 | 91.221