SQuAD

The Stanford Question Answering Dataset

What is SQuAD?

Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.

Explore SQuAD and model predictions
Read the paper (Rajpurkar et al. '16)

Getting Started

We've built a few resources to help you get started with the dataset.

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license): the training set (train-v1.1.json) and the development set (dev-v1.1.json).
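
The released files are JSON. As a quick orientation, here is a minimal Python sketch of the v1.1 layout (articles contain paragraphs, paragraphs contain question-answer pairs, and every answer is a character-offset span into its paragraph's context); the file name and working directory are assumptions:

    import json

    # Load the dev set; assumes dev-v1.1.json sits in the working directory.
    with open("dev-v1.1.json") as f:
        squad = json.load(f)

    # v1.1 layout: articles -> paragraphs -> question-answer pairs.
    paragraph = squad["data"][0]["paragraphs"][0]
    qa = paragraph["qas"][0]
    answer = qa["answers"][0]

    print(qa["question"])
    print(answer["text"])

    # Every answer is a span of its passage, recoverable from the character offset.
    start = answer["answer_start"]
    assert paragraph["context"][start:start + len(answer["text"])] == answer["text"]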

To evaluate your models, we have also made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script will take as input. To run the evaluation, use python evaluate-v1.1.py <path_to_dev-v1.1> <path_to_predictions>.
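
As the sample prediction file illustrates, the script expects a single JSON object mapping each question id to the predicted answer string. Below is a minimal sketch of producing such a file, using a deliberately trivial placeholder predictor in place of a real model:

    import json

    with open("dev-v1.1.json") as f:
        squad = json.load(f)

    def predict(context, question):
        # Deliberately trivial placeholder: answer with the passage's first word.
        # Swap in your own model here.
        return context.split()[0]

    # The evaluation script expects one JSON object: question id -> answer string.
    predictions = {}
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                predictions[qa["id"]] = predict(paragraph["context"], qa["question"])

    with open("predictions.json", "w") as f:
        json.dump(predictions, f)

The result can then be scored with python evaluate-v1.1.py dev-v1.1.json predictions.json.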

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through official evaluation of your model:

Submission Tutorial

Because SQuAD is an ongoing effort, we expect the dataset to evolve.

To keep up to date with major changes to the dataset, please subscribe to our mailing list:

Have Questions?

Ask us questions at our Google group or at pranavsr@stanford.edu.

Leaderboard

Since the release of our dataset, the community has made rapid progress! Here are the Exact Match (EM) and F1 scores of the best models evaluated on the test and development sets of v1.1. Will your model outperform humans on the QA task?
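
For readers new to the metrics: Exact Match asks whether the prediction equals one of the gold answers after light normalization, while F1 measures token overlap between prediction and gold answer, taking the best score over the gold answers for each question; the official script then averages both over all questions. A condensed sketch of the per-question scores, mirroring (not reproducing) the logic in evaluate-v1.1.py:

    import re
    import string
    from collections import Counter

    def normalize(s):
        # Lowercase, drop punctuation, remove the articles a/an/the, and
        # collapse whitespace, as the official normalization does.
        s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
        s = re.sub(r"\b(a|an|the)\b", " ", s)
        return " ".join(s.split())

    def exact_match(prediction, gold_answers):
        return max(normalize(prediction) == normalize(g) for g in gold_answers)

    def f1(prediction, gold_answers):
        def score(pred, gold):
            pred_toks = normalize(pred).split()
            gold_toks = normalize(gold).split()
            overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
            if overlap == 0:
                return 0.0
            precision = overlap / len(pred_toks)
            recall = overlap / len(gold_toks)
            return 2 * precision * recall / (precision + recall)
        # Take the best score over all gold answers for the question.
        return max(score(prediction, g) for g in gold_answers)

    print(exact_match("Denver Broncos", ["Denver Broncos", "the Broncos"]))  # True
    print(f1("the Denver Broncos", ["Broncos"]))  # ~0.667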

Rank | Date | Model | Institution | Link | EM | F1
1 | Mar 2017 | r-net (ensemble) | Microsoft Research Asia | http://aka.ms/rnet | 76.922 | 84.006
2 | Jun 2017 | Interactive AoA Reader (ensemble) | Joint Laboratory of HIT and iFLYTEK Research | - | 76.492 | 83.745
3 | May 2017 | MEMEN (ensemble) | Eigen Technology & Zhejiang University | - | 75.370 | 82.658
4 | Mar 2017 | ReasoNet (ensemble) | MSR Redmond | https://arxiv.org/abs/1609.05284 | 75.034 | 82.552
5 | May 2017 | r-net (single model) | Microsoft Research Asia | http://aka.ms/rnet | 74.614 | 82.458
6 | May 2017 | Mnemonic Reader (ensemble) | NUDT & Fudan University | https://arxiv.org/abs/1705.02798 | 73.754 | 81.863
7 | Apr 2017 | SEDT+BiDAF (ensemble) | CMU | https://arxiv.org/abs/1703.00572 | 73.723 | 81.530
7 | Feb 2017 | BiDAF (ensemble) | Allen Institute for AI & University of Washington | https://arxiv.org/abs/1611.01603 | 73.744 | 81.525
8 | May 2017 | jNet (ensemble) | USTC & National Research Council Canada & York University | https://arxiv.org/abs/1703.04617 | 73.010 | 81.517
8 | Jan 2017 | Multi-Perspective Matching (ensemble) | IBM Research | https://arxiv.org/abs/1612.04211 | 73.765 | 81.257
9 | Apr 2017 | T-gating (ensemble) | Peking University | - | 72.758 | 81.001
10 | Nov 2016 | Dynamic Coattention Networks (ensemble) | Salesforce Research | https://arxiv.org/abs/1611.01604 | 71.625 | 80.383
10 | Jun 2017 | Interactive AoA Reader (single model) | Joint Laboratory of HIT and iFLYTEK Research | - | 71.772 | 80.303
10 | Apr 2017 | QFASE | NUS | - | 71.898 | 79.989
11 | Apr 2017 | Interactive AoA Reader (single model) | Joint Laboratory of HIT and iFLYTEK Research | - | 71.153 | 79.937
12 | Mar 2017 | jNet (single model) | USTC & National Research Council Canada & York University | https://arxiv.org/abs/1703.04617 | 70.607 | 79.821
13 | Jun 2017 | smarnet (single model) | Eigen Technology & Zhejiang University | - | 70.565 | 79.549
13 | Apr 2017 | Ruminating Reader (single model) | New York University | https://arxiv.org/abs/1704.07415 | 70.639 | 79.456
14 | Mar 2017 | ReasoNet (single model) | MSR Redmond | https://arxiv.org/abs/1609.05284 | 70.555 | 79.364
14 | Mar 2017 | Document Reader (single model) | Facebook AI Research | https://arxiv.org/abs/1704.00051 | 70.733 | 79.353
15 | Apr 2017 | Mnemonic Reader (single model) | NUDT & Fudan University | https://arxiv.org/abs/1705.02798 | 69.863 | 79.207
15 | Dec 2016 | FastQAExt | German Research Center for Artificial Intelligence | https://arxiv.org/abs/1703.04816 | 70.849 | 78.857
16 | Apr 2017 | Multi-Perspective Matching (single model) | IBM Research | https://arxiv.org/abs/1612.04211 | 70.387 | 78.784
16 | May 2017 | RaSoR (single model) | Google NY, Tel-Aviv University | https://arxiv.org/abs/1611.01436 | 70.849 | 78.741
17 | Apr 2017 | SEDT+BiDAF (single model) | CMU | https://arxiv.org/abs/1703.00572 | 68.478 | 77.971
18 | Jun 2017 | PQMN (single model) | KAIST & AIBrain & Crosscert | - | 68.331 | 77.783
19 | Apr 2017 | T-gating (single model) | Peking University | - | 68.132 | 77.569
20 | Nov 2016 | BiDAF (single model) | Allen Institute for AI & University of Washington | https://arxiv.org/abs/1611.01603 | 67.974 | 77.323
20 | Dec 2016 | FastQA | German Research Center for Artificial Intelligence | https://arxiv.org/abs/1703.04816 | 68.436 | 77.070
21 | Oct 2016 | Match-LSTM with Ans-Ptr (Boundary) (ensemble) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 67.901 | 77.022
22 | Feb 2017 | Iterative Co-attention Network | Fudan University | - | 67.502 | 76.786
23 | Nov 2016 | Dynamic Coattention Networks (single model) | Salesforce Research | https://arxiv.org/abs/1611.01604 | 66.233 | 75.896
24 | Oct 2016 | Match-LSTM with Bi-Ans-Ptr (Boundary) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 64.744 | 73.743
25 | Feb 2017 | Attentive CNN context with LSTM | NLPR, CASIA | - | 63.306 | 73.463
26 | Nov 2016 | Fine-Grained Gating | Carnegie Mellon University | https://arxiv.org/abs/1611.01724 | 62.446 | 73.327
26 | Sep 2016 | Dynamic Chunk Reader | IBM | https://arxiv.org/abs/1610.09996 | 62.499 | 70.956
27 | Aug 2016 | Match-LSTM with Ans-Ptr (Boundary) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 60.474 | 70.695
28 | Aug 2016 | Match-LSTM with Ans-Ptr (Sentence) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 54.505 | 67.748
- | - | Human Performance | Stanford University | (Rajpurkar et al. '16) | 82.304 | 91.221