SQuAD

The Stanford Question Answering Dataset

What is SQuAD?

Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.
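To make the "answer is a span" property concrete, here is a minimal sketch in Python. It assumes the field names used in the v1.1 JSON files (`context`, `qas`, `answers`, `text`, `answer_start`). The passage, question, and ID are made up for illustration and do not come from the dataset.

```python
def answer_span(context, answer):
    """Return (start, end) character offsets of an answer inside its passage."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    # For a span-based dataset, the stored answer text should equal
    # the slice of the passage that begins at answer_start.
    if context[start:end] != answer["text"]:
        raise ValueError("answer text does not match the passage at answer_start")
    return start, end


# Made-up paragraph record, shaped like one entry of a SQuAD v1.1 article.
context = "The quick brown fox jumps over the lazy dog."
paragraph = {
    "context": context,
    "qas": [
        {
            "id": "made-up-id-0",  # hypothetical question ID
            "question": "What does the fox jump over?",
            "answers": [
                {
                    "text": "the lazy dog",
                    "answer_start": context.index("the lazy dog"),
                }
            ],
        }
    ],
}

for qa in paragraph["qas"]:
    for answer in qa["answers"]:
        start, end = answer_span(paragraph["context"], answer)
        print(qa["question"], "->", paragraph["context"][start:end])
```

A check like `answer_span` can be a useful sanity test when loading the data, since a model that predicts start and end positions depends on these offsets being consistent with the passage.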

Explore SQuAD and model predictions

Read the paper (Rajpurkar et al. '16)

Getting Started

We've built a few resources to help you get started with the dataset.

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

To evaluate your models, we have also made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script will take as input. To run the evaluation, use python evaluate-v1.1.py <path_to_dev-v1.1> <path_to_predictions>.
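For intuition about what the evaluation reports, the following Python sketch shows one common way to compute Exact Match and token-level F1 for span answers. It is not the official script, which remains the reference for official scores. The sketch assumes that predictions are a JSON-style mapping from question ID to answer string, that answers are normalized (lowercased, with punctuation and English articles removed) before comparison, and that each prediction is scored against its best-matching reference answer. The function names and the toy data are hypothetical.

```python
import collections
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    common = collections.Counter(pred_tokens) & collections.Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions, references):
    """predictions: {question_id: answer}; references: {question_id: [answers]}."""
    em_total = 0.0
    f1_total = 0.0
    for question_id, reference_answers in references.items():
        prediction = predictions.get(question_id, "")  # missing answer scores 0
        em_total += max(exact_match(prediction, r) for r in reference_answers)
        f1_total += max(token_f1(prediction, r) for r in reference_answers)
    count = len(references)
    return {"exact_match": 100.0 * em_total / count, "f1": 100.0 * f1_total / count}


# Toy data, made up for illustration.
references = {"q1": ["the lazy dog", "lazy dog"], "q2": ["brown fox"]}
predictions = {"q1": "The lazy dog.", "q2": "fox"}
result = score(predictions, references)
print(result)
```

In this toy run, the first prediction matches a reference exactly after normalization, while the second shares only one of two reference tokens, so it earns no Exact Match credit but partial F1 credit.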

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through official evaluation of your model:

Submission Tutorial

Because SQuAD is an ongoing effort, we expect the dataset to evolve.

To keep up to date with major changes to the dataset, please subscribe:

Have Questions?

Ask us questions at our Google group or at pranavsr@stanford.edu.

Leaderboard

Since the release of our dataset, the community has made rapid progress! Here are the Exact Match (EM) and F1 scores of the best models evaluated on the test set of v1.1. Will your model outperform humans on the QA task?

| Rank | Date | Model | Affiliation | Reference | EM | F1 |
|---|---|---|---|---|---|---|
| | | Human Performance | Stanford University | (Rajpurkar et al. '16) | 82.304 | 91.221 |
| 1 | Jan 22, 2018 | Hybrid AoA Reader (ensemble) | Joint Laboratory of HIT and iFLYTEK Research | | 82.482 | 89.281 |
| 2 | Jan 05, 2018 | SLQA+ (ensemble) | Alibaba iDST NLP | | 82.440 | 88.607 |
| 2 | Jan 03, 2018 | r-net+ (ensemble) | Microsoft Research Asia | | 82.650 | 88.493 |
| 2 | Feb 02, 2018 | Reinforced Mnemonic Reader (ensemble model) | NUDT and Fudan University | https://arxiv.org/abs/1705.02798 | 82.283 | 88.533 |
| 3 | Dec 22, 2017 | AttentionReader+ (ensemble) | Tencent DPDAC NLP | | 81.790 | 88.163 |
| 3 | Dec 17, 2017 | r-net (ensemble) | Microsoft Research Asia | http://aka.ms/rnet | 82.136 | 88.126 |
| 4 | Nov 17, 2017 | BiDAF + Self Attention + ELMo (ensemble) | Allen Institute for Artificial Intelligence | | 81.003 | 87.432 |
| 4 | Feb 12, 2018 | Reinforced Mnemonic Reader + A2D (single model) | Microsoft Research Asia & NUDT | | 80.489 | 87.454 |
| 5 | Jan 04, 2018 | {EAZI} (ensemble) | Yiwise NLP Group | | 80.436 | 86.912 |
| 5 | Jan 13, 2018 | SLQA+ | single model | | 80.436 | 87.021 |
| 5 | Jan 22, 2018 | Hybrid AoA Reader (single model) | Joint Laboratory of HIT and iFLYTEK Research | | 80.027 | 87.288 |
| 5 | Jan 12, 2018 | EAZI+ (ensemble) | Yiwise NLP Group | | 80.426 | 86.912 |
| 6 | Feb 12, 2018 | BiDAF + Self Attention + ELMo + A2D (single model) | Microsoft Research Asia & NUDT | | 79.996 | 86.711 |
| 7 | Jan 03, 2018 | r-net+ (single model) | Microsoft Research Asia | | 79.901 | 86.536 |
| 8 | Dec 05, 2017 | SAN (ensemble model) | Microsoft Business AI Solutions Team | https://arxiv.org/pdf/1712.03556.pdf | 79.608 | 86.496 |
| 8 | Jan 29, 2018 | Reinforced Mnemonic Reader (single model) | NUDT and Fudan University | https://arxiv.org/abs/1705.02798 | 79.545 | 86.654 |
| 9 | Dec 28, 2017 | SLQA+ (single model) | Alibaba iDST NLP | | 79.199 | 86.590 |
| 10 | Oct 17, 2017 | Interactive AoA Reader+ (ensemble) | Joint Laboratory of HIT and iFLYTEK | | 79.083 | 86.450 |
| 11 | Jan 29, 2018 | MAMCN+ (single model) | Samsung Research | | 78.999 | 86.151 |
| 12 | Oct 24, 2017 | FusionNet (ensemble) | Microsoft Business AI Solutions Team | https://arxiv.org/abs/1711.07341 | 78.978 | 86.016 |
| 13 | Oct 22, 2017 | DCN+ (ensemble) | Salesforce Research | | 78.852 | 85.996 |
| 14 | Nov 03, 2017 | BiDAF + Self Attention + ELMo (single model) | Allen Institute for Artificial Intelligence | | 78.580 | 85.833 |
| 15 | Nov 30, 2017 | SLQA(ensemble) | Alibaba iDST NLP | | 78.328 | 85.682 |
| 15 | Jan 02, 2018 | Conductor-net (ensemble) | CMU | https://arxiv.org/abs/1710.10504 | 78.433 | 85.517 |
| 16 | Jan 29, 2018 | test | single | | 78.087 | 85.348 |
| 16 | Jan 03, 2018 | MEMEN (single model) | Zhejiang University | https://arxiv.org/abs/1707.09098 | 78.234 | 85.344 |
| 17 | Jul 25, 2017 | Interactive AoA Reader (ensemble) | Joint Laboratory of HIT and iFLYTEK Research | | 77.845 | 85.297 |
| 18 | Dec 13, 2017 | RaSoR + TR + LM (single model) | Tel-Aviv University | https://arxiv.org/abs/1712.03609 | 77.583 | 84.163 |
| 18 | Jan 10, 2018 | Unnamed submission by SCV | | | 77.436 | 85.130 |
| 19 | Dec 06, 2017 | AttentionReader+ (single) | Tencent DPDAC NLP | | 77.342 | 84.925 |
| 20 | Nov 06, 2017 | Conductor-net (ensemble) | CMU | https://arxiv.org/abs/1710.10504 | 76.996 | 84.630 |
| 20 | Dec 21, 2017 | Jenga (ensemble) | Facebook AI Research | | 77.237 | 84.466 |
| 20 | Jan 23, 2018 | MARS (single model) | YUANFUDAO research NLP | | 76.859 | 84.739 |
| 21 | Oct 13, 2017 | r-net (single model) | Microsoft Research Asia | http://aka.ms/rnet | 76.461 | 84.265 |
| 21 | Dec 19, 2017 | FRC (single model) | in review | | 76.240 | 84.599 |
| 21 | Nov 01, 2017 | SAN (single model) | Microsoft Business AI Solutions Team | https://arxiv.org/pdf/1712.03556.pdf | 76.828 | 84.396 |
| 22 | Oct 22, 2017 | Conductor-net (ensemble) | CMU | | 76.146 | 83.991 |
| 23 | Sep 08, 2017 | FusionNet (single model) | Microsoft Business AI Solutions team | https://arxiv.org/abs/1711.07341 | 75.968 | 83.900 |
| 24 | Oct 22, 2017 | Interactive AoA Reader+ (single model) | Joint Laboratory of HIT and iFLYTEK | | 75.821 | 83.843 |
| 24 | Jul 14, 2017 | smarnet (ensemble) | Eigen Technology & Zhejiang University | | 75.989 | 83.475 |
| 25 | Aug 18, 2017 | RaSoR + TR (single model) | Tel-Aviv University | https://arxiv.org/abs/1712.03609 | 75.789 | 83.261 |
| 26 | Oct 23, 2017 | DCN+ (single model) | Salesforce Research | | 75.087 | 83.081 |
| 27 | Oct 31, 2017 | SLQA (single model) | Alibaba iDST NLP | | 74.489 | 82.815 |
| 27 | Feb 06, 2018 | Jenga (single model) | Facebook AI Research | | 74.373 | 82.845 |
| 27 | Nov 01, 2017 | Mixed model (ensemble) | Sean | | 75.265 | 82.769 |
| 28 | Jan 02, 2018 | Conductor-net (single model) | CMU | https://arxiv.org/abs/1710.10504 | 74.405 | 82.742 |
| 28 | Nov 17, 2017 | two-attention-self-attention (ensemble) | guotong1988 | | 75.223 | 82.716 |
| 28 | May 21, 2017 | MEMEN (ensemble) | Eigen Technology & Zhejiang University | https://arxiv.org/abs/1707.09098 | 75.370 | 82.658 |
| 29 | Mar 09, 2017 | ReasoNet (ensemble) | MSR Redmond | https://arxiv.org/abs/1609.05284 | 75.034 | 82.552 |
| 30 | Oct 27, 2017 | Unnamed submission by null | | | 74.489 | 82.312 |
| 30 | Jul 10, 2017 | DCN+ (single model) | Salesforce Research | | 74.866 | 82.806 |
| 31 | Jul 14, 2017 | Mnemonic Reader (ensemble) | NUDT and Fudan University | https://arxiv.org/abs/1705.02798 | 74.268 | 82.371 |
| 32 | Dec 23, 2017 | S^3-Net (ensemble) | Kangwon National University in South Korea | | 74.121 | 82.342 |
| 33 | Nov 06, 2017 | Conductor-net (single) | CMU | https://arxiv.org/abs/1710.10504 | 73.240 | 81.933 |
| 33 | Jul 25, 2017 | Interactive AoA Reader (single model) | Joint Laboratory of HIT and iFLYTEK Research | | 73.639 | 81.931 |
| 33 | Jul 29, 2017 | SEDT (ensemble model) | CMU | https://arxiv.org/abs/1703.00572 | 74.090 | 81.761 |
| 34 | Dec 14, 2017 | Jenga (single model) | Facebook AI Research | | 73.303 | 81.754 |
| 34 | Jul 06, 2017 | SSAE (ensemble) | Tsinghua University | | 74.080 | 81.665 |
| 35 | Apr 22, 2017 | SEDT+BiDAF (ensemble) | CMU | https://arxiv.org/abs/1703.00572 | 73.723 | 81.530 |
| 35 | Feb 22, 2017 | BiDAF (ensemble) | Allen Institute for AI & University of Washington | https://arxiv.org/abs/1611.01603 | 73.744 | 81.525 |
| 35 | Jan 24, 2017 | Multi-Perspective Matching (ensemble) | IBM Research | https://arxiv.org/abs/1612.04211 | 73.765 | 81.257 |
| 35 | May 01, 2017 | jNet (ensemble) | USTC & National Research Council Canada & York University | https://arxiv.org/abs/1703.04617 | 73.010 | 81.517 |
| 36 | Oct 22, 2017 | Conductor-net (single) | CMU | | 72.590 | 81.415 |
| 36 | Apr 12, 2017 | T-gating (ensemble) | Peking University | | 72.758 | 81.001 |
| 36 | Nov 16, 2017 | two-attention-self-attention (single model) | guotong1988 | | 72.600 | 81.011 |
| 36 | Sep 20, 2017 | BiDAF + Self Attention (single model) | Allen Institute for Artificial Intelligence | https://arxiv.org/abs/1710.10723 | 72.139 | 81.048 |
| 37 | Dec 15, 2017 | S^3-Net (single model) | Kangwon National University in South Korea | | 71.908 | 81.023 |
| 38 | Nov 01, 2016 | Dynamic Coattention Networks (ensemble) | Salesforce Research | https://arxiv.org/abs/1611.01604 | 71.625 | 80.383 |
| 39 | Jul 14, 2017 | smarnet (single model) | Eigen Technology & Zhejiang University | https://arxiv.org/abs/1710.02772 | 71.415 | 80.160 |
| 40 | Jul 14, 2017 | Mnemonic Reader (single model) | NUDT and Fudan University | https://arxiv.org/abs/1705.02798 | 70.995 | 80.146 |
| 40 | Apr 13, 2017 | QFASE | NUS | | 71.898 | 79.989 |
| 40 | Nov 06, 2017 | attention+self-attention (single model) | guotong1988 | | 71.698 | 80.462 |
| 41 | Oct 27, 2017 | M-NET (single) | UFL | | 71.016 | 79.835 |
| 42 | Apr 02, 2017 | Ruminating Reader (single model) | New York University | https://arxiv.org/abs/1704.07415 | 70.639 | 79.456 |
| 42 | Mar 24, 2017 | jNet (single model) | USTC & National Research Council Canada & York University | https://arxiv.org/abs/1703.04617 | 70.607 | 79.821 |
| 42 | Mar 14, 2017 | Document Reader (single model) | Facebook AI Research | https://arxiv.org/abs/1704.00051 | 70.733 | 79.353 |
| 42 | Dec 28, 2016 | FastQAExt | German Research Center for Artificial Intelligence | https://arxiv.org/abs/1703.04816 | 70.849 | 78.857 |
| 42 | May 13, 2017 | RaSoR (single model) | Google NY, Tel-Aviv University | https://arxiv.org/abs/1611.01436 | 70.849 | 78.741 |
| 42 | Mar 08, 2017 | ReasoNet (single model) | MSR Redmond | https://arxiv.org/abs/1609.05284 | 70.555 | 79.364 |
| 43 | Apr 14, 2017 | Multi-Perspective Matching (single model) | IBM Research | https://arxiv.org/abs/1612.04211 | 70.387 | 78.784 |
| 44 | Feb 05, 2018 | SSR-BiDAF | single model | | 69.443 | 78.358 |
| 44 | Aug 30, 2017 | SimpleBaseline (single model) | Technical University of Vienna | | 69.600 | 78.236 |
| 45 | Apr 12, 2017 | SEDT+BiDAF (single model) | CMU | https://arxiv.org/abs/1703.00572 | 68.478 | 77.971 |
| 46 | Jun 25, 2017 | PQMN (single model) | KAIST & AIBrain & Crosscert | | 68.331 | 77.783 |
| 46 | Dec 28, 2016 | FastQA | German Research Center for Artificial Intelligence | https://arxiv.org/abs/1703.04816 | 68.436 | 77.070 |
| 46 | Apr 12, 2017 | T-gating (single model) | Peking University | | 68.132 | 77.569 |
| 47 | Nov 28, 2016 | BiDAF (single model) | Allen Institute for AI & University of Washington | https://arxiv.org/abs/1611.01603 | 67.974 | 77.323 |
| 47 | Jan 22, 2018 | FABIR (Single Model) | in review | | 67.744 | 77.605 |
| 47 | Jul 29, 2017 | SEDT (single model) | CMU | https://arxiv.org/abs/1703.00572 | 68.163 | 77.527 |
| 48 | Sep 19, 2017 | AllenNLP BiDAF (single model) | Allen Institute for AI | http://allennlp.org/ | 67.618 | 77.151 |
| 48 | Oct 26, 2016 | Match-LSTM with Ans-Ptr (Boundary) (ensemble) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 67.901 | 77.022 |
| 49 | Feb 05, 2017 | Iterative Co-attention Network | Fudan University | | 67.502 | 76.786 |
| 50 | Nov 01, 2016 | Dynamic Coattention Networks (single model) | Salesforce Research | https://arxiv.org/abs/1611.01604 | 66.233 | 75.896 |
| 50 | Jan 03, 2018 | newtest | single model | | 66.527 | 75.787 |
| 51 | Jan 03, 2018 | baseline | single model | | 64.796 | 74.272 |
| 52 | Dec 09, 2017 | Unnamed submission by ravioncodalab | | | 64.439 | 73.921 |
| 52 | Oct 26, 2016 | Match-LSTM with Bi-Ans-Ptr (Boundary) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 64.744 | 73.743 |
| 53 | Feb 19, 2017 | Attentive CNN context with LSTM | NLPR, CASIA | | 63.306 | 73.463 |
| 54 | Nov 02, 2016 | Fine-Grained Gating | Carnegie Mellon University | https://arxiv.org/abs/1611.01724 | 62.446 | 73.327 |
| 54 | Sep 21, 2017 | OTF dict+spelling (single) | University of Montreal | https://arxiv.org/abs/1706.00286 | 64.083 | 73.056 |
| 55 | Sep 21, 2017 | OTF spelling (single) | University of Montreal | https://arxiv.org/abs/1706.00286 | 62.897 | 72.016 |
| 56 | Sep 21, 2017 | OTF spelling+lemma (single) | University of Montreal | https://arxiv.org/abs/1706.00286 | 62.604 | 71.968 |
| 57 | Sep 28, 2016 | Dynamic Chunk Reader | IBM | https://arxiv.org/abs/1610.09996 | 62.499 | 70.956 |
| 58 | Aug 27, 2016 | Match-LSTM with Ans-Ptr (Boundary) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 60.474 | 70.695 |
| 59 | Jan 05, 2018 | PivRet (single model) | anonymous | | 58.764 | 69.276 |
| 60 | Aug 27, 2016 | Match-LSTM with Ans-Ptr (Sentence) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 54.505 | 67.748 |