SQuAD2.0

The Stanford Question Answering Dataset

What is SQuAD?

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.


SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. SQuAD2.0 is a challenging natural language understanding task for existing models, and we release SQuAD2.0 to the community as the successor to SQuAD1.1. We are optimistic that this new dataset will encourage the development of reading comprehension systems that know what they don't know.
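As a rough illustration of what "a span from the passage, or unanswerable" looks like in practice, the sketch below walks a tiny in-memory sample shaped like the v2.0 JSON files. The field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`, `is_impossible`) are assumed to match the released files, and the passage and questions are made up for the example. The helper names are hypothetical.

```python
# Illustrative sample only: the text is invented, and the layout is assumed
# to mirror the released SQuAD2.0 JSON files.
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Example_Article",
            "paragraphs": [
                {
                    "context": "The river is 120 km long and flows north.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "How long is the river?",
                            "answers": [{"text": "120 km", "answer_start": 13}],
                            "is_impossible": False,
                        },
                        {
                            "id": "q2",
                            "question": "How deep is the river?",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ],
}


def iter_questions(dataset):
    """Yield (context, qa) pairs for every question in the dataset."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa


def count_answerable(dataset):
    """Return (answerable, unanswerable) question counts."""
    answerable = unanswerable = 0
    for _, qa in iter_questions(dataset):
        if qa.get("is_impossible", False):
            unanswerable += 1
        else:
            answerable += 1
    return answerable, unanswerable


def spans_match_context(dataset):
    """Check that every gold answer is literally a span of its passage."""
    for context, qa in iter_questions(dataset):
        for answer in qa["answers"]:
            start = answer["answer_start"]
            end = start + len(answer["text"])
            if context[start:end] != answer["text"]:
                return False
    return True


print(count_answerable(sample))
print(spans_match_context(sample))
```

To run the same helpers on the real data, replace `sample` with the result of `json.load` on a downloaded file.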

SQuAD2.0 paper (Rajpurkar & Jia et al. '18)

SQuAD1.0 paper (Rajpurkar et al. '16)

Getting Started

We've built a few resources to help you get started with the dataset.

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

To evaluate your models, we have also made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script will take as input. To run the evaluation, use python evaluate-v2.0.py <path_to_dev-v2.0> <path_to_predictions>.
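A minimal sketch of producing a prediction file follows. It assumes the script expects a single JSON object mapping each question id to an answer string, with an empty string meaning the system abstains; check the sample prediction file to confirm. The `model_outputs` values and helper names are hypothetical.

```python
import json


def build_predictions(model_outputs):
    """Map question id -> answer text, using "" when the model abstains.

    Assumption: an empty string signals "no answer" to the evaluation script.
    """
    predictions = {}
    for question_id, answer in model_outputs.items():
        predictions[question_id] = answer if answer is not None else ""
    return predictions


def save_predictions(predictions, path):
    """Write the predictions as one JSON object to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(predictions, handle, ensure_ascii=False)


# Hypothetical model outputs: None marks a question the model declines to answer.
model_outputs = {"q1": "120 km", "q2": None}
predictions = build_predictions(model_outputs)
print(json.dumps(predictions))
```

After calling `save_predictions(predictions, "predictions.json")`, the resulting file can be passed as `<path_to_predictions>` in the command above.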

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through official evaluation of your model:

Submission Tutorial

Because SQuAD is an ongoing effort, we expect the dataset to evolve.

To keep up to date with major changes to the dataset, please subscribe:

Have Questions?

Ask us questions at our Google group or at pranavsr@stanford.edu and robinjia@stanford.edu.

Leaderboard

SQuAD2.0 tests the ability of a system to not only answer reading comprehension questions, but also abstain when presented with a question that cannot be answered based on the provided paragraph. How will your system compare to humans on this task?

Rank | Date | Model | Institution | Reference | EM | F1
- | - | Human Performance | Stanford University | (Rajpurkar & Jia et al. '18) | 86.831 | 89.452
1 | Sep 13, 2018 | nlnet (single model) | Microsoft Research Asia | - | 74.238 | 77.022
2 | Oct 12, 2018 | YARCS (ensemble) | IBM Research AI | - | 72.670 | 75.493
3 | Oct 13, 2018 | RNANetSimple (ensemble) | Anonymous | - | 72.602 | 75.089
4 | Sep 17, 2018 | Unet (ensemble) | Fudan University & Liulishuo Lab | https://arxiv.org/abs/1810.06638 | 71.553 | 75.011
4 | Aug 15, 2018 | Reinforced Mnemonic Reader + Answer Verifier (single model) | NUDT | https://arxiv.org/abs/1808.05759 | 71.699 | 74.238
4 | Aug 28, 2018 | SLQA+ (single model) | Alibaba DAMO NLP | http://www.aclweb.org/anthology/P18-1158 | 71.451 | 74.422
5 | Sep 14, 2018 | SAN (ensemble model) | Microsoft Business Applications AI Research | https://arxiv.org/abs/1712.03556 | 71.282 | 73.658
6 | Oct 13, 2018 | RNANetSimple (single model) | Anonymous | - | 70.673 | 73.362
7 | Oct 12, 2018 | Candi-Net (single model) | 42Maru NLP Team | - | 70.187 | 72.676
8 | Sep 14, 2018 | Unet (single model) | Fudan University & Liulishuo Lab | - | 69.239 | 72.622
8 | Aug 21, 2018 | FusionNet++ (ensemble) | Microsoft Business Applications Group AI Research | https://arxiv.org/abs/1711.07341 | 70.345 | 72.555
8 | Sep 26, 2018 | Multi-Level Attention Fusion(MLAF) (single model) | Chonbuk National University, Cognitive Computing Lab. | - | 69.634 | 73.011
9 | Aug 21, 2018 | SAN (single model) | Microsoft Business Applications AI Research | https://arxiv.org/abs/1712.03556 | 68.653 | 71.441
9 | Sep 13, 2018 | BiDAF++ with pair2vec (single model) | UW and FAIR | - | 68.021 | 71.583
9 | Jul 13, 2018 | VS^3-NET (single model) | Kangwon National University in South Korea | - | 68.438 | 71.282
9 | Sep 04, 2018 | U-net (single model) | Fudan University & Liulishuo AI Lab | - | 67.727 | 71.676
9 | Aug 25, 2018 | ARRR (single model) | anonymous | - | 68.641 | 71.113
10 | Aug 26, 2018 | abcNet (single model) | Fudan University & Liulishuo AI Lab | - | 66.678 | 71.052
10 | Jun 24, 2018 | KACTEIL-MRC(GFN-Net) (single model) | Kangwon National University, Natural Language Processing Lab. | - | 68.224 | 70.871
11 | Sep 21, 2018 | PAML (single model) | GammaLab | - | 67.423 | 70.551
11 | Sep 27, 2018 | Clova QA Engine (single model) | NAVER Clova | - | 67.208 | 70.893
12 | Oct 11, 2018 | TAR (single model) | Anonymous | - | 66.994 | 70.172
13 | Aug 25, 2018 | FusionNet++ (single) | Microsoft Business Applications Group AI Research | https://arxiv.org/abs/1711.07341 | 66.554 | 69.649
14 | Jun 25, 2018 | KakaoNet2 (single model) | Kakao NLP Team | - | 65.708 | 69.369
15 | Jul 11, 2018 | abcNet (single model) | Fudan University & Liulishuo AI Lab | - | 65.256 | 69.198
15 | Sep 13, 2018 | BiDAF++ (single model) | UW and FAIR | - | 65.662 | 68.870
16 | Jun 27, 2018 | BSAE AddText (single model) | reciTAL.ai | - | 63.383 | 67.478
17 | Aug 14, 2018 | eeAttNet (single model) | BBD NLP Team | https://www.bbdservice.com | 63.360 | 66.638
17 | May 30, 2018 | BiDAF + Self Attention + ELMo (single model) | Allen Institute for Artificial Intelligence [modified by Stanford] | - | 63.383 | 66.262
18 | May 30, 2018 | BiDAF + Self Attention (single model) | Allen Institute for Artificial Intelligence [modified by Stanford] | - | 59.332 | 62.305
19 | May 30, 2018 | BiDAF-No-Answer (single model) | University of Washington [modified by Stanford] | - | 59.174 | 62.093

SQuAD1.1 Leaderboard

Since the release of SQuAD1.0, the community has made rapid progress, with the best models now rivaling human performance on the task. Here are the Exact Match (EM) and F1 scores evaluated on the test set of v1.1.
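For readers unfamiliar with the two metrics, here is a hedged sketch of how EM and token-level F1 are commonly computed for a single prediction against a single reference answer. The normalization steps (lowercasing, stripping punctuation and English articles, collapsing whitespace) and the handling of empty answers are assumptions modeled on typical SQuAD-style scoring. The official evaluation script remains the authoritative definition.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Assumption: an empty (no-answer) side scores 1 only if both are empty.
        return float(pred_tokens == ref_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower!"))
print(f1_score("in Paris France", "Paris"))
```

When a question has several reference answers, a common convention is to keep the best score over the references. Leaderboard numbers are these per-question scores averaged over the evaluation set and expressed as percentages.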

Rank | Date | Model | Institution | Reference | EM | F1
- | - | Human Performance | Stanford University | (Rajpurkar et al. '16) | 82.304 | 91.221
1 | Oct 05, 2018 | BERT (ensemble) | Google AI Language | https://arxiv.org/abs/1810.04805 | 87.433 | 93.160
2 | Oct 05, 2018 | BERT (single model) | Google AI Language | https://arxiv.org/abs/1810.04805 | 85.083 | 91.835
2 | Sep 09, 2018 | nlnet (ensemble) | Microsoft Research Asia | - | 85.356 | 91.202
2 | Sep 26, 2018 | nlnet (ensemble) | Microsoft Research Asia | - | 85.954 | 91.677
3 | Jul 11, 2018 | QANet (ensemble) | Google Brain & CMU | - | 84.454 | 90.490
4 | Jul 08, 2018 | r-net (ensemble) | Microsoft Research Asia | - | 84.003 | 90.147
5 | Mar 19, 2018 | QANet (ensemble) | Google Brain & CMU | - | 83.877 | 89.737
5 | Sep 09, 2018 | nlnet (single model) | Microsoft Research Asia | - | 83.468 | 90.133
5 | Jun 20, 2018 | MARS (ensemble) | YUANFUDAO research NLP | - | 83.982 | 89.796
6 | Sep 01, 2018 | MARS (single model) | YUANFUDAO research NLP | - | 83.185 | 89.547
7 | Jan 22, 2018 | Hybrid AoA Reader (ensemble) | Joint Laboratory of HIT and iFLYTEK Research | - | 82.482 | 89.281
7 | May 09, 2018 | MARS (single model) | YUANFUDAO research NLP | - | 82.587 | 88.880
7 | Feb 19, 2018 | Reinforced Mnemonic Reader + A2D (ensemble model) | Microsoft Research Asia & NUDT | - | 82.849 | 88.764
7 | Jun 21, 2018 | MARS (single model) | YUANFUDAO research NLP | - | 83.122 | 89.224
7 | Jun 20, 2018 | QANet (single) | Google Brain & CMU | - | 82.471 | 89.306
7 | Mar 06, 2018 | QANet (ensemble) | Google Brain & CMU | - | 82.744 | 89.045
8 | Jan 03, 2018 | r-net+ (ensemble) | Microsoft Research Asia | - | 82.650 | 88.493
8 | Jan 05, 2018 | SLQA+ (ensemble) | Alibaba iDST NLP | - | 82.440 | 88.607
8 | Feb 27, 2018 | QANet (single model) | Google Brain & CMU | - | 82.209 | 88.608
8 | Feb 02, 2018 | Reinforced Mnemonic Reader (ensemble model) | NUDT and Fudan University | https://arxiv.org/abs/1705.02798 | 82.283 | 88.533
9 | Dec 22, 2017 | AttentionReader+ (ensemble) | Tencent DPDAC NLP | - | 81.790 | 88.163
9 | Dec 17, 2017 | r-net (ensemble) | Microsoft Research Asia | http://aka.ms/rnet | 82.136 | 88.126
9 | May 09, 2018 | Reinforced Mnemonic Reader + A2D (single model) | Microsoft Research Asia & NUDT | - | 81.538 | 88.130
9 | Apr 23, 2018 | r-net (single model) | Microsoft Research Asia | - | 81.391 | 88.170
9 | Apr 03, 2018 | KACTEIL-MRC(GF-Net+) (ensemble) | Kangwon National University, Natural Language Processing Lab. | - | 81.496 | 87.557
9 | May 09, 2018 | Reinforced Mnemonic Reader + A2D + DA (single model) | Microsoft Research Asia & NUDT | - | 81.401 | 88.122
10 | Feb 12, 2018 | Reinforced Mnemonic Reader + A2D (single model) | Microsoft Research Asia & NUDT | - | 80.489 | 87.454
10 | Nov 17, 2017 | BiDAF + Self Attention + ELMo (ensemble) | Allen Institute for Artificial Intelligence | - | 81.003 | 87.432
10 | Feb 27, 2018 | QANet (single model) | Google Brain & CMU | - | 80.929 | 87.773
11 | Feb 19, 2018 | Reinforced Mnemonic Reader + A2D (single model) | Microsoft Research Asia & NUDT | - | 80.919 | 87.492
12 | Apr 12, 2018 | AVIQA+ (ensemble) | aviqa team | - | 80.615 | 87.311
13 | Jan 22, 2018 | Hybrid AoA Reader (single model) | Joint Laboratory of HIT and iFLYTEK Research | - | 80.027 | 87.288
13 | Jan 13, 2018 | SLQA+ | single model | - | 80.436 | 87.021
14 | Jan 12, 2018 | EAZI+ (ensemble) | Yiwise NLP Group | - | 80.426 | 86.912
14 | Jan 04, 2018 | {EAZI} (ensemble) | Yiwise NLP Group | - | 80.436 | 86.912
15 | Mar 20, 2018 | DNET (ensemble) | QA geeks | - | 80.164 | 86.721
16 | Feb 12, 2018 | BiDAF + Self Attention + ELMo + A2D (single model) | Microsoft Research Asia & NUDT | - | 79.996 | 86.711
17 | Jan 29, 2018 | Reinforced Mnemonic Reader (single model) | NUDT and Fudan University | https://arxiv.org/abs/1705.02798 | 79.545 | 86.654
17 | Feb 23, 2018 | MAMCN+ (single model) | Samsung Research | - | 79.692 | 86.727
17 | Apr 10, 2018 | Unnamed submission by null | - | - | 80.027 | 86.612
18 | Dec 28, 2017 | SLQA+ (single model) | Alibaba iDST NLP | - | 79.199 | 86.590
18 | Jan 03, 2018 | r-net+ (single model) | Microsoft Research Asia | - | 79.901 | 86.536
19 | Dec 05, 2017 | SAN (ensemble model) | Microsoft Business AI Solutions Team | https://arxiv.org/abs/1712.03556 | 79.608 | 86.496
20 | Oct 17, 2017 | Interactive AoA Reader+ (ensemble) | Joint Laboratory of HIT and iFLYTEK | - | 79.083 | 86.450
21 | Jun 01, 2018 | MDReader | single model | - | 79.031 | 86.006
21 | Feb 01, 2018 | Unnamed submission by null | - | - | 78.999 | 86.151
22 | Oct 24, 2017 | FusionNet (ensemble) | Microsoft Business AI Solutions Team | https://arxiv.org/abs/1711.07341 | 78.978 | 86.016
23 | Oct 22, 2017 | DCN+ (ensemble) | Salesforce Research | https://arxiv.org/abs/1711.00106 | 78.852 | 85.996
24 | Nov 03, 2017 | BiDAF + Self Attention + ELMo (single model) | Allen Institute for Artificial Intelligence | - | 78.580 | 85.833
24 | Mar 29, 2018 | KACTEIL-MRC(GF-Net+) (single model) | Kangwon National University, Natural Language Processing Lab. | - | 78.664 | 85.780
25 | May 09, 2018 | KakaoNet (single model) | Kakao NLP Team | - | 78.401 | 85.724
26 | Nov 30, 2017 | SLQA(ensemble) | Alibaba iDST NLP | - | 78.328 | 85.682
27 | Jun 01, 2018 | MDReader0 | single model | - | 78.171 | 85.543
27 | Mar 19, 2018 | aviqa (ensemble) | aviqa team | - | 78.496 | 85.469
27 | Sep 18, 2018 | BiDAF++ with pair2vec (single model) | UW and FAIR | - | 78.223 | 85.535
27 | Jan 03, 2018 | MEMEN (single model) | Zhejiang University | https://arxiv.org/abs/1707.09098 | 78.234 | 85.344
27 | Jan 02, 2018 | Conductor-net (ensemble) | CMU | https://arxiv.org/abs/1710.10504 | 78.433 | 85.517
28 | Jan 29, 2018 | test | single | - | 78.087 | 85.348
29 | Jul 25, 2017 | Interactive AoA Reader (ensemble) | Joint Laboratory of HIT and iFLYTEK Research | - | 77.845 | 85.297
30 | Jan 10, 2018 | Unnamed submission by null | - | - | 77.436 | 85.130
30 | Mar 20, 2018 | DNET (single model) | QA geeks | - | 77.646 | 84.905
31 | Sep 18, 2018 | BiDAF++ (single model) | UW and FAIR | - | 77.573 | 84.858
31 | Dec 13, 2017 | RaSoR + TR + LM (single model) | Tel-Aviv University | https://arxiv.org/abs/1712.03609 | 77.583 | 84.163
31 | Apr 10, 2018 | Unnamed submission by null | - | - | 77.489 | 84.735
31 | Dec 06, 2017 | AttentionReader+ (single) | Tencent DPDAC NLP | - | 77.342 | 84.925
32 | Nov 06, 2017 | Conductor-net (ensemble) | CMU | https://arxiv.org/abs/1710.10504 | 76.996 | 84.630
32 | Dec 21, 2017 | Jenga (ensemble) | Facebook AI Research | - | 77.237 | 84.466
32 | Jan 23, 2018 | MARS (single model) | YUANFUDAO research NLP | - | 76.859 | 84.739
33 | May 14, 2018 | VS^3-NET (single model) | Kangwon National University in South Korea | - | 76.775 | 84.491
33 | Nov 01, 2017 | SAN (single model) | Microsoft Business AI Solutions Team | https://arxiv.org/abs/1712.03556 | 76.828 | 84.396
34 | Oct 13, 2017 | r-net (single model) | Microsoft Research Asia | http://aka.ms/rnet | 76.461 | 84.265
34 | Dec 19, 2017 | FRC (single model) | in review | - | 76.240 | 84.599
35 | Oct 22, 2017 | Conductor-net (ensemble) | CMU | - | 76.146 | 83.991
36 | Sep 08, 2017 | FusionNet (single model) | Microsoft Business AI Solutions team | https://arxiv.org/abs/1711.07341 | 75.968 | 83.900
37 | Oct 22, 2017 | Interactive AoA Reader+ (single model) | Joint Laboratory of HIT and iFLYTEK | - | 75.821 | 83.843
37 | Jul 14, 2017 | smarnet (ensemble) | Eigen Technology & Zhejiang University | - | 75.989 | 83.475
38 | Mar 15, 2018 | AVIQA-v2 (single model) | aviqa team | - | 75.926 | 83.305
39 | Oct 05, 2018 | Knowledge Aided Reader (single model) | York University | https://arxiv.org/abs/1809.03449 | 74.950 | 83.294
39 | Aug 18, 2017 | RaSoR + TR (single model) | Tel-Aviv University | https://arxiv.org/abs/1712.03609 | 75.789 | 83.261
40 | Oct 23, 2017 | DCN+ (single model) | Salesforce Research | https://arxiv.org/abs/1711.00106 | 75.087 | 83.081
41 | Mar 09, 2017 | ReasoNet (ensemble) | MSR Redmond | https://arxiv.org/abs/1609.05284 | 75.034 | 82.552
41 | May 21, 2017 | MEMEN (ensemble) | Eigen Technology & Zhejiang University | https://arxiv.org/abs/1707.09098 | 75.370 | 82.658
41 | Nov 17, 2017 | two-attention-self-attention (ensemble) | guotong1988 | - | 75.223 | 82.716
41 | Nov 01, 2017 | Mixed model (ensemble) | Sean | - | 75.265 | 82.769
41 | Jul 10, 2017 | DCN+ (single model) | Salesforce Research | https://arxiv.org/abs/1711.00106 | 74.866 | 82.806
41 | Oct 31, 2017 | SLQA (single model) | Alibaba iDST NLP | - | 74.489 | 82.815
41 | Feb 06, 2018 | Jenga (single model) | Facebook AI Research | - | 74.373 | 82.845
41 | Aug 14, 2018 | eeAttNet (single model) | BBD NLP Team | https://www.bbdservice.com | 74.604 | 82.501
42 | Feb 13, 2018 | SSR-BiDAF | ensemble model | - | 74.541 | 82.477
42 | Jan 02, 2018 | Conductor-net (single model) | CMU | https://arxiv.org/abs/1710.10504 | 74.405 | 82.742
42 | Oct 27, 2017 | Unnamed submission by null | - | - | 74.489 | 82.312
42 | Jul 14, 2017 | Mnemonic Reader (ensemble) | NUDT and Fudan University | https://arxiv.org/abs/1705.02798 | 74.268 | 82.371
43 | Dec 23, 2017 | S^3-Net (ensemble) | Kangwon National University in South Korea | - | 74.121 | 82.342
44 | Jul 29, 2017 | SEDT (ensemble model) | CMU | https://arxiv.org/abs/1703.00572 | 74.090 | 81.761
45 | Jul 06, 2017 | SSAE (ensemble) | Tsinghua University | - | 74.080 | 81.665
45 | Dec 14, 2017 | Jenga (single model) | Facebook AI Research | - | 73.303 | 81.754
45 | Jul 25, 2017 | Interactive AoA Reader (single model) | Joint Laboratory of HIT and iFLYTEK Research | - | 73.639 | 81.931
45 | Apr 22, 2017 | SEDT+BiDAF (ensemble) | CMU | https://arxiv.org/abs/1703.00572 | 73.723 | 81.530
45 | Feb 22, 2017 | BiDAF (ensemble) | Allen Institute for AI & University of Washington | https://arxiv.org/abs/1611.01603 | 73.744 | 81.525
46 | May 01, 2017 | jNet (ensemble) | USTC & National Research Council Canada & York University | https://arxiv.org/abs/1703.04617 | 73.010 | 81.517
47 | Oct 22, 2017 | Conductor-net (single) | CMU | - | 72.590 | 81.415
47 | Jan 24, 2017 | Multi-Perspective Matching (ensemble) | IBM Research | https://arxiv.org/abs/1612.04211 | 73.765 | 81.257
47 | Nov 06, 2017 | Conductor-net (single) | CMU | https://arxiv.org/abs/1710.10504 | 73.240 | 81.933
48 | Dec 15, 2017 | S^3-Net (single model) | Kangwon National University in South Korea | - | 71.908 | 81.023
48 | Nov 16, 2017 | two-attention-self-attention (single model) | guotong1988 | - | 72.600 | 81.011
48 | Apr 12, 2017 | T-gating (ensemble) | Peking University | - | 72.758 | 81.001
48 | Apr 17, 2018 | Unnamed submission by null | - | - | 72.831 | 80.622
48 | Apr 17, 2018 | Unnamed submission by null | - | - | 72.831 | 80.622
48 | Sep 20, 2017 | BiDAF + Self Attention (single model) | Allen Institute for Artificial Intelligence | https://arxiv.org/abs/1710.10723 | 72.139 | 81.048
48 | Mar 03, 2018 | AVIQA (single model) | aviqa team | - | 72.485 | 80.550
49 | Nov 06, 2017 | attention+self-attention (single model) | guotong1988 | - | 71.698 | 80.462
50 | Jul 14, 2017 | smarnet (single model) | Eigen Technology & Zhejiang University | https://arxiv.org/abs/1710.02772 | 71.415 | 80.160
51 | Jul 14, 2017 | Mnemonic Reader (single model) | NUDT and Fudan University | https://arxiv.org/abs/1705.02798 | 70.995 | 80.146
51 | Apr 13, 2017 | QFASE | NUS | - | 71.898 | 79.989
51 | Nov 01, 2016 | Dynamic Coattention Networks (ensemble) | Salesforce Research | https://arxiv.org/abs/1611.01604 | 71.625 | 80.383
52 | Apr 22, 2018 | MAMCN (single model) | Samsung Research | - | 70.985 | 79.939
52 | Oct 27, 2017 | M-NET (single) | UFL | - | 71.016 | 79.835
52 | May 23, 2018 | AttReader (single) | College of Computer & Information Science, SouthWest University, Chongqing, China | - | 71.373 | 79.725
53 | Apr 02, 2017 | Ruminating Reader (single model) | New York University | https://arxiv.org/abs/1704.07415 | 70.639 | 79.456
53 | Mar 24, 2017 | jNet (single model) | USTC & National Research Council Canada & York University | https://arxiv.org/abs/1703.04617 | 70.607 | 79.821
53 | May 13, 2017 | RaSoR (single model) | Google NY, Tel-Aviv University | https://arxiv.org/abs/1611.01436 | 70.849 | 78.741
53 | Mar 14, 2017 | Document Reader (single model) | Facebook AI Research | https://arxiv.org/abs/1704.00051 | 70.733 | 79.353
53 | Dec 28, 2016 | FastQAExt | German Research Center for Artificial Intelligence | https://arxiv.org/abs/1703.04816 | 70.849 | 78.857
53 | Mar 08, 2017 | ReasoNet (single model) | MSR Redmond | https://arxiv.org/abs/1609.05284 | 70.555 | 79.364
54 | Apr 14, 2017 | Multi-Perspective Matching (single model) | IBM Research | https://arxiv.org/abs/1612.04211 | 70.387 | 78.784
55 | Aug 30, 2017 | SimpleBaseline (single model) | Technical University of Vienna | - | 69.600 | 78.236
55 | Feb 05, 2018 | SSR-BiDAF | single model | - | 69.443 | 78.358
56 | Apr 12, 2017 | SEDT+BiDAF (single model) | CMU | https://arxiv.org/abs/1703.00572 | 68.478 | 77.971
57 | Jun 25, 2017 | PQMN (single model) | KAIST & AIBrain & Crosscert | - | 68.331 | 77.783
58 | Apr 12, 2017 | T-gating (single model) | Peking University | - | 68.132 | 77.569
58 | Jul 29, 2017 | SEDT (single model) | CMU | https://arxiv.org/abs/1703.00572 | 68.163 | 77.527
59 | Nov 28, 2016 | BiDAF (single model) | Allen Institute for AI & University of Washington | https://arxiv.org/abs/1611.01603 | 67.974 | 77.323
59 | Jan 22, 2018 | FABIR (Single Model) | in review | - | 67.744 | 77.605
59 | Dec 28, 2016 | FastQA | German Research Center for Artificial Intelligence | https://arxiv.org/abs/1703.04816 | 68.436 | 77.070
59 | Feb 22, 2018 | Unnamed submission by null | - | - | 68.425 | 77.077
59 | Feb 22, 2018 | Unnamed submission by null | - | - | 68.478 | 77.220
60 | Sep 19, 2017 | AllenNLP BiDAF (single model) | Allen Institute for AI | http://allennlp.org/ | 67.618 | 77.151
60 | Oct 26, 2016 | Match-LSTM with Ans-Ptr (Boundary) (ensemble) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 67.901 | 77.022
61 | Feb 05, 2017 | Iterative Co-attention Network | Fudan University | - | 67.502 | 76.786
62 | Nov 01, 2016 | Dynamic Coattention Networks (single model) | Salesforce Research | https://arxiv.org/abs/1611.01604 | 66.233 | 75.896
62 | Jan 03, 2018 | newtest | single model | - | 66.527 | 75.787
63 | Feb 24, 2018 | Unnamed submission by null | - | - | 65.992 | 75.469
64 | Jan 10, 2018 | Unnamed submission by null | - | - | 64.796 | 74.272
65 | Dec 09, 2017 | Unnamed submission by ravioncodalab | - | - | 64.439 | 73.921
65 | Oct 26, 2016 | Match-LSTM with Bi-Ans-Ptr (Boundary) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 64.744 | 73.743
66 | Feb 19, 2017 | Attentive CNN context with LSTM | NLPR, CASIA | - | 63.306 | 73.463
66 | Sep 21, 2017 | OTF dict+spelling (single) | University of Montreal | https://arxiv.org/abs/1706.00286 | 64.083 | 73.056
67 | Sep 21, 2017 | OTF spelling (single) | University of Montreal | https://arxiv.org/abs/1706.00286 | 62.897 | 72.016
67 | Nov 02, 2016 | Fine-Grained Gating | Carnegie Mellon University | https://arxiv.org/abs/1611.01724 | 62.446 | 73.327
67 | Sep 21, 2017 | OTF spelling+lemma (single) | University of Montreal | https://arxiv.org/abs/1706.00286 | 62.604 | 71.968
68 | Sep 28, 2016 | Dynamic Chunk Reader | IBM | https://arxiv.org/abs/1610.09996 | 62.499 | 70.956
69 | Aug 27, 2016 | Match-LSTM with Ans-Ptr (Boundary) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 60.474 | 70.695
70 | Sep 11, 2018 | Unnamed submission by Will_Wu | - | - | 59.058 | 69.436
71 | Jan 05, 2018 | PivRet (single model) | anonymous | - | 58.764 | 69.276
72 | Aug 27, 2016 | Match-LSTM with Ans-Ptr (Sentence) | Singapore Management University | https://arxiv.org/abs/1608.07905 | 54.505 | 67.748