Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. SQuAD2.0 is a challenging natural language understanding task for existing models, and we release SQuAD2.0 to the community as the successor to SQuAD1.1. We are optimistic that this new dataset will encourage the development of reading comprehension systems that know what they don't know.
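The answer-or-abstain setup is visible directly in the data files. The following is a minimal sketch, assuming the JSON layout used by the v2.0 release files (nested as `data`, then `paragraphs`, then `qas`, where each question carries an `is_impossible` flag and unanswerable questions have an empty `answers` list). The function names, the sample record, and the file path are hypothetical.

```python
import json
from collections import Counter


def load_squad(path):
    """Load a SQuAD JSON file (e.g. a local copy of the v2.0 dev set)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_question_types(squad):
    """Count answerable vs. unanswerable questions in a SQuAD2.0-style dict.

    Walks data -> paragraphs -> qas. A missing "is_impossible" flag
    (as in SQuAD1.1 files) is treated as answerable.
    """
    counts = Counter()
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                kind = "unanswerable" if qa.get("is_impossible", False) else "answerable"
                counts[kind] += 1
    return counts


# Tiny hand-made example in the v2.0 layout (not real SQuAD data).
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "SQuAD was released by Stanford in 2016.",
            "qas": [
                {"id": "q1", "question": "Who released SQuAD?",
                 "answers": [{"text": "Stanford", "answer_start": 22}],
                 "is_impossible": False},
                {"id": "q2", "question": "Who released SQuAD3.0?",
                 "answers": [],
                 "plausible_answers": [{"text": "Stanford", "answer_start": 22}],
                 "is_impossible": True},
            ],
        }],
    }],
}

print(count_question_types(sample))
# With a downloaded file (the path is a placeholder):
# print(count_question_types(load_squad("dev-v2.0.json")))
```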
SQuAD2.0 paper (Rajpurkar & Jia et al. '18)
SQuAD1.0 paper (Rajpurkar et al. '16)

We've built a few resources to help you get started with the dataset.
Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):
To evaluate your models, we have also made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script will take as input. To run the evaluation, use `python evaluate-v2.0.py <path_to_dev-v2.0> <path_to_predictions>`.
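Official scores come from the evaluation script above. To show what its two metrics measure, here is a simplified re-implementation modeled on that script's behavior: answers are lowercased, stripped of punctuation and the articles a/an/the, and whitespace-collapsed; each prediction is scored against every gold answer and the best score is kept; an empty string stands for "no answer". It is a sketch for understanding only, not a substitute for the official script. It assumes predictions are a dict from question id to answer text, and all function names are our own.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s):
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def compute_exact(gold, pred):
    return int(normalize_answer(gold) == normalize_answer(pred))


def compute_f1(gold, pred):
    """Token-level F1 between normalized gold and predicted answers."""
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    if not gold_toks or not pred_toks:
        # No-answer case: full credit only if both sides are empty.
        return float(gold_toks == pred_toks)
    num_same = sum((Counter(gold_toks) & Counter(pred_toks)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def evaluate(dataset, preds):
    """Return EM and F1 (as percentages) over every question in `dataset`.

    `preds` maps question id -> predicted answer text ("" means abstain);
    this sketch assumes every question id has a prediction.
    """
    exact, f1 = [], []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                golds = [a["text"] for a in qa["answers"] if normalize_answer(a["text"])]
                if not golds:
                    golds = [""]  # unanswerable: only an empty prediction scores
                pred = preds[qa["id"]]
                exact.append(max(compute_exact(g, pred) for g in golds))
                f1.append(max(compute_f1(g, pred) for g in golds))
    n = len(exact)
    return {"exact": 100.0 * sum(exact) / n, "f1": 100.0 * sum(f1) / n}


# Hand-made two-question example (not real SQuAD data).
sample = {
    "data": [{
        "paragraphs": [{
            "context": "SQuAD was released by Stanford in 2016.",
            "qas": [
                {"id": "q1", "answers": [{"text": "Stanford", "answer_start": 22}]},
                {"id": "q2", "answers": []},  # unanswerable
            ],
        }],
    }],
}
print(evaluate(sample, {"q1": "by Stanford", "q2": ""}))
```

In this example the extra token "by" costs the first question its exact match but still earns partial F1 credit, while the correct abstention on the second question scores full marks on both metrics.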
Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through official evaluation of your model:
Submission Tutorial

Because SQuAD is an ongoing effort, we expect the dataset to evolve.
To keep up to date with major changes to the dataset, please subscribe:
Ask us questions in our Google group or by email at pranavsr@stanford.edu and robinjia@stanford.edu.
SQuAD2.0 tests the ability of a system to not only answer reading comprehension questions, but also abstain when presented with a question that cannot be answered based on the provided paragraph. How will your system compare to humans on this task?
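One common way to build the ability to abstain, offered here as an assumption about typical system design rather than anything SQuAD2.0 prescribes, is to have the model also output a probability that the question has no answer, and then to threshold it. The sketch below (all names and numbers hypothetical) picks the threshold that maximizes accuracy on labeled dev data and then applies it, producing the empty-string predictions that stand for "no answer".

```python
def apply_no_answer_threshold(span_preds, na_probs, threshold):
    """Abstain (predict "") on every question whose no-answer probability exceeds threshold."""
    return {qid: ("" if na_probs[qid] > threshold else span)
            for qid, span in span_preds.items()}


def best_threshold(span_preds, na_probs, gold_answers):
    """Choose the threshold that maximizes exact-match accuracy on labeled dev data.

    `gold_answers` maps qid -> list of acceptable answers; an empty list marks
    an unanswerable question. Matching is plain string equality here, a
    simplification of the normalized comparison used for official scoring.
    """
    def accuracy(preds):
        hits = sum(pred in (gold_answers[qid] or [""]) for qid, pred in preds.items())
        return hits / len(preds)

    # -inf abstains everywhere; the largest observed probability never abstains.
    candidates = [float("-inf")] + sorted(set(na_probs.values()))
    # Ties go to the smallest threshold, since max() keeps the first best candidate.
    return max(candidates,
               key=lambda t: accuracy(apply_no_answer_threshold(span_preds, na_probs, t)))


# Hypothetical model outputs for three dev questions.
span_preds = {"q1": "Stanford", "q2": "Stanford", "q3": "1990"}
na_probs = {"q1": 0.1, "q2": 0.8, "q3": 0.6}
gold = {"q1": ["Stanford"], "q2": [], "q3": ["2016"]}

t = best_threshold(span_preds, na_probs, gold)
print(t, apply_no_answer_threshold(span_preds, na_probs, t))
```

Choosing the threshold on the dev set, rather than fixing it in advance, lets the same span-prediction model trade off answering against abstaining to suit the answerable/unanswerable mix it will be scored on.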
| Rank / Date | Model / Team | EM | F1 |
|---|---|---|---|
|  | Human Performance, Stanford University (Rajpurkar & Jia et al. '18) | 86.831 | 89.452 |
| 1 Jul 13, 2018 | VS^3-NET (single model) Kangwon National University in South Korea | 68.438 | 71.282 |
| 2 Jun 25, 2018 | KACTEIL-MRC(GFN-Net) (single model) Kangwon National University, Natural Language Processing Lab. | 68.224 | 70.871 |
| 3 Jun 26, 2018 | KakaoNet2 (single model) Kakao NLP Team | 65.708 | 69.369 |
| 4 Jul 11, 2018 | abcNet (single model) Fudan University & Liulishuo AI Lab | 65.256 | 69.198 |
| 5 Jun 27, 2018 | BSAE AddText (single model) reciTAL.ai | 63.383 | 67.478 |
| 5 May 31, 2018 | BiDAF + Self Attention + ELMo (single model) Allen Institute for Artificial Intelligence [modified by Stanford] | 63.383 | 66.262 |
| 6 May 31, 2018 | BiDAF + Self Attention (single model) Allen Institute for Artificial Intelligence [modified by Stanford] | 59.332 | 62.305 |
| 7 May 31, 2018 | BiDAF-No-Answer (single model) University of Washington [modified by Stanford] | 59.174 | 62.093 |
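As a worked comparison using the SQuAD2.0 test scores in the table above, the top-ranked system trails the human baseline by roughly 18 points on both metrics:

```python
# Test-set scores copied from the SQuAD2.0 leaderboard table.
human = {"EM": 86.831, "F1": 89.452}
top_system = {"EM": 68.438, "F1": 71.282}  # VS^3-NET (single model), rank 1

# Points by which the best system trails humans, rounded to the table's precision.
gap = {metric: round(human[metric] - top_system[metric], 3) for metric in human}
print(gap)  # {'EM': 18.393, 'F1': 18.17}
```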
Since the release of SQuAD1.0, the community has made rapid progress, with the best models now rivaling human performance: the top-ranked ensemble exceeds human Exact Match (EM) while remaining slightly below human F1. Here are the EM and F1 scores on the SQuAD1.1 test set.
| Rank / Date | Model / Team | EM | F1 |
|---|---|---|---|
|  | Human Performance, Stanford University (Rajpurkar et al. '16) | 82.304 | 91.221 |
| 1 Jul 12, 2018 | QANet (ensemble) Google Brain & CMU | 84.454 | 90.490 |
| 2 Jul 09, 2018 | r-net (ensemble) Microsoft Research Asia | 84.003 | 90.147 |
| 3 Jun 21, 2018 | MARS (ensemble) YUANFUDAO research NLP | 83.982 | 89.796 |
| 4 Mar 20, 2018 | QANet (ensemble) Google Brain & CMU | 83.877 | 89.737 |
| 5 Jun 21, 2018 | MARS (single model) YUANFUDAO research NLP | 83.122 | 89.224 |
| 6 Mar 07, 2018 | QANet (ensemble) Google Brain & CMU | 82.744 | 89.045 |
| 7 May 09, 2018 | MARS (single model) YUANFUDAO research NLP | 82.587 | 88.880 |
| 7 Feb 20, 2018 | Reinforced Mnemonic Reader + A2D (ensemble model) Microsoft Research Asia & NUDT | 82.849 | 88.764 |
| 7 Jun 20, 2018 | QANet (single) Google Brain & CMU | 82.471 | 89.306 |
| 7 Jan 23, 2018 | Hybrid AoA Reader (ensemble) Joint Laboratory of HIT and iFLYTEK Research | 82.482 | 89.281 |
| 7 Jan 04, 2018 | r-net+ (ensemble) Microsoft Research Asia | 82.650 | 88.493 |
| 7 Jan 06, 2018 | SLQA+ (ensemble) Alibaba iDST NLP | 82.440 | 88.607 |
| 8 Feb 02, 2018 | Reinforced Mnemonic Reader (ensemble model) NUDT and Fudan University https://arxiv.org/abs/1705.02798 | 82.283 | 88.533 |
| 8 Feb 28, 2018 | QANet (single model) Google Brain & CMU | 82.209 | 88.608 |
| 9 Dec 22, 2017 | AttentionReader+ (ensemble) Tencent DPDAC NLP | 81.790 | 88.163 |
| 10 May 09, 2018 | Reinforced Mnemonic Reader + A2D (single model) Microsoft Research Asia & NUDT | 81.538 | 88.130 |
| 10 Dec 18, 2017 | r-net (ensemble) Microsoft Research Asia http://aka.ms/rnet | 82.136 | 88.126 |
| 11 Feb 28, 2018 | QANet (single model) Google Brain & CMU | 80.929 | 87.773 |
| 11 Apr 03, 2018 | KACTEIL-MRC(GF-Net+) (ensemble model) Kangwon National University, Natural Language Processing Lab. | 81.496 | 87.557 |
| 11 May 09, 2018 | Reinforced Mnemonic Reader + A2D + DA (single model) Microsoft Research Asia & NUDT | 81.401 | 88.122 |
| 11 Apr 24, 2018 | r-net (single model) Microsoft Research Asia | 81.391 | 88.170 |
| 12 Nov 18, 2017 | BiDAF + Self Attention + ELMo (ensemble) Allen Institute for Artificial Intelligence | 81.003 | 87.432 |
| 12 Feb 19, 2018 | Reinforced Mnemonic Reader + A2D (single model) Microsoft Research Asia & NUDT | 80.919 | 87.492 |
| 13 Apr 13, 2018 | AVIQA+ (ensemble) aviqa team | 80.615 | 87.311 |
| 13 Feb 13, 2018 | Reinforced Mnemonic Reader + A2D (single model) Microsoft Research Asia & NUDT | 80.489 | 87.454 |
| 14 Jan 13, 2018 | SLQA+ single model | 80.436 | 87.021 |
| 15 Jan 13, 2018 | EAZI+ (ensemble) Yiwise NLP Group | 80.426 | 86.912 |
| 15 Jan 05, 2018 | {EAZI} (ensemble) Yiwise NLP Group | 80.436 | 86.912 |
| 16 Mar 20, 2018 | DNET (ensemble) QA geeks | 80.164 | 86.721 |
| 16 Jan 23, 2018 | Hybrid AoA Reader (single model) Joint Laboratory of HIT and iFLYTEK Research | 80.027 | 87.288 |
| 17 Feb 13, 2018 | BiDAF + Self Attention + ELMo + A2D (single model) Microsoft Research Asia & NUDT | 79.996 | 86.711 |
| 17 Apr 11, 2018 | Unnamed submission by null | 80.027 | 86.612 |
| 18 Jan 04, 2018 | r-net+ (single model) Microsoft Research Asia | 79.901 | 86.536 |
| 18 Feb 24, 2018 | MAMCN+ (single model) Samsung Research | 79.692 | 86.727 |
| 19 Jan 30, 2018 | Reinforced Mnemonic Reader (single model) NUDT and Fudan University https://arxiv.org/abs/1705.02798 | 79.545 | 86.654 |
| 19 Dec 06, 2017 | SAN (ensemble model) Microsoft Business AI Solutions Team https://arxiv.org/abs/1712.03556 | 79.608 | 86.496 |
| 19 Dec 29, 2017 | SLQA+ (single model) Alibaba iDST NLP | 79.199 | 86.590 |
| 20 Oct 18, 2017 | Interactive AoA Reader+ (ensemble) Joint Laboratory of HIT and iFLYTEK | 79.083 | 86.450 |
| 21 Feb 02, 2018 | Unnamed submission by null | 78.999 | 86.151 |
| 21 Jun 02, 2018 | MDReader single model | 79.031 | 86.006 |
| 21 Oct 25, 2017 | FusionNet (ensemble) Microsoft Business AI Solutions Team https://arxiv.org/abs/1711.07341 | 78.978 | 86.016 |
| 22 Oct 22, 2017 | DCN+ (ensemble) Salesforce Research https://arxiv.org/abs/1711.00106 | 78.852 | 85.996 |
| 23 Nov 03, 2017 | BiDAF + Self Attention + ELMo (single model) Allen Institute for Artificial Intelligence | 78.580 | 85.833 |
| 23 Mar 30, 2018 | KACTEIL-MRC(GF-Net+) (single model) Kangwon National University, Natural Language Processing Lab. | 78.664 | 85.780 |
| 24 Jun 02, 2018 | MDReader0 single model | 78.171 | 85.543 |
| 24 Mar 19, 2018 | aviqa (ensemble) aviqa team | 78.496 | 85.469 |
| 24 Dec 01, 2017 | SLQA(ensemble) Alibaba iDST NLP | 78.328 | 85.682 |
| 24 Jan 03, 2018 | Conductor-net (ensemble) CMU https://arxiv.org/abs/1710.10504 | 78.433 | 85.517 |
| 24 May 10, 2018 | KakaoNet (single model) Kakao NLP Team | 78.401 | 85.724 |
| 25 Jan 29, 2018 | test single | 78.087 | 85.348 |
| 25 Jan 04, 2018 | MEMEN (single model) Zhejiang University https://arxiv.org/abs/1707.09098 | 78.234 | 85.344 |
| 26 Jul 26, 2017 | Interactive AoA Reader (ensemble) Joint Laboratory of HIT and iFLYTEK Research | 77.845 | 85.297 |
| 27 Dec 07, 2017 | AttentionReader+ (single) Tencent DPDAC NLP | 77.342 | 84.925 |
| 27 Mar 20, 2018 | DNET (single model) QA geeks | 77.646 | 84.905 |
| 27 Jan 11, 2018 | Unnamed submission by null | 77.436 | 85.130 |
| 28 Jan 24, 2018 | MARS (single model) YUANFUDAO research NLP | 76.859 | 84.739 |
| 28 Nov 07, 2017 | Conductor-net (ensemble) CMU https://arxiv.org/abs/1710.10504 | 76.996 | 84.630 |
| 28 Dec 14, 2017 | RaSoR + TR + LM (single model) Tel-Aviv University https://arxiv.org/abs/1712.03609 | 77.583 | 84.163 |
| 28 Dec 21, 2017 | Jenga (ensemble) Facebook AI Research | 77.237 | 84.466 |
| 28 Apr 11, 2018 | Unnamed submission by null | 77.489 | 84.735 |
| 29 May 14, 2018 | VS^3-NET (single model) Kangwon National University in South Korea | 76.775 | 84.491 |
| 29 Nov 02, 2017 | SAN (single model) Microsoft Business AI Solutions Team https://arxiv.org/abs/1712.03556 | 76.828 | 84.396 |
| 29 Dec 20, 2017 | FRC (single model) in review | 76.240 | 84.599 |
| 29 Oct 14, 2017 | r-net (single model) Microsoft Research Asia http://aka.ms/rnet | 76.461 | 84.265 |
| 30 Oct 23, 2017 | Conductor-net (ensemble) CMU | 76.146 | 83.991 |
| 31 Sep 09, 2017 | FusionNet (single model) Microsoft Business AI Solutions team https://arxiv.org/abs/1711.07341 | 75.968 | 83.900 |
| 31 Jul 15, 2017 | smarnet (ensemble) Eigen Technology & Zhejiang University | 75.989 | 83.475 |
| 32 Mar 15, 2018 | AVIQA-v2 (single model) aviqa team | 75.926 | 83.305 |
| 32 Oct 23, 2017 | Interactive AoA Reader+ (single model) Joint Laboratory of HIT and iFLYTEK | 75.821 | 83.843 |
| 33 Aug 18, 2017 | RaSoR + TR (single model) Tel-Aviv University https://arxiv.org/abs/1712.03609 | 75.789 | 83.261 |
| 34 Oct 24, 2017 | DCN+ (single model) Salesforce Research https://arxiv.org/abs/1711.00106 | 75.087 | 83.081 |
| 35 Feb 06, 2018 | Jenga (single model) Facebook AI Research | 74.373 | 82.845 |
| 35 Oct 31, 2017 | SLQA (single model) Alibaba iDST NLP | 74.489 | 82.815 |
| 35 Jul 11, 2017 | DCN+ (single model) Salesforce Research https://arxiv.org/abs/1711.00106 | 74.866 | 82.806 |
| 35 Nov 02, 2017 | Mixed model (ensemble) Sean | 75.265 | 82.769 |
| 36 Feb 13, 2018 | SSR-BiDAF ensemble model | 74.541 | 82.477 |
| 36 Nov 18, 2017 | two-attention-self-attention (ensemble) guotong1988 | 75.223 | 82.716 |
| 36 May 22, 2017 | MEMEN (ensemble) Eigen Technology & Zhejiang University https://arxiv.org/abs/1707.09098 | 75.370 | 82.658 |
| 37 Mar 10, 2017 | ReasoNet (ensemble) MSR Redmond https://arxiv.org/abs/1609.05284 | 75.034 | 82.552 |
| 37 Jan 03, 2018 | Conductor-net (single model) CMU https://arxiv.org/abs/1710.10504 | 74.405 | 82.742 |
| 38 Jul 15, 2017 | Mnemonic Reader (ensemble) NUDT and Fudan University https://arxiv.org/abs/1705.02798 | 74.268 | 82.371 |
| 38 Oct 27, 2017 | Unnamed submission by null | 74.489 | 82.312 |
| 38 Dec 23, 2017 | S^3-Net (ensemble) Kangwon National University in South Korea | 74.121 | 82.342 |
| 39 Nov 07, 2017 | Conductor-net (single) CMU https://arxiv.org/abs/1710.10504 | 73.240 | 81.933 |
| 39 Jul 25, 2017 | Interactive AoA Reader (single model) Joint Laboratory of HIT and iFLYTEK Research | 73.639 | 81.931 |
| 39 Jul 06, 2017 | SSAE (ensemble) Tsinghua University | 74.080 | 81.665 |
| 40 Apr 23, 2017 | SEDT+BiDAF (ensemble) CMU https://arxiv.org/abs/1703.00572 | 73.723 | 81.530 |
| 40 Feb 22, 2017 | BiDAF (ensemble) Allen Institute for AI & University of Washington https://arxiv.org/abs/1611.01603 | 73.744 | 81.525 |
| 40 Jul 29, 2017 | SEDT (ensemble model) CMU https://arxiv.org/abs/1703.00572 | 74.090 | 81.761 |
| 41 Jan 24, 2017 | Multi-Perspective Matching (ensemble) IBM Research https://arxiv.org/abs/1612.04211 | 73.765 | 81.257 |
| 41 Dec 14, 2017 | Jenga (single model) Facebook AI Research | 73.303 | 81.754 |
| 42 May 02, 2017 | jNet (ensemble) USTC & National Research Council Canada & York University https://arxiv.org/abs/1703.04617 | 73.010 | 81.517 |
| 43 Oct 23, 2017 | Conductor-net (single) CMU | 72.590 | 81.415 |
| 44 Sep 21, 2017 | BiDAF + Self Attention (single model) Allen Institute for Artificial Intelligence https://arxiv.org/abs/1710.10723 | 72.139 | 81.048 |
| 44 Nov 17, 2017 | two-attention-self-attention (single model) guotong1988 | 72.600 | 81.011 |
| 44 Dec 15, 2017 | S^3-Net (single model) Kangwon National University in South Korea | 71.908 | 81.023 |
| 44 Apr 18, 2018 | Unnamed submission by null | 72.831 | 80.622 |
| 45 Mar 03, 2018 | AVIQA (single model) aviqa team | 72.485 | 80.550 |
| 45 Apr 12, 2017 | T-gating (ensemble) Peking University | 72.758 | 81.001 |
| 46 Nov 06, 2017 | attention+self-attention (single model) guotong1988 | 71.698 | 80.462 |
| 47 Nov 02, 2016 | Dynamic Coattention Networks (ensemble) Salesforce Research https://arxiv.org/abs/1611.01604 | 71.625 | 80.383 |
| 48 Jul 15, 2017 | smarnet (single model) Eigen Technology & Zhejiang University https://arxiv.org/abs/1710.02772 | 71.415 | 80.160 |
| 49 Jul 15, 2017 | Mnemonic Reader (single model) NUDT and Fudan University https://arxiv.org/abs/1705.02798 | 70.995 | 80.146 |
| 49 Apr 13, 2017 | QFASE NUS | 71.898 | 79.989 |
| 50 Apr 23, 2018 | MAMCN (single model) Samsung Research | 70.985 | 79.939 |
| 50 Oct 27, 2017 | M-NET (single) UFL | 71.016 | 79.835 |
| 50 May 23, 2018 | AttReader (single) College of Computer & Information Science, SouthWest University, Chongqing, China | 71.373 | 79.725 |
| 51 Mar 15, 2017 | Document Reader (single model) Facebook AI Research https://arxiv.org/abs/1704.00051 | 70.733 | 79.353 |
| 51 Apr 03, 2017 | Ruminating Reader (single model) New York University https://arxiv.org/abs/1704.07415 | 70.639 | 79.456 |
| 52 Mar 09, 2017 | ReasoNet (single model) MSR Redmond https://arxiv.org/abs/1609.05284 | 70.555 | 79.364 |
| 52 Mar 25, 2017 | jNet (single model) USTC & National Research Council Canada & York University https://arxiv.org/abs/1703.04617 | 70.607 | 79.821 |
| 52 Dec 29, 2016 | FastQAExt German Research Center for Artificial Intelligence https://arxiv.org/abs/1703.04816 | 70.849 | 78.857 |
| 52 May 14, 2017 | RaSoR (single model) Google NY, Tel-Aviv University https://arxiv.org/abs/1611.01436 | 70.849 | 78.741 |
| 52 Apr 14, 2017 | Multi-Perspective Matching (single model) IBM Research https://arxiv.org/abs/1612.04211 | 70.387 | 78.784 |
| 53 Feb 06, 2018 | SSR-BiDAF single model | 69.443 | 78.358 |
| 53 Aug 31, 2017 | SimpleBaseline (single model) Technical University of Vienna | 69.600 | 78.236 |
| 54 Apr 13, 2017 | SEDT+BiDAF (single model) CMU https://arxiv.org/abs/1703.00572 | 68.478 | 77.971 |
| 55 Jun 26, 2017 | PQMN (single model) KAIST & AIBrain & Crosscert | 68.331 | 77.783 |
| 56 Apr 12, 2017 | T-gating (single model) Peking University | 68.132 | 77.569 |
| 56 Jul 29, 2017 | SEDT (single model) CMU https://arxiv.org/abs/1703.00572 | 68.163 | 77.527 |
| 57 Nov 28, 2016 | BiDAF (single model) Allen Institute for AI & University of Washington https://arxiv.org/abs/1611.01603 | 67.974 | 77.323 |
| 57 Feb 22, 2018 | Unnamed submission by null | 68.478 | 77.220 |
| 57 Jan 23, 2018 | FABIR (Single Model) in review | 67.744 | 77.605 |
| 57 Feb 22, 2018 | Unnamed submission by null | 68.425 | 77.077 |
| 57 Dec 29, 2016 | FastQA German Research Center for Artificial Intelligence https://arxiv.org/abs/1703.04816 | 68.436 | 77.070 |
| 58 Oct 27, 2016 | Match-LSTM with Ans-Ptr (Boundary) (ensemble) Singapore Management University https://arxiv.org/abs/1608.07905 | 67.901 | 77.022 |
| 58 Sep 20, 2017 | AllenNLP BiDAF (single model) Allen Institute for AI http://allennlp.org/ | 67.618 | 77.151 |
| 59 Feb 06, 2017 | Iterative Co-attention Network Fudan University | 67.502 | 76.786 |
| 60 Nov 02, 2016 | Dynamic Coattention Networks (single model) Salesforce Research https://arxiv.org/abs/1611.01604 | 66.233 | 75.896 |
| 60 Jan 03, 2018 | newtest single model | 66.527 | 75.787 |
| 61 Feb 25, 2018 | Unnamed submission by null | 65.992 | 75.469 |
| 62 Jan 04, 2018 | baseline single model | 64.796 | 74.272 |
| 63 Dec 10, 2017 | Unnamed submission by ravioncodalab | 64.439 | 73.921 |
| 63 Oct 27, 2016 | Match-LSTM with Bi-Ans-Ptr (Boundary) Singapore Management University https://arxiv.org/abs/1608.07905 | 64.744 | 73.743 |
| 64 Feb 20, 2017 | Attentive CNN context with LSTM NLPR, CASIA | 63.306 | 73.463 |
| 65 Nov 03, 2016 | Fine-Grained Gating Carnegie Mellon University https://arxiv.org/abs/1611.01724 | 62.446 | 73.327 |
| 65 Sep 22, 2017 | OTF dict+spelling (single) University of Montreal https://arxiv.org/abs/1706.00286 | 64.083 | 73.056 |
| 66 Sep 22, 2017 | OTF spelling (single) University of Montreal https://arxiv.org/abs/1706.00286 | 62.897 | 72.016 |
| 67 Sep 22, 2017 | OTF spelling+lemma (single) University of Montreal https://arxiv.org/abs/1706.00286 | 62.604 | 71.968 |
| 68 Sep 29, 2016 | Dynamic Chunk Reader IBM https://arxiv.org/abs/1610.09996 | 62.499 | 70.956 |
| 69 Aug 27, 2016 | Match-LSTM with Ans-Ptr (Boundary) Singapore Management University https://arxiv.org/abs/1608.07905 | 60.474 | 70.695 |
| 70 Jan 06, 2018 | PivRet (single model) anonymous | 58.764 | 69.276 |
| 71 Aug 27, 2016 | Match-LSTM with Ans-Ptr (Sentence) Singapore Management University https://arxiv.org/abs/1608.07905 | 54.505 | 67.748 |