layout	permalink	redirect_from
spider	spider	/seq2sql/spider

Spider 1.0

Yale Semantic Parsing and Text-to-SQL Challenge

What is Spider?

Spider is a large-scale, complex, cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables covering 138 different domains. In Spider 1.0, different complex SQL queries and databases appear in the train and test sets. To do well on it, systems must generalize not only to new SQL queries but also to new database schemas.
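To make the cross-database setting concrete, here is a minimal sketch of a check that no database is shared between two splits. The helper name `shared_databases`, the `db_id` field, and the toy records are assumptions made for illustration; they are not taken from the dataset files.

```python
def shared_databases(train_examples, test_examples):
    """Return the set of database ids that occur in both splits."""
    train_dbs = {example["db_id"] for example in train_examples}
    test_dbs = {example["db_id"] for example in test_examples}
    return train_dbs & test_dbs


# Hypothetical records: each question is tied to exactly one database.
train = [{"db_id": "db_a", "question": "How many singers are there?"},
         {"db_id": "db_b", "question": "List all airport names."}]
test = [{"db_id": "db_c", "question": "What is the average price?"}]

# An empty result means every test question targets an unseen schema.
print(shared_databases(train, test))
```

A split built this way forces a model to read the schema at prediction time instead of memorizing table and column names seen in training.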

Why do we call it "Spider"? Because our dataset is complex and cross-domain, like a spider crawling across multiple complex (with many foreign keys) nests (databases). Spider Paper (EMNLP'18) Spider Post

Related challenges: multi-turn SParC and conversational CoSQL text-to-SQL tasks. SParC Challenge (ACL'19) CoSQL Challenge (EMNLP'19)

News

03/11/2021 Please check out a nice work from Google Research (including new Spider splits) for studying compositional generalization in semantic parsing!
11/15/2020 We will use Test Suite Accuracy as our official evaluation metric for Spider, SParC, and CoSQL. Please find the evaluation code here. Also, note that test results after May 02, 2020 are reported on the new release (which corrected some annotation errors).
08/03/2020 Corrected "column_name" and "column_name_original" mismatches in 2 dbs ("scholar" and "formula_1") in tables.json, and reparsed SQL queries (this only affects some models (e.g. RATSQL) which use our parsed SQL as the SQL input). Please download the Spider dataset from this page again.
06/07/2020 We corrected some annotation errors and label mismatches (not errors) in Spider dev and test sets (~4% of dev examples updated, click here for more details). Please download the Spider dataset from this page again.
01/16/2020 For value prediction (in order to compute the execution accuracy), your model should be able to 1) copy from the question inputs, 2) retrieve from the database content (database content is available), or 3) generate numbers (e.g. 3 in "LIMIT 3").
9/24/2019 (Min et al., EMNLP 2019) translated Spider to Chinese! Check out the Chinese challenge page.
5/17/2019 Our paper SParC: Cross-Domain Semantic Parsing in Context with Salesforce Research was accepted to ACL 2019! It introduces the context-dependent version of the Spider challenge: SParC!
5/17/2019 Please report any annotation errors here. We really appreciate your help and will update the data release this summer!
1/14/2019 The submission tutorial is out!
12/17/2018 We updated 7 sqlite database files (issue 14). Please download the Spider dataset from this page again.
10/25/2018 The evaluation script and results were updated (issue 5). Please download the latest versions of the script and papers. Also, please follow the instructions in issue 3 to generate the latest SQL parsing results (fixed a bug).

Why Spider?

As the above spider chart shows, Spider 1.0 is distinct from most of the previous semantic parsing tasks because:

ATIS, Geo, Academic: Each of them contains only a single database with a limited number of SQL queries, and has exactly the same SQL queries in the train and test splits.
WikiSQL: The numbers of SQL queries and tables are large, but all SQL queries are simple, and each database is only a single table without any foreign keys.

Spider 1.0 spans the largest area in the chart, making it the first complex and cross-domain semantic parsing and text-to-SQL dataset! Read more on the blog post.

Getting Started

The data is split into training, development, and unreleased test sets. Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

Spider Dataset

Details of baseline models and the evaluation script can be found on the following GitHub site:

Spider GitHub Page

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we ask you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through official evaluation of your model:

Submission Tutorial

Data Examples

Some examples look like the following:

Have Questions or Want to Contribute?

Ask us questions at our GitHub issues page or contact Tao Yu, Rui Zhang, or Michihiro Yasunaga.

We expect the dataset to evolve. We would greatly appreciate it if you could donate your non-private databases or SQL queries to the project.

Acknowledgement

We thank Graham Neubig, Tianze Shi, Catherine Finegan-Dollak, and the anonymous reviewers for their valuable comments on this project. We also thank Pranav Rajpurkar for giving us permission to build this website based on SQuAD.

Our team at the summit of East Rock Park in New Haven (the pose is "NLseq2SQL"):


Leaderboard - Execution with Values

Our current models do not predict any values in SQL conditions, so we do not provide execution accuracies for them. However, we encourage you to provide execution accuracy in future submissions. For value prediction, your model should be able to 1) copy from the question inputs, 2) retrieve from the database content (database content is available), or 3) generate numbers (e.g. 3 in "LIMIT 3"). Notice: Test results after May 02, 2020 are reported on the new release (which corrected some annotation errors).
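Execution accuracy compares what two queries return rather than how they are written, which is why predicted values matter. The following is a minimal sketch, assuming a SQLite database and a hypothetical `execution_match` helper. It is not the official evaluation script, the `singer` table is invented for illustration, and row order is ignored for simplicity.

```python
import sqlite3
from collections import Counter


def execution_match(conn, pred_sql, gold_sql):
    """Return True if both queries yield the same multiset of rows."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    # Counter compares rows as a multiset, so row order is ignored here.
    return Counter(pred_rows) == Counter(gold_rows)


# Hypothetical toy database, built in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)",
                 [("Ann", 25), ("Bob", 41), ("Cy", 33)])

gold = "SELECT name FROM singer WHERE age > 30"
print(execution_match(conn, "SELECT name FROM singer WHERE age >= 31", gold))  # True
print(execution_match(conn, "SELECT name FROM singer WHERE age > 40", gold))   # False
print(execution_match(conn, "SELECT nme FROM singer", gold))                   # False
```

The second call shows why values are required: the query has the right structure, but the wrong constant changes the returned rows.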

Rank	Model	Test
1 Jul 14, 2021	T5-3B+PICARD (DB content used) Element AI, a ServiceNow company (Scholak et al., EMNLP'21) code	75.1
2 May 4, 2021	RATSQL+GAP+NatSQL (DB content used) Queen Mary University of London (Gan et al., EMNLP Findings'21) code	73.3
3 Mar 10, 2021	SmBoP + GraPPa (DB content used) Tel-Aviv University & Allen Institute for AI (Rubin and Berant, NAACL'21) code	71.1
4 Aug 05, 2021	RaSaP + ELECTRA (DB content used) Ant Group, ZhiXiaoBao & Ada (Huang et al.,'21)	70.0
5 Nov 24, 2020	BRIDGE v2 + BERT(ensemble) (DB content used) Salesforce Research (Lin et al., EMNLP-Findings '20) code	68.3
6 Jan 16, 2021	COMBINE (DB content used) Novelis.io Research (Youssef et al.,'21)	68.2
7 Nov 24, 2020	BRIDGE v2 + BERT (DB content used) Salesforce Research (Lin et al., EMNLP-Findings '20) code	64.3
8 May 30, 2020	AuxNet + BART (DB content used) Anonymous	62.6
9 May 30, 2020	BRIDGE + BERT (DB content used) Salesforce Research (Lin et al., EMNLP-Findings '20) code	59.9
10 May 20, 2020	GAZP + BERT (DB content used) University of Washington & Facebook AI Research (Zhong et al., EMNLP '20)	53.5

Leaderboard - Exact Set Match without Values

For exact matching evaluation, instead of simply comparing the predicted and gold SQL queries as strings, we decompose each SQL query into several clauses and conduct a set comparison within each clause. Please refer to the paper and the GitHub page for more details. Notice: Test results after May 02, 2020 are reported on the new release (which corrected some annotation errors).
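The idea of clause-level set comparison can be sketched as follows. This is a deliberately simplified illustration, not the official evaluation script: the function names are hypothetical, and the splitter assumes flat queries with no nesting, no keywords inside string literals, and only simple conditions joined by AND.

```python
import re

# Clause keywords recognised by this toy splitter (flat queries only).
KEYWORDS = ["select", "from", "where", "group by", "having", "order by", "limit"]
CLAUSE_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def split_clauses(sql):
    """Map each clause keyword to its normalised components."""
    # Lowercase and collapse whitespace so formatting differences vanish
    # (this also lowercases string literals, acceptable for a toy).
    sql = " ".join(sql.strip().rstrip(";").lower().split())
    parts = CLAUSE_RE.split(sql)  # ['', 'select', ' a, b ', 'from', ...]
    clauses = {}
    for keyword, body in zip(parts[1::2], parts[2::2]):
        separator = r"\band\b" if keyword in ("where", "having") else ","
        items = [item.strip() for item in re.split(separator, body)]
        items = [item for item in items if item]
        # Sort order is meaningful in ORDER BY, so keep it as a sequence.
        clauses[keyword] = tuple(items) if keyword == "order by" else frozenset(items)
    return clauses


def exact_set_match(pred_sql, gold_sql):
    """Compare two queries clause by clause instead of as raw strings."""
    return split_clauses(pred_sql) == split_clauses(gold_sql)


pred = "SELECT name, age FROM singer WHERE age > 30 AND country = 'France'"
gold = "select age, name from singer where country = 'France' and age > 30"
print(pred == gold)                 # False: the strings differ
print(exact_set_match(pred, gold))  # True: every clause has the same components
```

Comparing components as sets means that reordering the selected columns or the WHERE conditions does not count as an error, which a plain string comparison would penalize.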

Rank	Model	Dev	Test
1 Sep 1, 2021	S²SQL + ELECTRA (DB content used) Anonymous	76.4	72.1
1 Jun 1, 2021	LGESQL + ELECTRA (DB content used) SJTU X-LANCE Lab & AISpeech (Cao et al., ACL'21) code	75.1	72.0
1 Jul 14, 2021	T5-3B+PICARD (DB content used) Element AI, a ServiceNow company (Scholak et al., EMNLP'21) code	75.5	71.9
4 Nov 19, 2020	DT-Fixup SQL-SP + RoBERTa (DB content used) Borealis AI (Xu et al., ACL'21) code	75.0	70.9
5 Nov 19, 2020	RAT-SQL + GraPPa + Adv (DB content used) Anonymous	75.5	70.5
6 Nov 19, 2020	SADGA + GAP (DB content used) DMIR Lab (Cai and Yuan et al., NeurIPS'21) code	73.1	70.1
7 Dec 25, 2020	RATSQL + GraPPa + GP (DB content used) OCFT Gamma Big Data Lab (Zhao et al.,'21)	72.8	69.8
8 Sep 08, 2020	RATSQL + GAP (DB content used) University of Waterloo & AWS AI Labs (Shi et al., AAAI'21) code	71.8	69.7
9 Aug 18, 2020	RATSQL + GraPPa (DB content used) Yale & Salesforce Research (Yu et al., ICLR'21) code	73.4	69.6
10 Mar 10, 2021	SmBoP + GraPPa (DB content used) Tel-Aviv University & Allen Institute for AI (Rubin and Berant, NAACL'21) code	74.7	69.5
11 Aug 05, 2021	RaSaP + ELECTRA (DB content used) Ant Group, ZhiXiaoBao & Ada (Huang et al.,'21)	74.7	69.0
12 May 4, 2021	RATSQL+GAP+NatSQL (DB content used) Queen Mary University of London (Gan et al., EMNLP Findings'21) code	-	68.7
13 Nov 20, 2020	RAT-SQL + STRUG (DB content used) Microsoft Research & OSU (Deng et al., NAACL '21)	72.6	68.4
14 Jun 1, 2021	LGESQL + BERT (DB content used) SJTU X-LANCE Lab & AISpeech (Cao et al., ACL'21) code	74.1	68.3
15 Jan 16, 2021	COMBINE (DB content used) Novelis.io Research (Youssef et al.,'21)	71.4	67.7
16 Nov 24, 2020	BRIDGE v2 + BERT(ensemble) (DB content used) Salesforce Research (Lin et al., EMNLP-Findings '20) code	71.1	67.5
17 Sep. 8, 2020	ShadowGNN + RoBERTa (DB content used) SJTU X-LANCE Lab & AISpeech (Chen et al., NAACL'21)	72.3	66.1
18 May 02, 2020	RATSQL v3 + BERT (DB content used) Microsoft Research (Wang and Shin et al., ACL '20) code	69.7	65.6
19 Dec. 07, 2020	DuoRAT + BERT (DB content used) Anonymous	-	65.4
20 Sep. 8, 2020	YCSQL + BERT (DB content used) Anonymous	-	65.3
21 Jan. 29, 2021	ETA + BERT (DB content used) Microsoft Research Asia (Liu et al., ACL-Findings '21)	70.8	65.3
22 Nov 24, 2020	BRIDGE v2 + BERT (DB content used) Salesforce Research (Lin et al., EMNLP-Findings '20) code	70.0	65.0
23 Sep. 8, 2020	GP-RATSQL + BERT (DB content used) Anonymous	-	64.5
24 Nov. 25, 2020	RATSQL-HPFT + BERT (DB content used) Anonymous	-	64.4
25 Feb 2, 2021	LGESQL + GLOVE (DB content used) SJTU X-LANCE Lab & AISpeech (Cao et al., ACL'21) code	67.6	62.8
26 May 31, 2020	AuxNet + BART (DB content used) Anonymous	70.0	61.9
27 Dec 13, 2019	RATSQL v2 + BERT (DB content used) Microsoft Research (Wang and Shin et al., ACL '20) code	65.8	61.9
28 May 31, 2020	AuxNet + BART Anonymous	68.0	61.3
29 Feb 18, 2020	RYANSQL v2 + BERT Kakao Enterprise (Choi et al., '20)	70.6	60.6
30 Oct 19, 2020	SmBoP + BART Tel-Aviv University & Allen Institute for AI (Rubin and Berant '20)	66.0	60.5
31 Dec 18, 2019	IRNet++ + XLNet (DB content used) Anonymous	65.5	60.1
32 May 30, 2020	BRIDGE + BERT (DB content used) Salesforce Research (Lin et al., EMNLP-Findings '20) code	65.5	59.2
33 Nov 12, 2019	RYANSQL + BERT Kakao Enterprise (Choi et al., '20)	66.6	58.2
34 Dec 13, 2019	RATSQL v2 (DB content used) Microsoft Research (Wang and Shin et al., ACL '20) code	62.7	57.2
35 Dec 13, 2019	SLSQL + BERT + Data Annotation National University of Singapore (Lei and Wang et al., EMNLP '20) code	60.8	55.7
36 Dec 13, 2019	EditSQL+LSL + BERT Anonymous	57.9	55.2
37 June 24, 2019	IRNet v2 + BERT Microsoft Research Asia	63.9	55.0
38 Sep 20, 2019	GIRN + BERT Anonymous	60.2	54.8
39 May 19, 2019	IRNet + BERT Microsoft Research Asia (Guo and Zhan et al., ACL '19) code	61.9	54.7
40 Nov 4, 2019	GNN + Bertrand-DR Got It R&D (Kelkar et al., '20) code	57.9	54.6
41 Apr 8, 2020	CNSQL Anonymous	58.0	54.0
42 Sep 19, 2019	RATSQL Anonymous	60.6	53.7
43 Sep 1, 2019	EditSQL + BERT Yale University & Salesforce Research (Zhang et al., EMNLP '19) code	57.6	53.4
44 May 21, 2020	GAZP + BERT University of Washington & Facebook AI Research (Zhong et al., EMNLP '20)	-	53.3
45 May 21, 2020	NatSQL v3 Anonymous	-	53.2
46 May 28, 2020	IRNET+ GNN Anonymous	-	49.6
47 June 24, 2019	IRNet v2 Microsoft Research Asia	55.4	48.5
48 Aug 30, 2019	Global-GNN (DB content used) Tel-Aviv University & Allen Institute for AI (Bogin et al., EMNLP '19) code	52.7	47.4
49 Dec 13, 2019	LSL Anonymous	56.8	47.0
50 Apr 5, 2020	GraphSQL Anonymous	52.8	46.9
51 May 19, 2019	IRNet Microsoft Research Asia (Guo and Zhan et al., ACL '19) code	53.2	46.7
52 Mar 17, 2020	SG-IRNet Anonymous	-	46.6
53 Dec 13, 2019	NatSQL v2 Anonymous	52.0	46.4
54 June 11, 2019	HSRNet Anonymous	51.5	45.6
55 June 12, 2019	CFGN Anonymous	48.7	44.1
56 Aug 31, 2019	NatSQL Anonymous	52.9	42.5
57 May 16, 2019	GNN Tel-Aviv University & Allen Institute for AI (Bogin et al., ACL '19) code	40.7	39.4
58 Feb 25, 2019	SASeq Anonymous	40.8	37.4
59 May 30, 2019	GrammarSQL Allen Institute for AI (Lin et al., '19)	34.8	33.8
60 Sep 1, 2019	EditSQL Yale University & Salesforce Research (Zhang et al., EMNLP '19) code	36.4	32.9
61 Dec 13, 2019	GuideSQL Anonymous	36.8	31.5
62 Sep 20, 2018	SyntaxSQLNet + augment Yale University (Yu et al., EMNLP '18) code	24.8	27.2
63 April 18, 2019	RCSQL SAP Labs Korea (Lee, EMNLP'19)	28.5	24.3
64 Sep 20, 2018	SyntaxSQLNet Yale University (Yu et al., EMNLP '18) code	18.9	19.7
65 Sep 20, 2018	SQLNet Shanghai Jiao Tong University (modified by Yale) (Xu et al., '18) code	10.9	12.4
66 Sep 20, 2018	TypeSQL Yale University (Yu et al., NAACL '18) code	8.0	8.2
67 Sep 20, 2018	Seq2Seq + attention University of Edinburgh (modified by Yale) (Dong and Lapata, ACL '16) code	1.8	4.8

Other papers that used Spider (evaluated on the dev set but not the test set):

(Min et al., EMNLP 2019), Westlake University, Spider in Chinese
(Yao et al., EMNLP 2019), OSU & Facebook AI Research
(Shaw et al., ACL 2019), Google
(Shin et al., NeurIPS 2019), UC Berkeley & MSR
(Weir et al., SIGMOD 2019), Brown University & TU Darmstadt
(Baik et al., ICDE 2019), U of Michigan & IBM
