CardinalityEstimationTestbed

The source code of Paper Learned Cardinality Estimation: A Design Space Exploration and A Comparative Evaluation.

Experiment of synthetic

Generate_data_sql

cd CardinalityEstimationTestbed/Synthetic
python generate_data_sql.py --cols [COLUMNS_NUM] --distinct [DOMAIN_SIZE] --corr [CORRELATION] --skew [SKEWNESS]

Get_sql_truecard

python get_truecard.py --version cols_[COLUMNS_NUM]_distinct_[DOMAIN_SIZE]_corr_[CORRELATION]_skew_[SKEWNESS]

Get_result

python get_result.py --cols [COLUMNS_NUM] --distinct [DOMAIN_SIZE] --corr [CORRELATION] --skew [SKEWNESS] --method [METHOD]

Then experimental results of all methods can be obtained.

Parameters

[COLUMNS_NUM] should be an integer between 1 and 9.
[DOMAIN_SIZE] should be an integer between 10 and 20000.
[CORRELATION] should be an integer between 1 and 9.
[SKEWNESS] should be an integer between 1 and 9.
The parameters we use are:

cols in [2, 4, 6, 8]; distinct in [10, 100, 1000, 10000]; corr in [2, 4, 6, 8]; skew in [2, 4, 6, 8];

Experiment of overall

Real Datasets prepare

Download the IMDB data tables to: ../train-test-data/imdbdataset-str
Download the forest, power data tables to: ../train-test-data/forest_power-data-sql
XTZX. We have put the data and workload for the experiment into the repositorie. Begin by doing a simple job on the table, removing some unused columns. For Forest We use ﬁrst 10 numeric attributes; For Power We used the 7 numeric attributes after the ﬁrst two attributes (date and time).
Remember to put the tables in their respective folders that do not contain table headers, for example: forest_power-data-sql/no_head, imdbdata-num/no_head, xtzx-data-sql/no_head.

Forest & Power

Run the forest.sh, power.sh in each method folder to execute the code to get the result.

Imdb

First, the strings in the data tables should be converted to numbers.
cd CardinalityEstimationTestbed/Overall/imdb
python data_str2num.py

Varying Columns

cd Overall/[METHOD]
python run_exp.py --cols [cols]
For example:
cd Overall/deepdb
python run_exp.py --cols 4
AND cd Overall/xgboost_localnn
python run_exp.py --cols 4 --model [nn OR xgb]

Varying Domain Size

First, the data of the distinct is generated and populated.
cd CardinalityEstimationTestbed/Overall/imdb/distinct
python data_process.py
Then refer to run.sh in mscn and neurocard folder to execute the code to get the result.

Varying Join samples

For example:
cd Overall/deepdb
python run_exp.py --cols 4 --samples 3

XTZX

cd Overall/[METHOD]
python run_exp.py --xt [cols]
For example:
cd Overall/deepdb
python run_exp.py --xt 4
AND cd Overall/xgboost_localnn
python run_exp.py --xt 4 --model [nn OR xgb]

Update

Select 10% of the number of rows in each table and add them to the end of the table.
Refer to update.sh in each method folder to execute the code to get the result.

Citation

@article{sun2021learned,
  title={Learned cardinality estimation: A design space exploration and a comparative evaluation},
  author={Sun, Ji and Zhang, Jintao and Sun, Zhaoyan and Li, Guoliang and Tang, Nan},
  journal={Proceedings of the VLDB Endowment},
  volume={15},
  number={1},
  pages={85--97},
  year={2021},
  publisher={VLDB Endowment}
}
@inproceedings{zhang2023autoce,
  title={Autoce: An accurate and efficient model advisor for learned cardinality estimation},
  author={Zhang, Jintao and Zhang, Chao and Li, Guoliang and Chai, Chengliang},
  booktitle={2023 IEEE 39th International Conference on Data Engineering (ICDE)},
  pages={2621--2633},
  year={2023},
  organization={IEEE}
}

Name		Name	Last commit message	Last commit date
Latest commit History 162 Commits
.idea		.idea
Overall		Overall
Synthetic		Synthetic
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CardinalityEstimationTestbed

Experiment of synthetic

Generate_data_sql

Get_sql_truecard

Get_result

Parameters

Experiment of overall

Real Datasets prepare

Forest & Power

Imdb

Varying Columns

Varying Domain Size

Varying Join samples

XTZX

Update

Citation

About

Releases

Packages

Contributors 2

Languages

jt-zhang/CardinalityEstimationTestbed

Folders and files

Latest commit

History

Repository files navigation

CardinalityEstimationTestbed

Experiment of synthetic

Generate_data_sql

Get_sql_truecard

Get_result

Parameters

Experiment of overall

Real Datasets prepare

Forest & Power

Imdb

Varying Columns

Varying Domain Size

Varying Join samples

XTZX

Update

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages