Performance tests for DataCatalog (#4230)
* Update index.md (#4221)

Fixed an erroneous link to the Get started with Kedro - Create your first data pipeline with Kedro video.  It was accidentally linked to the previous video.

Signed-off-by: Greg Vaslowski <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>

* Bump kedro-sphinx-theme from 2024.4.0 to 2024.10.0 (#4216)

* Bump kedro-sphinx-theme from 2024.4.0 to 2024.10.0

Bumps [kedro-sphinx-theme](https://github.com/kedro-org/kedro-sphinx-theme) from 2024.4.0 to 2024.10.0.
- [Release notes](https://github.com/kedro-org/kedro-sphinx-theme/releases)
- [Commits](kedro-org/kedro-sphinx-theme@v2024.4.0...v2024.10.0)

---
updated-dependencies:
- dependency-name: kedro-sphinx-theme
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* updated to 2024.10.2

* trigger_run

* trigger_run

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: L. R. Couto <[email protected]>
Co-authored-by: rashidakanchwala <[email protected]>
Co-authored-by: Ankita Katiyar <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>

* Replace all instances of "data set" with "dataset" (#4211)

Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>

* Manually created sitemap.xml for improved control over indexed docs pages (#4145)

* Load manually created sitemap

Signed-off-by: Dmitry Sorokin <[email protected]>

* Add projects remove lastmod for latest

Signed-off-by: Dmitry Sorokin <[email protected]>

* Add latest for projects

Signed-off-by: Dmitry Sorokin <[email protected]>

---------

Signed-off-by: Dmitry Sorokin <[email protected]>
Co-authored-by: Dmitry Sorokin <[email protected]>
Co-authored-by: ElenaKhaustova <[email protected]>
Co-authored-by: L. R. Couto <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>

* Bump up version to 0.19.9 (#4219)

* Bump up version to 0.19.9

Signed-off-by: Laura Couto <[email protected]>

* Add placeholders to release.md

Signed-off-by: Laura Couto <[email protected]>

* Update citation.cff release date

Signed-off-by: Laura Couto <[email protected]>

---------

Signed-off-by: Laura Couto <[email protected]>
Signed-off-by: L. R. Couto <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>

* first pass doesn't work yet

Signed-off-by: Ankita Katiyar <[email protected]>

* Update ocl tests

Signed-off-by: Ankita Katiyar <[email protected]>

* revert some changes

Signed-off-by: Ankita Katiyar <[email protected]>

* Update to use larger config

Signed-off-by: Ankita Katiyar <[email protected]>

* Update functions and docstrings

Signed-off-by: Ankita Katiyar <[email protected]>

* Add performance tests for DataCatalog

Signed-off-by: Ankita Katiyar <[email protected]>

* Update mypy ignore messages (#4228)

Signed-off-by: Ankita Katiyar <[email protected]>

* Revise Kedro project structure docs (#4208)

* Update project structure docs
---------

Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Co-authored-by: Juan Luis Cano Rodríguez <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>

* Update CLI autocompletion docs with new Click syntax (#4213)

* Update CLI autocompletion docs with new Click syntax

Updated the autocompletion setup instructions for Bash, Zsh, and Fish shells to reflect the latest Click 8.1 syntax. Changed Fish shell completion script path to ~/.config/fish/completions/kedro.fish for correct placement.

Signed-off-by: hyew0nChoi <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>

* Bump import-linter from 2.0 to 2.1 (#4226)

Bumps [import-linter](https://github.com/seddonym/import-linter) from 2.0 to 2.1.
- [Changelog](https://github.com/seddonym/import-linter/blob/master/CHANGELOG.rst)
- [Commits](seddonym/import-linter@v2.0...v2.1)

---
updated-dependencies:
- dependency-name: import-linter
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Ankita Katiyar <[email protected]>

* Performance test for `OmegaConfigLoader` (#4225)

* first pass doesn't work yet

Signed-off-by: Ankita Katiyar <[email protected]>

* Update ocl tests

Signed-off-by: Ankita Katiyar <[email protected]>

* revert some changes

Signed-off-by: Ankita Katiyar <[email protected]>

* Update to use larger config

Signed-off-by: Ankita Katiyar <[email protected]>

* Update functions and docstrings

Signed-off-by: Ankita Katiyar <[email protected]>

* lint

Signed-off-by: Ankita Katiyar <[email protected]>

---------

Signed-off-by: Ankita Katiyar <[email protected]>

* Add a test for init and fix indent

Signed-off-by: Ankita Katiyar <[email protected]>

* Revert "Add a test for init and fix indent"

This reverts commit 0dbe3c7.

Signed-off-by: Ankita Katiyar <[email protected]>

* Add a test for init and fix indent

Signed-off-by: Ankita Katiyar <[email protected]>

---------

Signed-off-by: Greg Vaslowski <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Laura Couto <[email protected]>
Signed-off-by: L. R. Couto <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: hyew0nChoi <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
ankatiyar authored Oct 18, 2024
1 parent 82c1223 commit 2e950a2
Showing 3 changed files with 95 additions and 7 deletions.
8 changes: 7 additions & 1 deletion asv.conf.json
@@ -8,5 +8,11 @@
     "environment_type": "virtualenv",
     "show_commit_url": "http://github.com/kedro-org/kedro/commit/",
     "results_dir": ".asv/results",
-    "html_dir": ".asv/html"
+    "html_dir": ".asv/html",
+    "matrix": {
+        "req": {
+            "kedro-datasets": [],
+            "pandas": []
+        }
+    }
 }
82 changes: 82 additions & 0 deletions benchmarks/benchmark_datacatalog.py
@@ -0,0 +1,82 @@
import pandas as pd
from kedro_datasets.pandas import CSVDataset

from kedro.io import DataCatalog

base_catalog = {
    f"dataset_{i}": {
        "type": "pandas.CSVDataset",
        "filepath": f"data_{i}.csv",
    } for i in range(1, 1001)
}
# Add datasets with the same filepath for loading
base_catalog.update({
    f"dataset_load_{i}": {
        "type": "pandas.CSVDataset",
        "filepath": "data.csv",
    } for i in range(1, 1001)
})
# Add a factory pattern
base_catalog.update({
    "dataset_factory_{placeholder}": {
        "type": "pandas.CSVDataset",
        "filepath": "data_{placeholder}.csv",
    }
})

class TimeDataCatalog:
    def setup(self):
        self.catalog = DataCatalog.from_config(base_catalog)
        self.dataframe = pd.DataFrame({"column": [1, 2, 3]})
        self.dataframe.to_csv("data.csv", index=False)
        self.datasets = {
            f"dataset_new_{i}": CSVDataset(filepath="data.csv") for i in range(1, 1001)
        }
        self.feed_dict = {
            f"param_{i}": i for i in range(1, 1001)
        }

    def time_init(self):
        """Benchmark the time to initialize the catalog"""
        DataCatalog.from_config(base_catalog)

    def time_save(self):
        """Benchmark the time to save datasets"""
        for i in range(1,1001):
            self.catalog.save(f"dataset_{i}", self.dataframe)

    def time_load(self):
        """Benchmark the time to load datasets"""
        for i in range(1,1001):
            self.catalog.load(f"dataset_load_{i}")

    def time_exists(self):
        """Benchmark the time to check if datasets exist"""
        for i in range(1,1001):
            self.catalog.exists(f"dataset_{i}")

    def time_release(self):
        """Benchmark the time to release datasets"""
        for i in range(1,1001):
            self.catalog.release(f"dataset_{i}")

    def time_add_all(self):
        """Benchmark the time to add all datasets"""
        self.catalog.add_all(self.datasets)

    def time_feed_dict(self):
        """Benchmark the time to add feed dict"""
        self.catalog.add_feed_dict(self.feed_dict)

    def time_list(self):
        """Benchmark the time to list all datasets"""
        self.catalog.list()

    def time_shallow_copy(self):
        """Benchmark the time to shallow copy the catalog"""
        self.catalog.shallow_copy()

    def time_resolve_factory(self):
        """Benchmark the time to resolve factory"""
        for i in range(1,1001):
            self.catalog._get_dataset(f"dataset_factory_{i}")
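
The `time_resolve_factory` benchmark above requests names such as `dataset_factory_1` that are not listed explicitly and only match the `dataset_factory_{placeholder}` pattern. The following is a minimal, standard-library-only sketch of that idea. It is an illustration under stated assumptions, not Kedro's implementation: the helpers `pattern_to_regex` and `resolve_from_pattern` are hypothetical names introduced here.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'name_{placeholder}' into a regex with one named group per placeholder.

    Hypothetical helper for illustration only; assumes each placeholder name
    appears at most once in the pattern.
    """
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)      # literal text between placeholders
        else:
            regex += f"(?P<{part}>.+)"    # placeholder becomes a named group
    return re.compile(f"^{regex}$")


def resolve_from_pattern(name, pattern, config):
    """Return config with placeholders filled in, or None if name does not match."""
    match = pattern_to_regex(pattern).match(name)
    if match is None:
        return None
    values = match.groupdict()
    return {
        key: value.format(**values) if isinstance(value, str) else value
        for key, value in config.items()
    }


factory_pattern = "dataset_factory_{placeholder}"
factory_config = {
    "type": "pandas.CSVDataset",
    "filepath": "data_{placeholder}.csv",
}

resolved = resolve_from_pattern("dataset_factory_7", factory_pattern, factory_config)
print(resolved)
```

Matching the pattern against each requested name is work that explicitly declared entries do not need, which is presumably why the commit benchmarks factory resolution separately from `time_load` and `time_save`.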
12 changes: 6 additions & 6 deletions benchmarks/benchmark_ocl.py
@@ -33,13 +33,13 @@ def _generate_globals(start_range, end_range, is_local=False):
return globals_dict

def _create_config_file(conf_source, env, file_name, data):
env_path = conf_source / env
env_path.mkdir(parents=True, exist_ok=True)
file_path = env_path / file_name
env_path = conf_source / env
env_path.mkdir(parents=True, exist_ok=True)
file_path = env_path / file_name

import yaml
with open(file_path, "w") as f:
yaml.dump(data, f)
import yaml
with open(file_path, "w") as f:
yaml.dump(data, f)

base_catalog = _generate_catalog(1, 1000, is_versioned=True)
local_catalog = _generate_catalog(501, 1500, is_local=True)
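
Both benchmark modules in this commit rely on the asv naming convention: methods whose names start with `time_` are timed, and `setup` prepares state beforehand. The sketch below imitates that convention with the standard library only, to show roughly what such a runner does. It is a simplified stand-in, not asv itself; `TimeListOps` and `run_benchmarks` are hypothetical names introduced for illustration.

```python
import time


class TimeListOps:
    """Hypothetical benchmark class using the setup/time_* naming convention."""

    def setup(self):
        # State built here is excluded from the measured time.
        self.values = list(range(1000))

    def time_sum(self):
        sum(self.values)

    def time_sorted(self):
        sorted(self.values, reverse=True)


def run_benchmarks(cls, repeat=5):
    """Time every time_* method of cls, calling setup() before each measurement.

    Returns a dict mapping method name to the best (smallest) wall-clock time
    in seconds over `repeat` runs.
    """
    results = {}
    for name in sorted(dir(cls)):
        if not name.startswith("time_"):
            continue
        timings = []
        for _ in range(repeat):
            instance = cls()
            instance.setup()
            start = time.perf_counter()
            getattr(instance, name)()
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in run_benchmarks(TimeListOps).items():
        print(f"{name}: {seconds:.6f}s")
```

Keeping the expensive preparation in `setup`, as `TimeDataCatalog.setup` does with `DataCatalog.from_config` and the CSV file, means each `time_*` method measures only the operation it is named after.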