fix: carefully replaced minio_url with parquet_url #1280
Conversation
Thanks @varshneydevansh! I left a minor comment. Otherwise at first glance it looks good to me. However, tests should pass and there seem to be some issues. It is possible that this is not due to your changes, but due to the state of the (test) server. I am not currently up-to-date on that as I am on vacation. I requested a review from @LennartPurucker, hopefully he has time to assist you where necessary and help merge this PR :) @LennartPurucker if you don't have time, you can remove yourself from the PR. I am on vacation and I knew Matthias is very busy, so I was hoping that you had time for this. Otherwise we'll just have to wait until people are available. |
That is why the last commit in this project was made last week. Now I get it. Have your holidays. Meanwhile, I am doing contributions to other projects. =) |
Heyho @varshneydevansh, I can help you. Sorry for the delay, I was busy over the weekend. I read the issue and PR and hope I understood the task. If my feedback sounds strange, feel free to correct me, as I might not have the full picture. I am now starting the review. |
The rest looks good to me. I can approve the PR and run the workflows once we have updated the documentation. I also double-checked the code and think you found all relevant occurrences of the URL.
Looks good to me, thank you.
As a general comment: I am not entirely sure which version we would like this change to be in. This is technically breaking the API. So we might want to go through a deprecation cycle. @PGijsbers, when you are back, could you share your goals for this? Once we know how to handle this, we can update the progress.rst and merge it. @varshneydevansh Thank you very much for your help! |
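For readers unfamiliar with the term, a deprecation cycle usually means keeping the old name available for a while as an alias that emits a warning. The sketch below is only an illustration of that idea: `DatasetSketch` is a hypothetical class and not the openml-python implementation, and nothing in this thread states that the project chose this approach.

```python
import warnings


class DatasetSketch:
    """Hypothetical stand-in for a dataset object (not the openml-python class)."""

    def __init__(self, parquet_url=None):
        # New, preferred attribute name.
        self.parquet_url = parquet_url

    @property
    def minio_url(self):
        # Old name kept as a read-only alias during the deprecation window.
        warnings.warn(
            "minio_url is deprecated; use parquet_url instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.parquet_url
```

Code that still reads `minio_url` keeps working but sees a `DeprecationWarning`, which gives users one or more releases to switch before the alias is removed.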
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## develop #1280 +/- ##
===========================================
- Coverage 85.21% 82.87% -2.35%
===========================================
Files 38 38
Lines 5108 5108
===========================================
- Hits 4353 4233 -120
- Misses 755 875 +120
☔ View full report in Codecov by Sentry. |
Not sure what you mean with this. The intended way to retrieve the dataset from OpenML is through the |
Ahh okay, I thought |
I used a `next` header instead of a specific version since we don't know if it will be 0.15.0 or 0.14.2. We can change it before the next release.
I added the progress note so we could go ahead and merge this 🎉 thanks @varshneydevansh (and Lennart)! |
* carefully replaced minio with parquet
* fix: corrected some mistakes
* fix: restored the instances of minio
* fix: updated the documentation
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Add #1280
  I used a `next` header instead of a specific version since we don't know if it will be 0.15.0 or 0.14.2. We can change it before the next release.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pieter Gijsbers <[email protected]>
Co-authored-by: Lennart Purucker <[email protected]>
* [pre-commit.ci] pre-commit autoupdate updates: - [github.com/psf/black: 23.3.0 → 23.7.0](psf/black@23.3.0...23.7.0) - [github.com/pycqa/flake8: 6.0.0 → 6.1.0](PyCQA/flake8@6.0.0...6.1.0) * Raise correct TypeError and improve type check * Type check with isinstance instead of type() == * Docker enhancement #1277 (#1278) * Add multiple build platforms * Automatically update Dockerhub description * Launch Python instead of Bash by default * Change `omlp` directory name to less cryptic `openml` * Change directory to `openml` for running purpose of running script For mounted scripts, instructions say to mount them to `/openml`, so we have to `cd` before invoking `python`. * Update readme to reflect updates (python by default, rename dirs) * Add branch/code for doc and test examples as they are required * Ship docker images with readme * Only update readme on release, also try build docker on PR * Update the toc descriptions * fix: carefully replaced minio_url with parquet_url (#1280) * carefully replaced minio with parquet * fix: corrected some mistakes * fix: restored the instances of minio * fix: updated the documentation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add #1280 I used a `next` header instead of a specific version since we don't know if it will be 0.15.0 or 0.14.2. We can change it before the next release. 
--------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pieter Gijsbers <[email protected]> Co-authored-by: Lennart Purucker <[email protected]> * Pytest/utils (#1269) * Extract mocked_perform_api_call because its independent of object * Remove _multiprocess_can_split_ as it is a nose directive and we use pytest * Convert test list all * Add markers and refactor test_list_all_for_tasks for pytest * Add cache marker * Converted remainder of tests to pytest * Documented remaining Attributes of classes and functions (#1283) Add documentation and type hints for the remaining attributes of classes and functions. --------- Co-authored-by: Lennart Purucker <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [pre-commit.ci] pre-commit autoupdate (#1281) updates: - [github.com/psf/black: 23.7.0 → 23.11.0](psf/black@23.7.0...23.11.0) - [github.com/pre-commit/mirrors-mypy: v1.4.1 → v1.7.0](pre-commit/mirrors-mypy@v1.4.1...v1.7.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [pre-commit.ci] pre-commit autoupdate (#1291) updates: - [github.com/psf/black: 23.11.0 → 23.12.1](psf/black@23.11.0...23.12.1) - [github.com/pre-commit/mirrors-mypy: v1.7.0 → v1.8.0](pre-commit/mirrors-mypy@v1.7.0...v1.8.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bump actions/checkout from 3 to 4 (#1284) Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v3...v4) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump docker/login-action from 2 to 3 (#1285) Bumps [docker/login-action](https://github.com/docker/login-action) from 2 to 3. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](docker/login-action@v2...v3) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump docker/metadata-action from 4 to 5 (#1286) Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 4 to 5. - [Release notes](https://github.com/docker/metadata-action/releases) - [Upgrade guide](https://github.com/docker/metadata-action/blob/master/UPGRADE.md) - [Commits](docker/metadata-action@v4...v5) --- updated-dependencies: - dependency-name: docker/metadata-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump docker/setup-qemu-action from 2 to 3 (#1287) Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 2 to 3. - [Release notes](https://github.com/docker/setup-qemu-action/releases) - [Commits](docker/setup-qemu-action@v2...v3) --- updated-dependencies: - dependency-name: docker/setup-qemu-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump docker/build-push-action from 4 to 5 (#1288) Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 4 to 5. 
- [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](docker/build-push-action@v4...v5) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add more type annotations (#1261) * Add more type annotations * add to progress rst --------- Co-authored-by: Lennart Purucker <[email protected]> * Bump docker/setup-buildx-action from 2 to 3 (#1292) Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 2 to 3. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](docker/setup-buildx-action@v2...v3) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump actions/setup-python from 4 to 5 (#1293) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v4...v5) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Rework Tagging Tests for New Server Specification (#1294) * rework tagging test adjusted for new server specification * update progress.rst * ci: Update tooling (#1298) * ci: Migrate everything to pyproject.toml * style: Apply ruff fixes * style: Apply (more) ruff fixes * style(ruff): Add some file specific ignores * ci: Fix build with setuptools * ci(ruff): Allow prints in cli.py * fix: Circular import * test: Use raises(..., match=) * fix(cli): re-add missing print statements * fix: Fix some over ruff-ized code * add make check to documentation * ci: Change to `build` for sdist * ci: Add ruff to `"test"` deps --------- Co-authored-by: Lennart Purucker <[email protected]> * ci: Disable 3.6 tests (#1302) * fix: Chipping away at ruff lints (#1303) * fix: Chipping away at ruff lints * fix: return lockfile path * Update openml/config.py * Update openml/runs/functions.py * Update openml/tasks/functions.py * Update openml/tasks/split.py * Update openml/utils.py * Update openml/utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update openml/config.py * Update openml/testing.py * Update openml/utils.py * Update openml/config.py * Update openml/utils.py * Update openml/utils.py * add concurrency to workflow calls * adjust docstring * adjust docstring * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Lennart Purucker <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Lennart Purucker <[email protected]> * Tagging constraints (#1305) * update tagging constraints * openml python tests * small fix * [pre-commit.ci] pre-commit autoupdate (#1306) updates: - https://github.com/charliermarsh/ruff-pre-commit → 
https://github.com/astral-sh/ruff-pre-commit - [github.com/astral-sh/ruff-pre-commit: v0.1.5 → v0.1.11](astral-sh/ruff-pre-commit@v0.1.5...v0.1.11) - [github.com/python-jsonschema/check-jsonschema: 0.27.1 → 0.27.3](python-jsonschema/check-jsonschema@0.27.1...0.27.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Linting Everything - Fix All mypy and ruff Errors (#1307) * style: Fix linting split.py * typing: Fix mypy errors split.py * typing: data_feature * typing: trace * more linting fixes * typing: finish up trace * typing: config.py * typing: More fixes on config.py * typing: setup.py * finalize runs linting * typing: evaluation.py * typing: setup * ruff fixes across different files and mypy fixes for run files * typing: _api_calls * adjust setup files' linting and minor ruff changes * typing: utils * late night push * typing: utils.py * typing: tip tap tippity * typing: mypy 78, ruff ~200 * refactor output format name and minor linting stuff * other: midway merge * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * typing: I'm runnign out of good messages * typing: datasets * leinting for flows and some ruff changes * no more mypy errors * ruff runs and setups * typing: Finish off mypy and ruff errors Co-authored-by: Bilgecelik <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style: File wide ignores of PLR0913 This is because the automated pre-commit.ci bot which made automatic commits and pushes would think the `noqa` on the individualy overloaded functions was not needed. 
After removing the `noqa`, the linter then raised the issue --------- Co-authored-by: eddiebergman <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bilgecelik <[email protected]> * ci: Remove Python 3.6/7 (#1308) * ci: remove 3.7 patch (#1309) * Make Test Work Again After Ruff and Linter Changes (#1310) * mark production tests * make production test run * fix test bug -1/N * add retry raise again after refactor * fix str dict representation * test: Fix non-writable home mocks * testing: not not a change * testing: trigger CI * typing: Update typing * ci: Update testing matrix * testing: Fixup run flow error check * ci: Manual dispatch, disable double testing * ci: Prevent further ci duplication * ci: Add concurrency checks to all * ci: Remove the max-parallel on test ci There are a lot less now and they cancel previous puhes in the same pr now so it shouldn't be a problem anymore * testing: Fix windows path generation * add pytest for server state * add assert cache state * some formatting * fix with cache fixture * finally remove th finally * doc: Fix link * update test matrix * doc: Update to just point to contributing * add linkcheck ignore for test server --------- Co-authored-by: eddiebergman <[email protected]> * Add Feature Descriptions Rebase Clean (#1316) Co-authored-by: Jan van Rijn <[email protected]> * Make Class Label Retrieval More Lenient (#1315) * mark production tests * make production test run * fix test bug -1/N * add retry raise again after refactor * fix str dict representation * test: Fix non-writable home mocks * testing: not not a change * testing: trigger CI * typing: Update typing * ci: Update testing matrix * testing: Fixup run flow error check * ci: Manual dispatch, disable double testing * ci: Prevent further ci duplication * ci: Add concurrency checks to all * ci: Remove the max-parallel on test ci There are a lot less now and they cancel previous puhes in the 
same pr now so it shouldn't be a problem anymore * testing: Fix windows path generation * add pytest for server state * add assert cache state * some formatting * fix with cache fixture * finally remove th finally * doc: Fix link * update test matrix * doc: Update to just point to contributing * add linkcheck ignore for test server * add special case for class labels that are dtype string * fix bug and add test * formatting --------- Co-authored-by: eddiebergman <[email protected]> * [pre-commit.ci] pre-commit autoupdate (#1318) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/astral-sh/ruff-pre-commit: v0.1.11 → v0.1.13](astral-sh/ruff-pre-commit@v0.1.11...v0.1.13) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix: update fetching a bucket from MinIO (#1314) * Update fetching a bucket from MinIO Previously, each dataset had their own bucket: https://openml1.win.tue.nl/datasets61/dataset_61.pq But we were advised to reduce the amount of buckets and favor hosting many objects in hierarchical structure, so we now have instead some prefixes to divide up the dataset objects into separate subdirectories: https://openml1.win.tue.nl/datasets/0000/0061/dataset_61.pq This commit has bypassed pre-commit. Tests should be updated too. 
* ci: Trigger ci * ci: Add some files to .gitignore --------- Co-authored-by: PGijsbers <[email protected]> * Update progress.rst for minor release (#1319) * Update progress.rst for minor release * update version number * fix typo * Prepare Develop for Merge with Main (#1321) Co-authored-by: Pieter Gijsbers <[email protected]> --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Pieter Gijsbers <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Devansh Varshney (देवांश वार्ष्णेय) <[email protected]> Co-authored-by: Lennart Purucker <[email protected]> Co-authored-by: Vishal Parmar <[email protected]> Co-authored-by: Lennart Purucker <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matthias Feurer <[email protected]> Co-authored-by: Lennart Purucker <[email protected]> Co-authored-by: janvanrijn <[email protected]> Co-authored-by: Lennart Purucker <[email protected]> Co-authored-by: Bilgecelik <[email protected]>
Reference Issue
minio_url to parquet_url #1142
What does this PR implement/fix? Explain your changes.
The minio_url field in the dataset description is deprecated and will be fully replaced by parquet_url in the future (in the meantime, both are available).
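While both fields are still served, a client can prefer the new field and fall back to the old one. The following is a minimal sketch of that idea, not the code from this PR: the helper name `find_parquet_url`, the sample XML, its namespace URI, and the example.org URLs are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample response; element names follow the fields discussed above,
# while the namespace URI and URLs are assumptions for illustration only.
SAMPLE = """<oml:data_set_description xmlns:oml="http://openml.org/openml">
  <oml:id>61</oml:id>
  <oml:minio_url>https://example.org/old/dataset_61.pq</oml:minio_url>
  <oml:parquet_url>https://example.org/new/dataset_61.pq</oml:parquet_url>
</oml:data_set_description>"""


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_parquet_url(xml_text):
    """Return parquet_url if present, else the deprecated minio_url, else None."""
    root = ET.fromstring(xml_text)
    fields = {local_name(el.tag): (el.text or "").strip() for el in root.iter()}
    return fields.get("parquet_url") or fields.get("minio_url") or None


if __name__ == "__main__":
    print(find_parquet_url(SAMPLE))
```

Matching on the local tag name keeps the sketch independent of the exact namespace the server uses.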
How should this PR be tested?
This PR can be tested by checking the XML response that the server sends for a dataset, as was done for the minio_url parameter, and confirming that the parsing relevant to the data download still works correctly.
Any other comments?
It's my first PR, so I might have made mistakes, and I haven't tested this on my local machine. I am still setting up the testing environment.