-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[test] Add integration test for accessing sd sttr in dc #2969
[test] Add integration test for accessing sd sttr in dc #2969
Conversation
Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: JiaWei Jiang <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2969 +/- ##
==========================================
- Coverage 82.79% 77.53% -5.26%
==========================================
Files 3 288 +285
Lines 186 24695 +24509
Branches 0 2729 +2729
==========================================
+ Hits 154 19148 +18994
- Misses 32 4786 +4754
- Partials 0 761 +761 ☔ View full report in Codecov by Sentry. |
1. Upload a local parquet file to minio s3 bucket 2. Access StructuredDataset attr from a dataclass 3. Open StructuredDataset from a remote path Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: JiaWei Jiang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of nits, otherwise, looks good.
1. Remove redundant prints 2. Use `mock.patch.dict` to setup `os.environ` for the current test fn * Avoid contaminating other tests running in the same process Signed-off-by: JiaWei Jiang <[email protected]>
Hi @eapolinario, Thanks a lot for reviewing this. I have one more question! Would it be better to move the definition of |
Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: JiaWei Jiang <[email protected]>
Code Review Agent Run #4d8bc5Actionable Suggestions - 4
Review Details
|
Changelist by BitoThis pull request implements the following key changes.
|
json.dump(d, f) | ||
elif file_type == "parquet": | ||
# Because `upload_file` accepts a single file only, we specify 00000 to make it a single file | ||
tmp_file_path = pathlib.Path(__file__).parent / "workflows/basic/data/df.parquet/00000" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded path workflows/basic/data/df.parquet/00000
may cause issues if the file structure changes. Consider making this configurable or using a more robust path resolution approach.
Code suggestion
Check the AI-generated fix before applying
tmp_file_path = pathlib.Path(__file__).parent / "workflows/basic/data/df.parquet/00000" | |
TEST_DATA_DIR = pathlib.Path(__file__).parent / "test_data" | |
tmp_file_path = TEST_DATA_DIR / "df.parquet/00000" |
Code Review Run #4d8bc5
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
file_transfer = SimpleFileTransfer() | ||
remote_file_path = file_transfer.upload_file(file_type="parquet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling around the file upload operation since network operations can fail. The upload_file()
call could throw exceptions that should be caught and handled appropriately. A similar issue was also found in tests/flytekit/integration/remote/workflows/basic/attr_access_sd.py (line 34).
Code suggestion
Check the AI-generated fix before applying
file_transfer = SimpleFileTransfer() | |
remote_file_path = file_transfer.upload_file(file_type="parquet") | |
file_transfer = SimpleFileTransfer() | |
try: | |
remote_file_path = file_transfer.upload_file(file_type="parquet") | |
except Exception as e: | |
# Clean up any partial uploads if needed | |
raise RuntimeError(f"Failed to upload file: {str(e)}") |
Code Review Run #4d8bc5
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
# Delete the remote file to free the space | ||
url = urlparse(remote_file_path) | ||
bucket, key = url.netloc, url.path.lstrip("/") | ||
file_transfer.delete_file(bucket=bucket, key=key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling around the file deletion operation. The delete_file()
call could fail and should be handled gracefully, possibly with a warning if cleanup fails.
Code suggestion
Check the AI-generated fix before applying
# Delete the remote file to free the space | |
url = urlparse(remote_file_path) | |
bucket, key = url.netloc, url.path.lstrip("/") | |
file_transfer.delete_file(bucket=bucket, key=key) | |
# Delete the remote file to free the space | |
url = urlparse(remote_file_path) | |
bucket, key = url.netloc, url.path.lstrip("/") | |
try: | |
file_transfer.delete_file(bucket=bucket, key=key) | |
except Exception as e: | |
warnings.warn(f"Failed to delete remote file: {str(e)}") |
Code Review Run #4d8bc5
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: JiaWei Jiang <[email protected]>
Code Review Agent Run Status
|
* Store protos in local cache (#3022) * Store proto obj instead of model Literal in local cache Signed-off-by: Eduardo Apolinario <[email protected]> * Remove unused file Signed-off-by: Eduardo Apolinario <[email protected]> --------- Signed-off-by: Eduardo Apolinario <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]> * Bump aiohttp from 3.9.5 to 3.10.11 (#3018) Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.5 to 3.10.11. - [Release notes](https://github.com/aio-libs/aiohttp/releases) - [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst) - [Commits](aio-libs/aiohttp@v3.9.5...v3.10.11) --- updated-dependencies: - dependency-name: aiohttp dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix bug in FlyteDirectory.listdir on local files (#2926) * Fix issue in FlyteDirectory.listdir Fixes flyteorg/flyte#6005 Signed-off-by: Pim de Haan <[email protected]> * Added test Signed-off-by: Pim de Haan <[email protected]> * Run make lint Signed-off-by: Eduardo Apolinario <[email protected]> --------- Signed-off-by: Pim de Haan <[email protected]> Signed-off-by: Eduardo Apolinario <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]> * Fix unit tests in airflow plugin (#3024) Signed-off-by: Kevin Su <[email protected]> * fix: Fix resource meta typos for async agent (#3023) Signed-off-by: JiaWei Jiang <[email protected]> * fix: format commands output (#3026) * Fix pydantic basemodel default input (#3013) * Fix pydantic default input Signed-off-by: Future-Outlier <[email protected]> * add pydantic integration test Signed-off-by: Future-Outlier <[email protected]> * Use duck typing by Thomas's advice Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: Thomas J. Fan <[email protected]> * lint Signed-off-by: Future-Outlier <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: Thomas J. Fan <[email protected]> * [BUG] Open FlyteFile from remote path (#2991) * fix: Open FlyteFile from remote path Signed-off-by: JiaWei Jiang <[email protected]> * Add integration test Signed-off-by: JiaWei Jiang <[email protected]> * refactor: Use ctx as param instead of recreation Signed-off-by: JiaWei Jiang <[email protected]> * refactor: Clean test logic 1. Remove redundant prints 2. Use `mock.patch.dict` to setup `os.environ` for the current test fn * Avoid contaminating other tests running in the same process Signed-off-by: JiaWei Jiang <[email protected]> * refactor: Setup local path and downloader in constructor Signed-off-by: JiaWei Jiang <[email protected]> * refactor: Move SimpleFileTransfer to an utility file Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant env var setup Please refer to #3001 Signed-off-by: JiaWei Jiang <[email protected]> * test: Add another ff use case Create ff in one task pod and read it in another task pod. Signed-off-by: JiaWei Jiang <[email protected]> --------- Signed-off-by: JiaWei Jiang <[email protected]> * vllm inference plugin (#2967) * vllm inference plugin Signed-off-by: Daniel Sola <[email protected]> * fixed default value Signed-off-by: Daniel Sola <[email protected]> --------- Signed-off-by: Daniel Sola <[email protected]> * Add poetry to image spec (#3025) * Add poetry to image spec Signed-off-by: Thomas J. Fan <[email protected]> * Add stricter check Signed-off-by: Thomas J. Fan <[email protected]> --------- Signed-off-by: Thomas J. Fan <[email protected]> * [test] Add integration test for accessing sd sttr in dc (#2969) * test: Add integration test for attr access of sd Signed-off-by: JiaWei Jiang <[email protected]> * Correct file path Signed-off-by: JiaWei Jiang <[email protected]> * test: Support interaction with minio s3 bucket 1. Upload a local parquet file to minio s3 bucket 2. Access StructuredDataset attr from a dataclass 3. Open StructuredDataset from a remote path Signed-off-by: JiaWei Jiang <[email protected]> * Delete an unmerged integration test Signed-off-by: JiaWei Jiang <[email protected]> * Try imagespec with commit sha of corresponding fix Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant test Signed-off-by: JiaWei Jiang <[email protected]> * Remove default_factory and create sd dc from input uri Signed-off-by: JiaWei Jiang <[email protected]> * refactor: Clean test logic 1. Remove redundant prints 2. Use `mock.patch.dict` to setup `os.environ` for the current test fn * Avoid contaminating other tests running in the same process Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant minio env var setup and add test comments Signed-off-by: JiaWei Jiang <[email protected]> * Support uploading tmp pqt file Signed-off-by: JiaWei Jiang <[email protected]> * Udpate deprecated module Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant and unused imports Signed-off-by: JiaWei Jiang <[email protected]> --------- Signed-off-by: JiaWei Jiang <[email protected]> --------- Signed-off-by: Eduardo Apolinario <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Pim de Haan <[email protected]> Signed-off-by: Kevin Su <[email protected]> Signed-off-by: JiaWei Jiang <[email protected]> Signed-off-by: Future-Outlier <[email protected]> Signed-off-by: Daniel Sola <[email protected]> Signed-off-by: Thomas J. Fan <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Pim de Haan <[email protected]> Co-authored-by: Kevin Su <[email protected]> Co-authored-by: 江家瑋 <[email protected]> Co-authored-by: V <[email protected]> Co-authored-by: Han-Ru Chen (Future-Outlier) <[email protected]> Co-authored-by: Thomas J. Fan <[email protected]> Co-authored-by: Daniel Sola <[email protected]>
* test: Add integration test for attr access of sd Signed-off-by: JiaWei Jiang <[email protected]> * Correct file path Signed-off-by: JiaWei Jiang <[email protected]> * test: Support interaction with minio s3 bucket 1. Upload a local parquet file to minio s3 bucket 2. Access StructuredDataset attr from a dataclass 3. Open StructuredDataset from a remote path Signed-off-by: JiaWei Jiang <[email protected]> * Delete an unmerged integration test Signed-off-by: JiaWei Jiang <[email protected]> * Try imagespec with commit sha of corresponding fix Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant test Signed-off-by: JiaWei Jiang <[email protected]> * Remove default_factory and create sd dc from input uri Signed-off-by: JiaWei Jiang <[email protected]> * refactor: Clean test logic 1. Remove redundant prints 2. Use `mock.patch.dict` to setup `os.environ` for the current test fn * Avoid contaminating other tests running in the same process Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant minio env var setup and add test comments Signed-off-by: JiaWei Jiang <[email protected]> * Support uploading tmp pqt file Signed-off-by: JiaWei Jiang <[email protected]> * Udpate deprecated module Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant and unused imports Signed-off-by: JiaWei Jiang <[email protected]> --------- Signed-off-by: JiaWei Jiang <[email protected]> Signed-off-by: Shuying Liang <[email protected]>
* test: Add integration test for attr access of sd Signed-off-by: JiaWei Jiang <[email protected]> * Correct file path Signed-off-by: JiaWei Jiang <[email protected]> * test: Support interaction with minio s3 bucket 1. Upload a local parquet file to minio s3 bucket 2. Access StructuredDataset attr from a dataclass 3. Open StructuredDataset from a remote path Signed-off-by: JiaWei Jiang <[email protected]> * Delete an unmerged integration test Signed-off-by: JiaWei Jiang <[email protected]> * Try imagespec with commit sha of corresponding fix Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant test Signed-off-by: JiaWei Jiang <[email protected]> * Remove default_factory and create sd dc from input uri Signed-off-by: JiaWei Jiang <[email protected]> * refactor: Clean test logic 1. Remove redundant prints 2. Use `mock.patch.dict` to setup `os.environ` for the current test fn * Avoid contaminating other tests running in the same process Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant minio env var setup and add test comments Signed-off-by: JiaWei Jiang <[email protected]> * Support uploading tmp pqt file Signed-off-by: JiaWei Jiang <[email protected]> * Udpate deprecated module Signed-off-by: JiaWei Jiang <[email protected]> * Remove redundant and unused imports Signed-off-by: JiaWei Jiang <[email protected]> --------- Signed-off-by: JiaWei Jiang <[email protected]> Signed-off-by: Shuying Liang <[email protected]>
Tracking issue
flyteorg/flyte#5956
Why are the changes needed?
So far, this patch is tested via single binary, which is limited. The newly released flyte now supports accessing
StructuredDataset
attribute in a dataclass. Hence, we should add an integration to enhance the robustness.What changes were proposed in this pull request?
Add an integration test supporting:
StructuredDataset
attribute from a dataclassStructuredDataset
from a remote path (i.e., the uploaded location)How was this patch tested?
This integration test passes in local run.
Setup process
For local run, the setup process is summarized as follows:
After installation, run the following command:
Screenshots
Check all the applicable boxes
Related PRs
#2954
Docs link
Summary by Bito
This PR implements integration tests for StructuredDataset attribute access within dataclasses in Flyte, featuring a test workflow for parquet file handling in minio s3 bucket. The implementation includes file transfer utilities and comprehensive test cases, demonstrating the complete lifecycle from upload to cleanup.Unit tests added: True
Estimated effort to review (1-5, lower is better): 2