Return StructuredDataset which is a field in a dataclass #3071
base: master
Signed-off-by: Nelson Chen <[email protected]>
Code Review Agent Run #63793c
Actionable Suggestions - 2
Review Details
Changelist by Bito
This pull request implements the following key changes.
if isinstance(python_val._literal_sd, StructuredDataset):
    sdt = StructuredDatasetType(format=python_val._literal_sd.file_format)
    metad = literals.StructuredDatasetMetadata(structured_dataset_type=sdt)
    sd_literal = literals.StructuredDataset(uri=python_val._literal_sd.uri, metadata=metad)
    return Literal(scalar=Scalar(structured_dataset=sd_literal))
Accessing private member '_literal_sd'. Consider using a public interface or property to access this data.
Code suggestion
Check the AI-generated fix before applying
- if literal_type.structured_dataset_type is not None and self._literal_sd is not None:
- return self._literal_sd
- if literal_type.structured_dataset_type is not None and self._literal_sd is None:
+ if literal_type.structured_dataset_type is not None and self.literal_sd is not None:
+ return self.literal_sd
+ if literal_type.structured_dataset_type is not None and self.literal_sd is None:
Code Review Run #63793c
if isinstance(python_val._literal_sd, StructuredDataset):
    sdt = StructuredDatasetType(format=python_val._literal_sd.file_format)
    metad = literals.StructuredDatasetMetadata(structured_dataset_type=sdt)
    sd_literal = literals.StructuredDataset(uri=python_val._literal_sd.uri, metadata=metad)
    return Literal(scalar=Scalar(structured_dataset=sd_literal))
The code block for handling a StructuredDataset passed through a dataclass could be simplified by extracting the literal creation logic into a helper method. This would improve code readability and maintainability.
Code suggestion
Check the AI-generated fix before applying
- if isinstance(python_val._literal_sd, StructuredDataset):
-     sdt = StructuredDatasetType(format=python_val._literal_sd.file_format)
-     metad = literals.StructuredDatasetMetadata(structured_dataset_type=sdt)
-     sd_literal = literals.StructuredDataset(uri=python_val._literal_sd.uri, metadata=metad)
-     return Literal(scalar=Scalar(structured_dataset=sd_literal))
+ if isinstance(python_val._literal_sd, StructuredDataset):
+     return self._create_structured_dataset_literal(python_val._literal_sd.uri, python_val._literal_sd.file_format)
+
+ def _create_structured_dataset_literal(self, uri: str, file_format: str) -> Literal:
+     sdt = StructuredDatasetType(format=file_format)
+     metad = literals.StructuredDatasetMetadata(structured_dataset_type=sdt)
+     sd_literal = literals.StructuredDataset(uri=uri, metadata=metad)
+     return Literal(scalar=Scalar(structured_dataset=sd_literal))
Code Review Run #63793c
It looks correct. Can you:
- provide a screenshot
- add an example to the integration tests to test it properly?
test_remote.py
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3071      +/-   ##
==========================================
- Coverage   80.13%   79.64%   -0.50%
==========================================
  Files         272      202      -70
  Lines       24614    21479    -3135
  Branches     2768     2769       +1
==========================================
- Hits        19725    17106    -2619
+ Misses       4082     3604     -478
+ Partials      807      769      -38
☔ View full report in Codecov by Sentry.
Code Review Agent Run #d93af6
Actionable Suggestions - 1
Additional Suggestions - 10
Review Details
@task
def read_sd(dc: DC) -> StructuredDataset:
    """Read input StructuredDataset."""
    print("sd:", dc.sd.open(pd.DataFrame).all())
Consider adding error handling around the open() and all() calls to handle potential exceptions when accessing the structured dataset.
Code suggestion
Check the AI-generated fix before applying
- print("sd:", dc.sd.open(pd.DataFrame).all())
+ try:
+     df = dc.sd.open(pd.DataFrame).all()
+     print("sd:", df)
+ except Exception as e:
+     print(f"Error accessing structured dataset: {e}")
+     raise
Code Review Run #d93af6
Tracking issue
Related to #6117
Why are the changes needed?
If we wrap the StructuredDataset in a dataclass, it will fail during the to_flyte_idl conversion.
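The shape of the problem, and of the type check this PR describes, can be sketched without flytekit. Everything below is a stand-in: `NativeSD`, `LiteralSD`, and `to_literal_sd` are hypothetical names invented for illustration, and only the `_literal_sd`, `uri`, and `file_format` attribute names are taken from the diff. The sketch assumes the native object lacks `to_flyte_idl` while the literal form has it, so a native value left in `_literal_sd` has to be converted before serialization.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LiteralSD:
    """Stand-in for the IDL-facing literals.StructuredDataset (hypothetical)."""
    uri: str
    format: str

    def to_flyte_idl(self) -> dict:
        # The real method builds a protobuf message; a dict keeps the sketch runnable.
        return {"uri": self.uri, "format": self.format}


@dataclass
class NativeSD:
    """Stand-in for the Python-native StructuredDataset (hypothetical)."""
    uri: str = ""
    file_format: str = ""
    # May hold either form, mirroring the `_literal_sd` attribute in the diff.
    _literal_sd: Optional[Union["NativeSD", LiteralSD]] = None


def to_literal_sd(python_val: NativeSD) -> LiteralSD:
    """Return a literal form, converting a wrapped native value first if needed."""
    inner = python_val._literal_sd
    if isinstance(inner, NativeSD):
        # Native value found: rebuild it as a literal from its uri and format.
        return LiteralSD(uri=inner.uri, format=inner.file_format)
    if isinstance(inner, LiteralSD):
        # Already a literal: pass it through unchanged.
        return inner
    raise ValueError("no wrapped dataset to convert")


wrapped = NativeSD(_literal_sd=NativeSD(uri="s3://bucket/data.parquet", file_format="parquet"))
literal = to_literal_sd(wrapped)
print(literal.to_flyte_idl())
```

In this sketch, calling `to_flyte_idl` directly on `wrapped._literal_sd` would fail because the native stand-in has no such method, which is the situation the isinstance check guards against.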
What changes were proposed in this pull request?
Before returning Literals, we check the type of python_val._literal_sd. If it is a Python native StructuredDataset, we transform it into a Literals.StructuredDataset.
How was this patch tested?
As described in #6117, an error occurs when the extract task is executed.
Setup process
Screenshots
Check all the applicable boxes
Summary by Bito
This PR fixes StructuredDataset handling within dataclasses during to_flyte_idl conversion by properly transforming python_val._literal_sd instances into Literals.StructuredDataset. It also introduces a new Kubernetes StatefulSet Data Service plugin, enhances image specification with custom Python executable support, and implements configurable chunk sizes for S3/GCS operations. The changes include improved resource management and enhanced Ray plugin configuration capabilities.
Unit tests added: True
Estimated effort to review (1-5, lower is better): 5