Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datasets): add wrapped dataset to store the response from a POST/PUT request #905

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

npfp
Copy link

@npfp npfp commented Oct 23, 2024

Proof of concept to start discussions about design on how to store and use the response object.

Description

To pursue discussions on how we could store the response object from a POST or PUT api request.

Relates to #748

Development notes

  • Only poc code
  • Tests to be implemented

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

@property
def _nested_dataset_type(
self,
) -> Type[JSONDataset | PickleDataset | MemoryDataset]:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a better way to do this.

Comment on lines +243 to +250
if self._extension == "json":
self.wrapped_dataset.save(response.json()) #TODO(npfp): expose json loads arguments
elif self._extension == "text":
self.wrapped_dataset.save(response.text)
elif self._extension:
self.wrapped_dataset.save(response)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to check on the self.wrapped_dataset type:

Suggested change
if self._extension == "json":
self.wrapped_dataset.save(response.json()) #TODO(npfp): expose json loads arguments
elif self._extension == "text":
self.wrapped_dataset.save(response.text)
elif self._extension:
self.wrapped_dataset.save(response)
elif isinstance(self.wrapped_dataset, JSONDataset):
self.wrapped_dataset.save(response.json()) #TODO(npfp): expose json loads arguments
elif isinstance(self.wrapped_dataset, TextDataset)
self.wrapped_dataset.save(response.text)
elif self._extension:
self.wrapped_dataset.save(response)

@npfp npfp force-pushed the feature/save-response-in-api-dataset branch from 2978b3c to dc433d1 Compare October 23, 2024 12:54
@npfp npfp changed the title feat(api_dataset): add wrapped dataset to store the response from a POST/PUT request feat(dataset): add wrapped dataset to store the response from a POST/PUT request Oct 23, 2024
@npfp npfp changed the title feat(dataset): add wrapped dataset to store the response from a POST/PUT request feat(datasets): add wrapped dataset to store the response from a POST/PUT request Oct 23, 2024
…/PUT request

Proof of concept to start discussions about design on how to store and use response.

Signed-off-by: Nicolas Perrin <[email protected]>
@npfp npfp force-pushed the feature/save-response-in-api-dataset branch from dc433d1 to e9980af Compare October 23, 2024 13:14
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @npfp , sorry for the very long delay here. Thanks a lot for sending this pull request.

Is your idea to make APIDataset wrap another dataset? (Similar to what you mentioned in #748 (comment))

I'm assuming that this is a breaking change on the API dataset right?

@npfp
Copy link
Author

npfp commented Jan 14, 2025

No worries. Thx for getting back to me!

Yes, I chose the wrapped dataset mainly because it would avoid maintenance burden by relying on the implementation of the wrapped dataset rather than duplicating the logic there.

Of course, it's a subjective opinion, so feel free to challenge this, if it doesn't follow kedro's guidelines.

@npfp
Copy link
Author

npfp commented Jan 14, 2025

I don't think it's a breaking change of the API dataset: the wrapped dataset args are optionals.

If not provided, the response is not saved as it's already the case.

@merelcht
Copy link
Member

It's been a while since I looked at the issue this PR is for, but if I understand correctly the problem with APIDataset is that it only returns the response, but doesn't actually save it? And this PR aims to add a wrapper so a user can indicate what format to save the response in?

On the original issue I said I didn't like the idea of importing other datasets, which I still agree with, but at the same time it does make sense now I see it in code 😜 I'll have to think about this a bit more to review it properly.

@npfp
Copy link
Author

npfp commented Jan 17, 2025

It's been a while since I looked at the issue this PR is for, but if I understand correctly the problem with APIDataset is that it only returns the response, but doesn't actually save it? And this PR aims to add a wrapper so a user can indicate what format to save the response in?

On the original issue I said I didn't like the idea of importing other datasets, which I still agree with, but at the same time it does make sense now I see it in code 😜 I'll have to think about this a bit more to review it properly.

Yes that's right:

  • some APIs would return a response that might be important to store somewhere,
  • yes exactly.

Good to know, thx! (I'm a bit biased because we almost always use wrapped dataset for our custom datasets: we're lazy and like relying on a maintained classe :-) )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants