Fix updating ID for resource #3239

Open
wants to merge 16 commits into
base: master

ThibaudDauce
Contributor

No description provided.

@bolinocroustibat bolinocroustibat self-requested a review January 2, 2025 17:29
Contributor

@bolinocroustibat bolinocroustibat left a comment

LGTM, that was quickly done!
Would it make sense, in the future, to create the ID on the udata server side?

@ThibaudDauce
Contributor Author

> LGTM, that was quickly done! Would that make sense, in the future, to create the ID on udata server side?

The front-end uses client-side ID generation to send all the chunks in parallel, so I'm not sure it's possible without changing the API (for example, having a client-side UUID and then a real UUID saved in the database at the end of the upload process)… https://github.com/datagouv/front-end/blob/700fa2e57f9ae347c7b59e8eb1e5dbbff659a475/utils/datasets.ts#L175-L214

Contributor

@magopian magopian left a comment

Nice! Thanks for the investigation and resolution, good job!

udata/core/dataset/api.py (outdated, resolved)
udata/tests/api/test_datasets_api.py (resolved)
Contributor

@maudetes maudetes left a comment

Thank you for the investigation and proposed solution!

I don't think the resource id field is currently set by the frontend.
The uuid header in the multipart request isn't currently used to create the Resource object; it is only used to identify which file the chunks belong to.
Comparing the uuid with the resource id shows that they differ in this chunk upload test case.

I think we should make sure the resource id field is entirely read-only instead.
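
To illustrate the distinction: the upload uuid only groups chunks, while the resource id can be generated independently on the server. This is a minimal sketch under that assumption. `ChunkStore`, `add_chunk`, and `finish_upload` are hypothetical names, not udata's API, and dropping any client-supplied `id` is only one possible reading of "read-only".

```python
from uuid import UUID, uuid4


class ChunkStore:
    """Hypothetical in-memory store grouping chunks by the client's upload UUID."""

    def __init__(self):
        self._chunks = {}  # upload UUID -> {chunk index: bytes}

    def add_chunk(self, upload_uuid: UUID, index: int, data: bytes) -> None:
        # Chunks may arrive in any order because the client sends them in parallel.
        self._chunks.setdefault(upload_uuid, {})[index] = data

    def finish_upload(self, upload_uuid: UUID, payload: dict) -> dict:
        """Assemble the file and build a resource with a server-generated ID."""
        parts = self._chunks.pop(upload_uuid)
        content = b"".join(parts[i] for i in sorted(parts))
        # Treat "id" as read-only: ignore any value supplied by the client.
        fields = {key: value for key, value in payload.items() if key != "id"}
        return {**fields, "id": uuid4(), "content": content}
```

In this sketch the upload uuid never becomes the resource id, which matches the observation that the two differ in the chunk upload test.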

Contributor

@maudetes maudetes left a comment

I think we'll need to deal with existing duplicated resource IDs before merging & deploying?
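
One way to find the affected datasets beforehand is to count resource IDs per dataset and report those that occur more than once. A minimal sketch in plain Python, assuming each dataset is available as a dict with an `id` and a list of `resources` dicts; this shape and the `find_duplicate_resource_ids` helper are illustrative assumptions, not udata's API.

```python
from collections import Counter


def find_duplicate_resource_ids(datasets):
    """Map each dataset ID to the resource IDs that appear more than once in it."""
    duplicates = {}
    for dataset in datasets:
        counts = Counter(resource["id"] for resource in dataset["resources"])
        repeated = sorted(rid for rid, count in counts.items() if count > 1)
        if repeated:
            duplicates[dataset["id"]] = repeated
    return duplicates
```

Datasets without duplicates are left out of the result, so an empty dict means nothing needs cleaning up.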

@@ -377,8 +383,9 @@ class ResourcesAPI(API):
    def post(self, dataset):
        """Create a new resource for a given dataset"""
        ResourceEditPermission(dataset).test()
        form = api.validate(ResourceForm)
        resource = Resource()
        form = api.validate(ResourceFormWithoutId)
Contributor

👏

@@ -104,6 +104,10 @@ class ResourceForm(BaseResourceForm):
    id = fields.UUIDField()


class ResourceFormWithoutId(BaseResourceForm):
Contributor

I'm confused we use ResourceFormWithoutId in DatasetForm?

Contributor Author

Because otherwise we don't have the ID for the reorder, for example…

Comment on lines +641 to +647
        resources_ids = set()
        for resource in self.resources:
            if resource.id in resources_ids:
                raise MongoEngineValidationError(
                    f"Duplicate resource ID {resource.id} in dataset #{self.id}."
                )
            resources_ids.add(resource.id)
Contributor

Could we do something like?

if len({res.id for res in self.resources}) != len(self.resources):
    raise...

Contributor Author

I think it's slower since there are two iterations, but yes, it's shorter…
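
For comparison, the two styles of duplicate check discussed here can be sketched in plain Python. This is a minimal sketch, not the PR's code: `Resource` is a hypothetical stand-in dataclass, and `ValueError` stands in for `MongoEngineValidationError`.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class Resource:
    """Hypothetical stand-in for a udata resource; only the ID matters here."""
    id: UUID


def check_unique_ids_loop(resources):
    """Raise on the first duplicate ID (single pass, early exit)."""
    seen = set()
    for resource in resources:
        if resource.id in seen:
            raise ValueError(f"Duplicate resource ID {resource.id}")
        seen.add(resource.id)


def has_duplicate_ids(resources):
    """Shorter variant: compare the count of distinct IDs to the total count."""
    return len({resource.id for resource in resources}) != len(resources)
```

The loop exits at the first duplicate and can name the offending ID in the error message; the set-based check always builds the full set and only reports that some duplicate exists.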

@@ -335,7 +335,7 @@ def to_mongo(self, *args, **kwargs):


 class ResourceMixin(object):
-    id = db.AutoUUIDField(primary_key=True)
+    id = db.AutoUUIDField(primary_key=True, unique=True)
Contributor

Do we need the manual check if we already have unique=True?

Contributor Author

Yes, I can remove this; I didn't find a way to make it work.

@@ -638,6 +638,14 @@ def clean(self):
         if self.frequency in LEGACY_FREQUENCIES:
             self.frequency = LEGACY_FREQUENCIES[self.frequency]

+        resources_ids = set()
Contributor

These manual checks only care about the current dataset though?

Contributor Author

Yes, we cannot check across resources in different datasets (but I think resources are always scoped by the dataset ID, so it shouldn't be a problem, no?)

4 participants