Fix updating ID for resource #3239
base: master
LGTM, that was quickly done!
Would it make sense, in the future, to create the ID on the udata server side?
The front-end uses client-side ID generation to send all the chunks in parallel, so I'm not sure it's possible without changing the API (for example, having a client-side UUID, then a real UUID saved in the database at the end of the upload process)… https://github.com/datagouv/front-end/blob/700fa2e57f9ae347c7b59e8eb1e5dbbff659a475/utils/datasets.ts#L175-L214
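The two-identifier idea mentioned above can be sketched in plain Python. This is only an illustration under assumptions: `ChunkedUploads`, `add_chunk`, and `finalize` are hypothetical names and not part of udata's API. The client-generated UUID only groups the chunks of one upload, and the ID that would be stored is generated on the server once the upload is complete.

```python
import uuid


class ChunkedUploads:
    """Hypothetical in-memory store grouping chunks by a client-chosen upload UUID."""

    def __init__(self):
        self._chunks = {}  # upload UUID -> {chunk index: bytes}

    def add_chunk(self, upload_id, index, data):
        # Chunks may arrive in any order because the client sends them in parallel.
        self._chunks.setdefault(upload_id, {})[index] = data

    def finalize(self, upload_id, total_chunks):
        parts = self._chunks.get(upload_id, {})
        if sorted(parts) != list(range(total_chunks)):
            raise ValueError("missing chunks")
        del self._chunks[upload_id]
        content = b"".join(parts[i] for i in range(total_chunks))
        # The persisted ID is created server side and never taken from the client.
        return {"id": uuid.uuid4(), "content": content}
```

With this split, the client keeps its freedom to upload in parallel, but it only learns the final resource ID from the response to the last step, which is the API change the comment refers to.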
Nice! Thanks for the investigation and resolution, good job!
Thank you for the investigation and proposed solution!
I don't think the resource id field is currently set by the frontend.
The uuid header in the multipart isn't currently used to create the Resource object; it is only used to identify which file the chunks belong to.
Comparing the uuid with the resource id shows they differ in this chunk upload test case.
I think we should make sure the resource id field is entirely readonly instead.
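A minimal sketch of what "entirely readonly" could mean on creation, assuming a whitelist approach. The function name `build_resource` and the field names in `allowed` are hypothetical and only stand in for the real form handling: any `id` sent by the client is dropped and a fresh one is generated on the server.

```python
import uuid


def build_resource(payload):
    """Create a resource dict from client input, ignoring any client-supplied `id`."""
    # Hypothetical whitelist of client-settable fields; `id` is deliberately absent.
    allowed = {"title", "url", "description"}
    fields = {key: value for key, value in payload.items() if key in allowed}
    fields["id"] = uuid.uuid4()  # always generated server side
    return fields
```

For example, `build_resource({"id": some_uuid, "title": "Data"})` keeps the title but returns a different `id` than the one supplied.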
I think we'll need to deal with existing duplicated resource IDs before merging & deploying?
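One possible way to handle existing duplicates is sketched below, using plain dicts as a stand-in for the real documents. The helper `reassign_duplicate_ids` is hypothetical; whether reassigning IDs is acceptable at all (for instance if external links reference them) is an open question this sketch does not answer.

```python
import uuid


def reassign_duplicate_ids(resources):
    """Give a fresh UUID to every resource whose ID was already seen.

    `resources` is a list of dicts with an "id" key (a stand-in for real documents).
    The first occurrence keeps its ID. Returns (old_id, new_id) pairs for logging.
    """
    seen = set()
    changes = []
    for resource in resources:
        if resource["id"] in seen:
            new_id = uuid.uuid4()
            changes.append((resource["id"], new_id))
            resource["id"] = new_id
        seen.add(resource["id"])
    return changes
```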
@@ -377,8 +383,9 @@ class ResourcesAPI(API):
     def post(self, dataset):
         """Create a new resource for a given dataset"""
         ResourceEditPermission(dataset).test()
-        form = api.validate(ResourceForm)
         resource = Resource()
+        form = api.validate(ResourceFormWithoutId)
👏
@@ -104,6 +104,10 @@ class ResourceForm(BaseResourceForm):
     id = fields.UUIDField()


+class ResourceFormWithoutId(BaseResourceForm):
I'm confused we use ResourceFormWithoutId in DatasetForm?
Because otherwise we don't have the ID for the reorder, for example…
        resources_ids = set()
        for resource in self.resources:
            if resource.id in resources_ids:
                raise MongoEngineValidationError(
                    f"Duplicate resource ID {resource.id} in dataset #{self.id}."
                )
            resources_ids.add(resource.id)
Could we do something like this?
if len({res.id for res in self.resources}) != len(self.resources):
    raise...
I think it's slower since there are two iterations, but yes, it's shorter…
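The two approaches discussed here can be compared side by side. This is a standalone sketch on plain ID lists, with hypothetical function names, not the code from the PR. Both are linear in the number of resources; the single-pass version can stop at the first duplicate and knows which ID is duplicated (useful for the error message), while the length comparison is shorter but only reports that some duplicate exists.

```python
def has_duplicate_ids_single_pass(ids):
    """Stop at the first ID that was already seen."""
    seen = set()
    for id_ in ids:
        if id_ in seen:
            return True
        seen.add(id_)
    return False


def has_duplicate_ids_by_length(ids):
    """Compare the number of distinct IDs with the total number of IDs."""
    ids = list(ids)
    return len(set(ids)) != len(ids)
```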
udata/core/dataset/models.py
Outdated
@@ -335,7 +335,7 @@ def to_mongo(self, *args, **kwargs):


 class ResourceMixin(object):
-    id = db.AutoUUIDField(primary_key=True)
+    id = db.AutoUUIDField(primary_key=True, unique=True)
Do we need the manual check if we already have unique=True?
Yes I can remove this, I didn't find a way to make it work
I removed it. Adding it makes a lot of tests fail and I don't know why… https://app.circleci.com/pipelines/github/opendatateam/udata/5863/workflows/ee92a1f9-786c-42dc-a1e9-dc5b5a3be992/jobs/33495?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary
@@ -638,6 +638,14 @@ def clean(self):
         if self.frequency in LEGACY_FREQUENCIES:
             self.frequency = LEGACY_FREQUENCIES[self.frequency]

+        resources_ids = set()
These manual checks only care about the current dataset though?
Yes, we cannot check across all resources in different datasets (but I think resources are always scoped by the dataset ID, so it shouldn't be a problem, no?)
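To make the scope question concrete, here is a small sketch of what a cross-dataset check would look for, which the per-dataset validation cannot see. The function `ids_shared_across_datasets` and the input shape (a mapping from dataset ID to a list of resource IDs) are assumptions for illustration only.

```python
from collections import Counter


def ids_shared_across_datasets(datasets):
    """Return resource IDs that appear in more than one dataset.

    `datasets` maps a dataset ID to the list of its resource IDs (hypothetical shape).
    """
    counts = Counter()
    for resource_ids in datasets.values():
        counts.update(set(resource_ids))  # count each ID at most once per dataset
    return {rid for rid, n in counts.items() if n > 1}
```

If resources are always looked up together with their dataset ID, as suggested in the reply, an ID shared between two datasets would not cause ambiguity, and a check like this would be informational rather than required.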
No description provided.