Add report of duplicates resources ids #3247

ThibaudDauce · 2025-01-14T08:33:53Z

No description provided.

maudetes

Locally, I have 168 resources with duplicated IDs, counting only duplicates within the same dataset.
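For illustration, a count like this could be produced with a sketch along the following lines. It is not the code from udata/commands/db.py: the `datasets` structure (plain dictionaries) and the `count_duplicated_ids` helper are hypothetical stand-ins, and the sketch assumes every resource sharing an ID with another resource of the same dataset is counted.

```python
from collections import Counter


def count_duplicated_ids(datasets):
    """Count resources whose ID occurs more than once within the same dataset.

    Hypothetical helper: `datasets` is a list of dicts, each holding a
    "resources" list of dicts with an "id" key.
    """
    total = 0
    for dataset in datasets:
        # Count how many times each ID appears in this dataset only.
        counts = Counter(r["id"] for r in dataset["resources"])
        # Every resource sharing its ID with a sibling is counted.
        total += sum(n for n in counts.values() if n > 1)
    return total


datasets = [
    {"resources": [{"id": "a"}, {"id": "a"}, {"id": "b"}]},
    # "a" also appears here, but duplicates across datasets are ignored.
    {"resources": [{"id": "a"}, {"id": "c"}]},
]
print(count_duplicated_ids(datasets))
```

Scoping the counter to one dataset at a time is what restricts the report to duplicates inside datasets.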

udata/commands/db.py

maudetes · 2025-01-28T16:55:07Z

+                for r in dataset.resources:
+                    # If it's the duplicated resource we're interested in and
+                    # that ID was already added to the new_resources (so we are
+                    # on the second resource), do not add it.
+                    if r.id == id and id in [r.id for r in new_resources]:
+                        continue
+
+                    new_resources.append(r)


Can't we directly add resource1?

Suggested change

-                for r in dataset.resources:
-                    # If it's the duplicated resource we're interested in and
-                    # that ID was already added to the new_resources (so we are
-                    # on the second resource), do not add it.
-                    if r.id == id and id in [r.id for r in new_resources]:
-                        continue
-
-                    new_resources.append(r)
+                new_resources.append(resource1)

ThibaudDauce · 2025-01-29

No, because we also add back all of the dataset's original resources. But I simplified the code a lot by using r == resource2 in dd1a108.
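One way to read this exchange is sketched below. The first function mirrors the loop from the diff; the second is a guess at what the `r == resource2` simplification could look like, since the code from dd1a108 is not shown here. `Resource`, `drop_by_id` and `drop_by_object` are hypothetical names, and `Resource` is a plain class, so `==` falls back to object identity; the real model's equality may behave differently.

```python
class Resource:
    """Hypothetical stand-in for a dataset resource (not udata's model)."""

    def __init__(self, id, title):
        self.id = id
        self.title = title


def drop_by_id(resources, duplicated_id):
    """Loop from the diff: keep the first resource with the ID, skip later ones."""
    new_resources = []
    for r in resources:
        if r.id == duplicated_id and duplicated_id in [x.id for x in new_resources]:
            continue
        new_resources.append(r)
    return new_resources


def drop_by_object(resources, resource2):
    """Assumed simplification: skip exactly the resource chosen for removal."""
    return [r for r in resources if not (r == resource2)]


resource1 = Resource("a", "first")
other = Resource("b", "other")
resource2 = Resource("a", "second")
resources = [resource1, other, resource2]

print([r.title for r in drop_by_id(resources, "a")])
print([r.title for r in drop_by_object(resources, resource2)])
```

Both variants keep every other resource of the dataset in its original order, which is why appending only resource1 would not be enough.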

Add report of duplicates resources ids

bd50630

ThibaudDauce requested a review from maudetes January 14, 2025 08:33

ThibaudDauce and others added 5 commits January 14, 2025 09:40

Fix inversion and add changelog

c42a8a2

Add checksum info

23b9edb

Add --fix to command

d060134

Merge branch 'master' into add_report_of_duplicates_resources_ids

93ca49a

Fix typo during merge conflict

55bc3bc

maudetes approved these changes Jan 29, 2025

ThibaudDauce and others added 3 commits January 29, 2025 10:25

Apply suggestions from code review

aa508b1

Co-authored-by: maudetes <[email protected]>

Some arguments changes

5b83e5c

Add dry run tag if not a fix and simplify logic around new_resources

dd1a108

ThibaudDauce merged commit cf762db into master Jan 29, 2025
1 check passed

ThibaudDauce deleted the add_report_of_duplicates_resources_ids branch January 29, 2025 13:21
