Update harvard ordering command #4388

quevon24 · 2024-08-30T20:45:12Z

I made a small change to the way we fetch the harvard queryset, which contains roughly 350k objects.

The original approach could cause memory issues because it loads all ~350k clusters and uses prefetch_related to fetch their opinions. If we still wanted to use prefetch_related, we would have to call iterator() with the chunk_size param to avoid loading everything at once.

The new approach is similar to the one Bill implemented for the columbia ordering: we get the IDs of the clusters whose opinions need to be updated, then use each ID to fetch that cluster's opinions and order them.
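To illustrate the idea outside of Django, here is a minimal sketch that uses the standard-library `sqlite3` module instead of the ORM. It is not the code from this PR: the table layout, the `page` sort key, the `ordering_key` column, and all function names are assumptions made up for the example. It only shows the pattern of streaming cluster IDs in small batches and then loading one cluster's opinions at a time, so that memory use stays bounded by the batch size rather than by the full ~350k clusters.

```python
import sqlite3
from typing import Iterator, List


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the cluster and opinion tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cluster (id INTEGER PRIMARY KEY);
        CREATE TABLE opinion (
            id INTEGER PRIMARY KEY,
            cluster_id INTEGER NOT NULL REFERENCES cluster(id),
            page INTEGER NOT NULL,   -- hypothetical sort key
            ordering_key INTEGER     -- NULL until the opinion is ordered
        );
        """
    )
    conn.executemany("INSERT INTO cluster (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO opinion (id, cluster_id, page) VALUES (?, ?, ?)",
        [(10, 1, 7), (11, 1, 3), (12, 2, 1), (13, 3, 9), (14, 3, 2), (15, 3, 5)],
    )
    return conn


def iter_cluster_ids(conn: sqlite3.Connection, chunk_size: int = 2) -> Iterator[int]:
    """Yield cluster IDs lazily, pulling only chunk_size rows at a time."""
    # A real command would filter here for clusters that still need ordering.
    cursor = conn.execute("SELECT id FROM cluster ORDER BY id")
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for (cluster_id,) in rows:
            yield cluster_id


def order_cluster_opinions(conn: sqlite3.Connection, cluster_id: int) -> List[int]:
    """Fetch one cluster's opinions by cluster ID and number them in order."""
    rows = conn.execute(
        "SELECT id FROM opinion WHERE cluster_id = ? ORDER BY page, id",
        (cluster_id,),
    ).fetchall()
    ordered_ids = [row[0] for row in rows]
    conn.executemany(
        "UPDATE opinion SET ordering_key = ? WHERE id = ?",
        [(position, opinion_id) for position, opinion_id in enumerate(ordered_ids, start=1)],
    )
    return ordered_ids


def sort_opinions(conn: sqlite3.Connection, chunk_size: int = 2) -> int:
    """Order the opinions of every cluster; return how many clusters were processed."""
    processed = 0
    for cluster_id in iter_cluster_ids(conn, chunk_size):
        order_cluster_opinions(conn, cluster_id)
        processed += 1
    conn.commit()  # commit once, after the ID cursor has been fully consumed
    return processed
```

In Django terms, the ID stream would roughly correspond to `values_list("id", flat=True)` combined with `iterator()`, and the per-cluster step to a filtered opinion query. Because only plain IDs are iterated, no `prefetch_related` is involved, which sidesteps the `chunk_size` requirement that applies when `iterator()` follows `prefetch_related()`.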

sentry-io · 2024-08-30T20:45:25Z

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: cl/corpus_importer/management/commands/update_opinions_order.py

| Function | Unhandled Issue |
| --- | --- |
| `sort_columbia_opinions` | NameError: name 'x' is not defined cl.corpus_impo... `Event Count:` 1 |
| `sort_columbia_opinions` | FileNotFoundError: [Errno 2] No such file or directory: '/opt/courtlistener/_columbia/washington/supreme_court_opini... ... `Event Count:` 1 |
| `sort_harvard_opinions` | ValueError: chunk_size must be provided when using QuerySet.iterator() after prefetch_related(). cl.corpus_importer.management.command... `Event Count:` 1 |



cl/corpus_importer/management/commands/update_opinions_order.py

fix(ordering): use iterator() for querysets

e553915

quevon24 requested a review from flooie August 30, 2024 20:45

fix(ordering): remove print

9b4135c

quevon24 marked this pull request as draft August 30, 2024 23:37

quevon24 added 3 commits August 30, 2024 17:37

Merge branch 'main' into update_ordering_command

6680637

fix(ordering): changes in harvard function to avoid high memory usage

feb5eff

Merge remote-tracking branch 'origin/update_ordering_command' into up…

5832754

…date_ordering_command

quevon24 changed the title ~~Update ordering command to use iterator() with harvard and columbia querysets~~ Update harvard ordering command Sep 2, 2024

quevon24 marked this pull request as ready for review September 2, 2024 18:40

flooie reviewed Sep 3, 2024

cl/corpus_importer/management/commands/update_opinions_order.py

flooie reviewed Sep 3, 2024

cl/corpus_importer/management/commands/update_opinions_order.py

quevon24 added 2 commits September 3, 2024 15:45

feat(ordering): log message when combined opinion is found

a131197

Merge branch 'main' into update_ordering_command

e3fa3f4

flooie enabled auto-merge September 4, 2024 01:38

flooie merged commit 68318f9 into main Sep 4, 2024
13 checks passed

flooie deleted the update_ordering_command branch September 4, 2024 01:44
