-
-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix finding document entries with a bad pacer_doc_id #4885
Fix finding document entries with a bad pacer_doc_id #4885
Conversation
Thanks! One day, we'll catch all these variations, and retire to a beach, I suppose. Until then, Alberto, can you please review? Seems like we'll want a test for this as well? |
60cfec2
to
a335d6b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ttys0dev I've added a couple of related tests. They're currently failing due to two issues that may require to be addressed:
-
When removing the invalid
pacer_doc_id
from the fallback lookup, theRECAPDocument
is matched but retains its previous invalidpacer_doc_id
. Should we update the incorrectpacer_doc_id
with the new one? Currently, thepacer_doc_id
is assigned here:
courtlistener/cl/recap/mergers.py
Line 965 in 3060ac0
rd.pacer_doc_id = rd.pacer_doc_id or docket_entry["pacer_doc_id"] While prioritizing the current
pacer_doc_id
is generally expected behavior, in this case we'd need to update it from thedocket_entry
if it's not blank. -
The fallback lookup without
pacer_doc_id
can raise aMultipleObjectsReturned
error if there is more than one main PACER Document in the docket entry. Should we apply the same logic to select the latest RECAPDocument with PDF and remove duplicates?
courtlistener/cl/recap/mergers.py
Line 948 in 3060ac0
except RECAPDocument.MultipleObjectsReturned:
We could create a method for this logic and use it in both try/except blocks.
17f5914
to
34bb0c6
Compare
I think so, added handling for this.
Yeah, cleaned this up a bit by making a function for removing duplicates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ttys0dev thank you! Changes look good.
Just a couple of remaining comments, please.
for more information, see https://pre-commit.ci
Co-authored-by: Alberto Islas <[email protected]>
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
2755d0e
to
35c9536
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_available=False, | ||
**params, | ||
) | ||
rds_created.append(rd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this line rds_created.append(rd)
to apply only in cases where the RD is created. Previous approach was also applying it to existing RDs.
I think we need a fallback here in case we have entries with bad
pacer_doc_id
's.