Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix finding document entries with a bad pacer_doc_id #4885

Merged
merged 11 commits into from
Jan 7, 2025

Conversation

ttys0dev
Copy link
Contributor

@ttys0dev ttys0dev commented Jan 3, 2025

I think we need a fallback here in case we have entries with bad pacer_doc_id's.

@mlissner
Copy link
Member

mlissner commented Jan 3, 2025

Thanks! One day, we'll catch all these variations, and retire to a beach, I suppose. Until then, Alberto, can you please review? Seems like we'll want a test for this as well?

@ttys0dev ttys0dev force-pushed the update-bad-pacer-doc-id branch from 60cfec2 to a335d6b Compare January 3, 2025 17:39
@albertisfu albertisfu self-requested a review January 4, 2025 01:02
Copy link
Contributor

@albertisfu albertisfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ttys0dev I've added a couple of related tests. They're currently failing due to two issues that may require to be addressed:

  1. When removing the invalid pacer_doc_id from the fallback lookup, the RECAPDocument is matched but retains its previous invalid pacer_doc_id. Should we update the incorrect pacer_doc_id with the new one? Currently, the pacer_doc_id is assigned here:

    rd.pacer_doc_id = rd.pacer_doc_id or docket_entry["pacer_doc_id"]

    While prioritizing the current pacer_doc_id is generally expected behavior, in this case we'd need to update it from the docket_entry if it's not blank.

  2. The fallback lookup without pacer_doc_id can raise a MultipleObjectsReturned error if there is more than one main PACER Document in the docket entry. Should we apply the same logic to select the latest RECAPDocument with PDF and remove duplicates?

    except RECAPDocument.MultipleObjectsReturned:

    We could create a method for this logic and use it in both try/except blocks.

@ttys0dev ttys0dev force-pushed the update-bad-pacer-doc-id branch from 17f5914 to 34bb0c6 Compare January 4, 2025 09:20
@ttys0dev
Copy link
Contributor Author

ttys0dev commented Jan 4, 2025

  1. Should we update the incorrect pacer_doc_id with the new one?

I think so, added handling for this.

We could create a method for this logic and use it in both try/except blocks.

Yeah, cleaned this up a bit by making a function for removing duplicates.

@albertisfu albertisfu self-requested a review January 6, 2025 17:24
Copy link
Contributor

@albertisfu albertisfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttys0dev thank you! Changes look good.

Just a couple of remaining comments, please.

cl/recap/mergers.py Outdated Show resolved Hide resolved
cl/recap/mergers.py Outdated Show resolved Hide resolved
@ttys0dev ttys0dev force-pushed the update-bad-pacer-doc-id branch from 2755d0e to 35c9536 Compare January 6, 2025 19:37
@albertisfu albertisfu self-requested a review January 6, 2025 20:51
Copy link
Contributor

@albertisfu albertisfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ttys0dev for the changes. I've made a small tweak to a docstring and moved the line rds_created.append(rd) as explained in a comment.

@mlissner this looks good to be merged.

is_available=False,
**params,
)
rds_created.append(rd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this line rds_created.append(rd) to apply only in cases where the RD is created. Previous approach was also applying it to existing RDs.

@mlissner mlissner merged commit 738f6f9 into freelawproject:main Jan 7, 2025
10 checks passed
@ttys0dev ttys0dev deleted the update-bad-pacer-doc-id branch January 7, 2025 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants