Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4826 Replicate RECAP PDF uploads to subdockets #4857

Merged
merged 6 commits into from
Jan 8, 2025

Conversation

albertisfu
Copy link
Contributor

@albertisfu albertisfu commented Dec 28, 2024

This PR adds support for replicating PDF uploads to sub-dockets, following a similar approach to attachment pages. A step was added in find_subdocket_pdf_rds before process_recap_pdf to look for subdockets where documents should be merged.

The process flow is:

  • The process aborts if the PDF upload belongs to an appellate court, since doppel cases cannot exist in appellate courts.
  • Similar to attachment pages, a RECAPDocument queryset finds unique RDs with the same pacer_doc_id in the same court. This query was moved to a helper method since it's common to both find_subdocket_pdf_rds and find_subdocket_att_page_rds.
  • Additional ProcessingQueue entries are created for each additional RECAPDocument where the PDF needs replication.
  • If the original PQ lacks a pacer_case_id (optional in PDF uploads), one is assigned during the first iteration so the PQ can succeed when processed by process_recap_pdf. Otherwise, the lookup will fail with RECAPDocument.MultipleObjectsReturned.
  • PQ creation is wrapped in transaction.atomic to roll back any objects if errors occur. This change was also applied to find_subdocket_att_page_rds.
  • Removed redundant code block in process_recap_attachment.

When working on this I noticed that within process_recap_pdf there is the fallback query:

rd = await RECAPDocument.objects.aget(pacer_doc_id=pq.pacer_doc_id)

Is it correct that this query only use pacer_doc_id ? Not sure if pacer_doc_ids are unique across all courts in PACER. If they're not, I think it would be safer to change it to:

rd = await RECAPDocument.objects.aget(pacer_doc_id=pq.pacer_doc_id, court_id=pq.court_id)?

Let me know what do you think.

@albertisfu albertisfu marked this pull request as ready for review December 30, 2024 18:00
@albertisfu albertisfu requested a review from mlissner December 30, 2024 18:00
@albertisfu albertisfu linked an issue Dec 30, 2024 that may be closed by this pull request
@mlissner
Copy link
Member

Is it correct that this query only use pacer_doc_id ? Not sure if pacer_doc_ids are unique across all courts in PACER. If they're not, I think it would be safer to change it to:

Yeah, that's definitely a bug, and your fix of adding the court to it should help a lot, thank you.

@johnhawkinson
Copy link
Contributor

Is it correct that this query only use pacer_doc_id ? Not sure if pacer_doc_ids are unique across all courts in PACER.

They are unique! The first 3 digits identify the court, see the doc1 URLs section of https://github.com/freelawproject/juriscraper/blob/main/juriscraper/pacer/notes.md

Yeah, that's definitely a bug, and your fix of adding the court to it should help a lot, thank you.

Why "definitely"?

@mlissner
Copy link
Member

I forgot about that, you're right, John!

Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks! Onward to a proper review.

Comment on lines 766 to 777
appellate_court_ids = [
court_pk
async for court_pk in (
Court.federal_courts.appellate_pacer_courts().values_list(
"pk", flat=True
)
)
]
if pq.court_id in appellate_court_ids:
# Abort the process for appellate documents. Subdockets cannot be found
# in appellate cases.
return pqs_to_process_pks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can avoid creating the full list of appellate courts. Consider the following approach:

Suggested change
appellate_court_ids = [
court_pk
async for court_pk in (
Court.federal_courts.appellate_pacer_courts().values_list(
"pk", flat=True
)
)
]
if pq.court_id in appellate_court_ids:
# Abort the process for appellate documents. Subdockets cannot be found
# in appellate cases.
return pqs_to_process_pks
appellate_court_ids = Court.federal_courts.appellate_pacer_courts()
if appellate_court_ids.filter(pk=pq.court_id).exists():
# Abort the process for appellate documents. Subdockets cannot be found
# in appellate cases.
return pqs_to_process_pks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Yeah this approach is simpler. I've applied it.

Copy link
Contributor

@ERosendo ERosendo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's merge after addressing my comment.

@ERosendo ERosendo assigned albertisfu and unassigned ERosendo Jan 6, 2025
@albertisfu
Copy link
Contributor Author

@ERosendo Thanks! I've applied your suggestion regarding the exists Court query for validating the court type. I've also replicated the same approach in other similar queries in cl.recap.api_serializers to optimize them.

Let me know what do you think.

@albertisfu albertisfu assigned ERosendo and unassigned albertisfu Jan 7, 2025
Copy link
Contributor

@ERosendo ERosendo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @albertisfu. LGTM :shipit:

@mlissner mlissner merged commit 437c5d7 into main Jan 8, 2025
15 checks passed
@mlissner mlissner deleted the 4826-replicate-pdf-uploads-to-subdockets branch January 8, 2025 00:44
Copy link

sentry-io bot commented Jan 8, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ WriteTimeout /api/rest/{version}/recap/ View Issue
  • ‼️ ReadTimeout /api/rest/{version}/recap/ View Issue
  • ‼️ Multiple RECAPDocuments returned when processing pdf upload /api/rest/{version}/recap/ View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Replicate RECAP PDF uploads to subdockets
4 participants