-
-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
4826 Replicate RECAP PDF uploads to subdockets #4857
Conversation
Yeah, that's definitely a bug, and your fix of adding the court to it should help a lot, thank you. |
They are unique! The first 3 digits identify the court, see the
Why "definitely"? |
I forgot about that, you're right, John! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks! Onward to a proper review.
cl/recap/tasks.py
Outdated
appellate_court_ids = [ | ||
court_pk | ||
async for court_pk in ( | ||
Court.federal_courts.appellate_pacer_courts().values_list( | ||
"pk", flat=True | ||
) | ||
) | ||
] | ||
if pq.court_id in appellate_court_ids: | ||
# Abort the process for appellate documents. Subdockets cannot be found | ||
# in appellate cases. | ||
return pqs_to_process_pks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can avoid creating the full list of appellate courts. Consider the following approach:
appellate_court_ids = [ | |
court_pk | |
async for court_pk in ( | |
Court.federal_courts.appellate_pacer_courts().values_list( | |
"pk", flat=True | |
) | |
) | |
] | |
if pq.court_id in appellate_court_ids: | |
# Abort the process for appellate documents. Subdockets cannot be found | |
# in appellate cases. | |
return pqs_to_process_pks | |
appellate_court_ids = Court.federal_courts.appellate_pacer_courts() | |
if appellate_court_ids.filter(pk=pq.court_id).exists(): | |
# Abort the process for appellate documents. Subdockets cannot be found | |
# in appellate cases. | |
return pqs_to_process_pks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Yeah this approach is simpler. I've applied it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's merge after addressing my comment.
@ERosendo Thanks! I've applied your suggestion regarding the Let me know what do you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @albertisfu. LGTM
Suspect IssuesThis pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎 |
This PR adds support for replicating PDF uploads to sub-dockets, following a similar approach to attachment pages. A step was added in
find_subdocket_pdf_rds
beforeprocess_recap_pdf
to look for subdockets where documents should be merged.The process flow is:
pacer_doc_id
in the same court. This query was moved to a helper method since it's common to bothfind_subdocket_pdf_rds
andfind_subdocket_att_page_rds
.ProcessingQueue
entries are created for each additional RECAPDocument where the PDF needs replication.pacer_case_id
(optional in PDF uploads), one is assigned during the first iteration so the PQ can succeed when processed byprocess_recap_pdf
. Otherwise, the lookup will fail withRECAPDocument.MultipleObjectsReturned
.transaction.atomic
to roll back any objects if errors occur. This change was also applied tofind_subdocket_att_page_rds
.process_recap_attachment
.When working on this I noticed that within
process_recap_pdf
there is the fallback query:courtlistener/cl/recap/tasks.py
Line 264 in cb012e8
Is it correct that this query only use
pacer_doc_id
? Not sure ifpacer_doc_ids
are unique across all courts in PACER. If they're not, I think it would be safer to change it to:rd = await RECAPDocument.objects.aget(pacer_doc_id=pq.pacer_doc_id, court_id=pq.court_id)
?Let me know what do you think.