This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

37 CFR 1 - FR correction notice interpreted as modifying non-existent appendix #380

Open
gregoryfoster opened this issue May 24, 2017 · 1 comment


@gregoryfoster
Contributor

Dev environment: current master [ b2a4c07 ] + PR #378

To reproduce the warning:

eregs clear
eregs preprocess_notice E7-19326
eregs write_to output

This results in the output:

... regparser.notice.amendments.appendix     Could not find Appendix E7 to part 1
... regparser.notice.xml                     Unable to fetch amendments for docket E7-19326

This warning occurs when processing amendment 1 of 72 FR 55055, at regparser/notice/amendments/appendix.py:31. From what I can tell, the notice is being interpreted as amending a non-existent appendix E7 in 37 CFR part 1, and the parser appears to be deriving that appendix identifier from the FR document ID (E7-19326). Can you confirm, and recommend an approach here?

@cmc333333
Member

Hey @gregoryfoster, it looks like this is being triggered by the first AMDPAR in that notice (which isn't rendering properly on federalregister.gov). If you look at the XML, you'll see:

<AMDPAR>
In rule FR Doc. E7-16574, August 22, 2007 (72 FR 46899), make the following corrections:
</AMDPAR>
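For anyone who wants to inspect these paragraphs outside the parser, here is a minimal standard-library sketch that pulls the text of every AMDPAR out of a notice. The wrapping `<RULE>` element and the `amdpar_texts` helper are hypothetical simplifications; real Federal Register XML nests AMDPARs more deeply, which `iter()` still handles because it walks the whole tree.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical notice XML; real FR XML has more wrapping elements,
# but iter() searches all descendants, so the helper below still finds them.
NOTICE_XML = """<RULE>
  <AMDPAR>
    In rule FR Doc. E7-16574, August 22, 2007 (72 FR 46899), make the following corrections:
  </AMDPAR>
</RULE>"""


def amdpar_texts(xml_string):
    """Return the whitespace-normalized text of every AMDPAR, in document order."""
    root = ET.fromstring(xml_string)
    # itertext() also collects text inside nested inline tags (e.g. <E>).
    return [" ".join("".join(el.itertext()).split()) for el in root.iter("AMDPAR")]


print(amdpar_texts(NOTICE_XML)[0])
```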

We can pop that into the amdparser to see how it's read:

In [1]: from lxml import etree

In [2]: from regparser.notice.amdparser import parse_amdpar

In [3]: parse_amdpar(etree.fromstring('<AMDPAR>In rule FR Doc. E7-16574, August 22, 2007 (72 FR 46899), make the following corrections:</AMDPAR>'), [])
Out[3]:
(<Element EREGS_INSTRUCTIONS at 0x7fc67d62af88>,
 [None, 'Appendix:E7', '16574'])

You'll notice that the second value (the resulting "context") points to section 16574 of appendix E7, which is clearly not correct. Note that it comes from the FR Doc. number cited inside the AMDPAR (E7-16574), not from the notice's own document ID. If we dig into the amdparser (specifically regparser.grammar.amdpar:appendix_section -> regparser.grammar.unified:appendix_with_section -> regparser.grammar.atomic.appendix_digit), we can see why: according to the current rules, E7 is a valid appendix identifier and 16574 is a valid section within an appendix.
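To make the failure concrete, here is a rough regular-expression approximation of the permissive rule: letters optionally followed by digits for the appendix, a hyphen, then any run of digits for the section. The pattern and the `PERMISSIVE_APPENDIX_SECTION` name are hypothetical illustrations, not regparser's actual pyparsing grammar, but they reach the same wrong reading of this AMDPAR.

```python
import re

# Hypothetical approximation of the current, permissive rule. This is NOT
# regparser's grammar: appendix = letters plus optional digits, then "-",
# then a section made of ANY number of digits.
PERMISSIVE_APPENDIX_SECTION = re.compile(
    r"\b(?P<appendix>[A-Z]+\d*)-(?P<section>\d+)\b")

amdpar = ("In rule FR Doc. E7-16574, August 22, 2007 (72 FR 46899), "
          "make the following corrections:")

# The cited FR Doc. number is the first thing that fits the pattern.
match = PERMISSIVE_APPENDIX_SECTION.search(amdpar)
print(match.group("appendix"), match.group("section"))  # E7 16574
```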

I think the section number is the bit that makes the most sense to twiddle here; let's modify appendix_digit to accept only 1-4 character sections. While I've seen 2-digit appendix sections and can imagine 4-digit ones, 5 seems excessive. We can probably get away with a Regex parser that respects word boundaries but allows only 1-4 characters. Let us know if that's not enough to get you started!
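A minimal sketch of the proposed restriction, again using Python's `re` module as a stand-in for the grammar and assuming sections are purely numeric (hence `\d{1,4}` rather than any 1-4 characters). The section must be followed by a word boundary, so a 5-digit run cannot be silently truncated to its first four digits. `APPENDIX_SECTION` and `find_appendix_sections` are hypothetical names; in regparser itself the same pattern string could presumably be handed to pyparsing's `Regex` class inside `appendix_digit`.

```python
import re

# Hypothetical stand-in for a tightened appendix_digit: the section is limited
# to 1-4 digits, and the trailing \b forbids stopping in the middle of a longer
# digit run, so "E7-16574" no longer matches at all.
APPENDIX_SECTION = re.compile(
    r"\b(?P<appendix>[A-Z]+\d*)-(?P<section>\d{1,4})\b")


def find_appendix_sections(text):
    """Return (appendix, section) pairs that satisfy the 1-4 digit rule."""
    return [(m.group("appendix"), m.group("section"))
            for m in APPENDIX_SECTION.finditer(text)]


print(find_appendix_sections("In rule FR Doc. E7-16574, August 22, 2007"))
print(find_appendix_sections("See appendix A1-3 and B-12."))
```

The trailing `\b` is what does the work: without it, `\d{1,4}` would happily match `1657` out of `16574` and reproduce the bug with a shorter section number.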
