14 CFR XML File Structure #1

Open
MichaelDMcCracken opened this issue Mar 4, 2018 · 2 comments

MichaelDMcCracken commented Mar 4, 2018

@ryanburnette We will keep a detailed log of the structure of the XML files here for the purpose of accessing particular regs.

CFRDOC

x = Aviation14CFR.data
x.children[1].children[5] => TITLE
x.children[1].children[5].children[7] => CHAPTER (contains all the relevant regs)

CHAPTER

Below the CHAPTER heading, the file contains a Table of Contents followed by the individual subchapters for that chapter. In Title 14, Chapter I is split across Volumes 1 through 3, one volume per file that we have downloaded.

x.children[1].children[5].children[7].children[1]

  • Returns TOC
  • Table of Contents for the entire CHAPTER

x.children[1].children[5].children[7].children[x]

  • Returns SUBCHAP
  • Contains individual parts pertaining to that subchapter
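  • The x inside the brackets is a child index, not the root x defined at the top. It is an odd integer starting at 3; if the odd/even alternation described in the Notes holds, the nth SUBCHAP sits at index 2n + 1, so the largest valid value is twice the number of subchapters in that volume, plus 1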

Example: Volume 1 contains three SUBCHAP elements, so they are reached with x = 3, 5, and 7.
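A minimal sketch of the index arithmetic behind that example. It assumes the alternation described in the Notes holds (whitespace strings at even indices, elements at odd ones) and that the TOC is the first element under CHAPTER. The helper names `element_child_index` and `subchap_index` are hypothetical and not part of the project.

```ruby
# Hypothetical helpers, not part of the project.
# Assumption: children alternate, with whitespace-only strings at even
# indices and real elements at odd indices.

# Index into .children for the element at a 1-based position.
def element_child_index(position)
  unless position.is_a?(Integer) && position >= 1
    raise ArgumentError, "position must be an integer >= 1"
  end
  2 * position - 1
end

# The TOC is the first element under CHAPTER, so the nth SUBCHAP is
# element number n + 1.
def subchap_index(n)
  element_child_index(n + 1)
end

indices = (1..3).map { |n| subchap_index(n) }
p indices # prints [3, 5, 7]
```

If a volume turns out to have extra elements before the first SUBCHAP, the offset in `subchap_index` would need adjusting.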

SUBCHAP

Where x pertains to the desired SUBCHAP...

x.children[1].children[5].children[7].children[x].children[3]

  • Returns HD
  • Contains the SUBCHAP name

PART

x.children[1].children[5].children[7].children[x].children[y]

  • Returns PART
  • Contains SUBPART (where applicable) or SECTION
  • y must be an odd integer of at least 5: the 14 CFR 3.5 example below uses y = 7 for the second PART, which puts the first PART at y = 5. The upper bound depends on how many PART elements the SUBCHAP holds (don't hold me to that, yet)

x.children[1].children[5].children[7].children[x].children[y].children[5]

  • Returns CONTENTS
  • Basically a list of all SECTION for that given PART

SUBPART

x.children[1].children[5].children[7].children[x].children[y]

SECTION

If no SUBPART exists, one more level of children under the PART opens a specific SECTION, where z is an odd index (13 for § 3.5 in the example below)

x.children[1].children[5].children[7].children[x].children[y].children[z]

Example to find 14 CFR 3.5

x.children[1] => CFRDOC
x.children[1].children[5] => TITLE
x.children[1].children[5].children[7] => CHAPTER => Chapter I, FAA, DOT
x.children[1].children[5].children[7].children[3] => SUBCHAP => Subchapter A - Definitions and General Requirements
x.children[1].children[5].children[7].children[3].children[7] => PART => Part 3 - General Requirements
x.children[1].children[5].children[7].children[3].children[7].children[13] => SECTION => § 3.5
x.children[1].children[5].children[7].children[3].children[7].children[13].children[7] => P
x.children[1].children[5].children[7].children[3].children[7].children[13].children[7].children[2]
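The long chains above can be folded into a list of indices. This is only a sketch: `dig_children` is a hypothetical helper, and `DemoNode` is a stand-in for whatever node objects the XML parser returns, used here so the example runs without the real file.

```ruby
# Stand-in node type for the demo; the real nodes come from the parser.
DemoNode = Struct.new(:name, :children)

# Hypothetical helper: follow a list of child indices from a start node.
# Works on any object that responds to #children with something indexable.
def dig_children(node, indices)
  indices.reduce(node) do |current, index|
    child = current.children[index]
    raise IndexError, "no child at index #{index}" if child.nil?
    child
  end
end

# Tiny tree that mimics the whitespace/element alternation.
paragraph = DemoNode.new("P", [])
section   = DemoNode.new("SECTION", [DemoNode.new("text", []), paragraph])
part      = DemoNode.new("PART", [DemoNode.new("text", []), section])

puts dig_children(part, [1, 1]).name # prints P
```

Against the real document the call would presumably look like `dig_children(Aviation14CFR.data, [1, 5, 7, 3, 7, 13, 7])`, assuming its nodes expose `children` the same way.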

Example to Find 14 CFR 3.5 with XPath

x.xpath("/CFRDOC/TITLE/CHAPTER/SUBCHAP[1]/PART[2]/SECTION[2]/P[2]").text =>

"\n        Airworthy means the aircraft conforms to its type design and is ..."

The node returned by the XPath above still has children. children[0] contains only the "\n " whitespace, while children[1] and children[2] hold the text we want. My guess is that this comes from formatting used on the ecfr.gov website specifically for definitions. I do not expect every reg to have this, and even if they do, there is nothing wrong with keeping the structure as it is.

We can also call .length on the result of an xpath request to check whether it came back empty (a length of 0 means nothing matched).
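A small sketch of the lookup-then-check pattern. The thread never names its XML library, so to stay dependency-free this uses Ruby's bundled REXML as a stand-in: `REXML::XPath.match` plays the role of `.xpath` and returns a plain Array. `SAMPLE_XML`, its placeholder text, and `find_nodes` are made up for the demo and are not the real 14 CFR data.

```ruby
require "rexml/document"

# Made-up miniature document; the text is placeholder, not regulation text.
SAMPLE_XML = <<~XML
  <CFRDOC>
    <TITLE>
      <CHAPTER>
        <SUBCHAP>
          <PART><SECTION><P>a</P></SECTION></PART>
          <PART>
            <SECTION><P>b</P></SECTION>
            <SECTION>
              <P>first paragraph (placeholder)</P>
              <P>second paragraph (placeholder)</P>
            </SECTION>
          </PART>
        </SUBCHAP>
      </CHAPTER>
    </TITLE>
  </CFRDOC>
XML

# Hypothetical helper: return every node matching an XPath expression.
def find_nodes(xml, path)
  REXML::XPath.match(REXML::Document.new(xml), path)
end

hit  = find_nodes(SAMPLE_XML, "/CFRDOC/TITLE/CHAPTER/SUBCHAP[1]/PART[2]/SECTION[2]/P[2]")
miss = find_nodes(SAMPLE_XML, "/CFRDOC/TITLE/CHAPTER/SUBCHAP[9]")

puts hit.first.text unless hit.empty?
puts "no match" if miss.length.zero?
```

Checking the length before touching the first result avoids calling a method on nil when a section number does not exist in the file.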

Notes

  1. The even-numbered children always (in every case I can find, at least) contain a whitespace-only string that just adds a line break, so the actual elements sit at odd indices.
  2. It appears that .children[0] usually returns the text within a tag.
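Both notes can be reproduced on a toy document. This sketch uses Ruby's bundled REXML rather than the library behind `Aviation14CFR.data` (which the thread does not name), and `CHAPTER_XML` and `describe_children` are invented for the demo. The point is only that an indented XML file yields whitespace strings between elements.

```ruby
require "rexml/document"

# Invented miniature CHAPTER; indentation is what creates the extra children.
CHAPTER_XML = <<~XML
  <CHAPTER>
    <TOC>contents</TOC>
    <SUBCHAP>A</SUBCHAP>
    <SUBCHAP>B</SUBCHAP>
  </CHAPTER>
XML

# Hypothetical helper: label each child as an element name or :whitespace.
def describe_children(xml)
  root = REXML::Document.new(xml).root
  root.children.map do |node|
    node.is_a?(REXML::Element) ? node.name : :whitespace
  end
end

p describe_children(CHAPTER_XML)

# Keeping only the elements removes the need for odd-index bookkeeping.
root = REXML::Document.new(CHAPTER_XML).root
elements_only = root.children.grep(REXML::Element)
p elements_only.map(&:name)

# Note 2: the first child of a leaf tag is its text.
puts elements_only.first.children[0].to_s
```

Filtering to elements first is one way to address the second SUBCHAP as position 3 in a plain list instead of index 5 in the raw children.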

ryanburnette commented Mar 4, 2018

Try running .path on any of these paths. For example:

Aviation14CFR.data.children[1].children[5].children[7].path
=> "/CFRDOC/TITLE/CHAPTER"

So that's the same as ...

Aviation14CFR.data.xpath("/CFRDOC/TITLE/CHAPTER")

@ryanburnette

Aviation14CFR.data.children[1].children[5].children[7].children[3].children[7].children[13].children[7].path
=> "/CFRDOC/TITLE/CHAPTER/SUBCHAP[1]/PART[2]/SECTION[2]/P[2]"
