This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Combine source_xml and tagged_text #287

Open
cmc333333 opened this issue Jul 29, 2016 · 0 comments

Comments

@cmc333333
Member

When creating XML nodes, we generally set two XML-containing fields: `source_xml` is an lxml node and `tagged_text` is an unescaped XML fragment. Different parts of the application use different portions of that data. Let's get rid of this duplication and confusion by standardizing on a single string field (and perhaps a cacheable XML-generating property).

Some complications:

  • `tagged_text` is very similar to `text`, including the use of unescaped `&`s. This similarity allows layers to inspect the `tagged_text` and apply the results to the `text` field.
  • Nodes may arise from an amalgamation of multiple XML elements.
  • `tagged_text` currently strips the outer-most tag (generally a `P`).
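
As a rough illustration of the proposal, here is a minimal sketch of a node that stores one XML string and derives everything else from it. The `Node` class, the `xml_string` field name, and the sample paragraph are all hypothetical, and the standard library's `xml.etree.ElementTree` stands in for lxml so the snippet runs without dependencies. It is not the parser's actual data model.

```python
from dataclasses import dataclass
from functools import cached_property
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch


@dataclass
class Node:
    # Hypothetical single field: one well-formed XML string, outer tag
    # included, with "&" escaped as "&amp;".
    xml_string: str

    @cached_property
    def source_xml(self) -> ET.Element:
        # Parsed on first access, then cached on the instance.
        return ET.fromstring(self.xml_string)

    @property
    def text(self) -> str:
        # Plain text with all tags removed.
        return "".join(self.source_xml.itertext())

    @property
    def tagged_text(self) -> str:
        # Inner markup only: the outer-most tag is dropped, child tags are
        # kept, and "&amp;" is unescaped so the result resembles `text`.
        root = self.source_xml
        parts = [root.text or ""]
        for child in root:
            # tostring() includes the child's tail text.
            serialized = ET.tostring(child, encoding="unicode")
            parts.append(serialized.replace("&amp;", "&"))
        return "".join(parts)


node = Node('<P>Fees &amp; charges under <E T="03">this part</E> apply.</P>')
```

This sketch only covers the outer-tag and unescaped-`&` complications. A node built from several XML elements would still need some wrapping convention, which is left open here.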
cmc333333 pushed a commit to cmc333333/regulations-parser that referenced this issue Jul 29, 2016
Unfortunately, `tagged_text` contains unescaped `&`s, which caused etree to
explode. This patches the logic to account for this scenario.

I've also created eregs#287 to note that we need to replace these fields
altogether.
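
For context, one way to make a `tagged_text`-style fragment parseable is to escape bare ampersands before handing it to etree. The sketch below is an assumption about the general approach, not the code from the referenced commit. The helper names and the `ROOT` wrapper tag are hypothetical, and the standard library's `xml.etree.ElementTree` is used in place of lxml.

```python
import re
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch

# Matches "&" unless it starts a predefined XML entity or a numeric
# character reference.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(fragment: str) -> str:
    # Turn each bare "&" into "&amp;" and leave existing entities alone.
    return BARE_AMPERSAND.sub("&amp;", fragment)


def parse_tagged_text(tagged_text: str) -> ET.Element:
    # tagged_text has no outer tag, so wrap it in a hypothetical root.
    escaped = escape_bare_ampersands(tagged_text)
    return ET.fromstring("<ROOT>" + escaped + "</ROOT>")


root = parse_tagged_text('AT&T <E T="03">rates</E> & fees')
```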