consider leaving section header out of parsed OSD content #46

Closed
dylanbeaudette opened this issue Feb 14, 2022 · 2 comments
@dylanbeaudette

I can't remember whether we have talked about this, but I'd like to weigh the pros and cons of removing the section title from the parsed OSD content. We could do this automatically, so that the JSON files are "clean", or as a post-processing step. Either way, we need a way to exclude the headers from pattern-matching searches and from the pending "dump" into new NASIS tables. Are there any reasons to leave the section headers in the text?

library(soilDB)
g <- get_OSD("Marshall")
g$TYPE.LOCATION
TYPE LOCATION: Major Land Resource Area (MLRA) 107B-Iowa and Missouri Deep Loess Hills, Cass County, Iowa subset; about 3 miles northwest of Atlantic; located about 1,227 feet west and 245 feet south of the northeast corner of section 34, T. 77 N., R. 37 W.; USGS Atlantic topographic quadrangle; lat. 41 degrees 25 minutes 55 seconds N. and long. 95 degrees 05 minutes 03 seconds W., NAD 83.
@brownag

brownag commented Feb 14, 2022

The only reason to include them is that there is a decent amount of non-standard formatting and section headers, and sometimes sections are split apart or combined. For instance, "USE:" may appear separately from "VEGETATION:" for the "USE AND VEGETATION" section.

To detect those issues, it is helpful to have the header content included in the text. We discussed this in #25 and decided to "keep as-is until collapsing and reordering sections into groups is removed; the only way to reliably deparse combined sections is if their headers are included".

Nothing currently does QC on the generalized standard section groups vs. what is actually in the OSD, but that was always my intention. This is something we talked about, and I would be happy to find a way to remove the headers, but it might need to be done as post-processing, or the way that split sections are handled would need to change.
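
As a rough illustration of the kind of QC check described above, the sketch below lists every header-like token (a run of upper-case words ending in a colon) found in a block of section text, so that a combined section such as "USE:" plus "VEGETATION:" can be flagged. The function name `find_embedded_headers` and the header pattern are assumptions made for this example; neither comes from the parser itself, and single-letter or mixed-case headers would be missed.

```r
# Hypothetical helper (not part of soilDB or the OSD parser):
# return all upper-case "HEADER:" tokens found in one string of section text.
find_embedded_headers <- function(x) {
  # two or more upper-case letters, optionally separated by spaces, then a colon
  pattern <- "[[:upper:]][[:upper:] ]*[[:upper:]]:"
  regmatches(x, gregexpr(pattern, x))[[1]]
}

# toy text standing in for a combined "USE AND VEGETATION" section
txt <- "USE: Mostly cultivated. VEGETATION: Native prairie grasses."
find_embedded_headers(txt)
```

A section whose text yields more than one header, or a header that does not match its generalized section group, would be a candidate for manual review.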

@dylanbeaudette

Ah right, thanks for the reminder. Post-processing is totally fine.
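
A minimal sketch of what such a post-processing step could look like, assuming the header always appears at the very start of the section text and is followed by a colon, as in the `TYPE LOCATION` example above. The helper `strip_section_header` is hypothetical and only uses base R; text without the expected header is returned unchanged, so non-standard sections remain detectable.

```r
# Hypothetical post-processing helper (not part of soilDB):
# drop a leading "HEADER:" from section text, leaving other text untouched.
strip_section_header <- function(x, header) {
  x <- trimws(x)
  prefix <- paste0(header, ":")
  # case-insensitive check for the header at the start of each element
  has_header <- !is.na(x) & startsWith(toupper(x), toupper(prefix))
  # remove the header and any whitespace that follows it
  x[has_header] <- trimws(substring(x[has_header], nchar(prefix) + 1))
  x
}

# toy text standing in for g$TYPE.LOCATION
txt <- "TYPE LOCATION: Cass County, Iowa; about 3 miles northwest of Atlantic"
strip_section_header(txt, "TYPE LOCATION")
```

With real data this might be called as `strip_section_header(g$TYPE.LOCATION, "TYPE LOCATION")`; deriving the header from the element name (replacing `.` with a space) is an assumption that would need checking against the non-standard headers mentioned above.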
