Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error in fetchNASIS with diagHzBoolean #158

Closed
smroecker opened this issue Jan 20, 2021 · 6 comments
Closed

error in fetchNASIS with diagHzBoolean #158

smroecker opened this issue Jan 20, 2021 · 6 comments
Labels
NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions

Comments

@smroecker
Copy link
Member

smroecker commented Jan 20, 2021

Their is an error in fetchNASIS. I noticed while fetching data for pedons queried using the user pedon ids 2015MT663%. The total number of ochric.epipedons should equal 61, but its returning 8.

I think the issue is coming from line 161. e.g. site(h) <- extended_data$diagHzBoolean

I think the issue is due to a difference in the data type being joined on via site(). In the site slot peiid is a character, while in diagHzBoolean the peiid is a integer. If this is indeed the issue, it begs the question, how many other joins are breaking because of a different in data type???

On a related .diagHzLongtoWide() looks a bit verbose. I think something like the following would be simplier.

.diagHzLongtoWide <- function(df) as.data.frame.matrix(table(df[["peiid"]], df[["featkind"]]) > 0)

@brownag brownag added the NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions label Jan 20, 2021
@brownag
Copy link
Member

brownag commented Jan 20, 2021

See below


Can you install aqp off CRAN and see if you still have the issue?

This sounds data.table related -- cant say for sure without you posting the stack trace / error message -- and we just moved data.table into imports in aqp on the master branch off github (ncss-tech/aqp#157). I was worried about the downstream effect of merge.data.table getting triggered if we aren't explicit and I literally was about to go in and do some testing related to the upcoming 2.5.9 soilDB release #155 and my last remaining TODO.

There is a simple fix on aqp side in site<- if my suspicions are correct

@brownag brownag added the data.table Related to package-wide safety of data.table objects and methods label Jan 20, 2021
@brownag
Copy link
Member

brownag commented Jan 21, 2021

False alarm. Though i am gonna test the cr*p out of the data.table stuff now and have a couple more safety ideas.

Using latest GitHub:

# remotes::install_github(c("ncss-tech/aqp","ncss-tech/soilDB"), dep=F)

library(aqp)
#> This is aqp 1.27
#> 
#> Attaching package: 'aqp'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(soilDB)

packageVersion("aqp")
#> [1] '1.27'

packageVersion("soilDB")
#> [1] '2.5.9'

# populate selected set with 2015MT663%
f <- fetchNASIS()
#> mixing dry colors ... [1 of 98 horizons]
#> mixing moist colors ... [1 of 454 horizons]
#> Warning: some records are missing rock fragment volume, these have been removed
#> -> QC: some fragsize_h values == 76mm, may be mis-classified as cobbles [93 / 907 records]
#> Warning: some records are missing artifact volume, these have been removed
#> Warning: all records are missing artifact volume (NULL). buffering result with
#> NA. will be converted to zero if nullFragsAreZero = TRUE.
#> -> QC: horizon errors detected, use `get('bad.pedon.ids', envir=soilDB.env)` for related userpedonid values or `get('bad.horizons', envir=soilDB.env)` for related horizon designations


site(f)$ochric.epipedon
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
#> [25]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
#> [37] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
sum(site(f)$ochric.epipedon)
#> [1] 8

length(f)
#> [1] 46

# get extended data independently
ex <- get_extended_data_from_NASIS_db()
#> Warning: some records are missing rock fragment volume, these have been removed
#> -> QC: some fragsize_h values == 76mm, may be mis-classified as cobbles [93 / 907 records]
#> Warning: some records are missing artifact volume, these have been removed

#> Warning: all records are missing artifact volume (NULL). buffering result with
#> NA. will be converted to zero if nullFragsAreZero = TRUE.

# number of unique peiid in the diagnostics table
length(unique(ex$diagHzBoolean$peiid))
#> [1] 115

# 61
sum(ex$diagHzBoolean$ochric.epipedon)
#> [1] 61

# which diagnostics correspond to profiles in the SPC?
dp <- subset(ex$diagHzBoolean, (peiid %in% profile_id(f)) & ochric.epipedon)
length(unique(dp$peiid))
#> [1] 8

# as expected
length(unique(dp$peiid)) == sum(site(f)$ochric.epipedon)
#> [1] TRUE

@brownag brownag removed the data.table Related to package-wide safety of data.table objects and methods label Jan 21, 2021
@brownag
Copy link
Member

brownag commented Jan 21, 2021

On a related .diagHzLongtoWide() looks a bit verbose. I think something like the following would be simplier.

.diagHzLongtoWide <- function(df) as.data.frame.matrix(table(df[["peiid"]], df[["featkind"]]) > 0)

This sounds plausible, feel free to submit a PR I would gladly review it

@brownag
Copy link
Member

brownag commented Jan 21, 2021

I was out for a walk with the dog, and realized that rmHzErrors=FALSE would probably give you your expected value of 61. Can confirm that is the case for that 2015MT663 dataset -- which is one I am interested in making into a test case after the SQLite stuff goes live. This is a good test for fetchNASIS rmHzErrors and general "fixing" of database errors.

@dylanbeaudette
Copy link
Member

dylanbeaudette commented Jan 21, 2021

.diagHzLongtoWide <- function(df) as.data.frame.matrix(table(df[["peiid"]], df[["featkind"]]) > 0)

Agreed, this function can probably be replaced with dcast(...). Making note of it in #161

@smroecker
Copy link
Member Author

Amazing what dog walks can do when it comes to trouble shooting. rmHzErrors = FALSE appears to have done the trick. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions
Projects
None yet
Development

No branches or pull requests

3 participants