data.table rewrite of logic functions, profileApply / mutate_profile #174
Cool. Do you think that an on-the-fly conversion to …
There is not very much overhead with converting to …

EDIT:

```r
library(data.table)
library(tibble)
library(aqp, warn.conflicts = FALSE)
#> This is aqp 1.26

data(sp4)
depths(sp4) <- id ~ top + bottom

top <- sp4[1, ][[horizonDepths(sp4)[1]]]
bottom <- sp4[1, ][[horizonDepths(sp4)[2]]]

# data.frame -> x
microbenchmark::microbenchmark(data.frame(top, bottom), data.table(top, bottom), tibble(top, bottom))
#> Unit: microseconds
#>                     expr     min       lq      mean   median       uq       max neval
#>  data.frame(top, bottom) 247.436 272.4865  346.8239 294.8015 327.1955  3598.218   100
#>  data.table(top, bottom) 195.566 227.3570  292.9348 242.6355 267.3560  1671.898   100
#>      tibble(top, bottom) 743.337 841.6940 1307.6296 911.5610 958.8015 38277.681   100

lsp4 <- as.list(data.frame(top, bottom))

# list -> x
microbenchmark::microbenchmark(as.data.frame(lsp4), as.data.table(lsp4), as_tibble(lsp4))
#> Unit: microseconds
#>                 expr     min       lq     mean   median       uq     max neval
#>  as.data.frame(lsp4) 231.792 244.0465 257.8300 250.3375 260.6730 361.660   100
#>  as.data.table(lsp4) 148.525 162.3775 178.7438 171.8700 179.1835 736.985   100
#>      as_tibble(lsp4) 141.511 150.6820 162.7615 155.0220 160.4615 562.501   100
```
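(A side note, my addition rather than part of the thread: the benchmarks above measure copying conversions. data.table also provides `setDT()` / `setDF()`, which convert in place by reference and avoid even that overhead.)

```r
library(data.table)

x <- data.frame(top = c(0, 10, 25), bottom = c(10, 25, 40))

# setDT() converts a data.frame to a data.table by reference (no copy)
setDT(x)
class(x)
#> [1] "data.table" "data.frame"

# setDF() converts back to a plain data.frame, also by reference
setDF(x)
class(x)
#> [1] "data.frame"
```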
What do you think about these three scenarios:
I really like the idea of letting folks choose the tabular back-end, with the possibility of on-the-fly conversion to / from when a significant performance gain is possible. In other words, I'm happy to use …
FWIW I agree with you @dylanbeaudette -- I think it's good to have advanced users decide to use a performance-oriented "tabular" backend (either …
Given the way I have implemented data.table and tibble support, it is eminently possible to support some other things like dbplyr for "lazy" SoilProfileCollections. I have actually tried this -- several months ago -- and it would be worth revisiting given some more recent developments in that arena.

I think we are all in agreement: the default for SPCs is data.frame, and all you will ever see out of them is data.frames unless you manually change the data.frame class in the SPC and rebuild (as I do in the example above), or build with the desired object type, e.g. …

On-the-fly conversion is really not an issue. I suppose if you want to manually trigger a different output, that could be done on a function-specific basis. Personally, I prefer to get results back in the same class as the slots of my SPC. So, forgive me for not thinking this is a big deal. We are already doing mostly 1 and 3 for several things in aqp I have implemented. For instance, all calls to …
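(An illustrative sketch, not quoted from the thread: changing the stored data.frame class and rebuilding might look like the following. The `aqp_df_class` metadata entry and `rebuildSPC()` are my reading of the mechanism described above; exact names may differ by aqp version.)

```r
library(aqp)
library(data.table)

data(sp4)
depths(sp4) <- id ~ top + bottom

# default backend: slots contain plain data.frame objects
class(horizons(sp4))

# manually change the data.frame class in the SPC metadata, then rebuild
# (metadata entry and rebuild function names are assumptions)
metadata(sp4)$aqp_df_class <- "data.table"
sp4 <- rebuildSPC(sp4)

# horizon data now comes back as a data.table
class(horizons(sp4))
```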
Not a big deal, but I am trying to get a better idea of how you see these functions being invoked, and what the expectations are. It seems like we are all on the same page. I've had similar conversations with myself about internal use of …
I am going to close this issue since the new method I posted here was incorporated into …. My feelings on mutate_profile and profileApply are these: …
This is a sub-issue related to #157 -- which is too "big" to tackle in one bite IMO.
Significant benefits can come from a `data.table` rewrite of logic functions, and soon `mutate_profile` / `profileApply`, in terms of memory usage and elapsed time for profile-level evaluations. For this `sp4`-based demo, the data.table rewrite is ~50x faster in fast mode (the most commonly used output) and ~10x faster in slow mode (identical output).

Here is a demo showing a drop-in replacement for `checkHzDepthLogic`.
New function
"Fast" results
And the benchmarks