Commit 0cec12e (1 parent: e03c718)
Showing 1,118 changed files with 97,799 additions and 129,200 deletions.
@@ -117,3 +117,6 @@ dist
 .yarn/unplugged
 .yarn/build-state.yml
 .pnp.*
+
+/.quarto/
+*.tif
_freeze/junkdrawer/riemann/index/execute-results/html.json
14 changes: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
"hash": "a4cf13a44d6dd737af29d343d43a2dd7",
"result": {
"markdown": "---\ntitle: \"Multi-scale model assessment with spatialsample\"\nauthor: \"Mike Mahoney\"\ndate: last-modified\nformat: \n html:\n code-link: true\nexecute:\n freeze: true\n---\n\n\n\n\nModeling spatially structured data is complicated. In addition to the usual difficulty of statistical modeling, models of spatially structured data may have spatial structure in their errors, with different regions being more or less well-described by a given model. This also means that accuracy metrics for these models might change depending on what spatial scale is being assessed. Only investigating model accuracy at larger aggregation scales, such as when accuracy is only assessed for the entire study area as a whole, might \"smooth out\" these local differences and present an inaccurate picture of model performance.\n\nFor this reason, a number of researchers (most notably, [Riemann et al. (2010)](https://www.nrs.fs.fed.us/pubs/jrnl/2010/nrs_2010_riemann_001.pdf)[^1]) have suggested assessing models at multiple scales of spatial aggregation to ensure cross-scale differences in model accuracy are identified and reported. This post walks through how to do that using the new [spatialsample](https://spatialsample.tidymodels.org/) package.\n\n[^1]: Riemann, R., Wilston, B. T., Lister, A., and Parks, S. 2010. An effective assessment protocol for continuous geospatial datasets of forest characteristics using USFS Forest Inventory and Analysis (FIA) data. Remote Sensing of Environment, 114, pp. 2337-2353. doi: 10.1016/j.rse.2010.05.010.\n\nBecause Riemann et al. were working with data from the US Forest Inventory and Analysis (FIA) program, we're going to do the same. However, because our main goal is to show how spatialsample can support this type of analysis, we won't spend a ton of time worrying about any of the quirks of FIA data[^2] or on feature engineering. Instead, we're going to use a simple linear model to see if we can predict how much aboveground biomass (\"AGB\"; all the non-root woody bits) there is in a forest based on how many trees there are. We'll use all the FIA field data from New York State, USA. \n\n[^2]: Among them that only forested areas are measured, where \"forested\" means \"principally used as forest\" which excludes parks but includes recently clear-cut lands, and that plot locations are considered personally identifying information under the farm bill of 1985, and so as to not identify anyone the coordinates in public data are \"fuzzed\" by a few miles and approximately 20% of plot coordinates are swapped with other plots in the data set. Which is to say, consult your local forester or ecologist if you want to use FIA data to answer real questions in your own work.\n\nBecause we're mostly interested in assessing our models, I'm going to mostly ignore how exactly I downloaded and wrangled the FIA data for this post. 
If you're curious, I've hidden the code below:\n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-2_a9a1daa216dd073497bd1026dab2f7e9'}\n\n```{.r .cell-code code-fold=\"true\"}\nlibrary(dplyr)\n\n# Download the FIA database for New York over the internet,\n# and unzip it into our local directory\n#\n# This updates annually, which means that this post likely won't\n# generate the exact same results after 2022\nhttr::GET(\n \"https://apps.fs.usda.gov/fia/datamart/Databases/SQLite_FIADB_NY.zip\",\n httr::write_disk(\"SQLite_FIADB_NY.zip\", TRUE)\n)\n\nunzip(\"SQLite_FIADB_NY.zip\")\n\n# We're going to work with the database through dplyr's database connections\n#\n# But first, we need to create a DBI connection to the database and\n# load out tables:\ncon <- DBI::dbConnect(RSQLite::SQLite(), dbname = \"FIADB_NY.db\")\ntrees <- tbl(con, \"TREE\")\n\nplots <- tbl(con, \"PLOT\")\n\n# The FIA database has every measurement ever collected by the program;\n# we'll filter to only the most recent survey for each of the plots.\n#\n# Plots are measured on a rolling 7 year basis, so we'll also cut out any\n# plots which might not be remeasured anymore with a call to filter()\nplots <- plots |> \n group_by(PLOT) |> \n filter(INVYR == max(INVYR, na.rm = TRUE)) |> \n ungroup() |> \n filter(INVYR > 2009) |> \n collect()\n\ncopy_to(con, plots, \"newest_plots\", TRUE)\nnewest_plots <- tbl(con, \"newest_plots\")\n\n# Now we'll use a filtering join to select only trees measured in the most\n# recent sample at each plot\n#\n# We'll also count how many trees were at each plot,\n# sum up their AGB, \n# and save out a few other useful columns like latitude and longitude\nplot_measurements <- trees |> \n right_join(newest_plots, by = c(\"INVYR\", \"PLOT\")) |> \n group_by(PLOT) |> \n summarise(\n yr = mean(INVYR, na.rm = TRUE),\n plot = mean(PLOT, na.rm = TRUE),\n lat = mean(LAT, na.rm = TRUE),\n long = mean(LON, na.rm = TRUE),\n n_trees = n(),\n agb = sum(DRYBIO_AG, na.rm = TRUE)\n ) |> \n collect() |> \n mutate(\n # Because of how we joined, `n_trees` is always at least 1 -- \n # even if there are 0 trees\n n_trees = ifelse(is.na(agb) & n_trees == 1, 0, n_trees),\n agb = ifelse(is.na(agb), 0, agb)\n )\n\nDBI::dbDisconnect(con)\n\nreadr::write_csv(plot_measurements, \"plot_measurements.csv\")\n```\n:::\n\n\nWith that pre-processing done, it's time for us to load our data and turn it into an sf object. We're going to reproject our data to use a coordinate reference system that the US government tends to use for national data products, like the FIA:\n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-3_8d519cb783ecc24ab9ab71a89c72cfbd'}\n\n```{.r .cell-code}\nlibrary(sf)\n\ninvisible(sf_proj_network(TRUE))\n\nplot_measurements <- readr::read_csv(\"https://www.mm218.dev/junkdrawer/riemann/plot_measurements.csv\") |> \n st_as_sf(coords = c(\"long\", \"lat\"), crs = 4326) |> \n st_transform(5070)\n```\n:::\n\n\nAnd this is what we're going to go ahead and resample. We want to assess our model's performance at multiple scales, following the approach in Riemann et al. That means we need to do the following:\n\n1. Block our study area using multiple sets of regular hexagons of different sizes, and assign our data to the hexagon it falls into within each set.\n2. Perform leave-one-block-out cross-validation with each of those sets, fitting our model to n - 1 of the n hexagons we've created and assessing it on the hold-out hexagon.\n3. 
Calculate model accuracy for each size based on those held-out hexes.\n\nSo to get started, we need to block our study area. We can do this using the `spatial_block_cv()` function from spatialsample. We'll generate ten different sets of hexagon tiles, using `cellsize` arguments of between 10,000 and 100,000 meters[^3]. The code to do that, and to store all of our resamples in a single tibble, looks like this[^4]:\n\n[^3]: This value is in meters because our coordinate reference system is in meters. It represents the length of the [apothem](https://en.wikipedia.org/wiki/Apothem), from the center of each polygon to the middle of the side. We're using hexagons because Riemann et al. also used hexagons. \n\n[^4]: `v` is `Inf` because we want to perform leave-one-block-out cross-validation, but we don't know how many blocks there will be before they're created. This is the supported way to do leave-one-X-out cross-validation in spatialsample > 0.2.0 (another option is to set `v = NULL`).\n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-4_c9f6d59a5c9db207f44b23537327ee11'}\n\n```{.r .cell-code}\nset.seed(123)\nlibrary(dplyr)\nlibrary(spatialsample)\ncellsize <- seq(10, 100, 10) * 1000\n\nriemann_resamples <- tibble(\n resamples = purrr::map(\n cellsize, \n \\(cs) {\n spatial_block_cv(\n plot_measurements,\n v = Inf,\n cellsize = cs,\n square = FALSE\n )\n }\n ),\n cellsize = cellsize\n)\n```\n:::\n\n\nIf we want, we can visualize one (or more) of our resamples, to get a sense of what our tiling looks like:\n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-5_47927a616ab25100d9067fa78796ff2b'}\n\n```{.r .cell-code}\nautoplot(riemann_resamples$resamples[[10]])\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\nAnd that's step 1 of the process completed! Now we need to move on to step 2, and actually fit models to each of these resamples. I want to highlight that this is a _lot_ of models, and so is going to take a while[^5]:\n\n[^5]: Linear regression was invented in 1805, ish, long before the Analytical Engine was a twinkle in Babbage's eye. Whenever I get frustrated at how long fitting multiple models like this takes, I like to take a step back and recognize that I am asking my poor overworked computer to fit roughly as many models as were used in the first ~100 years of the technique's life. \n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-6_203f633130982a3e10be3bc1d02942f9'}\n\n```{.r .cell-code}\npurrr::map_dbl(\n riemann_resamples$resamples,\n nrow\n) |> \n sum()\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2600\n```\n:::\n:::\n\nWith that said, actually fitting those few thousand models is a two part process. First, we're going to load the rest of the tidymodels packages and use them to define a workflow (from the workflows package), specifying the formula and model that we want to fit to each resample:\n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-7_3148976037a06bd519ff92691085eda6'}\n\n```{.r .cell-code}\nlibrary(tidymodels)\n\nlm_workflow <- workflow() |> \n add_model(linear_reg()) |> \n add_formula(agb ~ n_trees)\n```\n:::\n\n\nNext, we'll actually apply that workflow a few thousand times! We'll calculate two metrics for each run of the model: the root-mean-squared error (RMSE) and the mean absolute error (MAE). 
We can add these metrics as a new column to our resamples using the following:\n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-8_26e509d72d92c408fc0643fc6b3e489d'}\n\n```{.r .cell-code}\nriemann_resamples <- riemann_resamples |> \n mutate(\n resampled_outputs = purrr::map(\n resamples, \n fit_resamples,\n object = lm_workflow,\n metrics = metric_set(\n rmse,\n mae\n )\n )\n )\n```\n:::\n\n\nThe `riemann_resamples` object now includes both our original resamples as well as the accuracy metrics associated with each run of the model. A very cool thing about this approach is that we can now visualize our block-level accuracy metrics with a few lines of code. \n\nFor instance, if we wanted to plot block-level RMSE for our largest assessment scale, we could use the following code to \"unnest\" our nested metric and resample columns:\n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-9_a70aa73bfcb2b5ad1896982fe088d7d5'}\n\n```{.r .cell-code}\nriemann_resamples$resampled_outputs[[10]] |> \n mutate(splits = purrr::map(splits, assessment)) |> \n unnest(.metrics) |> \n filter(.metric == \"rmse\") |> \n unnest(splits) |> \n st_as_sf() |> \n ggplot(aes(color = .estimate)) + \n geom_sf()\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\nWe can also go on to the third step of our assessment process, and get our model accuracy metrics for each aggregation scale we investigated. We'll create a new data frame with only our cellsize variable and the associated model metrics:\n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-10_cc9d162150cef70d7dcc926bf5fa8fa7'}\n\n```{.r .cell-code}\nriemann_metrics <- riemann_resamples |> \n transmute(\n cellsize = cellsize,\n resampled_metrics = purrr::map(resampled_outputs, collect_metrics)\n ) |> \n unnest(resampled_metrics)\n\nhead(riemann_metrics)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 6 × 7\n cellsize .metric .estimator mean n std_err .config \n <dbl> <chr> <chr> <dbl> <int> <dbl> <chr> \n1 10000 mae standard 5787. 1541 99.9 Preprocessor1_Model1\n2 10000 rmse standard 6980. 1541 121. Preprocessor1_Model1\n3 20000 mae standard 5722. 424 130. Preprocessor1_Model1\n4 20000 rmse standard 7644. 424 169. Preprocessor1_Model1\n5 30000 mae standard 5637. 205 161. Preprocessor1_Model1\n6 30000 rmse standard 7725. 205 218. Preprocessor1_Model1\n```\n:::\n:::\n\n\nAnd just like that, we've got a multi-scale assessment of our model's accuracy! We can then use this to investigate and report how well our model does at different levels of aggregation. For instance, by plotting RMSE against MAE at various scales, it appears that our RMSE increases with aggregation while MAE decreases. This hints that, as we aggregate our predictions to larger hexagons, more of our model's overall error is driven by large outliers:\n\n\n::: {.cell hash='index_cache/html/unnamed-chunk-11_7192542fa5a721656761fee2bf2ffce1'}\n\n```{.r .cell-code}\nlibrary(ggplot2)\n\nggplot(riemann_metrics, aes(cellsize, mean, color = .metric)) + \n geom_line() +\n geom_point() + \n theme_minimal()\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n", | ||
"supporting": [], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": true | ||
} | ||
} |
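The frozen post above walks through a three-step, multi-scale assessment: block the study area with hexagons at several cell sizes, run leave-one-block-out cross-validation on each set of blocks, and summarise accuracy per scale. The R sketch below condenses those steps into a single pipeline. It is an illustration of the post's workflow, not code included in this commit, and it assumes `plot_measurements` is the sf object of FIA plots built earlier in the post (projected to a metric CRS, with `agb` and `n_trees` columns).

```r
# Condensed sketch of the post's multi-scale assessment workflow.
# Assumes `plot_measurements` is an sf object in a projected (metric) CRS
# with columns `agb` (aboveground biomass) and `n_trees`.
library(dplyr)
library(spatialsample)
library(tidymodels)

set.seed(123)

# One set of hexagonal blocks per assessment scale (apothem length in meters)
cellsize <- seq(10, 100, 10) * 1000

riemann_resamples <- tibble(
  cellsize = cellsize,
  resamples = purrr::map(
    cellsize,
    \(cs) spatial_block_cv(
      plot_measurements,
      v = Inf,           # leave-one-block-out cross-validation
      cellsize = cs,
      square = FALSE     # hexagons, following Riemann et al. (2010)
    )
  )
)

# The single model assessed at every scale: biomass as a function of tree count
lm_workflow <- workflow() |>
  add_model(linear_reg()) |>
  add_formula(agb ~ n_trees)

# Fit the workflow to every resample, scoring each held-out block with RMSE
# and MAE, then aggregate the block-level scores for each cell size
riemann_metrics <- riemann_resamples |>
  mutate(
    resampled_outputs = purrr::map(
      resamples,
      fit_resamples,
      object = lm_workflow,
      metrics = metric_set(rmse, mae)
    ),
    resampled_metrics = purrr::map(resampled_outputs, collect_metrics)
  ) |>
  select(cellsize, resampled_metrics) |>
  tidyr::unnest(resampled_metrics)
```

Plotting `mean` against `cellsize` for each metric then reproduces the post's closing figure, in which RMSE increases with aggregation scale while MAE decreases.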
14 files renamed without changes.
_freeze/posts/2019/03/thesis-published/execute-results/html.json
16 changes: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
{
"hash": "8c6c17e2c81c4259197af2c25181076b",
"result": {
"markdown": "---\ntitle: \"Thesis Now Available in ESF Digital Commons\"\ncategories: [\"Publications\", \"Beaver\"]\ndate: '2019-03-27'\nupdated: '2019-03-27'\ndescription: \"It's a real humdinger.\"\nauthor:\n - name: Mike Mahoney\n url: {}\noutput:\n distill::distill_article:\n self_contained: false\n---\n\n\nMy thesis is now available on the [ESF Digital Commons](https://digitalcommons.esf.edu/honors/141/)! I'm extremely grateful to Drs. John Drake and Bill Shields for their help in the revision and submission process, and of course to Dr. John Stella for the extensive support he provided throughout the project, from conceptualization to publication.\n\nIn the thesis, we look at the impacts beavers have on the forest community around them as they remove trees for food and building dams. While people had looked at these impacts in other parts of beaver's range, the Adirondacks are a strange enough ecosystem - being largely protected from anthropogenic disturbances, most of the forest landscape exhibits only one or two age classes - that we weren't sure how applicable conclusions from these regions would be. What we found was that while the broad conclusions of these studies held true - beavers still operate as central place foragers and create large disturbances in the landscape - the lack of early-successional species throughout the Adirondacks seriously shifted which stems were harvested preferrentially. We also found a lot of variance in the patterns of how individual species were utilized - for instance, beaver harvested almost any size speckled alder they could find, so long as it was close to their dam, but would harvest red maple at any distance, so long as the stem was small.\n\n\n\nWe're currently working on a journal article version of the thesis, using an expanded dataset and focusing more closely on the patterns in forage selectivity we found, and how they differ from other regions. That should hopefully be in the review process within the next few weeks.\n", | ||
"supporting": [ | ||
"thesis-published_files" | ||
], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": null | ||
} | ||
} |
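Each of the `_freeze/*/execute-results/html.json` files added in this commit shares the structure shown above: a content hash, the rendered intermediate markdown, and bookkeeping fields (`supporting`, `filters`, `includes`, `engineDependencies`, `preserve`, `postProcess`). A minimal sketch for inspecting one of them, assuming jsonlite is installed and the repository root is the working directory:

```r
# Minimal sketch: peek inside one of the Quarto freeze files from this commit.
# Assumes the working directory is the repository root and jsonlite is installed.
library(jsonlite)

freeze <- read_json("_freeze/posts/2019/03/thesis-published/execute-results/html.json")

freeze$hash                  # hash Quarto compares to decide whether to re-execute
cat(freeze$result$markdown)  # the rendered intermediate markdown for the post
freeze$result$supporting     # supporting directories, e.g. "thesis-published_files"
```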
11 files renamed without changes.
_freeze/posts/2020/01/heddlr-release/execute-results/html.json
16 changes: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
{
"hash": "66c0e946c192bd51acac4beab1034ad7",
"result": {
"markdown": "---\ntitle: \"Announcing {heddlr}, now on CRAN!\"\ncategories: [\"R\", \"R Packages\", \"R Markdown\"]\ndate: '2020-01-23'\nupdated: '2020-01-23'\ndescription: \"Write less boilerplate, get more done.\"\nauthor:\n - name: Mike Mahoney\n url: {}\noutput:\n distill::distill_article:\n self_contained: false\n---\n\n\nMy first package just got published to CRAN today! `heddlr` is a set of tools \nthat make it easier to write modular R Markdown documents by decomposing them \ninto a set of patterns which can be repeated and combined based on your input \ndata, letting you dynamically add and remove sections based on your data. I \nstarted this package to solve an issue I found myself running into when \nbuilding [flexdashboards](https://rmarkdown.rstudio.com/flexdashboard/), and \nhave since found out that there's all sorts of cool tricks you can do by \napplying this type of functional programming mindset to R Markdown documents.\n\nYou can find out more on heddlr's [documentation website](https://mikemahoney218.github.io/heddlr/),\nproudly made in R via [pkgdown](https://pkgdown.r-lib.org/). This first version \non CRAN is 0.5.0, with 0.1 -> 0.4 previously released on [GitHub](https://github.com/mikemahoney218/heddlr).\n", | ||
"supporting": [ | ||
"heddlr-release_files" | ||
], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": null | ||
} | ||
} |
11 files renamed without changes.
_freeze/posts/2020/03/spacey-release/execute-results/html.json
16 changes: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
{
"hash": "4b3407afcb239d7233d4e16f97af8afe",
"result": {
"markdown": "---\ntitle: \"Announcing {spacey}, now on CRAN!\"\ncategories: [\"R\", \"R Packages\", \"maps\", \"spacey\", \"geospatial data\"]\ndate: '2020-03-24'\nupdated: '2020-03-24'\ndescription: \"USGS data access and rayshader maps, done cheap.\"\nauthor:\n - name: Mike Mahoney\n url: {}\noutput:\n distill::distill_article:\n self_contained: false\n---\n\n\nI've launched a new package to CRAN! `spacey` helps you pull elevation and \nimage overlay data from the USGS and ESRI, then helps you turn them into \nbeautiful maps via [the fantastic `rayshader` package](https://www.rayshader.com/).\n\nThe package has a [documentation website](https://mikemahoney218.github.io/spacey/)\nbuilt with [pkgdown](https://pkgdown.r-lib.org/) -- check it out for more \ninformation!\n", | ||
"supporting": [ | ||
"spacey-release_files" | ||
], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": null | ||
} | ||
} |
11 files renamed without changes.