posts/2023-06-06-spatialsample_splits/ #17

Open
utterances-bot opened this issue Jun 6, 2023 · 2 comments

Comments

@utterances-bot

Mike Mahoney - From the inbox: How can I get fold assignments from spatialsample?

Straightforward methods for answering a straightforward question.

https://www.mm218.dev/posts/2023-06-06-spatialsample_splits/


Hi Mike,
This is super useful. I have a question about the package, though not about this post specifically.

Is there a way to cluster a spatial data frame on more than one variable? Let's say I have a spatial data frame with 3 variables in total: crime, income, geometry.
When I want to generate clusters based only on "crime", I delete the "income" variable and run "spatial_clustering_cv". However, I would like to know if there is a way for "spatial_clustering_cv" to identify clusters based on both socio-economic variables, "crime" and "income".

Many thanks in advance!

@mikemahoney218
Owner

mikemahoney218 commented Jun 6, 2023

Hi @adrianuzkcc !

So this changed in January this year, as part of spatialsample 0.3.0. Functions in spatialsample now only accept sf objects, and only assign observations to folds based on the geometry column in those sf objects. Part of the motivation is that clustering on the other columns would assume that "crime" or "income" are in the same units as your spatial data, and that a unit of distance along any of these axes is equally important -- that a 1m change in spatial distance is the same as a 1 dollar change in income or a 1 unit change in crime.
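
To make the units problem concrete, here is a minimal sketch in plain Python. It is not spatialsample code, and the points, coordinates, and incomes are made-up values for illustration only. It checks which point counts as "nearest" to point A when distance is computed from the coordinates alone, and again when income in dollars is treated as a third axis.

```python
import math

# Hypothetical records: (x in meters, y in meters, income in dollars).
points = {
    "A": (0.0, 0.0, 30_000.0),
    "B": (10.0, 0.0, 90_000.0),     # 10 m from A, very different income
    "C": (5_000.0, 0.0, 30_500.0),  # 5 km from A, similar income
}


def distance(p, q, dims):
    """Euclidean distance over the selected coordinate indices."""
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in dims))


def nearest(name, dims):
    """Name of the closest other point to `name`, using only `dims`."""
    others = [k for k in points if k != name]
    return min(others, key=lambda k: distance(points[name], points[k], dims))


spatial_only = nearest("A", dims=(0, 1))    # coordinates only
with_income = nearest("A", dims=(0, 1, 2))  # coordinates plus raw income

print("nearest to A by location:", spatial_only)
print("nearest to A by location + income:", with_income)
```

With coordinates alone, A's nearest neighbor is B, 10 m away. Once raw income is added as an axis, the 60,000 dollar gap swamps the 10 m separation and C, 5 km away, becomes the "nearest" point, so any distance-based clustering would be driven almost entirely by income.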

There's some interesting work being done on blocking based upon both spatial locations and predictor variables, including this paper from last week/next month. None of that has made its way into spatialsample yet -- I'd like to see more people talking about/using these types of methods before I commit to maintaining code for them long-term! -- but I would love for them to eventually get added to the package.

If you do have a situation where you've got a lot of variables that share units and are equally important (or any data with a meaningful non-spatial "distance" metric), check out the new-ish rsample::clustering_cv(). This is a really flexible function which lets you specify your variables, as well as your own distance and clustering functions, in order to perform clustering on any set of variables that makes sense for your problem.

Hope that answers your question.
