enh: support patsy model formulas #77

Open
knaaptime opened this issue Feb 14, 2020 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@knaaptime
Member

Similar to what I've just raised over at spreg, it would be a really nice addition to allow model specifications via patsy formulas. In this case, it would kill two birds with one stone, since I notice the predict method hasn't been implemented yet, and including a patsy API would go a long way towards addressing #47.

I can get started working on this if folks agree, but, as with spreg, I'd be interested in (1) whether folks want to include this addition and (2) what a good API strategy would look like.

@knaaptime added the enhancement label on Feb 14, 2020
@ljwolf
Member

ljwolf commented Feb 15, 2020 via email

@darribas
Member

Just a crazy and probably dumb idea, but I'll throw it out there in case there's some value. Would it make sense to have an "umbrella" module for all models in pysal that implements this formula approach but allows us to do it somehow "on top" of all the packages we have that implement models?

I'm thinking something where the user could pass a formula, a GeoDataFrame, and either the class or a str for the model they want to run, and the module/method would do the magic of dispatching everything. If well-designed, it'd be much easier to use from the user's perspective, and it'd also allow us to benefit from having pysal as a "package of packages"/federation, unifying APIs where possible across modules.

What do you think?
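A minimal sketch of the dispatching idea described above, under stated assumptions: `register`, `fit`, `split_formula`, and `ToyOLS` are hypothetical names, not existing pysal APIs; a plain dict of lists stands in for a GeoDataFrame; and the formula is split naively on `~` and `+` rather than parsed by patsy.

```python
from typing import Callable, Dict

# Hypothetical registry mapping short model names to model classes.
_MODEL_REGISTRY: Dict[str, Callable] = {}


def register(name):
    """Class decorator that makes a model reachable by a string name."""
    def decorator(cls):
        _MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register("ols")
class ToyOLS:
    """Stand-in for a real estimator; it only records its inputs."""

    def __init__(self, y, X, **kwargs):
        self.y, self.X, self.kwargs = y, X, kwargs


def split_formula(formula):
    """Naive 'y ~ a + b' splitter; a real version would delegate to patsy."""
    lhs, rhs = formula.split("~", 1)
    return lhs.strip(), [term.strip() for term in rhs.split("+")]


def fit(formula, data, model, **kwargs):
    """Dispatch to a model given either its registered name or its class."""
    cls = _MODEL_REGISTRY[model] if isinstance(model, str) else model
    response, terms = split_formula(formula)
    y = data[response]
    X = {term: data[term] for term in terms}
    return cls(y, X, **kwargs)


# Toy data (made-up values) standing in for a GeoDataFrame.
data = {"hoval": [1, 2], "crime": [3, 4], "income": [5, 6]}
by_name = fit("hoval ~ crime + income", data, "ols")
by_class = fit("hoval ~ crime + income", data, ToyOLS)
```

Accepting either a string or a class keeps the entry point uniform while leaving each package free to register its own estimators.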

@ljwolf
Member

ljwolf commented Feb 17, 2020

I really like that! We'd need to spec out 4 things with this, I think. Only the first was on my radar before... let's consider hoval = crime + income.

  1. What about autoregression? Before, something I had suggested was defining an operator to specify that something was simultaneous autoregressive. In that proposal, something like r(hoval) = crime + income was SAR-lag, and hoval = crime + income + r() was SAR-Error. HAC estimators would still need to be specified in a keyword, I think.
  2. What about instruments?
  3. Now that we have mgwr, what about locality? Same as above, we could define an l() function to mean "local", so that hoval = l(crime) + l(income) is an MGWR for crime & income, but hoval = l(crime + income) is a GWR, and hoval = l(crime) + income is a semiparametric GWR with only a local term for crime. @TaylorOshan, perspective?
  4. With spvcm, what about multilevels? We'd need to figure out a lme4-style syntax, in addition to a spreg-style autoregressive indicator, since patsy doesn't understand the pipe-plus-grouping syntax, (effect | group).
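As a rough illustration of point 3, the l() convention could be read off a formula string as below. This is only a sketch under assumptions: `split_top_level` and `classify_locality` are hypothetical helpers, the formula uses the `=` separator from the examples above rather than patsy's `~`, and a single l() group is treated as one shared bandwidth (GWR).

```python
def split_top_level(rhs):
    """Split on '+' signs that are not nested inside parentheses."""
    terms, depth, current = [], 0, ""
    for ch in rhs:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "+" and depth == 0:
            terms.append(current.strip())
            current = ""
        else:
            current += ch
    terms.append(current.strip())
    return terms


def classify_locality(formula):
    """Map the placement of l() wrappers to a model family."""
    _, rhs = formula.split("=", 1)
    terms = split_top_level(rhs)
    local = [t for t in terms if t.startswith("l(") and t.endswith(")")]
    if not local:
        return "global"              # no local terms at all
    if len(local) < len(terms):
        return "semiparametric GWR"  # mix of local and global terms
    if len(local) == 1:
        return "GWR"                 # one l() group, one shared bandwidth
    return "MGWR"                    # one l() per covariate


for f in ("hoval = l(crime) + l(income)",
          "hoval = l(crime + income)",
          "hoval = l(crime) + income"):
    print(f, "->", classify_locality(f))
```

The r() operator from point 1 could be detected the same way, by checking for it on the left-hand side (SAR-lag) or as a bare term on the right-hand side (SAR-Error).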

@TaylorOshan
Collaborator

I recall chatting about this a few years back. In the four issues highlighted above, which need to be addressed to produce a module-wide formula API, I sense two different situations. One is functionality that creates a design matrix to be passed to a method; the other, which could satisfy all four of the above points, is a dispatcher that allows one or more methods to be called by specifying only a formula. In terms of mgwr, I think it would be really neat to have a formula-based API that would allow you to deploy all the different variations of gwr/mgwr/semiparametric, though I wonder if this would be too specific to this type of method. For example, if we have a single API that accommodates all four points above, are we opening users up to the possibility of easily specifying nonsensical models? Perhaps a simple API for building design matrices, applied module-wide, would be a good place to start, and then we could build module-specific tweaks and dispatchers on top of it?

@knaaptime
Member Author

knaaptime commented Feb 18, 2020

I was thinking along the same lines as Taylor. Ideally we could have a dispatcher that lives in libpysal and provides a robust way of expressing lots of different models using only a formula. If we're going to put some real effort into this, this is probably the "right" way because it opens the door to a wider variety of model specs.

As a first cut, though, we could use patsy just to prepare input data for the existing models (i.e. where models live in their own classes), if for no other reason than to make it easier to use GeoDataFrames. Responding also to @lanselin's comment from the other thread:

not only is there a potential issue with spatial lags, there are also regime variables. how would those fit into the patsy syntax?
same with spatially lagged explanatory variables (SLX, spatial Durbin), ideally computed on the fly (but not in the current implementation). and where would the weights be specified?

I think we could use something like the groups and re_formula arguments for spreg regimes and spvcm grouping variables like statsmodels does for multilevel models (in R, more nlme than lme4, where random is specified separately). I think a stateful transform might work for lagged explanatory variables but the shortest path would probably be to have grouping/regime/W/additional lags in separate arguments, similar to the way it's handled now.
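One way to picture lagged explanatory variables inside a formula, with the weights held outside it, is sketched below. Note the swap: this uses an ordinary Python function that patsy evaluates from the calling scope, not a patsy stateful transform. The `lag` helper and the weights matrix `W` are hypothetical and the values are made up for illustration.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Toy row-standardised weights for three observations on a line (made-up).
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])


def lag(x):
    """Spatial lag W @ x; a hypothetical helper, not a pysal or patsy function."""
    return W @ np.asarray(x, dtype=float)


df = pd.DataFrame({"crime": [1.0, 2.0, 3.0],
                   "income": [10.0, 20.0, 30.0]})

# patsy looks up `lag` in the calling namespace; "- 1" drops the intercept.
X = dmatrix("crime + lag(income) - 1", df, return_type="dataframe")
print(list(X.columns))
```

Because `W` lives outside the formula, this is consistent with keeping the weights, regimes, and grouping variables as separate arguments.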

I was looking into some of these ideas here. It seems to work pretty well for mgwr. It fails for spreg, though. I don't think it's related to patsy per se, but I'm stumped for other ideas.

@knaaptime
Member Author

An additional small thing is the way intercepts are handled. Our packages expect matrices without the constant, so right now patsy strings need to exclude the intercept.
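For reference, a small sketch of what excluding the intercept looks like in patsy. The data frame holds made-up toy values, and the variable names follow the hoval example used earlier in the thread.

```python
import pandas as pd
from patsy import dmatrices

# Toy data with made-up values.
df = pd.DataFrame({"hoval": [10.0, 20.0, 30.0, 40.0],
                   "crime": [4.0, 3.0, 2.0, 1.0],
                   "income": [1.0, 2.0, 2.0, 3.0]})

# By default patsy prepends an "Intercept" column of ones.
y_int, X_int = dmatrices("hoval ~ crime + income", df, return_type="dataframe")

# Appending "- 1" (or "+ 0") to the right-hand side removes it.
y, X = dmatrices("hoval ~ crime + income - 1", df, return_type="dataframe")

print(list(X_int.columns))  # includes "Intercept"
print(list(X.columns))      # only the named covariates

# Plain arrays to hand to an estimator that adds its own constant.
y_arr, X_arr = y.to_numpy(), X.to_numpy()
```

A formula wrapper could also strip the intercept automatically, so users would not need to remember the `- 1`.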

@darribas
Member

darribas commented Feb 18, 2020 via email

5 participants