-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STEP - Causality Model Proposal #40
base: main
Are you sure you want to change the base?
Conversation
Hi @Spinachboul ! Really interesting. Your prototypical implementation is more focused on causal discovery, isn't it? Since Should we separate causal discovery (identifying edges in a causal graph) and causal estimation (quantifying the causal effects)? |
@felipeangelimvieira |
@felipeangelimvieira |
@felipeangelimvieira |
It seems like something huge, we should possibly study the possibilities if we are willing to go that way 🤔 |
I mentioned this idea to @fkiraly earlier in a meetup |
There is at least one other discussion stream, it is in the Waterloo Diabetes project - FYI @RobotPsychologist, @dvirzg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting proposal!
I think this needs more details:
- what are the API constraints on the return of
get_causal_relationships
? Can this be any object? Or does this need to have a specific structure - What does
fit
take as input?
Specifically, could you write complete docstrings?
Further, the design looks much like the BaseParamFitter
API in the param_est
module, with y
added - currently the design does not have an y
, but the PR of @satvshr sktime/sktime#7737 does add this argument.
|
Can you please be more specific, what do these booleans mean, and what are keys/values? |
@fkiraly Example {
("series_1", "series_2"): True,
("series_1", "series_3"): False,
("series_2", "series_1"): False,
} |
hm, I see. This seems a bit coarse though, since the usual type of causal model includes information such as a lag and/or temporal information? What I conclude is that the return type could differ, so to me it looks like conceptually this could be a |
I see the alignment with |
This is the output from the initial impementation of the
class ParamFitter(BaseParamFitter):
def _fit(self, X):
self.fitted_params_ = {"mean": X.mean()}
return self
Return TypeA dictionary which consists of p_values and F_Statistics as values and predictors as keys Would love to know your feedback on this!! |
Hey @Spinachboul!
I do not think you should do that, given that it conflicts with the API design throughout the codebase (any core dev can verify that claim). Just to make a suggestion, I think we should pause this PR (i.e., the incomplete |
Yeah that works too, and then would simply go with the absract methods of |
Sorry I lost track of this discussion thread, it seems like this may be a significant and interesting project. Perhaps some world class expertise would be appropriate to bring in on this discussion? @dvirzg would your supervisors at the Perimeter Institute be interested in contributing to this? |
@RobotPsychologist That would be great! |
@Spinachboul, thanks for your ideas, and sorry for the late reply! Regarding the Granger causality fitter, I think this does not jutsify a new base class, but it does justify a new instance of the parameter fitter. I would use the version by @satvshr here: sktime/sktime#7737 which allows a second argument The fitted parameters would be the most significant lag, and the other parameters such as significances, statistics per lag, etc. This would go in the same API reference category as the AR-based lag fitter that @satvshr was going to add. |
Regarding causality models overall: I think we should look at a broader class of examples first. Granger causality is a misnomer and - despite its name - is not actually a causality model (along the lines of "do operator" in the Pearl calculus). It is a simple lag association model, which is the deeper reason that we can map it onto the parameter fitter model. I agree with @RobotPsychologist that this PR would benefit from some expertise, more specifically, examples of time series specific modelling approaches that should be expected to be covered. |
@fkiraly @RobotPsychologist from sktime.param_est.base import BaseParamFitter
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests
class GrangerCausalityFitter(BaseParamFitter):
def __init__(self, max_lag=5, significance_level=0.05):
self.max_lag = max_lag
self.significance_level = significance_level
super().__init__()
def _fit(self, X, y=None):
# Ensure y is a DataFrame for compatibility
y = y if isinstance(y, pd.DataFrame) else y.to_frame()
data = pd.concat([y, X], axis=1)
results = grangercausalitytests(data, max_lag=self.max_lag, verbose=False)
p_values = {}
test_statistics = {}
for lag, result in results.items():
p_values[lag] = result[0]['ssr_ftest'][1]
test_statistics[lag] = result[0]['ssr_ftest'][0]
significant_lags = {lag: p for lag, p in p_values.items() if p < self.significance_level}
most_significant_lag = min(significant_lags, key=significant_lags.get) if significant_lags else None
self.fitted_params_ = {
"most_significant_lag": most_significant_lag,
"p_values": p_values,
"test_statistics": test_statistics,
}
return self
def get_fitted_params(self):
return self.fitted_params_ |
So the above code checks the following points:
|
ARlagorder was added in #7693 and merged, if that reference is required here. |
Thanks for the heads up! @satvshr.. This would be helpful in considering the optimal lag value |
Great! Except that I would put the fitted params not in a single dict, but direct attributes ending in underscore. I would also call the different lags simply It would be appreciated if you could open - or comment in an existing issue, if exists - for Granger causality based lag estimation. |
@fkiraly Sure! I have just raised an issue for the same on |
Granger Causality
with the utility ofstatsmodel
package.SCMs (Structural Causal Models)
,Causal Impact Analysis
.@fkiraly
Would like your review on this proposal and feedback for the same.