Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STEP - Causality Model Proposal #40

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
106 changes: 106 additions & 0 deletions steps/25_causality_model_for_time_series_forecasting/step.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
# Causality-Based Models for Time Series Analysis in sktime

Contributors: [Spinachboul](https://github.com/Spinachboul)

## Introduction

Time series analysis often focuses on correlation and predictive modeling, but understanding causal relationships is crucial for decision-making, policy evaluation, and robust forecasting. This proposal aims to integrate causality-based models into sktime, enabling users to uncover and leverage causal relationships in time series data. The initial implementation will focus on Granger Causality, with plans to expand to other methods like Structural Causal Models (SCMs) and Causal Impact Analysis.

## Contents

1. Problem Statement
2. Description of proposed solution
3. Motivation
4. Discussion and Comparison of Alternative Solutions
5. Detailed Description of Design and Implementation
6. Prototypical Implementation

## Problem statement

Current time series analysis tools, including sktime, primarily focus on correlation-based methods and predictive modeling. However, these approaches do not explicitly model causal relationships, which are essential for:

- Understanding the drivers of time series dynamics.
- Evaluating the impact of interventions or policy changes.
- Improving forecasting accuracy by accounting for causal effects.

There is a need for a unified framework within sktime to incorporate causality-based models, making it easier for users to analyze and leverage causal relationships in time series data.

## Description of proposed solution

The proposed solution involves integrating causality-based models into sktime, starting with Granger Causality and expanding to other methods such as:

- Structural Causal Models (SCMs): Representing causal relationships using directed acyclic graphs (DAGs).
- Causal Impact Analysis: Estimating the effect of interventions on time series.

## Motivation

**Enhanced Insights**: Causal models provide a deeper understanding of time series dynamics beyond correlation.

**Improved Forecasting**: Incorporating causal relationships can lead to more accurate and interpretable forecasts.

**Policy and Decision-Making**: Causal models are critical for evaluating interventions in domains like economics, healthcare, and climate science.

**Community Demand**: There is growing interest in causal inference within the time series community, and sktime is well-positioned to address this need.

## Discussion and comparison of alternative solutions

1. **Grangular Causality**
- **Pros**: Simple, widely used, and easy to interpret.
- **Cons**: Limited to linear relationships and may not capture true causality in complex systems.

2. **Structural Causal Models(SCMs)**
- **Pros**: Can model complex, non-linear relationships and handle cofounding variables.
- **Cons**: Requires domain knowledge to specify causal graphs and is computationally intensive.

3. **Causal Impact Analysis**
- **Pros**: Specifically designed for intervention analysis and easy to use.
- **Cons**: Limited to scenarios with a clear pre- and post-intervention period.

## Detailed description of design and implementation of proposed solution

### Design Principles

1) **Consistent API**: Follow sktime's `fit`, `predict`, and `transform` interface.
2) **Modularity**: Allow users to easily switch between different causality models.
3) **Interoperability**: Ensure compatibility with sktime's data structures and pipelines.

### Implementation Plan

1) **Granger Causality**
- Use `statsmodels` for the core implementation.
- Add methods for testing and visualizing causal relationships.
2) **Structural Causal Models (SCMs)**
- Integrate with libraries like `pywhy` or `dowhy`. Official documentation can be found here: [Click Here](https://www.pywhy.org/)
- Provide tools for specifying and validating causal graphs.
3) **Causal Impact Analysis**
- Adapt Google's `CausalImpact` library for sktime's interface.
- Add functionality for visualizing intervention effects.

### Prototypical Implementation

```python
from sktime.base import BaseEstimator
from statsmodels.tsa.stattools import grangercausalitytests

class GrangerCausality(BaseEstimator):
def __init__(self, maxlag=2, test="ssr_chi2test"):
self.maxlag = maxlag
self.test = test

def fit(self, X, y=None):
# X is a pandas DataFrame with multiple time series
self.results_ = {}
for col in X.columns:
if col != y.name:
test_result = grangercausalitytests(X[[y.name, col]], maxlag=self.maxlag, verbose=False)
self.results_[col] = test_result
return self

def get_causal_relationships(self):
# Summarize causal relationships
causal_relationships = {}
for col, result in self.results_.items():
p_values = [result[lag][0][self.test][1] for lag in range(1, self.maxlag + 1)]
causal_relationships[col] = min(p_values) < 0.05 # Check significance at 5% level
return causal_relationships
```