- 🚀 Getting Started
- 🧩 Modules
- 🗺 Roadmap
- 🤝 Contributing
- 📄 License
- 📍 Time Series Bootstrapping Methods intro
- 👏 Contributors
tsbootstrap provides a unified, sklearn-like interface to all bootstrap methods.
Example using a MovingBlockBootstrap - all bootstrap algorithms follow the same interface!
from tsbootstrap import MovingBlockBootstrap
import numpy as np
# Create custom time series data. This example is univariate, but the bootstraps can handle multivariate time series as well.
n_samples = 10
X = np.arange(n_samples)
# Instantiate the bootstrap object
n_bootstraps = 3
block_length = 3
rng = 42
mbb = MovingBlockBootstrap(
    n_bootstraps=n_bootstraps, rng=rng, block_length=block_length
)
# Generate bootstrapped samples
return_indices = False
bootstrapped_samples = mbb.bootstrap(X, return_indices=return_indices)
# Collect bootstrap samples
X_bootstrapped = []
for data in bootstrapped_samples:
    X_bootstrapped.append(data)
X_bootstrapped = np.array(X_bootstrapped)
tsbootstrap is installed via pip, either from PyPI or locally.
- Python (3.9 or higher)
- pip (latest version recommended), plus a suitable environment manager (venv, conda)
You can also consider using uv to speed up environment setup.
To install the latest release of tsbootstrap directly from PyPI, run:
pip install tsbootstrap
To install with all optional dependencies:
pip install "tsbootstrap[all_extras]"
Bootstrap algorithms manage their own dependencies: if an extra is needed but not installed, the object raises an error at construction.
The tsbootstrap package contains various modules that handle tasks such as bootstrapping, time series simulation, and utility functions. This modular approach ensures flexibility, extensibility, and ease of maintenance.
root
File | Summary |
---|---|
setup.sh | Shell script for initial setup and environment configuration. |
commitlint.config.js | Configuration for enforcing conventional commit messages. |
CITATION.cff | Citation metadata for the project. |
CODE_OF_CONDUCT.md | Guidelines for community conduct and interactions. |
CONTRIBUTING.md | Instructions for contributing to the project. |
.codeclimate.yml | Configuration for Code Climate quality checks. |
.gitignore | Specifies files and folders to be ignored by Git. |
.pre-commit-config.yaml | Configuration for pre-commit hooks. |
poetry.toml | Configuration file for Poetry package management. |
tsbootstrap_logo.png | Project logo image. |
tsbootstrap
File | Summary |
---|---|
block_generator.py | Generates blocks for bootstrapping. |
markov_sampler.py | Implements sampling methods based on Markov models. |
time_series_model.py | Defines base and specific time series models. |
block_length_sampler.py | Samples block lengths for block bootstrapping methods. |
base_bootstrap.py | Contains the implementation for different types of base, abstract bootstrapping classes for time series data. |
base_bootstrap_configs.py | Provides configuration classes for different base, abstract bootstrapping classes. |
block_bootstrap.py | Contains the implementation for different types of block bootstrapping methods for time series data. |
block_bootstrap_configs.py | Provides configuration classes for different block bootstrapping methods. |
bootstrap.py | Contains the implementation for different types of bootstrapping methods for time series data, including residual, distribution, markov, statistic-preserving, and sieve. |
time_series_simulator.py | Simulates time series data based on various models. |
block_resampler.py | Implements methods for block resampling in time series. |
tsfit.py | Fits time series models to data. |
ranklags.py | Provides functionalities to rank lags in a time series. |
utils
File | Summary |
---|---|
types.py | Defines custom types used across the project. |
validate.py | Contains validation utilities. |
odds_and_ends.py | Contains miscellaneous utility functions. |
This is an abridged version; for the complete and evolving list of plans and improvements, see Issue #144.
- Performance and Scaling: handling large datasets, distributed backend integration (Dask, Spark, Ray), profiling/optimization
- Tuning and AutoML: adaptive block length, adaptive resampling, evaluation based parameter selection
- Real-time and Stream Data: stream bootstraps, data update interface
- Stage 2 sktime Integration: evaluation module, datasets, benchmarks, sktime forecasters in bootstraps
- API and Capability Extension: panel/hierarchical data, exogenous data, update/stream, model state management
- Scope Extension (TBD): time series augmentation, fully probabilistic models
Contributions are always welcome!
See our good first issues for getting started.
Below is a quick start guide to contributing.
- Fork the tsbootstrap repository
- Clone the fork to local:
git clone https://github.com/astrogilda/tsbootstrap
- In the local repository root, set up a Python environment, e.g., venv or conda.
- Editable install via pip, including developer dependencies:
pip install -e ".[dev]"
The editable install ensures that changes to the package are reflected in your environment.
After installation, you can verify that tsbootstrap has been installed correctly by checking its version or by trying to import it in Python:
python -c "import tsbootstrap; print(tsbootstrap.__version__)"
This command should output the version number of tsbootstrap without any errors, indicating that the installation was successful.
That's it! You are now set up and ready to go. You can start using tsbootstrap for your time series bootstrapping needs.
To submit a change, please follow these steps:
- Create a new branch with a descriptive name (e.g., new-feature-branch or bugfix-issue-123).
git checkout -b new-feature-branch
- Make changes to the project's codebase.
- Commit your changes to your local branch with a clear commit message that explains the changes you've made.
git commit -m 'Implemented new feature.'
- Push your changes to your forked repository on GitHub using the following command:
git push origin new-feature-branch
- Create a new pull request to the original project repository. In the pull request, describe the changes you've made and why they're necessary. The project maintainers will review your changes and provide feedback or merge them into the main branch.
To run all tests, in your developer environment, run:
pytest tests/
Individual bootstrap algorithms can be tested as follows:
from tsbootstrap.utils import check_estimator
check_estimator(my_bootstrap_algo)
For more detailed information on how to contribute, please refer to our CONTRIBUTING.md guide.
This project is licensed under the ℹ️ MIT License. See the LICENSE file for additional info.
Thanks goes to these wonderful people:
This project follows the all-contributors specification. Contributions of any kind welcome!
tsbootstrap is a comprehensive project designed to implement an array of bootstrapping techniques specifically tailored for time series data. This project is targeted towards data scientists, statisticians, economists, and other professionals or researchers who regularly work with time series data and require robust methods for generating bootstrapped copies of univariate and multivariate time series data.
Time series bootstrapping is a nuanced resampling method that is applied to time-dependent data. Traditional bootstrapping methods often assume independence between data points, which is an assumption that does not hold true for time series data where a data point is often dependent on previous data points. Time series bootstrapping techniques respect the chronological order and correlations of the data, providing more accurate estimates of uncertainty or variability.
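The point about independence can be illustrated with a small standard-library sketch. The simulated AR(1) series and the helper name `lag1_autocorr` are assumptions made for this illustration and are not part of tsbootstrap. Resampling single points with replacement keeps the marginal distribution but wipes out the serial correlation:

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

rng = random.Random(0)

# Simulate a strongly dependent AR(1) series: x[t] = 0.9 * x[t-1] + noise.
series = [0.0]
for _ in range(499):
    series.append(0.9 * series[-1] + rng.gauss(0.0, 1.0))

# Classical i.i.d. bootstrap: draw single points with replacement, ignoring order.
iid_sample = [rng.choice(series) for _ in series]

original_acf = lag1_autocorr(series)   # close to 0.9
iid_acf = lag1_autocorr(iid_sample)    # close to 0
print(f"original: {original_acf:.2f}, i.i.d. resample: {iid_acf:.2f}")
```

The methods described below are designed to avoid this loss of dependence.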
The tsbootstrap project offers a diverse set of bootstrapping techniques that can be applied to either the entire input time series (classes prefixed with Whole), or after partitioning the data into blocks (classes prefixed with Block). These methodologies can be applied directly to the raw input data or to the residuals obtained after fitting one of the five statistical models defined in time_series_model.py (classes with Residual in their names).
Block Bootstrap is a prevalent approach in time series bootstrapping. It involves resampling blocks of consecutive data points, thus respecting the internal structures of the data. There are several techniques under Block Bootstrap, each with its unique approach. tsbootstrap provides highly flexible block bootstrapping, allowing the user to specify the block length sampling, block generation, and block resampling strategies. For additional details, refer to block_length_sampler.py, block_generator.py, and block_resampler.py.
The Moving Block Bootstrap, Circular Block Bootstrap, Stationary Block Bootstrap, and NonOverlapping Block Bootstrap methods are all variations of the Block Bootstrap that use different methods to sample the data, maintaining various types of dependencies.
Bartlett's, Blackman's, Hamming's, Hanning's, and Tukey's Bootstrap methods are specific implementations of the Block Bootstrap that use different window shapes to taper the data, reducing the influence of data points far from the center. In tsbootstrap, these methods inherit from MovingBlockBootstrap, but can easily be modified to inherit from any of the other three base block bootstrapping classes.
Each method comes with its distinct strengths and weaknesses. The choice of method should be based on the characteristics of the data and the specific requirements of the analysis.
The Moving Block Bootstrap is implemented in MovingBlockBootstrap. It resamples overlapping blocks of consecutive data points so that the dependency structure within each block is maintained. It's useful when the data has dependencies that need to be preserved. It's not recommended when the data does not have any significant dependencies.
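As a rough illustration of the idea, here is a minimal pure-Python sketch of a moving block bootstrap. The function name `moving_block_bootstrap` is hypothetical and this is not tsbootstrap's implementation, only the textbook scheme of drawing overlapping blocks with replacement:

```python
import random

def moving_block_bootstrap(x, block_length, rng):
    """Concatenate randomly chosen overlapping blocks until len(x) values are drawn."""
    n = len(x)
    # Every window of `block_length` consecutive points is a candidate block.
    starts = range(n - block_length + 1)
    sample = []
    while len(sample) < n:
        s = rng.choice(starts)
        sample.extend(x[s:s + block_length])
    return sample[:n]  # trim the last block so the output matches the input length

X = list(range(10))
sample = moving_block_bootstrap(X, block_length=3, rng=random.Random(42))
print(sample)
```

Because the toy input is `0..9`, consecutive values inside each block of the output differ by one, which makes the preserved ordering easy to see.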
The Circular Block Bootstrap is implemented in CircularBlockBootstrap and treats the data as if it is circular (the end of the data is next to the beginning of the data). It's useful when the data is cyclical or seasonal in nature. It's not recommended when the data does not have a cyclical or seasonal component.
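The wrap-around behaviour can be sketched in a few lines of standard-library Python. The function name `circular_block_bootstrap` is hypothetical, and the sketch illustrates the general technique rather than tsbootstrap's code:

```python
import random

def circular_block_bootstrap(x, block_length, rng):
    """Draw blocks that may wrap from the end of the series back to the start."""
    n = len(x)
    sample = []
    while len(sample) < n:
        s = rng.randrange(n)  # any position can start a block
        # Indices are taken modulo n, so a block starting near the end wraps around.
        sample.extend(x[(s + k) % n] for k in range(block_length))
    return sample[:n]

X = list(range(10))
sample = circular_block_bootstrap(X, block_length=3, rng=random.Random(7))
print(sample)
```

Unlike the moving block scheme, every observation here has the same chance of appearing in a block, including those at the two ends of the series.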
The Stationary Block Bootstrap is implemented in StationaryBlockBootstrap and randomly resamples blocks of data with block lengths that follow a geometric distribution. It's useful for time series data where the degree of dependency needs to be preserved, and it doesn't require strict stationarity of the underlying process. It's not recommended when the data has strong seasonality or trend components which violate the weak dependence assumption.
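A compact way to obtain geometrically distributed block lengths is to decide at every step whether to start a new block. The sketch below assumes that formulation along with wrap-around indexing. The function name `stationary_bootstrap` and the parameter `mean_block_length` are illustrative and not taken from tsbootstrap:

```python
import random

def stationary_bootstrap(x, mean_block_length, rng):
    """Resample with geometric block lengths whose expected value is mean_block_length."""
    n = len(x)
    p = 1.0 / mean_block_length  # probability of starting a new block at each step
    idx = rng.randrange(n)
    sample = [x[idx]]
    while len(sample) < n:
        if rng.random() < p:
            idx = rng.randrange(n)   # jump: a new block begins at a random position
        else:
            idx = (idx + 1) % n      # continue the current block, wrapping at the end
        sample.append(x[idx])
    return sample

X = list(range(10))
sample = stationary_bootstrap(X, mean_block_length=3, rng=random.Random(1))
# With an infinite mean block length the jump probability is 0, so one block covers everything.
single_block = stationary_bootstrap(X, float("inf"), random.Random(1))
```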
The NonOverlapping Block Bootstrap is implemented in NonOverlappingBlockBootstrap and resamples blocks of data without overlap. It's useful when the data has dependencies that need to be preserved and when overfitting is a concern. It's not recommended when the data does not have any significant dependencies or when the introduction of bias due to non-overlapping selection is a concern.
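For comparison with the overlapping schemes, a minimal sketch of the non-overlapping variant follows. The function name `non_overlapping_block_bootstrap` is hypothetical. The series is first cut into disjoint blocks, and only those blocks are resampled:

```python
import random

def non_overlapping_block_bootstrap(x, block_length, rng):
    """Partition x into disjoint blocks, then draw whole blocks with replacement."""
    n = len(x)
    # Disjoint blocks; the final block is shorter if n is not a multiple of block_length.
    blocks = [x[i:i + block_length] for i in range(0, n, block_length)]
    sample = []
    while len(sample) < n:
        sample.extend(rng.choice(blocks))
    return sample[:n]

X = list(range(12))
sample = non_overlapping_block_bootstrap(X, block_length=3, rng=random.Random(3))
print(sample)
```

With 12 points and a block length of 3 there are only four candidate blocks, compared with ten in the moving block scheme. That smaller pool is the source of the selection bias mentioned above.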
Bartlett's method is a time series bootstrap method that uses a window or filter that tapers off as you move away from the center of the window. It's useful when you have a large amount of data and you want to reduce the influence of the data points far away from the center. This method is not advised when the tapering of data points is not desired or when the dataset is small as the tapered data points might contain valuable information. It is implemented in BartlettsBootstrap.
Similar to Bartlett's method, Blackman's method uses a window that tapers off as you move away from the center of the window. The key difference is the shape of the window (Blackman window has a different shape than Bartlett). It's useful when you want to reduce the influence of the data points far from the center with a different window shape. It's not recommended when the dataset is small or tapering of data points is not desired. It is implemented in BlackmanBootstrap.
Similar to the Bartlett and Blackman methods, the Hamming method uses a specific type of window function. It's useful when you want to reduce the influence of the data points far from the center with the Hamming window shape. It's not recommended for small datasets or when tapering of data points is not desired. It is implemented in HammingBootstrap.
The Hanning method also uses a specific type of window function. It's useful when you want to reduce the influence of the data points far from the center with the Hanning window shape. It's not recommended for small datasets or when tapering of data points is not desired. It is implemented in HanningBootstrap.
Similar to the Bartlett, Blackman, Hamming, and Hanning methods, the Tukey method uses a specific type of window function. It's useful when you want to reduce the influence of the data points far from the center with the Tukey window shape. It's not recommended for small datasets or when tapering of data points is not desired. It is implemented in TukeyBootstrap.
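To make the differences between the five window shapes concrete, the sketch below evaluates the standard textbook definitions of each window for a block of `m` points (with `m >= 2`). The function names and the Tukey `alpha` default are assumptions for illustration. How tsbootstrap applies these weights inside its classes is not shown here.

```python
import math

def bartlett(m):
    """Triangular window: rises linearly to 1 at the center and falls back to 0."""
    return [1.0 - abs(2.0 * k / (m - 1) - 1.0) for k in range(m)]

def hanning(m):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (m - 1)) for k in range(m)]

def hamming(m):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (m - 1)) for k in range(m)]

def blackman(m):
    return [
        0.42
        - 0.5 * math.cos(2.0 * math.pi * k / (m - 1))
        + 0.08 * math.cos(4.0 * math.pi * k / (m - 1))
        for k in range(m)
    ]

def tukey(m, alpha=0.5):
    """Flat top with cosine-tapered edges; alpha is the tapered fraction of the window."""
    edge = alpha * (m - 1) / 2.0
    weights = []
    for k in range(m):
        d = min(k, m - 1 - k)  # distance to the nearest end of the window
        if alpha > 0 and d < edge:
            weights.append(0.5 * (1.0 - math.cos(math.pi * d / edge)))
        else:
            weights.append(1.0)
    return weights

for name, window in [("bartlett", bartlett), ("blackman", blackman),
                     ("hamming", hamming), ("hanning", hanning), ("tukey", tukey)]:
    print(name, [round(w, 3) for w in window(5)])
```

All five windows give the center of a block the largest weight. They differ in how quickly the weight falls off and in whether the end points reach exactly zero (the Hamming window does not).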
Residual Bootstrap is a method designed for time series data where a model is fit to the data, and the residuals (the difference between the observed and predicted data) are bootstrapped. It's particularly useful when a good model fit is available for the data. However, it's not recommended when a model fit is not available or is poor. tsbootstrap provides four time series models to fit to the input data -- AutoReg, ARIMA, SARIMA, and VAR (for multivariate input time series data). For more details, refer to time_series_model.py and tsfit.py.
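The fit, resample, and rebuild cycle can be sketched with a deliberately simple model. The example below assumes an AR(1) model without intercept, fitted by least squares. The helper names `fit_ar1` and `residual_bootstrap_ar1` are hypothetical, and the sketch does not use tsbootstrap's model classes:

```python
import random

def fit_ar1(x):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + e[t] (no intercept)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def residual_bootstrap_ar1(x, rng):
    """Fit AR(1), resample its centered residuals, and rebuild a series recursively."""
    phi = fit_ar1(x)
    residuals = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
    mean_res = sum(residuals) / len(residuals)
    centered = [r - mean_res for r in residuals]
    sample = [x[0]]  # start from the observed initial value
    for _ in range(1, len(x)):
        sample.append(phi * sample[-1] + rng.choice(centered))
    return sample, phi

rng = random.Random(0)
series = [0.0]
for _ in range(399):
    series.append(0.6 * series[-1] + rng.gauss(0.0, 1.0))

sample, phi_hat = residual_bootstrap_ar1(series, rng)
print(f"estimated phi: {phi_hat:.2f}")
```

If the fitted model is poor, its residuals still carry dependence, and resampling them independently discards it. This is why the method relies on a good fit.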
Statistic-Preserving Bootstrap is a unique method designed to generate bootstrapped time series data while preserving a specific statistic of the original data. This method can be beneficial in scenarios where it's important to maintain the original data's characteristics in the bootstrapped samples. It is implemented in StatisticPreservingBootstrap.
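One simple way to picture the idea, taking the sample mean as the preserved statistic, is to resample and then shift the result so that its mean matches the original. This is only an assumed illustration of the concept. The function name `mean_preserving_bootstrap` is hypothetical and the approach may differ from what StatisticPreservingBootstrap does internally:

```python
import random

def mean_preserving_bootstrap(x, block_length, rng):
    """Block-resample x, then shift the sample so its mean equals the mean of x."""
    n = len(x)
    sample = []
    while len(sample) < n:
        s = rng.randrange(n - block_length + 1)
        sample.extend(x[s:s + block_length])
    sample = sample[:n]
    # Additive correction that restores the original sample mean exactly.
    shift = sum(x) / n - sum(sample) / n
    return [v + shift for v in sample]

rng = random.Random(5)
series = [rng.gauss(10.0, 2.0) for _ in range(50)]
sample = mean_preserving_bootstrap(series, block_length=5, rng=rng)
```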
Distribution Bootstrap generates bootstrapped samples by fitting a distribution to the residuals and then generating new residuals from the fitted distribution. The new residuals are then added to the fitted values to create the bootstrapped samples. This method is based on the assumption that the residuals follow a specific distribution (like Gaussian, Poisson, etc). It's not recommended when the distribution of residuals is unknown or hard to determine. It is implemented in DistributionBootstrap.
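The two steps described above, fitting a distribution to the residuals and adding fresh draws to the fitted values, can be sketched as follows. The example assumes Gaussian residuals and uses a made-up linear trend as the fitted values. The function name `distribution_bootstrap` is hypothetical and not tsbootstrap's API:

```python
import random
import statistics

def distribution_bootstrap(fitted, residuals, rng):
    """Fit a Gaussian to the residuals and add new draws from it to the fitted values."""
    mu = statistics.fmean(residuals)
    sigma = statistics.stdev(residuals)
    return [f + rng.gauss(mu, sigma) for f in fitted]

rng = random.Random(0)
# Toy setup: pretend a model produced these fitted values for the observed data.
fitted = [0.5 * t for t in range(200)]
observed = [f + rng.gauss(0.0, 1.0) for f in fitted]
residuals = [o - f for o, f in zip(observed, fitted)]

sample = distribution_bootstrap(fitted, residuals, rng)
new_residual_mean = statistics.fmean(s - f for s, f in zip(sample, fitted))
```

Unlike the residual bootstrap, the new residuals are not restricted to values that were actually observed, so the result depends directly on how well the chosen distribution fits.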
Markov Bootstrap is used for bootstrapping time series data where the residuals of the data are presumed to follow a Markov process. This method is especially useful in scenarios where the current residual primarily depends on the previous one, with little to no dependency on residuals from further in the past. The Markov Bootstrap technique is designed to preserve this dependency structure in the bootstrapped samples, making it particularly valuable for time series data that exhibits Markov properties. However, it's not advisable when the residuals of the time series data exhibit long-range dependencies, as the Markov assumption of limited dependency may not hold true. It is implemented in MarkovBootstrap. See markov_sampler.py for implementation details.
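A first-order Markov resampling scheme can be sketched by binning residuals into a few states and drawing each new residual from those that historically followed the current state. The equal-width binning and the function name `markov_bootstrap` are assumptions for illustration, and the approach in markov_sampler.py may differ substantially:

```python
import random

def markov_bootstrap(residuals, n_states, rng):
    """Resample residuals so each draw depends only on the state of the previous draw."""
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / n_states or 1.0  # fall back to 1.0 if all residuals are equal

    def state(r):
        # Equal-width bins over [lo, hi]; the maximum value goes into the last bin.
        return min(int((r - lo) / width), n_states - 1)

    # successors[s] collects every residual that followed a residual in state s.
    successors = {s: [] for s in range(n_states)}
    for prev, nxt in zip(residuals, residuals[1:]):
        successors[state(prev)].append(nxt)

    current = rng.choice(residuals)
    sample = [current]
    while len(sample) < len(residuals):
        # If a state was never followed by anything, fall back to all residuals.
        candidates = successors[state(current)] or residuals
        current = rng.choice(candidates)
        sample.append(current)
    return sample

rng = random.Random(2)
residuals = [rng.gauss(0.0, 1.0) for _ in range(100)]
sample = markov_bootstrap(residuals, n_states=3, rng=rng)
```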
Sieve Bootstrap is designed for handling dependent data, where the residuals of the time series data follow an autoregressive process. This method aims to preserve and simulate the dependencies inherent in the original data within the bootstrapped samples. It operates by approximating the autoregressive process of the residuals using a finite order autoregressive model. The order of the model is determined based on the data, and the residuals are then bootstrapped. The Sieve Bootstrap technique is particularly valuable for time series data that exhibits autoregressive properties. However, it's not advisable when the residuals of the time series data do not follow an autoregressive process. It is implemented in SieveBootstrap. See time_series_simulator.py for implementation details.
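The steps of a sieve bootstrap (choose an AR order from the data, fit the model, resample its innovations, and regenerate a series) can be sketched with the standard library. The sketch assumes Yule-Walker estimation via the Levinson-Durbin recursion and AIC for order selection. These choices and all function names are illustrative and not taken from tsbootstrap. The input can be any series, for example the residuals of a first-stage fit.

```python
import math
import random

def autocov(x, max_lag):
    """Biased sample autocovariances for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(gamma, order):
    """Yule-Walker AR coefficients and innovation variance for the given order."""
    phi, var = [], gamma[0]
    for k in range(1, order + 1):
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        refl = acc / var
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        var *= 1.0 - refl ** 2
    return phi, var

def sieve_bootstrap(x, max_order, rng):
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    gamma = autocov(x, max_order)
    # Pick the AR order with the smallest AIC among 1..max_order.
    order = min(range(1, max_order + 1),
                key=lambda p: n * math.log(levinson_durbin(gamma, p)[1]) + 2 * p)
    phi, _ = levinson_durbin(gamma, order)
    resid = [z[t] - sum(phi[j] * z[t - 1 - j] for j in range(order))
             for t in range(order, n)]
    resid_mean = sum(resid) / len(resid)
    resid = [r - resid_mean for r in resid]
    sim = z[:order]  # initialize with the first observed (centered) values
    for _ in range(order, n):
        sim.append(sum(phi[j] * sim[-1 - j] for j in range(order)) + rng.choice(resid))
    return [v + mean for v in sim], order

rng = random.Random(0)
series = [0.0]
for _ in range(299):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

sample, order = sieve_bootstrap(series, max_order=5, rng=rng)
print("selected AR order:", order)
```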