This project was partly generated by Kedro (with the Kedro-Viz setup), using kedro 0.19.4.
Take a look at the hints in readme/Kedro.md
See project diagram here.
We want to predict future values of a function $p(t)$ (here, the aluminium price). In discretized form (with constant timestep $\Delta t$, so $p_n = p(n\,\Delta t)$), the task is to find a mapping $F$ such that

$$p_{n+1} = F(p_n, p_{n-1}, \dots, p_{n-k}).$$

This unknown mapping will be approximated using supervised machine learning methods, e.g. neural networks.
See more here.
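A minimal sketch of how this discretized mapping turns into a supervised-learning dataset; the window length `k` and the synthetic `prices` array are illustrative assumptions, not project code:

```python
import numpy as np

def make_supervised(series: np.ndarray, k: int):
    """Build (X, y) pairs: X holds k past values, y the next value."""
    X = np.stack([series[i : i + k] for i in range(len(series) - k)])
    y = series[k:]
    return X, y

# Placeholder for real daily prices sampled with a constant timestep.
prices = np.sin(np.linspace(0, 20, 1000))
X, y = make_supervised(prices, k=30)  # X.shape == (970, 30), y.shape == (970,)
```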
This pipeline is located at `/src/aluminium_prediction/pipelines/aquisition`.
The acquisition pipeline collects data by scraping websites or querying certain APIs; the resulting dataset contains information about:
- aluminium prices,
- aluminium production,
- indicators of consumption,
- indicators of market uncertainty,
- indicators of trade,
- the USD/EUR exchange rate (?).
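A hedged sketch of how such sources could be wired together as Kedro nodes; the node functions and dataset names below are hypothetical placeholders, not the actual contents of the aquisition pipeline:

```python
from kedro.pipeline import Pipeline, node, pipeline

# Hypothetical node functions -- the real ones live under
# /src/aluminium_prediction/pipelines/aquisition.
def scrape_prices():
    """Scrape aluminium prices from a website / API."""
    ...

def fetch_production():
    """Download aluminium production figures."""
    ...

def merge_sources(prices, production):
    """Join the individual sources into one dataset."""
    ...

def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(scrape_prices, inputs=None, outputs="raw_prices"),
        node(fetch_production, inputs=None, outputs="raw_production"),
        node(merge_sources, ["raw_prices", "raw_production"], "acquired_dataset"),
    ])
```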
In order to target the most "interesting" variables, the dependencies (information transfer / causal relationships) between the different time series should be determined -- e.g. using cross-correlation or transfer entropy (a minimal cross-correlation sketch follows below).
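As an illustration of the first option, a lagged cross-correlation screen between two candidate series can be done with plain NumPy; the series and the lag range are placeholders:

```python
import numpy as np

def cross_correlation(x: np.ndarray, y: np.ndarray, max_lag: int):
    """Normalized cross-correlation of two equally long series at lags 0..max_lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    # At lag L we correlate x[t] with y[t + L], so a large value at L > 0
    # hints that x leads y by L timesteps.
    return np.array([np.dot(x[: n - lag], y[lag:]) / (n - lag)
                     for lag in range(max_lag + 1)])
```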
This part should also produce a report about the current data. Reports should be served as HTTP web pages and should contain the current most important price indicators as well as price predictions.
- Too small dataset
  - In this article LSTMs with ~200,000 parameters and transformers with ~20,000 parameters are trained, while we can use data from 2000-4000 days.
  - Probably more than one variable should be considered (not only aluminium prices). Dimensionality reduction? Autoencoder? PCA? Granger causality test? Transfer entropy? (See the PCA sketch after this list.)
  - Transformer NNs demand fewer parameters than LSTMs. Some convolution layers can also improve performance (see here).
  - As we can have data from different sources (which slightly differ), we can produce multiple different time series.
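A minimal sketch of the dimensionality-reduction idea raised above, using scikit-learn's PCA; the input matrix is an assumed placeholder, and an autoencoder would play the same role with a nonlinear mapping:

```python
import numpy as np
from sklearn.decomposition import PCA

# Assume rows = days, columns = variables (prices, production, indicators, ...).
data = np.random.default_rng(0).normal(size=(3000, 12))  # placeholder dataset

pca = PCA(n_components=0.95)        # keep components explaining 95% of variance
reduced = pca.fit_transform(data)   # shape: (3000, n_kept_components)
print(pca.n_components_, pca.explained_variance_ratio_)
```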
- We don't expect the data to be stationary in time(?)
  - Because of this, a model trained on earlier data may not be valid for testing on later data / predicting new data.
  - How to choose the train/validation/test splits from the dataset? Randomly? K-fold? See https://stats.stackexchange.com/questions/14099/using-k-fold-cross-validation-for-time-series-model-selection and the sketch below.
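One common answer to the split question, sketched with scikit-learn's `TimeSeriesSplit` (an assumption here, not the project's settled choice): each fold trains on an initial segment and validates on the segment that follows it, so no future information leaks into training.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(3000).reshape(-1, 1)  # placeholder feature matrix, one row per day

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    # Training indices always precede test indices.
    print(f"train: 0..{train_idx[-1]}, test: {test_idx[0]}..{test_idx[-1]}")
```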
We can address two issues concerning the information/entropy/complexity of time series:
- How much information is in a given time series?
- Can adding a new time series to the dataset improve inference performance, i.e. is there information transfer between the two series?
Measures for the first issue can be Approximate Entropy (ApEn), Sample Entropy (SampEn) or Rényi entropy. The second issue can be addressed using measures of information transfer, e.g. Transfer Entropy.[^1]
A Python implementation of these entropy algorithms is provided by EntropyHub.
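A minimal usage sketch with EntropyHub; the signal is a placeholder, the embedding dimension `m=2` is a common default, and the exact signatures should be checked against EntropyHub's documentation:

```python
import numpy as np
import EntropyHub as EH

signal = np.random.default_rng(1).normal(size=2000)  # placeholder time series

# Sample entropy for embedding dimensions 0..m; the last entry is SampEn(m).
Samp, A, B = EH.SampEn(signal, m=2)

# Approximate entropy, same embedding dimension.
Ap, Phi = EH.ApEn(signal, m=2)

print(Samp[-1], Ap[-1])
```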
The usage of ApEn in financial time series was described in S. Pincus, R. E. Kalman, "Irregularity, volatility, risk, and financial market time series" (PNAS, 2004).
Transfer Entropy is the most promising approach to measuring the transfer of information between time series. It was introduced in T. Schreiber, Phys. Rev. Lett. 85 (2000) 461, and described in a wider context by P. Jizba, H. Kleinert, M. Shefaat, "Rényi's information transfer between financial time series", Physica A: Statistical Mechanics and Its Applications 391 (10): 2971–2989. A Python implementation is provided by the copent package, but only as an approximation in terms of copula entropy (see). The GitHub repository for copent is here.
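For reference, Schreiber's transfer entropy from a series $Y$ to a series $X$ (in its simplest first-order form) is

$$T_{Y \to X} = \sum_{n} p(x_{n+1}, x_n, y_n)\,\log \frac{p(x_{n+1} \mid x_n, y_n)}{p(x_{n+1} \mid x_n)},$$

so $T_{Y \to X} > 0$ indicates that knowing $Y$'s past improves one-step prediction of $X$ beyond $X$'s own past. A hedged usage sketch with copent follows; the argument order and direction convention are taken from the copent README and should be verified before relying on the sign of the comparison:

```python
import numpy as np
from copent import transent

rng = np.random.default_rng(2)
y = rng.normal(size=2000)
x = np.roll(y, 1) + 0.1 * rng.normal(size=2000)  # x lags y by one step

# Per the copent README, transent(x, y, lag) estimates transfer entropy
# from y to x with the given lag (copula-entropy approximation).
te_y_to_x = transent(x, y, lag=1)
te_x_to_y = transent(y, x, lag=1)
print(te_y_to_x, te_x_to_y)  # expect te_y_to_x > te_x_to_y
```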
Example applications of neural networks in financial time series forecasting are available, e.g.:
- gluonts, https://ts.gluon.ai/stable/tutorials/advanced_topics/howto_pytorch_lightning.html
- https://github.com/thuml/Time-Series-Library/tree/main
- uni2ts/moirai, https://huggingface.co/Salesforce/moirai-1.0-R-large
- Time Series Transformer (TST), https://huggingface.co/blog/time-series-transformers
- https://gitlab.com/inzynier-ai/
Footnotes
[^1]: Cross-entropy is somewhat different, because it is defined on two probability distributions rather than on time series, but EntropyHub provides some functions for estimating entropy between two univariate data sequences.