
KKobuszewski/aluminium-prediction


aluminium-prediction

Overview

This project was partly generated by Kedro with the Kedro-Viz setup, using kedro 0.19.4.

Take a look at the hints in readme/Kedro.md



Project plan

See project diagram here.



Problem statement

We want to predict future values of a function $y(t)$. We assume that it can depend on the past values of some other functions $x_1, x_2, \ldots, x_m$ as well as on the past values of $y$ itself.

In discretized form (with constant timestep $\Delta t$) we can write it as a mapping:

$$ \lim\limits_{k\to\infty} \left[\begin{array}{c} x_{1}(t) \\ x_{1}(t-\Delta t)\\ x_{1}(t-2\Delta t)\\ \vdots\\ x_{1}(t-k\Delta t)\\ x_{2}(t)\\ x_{2}(t-\Delta t)\\ x_{2}(t-2\Delta t)\\ \vdots\\ x_{2}(t-k\Delta t)\\ \vdots\\ \vdots\\ x_{m}(t)\\ x_{m}(t-\Delta t)\\ x_{m}(t-2\Delta t)\\ \vdots\\ x_{m}(t-k\Delta t)\\ y(t)\\ y(t-\Delta t)\\ y(t-2\Delta t)\\ \vdots\\ y(t-k\Delta t) \end{array}\right] \qquad\mapsto\qquad \left[\begin{array}{c} y(t+\Delta t) \\ y(t+2\Delta t) \\ \vdots \\ y(t+l\Delta t) \\ \end{array}\right] \qquad\qquad k,l \in \mathbb{Z}_+ $$

This unknown mapping will be approximated using supervised machine learning methods, e.g. neural networks.

See more here.
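
The mapping above can be turned into a supervised dataset by sliding a window over the series. The following is only a minimal sketch, assuming a finite lag count `k` (a truncation of the limit in the formula) and data already sampled on a regular grid. The function name `make_windows` and its arguments are hypothetical and not part of this repository.

```python
import numpy as np


def make_windows(features, target, k, l):
    """Build (input, output) pairs for the lagged mapping.

    features: array of shape (T, m) holding x_1 ... x_m
    target:   array of shape (T,) holding y
    k:        number of past steps to include besides the current one
    l:        number of future steps of y to predict
    Returns X of shape (N, (m + 1) * (k + 1)) and Y of shape (N, l),
    where N = T - k - l.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if features.ndim == 1:
        features = features[:, None]
    # Stack y as the last column so that its lags come last in the input vector.
    series = np.column_stack([features, target])
    n_steps = series.shape[0]
    inputs, outputs = [], []
    for t in range(k, n_steps - l):
        # Rows t, t-1, ..., t-k (newest first), one column per variable.
        window = series[t - k:t + 1][::-1]
        # Transpose so that all lags of x_1 come first, then x_2, ..., then y.
        inputs.append(window.T.ravel())
        # Targets y(t + 1), ..., y(t + l).
        outputs.append(target[t + 1:t + l + 1])
    return np.array(inputs), np.array(outputs)
```

Each row of `X` follows the ordering of the column vector in the formula (all lags of $x_1$, then $x_2$, and so on, with the lags of $y$ last), and the matching row of `Y` holds the next `l` values of $y$.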



Pipelines

Data acquisition pipeline

This pipeline is located at /src/aluminium_prediction/pipelines/aquisition.

The acquisition pipeline collects data by scraping websites or using certain APIs. The resulting dataset contains information about:

To target the most "interesting" variables, the dependencies (information transfer / causal relationships) between different timeseries should be determined, e.g. using cross-correlation or transfer entropy.
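
As an illustration of the cross-correlation option, the sketch below estimates at which non-negative lag one series is most strongly correlated with another. It assumes two equally long, regularly sampled series. The function names `lagged_cross_correlation` and `best_lag` are hypothetical and not taken from this repository.

```python
import numpy as np


def lagged_cross_correlation(x, y, max_lag):
    """Correlation between x(t) and y(t + lag) for lag = 0 ... max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardise both series so the products are correlation-like values.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    corr = {}
    for lag in range(max_lag + 1):
        # Pair x(t) with y(t + lag): x leads y by `lag` steps.
        corr[lag] = float(np.mean(x[:n - lag] * y[lag:]))
    return corr


def best_lag(x, y, max_lag):
    """Lag at which the absolute cross-correlation is largest."""
    corr = lagged_cross_correlation(x, y, max_lag)
    return max(corr, key=lambda lag: abs(corr[lag]))
```

A strong peak at a positive lag suggests that `x` carries information about later values of `y`, although correlation alone does not establish a causal relationship.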


This part should also produce a report about the current data.


Data preparation & augmentation pipeline

Model training pipeline

Inference pipeline

Model update pipeline

API

The API should return reports as HTTP web pages.

Reports should contain the current most important price indicators and price predictions.

Potential issues

  1. Too small dataset

    • In this article LSTMs with ~200,000 parameters and transformers with ~20,000 parameters are trained, while we can use data from only 2000-4000 days.
    • Probably more than one variable should be considered (not only aluminium prices). <---- Dimensionality reduction? Autoencoder? PCA? Granger Causality Test? Transfer Entropy?
    • Transformer NNs require fewer parameters than LSTMs. Some convolution layers can also improve performance (see here).
    • As we can have data from different sources (which differ slightly), we can produce multiple different time series.
  2. We don't expect data to be stationary in time?
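
One of the dimensionality-reduction options raised in the list is PCA. The sketch below shows how it could be applied to a matrix of candidate input variables, using only NumPy's SVD. It is an illustration under the assumption that the variables are already aligned in time and comparably scaled. The function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project features of shape (T, m) onto their leading principal components.

    Returns the projected data of shape (T, n_components) and the fraction
    of the total variance explained by each kept component.
    """
    data = np.asarray(features, dtype=float)
    # PCA is defined on mean-centred data.
    centred = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    projected = centred @ vt[:n_components].T
    return projected, explained[:n_components]
```

If a few components explain most of the variance, the model input can be shrunk accordingly, which may help with the small-dataset issue.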

Literature

Transfer of "information" between timeseries

We can address two issues concerning the information/entropy/complexity of timeseries:

  1. How much information is in given timeseries?
  2. Can adding a new timeseries to the dataset improve inference performance, i.e. is there information transfer between the two series?

Measures for the first issue include Approximate Entropy (AppEn), Sample Entropy (SampEn) and Rényi entropy. The second issue can be addressed with measures of information transfer, e.g. Transfer Entropy (see footnote 1).

A Python implementation of these entropy algorithms is provided by EntropyHub.
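
To make the definition of Sample Entropy concrete, here is a small pure-NumPy sketch that does not depend on EntropyHub. It follows the usual definition $\mathrm{SampEn} = -\ln(A/B)$, where $B$ and $A$ count pairs of templates of length $m$ and $m+1$ whose Chebyshev distance is at most $r$. The default tolerance of 0.2 standard deviations is a common convention assumed here, and the function name `sample_entropy` is hypothetical. Memory use grows quadratically with the series length, so this is only suitable for short series.

```python
import numpy as np


def sample_entropy(x, m=2, r=None):
    """Sample Entropy of a 1-D series (brute-force, O(n^2) memory)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if r is None:
        r = 0.2 * x.std()  # assumed default tolerance

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Remove self-matches on the diagonal and count each pair once.
        return (np.sum(dist <= r) - len(templates)) / 2

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return np.inf  # no matches found: entropy is undefined / very large
    return float(-np.log(a / b))
```

Lower values indicate a more regular, more predictable series. A periodic signal should score well below white noise of the same length.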

Usage of AppEn in financial timeseries was described in S. Pincus, R. E. Kalman, Irregularity, volatility, risk, and financial market time series.

Transfer Entropy is the most promising approach to measuring the transfer of information between timeseries. It was introduced in T. Schreiber, Phys. Rev. Lett. 85 (2000) 461, and described in a wider context by P. Jizba, H. Kleinert, M. Shefaat, Rényi's information transfer between financial time series (Physica A: Statistical Mechanics and Its Applications. 391 (10): 2971–2989). A Python implementation is provided by the copent package, but only as an approximation in terms of copula entropy (see). The GitHub repository for copent is here.
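
For intuition, Transfer Entropy from $X$ to $Y$ with a history length of one step can be written as

$$ T_{X\to Y} = \sum p(y_{t+1}, y_t, x_t)\, \ln\frac{p(y_{t+1}\mid y_t, x_t)}{p(y_{t+1}\mid y_t)} $$

The sketch below estimates this quantity with a simple histogram (plug-in) estimator. This is not the copula-based estimator used by copent. The quantile binning, the history length of one and the function name `transfer_entropy` are assumptions made for illustration only.

```python
import numpy as np


def transfer_entropy(source, target, bins=4):
    """Plug-in estimate (in nats) of transfer entropy source -> target."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)

    def discretise(values):
        # Quantile bin edges give roughly equally populated symbols 0 ... bins-1.
        edges = np.quantile(values, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(values, edges)

    s = discretise(source)
    t = discretise(target)
    y_next, y_now, x_now = t[1:], t[:-1], s[:-1]

    # Joint histogram p(y_next, y_now, x_now).
    joint = np.zeros((bins, bins, bins))
    np.add.at(joint, (y_next, y_now, x_now), 1.0)
    joint /= joint.sum()

    p_next_now = joint.sum(axis=2)    # p(y_next, y_now)
    p_now_x = joint.sum(axis=0)       # p(y_now, x_now)
    p_now = joint.sum(axis=(0, 2))    # p(y_now)

    te = 0.0
    for i, j, k in zip(*np.nonzero(joint)):
        ratio = joint[i, j, k] * p_now[j] / (p_next_now[i, j] * p_now_x[j, k])
        te += joint[i, j, k] * np.log(ratio)
    return float(te)
```

Comparing `transfer_entropy(x, y)` with `transfer_entropy(y, x)` gives a rough indication of the dominant direction of information flow. Histogram estimators are biased for short series, so small positive values should not be over-interpreted.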

Neural networks

Example applications of neural networks in financial time series forecasting are available, e.g.:

Forecasting models

gluonts, https://ts.gluon.ai/stable/tutorials/advanced_topics/howto_pytorch_lightning.html

https://github.com/thuml/Time-Series-Library/tree/main

uni2ts/moirai, https://huggingface.co/Salesforce/moirai-1.0-R-large

Time Series Transformer (TST), https://huggingface.co/blog/time-series-transformers

$$ \begin{array}{ccccc} & \text{TST} & \text{moirai} & \text{TSlib} & \text{description}\\ \text{static real features} & \text{yes} & \text{no} & \text{no} & \text{data with a lower sampling rate (e.g. GDP)}\\ \text{(past) dynamic real features} & \text{no} & \text{prediction only} & \text{no} & \text{features we don't want to predict} \end{array} $$

Information about deployment

https://gitlab.com/inzynier-ai/

Footnotes

  1. Cross-entropy is somewhat different, because it is defined on two probability distributions rather than on timeseries, but EntropyHub provides some functions for estimating entropy between two univariate data sequences.
