This project was partly generated by Kedro (with the Kedro-Viz setup), using kedro 0.19.4.
Take a look at the hints in readme/Kedro.md
See project diagram here.
We want to predict future values of a function $p(t)$ (here, the aluminium price). In discretized form (with constant timestep $\Delta t$, so $p_n = p(n\,\Delta t)$), the task is to find a mapping $F$ such that

$$p_{n+1} = F(p_n, p_{n-1}, \dots, p_{n-k}).$$

This unknown mapping will be approximated using supervised machine learning methods, e.g. neural networks.
See more here.
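A minimal sketch of how this discretized mapping turns into a supervised-learning dataset; the window length `k` and the synthetic `prices` array are illustrative assumptions, not project code:

```python
import numpy as np

def make_supervised(series: np.ndarray, k: int):
    """Build (X, y) pairs: X holds k past values, y the next value."""
    X = np.stack([series[i : i + k] for i in range(len(series) - k)])
    y = series[k:]
    return X, y

# Placeholder for real daily prices sampled with a constant timestep.
prices = np.sin(np.linspace(0, 20, 1000))
X, y = make_supervised(prices, k=30)  # X.shape == (970, 30), y.shape == (970,)
```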
This pipeline is located at `/src/aluminium_prediction/pipelines/aquisition`.
The acquisition pipeline collects data by scraping websites or querying certain APIs; the resulting dataset contains information about:
- aluminium prices,
- aluminium production,
- indicators of consumption,
- indicators of market uncertainty,
- indicators of trade,
- the USD/EUR exchange rate (?).
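A hedged sketch of how such sources could be wired together as Kedro nodes; the node functions and dataset names below are hypothetical placeholders, not the actual contents of the aquisition pipeline:

```python
from kedro.pipeline import Pipeline, node, pipeline

# Hypothetical node functions -- the real ones live under
# /src/aluminium_prediction/pipelines/aquisition.
def scrape_prices():
    """Scrape aluminium prices from a website / API."""
    ...

def fetch_production():
    """Download aluminium production figures."""
    ...

def merge_sources(prices, production):
    """Join the individual sources into one dataset."""
    ...

def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(scrape_prices, inputs=None, outputs="raw_prices"),
        node(fetch_production, inputs=None, outputs="raw_production"),
        node(merge_sources, ["raw_prices", "raw_production"], "acquired_dataset"),
    ])
```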
In order to target the most "interesting" variables, the dependencies (information transfer / causal relationships) between the different time series should be determined -- e.g. using cross-correlation or transfer entropy (a minimal cross-correlation sketch follows below).
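As an illustration of the first option, a lagged cross-correlation screen between two candidate series can be done with plain NumPy; the series and the lag range are placeholders:

```python
import numpy as np

def cross_correlation(x: np.ndarray, y: np.ndarray, max_lag: int):
    """Normalized cross-correlation of two equally long series at lags 0..max_lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    # At lag L we correlate x[t] with y[t + L], so a large value at L > 0
    # hints that x leads y by L timesteps.
    return np.array([np.dot(x[: n - lag], y[lag:]) / (n - lag)
                     for lag in range(max_lag + 1)])
```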
This part should also produce a report about the current data. Reports should be served as HTTP web pages and should contain the current most important price indicators as well as price predictions.
- Too small dataset
  - In this article LSTMs with ~200,000 parameters and transformers with ~20,000 parameters are trained, while we can use data from 2000-4000 days.
  - Probably more than one variable should be considered (not only aluminium prices). Dimensionality reduction? Autoencoder? PCA? Granger causality test? Transfer entropy? (See the PCA sketch after this list.)
  - Transformer NNs demand fewer parameters than LSTMs. Some convolution layers can also improve performance (see here).
  - As we can have data from different sources (which slightly differ), we can produce multiple different time series.
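A minimal sketch of the dimensionality-reduction idea raised above, using scikit-learn's PCA; the input matrix is an assumed placeholder, and an autoencoder would play the same role with a nonlinear mapping:

```python
import numpy as np
from sklearn.decomposition import PCA

# Assume rows = days, columns = variables (prices, production, indicators, ...).
data = np.random.default_rng(0).normal(size=(3000, 12))  # placeholder dataset

pca = PCA(n_components=0.95)        # keep components explaining 95% of variance
reduced = pca.fit_transform(data)   # shape: (3000, n_kept_components)
print(pca.n_components_, pca.explained_variance_ratio_)
```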
- We don't expect the data to be stationary in time(?)
  - Because of this, a model trained on earlier data may not be valid for testing on later data / predicting new data.
  - How to choose the train/validation/test splits from the dataset? Randomly? K-fold? See https://stats.stackexchange.com/questions/14099/using-k-fold-cross-validation-for-time-series-model-selection and the sketch below.
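One common answer to the split question, sketched with scikit-learn's `TimeSeriesSplit` (an assumption here, not the project's settled choice): each fold trains on an initial segment and validates on the segment that follows it, so no future information leaks into training.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(3000).reshape(-1, 1)  # placeholder feature matrix, one row per day

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    # Training indices always precede test indices.
    print(f"train: 0..{train_idx[-1]}, test: {test_idx[0]}..{test_idx[-1]}")
```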
We can address two issues concerning the information/entropy/complexity of time series:
- How much information is in a given time series?
- Can adding a new time series to the dataset improve inference performance, i.e. is there information transfer between the two series?
Measures for the first issue can be Approximate Entropy (ApEn), Sample Entropy (SampEn) or Rényi entropy. The second issue can be addressed using measures of information transfer, e.g. Transfer Entropy.[^1]
A Python implementation of these entropy algorithms is provided by EntropyHub.
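A minimal usage sketch with EntropyHub; the signal is a placeholder, the embedding dimension `m=2` is a common default, and the exact signatures should be checked against EntropyHub's documentation:

```python
import numpy as np
import EntropyHub as EH

signal = np.random.default_rng(1).normal(size=2000)  # placeholder time series

# Sample entropy for embedding dimensions 0..m; the last entry is SampEn(m).
Samp, A, B = EH.SampEn(signal, m=2)

# Approximate entropy, same embedding dimension.
Ap, Phi = EH.ApEn(signal, m=2)

print(Samp[-1], Ap[-1])
```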
The usage of ApEn in financial time series was described in S. Pincus, R. E. Kalman, "Irregularity, volatility, risk, and financial market time series" (PNAS, 2004).
Transfer Entropy is the most promising approach to measuring the transfer of information between time series. It was introduced in T. Schreiber, Phys. Rev. Lett. 85 (2000) 461, and described in a wider context by P. Jizba, H. Kleinert, M. Shefaat, "Rényi's information transfer between financial time series", Physica A: Statistical Mechanics and Its Applications 391 (10): 2971–2989. A Python implementation is provided by the copent package, but only as an approximation in terms of copula entropy (see). The GitHub repository for copent is here.
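For reference, Schreiber's transfer entropy from a series $Y$ to a series $X$ (in its simplest first-order form) is

$$T_{Y \to X} = \sum_{n} p(x_{n+1}, x_n, y_n)\,\log \frac{p(x_{n+1} \mid x_n, y_n)}{p(x_{n+1} \mid x_n)},$$

so $T_{Y \to X} > 0$ indicates that knowing $Y$'s past improves one-step prediction of $X$ beyond $X$'s own past. A hedged usage sketch with copent follows; the argument order and direction convention are taken from the copent README and should be verified before relying on the sign of the comparison:

```python
import numpy as np
from copent import transent

rng = np.random.default_rng(2)
y = rng.normal(size=2000)
x = np.roll(y, 1) + 0.1 * rng.normal(size=2000)  # x lags y by one step

# Per the copent README, transent(x, y, lag) estimates transfer entropy
# from y to x with the given lag (copula-entropy approximation).
te_y_to_x = transent(x, y, lag=1)
te_x_to_y = transent(y, x, lag=1)
print(te_y_to_x, te_x_to_y)  # expect te_y_to_x > te_x_to_y
```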
Example applications of neural networks in financial time series forecasting are available, e.g.:
- gluonts, https://ts.gluon.ai/stable/tutorials/advanced_topics/howto_pytorch_lightning.html
- https://github.com/thuml/Time-Series-Library/tree/main
- uni2ts/moirai, https://huggingface.co/Salesforce/moirai-1.0-R-large
- Time Series Transformer (TST), https://huggingface.co/blog/time-series-transformers
- https://gitlab.com/inzynier-ai/
Footnotes
[^1]: Cross-entropy is somewhat different, because it is defined on two probability distributions rather than on time series, but EntropyHub provides some functions for estimating entropy between two univariate data sequences.