v0.3.0

@mitokic mitokic released this 10 Aug 19:31
· 176 commits to main since this release
2dfe73a

finnts 0.3.0

Improvements

  • Spark data frame support. Initial input data can now be a Spark data frame, enabling millions of time series to be run across a Spark compute cluster.
  • Updated train/validation/test process for multivariate ML models.
  • In addition to the existing forecast_time_series(), added new sub-components of the finnts forecast process that can be called separately or in a production pipeline, allowing more control over the forecast process:
    • prep_data()
    • prep_models()
    • train_models()
    • ensemble_models()
    • final_models()
  • Automated read and write capabilities. Intermediate and final Finn outputs are now automatically written to disk (see options below). This enables better MLOps capabilities, easier scaling on Spark, and better fault tolerance, because the whole forecast process does not need to restart from scratch if an error occurs.
    • Temporary location on the local machine, which is deleted when the R session is closed.
    • Path on the local machine, or a mounted Azure Data Lake Storage path in Spark, to save the intermediate and final Finn run results.
    • Azure Blob Storage to store non-Spark runs on a data lake.
    • SharePoint/OneDrive storage to store non-Spark runs within M365.
  • New MLOps features that allow you to retrieve the final trained models through get_trained_models(), get specific run information through get_run_info(), and retrieve the initial feature-engineered data through get_prepped_data().
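
The sub-components and retrieval functions listed above could be chained roughly as follows. This is a minimal sketch, not the package's reference workflow: the argument names follow the finnts vignettes as best understood and may differ between versions, and the example data set (timetk::m4_monthly), the experiment and run names, the model list, and the recipe name "R1" are all assumptions chosen for illustration.

```r
library(finnts)
library(dplyr)

# Example data (assumption): monthly M4 series shipped with timetk.
# Finn expects the date column to be named "Date".
hist_data <- timetk::m4_monthly %>%
  rename(Date = date) %>%
  mutate(id = as.character(id)) %>%
  filter(Date >= "2013-01-01", Date <= "2015-06-01")

# Placeholder experiment/run names. With no path supplied, outputs are
# expected to go to a temporary location for the R session.
run_info <- set_run_info(
  experiment_name = "finn_sub_component_demo",
  run_name = "test_run"
)

# Step 1: clean the input data and run feature engineering.
prep_data(
  run_info = run_info,
  input_data = hist_data,
  combo_variables = c("id"),
  target_variable = "value",
  date_type = "month",
  forecast_horizon = 3
)

# Step 2: set up the models, hyperparameters, and back-test splits.
prep_models(
  run_info = run_info,
  models_to_run = c("arima", "ets", "glmnet"),
  num_hyperparameters = 2
)

# Steps 3-5: train individual models, build ensembles, select final models.
train_models(run_info = run_info, run_global_models = FALSE)
ensemble_models(run_info = run_info)
final_models(run_info = run_info)

# Retrieve outputs that were written to disk during the run.
forecast_tbl <- get_forecast_data(run_info = run_info)
trained_models_tbl <- get_trained_models(run_info = run_info)
prepped_data_tbl <- get_prepped_data(run_info = run_info, recipe = "R1")
run_info_tbl <- get_run_info(experiment_name = "finn_sub_component_demo")
```

Because each step writes its results to disk under the same run_info, a failed step can in principle be rerun without repeating the earlier ones.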

Deprecated

  • run_model_parallel has been replaced with inner_parallel within forecast_time_series().
  • Returning data as a list from forecast_time_series() is deprecated. Please use get_forecast_data() instead to retrieve Finn forecast outputs.
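
A hedged sketch of what migrating away from the two deprecated behaviors might look like. The data set, experiment and run names, and model list are illustrative assumptions, and the argument names reflect the finnts documentation as best understood rather than a verified signature for this exact release.

```r
library(finnts)
library(dplyr)

# Example data (assumption): monthly M4 series shipped with timetk,
# with the date column renamed to "Date" as Finn expects.
hist_data <- timetk::m4_monthly %>%
  rename(Date = date) %>%
  mutate(id = as.character(id)) %>%
  filter(Date >= "2013-01-01", Date <= "2015-06-01")

# Placeholder experiment/run names.
run_info <- set_run_info(
  experiment_name = "finn_migration_demo",
  run_name = "test_run"
)

# Use inner_parallel where run_model_parallel was used before, and do not
# rely on the return value of forecast_time_series().
forecast_time_series(
  run_info = run_info,
  input_data = hist_data,
  combo_variables = c("id"),
  target_variable = "value",
  date_type = "month",
  forecast_horizon = 3,
  models_to_run = c("arima", "ets"),
  inner_parallel = FALSE
)

# Read the forecast outputs back from the run instead.
forecast_tbl <- get_forecast_data(run_info = run_info)
```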

Breaking Changes

  • Azure Batch parallel processing is no longer supported. Please use Spark instead.
  • Parallel processing through Spark now needs a mounted Azure Data Lake Storage path supplied through set_run_info(). Please refer to the vignettes for more details.