[FEAT] Add sparse non-negative OLS and WLS via QP for MinTraceSparse
#319
base: main
Conversation
Thanks, this is great work! Some minor comments so far:
Hi, Olivier! I hope you're well.

The linting issue is strange, thanks for pointing that out! I thought I ran

Yup, I also thought that three is close to excessive. The good news is that they're both light on dependencies, as both have numpy and scipy dependencies which don't conflict with ours. OSQP has a third dependency called qdldl. Would you rather scale it down to one sparse QP solver? It's straightforward to express the secondary approach in a problem format that OSQP will accept, as the only change is reformulating the linear constraints, or vice versa for Clarabel. In my experience, the trade-off between the two in this context is that OSQP is slightly faster than Clarabel but slightly less reliable, particularly for the second approach unless you set higher accuracy, hence the incorporation of the two solvers. Unfortunately, we have to keep quadprog for now until

My design decision for the PR just comes down to my workflow in my job and what I think is most valuable in a large business context. When evaluating tens or hundreds of millions of forecasts, it's important to have a process that's robust and performant. At that scale, in this case it's inconsequential whether, for example, the QP solvers struggled with a particular reconciliation problem, or which, if any, of the two approaches were successful, as long as the best estimate coherent non-negative forecasts are returned for a given

I understand if you think this methodology isn't suitable for most users of the package. If you'd like, I can easily simplify it and scale it way back to something like

I find the QP solutions almost always improve upon the heuristic solution if successful, which you can measure by using the heuristic solution as a baseline and calculating metrics like the MSSE or RMSSE. I can put something together that quantitatively shows the differences between the QP approaches and the heuristic approach and measures the alignment of the two approaches with the dense approach in `MinTrace`.

This is purely anecdotal, but using

Let me know what you think and I can do the tests and make the changes. 😄
Thanks for the elaborate reply, hope you're well too. Bit of holidays, so late reply from my side.
In my IDE (VSCode) I need to make sure the notebooks are cleared, then saved, and then cleaned. That usually fixes it.
I'm fine with adding dependencies, and with the proposed workflow, if we have a test demonstrating the clear benefit over the scaled-down/simple solution. In my opinion, the result of that test/comparison decides what the default should be. Little benefit? Keep the default at 'heuristic' or 'osqp', or whatever we think is the best default. Large benefit? Keep the default at 'combined', which is the workflow you've suggested at this moment. The benefit should be well balanced between the additional computational cost and the accuracy. That's why the test across multiple datasets, including runtime per solution and accuracy per solution, is necessary. Then we can make that decision. Wdyt?
I definitely agree on the testing approach, but honestly that is a hell of a lot of testing for this feature. I should've just kept it simple! 😅 I'm going to remove all of my code for the warm start, "secondary approach", and solution evaluation, and just solve the "primary approach" using Clarabel, triggered when `nonnegative` is `True`.

It should then be straightforward to add GLS at a later date, as it's going to be impractical to warm start this way when we use the covariance matrix or Schäfer-Strimmer shrinkage estimator as $W$.
Ok - are you sure it's that much? I was thinking of something like the below, and just evaluating everything at an eps of 1e-5 or 1e-6. That's 24 experiments, for which I'd assume the runtime of the sparse variants is super fast, so in terms of time only the dense variant is time-consuming, no? I think the work you did is great and I don't want you to have to spend hours testing.
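Purely as an illustration of such a grid, here is a sketch in which the dataset names and variant labels are assumptions chosen to give 24 experiments:

```python
# Hypothetical 24-experiment comparison grid; the dataset names and variant
# labels are assumptions, not taken from the thread.
from itertools import product

datasets = ["Tourism", "M5"]                                 # assumed
methods = ["ols", "wls_struct", "wls_var"]                   # W estimators under test
variants = ["MinTrace (dense)", "heuristic (#284)",
            "QP primary", "QP secondary"]                    # assumed
grid = list(product(datasets, methods, variants))
assert len(grid) == 24  # matches the experiment count mentioned above

for dataset, method, variant in grid:
    # run each reconciliation at eps = 1e-5 or 1e-6, recording runtime and accuracy
    print(dataset, method, variant)
```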
Introduction
This PR improves the current sparse non-negative reconciliation in `MinTraceSparse` by expanding upon the heuristic from my previous PR, #284, to improve the quality of the non-negative reconciled forecasts for large numbers of time series. It converts the OLS and WLS estimators into quadratic programming (QP) problems with non-negativity constraints, analogous to the approach in `MinTrace`, but most importantly avoiding the evaluation of the dense $S^{T}W^{-1}S$, thereby maintaining the sparsity of the problem.

The implementation in this PR is robust and performant, allowing users dealing with large numbers of time series, which is extremely common in industrial contexts, to reliably generate high quality non-negative OLS and WLS reconciled forecasts in a fraction of the time.
It adds a new QP solver, Clarabel.rs, as a dependency to the package, which uses an interior point algorithm and is now distributed as the standard solver in CVXPY. The new behaviour is triggered by default upon instantiation of `MinTraceSparse` when a `nonnegative` of `True` is passed, but can be toggled off by setting `qp` to `False`, which will revert to the method detailed in #284.

I will add the GLS estimators, `mint_cov` and `mint_shrink`, in a future PR, as we can exploit the symmetry of $W$ by making it an upper-triangular matrix, maintaining a reasonable degree of sparsity and decreasing the time taken to invert it.
Background
The primary approach expresses the problem in the standard conic form that Clarabel accepts:

$$\min_{x}\ \tfrac{1}{2}x^{T}W^{-1}x-\hat{y}_{h}^{T}W^{-1}x \quad \text{s.t.}\quad \begin{pmatrix} I_{n_{a}} & -A \\ 0 & -I_{n_{b}} \end{pmatrix}x + s = 0, \quad s \in \mathcal{Z}_{n_{a}} \times \mathbb{R}_{+}^{n_{b}},$$

where $A$ is the "aggregation matrix", which describes the relationship between the bottom level (leaf node) time series and the aggregated (parent node) time series, $W^{-1}$ is the precision matrix for the problem, $\hat{y}_{h}$ are the base forecasts for a particular horizon step, $h$, and $s$ is a composition (product) of two convex cones: $\mathcal{Z}_{n_{a}}$, a zero cone to model the equality constraints, and $\mathbb{R}_{+}^{n_{b}}$, a non-negative cone to model the inequality constraints.
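A minimal sketch of how this formulation maps onto Clarabel's Python bindings, using a toy hierarchy with one aggregate and two bottom-level series (the forecast values are made up for illustration):

```python
# Toy instance of the primary approach: OLS (W = I) on a 3-node hierarchy.
import clarabel
import numpy as np
from scipy import sparse

n_a, n_b = 1, 2
n = n_a + n_b
A_agg = sparse.csc_matrix(np.array([[1.0, 1.0]]))               # aggregation matrix A
S = sparse.vstack([A_agg, sparse.identity(n_b)], format="csc")  # summing matrix S
y_hat = np.array([2.0, 3.0, -1.0])                              # incoherent base forecasts for one h

# Objective: (1/2) x^T W^{-1} x - y_hat^T W^{-1} x, with W = I here.
P = sparse.identity(n, format="csc")
q = -y_hat

# Constraints: (I_{n_a}  -A) x = 0 for coherence, -x_b <= 0 for non-negativity.
C = sparse.vstack(
    [sparse.hstack([sparse.identity(n_a), -A_agg]),
     sparse.hstack([sparse.csc_matrix((n_b, n_a)), -sparse.identity(n_b)])],
    format="csc",
)
b = np.zeros(n_a + n_b)
cones = [clarabel.ZeroConeT(n_a), clarabel.NonnegativeConeT(n_b)]

solver = clarabel.DefaultSolver(P, q, C, b, cones, clarabel.DefaultSettings())
sol = solver.solve()
x = np.asarray(sol.x)  # primal solution: [total, bottom_1, bottom_2]
```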
This formulation is the most efficient, as $\left(I_{n_{a}} \; -A\right)$ has the set of all reconciled forecasts in its null space, so the graph structure is encoded in just $n_{a}$ equality constraints, with the remaining $n_{b}$ inequality constraints restricting $x$ to the convex, non-negative feasible region of the solution space.
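A quick check of the null-space claim, reusing the toy objects from the sketch above: any coherent vector $x = Sb$ satisfies $\left(I_{n_{a}} \; -A\right)x = 0$.

```python
# Any coherent point x = S @ b lies in the null space of (I_{n_a}  -A).
b_bottom = np.array([4.0, 7.0])                                  # arbitrary bottom-level values
x_coherent = S @ b_bottom                                        # [11, 4, 7]
E = sparse.hstack([sparse.identity(n_a), -A_agg], format="csc")  # (I_{n_a}  -A)
assert np.allclose(E @ x_coherent, 0.0)
```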
We try this approach for each $h$, and if the solver solves the problem within the default tolerance of $1\mathrm{e}{-8}$, the primal solutions are post-processed by clipping to $[0, \infty)$ and aggregating with $S$.
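Continuing the toy sketch, the post-processing step could look like the following (the exact status check is an assumption about Clarabel's Python enum):

```python
# If the solver converged, clip the bottom-level primal values to [0, inf)
# and aggregate back up with S to obtain coherent, non-negative forecasts.
if sol.status == clarabel.SolverStatus.Solved:  # assumed status enum
    b_rec = np.clip(x[n_a:], 0.0, None)  # bottom-level decision variables
    y_rec = S @ b_rec                    # forecasts at every level of the hierarchy
```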
An alternative way of framing the problem, from the Wickramasuriya, Turlach, and Hyndman paper, that still maintains the sparsity, is shown below.

$$\min_{x,\,z}\ \tfrac{1}{2}x^{T}W^{-1}x-\hat{y}_{h}^{T}W^{-1}x \quad \text{s.t.}\quad \begin{pmatrix} I_{n} & -S \\ 0 & -I_{n_{b}} \end{pmatrix}\begin{pmatrix} x \\ z \end{pmatrix} + s = 0, \quad s \in \mathcal{Z}_{n} \times \mathbb{R}_{+}^{n_{b}},$$

where $S$ is the "summing matrix" and $z$ is an auxiliary variable; the zero-cone block enforces $x = Sz$, and the non-negative cone constrains $z \geq 0$.
This increases the number of decision variables from $n$ to $n + n_{b}$ and the number of non-zeros ("nnz") of the linear constraints matrix by $2n_{b}$ relative to the first approach, so it is a tad slower in most cases, at least those with a diagonal $W$.
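To make the bookkeeping concrete, a small self-contained sketch that verifies the $2n_{b}$ difference in non-zeros between the two constraint matrices:

```python
# The second formulation's constraint matrix has exactly 2 * n_b more
# non-zeros than the first, shown here for a toy aggregation matrix A.
import numpy as np
from scipy import sparse

A = sparse.csc_matrix(np.array([[1.0, 1.0]]))               # toy aggregation matrix
n_a, n_b = A.shape
n = n_a + n_b
S = sparse.vstack([A, sparse.identity(n_b)], format="csc")  # summing matrix

# First approach: (I_{n_a}  -A) x = 0 and -x_b <= 0, over x in R^n.
C1 = sparse.vstack(
    [sparse.hstack([sparse.identity(n_a), -A]),
     sparse.hstack([sparse.csc_matrix((n_b, n_a)), -sparse.identity(n_b)])],
    format="csc",
)
# Second approach: x - S z = 0 and -z <= 0, over (x, z) in R^{n + n_b}.
C2 = sparse.vstack(
    [sparse.hstack([sparse.identity(n), -S]),
     sparse.hstack([sparse.csc_matrix((n_b, n)), -sparse.identity(n_b)])],
    format="csc",
)
assert C2.nnz - C1.nnz == 2 * n_b
```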
Benchmarks
As expected, this implementation provides the same MAE and RMSE as `MinTrace`, but is significantly faster. The performance of the heuristic solution in #284 is surprisingly good for OLS and WLS with structural scaling, with MAE extremely close to the QP approaches in a fraction of the time, but with a noticeably worse RMSE at the larger scale of M5 $\left(30\,490 \times 42\,840\right)$. However, the performance of the heuristic using WLS with variance scaling is bad in these benchmarks, even for Tourism $\left(304 \times 555\right)$, so it should be investigated further. If this is also the case when `nonnegative` is `False`, an idea would be to change this going forward so that `wls_var` uses the QP approach without the non-negative constraints, keeping just $\left(I_{n_{a}} \; -A\right)x + s = 0$ with $s \in \mathcal{Z}_{n_{a}}$ to ensure coherence.

The benchmarks show that the default should be a `qp` of `True` to provide the highest quality solutions.

Wall Time
[Table: wall time for `MinTrace`, `MinTraceSparse` (#284), and `MinTraceSparse` (#314), each with `ols`, `wls_struct`, and `wls_var`.]
MASE
[Table: MASE for `MinTrace`, `MinTraceSparse` (#284), and `MinTraceSparse` (#314), each with `ols`, `wls_struct`, and `wls_var`.]

¹ Scaled by the MAE of the out-of-sample `MinTrace` solution.
² Scaled by the MAE of the out-of-sample heuristic solution, as `MinTrace` did not finish.

RMSSE
[Table: RMSSE for `MinTrace`, `MinTraceSparse` (#284), and `MinTraceSparse` (#314), each with `ols`, `wls_struct`, and `wls_var`.]

¹ Scaled by the RMSE of the out-of-sample `MinTrace` solution.
² Scaled by the RMSE of the out-of-sample heuristic solution, as `MinTrace` did not finish.