WIP isolate convergence warning in LogReg #225
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##             main     #225   +/-   ##
=======================================
  Coverage   86.50%   86.50%
=======================================
  Files          14       14
  Lines         963      963
  Branches      128      128
=======================================
  Hits          833      833
  Misses        100      100
  Partials       30       30
=======================================
```
To clarify: can you get the exact problem causing the warning? The goal is not just to produce a ConvergenceWarning; that is easy (just take a hard problem with low regularization, many features, and few iterations). The goal is to understand why the solver fails in the setup of the test. So the first step is to get the exact problem causing the warning, not to generate random X and y and try to get a ConvergenceWarning on them.
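A minimal sketch of how the exact failing problem could be captured, using `warnings.catch_warnings`; scikit-learn's `LogisticRegression` stands in here for the estimator under test, and the data generation is an illustrative placeholder:

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Illustrative setup: replace with the estimator and data used in the test.
rng = np.random.RandomState(0)
X = rng.randn(30, 100)
y = np.sign(rng.randn(30))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    LogisticRegression(C=1e4, max_iter=20).fit(X, y)

if any(issubclass(w.category, ConvergenceWarning) for w in caught):
    # Persist the exact problem so the solver failure can be studied.
    np.save("X_failing.npy", X)
    np.save("y_failing.npy", y)
```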
@mathurinm, I have two hypotheses regarding the ConvergenceWarning.
What is alpha as a fraction of alpha_max, i.e. norm(X.T @ y, ord=np.inf)?
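For reference, a quick way to check that ratio on the captured problem (`alpha` is an illustrative placeholder; use the value from the failing test):

```python
import numpy as np

X = np.load("X_failing.npy")  # problem captured above
y = np.load("y_failing.npy")
alpha = 1e-4  # placeholder: use the regularization from the failing test

# mathurinm's formula above, up to the solver's normalization convention
alpha_max = np.linalg.norm(X.T @ y, ord=np.inf)
print(f"alpha / alpha_max = {alpha / alpha_max:.2e}")
```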
The fact that the data is not centered does not mean that the solver should not converge on it when fit_intercept=False. Fitting an intercept or not just means solving one optimization problem or another. |
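For concreteness, a sketch of the two objectives in the $\ell_1$-regularized case (the exact scaling of the datafit is an assumption, it depends on the solver's convention):

$$
\min_{w}\ \sum_{i=1}^{n} \log\left(1 + e^{-y_i x_i^\top w}\right) + \alpha \lVert w \rVert_1
\qquad \text{vs.} \qquad
\min_{w,\, b}\ \sum_{i=1}^{n} \log\left(1 + e^{-y_i (x_i^\top w + b)}\right) + \alpha \lVert w \rVert_1,
$$

with the intercept $b$ unregularized. Both are convex problems, so the solver should converge on either.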
@mathurinm, also referring to the plot: the primal stagnates whereas the gap keeps decreasing. It seems like we are stuck at a degenerate point.
The plot displays incomparable quantities: the gap goes to 0 while the primal converges to a value > 0, so the primal may keep decreasing at the same speed as the gap without it being visible. You need to subtract the primal limit from the primal objectives if you want comparable quantities.
Stopping based on the primal does not offer guarantees, and that is not the approach we have chosen in celer.
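A sketch of how to plot comparable quantities, assuming the per-iteration primal objectives and duality gaps were recorded to files (the file names are placeholders), with `p_star` taken from a separate, much tighter run of the solver:

```python
import numpy as np
import matplotlib.pyplot as plt

p_objs = np.load("p_objs.npy")  # primal objective at each iteration
gaps = np.load("gaps.npy")      # duality gap at each iteration

# p_star must come from a separate, much more precise run of the solver,
# not from the last iterate of the 100-iteration run being diagnosed.
p_star = np.load("p_star.npy")

# Single log-scale axis: both quantities are now directly comparable.
plt.semilogy(p_objs - p_star, label="primal suboptimality")
plt.semilogy(gaps, label="duality gap")
plt.xlabel("iteration")
plt.legend()
plt.show()
```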
@mathurinm, you are right! Anyway, in
This figure is mathematically impossible: the duality gap is always greater than the primal suboptimality. Can you also look into the large peak for the gap around iteration 65?
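(For reference, this follows from weak duality: for any dual-feasible $v$ and primal optimum $w^\star$, $D(v) \le P(w^\star) \le P(w)$, hence

$$
\mathrm{gap}(w, v) = P(w) - D(v) \;\ge\; P(w) - P(w^\star) \;\ge\; 0,
$$

i.e. the gap upper-bounds the primal suboptimality.)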
It is possible, but the double-axis scales tricked you: a good reason never to use them. I agree the peak is huge and looks strange.
Ah yes, thanks Joseph. Also beware of the way you compute the primal optimum: since we're looking at a convergence issue, the last primal value after 100 iterations is not necessarily equal to the optimum up to machine precision.
True, the scale of the multi-axis plot misled us. I don't have an explanation for the peak around iteration 65; indeed, it doesn't break the previous rule. Finally, I am pretty sure that there is no (small) mistake in the implementation of
Drawing on my knowledge of deep learning, I think the slowness of the solver might be due to (some sort of) vanishing gradient. Indeed, we compute the gradient of the sigmoid function with data drawn from
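A minimal illustration of the saturation effect, on made-up values rather than the test's data: the sigmoid derivative $\sigma(z)\,(1 - \sigma(z))$ peaks at $1/4$ and decays exponentially in $|z|$, so samples with large-magnitude scores contribute near-zero gradients:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Derivative of the sigmoid: sigma(z) * (1 - sigma(z)).
for z in (0.0, 2.0, 5.0, 10.0):
    grad = sigmoid(z) * (1.0 - sigmoid(z))
    print(f"z = {z:5.1f}  ->  sigmoid'(z) = {grad:.2e}")
# z =   0.0  ->  sigmoid'(z) = 2.50e-01
# z =  10.0  ->  sigmoid'(z) = 4.54e-05
```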
Thanks for the reproduction scripts, @Badr-MOUFAD. Addressed in #227.
fixes #215