WIP isolate convergence warning in LogReg #225

Closed
wants to merge 17 commits

Conversation

Badr-MOUFAD
Collaborator

fixes #215

@codecov-commenter

codecov-commenter commented Apr 1, 2022

Codecov Report

Merging #225 (634870b) into main (700c280) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##             main     #225   +/-   ##
=======================================
  Coverage   86.50%   86.50%           
=======================================
  Files          14       14           
  Lines         963      963           
  Branches      128      128           
=======================================
  Hits          833      833           
  Misses        100      100           
  Partials       30       30           
Flag Coverage Δ
unittests ∅ <ø> (∅)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
celer/homotopy.py 86.82% <ø> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mathurinm
Owner

To clarify: can you get the exact X, y and regularization strength that cause these convergence warnings output by pytest:

celer/tests/test_logreg.py::test_LogisticRegression[True]
  /home/mathurin/workspace/celer/celer/homotopy.py:311: ConvergenceWarning: Objective did not converge: duality gap: 0.841892524980679, tolerance: 0.005545177444479563. Increasing `tol` may make the solver faster without affecting the results much. 
  Fitting data with very small alpha causes precision issues.
    sol = newton_celer(

celer/tests/test_logreg.py::test_LogisticRegression[True]
  /home/mathurin/workspace/celer/celer/homotopy.py:311: ConvergenceWarning: Objective did not converge: duality gap: 23.09072008747797, tolerance: 0.006931471805599453. Increasing `tol` may make the solver faster without affecting the results much. 
  Fitting data with very small alpha causes precision issues.
    sol = newton_celer(

celer/tests/test_logreg.py::test_LogisticRegression[True]
  /home/mathurin/workspace/celer/celer/homotopy.py:311: ConvergenceWarning: Objective did not converge: duality gap: 2.1031969926275593, tolerance: 0.006931471805599453.

The goal is not just to produce a ConvergenceWarning; that is easy (just take a hard problem with low regularization, lots of features and few iterations).

The goal is to understand why the solver fails in the setup of the test. So the first step is to get the exact problem causing the warning (not to generate random X and y and try to get a ConvergenceWarning on it).
Is it clearer now?
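For instance, a minimal sketch of how the offending problem could be captured (the data generation, C value and file names here are placeholders rather than the actual test setup, celer.LogisticRegression is assumed to follow the sklearn-style C parametrization, and the warning class is checked by name since the exact class raised by celer is not asserted here):

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification

from celer import LogisticRegression

# placeholder data: the actual X, y and C from test_LogisticRegression should be copied here
X, y = make_classification(n_samples=50, n_features=100, random_state=0)
C = 1.0

with warnings.catch_warnings(record=True) as records:
    warnings.simplefilter("always")
    LogisticRegression(C=C).fit(X, y)

if any("ConvergenceWarning" in type(w.message).__name__ for w in records):
    # dump the exact problem so it can be reloaded and studied in isolation
    np.save("X_conv_warn.npy", X)
    np.save("y_conv_warn.npy", y)
    print("dumped a problem triggering a ConvergenceWarning, C =", C)
```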

@Badr-MOUFAD
Collaborator Author

@mathurinm, I have two hypotheses regarding the ConvergenceWarning:

  • alpha is too small (how small depends on X, y and tol). Hence, a lower bound should be computed to prevent that.
  • X is not centered. Indeed, the dumped data is drawn from normal(100, 1).

@mathurinm
Owner

What is alpha as a fraction of alpha_max, i.e. norm(X.T @ y, ord=np.inf)?
What makes you think centering plays a role here?
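As a quick check, something along these lines could be used (X and y are loaded from a hypothetical dump of the failing problem, alpha is a placeholder, and a constant factor may apply to alpha_max for the logistic loss):

```python
import numpy as np

X = np.load("X_conv_warn.npy")  # hypothetical dump of the offending design matrix
y = np.load("y_conv_warn.npy")  # hypothetical dump of the offending labels
alpha = 0.01                    # placeholder: regularization strength used in the failing fit

# alpha_max as written above; for the logistic loss a factor 1/2 may be needed
alpha_max = np.linalg.norm(X.T @ y, ord=np.inf)
print(f"alpha / alpha_max = {alpha / alpha_max:.2e}")
```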

@Badr-MOUFAD
Collaborator Author

Badr-MOUFAD commented Apr 3, 2022

  • alpha / alpha_max is on the order of 1e-5.
  • I think centering is not implemented (referring to this line). The dumped data, namely X, is drawn from N(100, 1). I don't get warnings when I test with X drawn from N(0, 1); see the sketch below.
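A small sketch of that comparison (sizes, labels and C are illustrative, and celer.LogisticRegression is assumed to follow the sklearn-style API):

```python
import warnings

import numpy as np
from celer import LogisticRegression

rng = np.random.RandomState(0)
n_samples, n_features = 50, 100
y = (rng.randn(n_samples) > 0).astype(int)

for mean in (100, 0):  # per the observation above: N(100, 1) triggers the warning, N(0, 1) does not
    X = mean + rng.randn(n_samples, n_features)
    with warnings.catch_warnings(record=True) as records:
        warnings.simplefilter("always")
        LogisticRegression(C=1.0).fit(X, y)
    n_warn = sum("ConvergenceWarning" in type(w.message).__name__ for w in records)
    print(f"X ~ N({mean}, 1): {n_warn} ConvergenceWarning(s)")
```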

@mathurinm
Owner

The fact that the data is not centered does not mean that the solver should not converge on it when fit_intercept=False. Fitting an intercept or not just means solving one optimization problem or another.

@Badr-MOUFAD
Collaborator Author

Badr-MOUFAD commented Apr 3, 2022

  • The primal decreases exponentially over the first iterations and slows down (stagnates) later.
  • The (quasi) inverse pattern is observed for the gap.

@mathurinm, also referring to the plot, primal stagnates whereas gap keeps decreasing. It seems like we are stuck at a degenerate point.
Can't we just break when primal stagnates since our objective is to minimize it?

(attached plot: solver logs)

@josephsalmon
Contributor

Some people do indeed, but mostly when the problem tends to degenerate towards probabilities 0 or 1;
see the glmnet paper:
(screenshot of the relevant passage from the glmnet paper)

I have not heard of tricks around 0.5/0.5 (your case here) though.

@mathurinm
Owner

The plot displays incomparable quantities: the gap goes to 0 while the primal converges to a value > 0, hence the primal may keep decreasing at the same speed as the gap without it being visible. You need to subtract the primal limit from the primal objectives if you want comparable quantities (see the sketch at the end of this comment).

It seems like we are stuck at a degenerate point.
Degenerate in which sense?

Stopping based on the primal does not offer guarantees and that is not the way we have chosen in celer.
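A sketch of such a comparison, assuming `primals` and `gaps` have been recorded per iteration and dumped to (hypothetical) .npy files; the primal limit is crudely approximated by the smallest recorded primal:

```python
import numpy as np
import matplotlib.pyplot as plt

primals = np.load("primals.npy")  # hypothetical per-iteration primal objectives
gaps = np.load("gaps.npy")        # hypothetical per-iteration duality gaps

p_star = primals.min()  # crude proxy for the primal limit (see the caveat further down)
subopt = np.maximum(primals - p_star, 1e-16)  # floor to keep the log scale well defined

plt.semilogy(subopt, label="primal suboptimality")
plt.semilogy(gaps, label="duality gap")
plt.xlabel("iteration")
plt.legend()
plt.show()
```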

@Badr-MOUFAD
Collaborator Author

@mathurinm, you are right!
When visualizing the primal residuals, we get (quasi) the same pattern.

Anyway, in LogisticRegression, when using celer as solver instead of proximal Newton, the fit converges in 2 iterations.
See celer/tests/isolate_conv_warn.py.

(attached plot: primal residuals)

@mathurinm
Owner

mathurinm commented Apr 4, 2022

This figure is mathematically impossible: the duality gap is always greater than the primal suboptimality. Can you also look into the large peak for the gap around iteration 65?

@josephsalmon
Contributor

It is possible, but the double-axis scales tricked you. A good reason never to use them.

I agree the peak is huge and looks strange

@mathurinm
Owner

Ah yes thanks Joseph.
@Badr-MOUFAD the double axis is misleading and there is no reason to use it as the two plotted quantities are directly homogeneous and comparable.

Also beware of the way you compute the primal optimum: since we're looking at a convergence issue, the last primal after 100 iterations is not necessarily equal to the optimum up to machine precision.
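For instance, p* could be taken from a dedicated reference run rather than from the plotted one. A sketch, writing the L1-regularized logistic objective in its usual form with labels in {-1, 1} (alpha and the file names are placeholders, and the expression should be matched to celer's exact parametrization):

```python
import numpy as np

X = np.load("X_conv_warn.npy")  # hypothetical dumps of the offending problem
y = np.load("y_conv_warn.npy")  # labels assumed in {-1, 1}
alpha = 0.01                    # placeholder regularization strength

def primal(w):
    """L1-regularized logistic primal: sum of logistic losses plus alpha * ||w||_1."""
    return np.sum(np.log1p(np.exp(-y * (X @ w)))) + alpha * np.abs(w).sum()

# w_ref: coefficients from a reference run with a much tighter tolerance and a much
# larger iteration budget than the plotted run (or from an independent solver).
# Using primal(w_ref) as p* avoids treating the 100th iterate as the optimum.
```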

@Badr-MOUFAD
Collaborator Author

This figure is mathematically impossible,
the duality gap is always greater than the primal suboptimality
Can you also look into the large peak for the gap around iteration 65?

It's true: the scale of the multi-axis plot misled us.
That being said, once that is fixed there is no contradiction: the duality gap curve is always above the residuals.

I don't have a comment on the peak around iteration 65. Indeed, it doesn't break the previous rule.

Finally, I am pretty sure that there is no (small) mistake in the implementation of newton_celer, especially knowing that it was coded by @mathurinm.

mul_axis_scaled_plus__

@Badr-MOUFAD
Collaborator Author

Referring to my knowledge of deep learning, I think that the slowness of the solver might be due to (some sort of) vanishing gradient. Indeed, we compute the gradient of the sigmoid function with data drawn from N(100, 1).
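A small numerical illustration of that saturation (values are illustrative): for inputs of magnitude around 100, the sigmoid is 1.0 to machine precision, so its derivative σ(t)(1 − σ(t)) evaluates to exactly 0 in float64 and such samples contribute essentially nothing to the gradient.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for t in (0.0, 5.0, 50.0, 100.0):
    s = sigmoid(t)
    print(f"t = {t:6.1f}   sigmoid = {s:.16f}   derivative = {s * (1 - s):.3e}")
# at t = 50 and t = 100 the sigmoid rounds to exactly 1.0 in float64,
# so its derivative is 0 and the corresponding gradient terms vanish
```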

@Badr-MOUFAD requested a review from mathurinm on April 6, 2022.
@mathurinm
Owner

Thanks for the reproducing scripts @Badr-MOUFAD

addressed in #227

@mathurinm closed this on Apr 7, 2022.
Successfully merging this pull request may close these issues: warnings in LogisticRegression with prox newton.