WIP: AdaptiveLasso and AdaptiveLassoCV #169
Conversation
Codecov Report
@@ Coverage Diff @@
## master #169 +/- ##
==========================================
- Coverage 86.08% 84.47% -1.62%
==========================================
Files 13 14 +1
Lines 913 979 +66
Branches 120 122 +2
==========================================
+ Hits 786 827 +41
- Misses 97 122 +25
Partials 30 30
@josephsalmon @agramfort @QB3 do you agree with the name for the new parameter:
Maybe this is a little bit off topic but I put it here, just in case:
This is interesting, and weird!
@@ -100,7 +100,10 @@ def celer(
    cdef int[:] all_features = np.arange(n_features, dtype=np.int32)

    for t in range(max_iter):
-        if t != 0:
+        # if t != 0:
This introduces a severe regression at the beginning of paths for large X (e.g. finance), where X.T @ theta is recomputed while there are only 3 features in the working set.
Maintaining Xtheta from one alpha to the next may be a fix.
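To illustrate the fix being suggested, here is a hedged sketch of the caching idea (variable names are made up for the example, not celer's internals): as long as w, and hence the residual, is unchanged when moving to the next alpha, X.T @ theta only needs a scalar rescaling rather than a fresh O(n_samples * n_features) product.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 5000
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)
w = np.zeros(n_features)                       # warm-start coefficients

residual = y - X @ w
Xt_residual = X.T @ residual                   # expensive product, computed once
norm_inf = np.linalg.norm(Xt_residual, ord=np.inf)
alpha_max = norm_inf / n_samples

for alpha in np.geomspace(alpha_max, alpha_max / 100, num=10):
    scaling = max(n_samples * alpha, norm_inf)  # keep theta dual-feasible
    theta = residual / scaling
    Xtheta = Xt_residual / scaling              # reuse the cache: no new X.T @ theta
    # ... the inner solver would run here and update w, residual and
    # Xt_residual incrementally so the cache stays valid for the next alpha ...
```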
I made another experiment on Finance. The part where it gets funny: AdaptiveLasso ends up using only one coefficient:
On a different note, I think the way the adaptive path is computed is suboptimal: instead of looping over alphas and running all the reweighting iterations for each alpha,
we should loop over reweighting iterations and sweep over all alphas inside (see the sketch below),
in order to benefit from warm start. Otherwise, for a fresh new alpha (without weights), initializing with the last reweighting solution of the previous alpha is not very useful. Does this make sense @josephsalmon?
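To make the proposed inversion concrete, here is a hedged sketch of the two orderings, using scikit-learn's Lasso with the usual column-rescaling trick as a stand-in for the actual weighted solver (the function names, the eps smoothing and the warm-start bookkeeping are illustrative, not the PR's code):

```python
import numpy as np
from sklearn.linear_model import Lasso


def solve_weighted(X, y, alpha, weights, coef_init):
    # weighted Lasso via column rescaling; setting coef_ by hand before fit
    # is how we emulate warm start with this stand-in solver
    clf = Lasso(alpha=alpha, fit_intercept=False, warm_start=True, max_iter=10_000)
    clf.coef_ = coef_init * weights
    clf.fit(X / weights, y)
    return clf.coef_ / weights


def adaptive_path_current(X, y, alphas, n_reweightings=5, eps=1e-10):
    # current ordering: all reweighting iterations for one alpha, then move on.
    # The next alpha's first (unweighted) solve is warm-started from the last
    # *reweighted* solution of the previous alpha, which is not very useful.
    w = np.zeros(X.shape[1])
    coefs = []
    for alpha in alphas:
        weights = np.ones(X.shape[1])
        for _ in range(n_reweightings):
            w = solve_weighted(X, y, alpha, weights, w)
            weights = 1.0 / (np.abs(w) + eps)
        coefs.append(w.copy())
    return np.array(coefs)


def adaptive_path_inverted(X, y, alphas, n_reweightings=5, eps=1e-10):
    # proposed ordering: sweep over all alphas at each reweighting level, so
    # consecutive alphas warm-start each other with comparable weights.
    n_features = X.shape[1]
    ws = [np.zeros(n_features) for _ in alphas]
    all_weights = [np.ones(n_features) for _ in alphas]
    for _ in range(n_reweightings):
        w = np.zeros(n_features)
        for i, alpha in enumerate(alphas):
            w = solve_weighted(X, y, alpha, all_weights[i], w)
            ws[i] = w.copy()
            all_weights[i] = 1.0 / (np.abs(w) + eps)
    return np.array(ws)
```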
sparsity = 1 for AdaptiveLassoCV? Really unexpected (normalization issue?). I am also surprised that for alpha -> 0 the two methods reach different error levels (it should be the OLS performance, but I agree that in high dimension there might be different least-squares solutions). And I agree with the loop inversion: very good idea!
I am not sure why the two paths should tend to the same solution as alpha goes to 0: it should be the basis pursuit performance for the Lasso, but for the adaptive lasso I don't know what the limit is expected to be.
You are right, I just thought they would be closer...
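For reference, a hedged statement of the limit being discussed for the plain Lasso (a standard result, nothing specific to this PR): as alpha -> 0+, the Lasso solution tends to the minimum-l1-norm least-squares solution, which in the noiseless over-parametrized case (y in the range of X) is the basis pursuit problem

```latex
\hat{w}_{\mathrm{BP}} \in \operatorname*{arg\,min}_{w \in \mathbb{R}^p} \|w\|_1
\quad \text{s.t.} \quad X w = y .
```

For the reweighted variant, the alpha -> 0 limit depends on the weights accumulated along the path, so it is indeed less clear what it should be.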
@@ -276,13 +279,269 @@ def _more_tags(self):
        return {'multioutput': False}


+class AdaptiveLasso(Lasso):
I am now thinking this is a bad name, as the adaptive Lasso takes weights equal to 1 / |w^ols_j| and performs a single Lasso.
What we implement is rather iterative l1 reweighting (the Candès, Wakin & Boyd paper).
IterativeReweightedLasso seems more correct to me, but it does not pop up on Google, and I don't know if it is good for visibility. When people talk about it in scikit-learn, they say adaptive lasso: scikit-learn/scikit-learn#4912
@agramfort @josephsalmon do you have an opinion?
The main naming issue to me is that an estimator should be well defined mathematically; the implementation is then a detail under the hood for the user.
Here, the algorithm you are proposing is a DC-programming approach (à la Gasso et al. 2009) for solving sparse regression with \ell_{0.5} regularization (also referred to as reweighted l1 by Candès et al.).
Hence, I would be in favor of separating the "theoretical" estimator from the algorithm used to solve it (for instance, a coordinate descent alternative could be considered as another solver for sparse regression with \ell_{0.5} regularization).
I agree that AdaptiveLasso is originally a 2-step procedure:
- OLS
- Lasso reweighted with weights controlled by step 1.
But I think the original article is vague enough on this point (any consistent estimator can be used in the first step, not only OLS), so we can reuse this naming.
Potentially, the exponent \gamma (corresponding to the \ell_q norm used in the Adaptive Lasso paper) could be an optional parameter, something like:
lq_norm = 0.5
(with the possibility to add more variants later on).
So in the end, I wouldn't bother too much about the naming and would stick to AdaptiveLasso as a good shortcut.
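For concreteness, here is a hedged sketch of the iterative l1 reweighting being discussed, with the exponent exposed as an lq_norm-style parameter (the helper name, the eps smoothing and the use of scikit-learn's Lasso as the inner solver are illustrative choices, not the PR's actual code):

```python
import numpy as np
from sklearn.linear_model import Lasso


def reweighted_lasso(X, y, alpha, lq_norm=0.5, n_reweightings=5, eps=1e-10):
    """Iterative l1 reweighting (Candès-Wakin-Boyd style) as a surrogate for
    sparse regression with an l_q penalty, q = lq_norm. Sketch only."""
    n_features = X.shape[1]
    weights = np.ones(n_features)
    w = np.zeros(n_features)
    for _ in range(n_reweightings):
        # weighted Lasso via the usual column-rescaling trick:
        # penalizing weights[j] * |w[j]| is a plain Lasso on X[:, j] / weights[j]
        clf = Lasso(alpha=alpha, fit_intercept=False, max_iter=10_000)
        clf.fit(X / weights, y)
        w = clf.coef_ / weights
        # majorization weights for |w_j|^q (up to a constant factor):
        # |w_j|^(q - 1), smoothed by eps to avoid division by zero
        weights = 1.0 / (np.abs(w) + eps) ** (1.0 - lq_norm)
    return w
```

With lq_norm close to 1 the reweighting changes almost nothing (plain Lasso), while smaller values sparsify more aggressively, which is one way to expose the \gamma / \ell_q trade-off mentioned above.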
It's way faster this way, but some gap scaling is wrong somewhere: we get negative gaps in the doc examples and on news20:
The branch was force-pushed from 58568db to ddd12bb.
closes #165