
Problem with the pmax option and the gaussian family... #40

Open
michaelcoconnor opened this issue May 20, 2019 · 2 comments

@michaelcoconnor

It seems that if I set pmax=n, where n is any integer, execution doesn't get past the following lines in glmnet.py (around line 315):

```python
nx = options['pmax']
if len(nx) == 0:
    nx = min(ne*2 + 20, nvars)
```

Of course, an integer has no length, so calling `len()` on it raises a `TypeError`; that seems to be the problem. It appears to originate in glmnetSet.py, where the default value of pmax is set to `scipy.empty([0])`, which has a length of zero.
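A minimal reproduction of the failure, plus a sentinel check that accepts both the current empty-array default and a `None` default. `is_unset` is a hypothetical helper, not part of glmnet_python; it only assumes the option is `None`, an empty array or sequence, or a scalar such as an int.

```python
def is_unset(value):
    """Return True if an option like pmax was left at its 'unset' default.

    None and any zero-length sequence/array (e.g. scipy.empty([0])) count as
    unset; scalars such as an int (which have no len()) count as set.
    """
    if value is None:
        return True
    try:
        return len(value) == 0
    except TypeError:
        # int, numpy scalars and 0-d arrays have no usable len()
        return False


# The original check fails for a plain integer: int has no len().
try:
    len(8)
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: object of type 'int' has no len()

print(is_unset(None), is_unset([]), is_unset(8))  # True True False
```

A check like this would let the integer case work without changing the default in glmnetSet.py.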

When I ran into this, I attempted a fix by replacing `scipy.empty([0])` in glmnetSet.py with `None` and revising the code around line 315 of glmnet.py to:

```python
nx = options['pmax']
if nx is None:
    nx = min(ne*2 + 20, nvars)
```
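With `None` as the default, both limits can be resolved in one place. This is a sketch, assuming (as in R's glmnet) that dfmax defaults to `nvars + 1`; the pmax default is the `min(ne*2 + 20, nvars)` expression from glmnet.py. `resolve_limits` is a hypothetical helper and its positivity check is an illustrative addition, not something glmnet_python is known to do.

```python
def resolve_limits(nvars, dfmax=None, pmax=None):
    """Return (ne, nx): the dfmax and pmax values handed to the Fortran code."""
    # Assumed default, borrowed from R glmnet: dfmax = nvars + 1.
    ne = nvars + 1 if dfmax is None else int(dfmax)
    # pmax default as computed in glmnet.py: min(ne*2 + 20, nvars).
    nx = min(ne * 2 + 20, nvars) if pmax is None else int(pmax)
    if ne < 1 or nx < 1:
        raise ValueError("dfmax and pmax must be positive")
    return ne, nx


print(resolve_limits(10))          # (11, 10)
print(resolve_limits(10, pmax=8))  # (11, 8)
```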

With that change, a run with pmax=nvars works fine. However, if I set pmax < nvars (say 8 instead of 10), I get `Warning: Non-fatal error in glmnet library call...`, with an error code that changes as I change pmax.

I have traced where nx is passed to the Fortran code but can't see anything there that would cause an error (though I'm no expert in any of this).
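One possible reading of the varying codes: in the glmnet Fortran sources and the R glmnet documentation, a negative `jerr` is a non-fatal error with partial output, where `-k` means convergence was not reached at the k-th lambda and `-10000 - k` means the number of non-zero coefficients exceeded pmax (nx) at the k-th lambda, so only solutions for lambdas 1 to k-1 are returned. If that convention applies here, the code would change with pmax because k does. `describe_jerr` is a hypothetical helper sketching this decoding for the gaussian family; the exact codes should be checked against the Fortran source bundled with the project.

```python
def describe_jerr(jerr):
    """Decode a glmnet Fortran error flag (assumed convention, see above)."""
    if jerr == 0:
        return "no error"
    if jerr > 0:
        return f"fatal error (jerr={jerr}); no usable output"
    # Negative values are non-fatal: output is valid for lambdas 1..k-1.
    if -20000 < jerr <= -10000:
        k = -jerr - 10000
        return f"non-zero coefficients exceeded pmax at lambda #{k}; path truncated"
    if jerr <= -20000:
        # Family-specific codes are not decoded in this sketch.
        return f"other non-fatal error (jerr={jerr}); not decoded here"
    k = -jerr
    return f"convergence not reached at lambda #{k}; path truncated"


print(describe_jerr(-10023))
```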

It then occurred to me that, like the participants in this matter, I found the actual meaning of dfmax and pmax obscure, thanks in no small measure to the vague wording of this. So I tried setting pmax=None (in my modified code) and dfmax=n for various values of n. No errors occurred, but with n set to, say, 2, the number of non-zero betas was unaffected and still exceeded 2.

So I'm at a loss as to how to realize the promise of dfmax and pmax, and I don't know whether my fix for the integer problem is really OK.
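To see whether dfmax had any effect, it may help to count non-zero coefficients at every lambda rather than only at the end of the path. If, as in R's glmnet, dfmax acts as a stopping rule for the path rather than a hard cap on each solution, the effect would show up as a shorter path whose last solution can still exceed dfmax. This sketch assumes the coefficients are available as an `nvars x nlambda` array (one column per lambda); the helper names and the toy numbers are made up for illustration.

```python
import numpy as np


def nonzero_per_lambda(beta, tol=0.0):
    """Count coefficients with |beta| > tol in each column (one column per lambda)."""
    beta = np.asarray(beta)
    return np.count_nonzero(np.abs(beta) > tol, axis=0)


def first_lambda_exceeding(beta, limit):
    """Index of the first lambda whose model has more than `limit` non-zeros, or None."""
    counts = nonzero_per_lambda(beta)
    over = np.flatnonzero(counts > limit)
    return int(over[0]) if over.size else None


# Toy 4-variable x 3-lambda coefficient path (made-up numbers).
beta = np.array([[0.0, 0.5, 0.9],
                 [0.0, 0.0, 0.3],
                 [0.0, 0.2, 0.4],
                 [0.0, 0.0, 0.1]])
print(nonzero_per_lambda(beta))         # [0 2 4]
print(first_lambda_exceeding(beta, 2))  # 2
```

Comparing the number of columns and these counts for runs with and without dfmax would show whether the option truncates the path at all.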

@michaelcoconnor

I have just found some additional information on pmax and dfmax (scroll about halfway down, to the elnet call arguments). There, as in this project's code, pmax is called nx internally and dfmax is called ne.

@michaelcoconnor

I ran the competing python-glmnet project's code for the gaussian (linear) family and modified it to accept a pmax entry (it already has max_features, which is the same as dfmax).

The results matched. Without those options set, the coefficients are the same as those produced by this project, and attempting to use pmax gives much the same errors.
