Categorical takes C-1 inputs for C classes #55

Closed
rossviljoen opened this issue Nov 4, 2021 · 11 comments

Comments

@rossviljoen
Contributor

rossviljoen commented Nov 4, 2021

Currently, the CategoricalLikelihood is defined to take a vector of C-1 inputs to produce a distribution with C classes by appending a 0 to the input vector before going through the softmax.
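To make the convention concrete, here is a minimal sketch in plain Python (not the package's actual Julia code). The names `softmax` and `categorical_probs` are hypothetical and only illustrate the "append a 0, then softmax" step described above.

```python
import math

def softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_probs(inputs):
    """Map C-1 inputs to C class probabilities by appending a fixed 0
    (the input of the reference class) before the softmax."""
    return softmax(list(inputs) + [0.0])

# Two inputs give a distribution over C = 3 classes.
probs = categorical_probs([1.0, -0.5])
```

The last entry of `probs` belongs to the reference class, whose input is pinned to 0.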

This is fine for simple stuff, but I think there are some cases where you'd want to give all C inputs - e.g. if you're doing multi-class classification with a different kernel for each class, I don't think this would be possible with the current version?

Should I make a PR to change it or is there a reason to keep it as is? (would always still be possible to use the current version by appending a zero yourself)

@willtebbutt

@devmotion
Member

The motivation for fixing one input was to ensure that the mapping is invertible: we map C-1 inputs to the C-1 dimensional simplex. It is the natural generalization of the logistic function, as used e.g. in multinomial logistic regression. I can see though that it can be a bit inconvenient sometimes.
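The invertibility argument can be checked numerically. The following is a sketch in plain Python with hypothetical helper names (`to_probs`, `to_inputs`); it assumes the last class is the reference class with input 0, as in the convention discussed here.

```python
import math

def softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def to_probs(inputs):
    """Forward map: C-1 inputs -> C probabilities (reference input fixed to 0)."""
    return softmax(list(inputs) + [0.0])

def to_inputs(probs):
    """Inverse map: log-ratio of each class against the last (reference) class."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]

inputs = [0.7, -1.2, 2.0]
recovered = to_inputs(to_probs(inputs))  # round trip should return the inputs
```

Because the reference input is pinned, every point in the interior of the simplex corresponds to exactly one input vector, so the round trip recovers `inputs` up to floating-point error.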

@rossviljoen
Contributor Author

What do you think then - change it or leave as is?

@theogf
Member

theogf commented Jan 27, 2022

I have been dealing with categorical likelihoods again recently, and I think both are just as valid (interestingly, having C inputs adds an unnecessary degree of freedom, and I am not sure what the effect on inference is).
I also wanted to add that the value of the appended input (0) should be changeable by the user.
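Both points can be illustrated with a short sketch in plain Python. The function `probs_with_reference` and its `reference` parameter are hypothetical, not the package's API. It shows that with C inputs a common shift leaves the probabilities unchanged (the extra degree of freedom), and that appending a value `r` instead of 0 is equivalent to shifting the C-1 inputs by `-r`.

```python
import math

def softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def probs_with_reference(inputs, reference=0.0):
    """C-1 inputs -> C probabilities, appending a user-chosen reference value."""
    return softmax(list(inputs) + [reference])

# Extra degree of freedom with C inputs: adding a constant changes nothing.
full = [0.3, -1.0, 2.5]
shifted = [x + 4.0 for x in full]
p_full, p_shifted = softmax(full), softmax(shifted)

# A non-zero reference r is the same as reference 0 with inputs shifted by -r.
r = 1.5
p_ref = probs_with_reference([0.3, -1.0], reference=r)
p_zero = probs_with_reference([0.3 - r, -1.0 - r], reference=0.0)
```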

I will make a PR to allow for all these options, maybe I can find an elegant formulation.

Related to this is #58

@theogf
Member

theogf commented Mar 9, 2022

Solved by #61 I believe

@theogf theogf closed this as completed Mar 9, 2022
@theogf
Member

theogf commented Mar 16, 2022

@devmotion Could you comment on the exchangeability of the classes when using the C-1 inputs? Would it still be valid?

@devmotion
Member

I'm not sure, what exactly do you mean?

@theogf
Member

theogf commented Mar 16, 2022

With C inputs and C classes, I can interchange any two classes by interchanging their inputs, right?
But is that also true for C-1 inputs?
In other words, is a simplex invariant under permutations?

@devmotion
Member

If you interchange two of the C-1 inputs, then the probabilities of the corresponding two classes will be interchanged as well. And if you want to interchange some class with the reference class, you can either change the reference class or set its input to the additive inverse and subtract it from all other inputs. Is that what you're after?

E.g., if C = 3, then in the case of C - 1 = 2 inputs the vector of class probabilities is computed as softmax([input1, input2, 0]) (by our convention for the reference class). So if you swap input1 and input2, then the probabilities for the first and the second class are swapped. If you want to swap e.g. the first and the third class, then you could just use the first class as reference class instead of the third one. Alternatively, since softmax is shift-invariant we have softmax([0, input2, input1]) = softmax([-input1, input2 - input1, 0]), i.e., you can multiply the first input by -1 and subtract it from all other inputs to interchange the probabilities of classes 1 and 3, without changing the reference class or changing the other class probabilities.
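The C = 3 example can be verified numerically. This is a sketch in plain Python with arbitrary illustrative values for `input1` and `input2`. It checks both the swap of the two free inputs and the swap with the reference class via the shift-invariance identity.

```python
import math

def softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

input1, input2 = 0.8, -0.3  # arbitrary illustrative values

# Probabilities under the convention softmax([input1, input2, 0]).
original = softmax([input1, input2, 0.0])

# Swapping the two free inputs swaps the probabilities of classes 1 and 2.
swapped_12 = softmax([input2, input1, 0.0])

# Swapping class 1 with the reference class 3 while keeping the reference
# input at 0: negate input1 and subtract it from the other input.
swapped_13 = softmax([-input1, input2 - input1, 0.0])
```

Here `swapped_13` equals `original` with its first and third entries exchanged, and the second entry is unchanged.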

@theogf
Member

theogf commented Mar 17, 2022

Thanks, that is really insightful. My PI had doubts about this version and was arguing about exchangeability, but I could not find proper arguments.

So, interestingly, I ran a few experiments with my logistic-softmax link. On a simple 1-D example, I generate data with C-1 inputs and fit it with both C-1 and C GPs.
The C-1 inputs consistently provide a better estimate of the true categorical probabilities, but the log-likelihood is worse than with C inputs!

@devmotion
Member

devmotion commented Mar 17, 2022

The C-1 parameterization is common in multinomial logistic regression (and, of course, logistic regression): https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_set_of_independent_binary_regressions. With C-1 inputs one also has the nice interpretation of the inputs as log odds relative to the reference class, which is lost in the case of C inputs.

@theogf
Member

theogf commented Mar 17, 2022

Sure! I think he had in mind processes where the order matters, like the stick-breaking process https://en.wikipedia.org/wiki/Dirichlet_process#The_stick-breaking_process, but probably got confused.


3 participants