Categorical takes C-1 inputs for C classes #55

Currently, the CategoricalLikelihood is defined to take a vector of C-1 inputs and produce a distribution over C classes by appending a 0 to the input vector before passing it through the softmax. This is fine for simple cases, but I think there are cases where you'd want to supply all C inputs - e.g. if you're doing multi-class classification with a different kernel for each class, I don't think this would be possible with the current version?

Should I make a PR to change it, or is there a reason to keep it as is? (It would always still be possible to get the current behaviour by appending a zero yourself.)

@willtebbutt
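For concreteness, here is a minimal sketch of the current behaviour in plain Julia (a standalone softmax helper for illustration, not the package's actual API):

```julia
# Map C-1 latent inputs to C class probabilities by appending a zero
# (the reference class) before the softmax, as the issue describes.
function softmax(x::AbstractVector)
    e = exp.(x .- maximum(x))  # subtract the maximum for numerical stability
    return e ./ sum(e)
end

f = [0.5, -1.0]            # C - 1 = 2 inputs
p = softmax(vcat(f, 0.0))  # distribution over C = 3 classes
@assert length(p) == 3 && isapprox(sum(p), 1.0)
```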
Comments
The motivation for fixing one input was to ensure that the mapping is invertible: we map C-1 inputs to the (C-1)-dimensional simplex. It is the natural generalization of the logistic function, as used e.g. in multinomial logistic regression. I can see, though, that it can be a bit inconvenient sometimes.
What do you think then - change it or leave as is?
I have been dealing with categorical likelihoods again recently and I think both are just as valid (interestingly, having C inputs adds an unnecessary degree of freedom, and I am not sure what the effects on inference are). I will make a PR to allow for all these options; maybe I can find an elegant formulation. Related to this is #58.
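To illustrate the extra degree of freedom mentioned above, a quick standalone check (plain Julia, with the softmax helper inlined, not the package API): adding a constant to all C inputs leaves the softmax output unchanged, so the map from C inputs to C probabilities is not invertible.

```julia
softmax(x) = (e = exp.(x .- maximum(x)); e ./ sum(e))

# Shifting every input by the same constant yields the same distribution,
# so infinitely many input vectors correspond to one set of probabilities.
g = [1.0, 2.0, -0.5]
@assert softmax(g) ≈ softmax(g .+ 3.7)
```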
Solved by #61, I believe.
@devmotion Could you comment on the exchangeability of the classes when using the C-1 inputs? Would it still be valid?
I'm not sure - what exactly do you mean?
In the case of C inputs and C classes, I can interchange any two classes by interchanging their inputs, right?
If you interchange two of the C-1 inputs, then the probabilities of the corresponding two classes will be interchanged as well. And if you want to interchange some class with the reference class, you can either change the reference class or set its input to the additive inverse and subtract it from all other inputs. Is that what you're after? E.g., if the inputs are (f_1, ..., f_{C-1}) and you want to swap class i with the reference class, you replace f_i with -f_i and every other f_j with f_j - f_i.
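A standalone numerical check of that recipe (plain Julia, softmax inlined; the inputs and the index i are arbitrary illustrative values):

```julia
softmax(x) = (e = exp.(x .- maximum(x)); e ./ sum(e))

f = [0.8, -0.3, 1.5]  # C - 1 = 3 inputs, so C = 4 classes
i = 2                 # swap class i with the reference class (class C)
g = f .- f[i]         # subtract f_i from every other input ...
g[i] = -f[i]          # ... and set input i to its additive inverse

p  = softmax(vcat(f, 0.0))
p2 = softmax(vcat(g, 0.0))
@assert p2[i] ≈ p[end] && p2[end] ≈ p[i]                   # i and C swapped
@assert all(p2[j] ≈ p[j] for j in eachindex(f) if j != i)  # others unchanged
```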
Thanks, that is really insightful. My PI was having doubts about this version and was arguing about the exchangeability, but I could not find proper arguments. So, interestingly, I made a few experiments with my logistic-softmax link: on a simple 1-D example I generated data with C-1 inputs and fit it with both C-1 and C GPs.
The C-1 parameterization is common in multinomial logistic regression (and, of course, logistic regression): https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_set_of_independent_binary_regressions
With C-1 inputs one also has the nice interpretation of the inputs as log odds, which is lost in the case of C inputs.
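A quick standalone check of that log-odds interpretation (plain Julia, softmax inlined): with the implicit 0 input for the reference class C, each input f_i is exactly log(p_i / p_C).

```julia
softmax(x) = (e = exp.(x .- maximum(x)); e ./ sum(e))

f = [0.4, -1.2]
p = softmax(vcat(f, 0.0))
# Each input equals the log odds of its class against the reference class.
@assert all(log(p[i] / p[end]) ≈ f[i] for i in eachindex(f))
```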
Sure! I think he directly had in mind processes where the order matters, like the stick-breaking process (https://en.wikipedia.org/wiki/Dirichlet_process#The_stick-breaking_process), but probably got confused.