
About the y_likelihoods #306

Open
Bian-jh opened this issue Sep 8, 2024 · 6 comments

Comments

@Bian-jh commented Sep 8, 2024

Sorry to bother you. I have recently started working in the area of compression. Taking VARIATIONAL IMAGE COMPRESSION WITH A SCALE HYPERPRIOR as an example, what is the meaning of each element in the y_likelihoods matrix ("likelihoods": {"y": y_likelihoods})?

@YodaEmbedding (Contributor) commented Sep 8, 2024

Please see the following two sections:

Briefly, each element of y is encoded using a probability distribution p. The values in y_likelihoods are the corresponding probabilities p(y).

A given element $\hat{y}_i \in \mathbb{Z}$ of the latent tensor $\boldsymbol{\hat{y}}$ is compressed using its encoding distribution $p_{{\hat{y}}_i} : \mathbb{Z} \to [0, 1]$, as visualized in the figure below. The rate cost for encoding $\hat{y}_i$ is the negative log-likelihood, $R_{{\hat{y}}_i} = -\log_2 p_{{\hat{y}}_i}({\hat{y}}_i)$, measured in bits. Afterward, the exact same encoding distribution is used by the decoder to reconstruct the encoded symbol.

[Figure: visualization of an encoding distribution used for compressing a single element $\hat{y}_i$. The height of the bin $p_{\hat{y}_i}(\hat{y}_i)$ is highlighted, since the rate equals the negative log of this bin.]
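For concreteness, here is a minimal sketch of how one might inspect y_likelihoods and estimate the rate with a pretrained CompressAI model. The quality level, input size, and random dummy input are arbitrary choices for illustration:

```python
import torch
from compressai.zoo import bmshj2018_hyperprior

# Scale-hyperprior model from the CompressAI model zoo.
net = bmshj2018_hyperprior(quality=3, pretrained=True).eval()

x = torch.rand(1, 3, 256, 256)  # dummy input image batch

with torch.no_grad():
    out = net(x)

y_likelihoods = out["likelihoods"]["y"]  # p(y_i) for every latent element

# Rate: each element costs -log2 p(y_i) bits; sum over all elements.
total_bits = -torch.log2(y_likelihoods).sum()
bpp = total_bits / (x.size(0) * x.size(2) * x.size(3))  # bits per pixel
print(f"estimated rate: {bpp.item():.4f} bpp")
```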

@Bian-jh (Author) commented Sep 9, 2024

Thank you for your reply! I really appreciate it! I still have some issues to ask:
(1) I removed the compression of z and directly predict scales from the latent y. Then I pass the scales into GaussianConditional(None) for compression. (I want to use the compression task as an auxiliary task.) Does this make sense?
(2) I printed the max value of y_likelihoods, and it reaches 1. Is that normal? I feel so confused!
Hoping for your reply!

@YodaEmbedding (Contributor) commented Sep 9, 2024

  1. The exact same scales need to be available at the decoder, so you will need to transmit them to the decoder as well.
    ENCODER   ==== Communication channel ====>  DECODER
    
    Everything should work fine as long as the decoder can reconstruct the exact same distribution used for encoding.
  2. A likelihood of $p_{{\hat{y}}_i}({\hat{y}}_i)= 1$ means the rate cost is $R_{{\hat{y}}_i} = -\log_2 p_{{\hat{y}}_i}({\hat{y}}_i) = 0$ bits. That's the best-case scenario — being able to predict exactly what the decoded value is, without any uncertainty.
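For reference, here is a minimal training-time sketch of the setup described in question 1, with GaussianConditional consuming externally predicted scales. The scale_predictor module and the channel count of 192 are hypothetical stand-ins for whatever network actually predicts the scales:

```python
import torch
from torch import nn
from compressai.entropy_models import GaussianConditional

# Hypothetical scale predictor; in the setup described above it takes y
# itself as input. Caveat: the decoder does not have y, so in a real codec
# the scales must come from information the decoder can reconstruct.
scale_predictor = nn.Sequential(
    nn.Conv2d(192, 192, kernel_size=3, padding=1),
    nn.Softplus(),  # keep predicted scales positive
)

gaussian_conditional = GaussianConditional(None)

y = torch.randn(1, 192, 16, 16)  # dummy latent
scales = scale_predictor(y)

# Training-time forward: y is perturbed with uniform noise (rounded at
# eval time), and each element's likelihood p(y_i) is evaluated under a
# Gaussian with the predicted scale.
y_hat, y_likelihoods = gaussian_conditional(y, scales)

print(y_likelihoods.max())  # may approach 1 for confidently predicted elements
```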

@Bian-jh (Author) commented Sep 23, 2024

Thank you so much! I have another question: how can I calculate the y_likelihoods if I don't use quantization (adding uniform noise)? (I want to apply this to my research domain.)

@Bian-jh (Author) commented Sep 28, 2024

Sorry to bother you again. Each element of y_likelihoods represents the value p(y_i). Is p(y_i) the probability that the symbol y_i appears? If so, how is it ensured that the probabilities of all symbols sum to 1?

@YodaEmbedding (Contributor) commented Sep 30, 2024

For the $i$-th symbol, there is a probability distribution $f_i : \mathbb{Z} \to [0, 1]$. (I called it $p_{{\hat{y}}_i}$ earlier.) By definition, $\sum_{t=-\infty}^{\infty} f_i(t) = 1$.

But how is $f_i$ determined? For the mean-scale hyperprior,

  • To encode $\hat{y}_i$, we use the $\mu_i$ and $\sigma_i$ output by $h_s$ and construct a Gaussian from them: $f_i(t) = \mathcal{N}(t ; \mu_i, \sigma_i^2)$. (A sketch of this computation follows below.)
  • To encode $\hat{z}_i$, we use whichever $f_i$ is learned by the “fully factorized” entropy bottleneck's small fully-connected network for the corresponding channel.
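To make the binning explicit: since $\hat{y}_i$ is an integer, the continuous Gaussian density is converted to a probability mass by integrating it over the unit-width bin centered on each integer. Here is a self-contained sketch of that computation (CompressAI's GaussianConditional does essentially this internally, plus a lower bound on the likelihood for numerical stability; the values of mu and sigma below are arbitrary):

```python
import torch

def gaussian_bin_likelihood(y_hat, mu, sigma):
    """Probability mass of integer symbol y_hat under N(mu, sigma^2),
    integrated over the unit-width bin [y_hat - 0.5, y_hat + 0.5]."""
    dist = torch.distributions.Normal(mu, sigma)
    return dist.cdf(y_hat + 0.5) - dist.cdf(y_hat - 0.5)

mu = torch.tensor(0.3)
sigma = torch.tensor(1.2)

# The unit bins partition the real line, so the masses over all
# integers sum to 1 (up to the truncation of the range here).
symbols = torch.arange(-20, 21, dtype=torch.float32)
probs = gaussian_bin_likelihood(symbols, mu, sigma)
print(probs.sum())  # ~1.0

# Rate for encoding one particular symbol:
p = gaussian_bin_likelihood(torch.tensor(1.0), mu, sigma)
print(-torch.log2(p))  # bits
```

This is also why the probabilities of all symbols sum to 1, as asked above: the bins tile $\mathbb{Z}$, so the Gaussian's total mass is distributed among them.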
