
The compress and decompress methods of JointAutoregressiveHierarchicalPriors (@register_model("mbt2018")) are very complex #294

Open
wei-mei opened this issue Jun 1, 2024 · 1 comment

wei-mei commented Jun 1, 2024

When I wanted to migrate this program to sequence signal data, I found that the compress and decompress methods of the class

@register_model("mbt2018")
JointAutoregressiveHierarchicalPriors

are very complex. Why are they so much more complex than the compress and decompress functions of the previous models?

s = 4  # scaling factor between z and y
kernel_size = 5  # context prediction kernel size
padding = (kernel_size - 1) // 2

y_height = z_hat.size(2) * s
y_width = z_hat.size(3) * s
  • What is the purpose of the above s? Is it necessary?
y_q = self.gaussian_conditional.quantize(y_crop, "symbols", means_hat)
  • Why is "symbols" quantization used here instead of "dequantize", which is the opposite of what is done during training?
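For reference, the two modes differ roughly as follows. This is a minimal pure-Python sketch of the idea behind GaussianConditional.quantize, assuming the usual convention of rounding the residual against the predicted mean; the real method operates on tensors.

```python
# Sketch (simplified, scalar version) of the two quantize modes.
def quantize(y, mean, mode):
    symbol = round(y - mean)       # integer residual against the predicted mean
    if mode == "symbols":
        return symbol              # integers: what the entropy coder consumes
    elif if_dequant := (mode == "dequantize"):
        return symbol + mean       # float reconstruction, as used at training/eval
    raise ValueError(mode)

print(quantize(3.4, 1.0, "symbols"))     # 2
print(quantize(3.4, 1.0, "dequantize"))  # 3.0
```

The "symbols" mode exists because the range coder needs integer symbols to encode; the decoder later adds the mean back to recover the dequantized value.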

Also, do I have to train until the aux loss is very small to get similar results from the forward function and the compress and decompress functions?

Thank you very much for your answer!


YodaEmbedding (Contributor) commented Jun 4, 2024

The autoregressive portion requires a loop of many steps at runtime. This is because the information needed for decoding only becomes available as the tensor is decoded pixel by pixel, from top-left to bottom-right. In contrast, during training, all the information about the tensor is available immediately, so the "decoding" can be done in a single step.

[Figure: Autoregressive loop. The above operations are repeated in a loop in raster-scan order (top-left to bottom-right). Previously decoded (purple) pixels are used to help predict the current pixel (yellow).]
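The raster-scan loop can be sketched as follows. This is a hypothetical, heavily simplified version: a single-channel latent and a toy neighbour-averaging predictor stand in for the real masked-convolution context model and range decoder, but the data dependency is the same, so each pixel's mean can only be predicted after its neighbours are decoded.

```python
# Sketch of an autoregressive raster-scan decode loop (simplified).
def decode_autoregressive(h, w, symbols, predict_mean):
    y_hat = [[0.0] * w for _ in range(h)]
    for i in range(h):                          # top-left to bottom-right
        for j in range(w):
            # The mean is predicted from already-decoded pixels only.
            mean = predict_mean(y_hat, i, j)
            # Dequantize: decoded integer symbol + predicted mean.
            y_hat[i][j] = symbols[i][j] + mean
    return y_hat

# Toy context predictor (hypothetical): average of left and top neighbours.
def predict_mean(y_hat, i, j):
    left = y_hat[i][j - 1] if j > 0 else 0.0
    top = y_hat[i - 1][j] if i > 0 else 0.0
    return 0.5 * (left + top)

symbols = [[1, 0], [0, 1]]
print(decode_autoregressive(2, 2, symbols, predict_mean))
# [[1.0, 0.5], [0.5, 1.5]]
```

Because each iteration depends on the previous ones, this loop cannot be parallelized across pixels, which is exactly why compress/decompress are so much longer and slower than the single-pass forward function.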

Also, previous models contain some amount of code for runtime decoding too, but it's hidden inside the EntropyBottleneck and GaussianConditional classes.
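By contrast, a non-autoregressive (factorized or hyperprior) model can quantize and entropy-code the whole tensor at once, since no pixel's distribution depends on another decoded pixel. A minimal sketch, with hypothetical names standing in for the real EntropyBottleneck machinery:

```python
# Sketch (hypothetical, simplified): one-shot compress with no per-pixel loop,
# as in models whose coding is fully handled inside EntropyBottleneck.
def compress_factorized(y):
    # Quantize every value at once; a real model would then range-code
    # these symbols against a learned factorized distribution.
    return [round(v) for row in y for v in row]

print(compress_factorized([[1.2, 0.8], [2.6, -0.4]]))  # [1, 1, 3, 0]
```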

