
Adding new codec into CompressAI #277

Open
ali9609 opened this issue Mar 23, 2024 · 4 comments

Comments


ali9609 commented Mar 23, 2024

Hi,

First of all, thanks for the great work; it really simplifies the maintenance and development of codecs, which are otherwise very hard to do. I am trying to add another codec into the CompressAI library, but I need some quick suggestions since it works a bit differently than typical codecs.

So I have the following feature compression codec from Ahuja et al., CVPR 2023.

[Figure: codec architecture diagram from Ahuja et al., CVPR 2023]

The training objective of Ahuja et al., CVPR 2023 remains the same as the original hyperprior architecture by Ballé et al., ICLR 2018; specifically, the rate term is given as follows.

Rate = $\mathbb{E}\left[-\log_{2} p(y \mid z) - \log_{2} p(z)\right]$
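
For reference (this is my own sketch, not code from the paper): the rate term corresponds to summing -log2 of the likelihood tensors that a CompressAI model returns and normalizing by the number of pixels, roughly as in the rate-distortion loss of the CompressAI training example. The helper name rate_bpp is just illustrative.

    import math
    import torch

    def rate_bpp(likelihoods: dict, num_pixels: int) -> torch.Tensor:
        # E[-log2 p(y|z) - log2 p(z)] in bits per pixel, summed over all
        # likelihood tensors, e.g. {"y": y_likelihoods, "z": z_likelihoods}.
        return sum(
            torch.log(l).sum() / (-math.log(2) * num_pixels)
            for l in likelihoods.values()
        )

    # Usage: bpp = rate_bpp({"y": y_likelihoods, "z": z_likelihoods}, N * H * W)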

According to the diagram of the codec from the paper given above, I should do the following, using the hyperprior implementation from compressai/models/google.py.

        y = self.g_a(x)
        z = self.h_a(torch.abs(y))
        z_hat, z_likelihoods = self.entropy_bottleneck(z)  # contributes the log2 p(z) term
        scales_hat = self.h_s(z_hat)
        y_hat, y_likelihoods = self.gaussian_conditional(y, scales_hat)  # contributes the log2 p(y|z) term
        x_hat = self.g_s(y_hat)

As you already know, self.gaussian_conditional depends on scales_hat during both training and inference. However, in the codec shown above, the authors compute self.gaussian_conditional only during training, for the loss calculation, and throw it away during inference. Is there a way I can tweak the code above so that I can do what they are proposing? Thank you very much.

YodaEmbedding (Contributor) commented Mar 23, 2024

For forward, output both y_likelihoods and z_likelihoods during training:

    def forward(self, x, training=None):
        if training is None:
            training = self.training

        y = self.g_a(x)

        # Adapted from FactorizedPrior:
        y_infer_hat, y_infer_likelihoods = self.entropy_bottleneck_y(y.detach())

        if training:
            # Copied from MeanScaleHyperprior:
            z = self.h_a(y)
            z_hat, z_likelihoods = self.entropy_bottleneck(z)
            gaussian_params = self.h_s(z_hat)
            scales_hat, means_hat = gaussian_params.chunk(2, 1)
            y_hat, y_likelihoods = self.gaussian_conditional(
                y, scales_hat, means_hat
            )
            likelihoods = {
                "y": y_likelihoods,
                "z": z_likelihoods,
                "y_infer": y_infer_likelihoods,
            }
        else:
            y_hat = y_infer_hat
            likelihoods = {
                "y_infer": y_infer_likelihoods,
            }

        x_hat = self.g_s(y_hat)

        if not training:
            # Optionally avoid training g_s if training for only inference mode loss.
            # This can be done for any other outputs of y_hat, too.
            # In practice, it shouldn't really matter though.
            # Another easy alternative is just to freeze g_s's weights.
            #
            # x_hat = x_hat.detach()

            # Optional:
            # x_hat = x_hat.clamp_(0, 1)

            pass

        return {
            "x_hat": x_hat,
            "likelihoods": likelihoods,
        }

The compress/decompress can be adapted from FactorizedPrior:

    def compress(self, x):
        y = self.g_a(x)
        y_strings = self.entropy_bottleneck_y.compress(y)
        return {"strings": [y_strings], "shape": y.size()[-2:]}

    def decompress(self, strings, shape):
        assert isinstance(strings, list) and len(strings) == 1
        y_hat = self.entropy_bottleneck_y.decompress(strings[0], shape)
        x_hat = self.g_s(y_hat).clamp_(0, 1)
        return {"x_hat": x_hat}
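
For completeness, a sketch (the class name and channel counts are hypothetical, not from this thread) of how such a model could be declared so that self.entropy_bottleneck_y exists alongside the usual MeanScaleHyperprior modules:

    from compressai.entropy_models import EntropyBottleneck
    from compressai.models import MeanScaleHyperprior

    class FactorizedInferenceHyperprior(MeanScaleHyperprior):
        """Hypothetical model: trained with the hyperprior rate loss, but uses a
        factorized entropy model over y (entropy_bottleneck_y) at inference time."""

        def __init__(self, N=128, M=192, **kwargs):
            super().__init__(N=N, M=M, **kwargs)
            # Extra factorized entropy model for y, used by the inference branch
            # of forward() and by compress()/decompress() above.
            self.entropy_bottleneck_y = EntropyBottleneck(M)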

Unless I'm misunderstanding something, the pretrained hyperprior weights should also work with this architecture. You can load those, then freeze all those weights, then train only entropy_bottleneck_y.

ali9609 (Author) commented Mar 23, 2024

Thank you very much for your response. If I follow the architectural setting provided by Ahuja et al., CVPR 2023, I will have a different number of channels for y (64 channels) and z (8 channels). This further implies that I would have to declare a separate self.entropy_bottleneck2 for y, which means I cannot use the pretrained hyperprior weights, since no entropy_bottleneck2 exists in them. Is there any workaround for this? How do I incorporate this during training?

YodaEmbedding (Contributor) commented Mar 23, 2024

Ah yes. There are a few possible approaches:

  1. Load a model that is pretrained with the hyperprior rate loss. Then freeze its weights and train only entropy_bottleneck_y.
  2. Detach y (this prevents g_a from receiving gradients) and add the likelihoods for entropy_bottleneck_y to the loss. Discard y_infer_hat; in this simple setup it is equal to y_hat anyway.

I've updated the code above with another entropy_bottleneck_y. (Effectively approach 2.)
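
A sketch of approach 1, assuming the hypothetical FactorizedInferenceHyperprior class from the earlier sketch (the checkpoint path and hyperparameters are placeholders): load the pretrained hyperprior weights non-strictly so the missing entropy_bottleneck_y keys are simply skipped, then freeze everything else.

    import torch

    model = FactorizedInferenceHyperprior(N=128, M=192)  # channel sizes illustrative

    # entropy_bottleneck_y has no counterpart in the pretrained checkpoint,
    # so load non-strictly and let it keep its random initialization.
    state_dict = torch.load("hyperprior_checkpoint.pth", map_location="cpu")  # placeholder path
    model.load_state_dict(state_dict, strict=False)

    # Freeze everything except the new entropy model.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("entropy_bottleneck_y")

    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )
    # As in the CompressAI training example, the entropy bottleneck's quantiles
    # are updated via the auxiliary loss (model.aux_loss()), typically with a
    # separate optimizer.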

ali9609 (Author) commented Mar 24, 2024

Thank you very much for your response. I actually tried both approaches, just to have some comparison.

Approach 1:

Idea 1: Freeze everything and train only entropy_bottleneck_y.

Setting: The codec was initially trained for 30 epochs. After adding entropy_bottleneck_y, I trained it again for 30 epochs; since only a single layer was trainable, this was quite fast.

Results:

  • Original Hyperprior Codec: Bpp = 0.24, PSNR = 32.89

  • After training entropy_bottleneck_y and throwing away the hyperprior: Bpp = 0.254, PSNR = 32.24

Remarks: I think the results look reasonable; it is obvious that the factorized prior is not as bitrate-efficient as the hyperprior. I believe it is not possible to achieve exactly the same or a better RD tradeoff this way; some degradation is expected.

Idea 2: Use the code you provided and retrain the codec from scratch. The detach trick makes it possible to do so.

Setting: The codec was trained for 30 epochs. Note that the same alpha was used as in the experiment above, but it resulted in a different tradeoff.

Results:

  • Original Hyperprior Codec: Bpp = 0.24, PSNR = 32.89

  • After adding the codec as given above: Bpp = 0.165, PSNR = 31.6

Remarks: The same value of alpha resulted in a completely different tradeoff. I probably need to increase alpha to reach a similar tradeoff.

I think Idea 1 is more useful to me, as it gives more control and also lets me use the pretrained weights.

Please let me know if something looks unusual or there is room for improvement.
