
Why batchnorm as final layer? #1

Open
Rasmuskh opened this issue Oct 29, 2021 · 2 comments

@Rasmuskh

Hi,
I noticed that you add a batchnorm layer as the final layer of your VGG-like network. Could you explain why this is necessary?

I am using your code to train a ResNet18 model with your BayesBiNN optimizer, and I noticed that adding batchnorm at the output layer is also necessary for this model in order to achieve good performance (with batchnorm at the output layer it performs very well).
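For concreteness, the modification I mean is roughly the following (just a sketch; the number of classes is illustrative):

```python
import torch.nn as nn
from torchvision.models import resnet18

# ResNet18 followed by an extra BatchNorm1d applied to the final logits
base = resnet18(num_classes=10)
model = nn.Sequential(base, nn.BatchNorm1d(10))
```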

@mengxiangming
Collaborator

Hi Rasmuskh,

Thank you for the question; this is an interesting observation.

For the VGG-like network, we simply use the same network structure as the following paper:
Alizadeh, Milad, et al. "An empirical study of binary neural networks' optimization." ICLR 2019.

The code of Alizadeh et al. can be found here.

We did not make a detailed analysis of the network structure and simply used the same one as Alizadeh et al. for ease of comparison. Intuitively, it might be that the final output value without normalization is not suitable for the loss function used, e.g., its absolute magnitude is too large due to the constraint of the binary weights. This can be checked by plotting the histogram of the output values without normalization and comparing it with that of the BN output. I hope this conjecture is helpful.
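For example, the check could be done roughly as follows (a sketch only; `model.final_bn` and `test_loader` are placeholders for the final BN layer and your evaluation data loader):

```python
import torch
import matplotlib.pyplot as plt

pre_bn, post_bn = [], []

def grab(module, inp, out):
    # inp[0] is the un-normalized output of the last layer,
    # out is the same activation after batch normalization
    pre_bn.append(inp[0].detach().flatten().cpu())
    post_bn.append(out.detach().flatten().cpu())

handle = model.final_bn.register_forward_hook(grab)

model.eval()
with torch.no_grad():
    for x, _ in test_loader:
        model(x)
handle.remove()

plt.hist(torch.cat(pre_bn).numpy(), bins=100, alpha=0.5, label="before BN")
plt.hist(torch.cat(post_bn).numpy(), bins=100, alpha=0.5, label="after BN")
plt.legend()
plt.xlabel("final-layer output value")
plt.show()
```

If the "before BN" histogram is much wider or shifted compared with the "after BN" one, that would support the conjecture above.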

Best regards,
Xiangming

@Rasmuskh
Author

Thank you,
That is very helpful :)
I will have a look at the Alizadeh paper.
