
ResNet50: bottom blob of expand layer #18

Open
qilinli opened this issue Oct 13, 2017 · 6 comments

qilinli commented Oct 13, 2017

Hi there,

Thanks for sharing the pre-trained models.
I am studying ResNet50 and have a question about the architecture. It seems that there are quite a few places that differ from the original ResNet.

  1. The data preprocessing is changed from mean subtraction to batch normalization, which has already been noted.

However, I noticed another major difference, in the expanding convolution layers. For example, here is the first one:

layer {
  name: "layer_64_1_conv_expand"
  type: "Convolution"
  bottom: "layer_64_1_conv1"
  top: "layer_64_1_conv_expand"
  ...
}

It shows that the bottom blob comes from "layer_64_1_conv1", whereas it was "conv1_pool" in the original architecture. Is this a deliberate modification? Your results show that you consistently improve the accuracy compared to the original implementation; is this change the reason?


MarcelSimon commented Oct 14, 2017

Hi!
There is a pooling layer in both my implementation and Kaiming's. I can't see what you mean. Could you please provide line numbers for both prototxts?


qilinli commented Oct 14, 2017

@MarcelSimon Sorry, I didn't make that clear. I mean in the prototxt cnn-models/ResNet_preact/ResNet50_cvgj/train.prototxt, lines 295-318, which define the first expanding layer. Yours expands from "layer_64_1_conv1".

In He's implementation, deep-residual-networks/prototxt/ResNet-50-deploy.prototxt (I cannot find a train.prototxt), lines 60-72 define the layer "res2a_branch1" (which corresponds to your expand layer; both use a 1x1 convolution to increase the number of channels), and its bottom is "pool1", which means he expands from the previous pooling layer.

And it's the same for all the expanding layers. I think it is quite a big difference.
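
To make the comparison concrete, here is a rough side-by-side sketch of the two definitions; only the name, type, bottom and top values are taken from the prototxts mentioned above, while the convolution_param lines are my own paraphrase and should be checked against the actual files:

# cnn-models/ResNet_preact/ResNet50_cvgj/train.prototxt (first expand layer, as it is now)
layer {
  name: "layer_64_1_conv_expand"
  type: "Convolution"
  bottom: "layer_64_1_conv1"   # expands from the output of the block's first convolution
  top: "layer_64_1_conv_expand"
  convolution_param { num_output: 256 kernel_size: 1 stride: 1 }   # assumed 1x1 projection
}

# deep-residual-networks/prototxt/ResNet-50-deploy.prototxt (corresponding projection shortcut)
layer {
  name: "res2a_branch1"
  type: "Convolution"
  bottom: "pool1"              # expands from the preceding pooling layer
  top: "res2a_branch1"
  convolution_param { num_output: 256 kernel_size: 1 stride: 1 }   # assumed 1x1 projection
}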


@MarcelSimon

I see, thanks a lot for pointing that out! The difference occurs only at the first expand layer; the other ones are correct.
The batch norm, scale and ReLU are shared because this is the preactivation variant. However, the first expand layer should indeed use conv1_pool as its input. I will add a remark to the README soon.
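
For reference, the corrected first expand layer would presumably look like the following; only the bottom changes, and the remaining parameters are assumed to stay as they are in the current train.prototxt:

layer {
  name: "layer_64_1_conv_expand"
  type: "Convolution"
  bottom: "conv1_pool"   # take the pooled stem output, as in the original ResNet
  top: "layer_64_1_conv_expand"
  # other convolution_param entries unchanged from the current train.prototxt
}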


qilinli commented Oct 17, 2017

Since you mentioned the shared batch norm and scale, it reminds me of another difference between your implementation and He's. If you check their implementation at http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006 (graph)
or
https://github.com/KaimingHe/deep-residual-networks/blob/master/prototxt/ResNet-50-deploy.prototxt (prototxt),
they actually use two batch norm + scale pairs for the two branches, which means they do not share them. You, on the other hand, apply the batch norm + scale after the branch merge, so it is shared.


MarcelSimon commented Oct 18, 2017

The implementation you are referring to is the original ResNet, not the preactivation variant. Please see https://github.com/facebook/fb.resnet.torch/blob/master/models/preresnet.lua and https://github.com/KaimingHe/resnet-1k-layers/blob/master/resnet-pre-act.lua#L63 for the preactivation variant.
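
Schematically (simplified, based on the preactivation references above):

  original ResNet block (post-activation):
    x -> conv -> BN -> scale -> ReLU -> ... -> conv -> BN -> scale   (main branch)
    x -> conv -> BN -> scale                                         (projection branch)
    eltwise sum -> ReLU

  preactivation block, as used in this repository:
    x -> BN -> scale -> ReLU -> conv -> ...   (main branch)
                             -> conv          (expand/projection branch, shares the preactivation)
    eltwise sum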


qilinli commented Oct 19, 2017

I see. Thanks a lot @MarcelSimon
