Model didn't converge. #2

Open
rAm1n opened this issue Jan 26, 2018 · 7 comments

Comments


rAm1n commented Jan 26, 2018

Hi @liorshk

Thanks for sharing your code. It seems clean and well written; however, I had trouble getting it to converge.

I trained it on a filtered version of MsCeleb with 5 million images and 79K identities. Your hyperparameters seem to be identical to those of the TensorFlow implementation davidsandberg/facenet, and I also tried different ones, but I never got more than 65% accuracy on LFW.

I think it's mostly because of the way triplet selection has been implemented. The paper suggests building batches of 1800 images from a certain number of identities (40-45), rather than choosing them completely at random. I tried this, but only with 180 images at most, and it still didn't converge.
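
To make the batch construction described above concrete, here is a minimal sketch of identity-balanced sampling. It is not the code from this repository or from davidsandberg/facenet; the function name `sample_identity_batch` and its parameters `ids_per_batch` and `imgs_per_id` are hypothetical, and `labels` is assumed to be a plain list holding one identity label per dataset image.

```python
import random
from collections import defaultdict


def sample_identity_batch(labels, ids_per_batch, imgs_per_id, rng=random):
    """Return dataset indices for one batch built from a few identities.

    Hypothetical helper: picks `ids_per_batch` identities at random and
    up to `imgs_per_id` images for each of them.
    """
    # Group dataset indices by identity label.
    by_identity = defaultdict(list)
    for index, label in enumerate(labels):
        by_identity[label].append(index)

    # Only identities with at least two images can form an anchor-positive pair.
    eligible = [ident for ident, idxs in by_identity.items() if len(idxs) >= 2]
    chosen = rng.sample(eligible, min(ids_per_batch, len(eligible)))

    batch = []
    for ident in chosen:
        idxs = by_identity[ident]
        batch.extend(rng.sample(idxs, min(imgs_per_id, len(idxs))))
    return batch


# Toy data: 6 identities with 5 images each.
labels = [i // 5 for i in range(30)]
batch = sample_identity_batch(labels, ids_per_batch=3, imgs_per_id=4,
                              rng=random.Random(0))
```

Every identity in such a batch has several positives available, so anchor-positive pairs and candidate negatives can all be mined inside the batch instead of being drawn at random from the whole dataset.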

Do you have any ideas that could help? If you had any success training the model, could you please share your weights too?

Thanks,


ahkarami commented Feb 2, 2018

Dear @rAm1n,
Note that the base CNN in this repository is ResNet18, whereas the TensorFlow version uses Inception-ResNet-V1.
As for triplet selection, which I would also like to understand better, the link below may help:
A PyTorch Implementation for Triplet Networks


rAm1n commented Feb 3, 2018

Hi @ahkarami

Thanks for pointing out the difference in the base network. I am aware of it, but unfortunately I had no luck getting anything better than 65% on LFW. Regardless of the encoder network, 90% or more is definitely achievable with triplet loss.

I think the link you shared is an implementation of this paper, which differs somewhat from FaceNet. I've stopped working on this for a while, but I recommend this paper to you:

How to Train Triplet Networks with 100K Identities?

Also, if you are really interested in embeddings and in solving face verification in an open-set configuration, make sure to have a look at recent work based on angular losses: insightface, sphereface


ahkarami commented Feb 4, 2018

Dear @rAm1n,
Thank you very much for your complete answer.


magwyz commented Aug 31, 2018

Hi @rAm1n,
Did you find a way to get better accuracy on LFW? I am also stuck at 67%.


rAm1n commented Aug 31, 2018

Hi @magwyz

I didn't really continue working on this. If you really want to make it work, maybe start with a softmax version and then fine-tune using triplet loss. Also, re-implementing the triplet selection from the TensorFlow repository might help. And don't forget to experiment with the learning rate. I would guess it will take time to converge, and most probably the loss will drop rapidly after a few hours of training.
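
As a rough illustration of what in-batch triplet selection can look like, here is a sketch of semi-hard negative selection in the sense of the FaceNet paper: for each anchor-positive pair, keep only negatives that are farther from the anchor than the positive but still within the margin. This is an assumption-laden sketch, not the selection code from the TensorFlow repository; the names `select_semi_hard_triplets` and `squared_distance` are hypothetical, and embeddings are assumed to be plain sequences of floats.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_semi_hard_triplets(embeddings, labels, margin, rng=random):
    """Return (anchor, positive, negative) index triplets for one batch.

    Hypothetical helper: a negative qualifies when
    d(a, p) < d(a, n) < d(a, p) + margin (squared distances).
    """
    triplets = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            # Positives share the anchor's identity but are a different image.
            if p == a or labels[p] != labels[a]:
                continue
            pos_dist = squared_distance(embeddings[a], embeddings[p])
            candidates = [
                k for k in range(n)
                if labels[k] != labels[a]
                and pos_dist
                < squared_distance(embeddings[a], embeddings[k])
                < pos_dist + margin
            ]
            # Pairs with no semi-hard negative contribute no triplet.
            if candidates:
                triplets.append((a, p, rng.choice(candidates)))
    return triplets


# Toy batch: two identities, 2-D embeddings.
embeddings = [(0.0, 0.0), (0.3, 0.0), (0.5, 0.0), (2.0, 0.0)]
labels = [0, 0, 1, 1]
triplets = select_semi_hard_triplets(embeddings, labels, margin=0.2)
```

The double loop is quadratic in the batch size and is meant only to show the selection rule; a real training loop would compute the distance matrix in one vectorized step.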


magwyz commented Aug 31, 2018

Thanks @rAm1n for the hints!


tbmoon commented Oct 29, 2018

Hi rAm1n,

https://github.com/tbmoon/facenet

I achieved 90% accuracy on the LFW dataset. If you are interested in my code, feel free to refer to it.


4 participants