
About Implementation #7

Closed
dongqian0206 opened this issue Jan 23, 2019 · 4 comments

Comments

@dongqian0206

Hi Jiacheng,

Thanks for providing your source code.
Following up on the questions in the Issues: does this parameter setting (Dataptb_Distvmf_Modelnvrnn_EnclstmBiFalse_Emb100_Hid400_lat50_lr10.0_drop0.5_kappa35.0_auxw0.0001_normfFalse_nlay1_mixunk0.0_inpzTrue_cdbit0_cdbow0) achieve the best performance on the PTB dataset? I will use your proposed model as a baseline.

Here is the result:
| End of training | Recon Loss 4.62 | KL Loss 0.16 | Test Loss 4.78 | Test PPL 119.00

As for the results (NLL and PPL), you mentioned in the paper that the reported values are actually an upper bound on the true NLL, computed from the ELBO by sampling z.

Does this mean that you first draw a sample from the vMF or Gaussian distribution, then feed it into the decoder (standard and inputless modes), and finally compute the sum of the reconstruction loss and the KL loss?

@jiacheng-xu
Owner

Hi Dong,
Yes, that's the right configuration as far as I recorded. My suggestion is to train a bit longer; you should be able to get under 105 without too much effort. The PyTorch LM example may be a helpful resource.
And yes.
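
For reference, a minimal sketch of the estimate described above. The interface here is purely illustrative (`model.encode`, `posterior.rsample`, `model.decode`, `posterior.kl_divergence` are hypothetical names, not this repo's actual API):

```python
import torch

def nll_ppl_bound(model, batch, n_samples=1):
    # Hypothetical interface (illustrative only, not the repo's actual API):
    #   model.encode(x)            -> posterior q(z|x) (vMF or Gaussian)
    #   posterior.rsample()        -> a sample z from q(z|x)
    #   model.decode(batch, z)     -> summed cross-entropy reconstruction loss
    #   posterior.kl_divergence()  -> KL(q(z|x) || p(z))
    n_words = batch["target"].ne(0).sum().float()   # non-padding token count
    losses = []
    for _ in range(n_samples):
        posterior = model.encode(batch["input"])
        z = posterior.rsample()                      # draw z from vMF / Gaussian
        recon = model.decode(batch, z)               # reconstruction loss (summed)
        kl = posterior.kl_divergence()               # KL term
        losses.append((recon + kl) / n_words)        # -ELBO per word
    nll_bound = torch.stack(losses).mean()           # upper bound on the true NLL
    return nll_bound, torch.exp(nll_bound)           # (NLL bound, PPL)
```

With the per-word test loss quoted above (4.62 recon + 0.16 KL = 4.78), this gives exp(4.78) ≈ 119, matching the reported PPL.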

@dongqian0206
Author


Hi Jiacheng. Thanks for your reply.

I am also curious about the implementation of the vanilla NVRNN (the baseline in your paper). I tried different KL annealing schedules (linear and logistic) to alleviate the posterior collapse issue, and it seems that different datasets need different schedules to achieve good performance (obviously). Your reported performance for the vanilla NVRNN is better than that of my implementation; the differences are the choice of optimizer and the KL annealing parameters (e.g., epochs, start step, duration). Do you have any insight on how to train this vanilla model? (The aim of the proposed model is to alleviate this issue. :D)
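
For reference, a minimal sketch of the two schedule shapes mentioned above (linear and logistic annealing of the KL weight); the parameter names and values here are illustrative, not the ones used in the paper:

```python
import math

def linear_anneal(step, start=2000, duration=10000):
    # KL weight ramps linearly from 0 to 1 between `start` and `start + duration`.
    return min(max((step - start) / duration, 0.0), 1.0)

def logistic_anneal(step, midpoint=5000, steepness=0.002):
    # Sigmoid-shaped ramp: ~0 early in training, ~1 once `step` passes `midpoint`.
    return 1.0 / (1.0 + math.exp(-steepness * (step - midpoint)))

# Usage: loss = recon_loss + linear_anneal(global_step) * kl_loss
```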

Thanks.

Best,
Dong

@jiacheng-xu
Owner

From my experience, the annealing trick doesn't help much (at the very least you would need to try {linear, sigmoid, ...} × {datasets}). My suggestion is to follow the PyTorch LM example, which gave me a lot of insight. A better encoder model could potentially also help.

@dongqian0206
Author


Thanks.
