About Implementation #7
Hi Dong,
Hi Jiacheng. Thanks for your reply. I am also curious about the implementation of the vanilla NVRNN (the baseline in your paper). I tried different KL annealing schedules (linear and logistic) to alleviate the posterior collapse issue, and it seems that different datasets need different schedules to achieve good performance (obviously). Your reported performance for the vanilla NVRNN is better than that of my implementation; the differences are the choice of optimizer and the KL annealing parameters (e.g., number of epochs, start step, duration). Do you have any insight on how to train this vanilla model? (The aim of the proposed model is to alleviate this issue. :D) Thanks. Best,
From my experience, the annealing trick doesn't help much (at the very least you need to try every combination in {linear, sigmoid, ...} × {datasets}). My suggestion is to follow the PyTorch LM example, which gave me a lot of insight. A better encoder model can also potentially help.
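For concreteness, here is a minimal sketch of the two schedules discussed above. The function name, parameter names, and the sigmoid slope constant are my own illustrative choices, not the repo's actual API:

```python
import math

def kl_weight(step, anneal="linear", start=0, duration=10000):
    """Return the KL term weight in [0, 1] at a given training step.

    `anneal` selects the schedule; `start` is the step where annealing
    begins and `duration` controls how fast the weight ramps up.
    """
    if anneal == "linear":
        # Ramp linearly from 0 to 1 over `duration` steps after `start`.
        return min(1.0, max(0.0, (step - start) / duration))
    elif anneal == "sigmoid":
        # Logistic ramp centered at `start + duration / 2`.
        midpoint = start + duration / 2
        return 1.0 / (1.0 + math.exp(-10 * (step - midpoint) / duration))
    raise ValueError(f"unknown schedule: {anneal}")

# Training-loop usage: loss = recon_loss + kl_weight(step) * kl_loss
```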
Thanks. |
Hi Jiacheng,
Thanks for providing your source code.
Following up on earlier questions in the Issues: does this parameter setting (Dataptb_Distvmf_Modelnvrnn_EnclstmBiFalse_Emb100_Hid400_lat50_lr10.0_drop0.5_kappa35.0_auxw0.0001_normfFalse_nlay1_mixunk0.0_inpzTrue_cdbit0_cdbow0) achieve the best performance on the PTB dataset? I will use your proposed model as a baseline.
This is the result:
| End of training | Recon Loss 4.62 | KL Loss 0.16 | Test Loss 4.78 | Test PPL 119.00
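For anyone else reading, the run name appears to encode the hyperparameters field by field. My best-guess decoding is below; the comments are assumptions based on the field names, not something confirmed by the repo:

```python
# Assumed meaning of each field in the run name (my reading, unverified):
config = {
    "Data": "ptb",     # dataset: Penn Treebank
    "Dist": "vmf",     # latent distribution: von Mises-Fisher
    "Model": "nvrnn",  # neural variational RNN
    "Enc": "lstm",     # encoder type
    "Bi": False,       # bidirectional encoder
    "Emb": 100,        # embedding size
    "Hid": 400,        # hidden size
    "lat": 50,         # latent dimension
    "lr": 10.0,        # learning rate
    "drop": 0.5,       # dropout
    "kappa": 35.0,     # vMF concentration parameter
    "auxw": 1e-4,      # auxiliary loss weight
    "normf": False,    # meaning unclear from the name alone
    "nlay": 1,         # number of layers
    "mixunk": 0.0,     # fraction of decoder inputs replaced by <unk>
    "inpz": True,      # feed z as input to the decoder
    "cdbit": 0,        # meaning unclear from the name alone
    "cdbow": 0,        # meaning unclear from the name alone
}
```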
And regarding the reported results (NLL and PPL): you mention in the paper that the reported values are computed from the ELBO by sampling z. Since the ELBO is a lower bound on the log-likelihood, the resulting value is actually an upper bound on the true NLL.
Does this mean that you first draw a sample z from the vMF or Gaussian posterior, then feed it into the decoder (in both the standard and inputless modes), and finally compute the sum of the reconstruction loss and the KL loss?
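In code, my understanding of that evaluation is the sketch below, under the assumption that one z is sampled per batch and the summed loss is normalized per word; `model.encode`, `model.decode`, `posterior.kl_divergence`, and `batch.num_tokens` are hypothetical names standing in for the repo's actual API:

```python
import math
import torch

def estimate_nll_ppl(model, data_loader):
    """Estimate per-word NLL and PPL from the ELBO with a sampled z.

    Since ELBO <= log p(x), the per-word (recon + KL) value returned
    here is an upper bound on the true NLL.
    """
    total_loss, total_words = 0.0, 0
    model.eval()
    with torch.no_grad():
        for batch in data_loader:
            # q(z|x): sample one z from the vMF (or Gaussian) posterior.
            posterior = model.encode(batch)
            z = posterior.sample()
            # Reconstruction term: token NLLs summed over the batch.
            recon = model.decode(batch, z)
            kl = posterior.kl_divergence()  # KL(q(z|x) || p(z))
            total_loss += (recon + kl).item()
            total_words += batch.num_tokens
    nll = total_loss / total_words  # per-word upper bound on NLL
    ppl = math.exp(nll)             # e.g. exp(4.78) ≈ 119, as reported above
    return nll, ppl
```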