About Implementation #7
Hi Dong,
Hi Jiacheng. Thanks for your reply. I am also curious about the implementation of the vanilla NVRNN (the baseline in your paper). I tried different KL annealing schedules (linear and logistic) to alleviate the posterior collapse issue, and it seems that different datasets need different schedules to achieve good performance (obviously). Your reported performance for the vanilla NVRNN is better than that of my implementation; the differences are the choice of optimizer and the KL annealing parameters (e.g., number of epochs, start step, duration). Do you have any insight on how to train this vanilla model? (The aim of the proposed model is to alleviate this issue. :D) Thanks. Best,
From my experience, the annealing trick doesn't help much (at the very least you need to try every combination in {linear, sigmoid, ...} × {datasets}). My suggestion is to follow the PyTorch LM example, which gave me a lot of insight. A better encoder model can also potentially help.
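For concreteness, here is a minimal sketch of the two schedules discussed above. The function name, parameter names, and the sigmoid slope constant are my own illustrative choices, not the repo's actual API:

```python
import math

def kl_weight(step, anneal="linear", start=0, duration=10000):
    """Return the KL term weight in [0, 1] at a given training step.

    `anneal` selects the schedule; `start` is the step where annealing
    begins and `duration` controls how fast the weight ramps up.
    """
    if anneal == "linear":
        # Ramp linearly from 0 to 1 over `duration` steps after `start`.
        return min(1.0, max(0.0, (step - start) / duration))
    elif anneal == "sigmoid":
        # Logistic ramp centered at `start + duration / 2`.
        midpoint = start + duration / 2
        return 1.0 / (1.0 + math.exp(-10 * (step - midpoint) / duration))
    raise ValueError(f"unknown schedule: {anneal}")

# Training-loop usage: loss = recon_loss + kl_weight(step) * kl_loss
```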
Thanks. |
Hi Jiacheng,
Thanks for providing your source code.
Following up on earlier questions in the Issues: does this parameter setting (Dataptb_Distvmf_Modelnvrnn_EnclstmBiFalse_Emb100_Hid400_lat50_lr10.0_drop0.5_kappa35.0_auxw0.0001_normfFalse_nlay1_mixunk0.0_inpzTrue_cdbit0_cdbow0) achieve the best performance on the PTB dataset? I will use your proposed model as a baseline.
This is the result:
| End of training | Recon Loss 4.62 | KL Loss 0.16 | Test Loss 4.78 | Test PPL 119.00
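For anyone else reading, the run name appears to encode the hyperparameters field by field. My best-guess decoding is below; the comments are assumptions based on the field names, not something confirmed by the repo:

```python
# Assumed meaning of each field in the run name (my reading, unverified):
config = {
    "Data": "ptb",     # dataset: Penn Treebank
    "Dist": "vmf",     # latent distribution: von Mises-Fisher
    "Model": "nvrnn",  # neural variational RNN
    "Enc": "lstm",     # encoder type
    "Bi": False,       # bidirectional encoder
    "Emb": 100,        # embedding size
    "Hid": 400,        # hidden size
    "lat": 50,         # latent dimension
    "lr": 10.0,        # learning rate
    "drop": 0.5,       # dropout
    "kappa": 35.0,     # vMF concentration parameter
    "auxw": 1e-4,      # auxiliary loss weight
    "normf": False,    # meaning unclear from the name alone
    "nlay": 1,         # number of layers
    "mixunk": 0.0,     # fraction of decoder inputs replaced by <unk>
    "inpz": True,      # feed z as input to the decoder
    "cdbit": 0,        # meaning unclear from the name alone
    "cdbow": 0,        # meaning unclear from the name alone
}
```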
And regarding the reported results (NLL and PPL): you mention in the paper that the reported values are computed from the ELBO by sampling z. Since the ELBO is a lower bound on the log-likelihood, the resulting value is actually an upper bound on the true NLL.
Does this mean that you first draw a sample z from the vMF or Gaussian posterior, then feed it into the decoder (in both the standard and inputless modes), and finally compute the sum of the reconstruction loss and the KL loss?
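In code, my understanding of that evaluation is the sketch below, under the assumption that one z is sampled per batch and the summed loss is normalized per word; `model.encode`, `model.decode`, `posterior.kl_divergence`, and `batch.num_tokens` are hypothetical names standing in for the repo's actual API:

```python
import math
import torch

def estimate_nll_ppl(model, data_loader):
    """Estimate per-word NLL and PPL from the ELBO with a sampled z.

    Since ELBO <= log p(x), the per-word (recon + KL) value returned
    here is an upper bound on the true NLL.
    """
    total_loss, total_words = 0.0, 0
    model.eval()
    with torch.no_grad():
        for batch in data_loader:
            # q(z|x): sample one z from the vMF (or Gaussian) posterior.
            posterior = model.encode(batch)
            z = posterior.sample()
            # Reconstruction term: token NLLs summed over the batch.
            recon = model.decode(batch, z)
            kl = posterior.kl_divergence()  # KL(q(z|x) || p(z))
            total_loss += (recon + kl).item()
            total_words += batch.num_tokens
    nll = total_loss / total_words  # per-word upper bound on NLL
    ppl = math.exp(nll)             # e.g. exp(4.78) ≈ 119, as reported above
    return nll, ppl
```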