
Learning the initial weights of the heads leads to NaN #2

Open
tristandeleu opened this issue Sep 17, 2015 · 6 comments

Comments

@tristandeleu
Collaborator

When setting the learn_init=True parameter on the heads, the error and parameters of the heads become NaN after a few iterations (not necessarily on the first one, but it can happen after 100+ iterations).

How to reproduce it:

heads = [
    WriteHead([controller, memory], shifts=(-1, 1), name='write', learn_init=True),
    ReadHead([controller, memory], shifts=(-1, 1), name='read', learn_init=True)
]

This is a non-blocking issue since learning these weights may not actually make sense (we can just keep the equiprobable initialization as-is for the first step).

@tristandeleu
Collaborator Author

This may be due to the norm constraint on the weights (and the initial weights) being violated during training. The weights are required to sum to one, but the (vanilla) training procedure does not enforce it. The sum-to-one (and non-negativity) constraint is critical: without it, gradient updates can push entries of w_tilde negative, which explains the NaNs in w \propto w_tilde ** gamma, since a negative base raised to a non-integer power gamma is undefined over the reals.
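For illustration, here is a minimal NumPy sketch (not the library code) of how a single negative entry in w_tilde turns the whole sharpened weighting into NaN; the values and gamma are made up:

import numpy as np

# Hypothetical values: one entry of w_tilde has drifted below zero because
# nothing enforces the sum-to-one / non-negativity constraint during training.
w_tilde = np.array([0.7, 0.4, -0.1])
gamma = 1.5  # sharpening exponent, generally non-integer

w = w_tilde ** gamma  # (-0.1) ** 1.5 is undefined over the reals -> nan
print(w)              # [0.5857  0.2530     nan]
w /= w.sum()          # the nan then contaminates every entry of w
print(w)              # [nan  nan  nan]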

@tristandeleu
Collaborator Author

Learning the initial weights might be something we'll eventually need. Initializing them to a uniform probability over all the addresses almost necessarily forces the first step to write in a distributed way (over multiple addresses, instead of hard addressing).

Instead of learning the raw weight_init, which may have some issues as explained in #2 (comment), we could learn some kind of initialization that needs to go through a normalization step to get w_0. The process would be to learn weight_init (keep this shared variable as a parameter) and then get the first weight as

w_0 = normalize(rectify(weight_init))

The additional rectify() nonlinearity is there to favor sparse initializations.
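A minimal sketch of that parametrization, assuming Theano (the helper name normalized_rectify and the eps term are mine, not from the repo):

import theano.tensor as T

def normalized_rectify(weight_init, eps=1e-6):
    # Rectify: clip negative entries of the learned shared variable to zero.
    w = T.maximum(weight_init, 0.)
    # Renormalize so that w_0 is non-negative and sums to one; eps avoids a
    # 0/0 division when every entry has been rectified away.
    return (w + eps) / T.sum(w + eps)

# w_0 = normalized_rectify(weight_init)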

@EderSantana

Hi @tristandeleu, when learning the initial weights, are you making sure they are behind a softmax? In other words, are you learning the initial logits instead? If so, there is no problem if they take negative values. I had this problem in my NTM implementation as well.
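Roughly, the suggestion amounts to something like the following Theano sketch (variable names are illustrative, not taken from either codebase): keep the unconstrained logits as the learned shared variable, and only expose softmax(logits) as w_0.

import numpy as np
import theano
import theano.tensor as T

memory_size = 128  # number of memory addresses (illustrative)

# Learn the logits: updates can push them to any real value.
weight_logits = theano.shared(
    np.zeros(memory_size, dtype=theano.config.floatX), name='weight_logits')

# Only the softmax is used as the initial weighting, so w_0 is always
# non-negative and sums to one regardless of what training does to the logits.
w_0 = T.nnet.softmax(weight_logits.dimshuffle('x', 0))[0]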

@tristandeleu
Collaborator Author

When I originally opened this issue I didn't, which was a mistake on my end. I haven't tried to learn the logits, but you're right, I think this is the right solution (I only sketched the idea in this issue).
All in all I ended up leaving learn_init=False for the weights in my experiments and initializing them as one-hot vectors. But I haven't found a good way to allow both fixing the initialization (e.g. with OneHot) and learning the logits.

@EderSantana

I'm new to your codebase; could you point me to where you get the initial weights? I could try to check that out.

@tristandeleu
Collaborator Author

The initial weights are defined here: https://github.com/snipsco/ntm-lasagne/blob/master/ntm/heads.py#L102
But for now, there's no correct way to learn these weights unfortunately.
