Repeat Copy task #8

Open
tristandeleu opened this issue Sep 28, 2015 · 2 comments

@tristandeleu (Collaborator)

Repeat Copy task

I will gather all the progress on the Repeat Copy task in this issue. I will likely update this issue regularly (hopefully), so you may want to unsubscribe from this issue if you don't want to get all the spam.

@tristandeleu (Collaborator, Author)

Training the NTM on sequences of length one

Just like in #4, I started the experiments on the Repeat Copy task by training the NTM on sequences of length one, with a random number of repetitions between 1 and 5. The training went surprisingly well and converged in a few thousand iterations (see the learning curve below). Tests on sequences of length 1 show that the NTM is able to properly repeat the length-one inputs it was trained on.
[Image: repeat-success-01]
The input is hard to make out due to normalization, since the flag representing the number of repetitions is a scalar.
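
For reference, here is a minimal sketch of how a Repeat Copy example could be generated. The bit width, the extra flag channel, and the normalization by the maximum number of repetitions are assumptions for illustration and may not match the exact data generation used in this repository.

```python
import numpy as np

def repeat_copy_example(seq_len, num_repeats, num_bits=8, max_repeats=5):
    """Build one (input, target) pair for the Repeat Copy task (sketch)."""
    # Random binary content vectors.
    seq = np.random.randint(0, 2, size=(seq_len, num_bits)).astype(np.float32)

    # Input: seq_len content steps followed by one flag step; the extra
    # channel carries the number of repetitions as a (normalized) scalar,
    # which is why it is hard to see next to the binary content.
    inp = np.zeros((seq_len + 1, num_bits + 1), dtype=np.float32)
    inp[:seq_len, :num_bits] = seq
    inp[seq_len, num_bits] = num_repeats / float(max_repeats)

    # Target: the content repeated num_repeats times.
    target = np.tile(seq, (num_repeats, 1))
    return inp, target

# The experiment above corresponds to examples like:
#   repeat_copy_example(seq_len=1, num_repeats=np.random.randint(1, 6))
```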

However, learning only on sequences of length one may bias the experiment, so that the NTM does not actually learn the proper task. Instead, it seems to have learned the following procedure:

For each vector in the input sequence:
    Write its representation at the first address
Decode and repeat the vector stored at the first address

Hence the model does not show any sign of generalization yet, and tests on longer sequences seem to confirm that the NTM writes the whole input sequence to the first memory address, overwriting previous steps.
[Image: repeat-failure-02]

Learning curve

Similar to #4
[Image: learning-curve]

Parameters of the experiment

Same parameters as in #6

@tristandeleu (Collaborator, Author)

Training on sequences of length 3-5

As the initial experiment above (#8 (comment)) suggested, training on very short sequences risks not learning the proper repeat task:

  • If we train on sequences of length one, the NTM learns to write on and read from the first address only.
  • If we train on sequences of length 2, the NTM is likely to favor shifts between two adjacent memory locations rather than going back to the first address.

Therefore I trained the NTM on sequences of length 3 to 5, with up to 5 repetitions. A few tests show that the NTM is indeed able to repeat the input sequence on the output:
[Image: repeat-success-01]

What is particularly striking is that it seems to generalize well, both in the length of the input sequence and in the number of repetitions (a small scoring sketch follows the examples below). Here are a few tests where:

  • The length of the sequence is greater than in the training sequences (here 3 repeats of a sequence of length 10)
    [Image: repeat-success-02]
  • The number of repetitions is greater than in the training data (here 10 repeats of a sequence of length 4)
    [Image: repeat-success-03]
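
To make these probes concrete, here is a sketch of how they could be scored, reusing the `repeat_copy_example` sketch from the first comment; `ntm_predict` is a hypothetical stand-in for a forward pass of the trained model, not a function from this repository.

```python
import numpy as np

def bit_errors(prediction, target):
    """Number of bits the model gets wrong after thresholding at 0.5."""
    return int(np.sum((prediction > 0.5) != (target > 0.5)))

def evaluate(ntm_predict, seq_len, num_repeats, num_trials=20):
    """Average bit errors over random examples of a given shape (sketch)."""
    errors = []
    for _ in range(num_trials):
        inp, target = repeat_copy_example(seq_len, num_repeats)
        errors.append(bit_errors(ntm_predict(inp), target))
    return float(np.mean(errors))

# Generalization probes corresponding to the tests above:
#   evaluate(ntm_predict, seq_len=10, num_repeats=3)   # longer sequence
#   evaluate(ntm_predict, seq_len=4, num_repeats=10)   # more repetitions
```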

However, even if generalization looks good, the NTM sometimes misses a few bits in its predictions (see below). Another surprising behavior is that the NTM writes the number of repetitions at the same address as the last vector in the input sequence, which may explain these misses: when the number of repetitions is too large, it may overwrite all or part of the last vector of the input sequence in memory.
[Image: repeat-failure-01]
Here we can see that the NTM completely misses the 4th (last) input vector in its prediction; this example uses 20 repetitions of a sequence of length 4.

If we look at the add vectors, we can see that the NTM writes a large value for the number of repetitions at the same memory locations as the last input vector, which corrupts that vector's representation in memory and explains why the prediction is incorrect. To fix this, we may have to train on larger numbers of repetitions.
[Image: repeat-failure-02]
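
As a toy illustration of this suspected corruption, the erase/add write rule of the NTM (M[i] ← M[i](1 − w[i]·e) + w[i]·a, Graves et al. 2014) can be simulated directly; the weighting, erase, and add values below are made up for illustration and are not taken from the actual experiment.

```python
import numpy as np

def ntm_write(memory, w, erase, add):
    """One NTM write step: erase then add at every memory row."""
    return memory * (1 - np.outer(w, erase)) + np.outer(w, add)

memory = np.zeros((4, 8))                      # 4 addresses, 8-bit content
w = np.array([0.0, 0.0, 0.0, 1.0])             # attention on the last address
last_vec = np.array([1, 0, 1, 1, 0, 0, 1, 0.])

# Write the last input vector, then a large repeat-count value at the
# SAME address: the second write partially erases and shifts the stored
# vector, so it can no longer be decoded as the original binary pattern.
memory = ntm_write(memory, w, erase=np.ones(8), add=last_vec)
memory = ntm_write(memory, w, erase=np.full(8, 0.6), add=np.full(8, 2.5))
print(memory[3])   # 0.4 * last_vec + 2.5, not the original vector
```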

Learning curve

[Image: learning-curve]

Parameters of the experiment

Overall the same parameters as in #8 (comment). It is worth noting though that I do not learn the parameters (weight matrix and biases) of the key and beta for the write head: they are not needed for this task and seem to be the cause of NaN issues that still need to be fixed. (Update 13/10: it now works with all the parameters.) The initial weight vectors are still [1, 0, ..., 0] and still not learned (see #2).
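
For clarity, this is roughly what that fixed initial weighting looks like (a sketch; the number of memory addresses here is an assumption):

```python
import numpy as np

def initial_weight_vector(num_slots=128):
    """One-hot weighting over the first memory address, kept fixed (not learned)."""
    w0 = np.zeros(num_slots, dtype=np.float32)
    w0[0] = 1.0
    return w0
```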
