Relational recurrent neural networks #17

Open
flrngel opened this issue Jun 19, 2018 · 0 comments

https://arxiv.org/abs/1806.01822
Paper from Deepmind

Abstract

  • the model is built from intuition
  • so it is not entirely clear why it works well
  • shows state-of-the-art results on the WikiText-103, Project Gutenberg, and GigaWord datasets

1. Introduction

2. Relational reasoning

  • relational reasoning is the process of understanding the ways in which entities are connected
  • the authors claim that many current models can be cast as relational reasoning
    • a convolution can be said to compute relations between pixels (entities)
    • nodes and edges in message passing neural networks are entities
      • learnable node-edge connections can be relations
    • see also [13, 14, 17]
  • but for current models it is unclear whether their inductive biases impose limitations
    • e.g. memory-augmented networks

3. Model

  • the model is assembled from
    1. LSTMs
    2. memory-augmented networks
    3. non-local networks
  • the model applies attention between memories at a single time step
    • this differs from current models, which apply attention across all previous representations

3.1. Allowing memories to interact using MHDPA

  • note that MHDPA computes queries, keys, and values with learned linear projections:
    Q = MW^q, K = MW^k, V = MW^v
    A_θ(M) = softmax(QK^T / √d^k) V = M̃
    where M̃ is the proposed update to the memory M
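The MHDPA update above can be sketched in NumPy. This is a minimal single-head version; the memory size, head dimension, and random projection weights are illustrative assumptions, not the paper's hyperparameters, and the multi-head splitting/concatenation is omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(M, Wq, Wk, Wv):
    """Single-head dot-product attention over memory rows (slots).

    M: (num_slots, d_model) memory matrix; each row is one memory.
    Returns M_tilde, the attended (proposed) memory update.
    """
    Q, K, V = M @ Wq, M @ Wk, M @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (num_slots, num_slots)
    weights = softmax(scores, axis=-1)  # each memory attends over all memories
    return weights @ V                  # M_tilde

rng = np.random.default_rng(0)
num_slots, d_model, d_k = 4, 8, 8       # illustrative sizes
M = rng.normal(size=(num_slots, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
M_tilde = dot_product_attention(M, Wq, Wk, Wv)
print(M_tilde.shape)  # (4, 8)
```

Because the attention is only across the rows of M at one time step, its cost is independent of sequence length, which is the key contrast with attending over all previous representations.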

3.2. Encoding new memories

(equation image: encoding of new memories)

3.3. Introducing recurrence and embedding into an LSTM

see also the official TensorFlow implementation
(equation image: the recurrent memory update embedded into an LSTM)

the MLP term in the equation denotes a row/memory-wise MLP with layer normalization
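The per-step update described above (residual over the attention output, then a row/memory-wise MLP with layer normalization) can be sketched as follows. This is a hedged sketch: the LSTM-style gating of the new memory is omitted, and all sizes and weights are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row (memory slot) to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def row_wise_mlp(M, W1, W2):
    # The same two-layer MLP (ReLU hidden layer) applied independently
    # to every memory row.
    return np.maximum(M @ W1, 0.0) @ W2

def memory_update(M, attended, W1, W2):
    """One memory step: residual over the attention output, then
    residual over a row-wise MLP, with layer norm after each residual."""
    M = layer_norm(M + attended)
    M = layer_norm(M + row_wise_mlp(M, W1, W2))
    return M

rng = np.random.default_rng(0)
num_slots, d = 4, 8                         # illustrative sizes
M = rng.normal(size=(num_slots, d))
attended = rng.normal(size=(num_slots, d))  # stand-in for the MHDPA output
W1 = rng.normal(size=(d, 16))
W2 = rng.normal(size=(16, d))
M_next = memory_update(M, attended, W1, W2)
print(M_next.shape)  # (4, 8)
```

Because the MLP and layer norm act row-wise, each memory slot is transformed identically and independently after interacting with the others through attention.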

4. Experiments

  • they tested
    • 4.1. Illustrative supervised tasks
    • 4.2. Reinforcement learning
      • Mini Pacman with viewport
    • 4.3. Language Modeling
      • WikiText103
      • Project Gutenberg
      • GigaWord datasets


5. Results

Figure 3: results on the N-th farthest task

6. Discussion

  • the authors propose intuitions for the mechanisms behind complex relational reasoning
  • the authors consider their results primarily as "evidence of improved function"

Appendix

