The generate() function right now only takes the last position of the logits to produce the next token, then shifts the entire input window one position forward to generate the following token, again taking only the last position. https://github.com/karpathy/ng-video-lecture/blob/master/gpt.py#L189
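For reference, the loop roughly looks like this (a paraphrase of the linked generate(), not a verbatim copy; here the model and block_size are passed in as arguments for self-containment):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop context to the last block_size tokens
        logits, _ = model(idx_cond)              # (B, T, vocab_size)
        logits = logits[:, -1, :]                # keep only the last position -> (B, vocab_size)
        probs = F.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, idx_next), dim=1)  # append and slide the window forward
    return idx
```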
I was curious and looked at the entire output contents. For the input, I fed in the output of a previous run of the current generate() function, so the input token sequence would be completely "based on the behavior of the model itself", so to speak. Then I decoded the entire list of T tokens from the output, one per position. To my surprise, the output is mostly gibberish and quite different from the input (though I could still see a few matches).
I can't figure out why the current method of taking only the last output position produces seemingly fluent sequences, while the output from the middle of the block doesn't make sense. In the current scheme, the input grows from torch.zeros((1,1)) up to block_size, so during that period it should be no different from what an output position in the middle of block_size sees, since that position has masked out all input after it and is effectively the end of its own window too.
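For concreteness, a minimal sketch of the experiment described above, assuming a trained `model`, a model-generated context `idx` of shape (1, T), and the `decode()` helper from gpt.py (the exact variable names here are illustrative): one forward pass is run and a token is sampled at every position instead of only at the last one.

```python
import torch
import torch.nn.functional as F

# idx: (1, T) tensor of token ids produced by a previous generate() run (assumption)
logits, _ = model(idx)                      # (1, T, vocab_size): one prediction per position
probs = F.softmax(logits, dim=-1)
per_position = torch.multinomial(probs[0], num_samples=1).squeeze(-1)  # (T,) samples

# The sample at position i is drawn from p(next token | idx[0, :i+1]); each one is
# conditioned only on the given prefix, not on the other samples in per_position.
print(decode(per_position.tolist()))
```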
On the question of "why do we only take the last position of the logits": I think all positions of the output logits have meaning. The logits at position i answer: given the input [0:i], what should the next token be? So during training, the model can also learn from these shorter prefixes.
A training example at the word level, with input "I like shopping online":
| input | output | output logits position |
| --- | --- | --- |
| I | like | 1 |
| I like | shopping | 2 |
| I like shopping | online | 3 |
During training, the losses from all these positions are collected by the cross entropy. During inference, since we only care about the next token, we only take the last element of the logits.
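A minimal sketch of that difference, with illustrative shapes (B, T, and vocab_size here are placeholders, not values from gpt.py):

```python
import torch
import torch.nn.functional as F

# Training: every position i predicts the token at i+1, so all T positions
# contribute a term to the loss.
B, T, vocab_size = 4, 8, 65
logits = torch.randn(B, T, vocab_size)           # stand-in for the model output on a (B, T) batch
targets = torch.randint(0, vocab_size, (B, T))   # inputs shifted left by one
loss = F.cross_entropy(logits.view(B * T, vocab_size), targets.view(B * T))

# Inference: only the prediction for the token *after* the current sequence matters,
# so we read just the last time step.
next_token_logits = logits[:, -1, :]             # (B, vocab_size)
probs = F.softmax(next_token_logits, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
```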