"The generative process is the same as in auto-regressive language models: generation begins with an empty string, and at the 𝑖-th step a token 𝑧𝑖 is sampled"
Since generation proceeds token by token, what does it mean to compute a reward for an incomplete sentence in the learning objective? Thanks in advance for helping me understand this :)
Hi @StarDewXXX, to compute the intermediate reward (i.e. after each token) we append an EOS token to the tokens generated so far and then compute the reward.
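To make the reply about intermediate rewards concrete, here is a minimal sketch of scoring every prefix by appending an EOS token before calling the reward. The names `EOS`, `intermediate_rewards`, and `toy_reward` are hypothetical and not taken from the repository; the toy reward only stands in for whatever reward the actual objective uses.

```python
from typing import Callable, List

EOS = "<eos>"  # hypothetical end-of-sequence marker, not the repo's actual token


def intermediate_rewards(
    tokens: List[str],
    reward_fn: Callable[[List[str]], float],
) -> List[float]:
    """Score each prefix as if generation had stopped right after it."""
    rewards = []
    for i in range(1, len(tokens) + 1):
        # Terminate the partial sequence with EOS so the reward sees a
        # complete (if short) sequence rather than a dangling prefix.
        terminated = tokens[:i] + [EOS]
        rewards.append(reward_fn(terminated))
    return rewards


def toy_reward(sequence: List[str]) -> float:
    """Hypothetical stand-in reward: number of tokens before EOS."""
    assert sequence[-1] == EOS, "reward expects an EOS-terminated sequence"
    return float(len(sequence) - 1)


if __name__ == "__main__":
    print(intermediate_rewards(["the", "cat", "sat"], toy_reward))
```

Under this reading, the reward for an "incomplete" sentence is the reward of the sequence obtained by stopping at that point, so each prefix is treated as a candidate final output.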