Question about the learning objective mentioned in paper #5

StarDewXXX · 2024-03-25T01:39:22Z

"The generative process is the same as in auto-regressive language models: generation begins with an empty string, and at the $i$-th step a token $z_i$ is sampled"

Since generation proceeds token by token, what does it mean to compute a reward for an incomplete sentence in the learning objective? Thanks for any help understanding this :)

MJ10 · 2024-03-27T03:23:13Z

Hi @StarDewXXX, to compute the intermediate reward (i.e. after each token) we append an EOS token to the tokens generated so far and then compute the reward.
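
For readers following along, here is a minimal sketch of the idea described above: score every prefix by appending EOS to it. It is not taken from the repository. `EOS_ID`, `intermediate_rewards`, and `toy_reward` are hypothetical names, and the toy reward only stands in for whatever reward model is actually used.

```python
from typing import Callable, List, Sequence

EOS_ID = 0  # hypothetical EOS token id, chosen only for this illustration


def intermediate_rewards(
    tokens: Sequence[int],
    reward_fn: Callable[[List[int]], float],
    eos_id: int = EOS_ID,
) -> List[float]:
    """Return one reward per generated token.

    The i-th entry is reward_fn applied to the first i tokens
    followed by EOS, i.e. the prefix treated as a finished sequence.
    """
    rewards = []
    for i in range(1, len(tokens) + 1):
        terminated_prefix = list(tokens[:i]) + [eos_id]
        rewards.append(reward_fn(terminated_prefix))
    return rewards


def toy_reward(seq: List[int]) -> float:
    # Stand-in reward (an assumption): counts the non-EOS tokens.
    return float(sum(1 for t in seq if t != EOS_ID))


rewards = intermediate_rewards([5, 7, 9], toy_reward)
```

With this toy reward, each prefix is scored as if generation had stopped there, so the reward is defined at every step even though the sentence is incomplete.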
