Hi, while benchmarking my own implementation of Rainbow I have been using the Dopamine full Rainbow version as a reference. I noticed that in the get_logits and get_q_values functions (lines 95 and 100 respectively) there is no axis mapped over the key, and later on in the train script only a single key is split for each of the forward passes.
I am just curious: given the use of Noisy Networks in the full Rainbow algorithm, is this implementation detail intentional? It will lead to the same noisy parameters for every state in the batch and thus a higher bias in the gradient updates. From my understanding, the alternative is to split the keys per forward pass and per input, which introduces some additional computation. I understand this additional bias might be negligible compared to the computation saved, but am interested to hear :)
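For concreteness, here is a minimal toy sketch of the pattern I mean. This is not the Dopamine code; noisy_forward is just a hypothetical stand-in where the key drives the noise draw, so broadcasting one key gives every state in the batch identical noise:

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a noisy forward pass: the rng key drives the
# parameter noise, so reusing one key reuses the same noise sample.
def noisy_forward(x, key):
    noise = jax.random.normal(key, x.shape)
    return x + 0.1 * noise

states = jnp.ones((32, 4))  # hypothetical batch of 32 states
key = jax.random.PRNGKey(0)

# Current pattern as I read it: the key is NOT mapped over,
# so every state in the batch sees identical noise.
out = jax.vmap(noisy_forward, in_axes=(0, None))(states, key)
```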
Thanks for your time.
The absence of axis mapping over the key in the get_logits and get_q_values functions is likely a design choice to optimize computational efficiency. While it could introduce some bias into the gradient updates, that bias is presumably considered negligible relative to the computation saved over the course of training.
However, if you are concerned about the bias and have the computational resources to handle it, splitting the keys per forward pass and per input is a reasonable way to reduce it.
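For illustration, a minimal sketch of that alternative, again using a toy stand-in noisy_forward rather than Dopamine's actual network: split one key into one key per state and map over both the inputs and the keys, so each state draws independent noise.

```python
import jax
import jax.numpy as jnp

# Same toy stand-in as above: the key drives the noise draw.
def noisy_forward(x, key):
    noise = jax.random.normal(key, x.shape)
    return x + 0.1 * noise

states = jnp.ones((32, 4))  # hypothetical batch of 32 states
key = jax.random.PRNGKey(0)

# Split into one key per state and map over both axes, so every
# state in the batch draws an independent noise sample.
keys = jax.random.split(key, states.shape[0])
out = jax.vmap(noisy_forward, in_axes=(0, 0))(states, keys)
```

The extra cost is the key splitting and per-example noise generation, which is usually small next to the forward pass itself.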