Add an aggregation (reduction) method for N-step calculations #32
-
@jamartinh Could you give us more details about the calculation you want? For a simple average, isn't it enough to set …
-
Hi @ymd-h, do you mean that when creating the N-step dict (the `Nstep` argument of `ReplayBuffer`, next to `env_dict`) I can use, for instance:

```python
N = 10
n_step_dict = {
    "size": N,
    "gamma": 1.0,
    "rew": "rew",
    "next": "next_obs",
}
```

and then the sampled batch gives me `sample["rew"]` as the undiscounted N-step sum, so I can just compute `AVG = sample["rew"] / N`? It seems that this may work!
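To make the `"gamma": 1.0` workaround concrete, here is a minimal sketch in plain Python (cpprb is not needed) of what an undiscounted N-step reward would contain and how to turn it into a mean. The helper `n_step_sums` is hypothetical, and the assumption that the window is cut short at an episode end (`done`) describes how N-step returns are typically built, not a verified detail of cpprb's implementation. The point it illustrates: for a truncated window fewer than N rewards are summed, so dividing by N shrinks the average toward zero there.

```python
def n_step_sums(rewards, dones, n, gamma=1.0):
    """Hypothetical helper: for each step t, return (N-step sum, rewards summed).

    The window stops early at an episode end (dones[t + k] is True) or at the
    end of the data; this truncation rule is an assumption for illustration.
    """
    results = []
    for t in range(len(rewards)):
        total, discount, steps = 0.0, 1.0, 0
        for k in range(n):
            if t + k >= len(rewards):
                break  # ran out of data
            total += discount * rewards[t + k]
            discount *= gamma
            steps += 1
            if dones[t + k]:
                break  # do not sum rewards across an episode boundary
        results.append((total, steps))
    return results


N = 3
rewards = [1.0, 2.0, 3.0, 4.0, 5.0]
dones = [False, False, False, False, True]

windows = n_step_sums(rewards, dones, N)  # gamma=1.0, as in n_step_dict
# Dividing by N is exact only for full windows.
naive = [total / N for total, _ in windows]
# Dividing by the number of rewards actually summed gives the true mean.
averages = [total / steps for total, steps in windows]
print(averages)  # [2.0, 3.0, 4.0, 4.5, 5.0]
```

If `gamma` is anything other than 1.0, the stored value is a discounted sum and dividing it by N no longer yields the plain mean.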
-
Thanks!
On Mon, 22 Jan 2024, 13:49, H.Yamada ***@***.*** wrote:
@jamartinh <https://github.com/jamartinh>
Yes, that is what I mean.
If you find any problems, please let us know.
-
Hi, would it be possible to add a parameter to the N-step subdict of the `ReplayBuffer` constructor so that one can specify the kind of aggregation/reduction method used for the N-step returns?
Normally the standard case is accumulation (a cumulative, discounted sum); however, for average-reward reinforcement learning, for instance, it would be nice to have the mean rather than the sum, so that N-step returns can be used for average-reward RL as well.
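For reference, assuming the usual definition of the N-step return with discount factor γ and rewards r, the current accumulation (G) and the requested mean (r̄) relate as follows: with γ = 1 the mean is the accumulated sum divided by N. If a window is cut short at an episode end, the divisor would instead be the number of rewards actually summed.

$$
G_t^{(N)} = \sum_{k=0}^{N-1} \gamma^{k}\, r_{t+k},
\qquad
\bar{r}_t^{(N)} = \frac{1}{N} \sum_{k=0}^{N-1} r_{t+k} = \frac{1}{N}\, G_t^{(N)} \Big|_{\gamma = 1}
$$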
Right now I am not able to compile cpprb on my machines because my OS is CentOS 7 and its compiler is too old.
Thanks!