Replies: 1 comment
-
Hi, you are right: the update rule can be applied on each time step, but it does not necessarily have to act on each time step. You can design your reward function to deliver a reward at any point during the time steps and to send zero reward on the rest.
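For example, a minimal sketch (the helper name and the way the per-step reward reaches the learning rule are illustrative, not BindsNET's actual API):

```python
import numpy as np

# Minimal sketch: a per-timestep reward signal that is zero on every step
# except the one(s) where you decide to deliver the reward.
def make_sparse_reward(total_steps: int, final_reward: float) -> np.ndarray:
    reward = np.zeros(total_steps)
    reward[-1] = final_reward  # deliver the whole scalar reward on the last step
    return reward

reward_per_step = make_sparse_reward(total_steps=100, final_reward=1.0)
# On timestep t the learning rule then sees reward_per_step[t]:
# 0.0 for t < 99, and 1.0 on the final step, so only that step contributes
# to the reward-modulated term of the update.
```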
-
Hi,
I got another question about an implementation detail I just came across. Looking at the learning rules, and especially reward-based learning, I can see that in the Network.run() method the connection weights are updated at every timestep. The total number of timesteps is calculated as in the sketch below.
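Roughly like this (my paraphrase with toy stand-ins, not the actual Network.run source):

```python
# Toy paraphrase of the structure I am referring to (not the actual source):
time, dt = 100.0, 1.0
timesteps = int(time / dt)   # total number of simulation steps

weight, reward, lr = 0.0, 1.0, 0.01
for t in range(timesteps):
    # ... one step of spike propagation would happen here ...
    weight += lr * reward    # the learning rule touches the weights on every single step
```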
This implies, IMHO, that the weights are updated with every approximation step. Coming from a reinforcement learning setting, where an action as a whole is rewarded, this seems strange to me. The agent interacts with the environment by performing an action and receiving a (scalar) reward. To compute the action it is necessary to simulate multiple time steps, especially in the case of population coding. Therefore I would expect the update step to be taken at the end of the sequence, i.e. at the last timestep of the run() method, or it should even be possible to supply the reward later, implying a pattern like the one sketched below.
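Something like this (all names here are made up to illustrate the pattern; this is not BindsNET's API):

```python
import numpy as np

def run_network_for_action(rates: np.ndarray, timesteps: int) -> np.ndarray:
    """Stand-in for simulating the SNN over many timesteps without learning."""
    rng = np.random.default_rng(0)
    return rng.poisson(rates, size=(timesteps, rates.size))  # fake spike trains

def decode_action(spikes: np.ndarray) -> int:
    """Stand-in for a population-coded readout: pick the most active neuron."""
    return int(spikes.sum(axis=0).argmax())

def apply_reward_update(w: np.ndarray, eligibility: np.ndarray,
                        reward: float, lr: float = 1e-2) -> np.ndarray:
    """One reward-modulated weight update, applied once per action."""
    return w + lr * reward * eligibility

rates = np.array([5.0, 1.0, 1.0])
spikes = run_network_for_action(rates, timesteps=50)        # 1) simulate the whole action
action = decode_action(spikes)                              # 2) decode the action
reward = 1.0 if action == 0 else 0.0                        # 3) one scalar reward for the action
w = apply_reward_update(np.zeros(3), spikes.sum(axis=0), reward)  # 4) single update at the end
```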
Does that make sense or am I confusing things?
In the case of MSTDP(ET) I can see that the reward is multiplied by connection.dt. Is it correct that this is done to spread the reward across time steps?
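My reading of the dt factor is that the update is an Euler-style discretization of a continuous-time rule, so scaling by dt keeps the total weight change roughly independent of the step size. A toy illustration (not the actual MSTDPET code):

```python
def mstdp_like_step(w: float, eligibility: float, reward: float,
                    lr: float, dt: float) -> float:
    # dw/dt = lr * reward * eligibility  ->  per-step change is scaled by dt
    return w + lr * reward * eligibility * dt

# Same total simulation time (100 ms), two different step sizes:
w = 0.0
for _ in range(100):                               # 100 steps of dt = 1.0
    w = mstdp_like_step(w, 0.5, 1.0, 0.01, 1.0)
print(w)                                           # ~0.5

w = 0.0
for _ in range(200):                               # 200 steps of dt = 0.5
    w = mstdp_like_step(w, 0.5, 1.0, 0.01, 0.5)
print(w)                                           # ~0.5 again: the dt factor compensates
```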
Thanks and best,
Peter