
Inquiries regarding target model update #2

originholic opened this issue Dec 23, 2015 · 4 comments

@originholic

Hello,
Thanks for sharing this great Chainer-based DQN code.

I recently started using Chainer. The code works great for me, and I would like to implement an actor-critic architecture based on your DQN code. I don't know whether it is strange to ask a question like this here, since I couldn't find an appropriate forum to ask about Chainer, but any help would be really appreciated.

I can see in the code that the target model is updated as follows, by copying the model directly:

self.model_target = copy.deepcopy(self.model)

If I want the target to be updated more slowly, based on this paper:

θ' ← τθ + (1 − τ)θ'

Since θ and θ' are the weights of the model and the target model, I was thinking of doing:

self.model_target.W.data = tau * self.model.W.data + (1 - tau) * self.model_target.W.data

Is this the right way of doing it, or is there a better way to do it in Chainer?
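
More generally, I guess the same soft update would have to be applied to every parameter of the network, not just W. A minimal sketch of what I have in mind (assuming both networks are chainer.Chain instances, so that params() yields their parameters in matching order; soft_update is just an illustrative name):

def soft_update(model, model_target, tau):
    # Blend each target parameter toward the online parameter, in place.
    for p, p_target in zip(model.params(), model_target.params()):
        p_target.data[:] = tau * p.data + (1.0 - tau) * p_target.data

This would be called once per training step, e.g. soft_update(self.model, self.model_target, tau).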
Many thanks

@ugo-nama-kun
Owner

Hello,

Thank you for your interest.

Your question is quite interesting and important to me. That kind of "Theano-like" update rule is very important for the future development of deep reinforcement learning with Chainer.

Unfortunately, I've never tried that kind of update rule in Chainer code, so I'm currently not sure whether your suggestion works well.

However, I think that kind of "Theano-like" update rule will be necessary for me in the near future.
I'll look for an appropriate way and post it here.
And I would really appreciate it if you let me know when your method actually works well.

thanks

@originholic
Author

Happy new year, and many thanks for responding.

These past few days, I have been running tests of the update method I mentioned previously, using the actor-critic architecture on a continuous cart-pole balancing domain.

However, the result is strange: I can see it trying to learn to balance at the beginning, as the step reward gradually rises, but once it reaches the goal number of balancing steps, the reward starts to decrease again. I am not able to identify whether the problem is the update, the architecture, or the parameters.

But yes, I will keep working on this to see whether this type of update works with Chainer.

@ugo-nama-kun
Owner

Hello originholic,

Sorry for the late response.

Today I uploaded test code for the "moving copy" in Chainer:
https://github.com/ugo-nama-kun/moving_copy_in_chainer.git
To run my code, you need the latest Chainer package.

I constructed a very simple classification task and tested both the CPU-based and GPU-based implementations, and they appear to work correctly.
Actually, my code is just the same as your suggested code ;-)
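
Roughly, the check is along these lines (an illustrative sketch only, not the actual repository code; see the link above for that):

import copy
import numpy as np
import chainer.links as L

tau = 0.01
model = L.Linear(4, 2)
model_target = copy.deepcopy(model)

# Apply the soft update to W and confirm the blended values numerically.
w_before = model_target.W.data.copy()
model_target.W.data[:] = tau * model.W.data + (1.0 - tau) * model_target.W.data
assert np.allclose(model_target.W.data, tau * model.W.data + (1.0 - tau) * w_before)

On GPU the same assignment works after moving the links with to_gpu(), since the arrays become cupy arrays.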

Finally, I think the cart-pole balancing task is a bit too complex for checking code for continuous RL algorithms. If you are not used to implementing RL algorithms, I recommend using the mountain-car task instead of cart-pole: because the mountain-car task has only a 2-dimensional state space, visualizing the value function, the policy, and the agent's behavior is very straightforward.

@originholic
Author

Hello @ugo-nama-kun ,
Sorry for the delayed reply.
And many thanks for testing out the update method in Chainer; now I can be more confident about the update when I run the actor-critic experiments.

I think you are right that the cart-pole balancing task is quite complex to test with. My work would likely have been less painful if I had started with a simpler task, but since this balancing scenario is very close to the task I want to work on, I am sticking with cart-pole.

Eventually, I got "moderately good" CartPole results after a long manual hyperparameter search, although there is still a lot of work to do on tuning the deep neural network, as it always overfits and I have to use early stopping to prevent that. I really appreciate your help, and I will share the actor-critic code I have been working on in my repo at some point.
