
Perform fire on reset after doing no-ops, make fire_on_reset configurable #158

Merged 3 commits into astooke:master on Jun 30, 2020

Conversation

@ankeshanand (Contributor) commented May 22, 2020

Fire on reset should probably happen after doing the no-ops, not before. Firing and then doing no-ops for 30 steps is probably harmful in some cases. OpenAI baselines follow this order as well: https://github.com/openai/baselines/blob/8c2aea2addc9f3ba36d4a0c937e6a2d09830afc7/baselines/ppo1/run_atari.py

I have also made fire_on_reset configurable since repositories that reproduce DeepMind numbers don't use it (dopamine and @Kaixhin's rainbow). Setting it to False is probably a better default but I didn't change that yet.

Edit: Switched to False by default at Kai's suggestion; my own experiments with data-efficient Rainbow also suggest that using FireOnReset hurts.
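
To make the change concrete, here is a minimal sketch of the reset order this PR argues for, written against a generic Gym-style Atari interface; the wrapper class, its names, and the action-meaning lookup are illustrative assumptions rather than rlpyt's actual AtariEnv code.

```python
import numpy as np


class NoopThenFireReset:
    """Illustrative wrapper (not rlpyt's actual AtariEnv): random no-ops first,
    then optionally press FIRE, using a generic Gym-style 4-tuple step API."""

    def __init__(self, env, noop_max=30, fire_on_reset=False):
        self.env = env
        self.noop_max = noop_max
        self.fire_on_reset = fire_on_reset
        meanings = env.unwrapped.get_action_meanings()
        self.noop_action = meanings.index("NOOP")
        self.fire_action = meanings.index("FIRE") if "FIRE" in meanings else None

    def reset(self):
        obs = self.env.reset()
        # 1) Random number of no-ops to randomize the starting state.
        for _ in range(np.random.randint(1, self.noop_max + 1)):
            obs, _, done, _ = self.env.step(self.noop_action)
            if done:
                obs = self.env.reset()
        # 2) Only then press FIRE (if enabled), so the episode effectively
        #    starts here; firing before the no-ops can, e.g. in Breakout,
        #    launch the ball before the agent is in control.
        if self.fire_on_reset and self.fire_action is not None:
            obs, _, done, _ = self.env.step(self.fire_action)
            if done:
                obs = self.env.reset()
        return obs
```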

@codecov-commenter commented May 22, 2020

Codecov Report

Merging #158 into master will decrease coverage by 0.02%.
The diff coverage is 75.00%.


@@            Coverage Diff             @@
##           master     #158      +/-   ##
==========================================
- Coverage   22.53%   22.51%   -0.03%     
==========================================
  Files         128      128              
  Lines        8002     8005       +3     
==========================================
- Hits         1803     1802       -1     
- Misses       6199     6203       +4     
Flag: #unittests, coverage 22.51% <75.00%> (-0.03%) ⬇️
Impacted file: rlpyt/envs/atari/atari_env.py, coverage 80.64% <75.00%> (-2.83%) ⬇️

Continue to review full report at Codecov.
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 85d4e01...aecb175.

@Kaixhin commented May 22, 2020

Check this: openai/baselines#240 - I confirmed with DeepMind that they don't use it.

@ankeshanand (Contributor, Author)

Thanks a lot Kai; my own experiments with DER also suggest that not using fire on reset is better! While we have you here, could you clarify the following settings as well?

  • Did you find clipping the grad norm not useful? I saw your repo doesn't use it.
  • What's the correct exploration strategy to use during eval with NoisyNets? Does DM still use a small epsilon?
  • When prioritizing with C51, is it useful to use cross-entropy over KL? I saw both Dopamine and your repo use CE, but the Rainbow paper mentions KL.

@Kaixhin commented May 22, 2020

  1. I think it's not mentioned in the Rainbow paper, so I presume it's unused?
  2. Evaluation is ε-greedy with ε = 0.001, but no noise in the noisy layers (see the sketch after this list).
  3. The original distributional RL paper's algorithm specifies the cross-entropy loss.
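
A minimal DQN-style PyTorch sketch of that evaluation setting (point 2): noise in the noisy layers disabled and ε-greedy with ε = 0.001. The `network` object and its `remove_noise()` hook are illustrative assumptions, not the actual API of Kaixhin/Rainbow or rlpyt.

```python
import random

import torch

EVAL_EPSILON = 0.001  # small epsilon used only at evaluation time


@torch.no_grad()
def act_eval(network, state, num_actions):
    """Greedy action with eps = 0.001 exploration and the noisy layers' noise
    disabled; remove_noise() is a hypothetical hook, not an existing API."""
    network.eval()
    network.remove_noise()  # assumption: zeroes the noise in NoisyLinear layers
    if random.random() < EVAL_EPSILON:
        return random.randrange(num_actions)
    q_values = network(state.unsqueeze(0))  # shape: (1, num_actions)
    return int(q_values.argmax(dim=1).item())
```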

@ankeshanand (Contributor, Author)

Thanks, super helpful! For 1, I was just looking at the DER paper, so maybe it only helps there.

@Kaixhin commented May 22, 2020

Oh - that specifies that it was used in both. I found the DER paper useful for specifying hyperparameters, so I'll add it to my repo...

@ankeshanand (Contributor, Author)

Good catch! If you want to be super consistent, both versions don't seem to be using Dueling either.

@Kaixhin commented May 22, 2020

Dueling is one of the main components of Rainbow (both)?

@ankeshanand (Contributor, Author) commented May 22, 2020

I don't think so.

From the Rainbow paper:

In aggregate, we did not observe a significant difference when removing the dueling network from the full Rainbow.

And the DER paper mentions, for both variants:

Update: Distributional Double Q

which I assume implies not using dueling.

@Kaixhin commented May 22, 2020

Seems like that means there's no significant difference, but they do keep it. As for the update rule: distributional RL and double Q-learning both change the form of the update, but dueling is an architectural change that is agnostic to the update rule.

@ankeshanand (Contributor, Author)

I see, that might be the case!

@Kaixhin commented May 25, 2020

Confirmed with Matteo Hessel that gradient clipping is used in Rainbow.
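
For anyone implementing this, a minimal PyTorch sketch of gradient-norm clipping in the optimisation step; `model`, `optimiser`, and `loss` are placeholders, and the max norm of 10 is an assumption taken from commonly reported Rainbow/DER hyperparameter tables rather than from this thread.

```python
import torch

MAX_GRAD_NORM = 10.0  # assumption: value commonly listed in Rainbow/DER tables


def training_step(model, optimiser, loss):
    """One optimisation step with gradient-norm clipping (placeholder names)."""
    optimiser.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
    optimiser.step()
```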

@astooke merged commit 4acbe31 into astooke:master on Jun 30, 2020
@astooke (Owner) commented Jun 30, 2020

OK looks good, thanks for digging into the details!

I thought fire was needed in some games to start the action after a lost life? I guess you're right though: on Breakout, if you press fire and then wait up to 30 steps, the ball might have already gone too low to reach... could be why I'm getting low scores in that game sometimes, hahah, hmmm...

@astooke mentioned this pull request on Jul 7, 2020
@DanielTakeshi (Contributor)

@astooke and everyone else here, I am just curious, but are there benchmarks on Breakout with and without this change, to see if reward is better? I also notice (at least anecdotally) that before this pull request, my Breakout scores are OK sometimes but can be lower than expected. I have not tried with this change.

jordan-schneider pushed a commit to jordan-schneider/rlpyt that referenced this pull request Jan 4, 2021
Perform fire on reset after doing no-ops, make fire_on_reset configurable (astooke#158)

* Perform fire on reset after doing no-ops, make fire_on_reset configurable

* Fix typo

* Set fire_on_reset to be False by default
@rfali commented May 16, 2023

@Kaixhin
From your comment above

  1. I think it's not mentioned in the Rainbow paper so presume unused?
  2. Evaluation is ε-greedy with ε = 0.001, but no noise in the noisy layers.
  3. The original distributional RL paper's algorithm specifies the cross-entropy loss.

Can you please share the source of "2. Evaluation is ε-greedy with ε = 0.001, but no noise in the noisy layers"? I thought the Rainbow paper (https://arxiv.org/pdf/1710.02298.pdf) on page 4 says to set ε = 0.

[screenshot of the highlighted passage from page 4 of the Rainbow paper]

@Kaixhin commented May 16, 2023

@rfali the part you highlighted from the Rainbow paper seems to refer to training settings, rather than evaluation settings. Unfortunately this bit of info was a bit hard to find, and I'm afraid I can't remember where I picked it up (it's been several years since I worked on Atari), so if you do find a primary source that says otherwise you should trust that instead.

@rfali commented May 16, 2023

Thanks @Kaixhin. The only other source I have found that does something similar is the RLlib benchmark experiments, which used ε = 0.01: https://github.com/ray-project/rl-experiments#dqn--rainbow

Note that RLlib evaluation scores include the 1% random actions of epsilon-greedy exploration. You can expect slightly higher rewards when rolling out the policies without any exploration at all.

@rfali commented May 16, 2023

Update: I did find that Dopamine also sets epsilon=0.001 during evaluation. See the Full Rainbow config in the Dopamine repository.

However, the above config also sets epsilon_train=0.01; since the config also sets noisy=True, that epsilon is set to 0 later. See google/dopamine#201 (comment) and the associated commit for more details.
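
A tiny sketch of the effective-epsilon behaviour described above, as I read the linked discussion; this is not Dopamine code, and the function and parameter names are made up for illustration.

```python
def effective_epsilon(training, noisy, epsilon_train=0.01, epsilon_eval=0.001):
    """Illustrative reading of the linked discussion (not Dopamine code):
    with noisy networks enabled, the training epsilon is forced to 0, while
    a small epsilon (0.001) is still used at evaluation time."""
    if training:
        return 0.0 if noisy else epsilon_train
    return epsilon_eval


# Values follow the settings quoted in this thread.
assert effective_epsilon(training=True, noisy=True) == 0.0
assert effective_epsilon(training=True, noisy=False) == 0.01
assert effective_epsilon(training=False, noisy=True) == 0.001
```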
