Policy evaluation #6

Evil-Incorporated · 2019-04-01T20:27:27Z

In the paper you mention that policy evaluation is done by sampling K random action sequences. I have two questions.

First, how large a K are you using?

Second, with a long model horizon such as H=16, how does random sampling produce a sequence anywhere close to optimal behavior? When you say you sample K random action sequences, are you sampling a random action independently at each time step? If so, a large horizon would seem to require a huge K to find an action sequence that is close to optimal. Or are you doing something else, and if so, what is it?
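To make the question concrete, here is a minimal sketch of what I understand by sampling a random action at every time step (often called random shooting). The function names, the uniform action range [-1, 1], and the toy dynamics and reward are all my own assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def random_shooting(dynamics, reward, state, K, H, action_dim, rng):
    """Score K uniformly random H-step action sequences and return the
    first action of the best one together with its predicted return."""
    # Assumption: every action is drawn independently and uniformly from [-1, 1].
    actions = rng.uniform(-1.0, 1.0, size=(K, H, action_dim))
    # Start all K rollouts from the same current state.
    states = np.repeat(state[None, :], K, axis=0)
    returns = np.zeros(K)
    for t in range(H):
        # Accumulate the predicted reward, then step the (hypothetical) model.
        returns += reward(states, actions[:, t])
        states = dynamics(states, actions[:, t])
    best = int(np.argmax(returns))
    return actions[best, 0], returns[best]

# Toy stand-ins for a learned model and a reward function (hypothetical).
def toy_dynamics(s, a):
    return s + 0.1 * a

def toy_reward(s, a):
    # Penalize distance from the origin; the action is unused here.
    return -np.sum(s ** 2, axis=1)

rng = np.random.default_rng(0)
action, ret = random_shooting(
    toy_dynamics, toy_reward, np.ones(2), K=1000, H=16, action_dim=2, rng=rng
)
```

With independent per-step sampling, the candidates are spread over a space of dimension H × action_dim, which is why I would expect the K needed to find a near-optimal sequence to grow quickly with H.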
