
Policy evaluation #6

Open · Evil-Incorporated opened this issue Apr 1, 2019 · 0 comments

Evil-Incorporated commented Apr 1, 2019

In the paper you mention that policy evaluation is done by sampling K random action sequences. First, I am curious how large a K you are using. Second, with a long model horizon like H=16, how can a randomly sampled sequence come anywhere close to optimal behavior? Concretely: when you say you sample K random action sequences, are you sampling a random action independently at each time step? If so, for long horizons you would need a huge K before any sequence approaches the optimum. Or are you doing something else, and if so, what?
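For context, here is a minimal sketch of what "sampling K random action sequences" usually means in random-shooting model-predictive control. Everything in it is an assumption for illustration: `dynamics_model`, `reward_fn`, and the action bounds are hypothetical stand-ins, not this repo's actual API.

```python
import numpy as np

def random_shooting(state, dynamics_model, reward_fn,
                    action_low, action_high, K=1000, H=16):
    """Score K random action sequences under a learned model and
    return the first action of the best-scoring sequence.

    Hypothetical batched signatures assumed for this sketch:
      dynamics_model(states, actions) -> next_states
      reward_fn(states, actions)      -> per-step rewards
    """
    action_dim = action_low.shape[0]
    # Sample every action uniformly and independently at each time step.
    actions = np.random.uniform(action_low, action_high,
                                size=(K, H, action_dim))

    states = np.repeat(state[None, :], K, axis=0)  # (K, state_dim)
    returns = np.zeros(K)
    for t in range(H):
        returns += reward_fn(states, actions[:, t])
        states = dynamics_model(states, actions[:, t])

    best = np.argmax(returns)
    return actions[best, 0]  # execute only the first action, then replan
```

Note that in this MPC-style setup only the first action is executed before replanning from the new state, so no single sampled sequence has to be near-optimal over the full horizon; that is one common reason a moderate K can work despite the argument above. Whether the paper does the same is exactly what the question asks.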
