In the paper you mention that policy evaluation is done by sampling K random action sequences. I have two questions. First, how large a K are you using? Second, with a long model horizon such as H=16, how do you end up sampling a sequence that is anywhere close to optimal? In other words, when you say you sample K random action sequences, are you sampling a random action independently at each time step? If so, then for large model horizons you would need a very large K to find an action sequence that is close to optimal. Or are you doing something else, and if so, could you explain what it is?
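To make the question concrete, here is a minimal sketch of the interpretation I have in mind: each of the K sequences draws its H actions independently and uniformly, and the sequence with the highest predicted return wins. This is not taken from the paper or its code. `toy_dynamics`, `toy_reward`, `random_shooting`, the action bounds of [-1, 1], and K=1000 are all hypothetical placeholders chosen for illustration.

```python
import numpy as np


def toy_dynamics(state, action):
    # Hypothetical stand-in for a learned dynamics model: a simple point mass.
    return state + 0.1 * action


def toy_reward(state, action):
    # Hypothetical reward: stay near the origin while using small actions.
    return -float(np.sum(state ** 2)) - 0.01 * float(np.sum(action ** 2))


def random_shooting(state, K, H, action_dim, rng):
    # Draw K sequences of H actions; every action is sampled independently
    # and uniformly in [-1, 1] (assumed bounds) at every time step.
    actions = rng.uniform(-1.0, 1.0, size=(K, H, action_dim))
    returns = np.zeros(K)
    for k in range(K):
        s = state.copy()
        for t in range(H):
            returns[k] += toy_reward(s, actions[k, t])
            s = toy_dynamics(s, actions[k, t])
    # Keep the sequence with the highest predicted return.
    best = int(np.argmax(returns))
    return actions[best], returns[best]


rng = np.random.default_rng(0)
state = np.array([1.0, -1.0])
best_seq, best_ret = random_shooting(state, K=1000, H=16, action_dim=2, rng=rng)
print(best_seq.shape, best_ret)
```

Under this reading, the K samples have to cover a search space of dimension H times the action dimension (32 in the sketch above), which is why I would expect K to need to grow quickly with H.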