| tags | aliases |
| ---- | ------- |
|      |         |
A policy is like an instruction list that tells an agent what actions to take. It can be deterministic, in which case it is usually denoted by $\mu$, or stochastic, usually denoted by $\pi$. In deep RL we work with parameterized policies: the policy's outputs depend on a set of parameters (the weights and biases of a neural network). Often these parameters are denoted by $\theta$ or $\phi$ and written as a subscript: $\mu_\theta$, $\pi_\theta$.
A deterministic policy can easily be expressed as a normal function that takes the observations as inputs and returns the actions:
```python
from tensorflow.keras import Sequential, layers

obs_dim = 8  # size of the observation vector (example value)

network = Sequential([
    layers.Dense(64, activation="relu", input_shape=(obs_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(120, activation="tanh"),  # one output unit per action dimension
])
actions = network(observations)  # observations: batch of shape (batch, obs_dim)
```
Stochastic policies usually come in two flavors:
- Categorical policies, for discrete action spaces
- Diagonal Gaussian policies, for continuous action spaces
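As a sketch of the second flavor, a diagonal Gaussian policy can be built from a network that outputs the mean action plus a standalone log-standard-deviation vector. The sizes `obs_dim` and `act_dim` below are made-up values for illustration, and this is one common layout, not the only one:

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # hypothetical sizes

# Mean network: maps an observation to the mean of each action dimension.
mean_net = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.Tanh(),
    nn.Linear(64, act_dim),
)
# Log-std as a free parameter, shared across states.
log_std = nn.Parameter(-0.5 * torch.ones(act_dim))

obs = torch.randn(obs_dim)
dist = torch.distributions.Normal(mean_net(obs), log_std.exp())
action = dist.sample()              # one continuous value per action dimension
logp = dist.log_prob(action).sum()  # diagonal covariance => sum log probs over dims
```

Because the covariance is diagonal, the per-dimension log probabilities simply add up.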
Sampling actions from a stochastic policy is more involved than in the deterministic case, as is computing log likelihoods of particular actions, $\log \pi_\theta(a \mid s)$.
A categorical policy is built like a classifier over the discrete actions: the input is the observation, and a final softmax layer turns the output logits into a probability for each action.
**Sampling**: Given the probabilities for each action, PyTorch and TensorFlow have built-in tools for sampling.
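In PyTorch, one such built-in tool is `torch.distributions.Categorical`. A minimal sketch (the logits are made-up values for a four-action space):

```python
import torch

# Hypothetical output logits of the policy network for one observation,
# over 4 discrete actions.
logits = torch.tensor([1.0, 0.5, -1.0, 0.2])

dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()  # integer index of the sampled action
```

Passing `logits=` lets the distribution apply the softmax internally, which is numerically safer than exponentiating first.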
**Log likelihood**: Denote the last layer of probabilities as $P_\theta(s)$, a vector with one entry per action. The log likelihood of an action $a$ is then obtained by indexing into that vector: $\log \pi_\theta(a \mid s) = \log \left[ P_\theta(s) \right]_a$.
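The indexing step above is a one-liner in code. The probability vector below is a made-up example of a softmax output for one state:

```python
import torch

# P_theta(s): hypothetical softmax output for one state s, over 4 actions.
probs = torch.tensor([0.1, 0.6, 0.2, 0.1])
action = 1  # index of the action whose log likelihood we want

# log pi_theta(a | s) = log [P_theta(s)]_a
log_likelihood = torch.log(probs[action])
```

In practice it is usually better to keep the raw logits and use `log_softmax`, avoiding the round trip through probabilities.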