From 6376f1a40ac5b5db83ce0dd9fdf3071fa5813dfd Mon Sep 17 00:00:00 2001
From: Yang Zuo <66569051+zuoyangjkpi@users.noreply.github.com>
Date: Sun, 5 Jun 2022 16:09:09 +0200
Subject: [PATCH] Update sac.rst

I'm not entirely sure, but I think the squashed Gaussian expression denotes the stochastic policy distribution rather than the action itself, right? The action must then be obtained by sampling from the policy distribution.
---
 docs/algorithms/sac.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/algorithms/sac.rst b/docs/algorithms/sac.rst
index 6df7ff501..062794e73 100644
--- a/docs/algorithms/sac.rst
+++ b/docs/algorithms/sac.rst
@@ -153,7 +153,7 @@ The way we optimize the policy makes use of the **reparameterization trick**, in
 
 .. math::
 
-    \tilde{a}_{\theta}(s, \xi) = \tanh\left( \mu_{\theta}(s) + \sigma_{\theta}(s) \odot \xi \right), \;\;\;\;\; \xi \sim \mathcal{N}(0, I).
+    \pi_{\theta}(\tilde{a}_{\theta}(s,\xi)|s) = \tanh\left( \mu_{\theta}(s) + \sigma_{\theta}(s) \odot \xi \right), \;\;\;\;\; \xi \sim \mathcal{N}(0, I).
 
 .. admonition:: You Should Know
@@ -318,4 +318,4 @@ Other Public Implementations
 .. _`SAC release repo`: https://github.com/haarnoja/sac
 .. _`Softlearning repo`: https://github.com/rail-berkeley/softlearning
-.. _`Yarats and Kostrikov repo`: https://github.com/denisyarats/pytorch_sac
\ No newline at end of file
+.. _`Yarats and Kostrikov repo`: https://github.com/denisyarats/pytorch_sac
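For context, the reparameterization trick that the equation in this hunk describes can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not code from the Spinning Up repo; the names `mu`, `log_std`, and `squashed_gaussian_sample` are assumptions chosen for clarity:

```python
import numpy as np

def squashed_gaussian_sample(mu, log_std, rng):
    """Reparameterized sample from a tanh-squashed Gaussian policy.

    mu, log_std: illustrative stand-ins for the policy network outputs
    for a single state. Returns tanh(mu + sigma * xi), xi ~ N(0, I),
    i.e. an action sample, with the noise drawn independently of the
    policy parameters so gradients can flow through mu and log_std.
    """
    xi = rng.standard_normal(mu.shape)       # parameter-independent noise
    pre_tanh = mu + np.exp(log_std) * xi     # reparameterized Gaussian sample
    return np.tanh(pre_tanh)                 # squash into (-1, 1)

rng = np.random.default_rng(0)
action = squashed_gaussian_sample(np.zeros(2), np.zeros(2), rng)
assert np.all(np.abs(action) < 1.0)  # squashed actions lie in (-1, 1)
```

Note that the right-hand side `tanh(mu + sigma * xi)` is a concrete action sample, which is what this patch's question is about: whether the docs' left-hand side should denote the sample itself or the policy distribution it is drawn from.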