From 6376f1a40ac5b5db83ce0dd9fdf3071fa5813dfd Mon Sep 17 00:00:00 2001
From: Yang Zuo <66569051+zuoyangjkpi@users.noreply.github.com>
Date: Sun, 5 Jun 2022 16:09:09 +0200
Subject: [PATCH] Update sac.rst

I'm not entirely sure, but I think the squashed Gaussian expression denotes the stochastic policy distribution rather than the action itself, right? The action must then be obtained by sampling from the policy distribution.
---
 docs/algorithms/sac.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/algorithms/sac.rst b/docs/algorithms/sac.rst
index 6df7ff501..062794e73 100644
--- a/docs/algorithms/sac.rst
+++ b/docs/algorithms/sac.rst
@@ -153,7 +153,7 @@ The way we optimize the policy makes use of the **reparameterization trick**, in
 
 .. math::
 
-    \tilde{a}_{\theta}(s, \xi) = \tanh\left( \mu_{\theta}(s) + \sigma_{\theta}(s) \odot \xi \right), \;\;\;\;\; \xi \sim \mathcal{N}(0, I).
+    \pi_{\theta}(\tilde{a}_{\theta}(s,\xi)|s) = \tanh\left( \mu_{\theta}(s) + \sigma_{\theta}(s) \odot \xi \right), \;\;\;\;\; \xi \sim \mathcal{N}(0, I).
 
 .. admonition:: You Should Know
@@ -318,4 +318,4 @@ Other Public Implementations
 .. _`SAC release repo`: https://github.com/haarnoja/sac
 .. _`Softlearning repo`: https://github.com/rail-berkeley/softlearning
-.. _`Yarats and Kostrikov repo`: https://github.com/denisyarats/pytorch_sac
\ No newline at end of file
+.. _`Yarats and Kostrikov repo`: https://github.com/denisyarats/pytorch_sac
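For context, the reparameterization trick that the equation in this hunk describes can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not code from the Spinning Up repo; the names `mu`, `log_std`, and `squashed_gaussian_sample` are assumptions chosen for clarity:

```python
import numpy as np

def squashed_gaussian_sample(mu, log_std, rng):
    """Reparameterized sample from a tanh-squashed Gaussian policy.

    mu, log_std: illustrative stand-ins for the policy network outputs
    for a single state. Returns tanh(mu + sigma * xi), xi ~ N(0, I),
    i.e. an action sample, with the noise drawn independently of the
    policy parameters so gradients can flow through mu and log_std.
    """
    xi = rng.standard_normal(mu.shape)       # parameter-independent noise
    pre_tanh = mu + np.exp(log_std) * xi     # reparameterized Gaussian sample
    return np.tanh(pre_tanh)                 # squash into (-1, 1)

rng = np.random.default_rng(0)
action = squashed_gaussian_sample(np.zeros(2), np.zeros(2), rng)
assert np.all(np.abs(action) < 1.0)  # squashed actions lie in (-1, 1)
```

Note that the right-hand side `tanh(mu + sigma * xi)` is a concrete action sample, which is what this patch's question is about: whether the docs' left-hand side should denote the sample itself or the policy distribution it is drawn from.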