From 4c0e6b34e72e6a849e926dc29b88a8f5cf4ad15c Mon Sep 17 00:00:00 2001
From: Diederik Huige <31347265+Dhuige@users.noreply.github.com>
Date: Mon, 18 Dec 2023 12:08:56 +0100
Subject: [PATCH] Update sac.rst

The sentence-final "though" in the note about closed-form
log-probabilities reads unclearly; rephrase the parenthetical around
"although". (Depending on the intended reading, "through" may have been
meant instead.)

---
 docs/algorithms/sac.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/algorithms/sac.rst b/docs/algorithms/sac.rst
index 6df7ff501..96f44d6a9 100644
--- a/docs/algorithms/sac.rst
+++ b/docs/algorithms/sac.rst
@@ -159,7 +159,7 @@ The way we optimize the policy makes use of the **reparameterization trick**, in
 
 This policy has two key differences from the policies we use in the other policy optimization algorithms:
 
-   **1. The squashing function.** The :math:`\tanh` in the SAC policy ensures that actions are bounded to a finite range. This is absent in the VPG, TRPO, and PPO policies. It also changes the distribution: before the :math:`\tanh` the SAC policy is a factored Gaussian like the other algorithms' policies, but after the :math:`\tanh` it is not. (You can still compute the log-probabilities of actions in closed form, though: see the paper appendix for details.)
+   **1. The squashing function.** The :math:`\tanh` in the SAC policy ensures that actions are bounded to a finite range. This is absent in the VPG, TRPO, and PPO policies. It also changes the distribution: before the :math:`\tanh` the SAC policy is a factored Gaussian like the other algorithms' policies, but after the :math:`\tanh` it is not. (Although the distribution changes, you can still compute the log-probabilities of actions in closed form: see the paper appendix for details.)
 
    **2. The way standard deviations are parameterized.** In VPG, TRPO, and PPO, we represent the log std devs with state-independent parameter vectors. In SAC, we represent the log std devs as outputs from the neural network, meaning that they depend on state in a complex way. SAC with state-independent log std devs, in our experience, did not work. (Can you think of why? Or better yet: run an experiment to verify?)
 
@@ -318,4 +318,4 @@ Other Public Implementations
 
 .. _`SAC release repo`: https://github.com/haarnoja/sac
 .. _`Softlearning repo`: https://github.com/rail-berkeley/softlearning
-.. _`Yarats and Kostrikov repo`: https://github.com/denisyarats/pytorch_sac
\ No newline at end of file
+.. _`Yarats and Kostrikov repo`: https://github.com/denisyarats/pytorch_sac
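
Reviewer context, not part of the patch: the paragraph this hunk touches
describes the two properties that set the SAC policy apart, the tanh
squashing (with its closed-form log-probability correction) and the
state-dependent log std devs. A minimal PyTorch sketch of both follows;
the module name, hidden sizes, and clamp bounds are illustrative
assumptions, not the Spinning Up source.

import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class SquashedGaussianPolicy(nn.Module):
    """Sketch of a SAC-style policy: tanh-squashed Gaussian with
    state-dependent log std devs. Names and sizes are illustrative."""

    def __init__(self, obs_dim, act_dim, hidden=256, act_limit=1.0):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Difference 2: both mu and log_std are network outputs, so the
        # std dev depends on state (unlike VPG/TRPO/PPO, where the log
        # std devs are state-independent parameter vectors).
        self.mu_layer = nn.Linear(hidden, act_dim)
        self.log_std_layer = nn.Linear(hidden, act_dim)
        self.act_limit = act_limit

    def forward(self, obs):
        h = self.body(obs)
        mu = self.mu_layer(h)
        log_std = torch.clamp(self.log_std_layer(h), -20, 2)
        pi = torch.distributions.Normal(mu, log_std.exp())
        # Reparameterization trick: u = mu + std * eps, eps ~ N(0, 1),
        # so gradients flow through the sample.
        u = pi.rsample()
        # Gaussian log-prob, then the tanh change-of-variables
        # correction (subtracting sum_i log(1 - tanh(u_i)^2)), written
        # in a numerically stable form.
        logp = pi.log_prob(u).sum(-1)
        logp -= (2 * (math.log(2) - u - F.softplus(-2 * u))).sum(-1)
        # Difference 1: the squashing function bounds actions to a
        # finite range.
        a = self.act_limit * torch.tanh(u)
        return a, logp

For example, a, logp = SquashedGaussianPolicy(obs_dim=11, act_dim=3)(torch.randn(8, 11))
yields actions bounded to [-act_limit, act_limit] along with their
corrected log-probabilities, which is exactly the closed-form
computation the parenthetical in the edited paragraph refers to.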