diff --git a/README.md b/README.md
index e39bbb2..5f4687b 100644
--- a/README.md
+++ b/README.md
@@ -71,7 +71,7 @@ To install specific packages, you can use the following commands:
 ```bash
 # From PyPI
 pip install a2perf[web_navigation]
-pip install a2perf[circuit_training]  
+pip install a2perf[circuit_training]
 pip install a2perf[quadruped_locomotion]
 
 # From source
diff --git a/docs/_static/img/gminiwob_scene.png b/docs/_static/img/gminiwob_scene.png
new file mode 100644
index 0000000..d509d85
Binary files /dev/null and b/docs/_static/img/gminiwob_scene.png differ
diff --git a/docs/content/circuit_training/CircuitTraining-Ariane-v0.ipynb b/docs/content/circuit_training/CircuitTraining-Ariane-v0.ipynb
deleted file mode 100644
index ffe26ab..0000000
--- a/docs/content/circuit_training/CircuitTraining-Ariane-v0.ipynb
+++ /dev/null
@@ -1,217 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": "# Ariane"
-  },
-  {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2024-08-15T18:13:44.733493Z",
-     "start_time": "2024-08-15T18:13:39.500066Z"
-    }
-   },
-   "cell_type": "code",
-   "source": [
-    "from a2perf.domains import circuit_training\n",
-    "import gymnasium as gym\n",
-    "\n",
-    "env = gym.make('CircuitTraining-Ariane-v0')"
-   ],
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "2024-08-15 14:13:40.063385: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
-      "2024-08-15 14:13:40.197600: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
-      "2024-08-15 14:13:40.197668: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
-      "2024-08-15 14:13:40.218491: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
-      "2024-08-15 14:13:40.269410: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
-      "To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
-      "2024-08-15 14:13:40.963834: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n"
-     ]
-    }
-   ],
-   "execution_count": 1
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "<table>\n",
-    "    <tr>\n",
-    "        <th style=\"text-align:right\">Action Space</th>\n",
-    "        <td style=\"text-align:left\">Discrete(16384)</td>\n",
-    "    </tr>\n",
-    "    <tr>\n",
-    "        <th style=\"text-align:right\">Observation Space</th>\n",
-    "        <td style=\"text-align:left\">\n",
-    "            Dict('current_node': Box(0, 3499, (1,), int32), 'fake_net_heatmap': Box(0.0, 1.0, (16384,), float32), 'is_node_placed': Box(0, 1, (3500,), int32), 'locations_x': Box(0.0, 1.0, (3500,), float32), 'locations_y': Box(0.0, 1.0, (3500,), float32), 'mask': Box(0, 1, (16384,), int32), 'netlist_index': Box(0, 0, (1,), int32))\n",
-    "        </td>\n",
-    "    </tr>\n",
-    "    <tr>\n",
-    "        <th style=\"text-align:right\">Reward Range</th>\n",
-    "        <td style=\"text-align:left\">(0, 1)</td>\n",
-    "    </tr>\n",
-    "    <tr>\n",
-    "        <th style=\"text-align:right\">Creation</th>\n",
-    "        <td style=\"text-align:left\">gym.make(\"CircuitTraining-Ariane-v0\")</td>\n",
-    "    </tr>\n",
-    "</table>"
-   ]
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "## Description\n",
-    "\n",
-    "Circuit Training is an open-source framework for generating chip floor plans with distributed deep reinforcement learning. This framework reproduces the methodology published in the Nature 2021 paper:\n",
-    "\n",
-    "A graph placement methodology for fast chip design. Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter & Jeff Dean, 2021. Nature, 594(7862), pp.207-212. [PDF]\n",
-    "\n",
-    "At each timestep, the agent must place a single macro onto the chip canvas. \n"
-   ]
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": "## Action Space\n"
-  },
-  {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2024-07-21T13:06:40.008121Z",
-     "start_time": "2024-07-21T13:06:40.004369Z"
-    }
-   },
-   "cell_type": "code",
-   "source": "env.action_space",
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Discrete(16384)"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "execution_count": 2
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": "Circuit Training represents the chip canvas as a grid. The action space corresponds to the different locations that the next macro can be placed onto the canvas. In the Ariane netlist case, the canvas is of size $128 \\times 128$, resulting in $16384$ possible actions."
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": "## Observation Encoding\n"
-  },
-  {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2024-07-21T13:13:09.395228Z",
-     "start_time": "2024-07-21T13:13:09.391323Z"
-    }
-   },
-   "cell_type": "code",
-   "source": "env.observation_space",
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Dict('current_node': Box(0, 3499, (1,), int32), 'fake_net_heatmap': Box(0.0, 1.0, (16384,), float32), 'is_node_placed': Box(0, 1, (3500,), int32), 'locations_x': Box(0.0, 1.0, (3500,), float32), 'locations_y': Box(0.0, 1.0, (3500,), float32), 'mask': Box(0, 1, (16384,), int32), 'netlist_index': Box(0, 0, (1,), int32))"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "execution_count": 3
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "| Key | Description |\n",
-    "|-----|-------------|\n",
-    "| current_node | The node currently being considered for placement |\n",
-    "| fake_net_heatmap | A representation of estimated connections between nodes |\n",
-    "| is_node_placed | Indicates which nodes have already been placed on the chip |\n",
-    "| locations_x | The x-coordinates of placed nodes |\n",
-    "| locations_y | The y-coordinates of placed nodes |\n",
-    "| mask | Indicates which actions are valid in the current state |\n",
-    "| netlist_index | Identifier for the current netlist being processed |"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Rewards\n",
-    "\n",
-    "The reward is evaluated at the end of each episode. The placement cost binary is used to calculate the reward based on proxy wirelength, congestion, and density. An infeasible placement results in a reward of -1.0.\n",
-    "\n",
-    "The reward function is defined as:\n",
-    "\n",
-    "$$R(p, g) = -\\text{Wirelength}(p, g) - \\lambda \\cdot \\text{Congestion}(p, g) - \\gamma \\cdot \\text{Density}(p, g)$$\n",
-    "\n",
-    "Where:\n",
-    "- $p$ represents the placement\n",
-    "- $g$ represents the netlist graph\n",
-    "- $\\lambda$ is the congestion weight\n",
-    "- $\\gamma$ is the density weight\n",
-    "\n",
-    "Default values in A2Perf:\n",
-    "- The congestion weight $\\lambda$ is set to 0.01\n",
-    "- The density weight $\\gamma$ is set to 0.01 \n",
-    "- The maximum density threshold is set to 0.6\n",
-    "\n",
-    "These default values are based on the methodology described in [Mirhoseini et al. (2021)][1].\n",
-    "\n",
-    "[1]: https://www.nature.com/articles/s41586-021-03544-w \"A graph placement methodology for fast chip design\""
-   ]
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "## Termination\n",
-    "\n",
-    "The episode is terminated once all macros have been placed on the canvas, then the final reward is calculated."
-   ]
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "## Registered Configurations\n",
-    "* `CircuitTraining-Ariane-v0`"
-   ]
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": ""
-  }
- ],
- "metadata": {
-  "language_info": {
-   "name": "python"
-  },
-  "kernelspec": {
-   "name": "python3",
-   "language": "python",
-   "display_name": "Python 3 (ipykernel)"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/docs/content/circuit_training/CircuitTraining-Ariane-v0.md b/docs/content/circuit_training/CircuitTraining-Ariane-v0.md
new file mode 100644
index 0000000..36e94ab
--- /dev/null
+++ b/docs/content/circuit_training/CircuitTraining-Ariane-v0.md
@@ -0,0 +1,127 @@
+# Ariane
+
+![The Ariane RISC-V CPU](../../_static/img/CircuitTraining-Ariane-v0.gif)
+
+## Environment Creation
+
+```python
+from a2perf.domains import circuit_training  # noqa: F401, importing registers the CircuitTraining environments
+import gymnasium as gym
+
+env = gym.make('CircuitTraining-Ariane-v0')
+```
+
+#### Optional parameters:
+
+| Parameter                  | Type                  | Default                                                | Description                                                                                                              |
+|----------------------------|-----------------------|--------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| `netlist_file`             | str                   | path to `netlist.pb.txt`                               | Path to the input netlist file. Set automatically by the `Ariane` and `ToyMacro` configurations.                        |
+| `init_placement`           | str                   | path to `initial.plc`                                  | Path to the initial placement file, used to read the grid and canvas size. Set automatically by the `Ariane` and `ToyMacro` configurations. |
+| `plc_wrapper_main`         | str                   | `a2perf/domains/circuit_training/bin/plc_wrapper_main` | Main PLC wrapper.                                                                                                        |
+| `create_placement_cost_fn` | Callable              | `placement_util.create_placement_cost`                 | A function that creates the `PlacementCost` object given the netlist and initial placement file.                         |
+| `std_cell_placer_mode`     | str                   | `'fd'`                                                 | Mode for fast standard-cell placement. `'fd'` selects the force-directed algorithm.                                      |
+| `cost_info_fn`             | Callable              | `cost_info_function`                                   | The cost function that, given the `plc` object, returns the RL cost.                                                     |
+| `global_seed`              | int                   | `0`                                                    | Global seed for initializing environment features, ensuring consistency across actors.                                   |
+| `netlist_index`            | int                   | `0`                                                    | Netlist index in the model static features.                                                                              |
+| `is_eval`                  | bool                  | `False`                                                | If set, saves the final placement in `output_dir`.                                                                       |
+| `save_best_cost`           | bool                  | `False`                                                | If set, saves the placement if its cost is better than the previously saved placement.                                   |
+| `output_plc_file`          | str                   | `''`                                                   | The path to save the final placement.                                                                                    |
+| `cd_finetune`              | bool                  | `False`                                                | If True, runs coordinate descent to fine-tune macro orientations. Meant for evaluation, not training.                    |
+| `cd_plc_file`              | str                   | `'ppo_cd_placement.plc'`                               | Name of the coordinate descent fine-tuned `plc` file, saved in the same directory as `output_plc_file`.                  |
+| `train_step`               | Optional[tf.Variable] | `None`                                                 | A `tf.Variable` indicating the training step, used for saving `plc` files during evaluation.                             |
+| `output_all_features`      | bool                  | `False`                                                | If true, outputs all observation features. Otherwise, only outputs dynamic observations.                                 |
+| `node_order`               | str                   | `'descending_size_macro_first'`                        | The sequence order of nodes placed by RL.                                                                                |
+| `save_snapshot`            | bool                  | `True`                                                 | If true, saves the snapshot placement.                                                                                   |
+| `save_partial_placement`   | bool                  | `False`                                                | If true, evaluation also saves the placement even if RL does not place all nodes when an episode is done.                |
+| `use_legacy_reset`         | bool                  | `False`                                                | If true, uses the legacy reset method.                                                                                   |
+| `use_legacy_step`          | bool                  | `False`                                                | If true, uses the legacy step method.                                                                                    |
+| `render_mode`              | Optional[str]         | `None`                                                 | Rendering mode: `'human'`, `'rgb_array'`, or `None` for no rendering.                                                    |
+
+## Description
+
+Circuit Training is an open-source framework for generating chip floor plans
+with distributed deep reinforcement learning. This framework reproduces the
+methodology published in the Nature 2021 paper:
+
+A graph placement methodology for fast chip design. Azalia Mirhoseini, Anna
+Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang,
+Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, Jiwoo Pak, Andy Tong,
+Kavya Srinivasa, William Hang, Emre Tuncer, Quoc V. Le, James Laudon, Richard
+Ho, Roger Carpenter & Jeff Dean, 2021. Nature, 594(7862), pp.207-212.
+
+At each timestep, the agent must place a single macro onto the chip canvas.
+
+**Note**: this environment is supported only on Linux-based operating systems.
+
+## Action Space
+
+Circuit Training represents the chip canvas as a grid, and each action selects
+the grid cell onto which the next macro is placed. For Ariane the canvas grid is
+$128 \times 128$, giving a `Discrete(16384)` action space. Not every cell is a
+legal choice at every step: the `mask` observation marks the cells where the
+current macro can be placed without violating hard constraints on density or
+blockages.
+At each step, the agent places one macro. Once all macros are placed, a
+force-directed method places the clusters of standard cells.
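+
+For Ariane, the 16,384 actions correspond to the cells of a 128 × 128 canvas
+grid. The sketch below converts between a flat action index and a
+(row, column) cell. It assumes row-major numbering
+(`index = row * num_columns + column`), which this page does not specify, and
+`action_to_cell` / `cell_to_action` are hypothetical helpers, not part of the
+A2Perf API.
+
+```python
+GRID_ROWS = 128  # Ariane canvas grid height
+GRID_COLUMNS = 128  # Ariane canvas grid width
+NUM_ACTIONS = GRID_ROWS * GRID_COLUMNS  # 16384, matching Discrete(16384)
+
+
+def action_to_cell(action: int, num_columns: int = GRID_COLUMNS) -> tuple[int, int]:
+    """Map a flat action index to (row, column), assuming row-major numbering."""
+    if action < 0:
+        raise ValueError("action must be non-negative")
+    return divmod(action, num_columns)
+
+
+def cell_to_action(row: int, column: int, num_columns: int = GRID_COLUMNS) -> int:
+    """Inverse of action_to_cell under the same row-major assumption."""
+    return row * num_columns + column
+```
+
+Under this assumption, action 129 maps to row 1, column 1.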
+
+## Observation Space
+
+The observation space encodes information about the partial placement of the
+circuit.
+This includes:
+
+- `current_node`: the current node to be placed, which is a single integer
+  ranging from 0 to 3499.
+- `fake_net_heatmap`: a heatmap representing estimated (fake-net) connections
+  between nodes, given as a continuous array of 16,384 values between 0.0 and
+  1.0.
+- `is_node_placed`: the placement status of nodes, a binary array of size 3500,
+  showing whether each node has been placed (1) or not (0).
+- `locations_x`: the normalized x-coordinates of the nodes, a continuous array
+  of size 3500 with values between 0.0 and 1.0.
+- `locations_y`: the normalized y-coordinates of the nodes, with the same shape
+  and range as `locations_x`.
+- `mask`: a binary array of size 16,384 indicating which actions (grid cells)
+  are valid placements in the current state.
+- `netlist_index`: the index of the netlist being processed. It acts as a
+  placeholder and is fixed at 0.
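+
+Because `mask` flags the valid actions for the current node, a random baseline
+should sample only among actions whose mask entry is 1. The sketch below does
+that and steps through one episode with the standard Gymnasium `reset`/`step`
+API. `valid_actions` and `run_masked_random_episode` are hypothetical helpers,
+not part of A2Perf, and the sketch assumes `obs["mask"]` is a flat 0/1 array
+(for example, a NumPy array).
+
+```python
+import random
+from typing import Any, Sequence
+
+
+def valid_actions(mask: Sequence[int]) -> list[int]:
+    """Return the indices of all actions whose mask entry is non-zero."""
+    return [index for index, allowed in enumerate(mask) if allowed]
+
+
+def run_masked_random_episode(env: Any, seed: int = 0) -> float:
+    """Play one episode with uniformly random valid actions; return the summed reward."""
+    rng = random.Random(seed)
+    obs, _info = env.reset(seed=seed)
+    total_reward = 0.0
+    terminated = truncated = False
+    while not (terminated or truncated):
+        candidates = valid_actions(obs["mask"])
+        if not candidates:
+            raise RuntimeError("mask allows no action for the current node")
+        action = rng.choice(candidates)
+        obs, reward, terminated, truncated, _info = env.step(action)
+        total_reward += float(reward)
+    return total_reward
+```
+
+With `env` created as in Environment Creation, `run_masked_random_episode(env)`
+plays one full placement episode. Since the reward is evaluated at the end of
+the episode, the returned total should essentially be the final placement
+reward.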
+
+## Rewards
+
+The reward is evaluated at the end of each episode. The placement cost binary
+computes it from three proxy metrics: wirelength, congestion, and density. An
+infeasible placement receives a reward of -1.0.
+
+The reward function is defined as:
+
+$$R(p, g) = -\text{Wirelength}(p, g) - \lambda \cdot \text{Congestion}(p, g) - \gamma \cdot \text{Density}(p, g)$$
+
+Where:
+
+- $p$ represents the placement
+- $g$ represents the netlist graph
+- $\lambda$ is the congestion weight
+- $\gamma$ is the density weight
+
+Default values in A2Perf:
+
+- The congestion weight $\lambda$ is set to 0.01
+- The density weight $\gamma$ is set to 0.01
+- The maximum density threshold is set to 0.6
+
+These default values are based on the methodology described
+in [Mirhoseini et al. (2021)][1].
+
+[1]: https://www.nature.com/articles/s41586-021-03544-w "A graph placement methodology for fast chip design"
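+
+The reward formula above can be evaluated directly once the three proxy costs
+are known. The sketch below plugs in the A2Perf default weights listed above
+and returns -1.0 for an infeasible placement. `placement_reward` and its
+`feasible` flag are hypothetical; in the environment itself these costs come
+from the placement cost binary, not from Python code like this.
+
+```python
+CONGESTION_WEIGHT = 0.01  # lambda: A2Perf default congestion weight
+DENSITY_WEIGHT = 0.01  # gamma: A2Perf default density weight
+INFEASIBLE_REWARD = -1.0  # reward assigned to an infeasible placement
+
+
+def placement_reward(
+    wirelength: float,
+    congestion: float,
+    density: float,
+    feasible: bool = True,
+    congestion_weight: float = CONGESTION_WEIGHT,
+    density_weight: float = DENSITY_WEIGHT,
+) -> float:
+    """Evaluate R(p, g) = -Wirelength - lambda * Congestion - gamma * Density."""
+    if not feasible:
+        return INFEASIBLE_REWARD
+    return -wirelength - congestion_weight * congestion - density_weight * density
+```
+
+For example, a proxy wirelength of 0.5 with congestion 2.0 and density 1.0
+gives $-0.5 - 0.02 - 0.01 = -0.53$, so in this case the wirelength term
+dominates.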
+
+## Episode End
+
+The episode terminates once all macros have been placed on the canvas; the
+final reward is then calculated.
+
+## Registered Configurations
+
+* `CircuitTraining-Ariane-v0`
diff --git a/docs/content/circuit_training/CircuitTraining-ToyMacro-v0.md b/docs/content/circuit_training/CircuitTraining-ToyMacro-v0.md
new file mode 100644
index 0000000..542feb7
--- /dev/null
+++ b/docs/content/circuit_training/CircuitTraining-ToyMacro-v0.md
@@ -0,0 +1,127 @@
+# Toy Macro Standard Cell
+
+![The Toy Macro Standard Cell netlist](../../_static/img/CircuitTraining-ToyMacro-v0.gif)
+
+## Environment Creation
+
+```python
+from a2perf.domains import circuit_training  # noqa: F401, importing registers the CircuitTraining environments
+import gymnasium as gym
+
+env = gym.make('CircuitTraining-ToyMacro-v0')
+```
+
+#### Optional parameters:
+
+| Parameter                  | Type                  | Default                                                | Description                                                                                                              |
+|----------------------------|-----------------------|--------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| `netlist_file`             | str                   | path to `netlist.pb.txt`                               | Path to the input netlist file. Set automatically by the `Ariane` and `ToyMacro` configurations.                        |
+| `init_placement`           | str                   | path to `initial.plc`                                  | Path to the initial placement file, used to read the grid and canvas size. Set automatically by the `Ariane` and `ToyMacro` configurations. |
+| `plc_wrapper_main`         | str                   | `a2perf/domains/circuit_training/bin/plc_wrapper_main` | Main PLC wrapper.                                                                                                        |
+| `create_placement_cost_fn` | Callable              | `placement_util.create_placement_cost`                 | A function that creates the `PlacementCost` object given the netlist and initial placement file.                         |
+| `std_cell_placer_mode`     | str                   | `'fd'`                                                 | Mode for fast standard-cell placement. `'fd'` selects the force-directed algorithm.                                      |
+| `cost_info_fn`             | Callable              | `cost_info_function`                                   | The cost function that, given the `plc` object, returns the RL cost.                                                     |
+| `global_seed`              | int                   | `0`                                                    | Global seed for initializing environment features, ensuring consistency across actors.                                   |
+| `netlist_index`            | int                   | `0`                                                    | Netlist index in the model static features.                                                                              |
+| `is_eval`                  | bool                  | `False`                                                | If set, saves the final placement in `output_dir`.                                                                       |
+| `save_best_cost`           | bool                  | `False`                                                | If set, saves the placement if its cost is better than the previously saved placement.                                   |
+| `output_plc_file`          | str                   | `''`                                                   | The path to save the final placement.                                                                                    |
+| `cd_finetune`              | bool                  | `False`                                                | If True, runs coordinate descent to fine-tune macro orientations. Meant for evaluation, not training.                    |
+| `cd_plc_file`              | str                   | `'ppo_cd_placement.plc'`                               | Name of the coordinate descent fine-tuned `plc` file, saved in the same directory as `output_plc_file`.                  |
+| `train_step`               | Optional[tf.Variable] | `None`                                                 | A `tf.Variable` indicating the training step, used for saving `plc` files during evaluation.                             |
+| `output_all_features`      | bool                  | `False`                                                | If true, outputs all observation features. Otherwise, only outputs dynamic observations.                                 |
+| `node_order`               | str                   | `'descending_size_macro_first'`                        | The sequence order of nodes placed by RL.                                                                                |
+| `save_snapshot`            | bool                  | `True`                                                 | If true, saves the snapshot placement.                                                                                   |
+| `save_partial_placement`   | bool                  | `False`                                                | If true, evaluation also saves the placement even if RL does not place all nodes when an episode is done.                |
+| `use_legacy_reset`         | bool                  | `False`                                                | If true, uses the legacy reset method.                                                                                   |
+| `use_legacy_step`          | bool                  | `False`                                                | If true, uses the legacy step method.                                                                                    |
+| `render_mode`              | Optional[str]         | `None`                                                 | Rendering mode: `'human'`, `'rgb_array'`, or `None` for no rendering.                                                    |
+
+## Description
+
+Circuit Training is an open-source framework for generating chip floor plans
+with distributed deep reinforcement learning. This framework reproduces the
+methodology published in the Nature 2021 paper:
+
+A graph placement methodology for fast chip design. Azalia Mirhoseini, Anna
+Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang,
+Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, Jiwoo Pak, Andy Tong,
+Kavya Srinivasa, William Hang, Emre Tuncer, Quoc V. Le, James Laudon, Richard
+Ho, Roger Carpenter & Jeff Dean, 2021. Nature, 594(7862), pp.207-212.
+
+At each timestep, the agent must place a single macro onto the chip canvas.
+
+**Note**: this environment is supported only on Linux-based operating systems.
+
+## Action Space
+
+Circuit Training represents the chip canvas as a grid, and each action selects
+the grid cell onto which the next macro is placed. In this configuration the
+action space is `Discrete(16384)`. Not every cell is a legal choice at every
+step: the `mask` observation marks the cells where the current macro can be
+placed without violating hard constraints on density or blockages.
+At each step, the agent places one macro. Once all macros are placed, a
+force-directed method places the clusters of standard cells.
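+
+To see which cells are open at a given step, the `mask` observation (described
+under Observation Space) can be converted into (row, column) pairs. The sketch
+below assumes the 16,384 actions index a 128 × 128 grid in row-major order
+(`index = row * num_columns + column`); both the layout and the `valid_cells`
+helper are assumptions for illustration, not part of the documented API.
+
+```python
+from typing import Sequence
+
+GRID_COLUMNS = 128  # assumption: 16384 actions = a 128 x 128 grid
+
+
+def valid_cells(mask: Sequence[int], num_columns: int = GRID_COLUMNS) -> list[tuple[int, int]]:
+    """Return (row, column) for every action whose mask entry is non-zero.
+
+    Assumes row-major numbering: index = row * num_columns + column.
+    """
+    return [divmod(index, num_columns) for index, allowed in enumerate(mask) if allowed]
+```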
+
+## Observation Space
+
+The observation space encodes information about the partial placement of the
+circuit.
+This includes:
+
+- `current_node`: the current node to be placed, which is a single integer
+  ranging from 0 to 3499.
+- `fake_net_heatmap`: a heatmap representing estimated (fake-net) connections
+  between nodes, given as a continuous array of 16,384 values between 0.0 and
+  1.0.
+- `is_node_placed`: the placement status of nodes, a binary array of size 3500,
+  showing whether each node has been placed (1) or not (0).
+- `locations_x`: the normalized x-coordinates of the nodes, a continuous array
+  of size 3500 with values between 0.0 and 1.0.
+- `locations_y`: the normalized y-coordinates of the nodes, with the same shape
+  and range as `locations_x`.
+- `mask`: a binary array of size 16,384 indicating which actions (grid cells)
+  are valid placements in the current state.
+- `netlist_index`: the index of the netlist being processed. It acts as a
+  placeholder and is fixed at 0.
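+
+When debugging an agent, it can help to condense an observation into a few
+numbers. The sketch below reads three of the keys listed above from the
+observation dict returned by `env.reset()` or `env.step()`.
+`summarize_observation` is a hypothetical helper, and it assumes each value is
+an array-like (for example, a NumPy array) of the shape described above.
+
+```python
+from typing import Any, Mapping
+
+
+def summarize_observation(obs: Mapping[str, Any]) -> dict[str, int]:
+    """Condense a Circuit Training observation dict into a few counts."""
+    return {
+        # Index of the node the agent must place next (a length-1 array).
+        "current_node": int(obs["current_node"][0]),
+        # Number of nodes already placed (is_node_placed is a 0/1 array).
+        "num_placed": int(sum(obs["is_node_placed"])),
+        # Number of grid cells the mask currently allows.
+        "num_valid_actions": sum(1 for allowed in obs["mask"] if allowed),
+    }
+```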
+
+## Rewards
+
+The reward is evaluated at the end of each episode. The placement cost binary
+computes it from three proxy metrics: wirelength, congestion, and density. An
+infeasible placement receives a reward of -1.0.
+
+The reward function is defined as:
+
+$$R(p, g) = -\text{Wirelength}(p, g) - \lambda \cdot \text{Congestion}(p, g) - \gamma \cdot \text{Density}(p, g)$$
+
+Where:
+
+- $p$ represents the placement
+- $g$ represents the netlist graph
+- $\lambda$ is the congestion weight
+- $\gamma$ is the density weight
+
+Default values in A2Perf:
+
+- The congestion weight $\lambda$ is set to 0.01
+- The density weight $\gamma$ is set to 0.01
+- The maximum density threshold is set to 0.6
+
+These default values are based on the methodology described
+in [Mirhoseini et al. (2021)][1].
+
+[1]: https://www.nature.com/articles/s41586-021-03544-w "A graph placement methodology for fast chip design"
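+
+The reward formula above can be evaluated directly once the three proxy costs
+are known. The sketch below plugs in the A2Perf default weights listed above
+and returns -1.0 for an infeasible placement. `placement_reward` and its
+`feasible` flag are hypothetical; in the environment itself these costs come
+from the placement cost binary, not from Python code like this.
+
+```python
+CONGESTION_WEIGHT = 0.01  # lambda: A2Perf default congestion weight
+DENSITY_WEIGHT = 0.01  # gamma: A2Perf default density weight
+INFEASIBLE_REWARD = -1.0  # reward assigned to an infeasible placement
+
+
+def placement_reward(
+    wirelength: float,
+    congestion: float,
+    density: float,
+    feasible: bool = True,
+    congestion_weight: float = CONGESTION_WEIGHT,
+    density_weight: float = DENSITY_WEIGHT,
+) -> float:
+    """Evaluate R(p, g) = -Wirelength - lambda * Congestion - gamma * Density."""
+    if not feasible:
+        return INFEASIBLE_REWARD
+    return -wirelength - congestion_weight * congestion - density_weight * density
+```
+
+For example, a proxy wirelength of 0.5 with congestion 2.0 and density 1.0
+gives $-0.5 - 0.02 - 0.01 = -0.53$.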
+
+## Episode End
+
+The episode terminates once all macros have been placed on the canvas; the
+final reward is then calculated.
+
+## Registered Configurations
+
+* `CircuitTraining-ToyMacro-v0`
diff --git a/docs/content/circuit_training/CircuitTraining-ToyMacroStdcell-v0.ipynb b/docs/content/circuit_training/CircuitTraining-ToyMacroStdcell-v0.ipynb
deleted file mode 100644
index f0111e1..0000000
--- a/docs/content/circuit_training/CircuitTraining-ToyMacroStdcell-v0.ipynb
+++ /dev/null
@@ -1,267 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": "# Toy Macro Standard Cell"
-  },
-  {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2024-07-21T17:47:38.578728Z",
-     "start_time": "2024-07-21T17:47:35.999779Z"
-    }
-   },
-   "cell_type": "code",
-   "source": [
-    "from a2perf.domains import circuit_training\n",
-    "import gymnasium as gym\n",
-    "\n",
-    "env = gym.make('CircuitTraining-ToyMacro-v0')"
-   ],
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "2024-07-21 13:47:36.273879: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
-      "2024-07-21 13:47:36.299009: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
-      "2024-07-21 13:47:36.299034: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
-      "2024-07-21 13:47:36.300083: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
-      "2024-07-21 13:47:36.304647: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
-      "To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
-      "2024-07-21 13:47:36.808584: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
-      "WARNING:absl:block_name is not set. Please add the block_name in:\n",
-      "/home/ike2030/workspace/a2perf/repo_new/a2perf/domains/circuit_training/circuit_training/environment/test_data/toy_macro_stdcell/netlist.pb.txt\n",
-      "or in:\n",
-      "/home/ike2030/workspace/a2perf/repo_new/a2perf/domains/circuit_training/circuit_training/environment/test_data/toy_macro_stdcell/initial.plc\n",
-      "/home/ike2030/miniconda3/envs/a2perf_circuit_training/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.\n",
-      "  return _methods._mean(a, axis=axis, dtype=dtype,\n",
-      "/home/ike2030/miniconda3/envs/a2perf_circuit_training/lib/python3.10/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in divide\n",
-      "  ret = ret.dtype.type(ret / rcount)\n",
-      "/home/ike2030/miniconda3/envs/a2perf_circuit_training/lib/python3.10/site-packages/gymnasium/utils/passive_env_checker.py:32: UserWarning: \u001B[33mWARN: A Box observation space maximum and minimum values are equal. Actual equal coordinates: [(0,)]\u001B[0m\n",
-      "  logger.warn(\n"
-     ]
-    }
-   ],
-   "execution_count": 3
-  },
-  {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2024-07-21T17:47:43.889997Z",
-     "start_time": "2024-07-21T17:47:43.885078Z"
-    }
-   },
-   "cell_type": "code",
-   "source": "env.observation_space",
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Dict('current_node': Box(0, 3499, (1,), int32), 'fake_net_heatmap': Box(0.0, 1.0, (16384,), float32), 'is_node_placed': Box(0, 1, (3500,), int32), 'locations_x': Box(0.0, 1.0, (3500,), float32), 'locations_y': Box(0.0, 1.0, (3500,), float32), 'mask': Box(0, 1, (16384,), int32), 'netlist_index': Box(0, 0, (1,), int32))"
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "execution_count": 4
-  },
-  {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2024-07-21T17:47:58.159889Z",
-     "start_time": "2024-07-21T17:47:58.157250Z"
-    }
-   },
-   "cell_type": "code",
-   "source": "env.action_space",
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Discrete(16384)"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "execution_count": 5
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "<table>\n",
-    "    <tr>\n",
-    "        <th style=\"text-align:right\">Action Space</th>\n",
-    "        <td style=\"text-align:left\">Discrete(16384)</td>\n",
-    "    </tr>\n",
-    "    <tr>\n",
-    "        <th style=\"text-align:right\">Observation Space</th>\n",
-    "        <td style=\"text-align:left\">\n",
-    "            Dict('current_node': Box(0, 3499, (1,), int32), 'fake_net_heatmap': Box(0.0, 1.0, (16384,), float32), 'is_node_placed': Box(0, 1, (3500,), int32), 'locations_x': Box(0.0, 1.0, (3500,), float32), 'locations_y': Box(0.0, 1.0, (3500,), float32), 'mask': Box(0, 1, (16384,), int32), 'netlist_index': Box(0, 0, (1,), int32))\n",
-    "        </td>\n",
-    "    </tr>\n",
-    "    <tr>\n",
-    "        <th style=\"text-align:right\">Creation</th>\n",
-    "        <td style=\"text-align:left\">gym.make(\"CircuitTraining-ToyMacro-v0\")</td>\n",
-    "    </tr>\n",
-    "</table>"
-   ]
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "## Description\n",
-    "\n",
-    "Circuit Training is an open-source framework for generating chip floor plans with distributed deep reinforcement learning. This framework reproduces the methodology published in the Nature 2021 paper:\n",
-    "\n",
-    "A graph placement methodology for fast chip design. Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter & Jeff Dean, 2021. Nature, 594(7862), pp.207-212. [PDF]\n",
-    "\n",
-    "At each timestep, the agent must place a single macro onto the chip canvas. \n"
-   ]
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": "## Action Space\n"
-  },
-  {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2024-07-21T17:51:19.077192Z",
-     "start_time": "2024-07-21T17:51:19.071196Z"
-    }
-   },
-   "cell_type": "code",
-   "source": "env.action_space",
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Discrete(16384)"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "execution_count": 7
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "\n",
-    "Circuit Training represents the chip canvas as a grid. The action space corresponds to the different locations that the next macro can be placed onto the canvas. In the Toy Macro netlist case, the canvas is of size $128 \\times 128$, resulting in $16384$ possible actions."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": "## Observation Encoding\n"
-  },
-  {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2024-07-21T17:51:49.691864Z",
-     "start_time": "2024-07-21T17:51:49.683712Z"
-    }
-   },
-   "cell_type": "code",
-   "source": "env.observation_space",
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Dict('current_node': Box(0, 3499, (1,), int32), 'fake_net_heatmap': Box(0.0, 1.0, (16384,), float32), 'is_node_placed': Box(0, 1, (3500,), int32), 'locations_x': Box(0.0, 1.0, (3500,), float32), 'locations_y': Box(0.0, 1.0, (3500,), float32), 'mask': Box(0, 1, (16384,), int32), 'netlist_index': Box(0, 0, (1,), int32))"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "execution_count": 8
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "| Key | Description |\n",
-    "|-----|-------------|\n",
-    "| current_node | The node currently being considered for placement |\n",
-    "| fake_net_heatmap | A representation of estimated connections between nodes |\n",
-    "| is_node_placed | Indicates which nodes have already been placed on the chip |\n",
-    "| locations_x | The x-coordinates of placed nodes |\n",
-    "| locations_y | The y-coordinates of placed nodes |\n",
-    "| mask | Indicates which actions are valid in the current state |\n",
-    "| netlist_index | Identifier for the current netlist being processed |"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Rewards\n",
-    "\n",
-    "The reward is evaluated at the end of each episode. The placement cost binary is used to calculate the reward based on proxy wirelength, congestion, and density. An infeasible placement results in a reward of -1.0.\n",
-    "\n",
-    "The reward function is defined as:\n",
-    "\n",
-    "$$R(p, g) = -\\text{Wirelength}(p, g) - \\lambda \\cdot \\text{Congestion}(p, g) - \\gamma \\cdot \\text{Density}(p, g)$$\n",
-    "\n",
-    "Where:\n",
-    "- $p$ represents the placement\n",
-    "- $g$ represents the netlist graph\n",
-    "- $\\lambda$ is the congestion weight\n",
-    "- $\\gamma$ is the density weight\n",
-    "\n",
-    "Default values in A2Perf:\n",
-    "- The congestion weight $\\lambda$ is set to 0.01\n",
-    "- The density weight $\\gamma$ is set to 0.01 \n",
-    "- The maximum density threshold is set to 0.6\n",
-    "\n",
-    "These default values are based on the methodology described in [Mirhoseini et al. (2021)][1].\n",
-    "\n",
-    "[1]: https://www.nature.com/articles/s41586-021-03544-w \"A graph placement methodology for fast chip design\""
-   ]
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "## Termination\n",
-    "\n",
-    "The episode is terminated once all macros have been placed on the canvas, then the final reward is calculated."
-   ]
-  },
-  {
-   "metadata": {},
-   "cell_type": "markdown",
-   "source": [
-    "## Registered Configurations\n",
-    "- `CircuitTraining-ToyMacro-v0`"
-   ]
-  }
- ],
- "metadata": {
-  "language_info": {
-   "name": "python"
-  },
-  "kernelspec": {
-   "name": "python3",
-   "language": "python",
-   "display_name": "Python 3 (ipykernel)"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/docs/content/web_navigation/WebNavigation-Difficulty-01-v0.ipynb b/docs/content/web_navigation/WebNavigation-Difficulty-01-v0.ipynb
index dfb9e6a..b3499b7 100644
--- a/docs/content/web_navigation/WebNavigation-Difficulty-01-v0.ipynb
+++ b/docs/content/web_navigation/WebNavigation-Difficulty-01-v0.ipynb
@@ -3,7 +3,13 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "# Web Navigation "
+   "source": [
+    "# Web Navigation\n",
+    "\n",
+    "This environment is included in A2Perf.\n",
+    "\n",
+    "![A gMiniWoB web navigation scene](../../../media/gminiwob_scene.png)"
+   ]
   },
   {
    "cell_type": "markdown",
