
Commit

Fixes and improvements
AnesBenmerzoug committed May 15, 2024
1 parent 1205694 commit 7801c0f
Showing 8 changed files with 496 additions and 366 deletions.
44 changes: 6 additions & 38 deletions notebooks/nb_20_dynamic_programming.ipynb
@@ -133,7 +133,9 @@
"\n",
"Dynamic programming (DP) is a method that in general solves optimization problems that involve making a sequence of decisions (multi-stage decision making problems) by determining, for each decision, subproblems that can be solved similarily, such that an optimal solution of the original problem can be found from optimal solutions of subproblems. This method is based on *Bellman’s Principle of Optimality*:\n",
"\n",
"> An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."
"> An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.\n",
"\n",
"A problem is said to satisfy the Principle of Optimality if the sub-solutions of an optimal solution of the problem are themselves optimal solutions for their subproblems."
]
},
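To make the principle of optimality concrete, here is a minimal Python sketch (not part of the notebook; the stage graph and costs are made up for illustration). The optimal cost-to-go from any node is computed from the optimal costs-to-go of its successors, which is exactly the recursive structure the principle describes:

```python
from functools import lru_cache

# Hypothetical multi-stage decision problem: edges[node] lists (successor, transition_cost).
# "goal" is the terminal node with zero remaining cost.
edges = {
    "start": [("a", 2.0), ("b", 5.0)],
    "a": [("c", 2.0), ("goal", 7.0)],
    "b": [("c", 1.0), ("goal", 3.0)],
    "c": [("goal", 1.0)],
}

@lru_cache(maxsize=None)
def cost_to_go(node: str) -> float:
    """Optimal remaining cost from `node`, via the Bellman recursion."""
    if node == "goal":
        return 0.0
    # Principle of optimality: the tail of an optimal solution is optimal for its subproblem.
    return min(cost + cost_to_go(successor) for successor, cost in edges[node])

print(cost_to_go("start"))  # 5.0, achieved by start -> a -> c -> goal
```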
{
@@ -696,7 +698,7 @@
"It can be show that for every $s, \\tau \\in [t_0, t_f]$, $s \\leq \\tau$ , and $\\mathbf{x} \\in \\mathbf{X}$, we have:\n",
"\n",
"$$\n",
"V(s, \\mathbf{x}) = \\underset{}{\\min} \\left[ \\int\\limits_{s}^{\\tau} c(\\mathbf{x}(t), \\mathbf{u}(t)) d\\tau + V(\\tau, \\mathbf{x}(\\tau)) \\right]\n",
"V(s, \\mathbf{x}) = \\underset{\\mathbf{u(t)}}{min} \\left[ \\int\\limits_{s}^{\\tau} c(\\mathbf{x}(t), \\mathbf{u}(t)) d\\tau + V(\\tau, \\mathbf{x}(\\tau)) \\right]\n",
"$$\n",
"\n",
"Which is another version of the Bellman equation."
@@ -708,47 +710,13 @@
"source": [
"### Hamilton-Jacobi-Bellman Equation\n",
"\n",
"The Hamilton-Jacobi-Bellman (HJB) equation is the following:\n",
"The Hamilton-Jacobi-Bellman (HJB) equation is given by:\n",
"\n",
"$$\n",
"- \\frac{\\partial V}{\\partial t} = \\underset{\\mathbf{u(t)}}{min} \\left[ \\left( \\frac{\\partial V}{\\partial \\mathbf{x}} \\right)^T f(\\mathbf{x}(t), \\mathbf{u}(t)) + c(\\mathbf{x}(t), \\mathbf{u}(t)) \\right]\n",
"$$\n",
"\n",
"\n",
":::{note} Mathematical Derivation\n",
":class: dropdown\n",
"\n",
"It can be shown that the optimal condition is also satisfied in this case:\n",
"\n",
"$$\n",
"V(x(t_0), t_0, t_f) = \\underset{\\mathbf{u(t)}}{min} \\left[ c(t + \\right]\n",
"$$\n",
"\n",
"$$\n",
"V(x(t_0), t_0, t_f) = V(x(t_0), t_0, t) + V(x(t), t, t_f)\n",
"$$\n",
"\n",
"$$\n",
"\\frac{d}{dt}(V(\\mathbf{x}(t), t, t_f) = \\frac{\\partial V}{\\partial t} + \\left( \\frac{\\partial V}{\\partial \\mathbf{x}} \\right)^T \\underbrace{\\frac{d\\mathbf{x}}{dt}}_{= f(\\mathbf{x}(t), \\mathbf{u}(t))}\n",
"$$\n",
"\n",
"The term on the left can be simplified to:\n",
"\n",
"$$\n",
"\\frac{d}{dt}(V(\\mathbf{x}(t), t, t_f) &= \\underset{\\mathbf{u(t)}}{min} \\frac{d}{dt} \\left[ c(\\mathbf{x}(t_f), t_f) + \\int\\limits_{t_0}^{t_f} c(\\mathbf{x}(t), \\mathbf{u}(t)) d\\tau \\right] \\\\\n",
"&= \\underset{\\mathbf{u(t)}}{min} \\left[ \\frac{d}{dt} \\int\\limits_{t_0}^{t_f} c(\\mathbf{x}(t), \\mathbf{u}(t)) d\\tau \\right] \\\\\n",
"&= \\underset{\\mathbf{u(t)}}{min} \\left[ -c(\\mathbf{x}(t), \\mathbf{u}(t)) \\right]\n",
"$$\n",
"\n",
"Replacing this new expression into the original one and moving some terms aronud we get:\n",
"\n",
"$$\n",
"- \\frac{\\partial V}{\\partial t} = \\underset{\\mathbf{u(t)}}{min} \\left[ \\left( \\frac{\\partial V}{\\partial \\mathbf{x}} \\right)^T f(\\mathbf{x}(t), \\mathbf{u}(t)) + c(\\mathbf{x}(t), \\mathbf{u}(t)) \\right]\n",
"$$\n",
"\n",
"This is called the Hamilton-Jacobi-Bellman (HJB) equation.\n",
"\n",
":::"
"It is a sufficient condition of optimality i.e., that if $V$ satisfies the HJB, it must be the value function."
]
}
],
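As a quick sanity check of the HJB equation, consider an illustrative scalar example (not taken from the notebook): dynamics $\dot{x} = u$, running cost $c(x, u) = x^2 + u^2$, infinite horizon, so $V$ does not depend on time and $\partial V / \partial t = 0$. The HJB equation reduces to

$$
0 = \min_{u} \left[ \frac{\partial V}{\partial x} u + x^2 + u^2 \right],
\qquad
u^* = -\frac{1}{2} \frac{\partial V}{\partial x}.
$$

Substituting $u^*$ back in gives $\left( \partial V / \partial x \right)^2 = 4 x^2$, hence $V(x) = x^2$ and $u^*(x) = -x$, the familiar solution of the scalar linear-quadratic problem.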
64 changes: 36 additions & 28 deletions notebooks/nb_40_LQR.ipynb
@@ -169,59 +169,67 @@
"\\mathbf{x}_{t+1} = A \\mathbf{x}_t + B \\mathbf{u}_t\n",
"$$\n",
"\n",
"where $\\mathbf{x} \\in \\mathbb {R} ^{n}$ (that is, $\\mathbf{x}$ is an $n$-dimensional real-valued vector) is the state of the system and $\\mathbf{u} \\in \\mathbb {R} ^{m}$ is the control input.\n",
"\n",
"Given a quadratic cost function for the system in the infinite-horizon case, defined as:\n",
"\n",
"$$\n",
"J(\\mathbf{x}_0, \\mathbf{u}) = \\sum \\limits _{k = 0}^{\\infty} \\mathbf{x}_k^{T}Q \\mathbf{x}_k + \\mathbf{u}_k^{T} R \\mathbf{u}_k\n",
"$$\n",
"\n",
"With $Q = Q^T \\succeq 0$, $R = R^T \\succeq 0$."
"where $\\mathbf{x} \\in \\mathbb {R} ^{n}$ (that is, $\\mathbf{x}$ is an $n$-dimensional real-valued vector) is the state of the system and $\\mathbf{u} \\in \\mathbb {R} ^{m}$ is the control input."
]
},
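As a small illustration of the system model above, the following Python sketch simulates the open-loop dynamics $\mathbf{x}_{t+1} = A \mathbf{x}_t + B \mathbf{u}_t$; the matrices `A`, `B` and the initial state are made-up values, not taken from the notebook:

```python
import numpy as np

# Illustrative discretized double integrator (time step 0.1); values are assumptions.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

def rollout(x0, controls):
    """Simulate x_{t+1} = A x_t + B u_t for a given sequence of scalar controls."""
    xs = [np.asarray(x0, dtype=float)]
    for u in controls:
        xs.append(A @ xs[-1] + B @ np.atleast_1d(u))
    return np.array(xs)

trajectory = rollout(x0=[1.0, 0.0], controls=[0.0] * 50)
print(trajectory[-1])  # state after 50 uncontrolled steps
```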
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"metadata": {},
"source": [
"## Finite Horizon\n",
"\n",
"Given a quadratic cost function for the system in the finite-horizon case, defined as:\n",
"|\n",
"$$\n",
"J(\\mathbf{x}_0, \\mathbf{u}, N) = \\mathbf{x}_N^{T} Q_N \\mathbf{x}_N + \\sum \\limits _{k = 0}^{N-1} \\mathbf{x}_k^{T}Q \\mathbf{x}_k + \\mathbf{u}_k^{T} R \\mathbf{u}_k\n",
"$$\n",
"\n",
"With $Q_N = Q_N^T \\succeq 0$, $Q = Q^T \\succeq 0$, $R = R^T \\succeq 0$.\n",
"\n",
"It can be shown that the control law that minizes the cost is given by:\n",
"\n",
"$$\n",
"\\mathbf{u}_k = -K \\mathbf{x}_k\n",
"\\mathbf{u}_k = K_k \\mathbf{x}_k\n",
"$$\n",
"\n",
"With: $K = (R + B^T P B)^{-1} B^T P B$\n",
"With: $K_k = -(R + B^T P_k B)^{-1} B^T P_k B$\n",
"\n",
"and $P$ is found by solving the discrete time algebraic Riccati equation (DARE):\n",
"and $P_k$ is found by solving the discrete time dynamic Riccati equation:\n",
"\n",
"$$\n",
"Q + A^{T}PA-(A^{T}PB)(R+B^{T}PB)^{-1}(B^{T}PA) = P.\n",
"$$"
"P_{k-1} = Q + A^{T}P_k A - (A^{T} P_k B)(R+B^{T}P_k B)^{-1}(B^{T}P_k A)\n",
"$$\n",
"\n",
"With $P_N = Q_N$"
]
},
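For readers who want to see the backward recursion in code, here is a minimal Python sketch (not the notebook's implementation; `A`, `B`, `Q`, `R`, `Q_N` are made-up illustrative values). It follows the convention above: $P_N = Q_N$, $P_{k-1} = Q + A^T P_k A - (A^T P_k B)(R + B^T P_k B)^{-1}(B^T P_k A)$, and $\mathbf{u}_k = K_k \mathbf{x}_k$ with $K_k = -(R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A$:

```python
import numpy as np

def finite_horizon_lqr_gains(A, B, Q, R, Q_N, N):
    """Backward Riccati recursion; returns the gains K_0, ..., K_{N-1} with u_k = K_k x_k."""
    P = Q_N  # P_N
    gains = []
    for _ in range(N):
        # M = (R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A, where P currently holds P_{k+1}
        M = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        gains.append(-M)  # K_k
        P = Q + A.T @ P @ A - (A.T @ P @ B) @ M  # step back: P_k
    gains.reverse()  # gains[k] is K_k
    return gains

# Illustrative problem data (assumptions, not from the notebook).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Q_N = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
gains = finite_horizon_lqr_gains(A, B, Q, R, Q_N, N=50)
print(gains[0])  # gain applied at the first step
```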
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{admonition} Derivation\n",
":class: tip\n",
"## Infinite Horizon\n",
"\n",
"Given a quadratic cost function for the system in the infinite-horizon case, defined as:\n",
"\n",
"$$\n",
"V_k(x) = \\min_{u \\in \\mathbf{U}} \\left[ c(x, u) + V_{k+1}(f(x, u)) \\right]\n",
"J(\\mathbf{x}_0, \\mathbf{u}) = \\sum \\limits _{k = 0}^{\\infty} \\mathbf{x}_k^{T}Q \\mathbf{x}_k + \\mathbf{u}_k^{T} R \\mathbf{u}_k\n",
"$$\n",
"\n",
"With $Q = Q^T \\succeq 0$, $R = R^T \\succeq 0$.\n",
"\n",
"It can be shown that the control law that minizes the cost is given by:\n",
"\n",
"$$\n",
"V_k(x) = \\min_{u \\in \\mathbf{U}} \\left[ \\mathbf{x}_k^{T}Q \\mathbf{x}_k + \\mathbf{u}_k^{T} R \\mathbf{u}_k + V_{k+1}(f(x, u)) \\right]\n",
"\\mathbf{u}_k = K \\mathbf{x}_k\n",
"$$\n",
"\n",
"With: $K = -(R + B^T P B)^{-1} B^T P B$\n",
"\n",
"and $P$ is found by solving the discrete time algebraic Riccati equation (DARE):\n",
"\n",
"$$\n",
"V_0(x) = \\min_{u \\in \\mathbf{U}} \\left[ \\mathbf{x}_0^{T}Q \\mathbf{x}_0 + \\mathbf{u}_0^{T} R \\mathbf{u}_0 + V_{k+1}(f(x_0, u_0)) \\right]\n",
"$$\n",
":::"
"P = Q + A^{T}PA-(A^{T}PB)(R+B^{T}PB)^{-1}(B^{T}PA)\n",
"$$"
]
},
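The infinite-horizon gain can be obtained by iterating the same Riccati map to a fixed point, which for stabilizable and detectable systems converges to the DARE solution. A hedged Python sketch with made-up matrices follows; `scipy.linalg.solve_discrete_are` could be used instead where SciPy is available:

```python
import numpy as np

def solve_dare_by_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate P <- Q + A^T P A - (A^T P B)(R + B^T P B)^{-1}(B^T P A) until convergence."""
    P = Q.copy()
    for _ in range(max_iter):
        M = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ A - (A.T @ P @ B) @ M
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("DARE iteration did not converge")

# Illustrative problem data (assumptions, not from the notebook).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
P = solve_dare_by_iteration(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # u_k = K x_k
print(K)
```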
{
@@ -529,7 +537,7 @@
")\n",
":::\n",
"\n",
"#### Simulation\n",
"**Simulation**\n",
"\n",
":::{code-cell} ipython3\n",
"inverted_pendulum_lqr_controller.reset_history()\n",
@@ -545,7 +553,7 @@
"animate_inverted_pendulum_simulation(inverted_pendulum_lqr_controller.data)\n",
":::\n",
"\n",
"#### Evaluation\n",
"**Evaluation**\n",
"\n",
":::{code-cell} ipython3\n",
"class LQRController:\n",