
Commit

Fixes and improvements
AnesBenmerzoug committed May 15, 2024
1 parent 1205694 commit 7801c0f
Showing 8 changed files with 496 additions and 366 deletions.
44 changes: 6 additions & 38 deletions notebooks/nb_20_dynamic_programming.ipynb
@@ -133,7 +133,9 @@
"\n",
"Dynamic programming (DP) is a method that in general solves optimization problems that involve making a sequence of decisions (multi-stage decision making problems) by determining, for each decision, subproblems that can be solved similarily, such that an optimal solution of the original problem can be found from optimal solutions of subproblems. This method is based on *Bellman’s Principle of Optimality*:\n",
"\n",
"> An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."
"> An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.\n",
"\n",
"A problem is said to satisfy the Principle of Optimality if the sub-solutions of an optimal solution of the problem are themselves optimal solutions for their subproblems."
]
},
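To make the principle of optimality concrete, here is a minimal Python sketch (not part of the notebook; the stage graph and costs are made up for illustration). The optimal cost-to-go from any node is computed from the optimal costs-to-go of its successors, which is exactly the recursive structure the principle describes:

```python
from functools import lru_cache

# Hypothetical multi-stage decision problem: edges[node] lists (successor, transition_cost).
# "goal" is the terminal node with zero remaining cost.
edges = {
    "start": [("a", 2.0), ("b", 5.0)],
    "a": [("c", 2.0), ("goal", 7.0)],
    "b": [("c", 1.0), ("goal", 3.0)],
    "c": [("goal", 1.0)],
}

@lru_cache(maxsize=None)
def cost_to_go(node: str) -> float:
    """Optimal remaining cost from `node`, via the Bellman recursion."""
    if node == "goal":
        return 0.0
    # Principle of optimality: the tail of an optimal solution is optimal for its subproblem.
    return min(cost + cost_to_go(successor) for successor, cost in edges[node])

print(cost_to_go("start"))  # 5.0, achieved by start -> a -> c -> goal
```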
{
@@ -696,7 +698,7 @@
"It can be show that for every $s, \\tau \\in [t_0, t_f]$, $s \\leq \\tau$ , and $\\mathbf{x} \\in \\mathbf{X}$, we have:\n",
"\n",
"$$\n",
"V(s, \\mathbf{x}) = \\underset{}{\\min} \\left[ \\int\\limits_{s}^{\\tau} c(\\mathbf{x}(t), \\mathbf{u}(t)) d\\tau + V(\\tau, \\mathbf{x}(\\tau)) \\right]\n",
"V(s, \\mathbf{x}) = \\underset{\\mathbf{u(t)}}{min} \\left[ \\int\\limits_{s}^{\\tau} c(\\mathbf{x}(t), \\mathbf{u}(t)) d\\tau + V(\\tau, \\mathbf{x}(\\tau)) \\right]\n",
"$$\n",
"\n",
"Which is another version of the Bellman equation."
@@ -708,47 +710,13 @@
"source": [
"### Hamilton-Jacobi-Bellman Equation\n",
"\n",
"The Hamilton-Jacobi-Bellman (HJB) equation is the following:\n",
"The Hamilton-Jacobi-Bellman (HJB) equation is given by:\n",
"\n",
"$$\n",
"- \\frac{\\partial V}{\\partial t} = \\underset{\\mathbf{u(t)}}{min} \\left[ \\left( \\frac{\\partial V}{\\partial \\mathbf{x}} \\right)^T f(\\mathbf{x}(t), \\mathbf{u}(t)) + c(\\mathbf{x}(t), \\mathbf{u}(t)) \\right]\n",
"$$\n",
"\n",
"\n",
":::{note} Mathematical Derivation\n",
":class: dropdown\n",
"\n",
"It can be shown that the optimal condition is also satisfied in this case:\n",
"\n",
"$$\n",
"V(x(t_0), t_0, t_f) = \\underset{\\mathbf{u(t)}}{min} \\left[ c(t + \\right]\n",
"$$\n",
"\n",
"$$\n",
"V(x(t_0), t_0, t_f) = V(x(t_0), t_0, t) + V(x(t), t, t_f)\n",
"$$\n",
"\n",
"$$\n",
"\\frac{d}{dt}(V(\\mathbf{x}(t), t, t_f) = \\frac{\\partial V}{\\partial t} + \\left( \\frac{\\partial V}{\\partial \\mathbf{x}} \\right)^T \\underbrace{\\frac{d\\mathbf{x}}{dt}}_{= f(\\mathbf{x}(t), \\mathbf{u}(t))}\n",
"$$\n",
"\n",
"The term on the left can be simplified to:\n",
"\n",
"$$\n",
"\\frac{d}{dt}(V(\\mathbf{x}(t), t, t_f) &= \\underset{\\mathbf{u(t)}}{min} \\frac{d}{dt} \\left[ c(\\mathbf{x}(t_f), t_f) + \\int\\limits_{t_0}^{t_f} c(\\mathbf{x}(t), \\mathbf{u}(t)) d\\tau \\right] \\\\\n",
"&= \\underset{\\mathbf{u(t)}}{min} \\left[ \\frac{d}{dt} \\int\\limits_{t_0}^{t_f} c(\\mathbf{x}(t), \\mathbf{u}(t)) d\\tau \\right] \\\\\n",
"&= \\underset{\\mathbf{u(t)}}{min} \\left[ -c(\\mathbf{x}(t), \\mathbf{u}(t)) \\right]\n",
"$$\n",
"\n",
"Replacing this new expression into the original one and moving some terms aronud we get:\n",
"\n",
"$$\n",
"- \\frac{\\partial V}{\\partial t} = \\underset{\\mathbf{u(t)}}{min} \\left[ \\left( \\frac{\\partial V}{\\partial \\mathbf{x}} \\right)^T f(\\mathbf{x}(t), \\mathbf{u}(t)) + c(\\mathbf{x}(t), \\mathbf{u}(t)) \\right]\n",
"$$\n",
"\n",
"This is called the Hamilton-Jacobi-Bellman (HJB) equation.\n",
"\n",
":::"
"It is a sufficient condition of optimality i.e., that if $V$ satisfies the HJB, it must be the value function."
]
}
],
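As a quick sanity check of the HJB equation, consider an illustrative scalar example (not taken from the notebook): dynamics $\dot{x} = u$, running cost $c(x, u) = x^2 + u^2$, infinite horizon, so $V$ does not depend on time and $\partial V / \partial t = 0$. The HJB equation reduces to

$$
0 = \min_{u} \left[ \frac{\partial V}{\partial x} u + x^2 + u^2 \right],
\qquad
u^* = -\frac{1}{2} \frac{\partial V}{\partial x}.
$$

Substituting $u^*$ back in gives $\left( \partial V / \partial x \right)^2 = 4 x^2$, hence $V(x) = x^2$ and $u^*(x) = -x$, the familiar solution of the scalar linear-quadratic problem.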
64 changes: 36 additions & 28 deletions notebooks/nb_40_LQR.ipynb
@@ -169,59 +169,67 @@
"\\mathbf{x}_{t+1} = A \\mathbf{x}_t + B \\mathbf{u}_t\n",
"$$\n",
"\n",
"where $\\mathbf{x} \\in \\mathbb {R} ^{n}$ (that is, $\\mathbf{x}$ is an $n$-dimensional real-valued vector) is the state of the system and $\\mathbf{u} \\in \\mathbb {R} ^{m}$ is the control input.\n",
"\n",
"Given a quadratic cost function for the system in the infinite-horizon case, defined as:\n",
"\n",
"$$\n",
"J(\\mathbf{x}_0, \\mathbf{u}) = \\sum \\limits _{k = 0}^{\\infty} \\mathbf{x}_k^{T}Q \\mathbf{x}_k + \\mathbf{u}_k^{T} R \\mathbf{u}_k\n",
"$$\n",
"\n",
"With $Q = Q^T \\succeq 0$, $R = R^T \\succeq 0$."
"where $\\mathbf{x} \\in \\mathbb {R} ^{n}$ (that is, $\\mathbf{x}$ is an $n$-dimensional real-valued vector) is the state of the system and $\\mathbf{u} \\in \\mathbb {R} ^{m}$ is the control input."
]
},
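As a small illustration of the system model above, the following Python sketch simulates the open-loop dynamics $\mathbf{x}_{t+1} = A \mathbf{x}_t + B \mathbf{u}_t$; the matrices `A`, `B` and the initial state are made-up values, not taken from the notebook:

```python
import numpy as np

# Illustrative discretized double integrator (time step 0.1); values are assumptions.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

def rollout(x0, controls):
    """Simulate x_{t+1} = A x_t + B u_t for a given sequence of scalar controls."""
    xs = [np.asarray(x0, dtype=float)]
    for u in controls:
        xs.append(A @ xs[-1] + B @ np.atleast_1d(u))
    return np.array(xs)

trajectory = rollout(x0=[1.0, 0.0], controls=[0.0] * 50)
print(trajectory[-1])  # state after 50 uncontrolled steps
```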
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"metadata": {},
"source": [
"## Finite Horizon\n",
"\n",
"Given a quadratic cost function for the system in the finite-horizon case, defined as:\n",
"|\n",
"$$\n",
"J(\\mathbf{x}_0, \\mathbf{u}, N) = \\mathbf{x}_N^{T} Q_N \\mathbf{x}_N + \\sum \\limits _{k = 0}^{N-1} \\mathbf{x}_k^{T}Q \\mathbf{x}_k + \\mathbf{u}_k^{T} R \\mathbf{u}_k\n",
"$$\n",
"\n",
"With $Q_N = Q_N^T \\succeq 0$, $Q = Q^T \\succeq 0$, $R = R^T \\succeq 0$.\n",
"\n",
"It can be shown that the control law that minizes the cost is given by:\n",
"\n",
"$$\n",
"\\mathbf{u}_k = -K \\mathbf{x}_k\n",
"\\mathbf{u}_k = K_k \\mathbf{x}_k\n",
"$$\n",
"\n",
"With: $K = (R + B^T P B)^{-1} B^T P B$\n",
"With: $K_k = -(R + B^T P_k B)^{-1} B^T P_k B$\n",
"\n",
"and $P$ is found by solving the discrete time algebraic Riccati equation (DARE):\n",
"and $P_k$ is found by solving the discrete time dynamic Riccati equation:\n",
"\n",
"$$\n",
"Q + A^{T}PA-(A^{T}PB)(R+B^{T}PB)^{-1}(B^{T}PA) = P.\n",
"$$"
"P_{k-1} = Q + A^{T}P_k A - (A^{T} P_k B)(R+B^{T}P_k B)^{-1}(B^{T}P_k A)\n",
"$$\n",
"\n",
"With $P_N = Q_N$"
]
},
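For readers who want to see the backward recursion in code, here is a minimal Python sketch (not the notebook's implementation; `A`, `B`, `Q`, `R`, `Q_N` are made-up illustrative values). It follows the convention above: $P_N = Q_N$, $P_{k-1} = Q + A^T P_k A - (A^T P_k B)(R + B^T P_k B)^{-1}(B^T P_k A)$, and $\mathbf{u}_k = K_k \mathbf{x}_k$ with $K_k = -(R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A$:

```python
import numpy as np

def finite_horizon_lqr_gains(A, B, Q, R, Q_N, N):
    """Backward Riccati recursion; returns the gains K_0, ..., K_{N-1} with u_k = K_k x_k."""
    P = Q_N  # P_N
    gains = []
    for _ in range(N):
        # M = (R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A, where P currently holds P_{k+1}
        M = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        gains.append(-M)  # K_k
        P = Q + A.T @ P @ A - (A.T @ P @ B) @ M  # step back: P_k
    gains.reverse()  # gains[k] is K_k
    return gains

# Illustrative problem data (assumptions, not from the notebook).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Q_N = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
gains = finite_horizon_lqr_gains(A, B, Q, R, Q_N, N=50)
print(gains[0])  # gain applied at the first step
```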
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{admonition} Derivation\n",
":class: tip\n",
"## Infinite Horizon\n",
"\n",
"Given a quadratic cost function for the system in the infinite-horizon case, defined as:\n",
"\n",
"$$\n",
"V_k(x) = \\min_{u \\in \\mathbf{U}} \\left[ c(x, u) + V_{k+1}(f(x, u)) \\right]\n",
"J(\\mathbf{x}_0, \\mathbf{u}) = \\sum \\limits _{k = 0}^{\\infty} \\mathbf{x}_k^{T}Q \\mathbf{x}_k + \\mathbf{u}_k^{T} R \\mathbf{u}_k\n",
"$$\n",
"\n",
"With $Q = Q^T \\succeq 0$, $R = R^T \\succeq 0$.\n",
"\n",
"It can be shown that the control law that minizes the cost is given by:\n",
"\n",
"$$\n",
"V_k(x) = \\min_{u \\in \\mathbf{U}} \\left[ \\mathbf{x}_k^{T}Q \\mathbf{x}_k + \\mathbf{u}_k^{T} R \\mathbf{u}_k + V_{k+1}(f(x, u)) \\right]\n",
"\\mathbf{u}_k = K \\mathbf{x}_k\n",
"$$\n",
"\n",
"With: $K = -(R + B^T P B)^{-1} B^T P B$\n",
"\n",
"and $P$ is found by solving the discrete time algebraic Riccati equation (DARE):\n",
"\n",
"$$\n",
"V_0(x) = \\min_{u \\in \\mathbf{U}} \\left[ \\mathbf{x}_0^{T}Q \\mathbf{x}_0 + \\mathbf{u}_0^{T} R \\mathbf{u}_0 + V_{k+1}(f(x_0, u_0)) \\right]\n",
"$$\n",
":::"
"P = Q + A^{T}PA-(A^{T}PB)(R+B^{T}PB)^{-1}(B^{T}PA)\n",
"$$"
]
},
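The infinite-horizon gain can be obtained by iterating the same Riccati map to a fixed point, which for stabilizable and detectable systems converges to the DARE solution. A hedged Python sketch with made-up matrices follows; `scipy.linalg.solve_discrete_are` could be used instead where SciPy is available:

```python
import numpy as np

def solve_dare_by_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate P <- Q + A^T P A - (A^T P B)(R + B^T P B)^{-1}(B^T P A) until convergence."""
    P = Q.copy()
    for _ in range(max_iter):
        M = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ A - (A.T @ P @ B) @ M
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("DARE iteration did not converge")

# Illustrative problem data (assumptions, not from the notebook).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
P = solve_dare_by_iteration(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # u_k = K x_k
print(K)
```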
{
@@ -529,7 +537,7 @@
")\n",
":::\n",
"\n",
"#### Simulation\n",
"**Simulation**\n",
"\n",
":::{code-cell} ipython3\n",
"inverted_pendulum_lqr_controller.reset_history()\n",
@@ -545,7 +553,7 @@
"animate_inverted_pendulum_simulation(inverted_pendulum_lqr_controller.data)\n",
":::\n",
"\n",
"#### Evaluation\n",
"**Evaluation**\n",
"\n",
":::{code-cell} ipython3\n",
"class LQRController:\n",