diff --git a/notebooks/_static/images/20_optimality_principle_graph.png b/notebooks/_static/images/20_optimality_principle_graph.png new file mode 100644 index 0000000..7e5b19f --- /dev/null +++ b/notebooks/_static/images/20_optimality_principle_graph.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ded82bcfa356a68490cc27c52ff15f29f141a7085f48b32636b499f9850ee06 +size 18523 diff --git a/notebooks/nb_10_introduction.ipynb b/notebooks/nb_10_introduction.ipynb index 6fac03e..0113db2 100644 --- a/notebooks/nb_10_introduction.ipynb +++ b/notebooks/nb_10_introduction.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": 8, "metadata": { "editable": true, "init_cell": true, @@ -23,7 +23,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 9, "metadata": { "init_cell": true, "scene__Default Scene": true, @@ -201,7 +201,7 @@ "" ] }, - "execution_count": 2, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -513,9 +513,22 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "%%html\n", "" ] }, @@ -746,7 +759,12 @@ "source": [ "# Optimal Control\n", "\n", - "Optimal control theory is a branch of control theory that deals with finding a control for a dynamical system over a period of time such that an objective function is optimized. The fundamental idea in optimal control is to formulate the goal of control as the long-term optimization of a scalar cost function as opposed to formulating the objective as direct constraints on the system's behaviour (e.g. overshoot) as done in classical control.\n", + "Optimal control theory is a branch of control theory that emerged as an independent field in the 1950s. It deals with finding a control for a dynamical system over a period of time such that an objective function is optimized. The fundamental idea in optimal control is to formulate the goal of control as the long-term optimization of a scalar cost function as opposed to formulating the objective as direct constraints on the system's behaviour (e.g. overshoot) as done in classical control.\n", + "\n", + "Optimal control is one of the most useful systematic methods for controller design. It has several advantages:\n", + "- It gives a systematic approach to the solution of control problems.\n", + "- There are normally many possible solutions to a control problem. Some are good, others are poor.\n", + " Optimal control reduces this redundancy by selecting a controller that is optimal according to some cost function.\n", "\n", "The optimal control problem is to find a control $u^* \in \mathbf{U}$ which causes the system $\dot{x}(t) = f(x(t), u(t))$ to follow a trajectory $x^* \in \mathbf{X}$ that minimizes the cost (performance measure)."
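To make the quantity being minimized concrete, here is a minimal sketch, assuming a hypothetical discrete-time double-integrator model and quadratic costs (not any of the notebooks' environments), of how the cost of a candidate control sequence is evaluated; the optimal control problem is then to find the sequence (or policy) that minimizes this value.

```python
import numpy as np

def f(x, u, dt=0.1):
    # Hypothetical discrete-time dynamics: a double integrator (position, velocity).
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def stage_cost(x, u):
    # Hypothetical quadratic stage cost c_k(x_k, u_k).
    return float(x @ x + 0.1 * u**2)

def terminal_cost(x):
    # Hypothetical terminal cost c_N(x_N).
    return float(10.0 * (x @ x))

def total_cost(x0, controls):
    # J(x_0, u) = c_N(x_N) + sum_k c_k(x_k, u_k), obtained by rolling out the dynamics.
    x, J = np.asarray(x0, dtype=float), 0.0
    for u in controls:
        J += stage_cost(x, u)
        x = f(x, u)
    return J + terminal_cost(x)

# Two candidate control sequences; the better one is the one with the lower cost.
print(total_cost([1.0, 0.0], np.zeros(20)))
print(total_cost([1.0, 0.0], -0.5 * np.ones(20)))
```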
] @@ -802,11 +820,9 @@ "- Infinite Horizon:\n", "\n", " $$\n", - " J(x_0, u) = \\sum \\limits_{k = 0}^{\\infty} \\gamma^k c_k(x_k, u_k)\n", + " J(x_0, u) = \\sum \\limits_{k = 0}^{\\infty} c_k(x_k, u_k)\n", " $$\n", "\n", - " Where $0 < \\gamma \\le 1$ is a discount factor.\n", - "\n", "- Stochastic finite horizon:\n", "\n", " $$\n", @@ -855,10 +871,10 @@ "Optimal control problems solving methods can be classified in three main families:\n", "Dynamic Programming (DP), Indirect Methods based on calculus of variation and Direct Methods.\n", "\n", - "- DP is helpful where the number of states is limited and the dynamics are known. It divides an optimal control issue into smaller subproblems and recursively solves each.\n", + "- DP is helpful where the number of states is limited and the dynamics are known. It divides an optimal control problem into smaller subproblems and recursively solves each one of them.\n", "\n", - "- Indirect methods rely on Pontryagin’s Minimum Principle (PMP) to derive the necessary conditions\n", - " for optimality. This method uses the Hamiltonian of the system to reduce the global optimal control problem\n", + "- Indirect methods rely on Pontryagin’s Minimum Principle (PMP) to derive the necessary conditions for optimality.\n", + " This method uses the Hamiltonian of the system to reduce the global optimal control problem\n", " to the solution of a system of $2N$ equations given in the form of a two-point boundary value problem (BVP). \n", " Problems involving continuous states and control inputs benefit most from it.\n", "\n", diff --git a/notebooks/nb_20_dynamic_programming.ipynb b/notebooks/nb_20_dynamic_programming.ipynb index 2ebbfa1..3c4f4e0 100644 --- a/notebooks/nb_20_dynamic_programming.ipynb +++ b/notebooks/nb_20_dynamic_programming.ipynb @@ -52,6 +52,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "init_cell": true, "scene__Initialization": true, "tags": [ "ActiveScene" @@ -129,23 +130,23 @@ "source": [ "## Dynamic Programming\n", "\n", - "Dynamic programming (DP) is a method that in general solves optimization problems that involve making a sequence of decisions by determining, for each decision, subproblems that can be solved similarily, such that an optimal solution of the original problem can be found from optimal solutions of subproblems. This method is based on Bellman’s Principle of Optimality:\n", + "Dynamic programming (DP) is a method that in general solves optimization problems that involve making a sequence of decisions (multi-stage decision making problems) by determining, for each decision, subproblems that can be solved similarily, such that an optimal solution of the original problem can be found from optimal solutions of subproblems. This method is based on *Bellman’s Principle of Optimality*:\n", "\n", "> An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." ] }, { "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, + "metadata": {}, "source": [ - ":::{figure} _static/images/20_optimality_principle.png\n", + ":::{figure} _static/images/20_optimality_principle_graph.png\n", ":width: 50%\n", - ":align: center\n", - "Schematic illustration of the principle of optimality. 
The tail $\\{u_k^∗, \\dots, u_{N-1}^*\\}$ of an optimal sequence $\\{u_0^∗, \\dots, u_{N-1}^*\\}$ is optimal for the tail subproblem that starts at the state $x_k^*$ of the optimal state trajectory.\n", + "Representation of optimality principle on a graph. *Taken from [this blog post](https://int8.io/bellman-equations-reinforcement-learning/)*.\n", + "- **Black** arrows represent sequence of optimal policy actions – the one that is evaluated with the greatest value.\n", + "- **Green** arrow is optimal policy first action (decision) – when applied it yields a subproblem with new initial state. Principle of optimality is\n", + " related to this subproblem optimal policy.\n", + "- **Green** circle represents initial state for a subproblem (the original one or the one induced by applying first action)\n", + "- **Red** circle represents terminal state – assuming our original parametrization it is the maze exit\n", ":::" ] }, @@ -157,6 +158,8 @@ } }, "source": [ + "Usually creativity is required before we can recognize that a particular problem can be cast effectively as a dynamic program and often subtle insights are necessary to restructure the formulation so that it can be solved effectively.\n", + "\n", "Dynamic Programming is a very general solution method for problems which have these properties:\n", "\n", "- Optimal substructure (Principle of optimality applies)\n", @@ -176,7 +179,7 @@ } }, "source": [ - "Dynamic programming is used across a wide variety of domains, e.g.\n", + "It is used across a wide variety of domains, e.g.\n", "\n", "- Scheduling algorithms\n", "- Graph algorithms (e.g., shortest path algorithms)\n", @@ -188,21 +191,41 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Bellman Equation\n", + "## Discrete Time\n", "\n", - "We define the **cost-to-go function** (also known as **optimal value function**) for any feasible $x_0 \\in \\mathbf{X}$ as[^*]:\n", + "We define the **optimla cost-to-go function** (also known as **value function**) for any feasible $\\mathbf{x} \\in \\mathbf{X}$ as[^*]:\n", "\n", "$$\n", - "V(x_0) := \\min_{u \\in \\mathbf{U}} J(x_0, u)\n", + "V(\\mathbf{x}) := \\min_{u \\in \\mathbf{U}} J(\\mathbf{x}, \\mathbf{u})\n", "$$\n", "\n", "[^*]: for the sake of simplicity we will focus on the discrete-time case\n", "\n", - "An admissible control sequence $u^*$ is called optimal, if\n", + "An admissible control sequence $\\mathbf{u}^*$ is called optimal, if\n", "\n", "$$\n", - "V(x_0) = J(x_0, u^*)\n", + "V(\\mathbf{x}) = J(\\mathbf{x}, \\mathbf{u}^*)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Finite Horizon\n", + "\n", + "Given a finite-horizon decision problem:\n", + "\n", "$$\n", + "V(x_0) = \\min_{u \\in \\mathbf{U}} J(x_0, u) = \\min_{u \\in \\mathbf{U}} \\left[ c_N(x_N) + \\sum \\limits_{k = 0}^{N-1} c_k(x_k, u_k) \\right]\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Infinite Horizon\n", "\n", "Given an infinite-horizon decision problem:\n", "\n", @@ -222,61 +245,18 @@ "V(x_0) = \\min_{u \\in \\mathbf{U}} \\left[ c_0(x_0, u_0) + V(x_1) \\right]\n", "$$\n", "\n", - "We obtain what we call the Bellman equation." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "We define the **cost-to-go function** (also known as **optimal value function**) as[^*]:\n", - "\n", - "$$\n", - "V_0(x_0) := \\min_{u \\in \\mathbf{U}} J_0(x_0, u)\n", - "$$\n", - "\n", - "[^*]: for the sake of simplicity we will focus on the discrete-time case\n", - "\n", - "An admissible control sequence $u^*$ is called optimal, if\n", + "If we now replace $\\mathbf{x}_1 = f(\\mathbf{x}_0, \\mathbf{u}_0)$ and then get rid of time indices we finally get:\n", "\n", "$$\n", - "V_0(x_0) = J_0(x_0, u^*)\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "For any feasible $x_0 \\in \\mathbf{X}$ the optimal value function satisfies\n", - "\n", - "$$\n", - "V_k(x_0) = \\displaystyle \\min_{u \\in \\mathbf{U}} \\left[ c_{k}(x_0, u) + V_{k+1}(f(x_0, u)) \\right]\n", + "V(\\mathbf{x}) = \\min_{\\mathbf{u} \\in \\mathbf{U}} \\left[ c(\\mathbf{x}, \\mathbf{u}) + V(f(\\mathbf{x}, \\mathbf{u})) \\right]\n", "$$\n", "\n", - "Moreover, if $u^*$ is an optimal control, then\n", + "This equation is called the *Bellman equation*.\n", "\n", - "$$\n", - "V_0(x_0) = c_{k}(x_0, u) + V_{k+1}(f(x_0, u))\n", - "$$\n", - "\n", - "and\n", + "If we unroll the expression for a few more steps and follow similar steps we can obtain a more general equation:\n", "\n", "$$\n", - "V_0(x_0) = J_0(x_0, u^*)\n", + "V(\\mathbf{x}) = \\min_{\\mathbf{u} \\in \\mathbf{U}} \\left[ c(\\mathbf{x}, \\mathbf{u}) + V(f(\\mathbf{x}, \\mathbf{u})) \\right]\n", "$$" ] }, @@ -290,7 +270,7 @@ "source": [ "### DP Algorithm\n", "\n", - "For every initial state $x_0$, the optimal cost is equal to $V_N(x_0)$, given by the last step of the following algorithm, which proceeds backward in time from stage $N-1$ to stage $0$:\n", + "For every initial state $\\mathbf{x}_0$, the optimal cost is equal to $V_N(x_0)$, given by the last step of the following algorithm, which proceeds backward in time from stage $N-1$ to stage $0$:\n", "\n", "- Start with $V_N(x_N) = g_N(x_N)$\n", "- then for $k = \\{0, \\dots , N - 1\\}$, let:\n", @@ -332,7 +312,7 @@ } }, "source": [ - "## Graph Search" + "### Graph Search" ] }, { @@ -476,9 +456,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Optimal Control as Graph Search\n", + "### Optimal Control as Graph Search\n", "\n", - "We can formulate optimal control as a graph search by either considering a system with discrete states and actions or by discretizing a system with continuous states and actions." + "We can formulate discrete-time optimal control problems as graph search problems by either considering a system with discrete states and actions or by discretizing a system with continuous states and actions." 
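As a toy illustration of the backward recursion behind this graph-search formulation, the following sketch computes the optimal cost-to-go on a small hand-made directed graph; the node names and edge costs are invented for illustration and are not the notebook's grid world.

```python
# Directed acyclic graph: node -> {successor: edge cost}; values are made up.
edges = {
    "start": {"A": 2.0, "B": 5.0},
    "A": {"B": 1.0, "target": 6.0},
    "B": {"target": 2.0},
}

def optimal_cost_to_go(graph, target):
    # Backward DP: V(target) = 0 and V(x) = min over u of [ c(x, u) + V(f(x, u)) ].
    V = {target: 0.0}
    # Nodes processed in reverse topological order (hard-coded for this toy graph).
    for node in ["B", "A", "start"]:
        V[node] = min(cost + V[successor] for successor, cost in graph[node].items())
    return V

V = optimal_cost_to_go(edges, "target")
print(V)  # V["start"] = 5.0, achieved by the path start -> A -> B -> target
```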
] }, { @@ -490,15 +470,6 @@ "````" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - ":::{figure} _static/images/20_constrained_motion.png\n", - ":width: 60%\n", - ":::" - ] - }, { "cell_type": "code", "execution_count": null, @@ -538,24 +509,6 @@ "plot_grid_graph(G, show_start_to_target_paths=False)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - ":::{exercise}\n", - "How many possible paths from start position to target position are there?\n", - ":::" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plot_grid_graph(G, show_start_to_target_paths=True)" - ] - }, { "cell_type": "markdown", "metadata": {}, "source": [ @@ -567,6 +520,7 @@ "\n", "Use Dynamic Programming to solve this problem:\n", "\n", + "- Determine the number of possible paths from start position to target position.\n", "- Compute the optimal cost-to-go for each state.\n", "- Determine the optimal plan using the computed optimal cost-to-go.\n", "- Implement the plan in the environment.\n", @@ -628,6 +582,12 @@ ")\n", "```\n", "\n", + "To determine the number of paths from start node to target node, we use:\n", + "\n", + "```{code-cell}\n", + "len(list(nx.all_simple_paths(G.to_undirected(), source=G.start_node, target=G.target_node, cutoff=len(G))))\n", + "```\n", + "\n", "After that, to plot all paths from start to end we use:\n", "\n", "```{code-cell} python3\n", @@ -660,16 +620,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Continous Optimal Control" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Hamilton-Jacobi-Bellman Equation\n", + "## Continuous Time\n", "\n", - "Let's consider the continous optimal control problem with finite horizon over the time period $[t_0 ,t_f]$.\n", + "Let's consider a continuous-time optimal control problem with finite horizon over the time period $[t_0, t_f]$.\n", "\n", "The system's dynamics is given by:\n", "\n", "$$\n", "\dot{\mathbf{x}}(t) = f(\mathbf{x}(t), \mathbf{u}(t))\n", "$$\n", "\n", - "The cost function is given by:\n", + "With the initial state $\mathbf{x}(t_0) = \mathbf{x}_0$.\n", + "\n", + "The cost-to-go is given by:\n", + "\n", + "$$\n", + "J(\mathbf{x}(t), \mathbf{u}(t), t_0, t_f) = c_f(\mathbf{x}(t_f), t_f) + \int\limits_{t_0}^{t_f} c(\mathbf{x}(t), \mathbf{u}(t)) dt\n", + "$$\n", + "\n", + "The optimal cost-to-go is given by:\n", "\n", "$$\n", - "J(\mathbf{x}(t), \mathbf{u}(t), t_0, t_f) = c(\mathbf{x}(t_f), t_f) + \int\limits_{t_0}^{t_f} c(\mathbf{x}(t), \mathbf{u}(t)) d\tau\n", + "\displaystyle V(\mathbf{x}(t), t_0, t_f) = \underset{\mathbf{u(t)}}{min} \left[ J(\mathbf{x}(t), \mathbf{u}(t), t_0, t_f) \right]\n", "$$\n", "\n", - "The optimal cost-to-go function is given by:\n", + "It can be shown that for every $s, \tau \in [t_0, t_f]$, $s \leq \tau$, and $\mathbf{x} \in \mathbf{X}$, we have:\n", "\n", "$$\n", - "\displaystyle V(x(t), t_0, t_f) = \underset{\mathbf{u(t)}}{min} \left[ J(\mathbf{x}(t), \mathbf{u}(t), t_0, t_f) \right]\n", + "V(s, \mathbf{x}) = \underset{\mathbf{u(t)}}{\min} \left[ \int\limits_{s}^{\tau} c(\mathbf{x}(t), \mathbf{u}(t)) dt + V(\tau, \mathbf{x}(\tau)) \right]\n", "$$\n", "\n", + "This is another version of the Bellman equation."
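In practice this continuous-time formulation is often handled numerically by discretizing time, which brings us back to the discrete-time setting above. Here is a minimal sketch, assuming hypothetical scalar dynamics and costs (not the notebooks' models), of how the continuous-time cost is approximated once time is discretized.

```python
import numpy as np

def f(x, u):
    # Hypothetical continuous-time dynamics x_dot = f(x, u).
    return -x + u

def c(x, u):
    # Hypothetical running cost c(x, u).
    return x**2 + 0.1 * u**2

def c_f(x):
    # Hypothetical terminal cost c_f(x(t_f)).
    return 10.0 * x**2

def discretized_cost(x0, controls, dt=0.01):
    # Forward-Euler discretization of the dynamics and a Riemann sum for the integral cost.
    x, J = x0, 0.0
    for u in controls:
        J += c(x, u) * dt      # contribution of the integral over [t, t + dt]
        x = x + dt * f(x, u)   # Euler step x_{k+1} = x_k + dt * f(x_k, u_k)
    return J + c_f(x)

print(discretized_cost(1.0, np.zeros(500)))
```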
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Hamilton-Jacobi-Bellman Equation\n", + "\n", + "The Hamilton-Jacobi-Bellman (HJB) equation is the following:\n", + "\n", + "$$\n", + "- \\frac{\\partial V}{\\partial t} = \\underset{\\mathbf{u(t)}}{min} \\left[ \\left( \\frac{\\partial V}{\\partial \\mathbf{x}} \\right)^T f(\\mathbf{x}(t), \\mathbf{u}(t)) + c(\\mathbf{x}(t), \\mathbf{u}(t)) \\right]\n", + "$$\n", + "\n", + "\n", + ":::{note} Mathematical Derivation\n", + ":class: dropdown\n", + "\n", "It can be shown that the optimal condition is also satisfied in this case:\n", "\n", "$$\n", @@ -717,7 +697,17 @@ "- \\frac{\\partial V}{\\partial t} = \\underset{\\mathbf{u(t)}}{min} \\left[ \\left( \\frac{\\partial V}{\\partial \\mathbf{x}} \\right)^T f(\\mathbf{x}(t), \\mathbf{u}(t)) + c(\\mathbf{x}(t), \\mathbf{u}(t)) \\right]\n", "$$\n", "\n", - "This is called the Hamilton-Jacobi-Bellman (HJB) equation." + "This is called the Hamilton-Jacobi-Bellman (HJB) equation.\n", + "\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Numerical Methods for Continuous-time Optimal Control\n", + "\n" ] }, { diff --git a/notebooks/nb_30_systems.ipynb b/notebooks/nb_30_systems.ipynb index c1590bb..3204a83 100644 --- a/notebooks/nb_30_systems.ipynb +++ b/notebooks/nb_30_systems.ipynb @@ -781,13 +781,6 @@ "source": [ "animate_full_inverted_pendulum_simulation(inverted_pendulum_nonlinear_simulator.data)" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { diff --git a/notebooks/nb_40_LQR.ipynb b/notebooks/nb_40_LQR.ipynb index dae245c..e759892 100644 --- a/notebooks/nb_40_LQR.ipynb +++ b/notebooks/nb_40_LQR.ipynb @@ -479,7 +479,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Your Solution Here" + "# Your solution here" ] }, { @@ -497,7 +497,7 @@ "\n", "We first need the environment, linear model and simulator:\n", "\n", - ":::{code-cell}\n", + ":::{code-cell} ipython3\n", "inverted_pendulum_env = create_inverted_pendulum_environment(max_steps=200)\n", "inverted_pendulum_model = build_inverted_pendulum_linear_model(inverted_pendulum_env)\n", "inverted_pendulum_simulator = Simulator(inverted_pendulum_model)\n", @@ -507,7 +507,7 @@ "\n", "The goal is to keep the inverted pendulum upright. 
For that we define the following objective matrices and setpoint:\n", "\n", - ":::{code-cell} python3\n", + ":::{code-cell} ipython3\n", "Q = np.diag([100, 1])\n", "R = np.diag([10])\n", "setpoint = np.zeros((2, 1))\n", @@ -518,7 +518,7 @@ "\n", "We then create the controller:\n", "\n", - ":::{code-cell}\n", + ":::{code-cell} ipython3\n", "inverted_pendulum_lqr_controller = build_lqr_controller(\n", " model=inverted_pendulum_model,\n", " t_step=inverted_pendulum_env.dt,\n", @@ -531,7 +531,7 @@ "\n", "#### Simulation\n", "\n", - ":::{code-cell}\n", + ":::{code-cell} ipython3\n", "inverted_pendulum_lqr_controller.reset_history()\n", "inverted_pendulum_simulator.reset_history()\n", "x0 = np.zeros((2, 1))\n", @@ -542,27 +542,27 @@ " u0 = inverted_pendulum_lqr_controller.make_step(x0)\n", " x0 = inverted_pendulum_simulator.make_step(u0)\n", "\n", - "animate_inverted_pendulum_simulation(lqr.data)\n", + "animate_inverted_pendulum_simulation(inverted_pendulum_lqr_controller.data)\n", ":::\n", "\n", "#### Evaluation\n", "\n", - ":::{code-cell}\n", + ":::{code-cell} ipython3\n", "class LQRController:\n", " def __init__(self, K: K) -> None:\n", " self.K = K\n", "\n", " def act(self, observation: NDArray) -> NDArray:\n", - " return self.K @ observation[[1, 3]]\n", + " return self.K @ observation[[2, 3]]\n", ":::\n", "\n", - ":::{code-cell}\n", + ":::{code-cell} ipython3\n", "lqr_controller = LQRController(inverted_pendulum_lqr_controller.K)\n", "results = simulate_environment(inverted_pendulum_env, max_steps=200, controller=lqr_controller)\n", "show_video(results.frames, fps=1 / inverted_pendulum_env.dt)\n", ":::\n", "\n", - ":::{code-cell}\n", + ":::{code-cell} ipython3\n", "plt.close()\n", "T = np.arange(len(results.observations)) * inverted_pendulum_env.dt\n", "plot_inverted_pendulum_results(T=T, observations=results.observations, actions=results.actions, reference=np.inf);\n", @@ -570,6 +570,192 @@ "\n", "::::" ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Iterative Linear Quadratic Regulator\n", + "\n", + "Iterative Linear Quadratic Regulator (iLQR) is an extension of LQR control to non-linear system with non-linear quadratic costs.\n", + "\n", + "The idea is to approximate the cost and dynamics as quadratic and affine, respectively, then exactly solve the resulting LQR problem." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "The basic flow of the algorithm is:\n", + "\n", + "1. Initialize with initial state $\\mathbf{x}_0$ and initial control sequence $\\mathbf{U} = \\{\\mathbf{u}_{0}, \\mathbf{u}_{1}, \\dots, \\mathbf{u}_{N-1}\\}$.\n", + "2. Do a forward pass using the non-linear dynamics, i.e. simulate the system using $(\\mathbf{x}_0, \\mathbf{U})$ to get the trajectory through state space, $\\mathbf{X}$, that results from applying the control sequence $\\mathbf{X}$ starting in $\\mathbf{x}_0$.\n", + "3. Do a backward pass, estimate the value function and dynamics for each $(\\mathbf{x}, \\mathbf{u})$ in the state-space and control signal trajectories.\n", + "4. Calculate an updated control signal $\\hat{\\mathbf{U}}$ and evaluate cost of trajectory resulting from $(\\mathbf{x}_0, \\hat{\\mathbf{U}})$.\n", + " \n", + " 1. If $|(\\textrm{cost}(x_0, \\hat{\\textbf{U}}) - \\textrm{cost}(x_0, \\textbf{U})| < \\textrm{threshold},$ then we've converged and exit.\n", + " 2. 
If $\\textrm{cost}(x_0, \\hat{\\textbf{U}}) < \\textrm{cost}(x_0, \\textbf{U}),$ then set $\\textbf{U} = \\hat{\\textbf{U}},$ and change the update size to be more aggressive. Go back to step 2.\n", + " 3. If $\\textrm{cost}(x_0, \\hat{\\textbf{U}}) \\geq \\textrm{cost}(x_0, \\textbf{U}),$ change the update size to be more modest. Go back to step 3." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + ":::{note} Mathematical Derivation\n", + ":class: dropdown\n", + "\n", + "Given a discrete-time non-linear system with state-space representation: \n", + "\n", + "$$\n", + "\\mathbf{x}_{t+1} = f(\\mathbf{x}_t, \\mathbf{u}_t)\n", + "$$\n", + "\n", + "we approximate it with a first-order Taylor-series expansion about nominal trajectories\n", + "$\\mathbf{X} = \\{\\mathbf{x}_0, \\dots, \\mathbf{x}_N\\}, \\mathbf{U} = \\{ \\mathbf{u}_0, \\dots, \\mathbf{u}_N \\}$:\n", + "\n", + "$$\n", + "\\begin{array}{lll}\n", + "\\mathbf{x}_{t+1} + \\delta\\mathbf{x}_{t+1} &= f(\\mathbf{x}_t + \\delta \\mathbf{x}_t, \\mathbf{u}_t + \\delta \\mathbf{u}_t) \\\\ \n", + "& \\approx f(\\mathbf{x}_t, \\mathbf{u}_t)\n", + "+ \\frac{\\partial f}{\\partial \\mathbf{x}}|_{\\mathbf{x}_t, \\mathbf{u}_t}\\delta\\mathbf{x}_t\n", + "+ \\frac{\\partial f}{\\partial \\mathbf{u}}|_{\\mathbf{x}_t, \\mathbf{u}_t}\\delta\\mathbf{u}_t\\\\\n", + "\\delta\\mathbf{x}_{t+1} &= A_t\\delta\\mathbf{x}_t + B_t\\delta\\mathbf{u}_t\n", + "\\end{array}\n", + "$$\n", + "\n", + "with\n", + "$A_t = A(\\mathbf{x}_t, \\mathbf{u}_t) = \\frac{\\partial f}{\\partial \\mathbf{x}}|_{\\mathbf{x}_t, \\mathbf{u}_t}$ and\n", + "$B_t = B(\\mathbf{x}_t, \\mathbf{u}_t) = \\frac{\\partial f}{\\partial \\mathbf{u}}|_{\\mathbf{x}_t, \\mathbf{u}_t}$\n", + "\n", + "Given a general cost function that is not linear-quadratic, we use a second-order Taylor Series Expansion to linearize the dynamics into a form common for optimal control problems:\n", + "\n", + "$$\n", + "\\begin{array}{lll}\n", + "J_N(\\mathbf{x}_0, \\mathbf{U}) &=\n", + "l_f(\\mathbf{x}_N) + \\sum \\limits_{t=1}^{N-1} l(\\mathbf{x}_t, \\mathbf{u}_t)\\\\\n", + "&\\approx \\frac{1}{2} \\mathbf{x}^T_N Q \\mathbf{x}_N + q_N^T \\mathbf{x}_N\n", + "+ \\sum \\limits_{t=1}^{N-1}\n", + "\\frac{1}{2} \\mathbf{x}^T_t Q_t \\mathbf{x}_t\n", + "+ \\frac{1}{2} \\mathbf{u}^T_t R_t \\mathbf{u}_t\n", + "+ \\frac{1}{2} \\mathbf{x}^T_t H_t \\mathbf{u}_t\n", + "+ \\frac{1}{2} \\mathbf{u}^T_t H_t^T \\mathbf{x}_t\n", + "+ q_t^T \\mathbf{x}_t + + r_t^T \\mathbf{u}_t\n", + "\\end{array}\n", + "$$\n", + "\n", + "We define the following variables for convenience:\n", + "\n", + "$$\n", + "\\begin{array}{lll}\n", + "l_x &:= \\frac{\\partial l}{\\partial \\mathbf{x}}|_{\\mathbf{x}_t, \\mathbf{u}_t} = Q_t \\mathbf{x}_t + q_t\n", + "\\\\\n", + "l_u &:= \\frac{\\partial l}{\\partial \\mathbf{u}}|_{\\mathbf{x}_t, \\mathbf{u}_t} = R_t \\mathbf{u}_t + r_t\n", + "\\\\\n", + "l_{xx} &:= \\frac{\\partial^2 l}{\\partial \\mathbf{x}^2}|_{\\mathbf{x}_t, \\mathbf{u}_t} = Q_t\n", + "\\\\\n", + "l_{uu} &:= \\frac{\\partial^2 l}{\\partial \\mathbf{u}^2}|_{\\mathbf{x}_t, \\mathbf{u}_t} = R_t\n", + "\\\\\n", + "l_{xu} &:= \\frac{\\partial^2 l}{\\partial \\mathbf{x}\\mathbf{u}}|_{\\mathbf{x}_t, \\mathbf{u}_t} = H_t\n", + "\\\\\n", + "l_{ux} &:= \\frac{\\partial^2 l}{\\partial \\mathbf{u}\\mathbf{x}}|_{\\mathbf{x}_t, \\mathbf{u}_t} = H_t^T\n", + "\\end{array}\n", + "$$\n", + "\n", + "We can apply Bellman's Principle of Optimality to define the optimal cost-to-go:\n", + "\n", + 
"$$\n", + "\\begin{array}{lll}\n", + "V_N(\\mathbf{x}_N) &= g_f(\\mathbf{x}_N)\\\\\n", + "V_{t}(\\mathbf{x}_t) &= \\displaystyle \\min_u {g(\\mathbf{x}_{t}, \\mathbf{u}_{t}) + V_{t+1}(f(\\mathbf{x}_{t}, \\mathbf{u}_{N-t}))}\\\\\n", + "V_{t}(\\mathbf{x}_t) &= \\displaystyle \\min_u Q_{t}(\\mathbf{x}_{t}, \\mathbf{u}_{t})\n", + "\\end{array}\\\\\n", + "$$\n", + "\n", + "The $Q$-function is the discrete-time analogue of the Hamiltonian, sometimes known as the pseudo-Hamiltonian.\n", + "\n", + "We approximate the cost-to-go function as locally quadratic near the nominal trajectory gives us (we drop the time index in some of the equations for the sake of readability):\n", + "\n", + "$$\n", + "\\delta V(\\mathbf{x}) = s^T \\delta\\mathbf{x} + \\frac{1}{2} \\delta\\mathbf{x}^T S \\delta\\mathbf{x}\n", + "$$\n", + "\n", + "with \n", + "$s = \\frac{\\partial V}{\\partial \\mathbf{x}}|_{\\mathbf{x}},\n", + "S = \\frac{\\partial^2 V}{\\partial \\mathbf{x}^2}|_{\\mathbf{x}}$\n", + "\n", + "Similarily:\n", + "\n", + "$$\n", + "\\delta Q(\\mathbf{x}, \\mathbf{u}) = \n", + "\\frac{1}{2}\n", + "\\begin{bmatrix} \\delta\\mathbf{x} \\\\ \\delta\\mathbf{u} \\end{bmatrix}^T\n", + "\\begin{bmatrix} Q_{xx} & Q_{xu} \\\\ Q_{ux} & Q_{uu} \\end{bmatrix}\n", + "\\begin{bmatrix} \\delta\\mathbf{x} \\\\ \\delta\\mathbf{u} \\end{bmatrix}\n", + "+\n", + "\\begin{bmatrix} Q_{x} \\\\ Q_{u} \\end{bmatrix}^T\n", + "\\begin{bmatrix} \\delta\\mathbf{x} \\\\ \\delta\\mathbf{u} \\end{bmatrix}\n", + "$$\n", + "\n", + "with:\n", + "\n", + "$$\n", + "\\begin{array}{lll}\n", + "Q_x &= l_x + s_{t+1} A_t \\\\\n", + "Q_u &= l_u + s_{t+1} B_t \\\\\n", + "Q_{xx} &= l_{xx} + A_t^T S_{t+1} A_t \\\\\n", + "Q_{uu} &= l_{uu} + B_t^T S_{t+1} B_t \\\\\n", + "Q_{ux} &= l_{ux} + B_t^T S_{t+1} A_t\n", + "\\end{array}\n", + "$$\n", + "\n", + "The optimal control modification $\\delta\\mathbf{u}^∗$ for some state perturbation $\\delta\\mathbf{x}$, is obtained by minimizing the quadratic model:\n", + "\n", + "$$\n", + "\\delta\\mathbf{u}^∗(\\delta\\mathbf{x}) = \\displaystyle\\arg\\min_{\\delta\\mathbf{u}} Q(\\delta\\mathbf{x}, \\delta\\mathbf{u}) = k + K\\delta\\mathbf{x}\n", + "$$\n", + "\n", + "This is a locally-linear feedback policy with:\n", + "\n", + "$$\n", + "k := -Q_{uu}^{-1}Q_u\\\\\n", + "K := -Q_{uu}^{-1}Q_{ux}\n", + "$$\n", + "\n", + "Plugging this back into the expansion of $Q$, a quadratic model of $V$ is obtained. 
After simplification it is:\n", + "\n", + "$$\n", + "\\begin{array}{lll}\n", + "\\Delta V &= \\frac{1}{2} k^T Q_{uu} k + k^T Q_u\\\\\n", + "s &= Q_x + K^T Q_{uu} k + K^T Q_u + Q_{ux}^T k\\\\\n", + "S &= Q_{xx} + K^T Q_{uu} K + K^T Q_{ux} + Q_{ux}^T K\n", + "\\end{array}\n", + "$$\n", + "\n", + "Once these terms are computed, a forward pass computes a new trajectory:\n", + "\n", + "$$\n", + "\\begin{array}{lll}\n", + "\\hat{x}_0 &= x_0\\\\\n", + "\\hat{u}_t &= u_t + k_t + K_t(\\hat{x}_t - x_t)\\\\\n", + "\\hat{x}_{t+1} &= f(\\hat{x}_t, \\hat{u}_t)\n", + "\\end{array}\n", + "$$\n", + "\n", + ":::" + ] } ], "metadata": { diff --git a/notebooks/nb_50_MPC.ipynb b/notebooks/nb_50_MPC.ipynb index c1771a1..a635bf5 100644 --- a/notebooks/nb_50_MPC.ipynb +++ b/notebooks/nb_50_MPC.ipynb @@ -89,6 +89,7 @@ "import numpy as np\n", "import seaborn as sns\n", "from do_mpc.controller import MPC\n", + "from do_mpc.simulator import Simulator\n", "from do_mpc.model import Model\n", "from numpy.typing import NDArray\n", "\n", @@ -96,6 +97,11 @@ " build_mpc_controller,\n", ")\n", "\n", + "from training_ml_control.plots import (\n", + " plot_cart_results,\n", + " plot_inverted_pendulum_results,\n", + ")\n", + "\n", "from training_ml_control.environments import (\n", " create_cart_environment,\n", " create_inverted_pendulum_environment,\n", @@ -112,6 +118,7 @@ "from training_ml_control.model import (\n", " build_cart_model,\n", " build_inverted_pendulum_linear_model,\n", + " build_inverted_pendulum_nonlinear_model,\n", ")\n", "\n", "from training_ml_control.nb_utils import (\n", @@ -186,7 +193,7 @@ } }, "source": [ - "Model Predictive Control (MPC) is a control algorithm based on solving an **on-line** optimal control problem. A receding horizon approach is used which can be summarized in the following steps:\n", + "Model Predictive Control (**MPC**), also referred to as *Receding Horizon Control and Moving Horizon Optimal Control*, is a control algorithm based on solving an **on-line** optimal control problem. A receding horizon approach is used which can be summarized in the following steps:\n", "\n", "1. At time $t$ and for the current state $x_t$, solve, on-line, an open-loop optimal control problem over some future interval taking account the current and future constraints.\n", "2. Apply the first step in the optimal control sequence.\n", @@ -202,6 +209,8 @@ }, "source": [ ":::{figure} _static/images/40_model_predictive_control_horizon.svg\n", + ":width: 50%\n", + "\n", "Illustration of the problem solved by MPC at state $x_k$.\n", "We minimize the cost function over the next $l$ stages.\n", "We then apply the control of the optimizing sequence up to the control horizon.\n", @@ -218,6 +227,7 @@ }, "source": [ ":::{figure} _static/images/40_mpc_block_diagram.svg\n", + ":width: 60%\n", "MPC Block Diagram.\n", ":::" ] @@ -238,189 +248,26 @@ ] }, { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "## Iterative Linear Quadratic Regulator\n", - "\n", - "Iterative Linear Quadratic Regulator (iLQR) is an extension of LQR control to non-linear system with non-linear quadratic costs.\n", - "\n", - "The idea is to approximate the cost and dynamics as quadratic and affine, respectively, then exactly solve the resulting LQR problem." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ - "The basic flow of the algorithm is:\n", - "\n", - "1. Initialize with initial state $\\mathbf{x}_0$ and initial control sequence $\\mathbf{U} = \\{\\mathbf{u}_{0}, \\mathbf{u}_{1}, \\dots, \\mathbf{u}_{N-1}\\}$.\n", - "2. Do a forward pass using the non-linear dynamics, i.e. simulate the system using $(\\mathbf{x}_0, \\mathbf{U})$ to get the trajectory through state space, $\\mathbf{X}$, that results from applying the control sequence $\\mathbf{X}$ starting in $\\mathbf{x}_0$.\n", - "3. Do a backward pass, estimate the value function and dynamics for each $(\\mathbf{x}, \\mathbf{u})$ in the state-space and control signal trajectories.\n", - "4. Calculate an updated control signal $\\hat{\\mathbf{U}}$ and evaluate cost of trajectory resulting from $(\\mathbf{x}_0, \\hat{\\mathbf{U}})$.\n", - " \n", - " 1. If $|(\\textrm{cost}(x_0, \\hat{\\textbf{U}}) - \\textrm{cost}(x_0, \\textbf{U})| < \\textrm{threshold},$ then we've converged and exit.\n", - " 2. If $\\textrm{cost}(x_0, \\hat{\\textbf{U}}) < \\textrm{cost}(x_0, \\textbf{U}),$ then set $\\textbf{U} = \\hat{\\textbf{U}},$ and change the update size to be more aggressive. Go back to step 2.\n", - " 3. If $\\textrm{cost}(x_0, \\hat{\\textbf{U}}) \\geq \\textrm{cost}(x_0, \\textbf{U}),$ change the update size to be more modest. Go back to step 3." + "?? build_mpc_controller" ] }, { "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, + "metadata": {}, "source": [ - ":::{note} Mathematical Derivation\n", - ":class: dropdown\n", - "\n", - "Given a discrete-time non-linear system with state-space representation: \n", - "\n", - "$$\n", - "\\mathbf{x}_{t+1} = f(\\mathbf{x}_t, \\mathbf{u}_t)\n", - "$$\n", - "\n", - "we approximate it with a first-order Taylor-series expansion about nominal trajectories\n", - "$\\mathbf{X} = \\{\\mathbf{x}_0, \\dots, \\mathbf{x}_N\\}, \\mathbf{U} = \\{ \\mathbf{u}_0, \\dots, \\mathbf{u}_N \\}$:\n", - "\n", - "$$\n", - "\\begin{array}{lll}\n", - "\\mathbf{x}_{t+1} + \\delta\\mathbf{x}_{t+1} &= f(\\mathbf{x}_t + \\delta \\mathbf{x}_t, \\mathbf{u}_t + \\delta \\mathbf{u}_t) \\\\ \n", - "& \\approx f(\\mathbf{x}_t, \\mathbf{u}_t)\n", - "+ \\frac{\\partial f}{\\partial \\mathbf{x}}|_{\\mathbf{x}_t, \\mathbf{u}_t}\\delta\\mathbf{x}_t\n", - "+ \\frac{\\partial f}{\\partial \\mathbf{u}}|_{\\mathbf{x}_t, \\mathbf{u}_t}\\delta\\mathbf{u}_t\\\\\n", - "\\delta\\mathbf{x}_{t+1} &= A_t\\delta\\mathbf{x}_t + B_t\\delta\\mathbf{u}_t\n", - "\\end{array}\n", - "$$\n", - "\n", - "with\n", - "$A_t = A(\\mathbf{x}_t, \\mathbf{u}_t) = \\frac{\\partial f}{\\partial \\mathbf{x}}|_{\\mathbf{x}_t, \\mathbf{u}_t}$ and\n", - "$B_t = B(\\mathbf{x}_t, \\mathbf{u}_t) = \\frac{\\partial f}{\\partial \\mathbf{u}}|_{\\mathbf{x}_t, \\mathbf{u}_t}$\n", - "\n", - "Given a general cost function that is not linear-quadratic, we use a second-order Taylor Series Expansion to linearize the dynamics into a form common for optimal control problems:\n", - "\n", - "$$\n", - "\\begin{array}{lll}\n", - "J_N(\\mathbf{x}_0, \\mathbf{U}) &=\n", - "l_f(\\mathbf{x}_N) + \\sum \\limits_{t=1}^{N-1} l(\\mathbf{x}_t, \\mathbf{u}_t)\\\\\n", - "&\\approx \\frac{1}{2} \\mathbf{x}^T_N Q \\mathbf{x}_N + q_N^T \\mathbf{x}_N\n", - "+ \\sum \\limits_{t=1}^{N-1}\n", - "\\frac{1}{2} \\mathbf{x}^T_t Q_t \\mathbf{x}_t\n", - "+ 
\\frac{1}{2} \\mathbf{u}^T_t R_t \\mathbf{u}_t\n", - "+ \\frac{1}{2} \\mathbf{x}^T_t H_t \\mathbf{u}_t\n", - "+ \\frac{1}{2} \\mathbf{u}^T_t H_t^T \\mathbf{x}_t\n", - "+ q_t^T \\mathbf{x}_t + + r_t^T \\mathbf{u}_t\n", - "\\end{array}\n", - "$$\n", - "\n", - "We define the following variables for convenience:\n", - "\n", - "$$\n", - "\\begin{array}{lll}\n", - "l_x &:= \\frac{\\partial l}{\\partial \\mathbf{x}}|_{\\mathbf{x}_t, \\mathbf{u}_t} = Q_t \\mathbf{x}_t + q_t\n", - "\\\\\n", - "l_u &:= \\frac{\\partial l}{\\partial \\mathbf{u}}|_{\\mathbf{x}_t, \\mathbf{u}_t} = R_t \\mathbf{u}_t + r_t\n", - "\\\\\n", - "l_{xx} &:= \\frac{\\partial^2 l}{\\partial \\mathbf{x}^2}|_{\\mathbf{x}_t, \\mathbf{u}_t} = Q_t\n", - "\\\\\n", - "l_{uu} &:= \\frac{\\partial^2 l}{\\partial \\mathbf{u}^2}|_{\\mathbf{x}_t, \\mathbf{u}_t} = R_t\n", - "\\\\\n", - "l_{xu} &:= \\frac{\\partial^2 l}{\\partial \\mathbf{x}\\mathbf{u}}|_{\\mathbf{x}_t, \\mathbf{u}_t} = H_t\n", - "\\\\\n", - "l_{ux} &:= \\frac{\\partial^2 l}{\\partial \\mathbf{u}\\mathbf{x}}|_{\\mathbf{x}_t, \\mathbf{u}_t} = H_t^T\n", - "\\end{array}\n", - "$$\n", - "\n", - "We can apply Bellman's Principle of Optimality to define the optimal cost-to-go:\n", - "\n", - "$$\n", - "\\begin{array}{lll}\n", - "V_N(\\mathbf{x}_N) &= g_f(\\mathbf{x}_N)\\\\\n", - "V_{t}(\\mathbf{x}_t) &= \\displaystyle \\min_u {g(\\mathbf{x}_{t}, \\mathbf{u}_{t}) + V_{t+1}(f(\\mathbf{x}_{t}, \\mathbf{u}_{N-t}))}\\\\\n", - "V_{t}(\\mathbf{x}_t) &= \\displaystyle \\min_u Q_{t}(\\mathbf{x}_{t}, \\mathbf{u}_{t})\n", - "\\end{array}\\\\\n", - "$$\n", - "\n", - "The $Q$-function is the discrete-time analogue of the Hamiltonian, sometimes known as the pseudo-Hamiltonian.\n", - "\n", - "We approximate the cost-to-go function as locally quadratic near the nominal trajectory gives us (we drop the time index in some of the equations for the sake of readability):\n", - "\n", - "$$\n", - "\\delta V(\\mathbf{x}) = s^T \\delta\\mathbf{x} + \\frac{1}{2} \\delta\\mathbf{x}^T S \\delta\\mathbf{x}\n", - "$$\n", - "\n", - "with \n", - "$s = \\frac{\\partial V}{\\partial \\mathbf{x}}|_{\\mathbf{x}},\n", - "S = \\frac{\\partial^2 V}{\\partial \\mathbf{x}^2}|_{\\mathbf{x}}$\n", - "\n", - "Similarily:\n", - "\n", - "$$\n", - "\\delta Q(\\mathbf{x}, \\mathbf{u}) = \n", - "\\frac{1}{2}\n", - "\\begin{bmatrix} \\delta\\mathbf{x} \\\\ \\delta\\mathbf{u} \\end{bmatrix}^T\n", - "\\begin{bmatrix} Q_{xx} & Q_{xu} \\\\ Q_{ux} & Q_{uu} \\end{bmatrix}\n", - "\\begin{bmatrix} \\delta\\mathbf{x} \\\\ \\delta\\mathbf{u} \\end{bmatrix}\n", - "+\n", - "\\begin{bmatrix} Q_{x} \\\\ Q_{u} \\end{bmatrix}^T\n", - "\\begin{bmatrix} \\delta\\mathbf{x} \\\\ \\delta\\mathbf{u} \\end{bmatrix}\n", - "$$\n", - "\n", - "with:\n", - "\n", - "$$\n", - "\\begin{array}{lll}\n", - "Q_x &= l_x + s_{t+1} A_t \\\\\n", - "Q_u &= l_u + s_{t+1} B_t \\\\\n", - "Q_{xx} &= l_{xx} + A_t^T S_{t+1} A_t \\\\\n", - "Q_{uu} &= l_{uu} + B_t^T S_{t+1} B_t \\\\\n", - "Q_{ux} &= l_{ux} + B_t^T S_{t+1} A_t\n", - "\\end{array}\n", - "$$\n", - "\n", - "The optimal control modification $\\delta\\mathbf{u}^∗$ for some state perturbation $\\delta\\mathbf{x}$, is obtained by minimizing the quadratic model:\n", - "\n", - "$$\n", - "\\delta\\mathbf{u}^∗(\\delta\\mathbf{x}) = \\displaystyle\\arg\\min_{\\delta\\mathbf{u}} Q(\\delta\\mathbf{x}, \\delta\\mathbf{u}) = k + K\\delta\\mathbf{x}\n", - "$$\n", - "\n", - "This is a locally-linear feedback policy with:\n", - "\n", - "$$\n", - "k := -Q_{uu}^{-1}Q_u\\\\\n", - "K := -Q_{uu}^{-1}Q_{ux}\n", - "$$\n", - "\n", - 
"Plugging this back into the expansion of $Q$, a quadratic model of $V$ is obtained. After simplification it is:\n", - "\n", - "$$\n", - "\\begin{array}{lll}\n", - "\\Delta V &= \\frac{1}{2} k^T Q_{uu} k + k^T Q_u\\\\\n", - "s &= Q_x + K^T Q_{uu} k + K^T Q_u + Q_{ux}^T k\\\\\n", - "S &= Q_{xx} + K^T Q_{uu} K + K^T Q_{ux} + Q_{ux}^T K\n", - "\\end{array}\n", - "$$\n", - "\n", - "Once these terms are computed, a forward pass computes a new trajectory:\n", - "\n", - "$$\n", - "\\begin{array}{lll}\n", - "\\hat{x}_0 &= x_0\\\\\n", - "\\hat{u}_t &= u_t + k_t + K_t(\\hat{x}_t - x_t)\\\\\n", - "\\hat{x}_{t+1} &= f(\\hat{x}_t, \\hat{u}_t)\n", - "\\end{array}\n", - "$$\n", + "`build_mpc_controller` does the following:\n", "\n", - ":::" + "- First, it creates an instance of the MPC class is generated with the passeed model.\n", + "- It configures this object with parameters such as the time step and the horizon as well as parameters that are specific to the solver we are using.\n", + "- It then sets the control objective using the passed terminal and stage costs.\n", + "- It also restricts the input force by adding an optional penalty.\n", + "- After that, it sets upper and lower limits for the state and force.\n", + "- Finally, it sets everything up and returns the controller object." ] }, { @@ -431,26 +278,17 @@ } }, "source": [ - "[do-mpc](https://www.do-mpc.com/en/latest/) uses [CasADi](https://web.casadi.org/python-api/) (an open-source tool for nonlinear optimization and algorithmic differentiation) for the modeling part and for the different cost functions. Here's a table of useful operators:\n", + "[do-mpc](https://www.do-mpc.com/en/latest/), the package we're using in this training, uses [CasADi](https://web.casadi.org/python-api/) (an open-source tool for nonlinear optimization and algorithmic differentiation) for the modeling part and for the different cost functions. Here's a table of useful operators:\n", "\n", "| Operator | Description |Equation |\n", "| --- | --- | --- |\n", "| [casadi.sumsqr(x)](https://web.casadi.org/python-api/#casadi.casadi.sumsqr) | Squared-sum | $\\sum x_i^2$ |\n", "| [casadi.norm_2(x)](https://web.casadi.org/python-api/#casadi.casadi.norm_2) | $L_2$-norm | $\\sqrt{\\sum x_i^2}$ |\n", - "| [casadi.norm_1(x)](https://web.casadi.org/python-api/#casadi.casadi.norm_1) | $L_1$-norm | $\\sum |x_i|$ |\n", + "| [casadi.norm_1(x)](https://web.casadi.org/python-api/#casadi.casadi.norm_1) | $L_1$-norm | $\\sum|x_i|$ |\n", "| [casadi.bilin(A, x)](https://web.casadi.org/python-api/#casadi.casadi.bilin) | Quadratic Form | $x^T A x$ |\n", "| [casadi.bilin(A, x, y)](https://web.casadi.org/python-api/#casadi.casadi.bilin) | Bilinear Form | $x^T A y$ |" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "?? build_mpc_controller" - ] - }, { "cell_type": "markdown", "metadata": { @@ -459,33 +297,9 @@ } }, "source": [ - "#### Controller\n", + "## Cart\n", "\n", - "First, we create an instance of the MPC class is generated with the Mass-Spring-Damper prediction model defined above." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], - "source": [ - "mpc = do_mpc.controller.MPC(mass_spring_damper)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "We choose the finite prediction horizon `n_horizon = 20`, the time step `t_step = 0.04s` to be the same as the environment's time step. We also set the parameters of the applied discretization scheme orthogonal collocation." + "Now, we design an MPC controller for the Cart system." ] }, { @@ -498,19 +312,11 @@ }, "outputs": [], "source": [ - "env = create_mass_spring_damper_environment()\n", - "mpc_params = {\n", - " \"n_horizon\": 20,\n", - " \"t_step\": env.dt,\n", - " \"state_discretization\": \"collocation\",\n", - " \"collocation_type\": \"radau\",\n", - " \"collocation_deg\": 3,\n", - " \"collocation_ni\": 1,\n", - " \"store_full_solution\": True,\n", - " # Use MA27 linear solver in ipopt for faster calculations:\n", - " \"nlpsol_opts\": {\"ipopt.linear_solver\": \"mumps\"},\n", - "}\n", - "mpc.set_param(**mpc_params)" + "cart_env = create_cart_environment(goal_position=9)\n", + "cart_model = build_cart_model(cart_env)\n", + "cart_simulator = Simulator(cart_model)\n", + "cart_simulator.set_param(t_step=cart_env.dt)\n", + "cart_simulator.setup()" ] }, { @@ -521,28 +327,22 @@ } }, "source": [ - "#### Objective\n", - "\n", - "The control objective is to move the mass to a desired position (`0.1`) and keep it there." + "The control objective is to move the cart to a desired position (`9`)." ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, + "metadata": {}, "outputs": [], "source": [ - "xss = np.array([0.1, 0.0])\n", - "distance_cost = 100 * casadi.norm_2(mass_spring_damper.x.cat - xss)\n", + "setpoint = np.array([cart_env.goal_position, 1.0])\n", + "distance_cost = 100 * casadi.norm_2(cart_model.x.cat - setpoint)\n", "terminal_cost = distance_cost\n", "stage_cost = distance_cost\n", - "print(f\"{stage_cost=}\")\n", - "print(f\"{terminal_cost=}\")\n", - "mpc.set_objective(mterm=terminal_cost, lterm=stage_cost)" + "display_array(\"Setpoint\", setpoint)\n", + "print(f\"Stage Cost = {stage_cost}\")\n", + "print(f\"Terminal Cost = {terminal_cost}\")" ] }, { @@ -566,63 +366,49 @@ }, "outputs": [], "source": [ - "force_penalty = 1e-2\n", - "mpc.set_rterm(force=force_penalty)" + "force_penalty = 1e-2" ] }, { "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, + "metadata": {}, "source": [ - "#### Constraints\n", - "\n", - "We apply constraints on the force. In this case, there is only an upper and lower bounds for the force." + "We define as well upper and lower limits for the state and force" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, + "metadata": {}, "outputs": [], "source": [ - "# lower and upper bounds of the input\n", - "u_max = 20\n", - "mpc.bounds[\"lower\", \"_u\", \"force\"] = -u_max\n", - "mpc.bounds[\"upper\", \"_u\", \"force\"] = u_max" + "x_limits = np.array([-10, 10])\n", + "u_limits = np.array([-30, 30])" ] }, { "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, + "metadata": {}, "source": [ - "#### Setup\n", - "\n", - "We can now setup the controller." 
+ "We then create an instance of `MPC` from the `do_mpc` package using the already defined `build_mpc_controller` function." ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, + "metadata": {}, "outputs": [], "source": [ - "mpc.setup()" + "cart_mpc_controller = build_mpc_controller(\n", + " model=cart_model,\n", + " t_step=cart_env.dt,\n", + " n_horizon=10,\n", + " stage_cost=stage_cost,\n", + " terminal_cost=terminal_cost,\n", + " force_penalty=force_penalty,\n", + " x_limits=x_limits,\n", + " u_limits=u_limits,\n", + ")" ] }, { @@ -649,15 +435,16 @@ "outputs": [], "source": [ "%%capture\n", - "mpc.reset_history()\n", - "mass_spring_damper_simulator.reset_history()\n", + "cart_mpc_controller.reset_history()\n", + "cart_simulator.reset_history()\n", "x0 = np.zeros((2, 1))\n", - "mass_spring_damper_simulator.x0 = x0\n", - "mpc.x0 = x0\n", - "mpc.set_initial_guess()\n", + "cart_simulator.x0 = x0\n", + "cart_mpc_controller.x0 = x0\n", + "cart_mpc_controller.set_initial_guess()\n", + "\n", "for k in range(100):\n", - " u0 = mpc.make_step(x0)\n", - " x0 = mass_spring_damper_simulator.make_step(u0)" + " u = cart_mpc_controller.make_step(x0)\n", + " x0 = cart_simulator.make_step(u)" ] }, { @@ -670,7 +457,7 @@ }, "outputs": [], "source": [ - "animate_mass_spring_damper_simulation(mpc.data)" + "animate_cart_simulation(cart_mpc_controller.data)" ] }, { @@ -697,14 +484,14 @@ "outputs": [], "source": [ "class MPCController:\n", - " def __init__(self, mpc: do_mpc.controller.MPC) -> None:\n", + " def __init__(self, mpc: MPC) -> None:\n", " self.mpc = mpc\n", " self.mpc.reset_history()\n", " self.mpc.x0 = np.zeros(2)\n", " self.mpc.set_initial_guess()\n", "\n", " def act(self, observation: NDArray) -> NDArray:\n", - " return mpc.make_step(observation.reshape(-1, 1)).ravel()" + " return self.mpc.make_step(observation.reshape(-1, 1)).ravel()" ] }, { @@ -719,11 +506,8 @@ "source": [ "%%capture\n", "max_steps = 100\n", - "env = create_mass_spring_damper_environment(\n", - " render_mode=render_mode, max_steps=max_steps\n", - ")\n", - "controller = MPCController(mpc)\n", - "results = simulate_environment(env, max_steps=max_steps, controller=controller)" + "controller = MPCController(cart_mpc_controller)\n", + "results = simulate_environment(cart_env, max_steps=max_steps, controller=controller)" ] }, { @@ -736,20 +520,23 @@ }, "outputs": [], "source": [ - "show_video(results.frames, fps=1 / env.dt)" + "show_video(results.frames, fps=1 / cart_env.dt)" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, + "metadata": {}, "outputs": [], "source": [ - "animate_mass_spring_damper_simulation(mpc.data)" + "plt.close()\n", + "T = np.arange(len(results.observations)) * cart_env.dt\n", + "plot_cart_results(\n", + " T=T,\n", + " observations=results.observations,\n", + " actions=results.actions,\n", + " reference=cart_env.goal_position,\n", + ")" ] }, { @@ -760,15 +547,35 @@ } }, "source": [ - ":::{exercise} Linear Inverted Pendulum MPC\n", + "::::{exercise} Linear Inverted Pendulum MPC\n", ":label: inverted-pendulum-linear-mpc\n", "\n", - "- Design an MPC controller for the linearized inverted pendulum.\n", + "- Design an MPC controller for linearized inverted pendulum model to keep the pole upright by following these steps:\n", + " 1. Define stage and terminal costs.\n", + " 1. 
Define setpoint $\\begin{bmatrix}\\theta_s \\\\ \\dot{\\theta}_s\\end{bmatrix} = \\begin{bmatrix}0.0 \\\\ 0.0 \\end{bmatrix}$.\n", + " 1. Define appropriate force penalty.\n", + " 1. Create an mpc controller by calling the `build_mpc_controller` function and passing the appropriate arguments.\n", + " 1. Simulate the system using the simulator and the controller.\n", + " 1. Evaluate the controller on the environment.\n", "- For each case, try different cost functions:\n", " - $\\sum \\theta^2$\n", " - $\\sum |\\theta|$\n", " - $E_{\\text{kinetic}} - E_{\\text{potential}}$\n", - ":::" + "\n", + ":::{hint}\n", + "You can access the inverted pendulum's kinetic, respectively potential, energy using `inverted_pendulum_lin_model.aux[\"E_kinetic\"]`, respectively `inverted_pendulum_lin_model.aux[\"E_potential\"]`)\n", + ":::\n", + "\n", + "::::" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Your solution here" ] }, { @@ -782,6 +589,95 @@ }, "source": [ "::::{solution} inverted-pendulum-linear-mpc\n", + ":class: dropdown\n", + "\n", + "We first need the environment, linear model and simulator:\n", + "\n", + ":::{code-cell} ipython3\n", + "inverted_pendulum_env = create_inverted_pendulum_environment(max_steps=200)\n", + "inverted_pendulum_lin_model = build_inverted_pendulum_linear_model(inverted_pendulum_env)\n", + "inverted_pendulum_lin_simulator = Simulator(inverted_pendulum_lin_model)\n", + "inverted_pendulum_lin_simulator.set_param(t_step=inverted_pendulum_env.dt)\n", + "inverted_pendulum_lin_simulator.setup()\n", + ":::\n", + "\n", + "The goal is to keep the inverted pendulum upright. For that we define the following costs, setpoint and force penalty:\n", + "\n", + ":::{code-cell} ipython3\n", + "setpoint = np.zeros((2, 1))\n", + "distance_cost = casadi.bilin(np.diag([100, 1]), inverted_pendulum_lin.x.cat - setpoint)\n", + "terminal_cost = distance_cost\n", + "stage_cost = distance_cost\n", + "force_penalty = 1e-2\n", + "display_array(\"Setpoint\", setpoint)\n", + "print(f\"{stage_cost=}\")\n", + "print(f\"{terminal_cost=}\")\n", + ":::\n", + "\n", + "We define as well upper and lower limits for the state and force\n", + "\n", + ":::{code-cell} ipython3\n", + "x_limits = np.array([-np.inf, np.inf])\n", + "u_limits = np.array([-inverted_pendulum_env.force_max, inverted_pendulum_env.force_max])\n", + ":::\n", + "\n", + "After that, we create the controller:\n", + "\n", + ":::{code-cell} ipython3\n", + "inverted_pendulum_mpc_controller = build_mpc_controller(\n", + " model=inverted_pendulum_lin_model,\n", + " t_step=inverted_pendulum_env.dt,\n", + " n_horizon=100,\n", + " stage_cost=stage_cost,\n", + " terminal_cost=terminal_cost,\n", + " force_penalty=force_penalty,\n", + " x_limits=x_limits,\n", + " u_limits=u_limits,\n", + ")\n", + ":::\n", + "\n", + "#### Simulation\n", + " \n", + ":::{code-cell} ipython3\n", + "inverted_pendulum_mpc_controller.reset_history()\n", + "inverted_pendulum_lin_simulator.reset_history()\n", + "x0 = np.zeros((2, 1))\n", + "x0[0] = 0.01\n", + "inverted_pendulum_simulator.x0 = x0\n", + "\n", + "for k in range(50):\n", + " u0 = inverted_pendulum_mpc_controller.make_step(x0)\n", + " x0 = inverted_pendulum_lin_simulator.make_step(u0)\n", + "\n", + "animate_inverted_pendulum_simulation(inverted_pendulum_mpc_controller.data)\n", + ":::\n", + "\n", + "#### Evaluation\n", + "\n", + ":::{code-cell} ipython3\n", + "class MPCController:\n", + " def __init__(self, mpc: do_mpc.controller.MPC) -> None:\n", + " 
self.mpc = mpc\n", "        self.mpc.reset_history()\n", "        self.mpc.x0 = np.zeros(2)\n", "        self.mpc.set_initial_guess()\n", "\n", "    def act(self, observation: NDArray) -> NDArray:\n", "        return self.mpc.make_step(observation[[2, 3]].reshape(-1, 1)).ravel()\n", ":::\n", "\n", ":::{code-cell} ipython3\n", "mpc_controller = MPCController(inverted_pendulum_mpc_controller)\n", "results = simulate_environment(inverted_pendulum_env, max_steps=200, controller=mpc_controller)\n", "show_video(results.frames, fps=1 / inverted_pendulum_env.dt)\n", ":::\n", "\n", ":::{code-cell} ipython3\n", "plt.close()\n", "T = np.arange(len(results.observations)) * inverted_pendulum_env.dt\n", "plot_inverted_pendulum_results(T=T, observations=results.observations, actions=results.actions, reference=np.inf);\n", ":::\n", "\n", "::::" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "::::{exercise} Non-Linear Inverted Pendulum MPC\n", ":label: non-linear-inverted-pendulum-mpc\n", "\n", "- Design an MPC Controller for the non-linear inverted pendulum system for the following cases:\n", "  1. Cart at origin and upright pendulum: Set the reference for the cart position to the origin.\n", "  2. Pendulum swing-up: Set the initial angle to $-\pi$, i.e. start with the pendulum at the bottom.\n", "  3. Same as 2 but set the reference for the cart position to the origin.\n", "  4. Make the pendulum rotate as fast as possible.\n", "\n", ":::{hint}\n", "Use the following to create the environment with the initial angle set to `-np.pi` and the cutoff angle set to `np.inf` for the second case.\n", "\n", "```{code-cell} ipython3\n", "env = create_inverted_pendulum_environment(theta_threshold=np.inf, initial_angle=-np.pi)\n", "```\n", ":::\n", "\n", ":::{hint}\n", "You can access the inverted pendulum's kinetic, respectively potential, energy using `inverted_pendulum_lin_model.aux[\"E_kinetic\"]`, respectively `inverted_pendulum_lin_model.aux[\"E_potential\"]`\n", ":::\n", "\n", "::::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{solution} non-linear-inverted-pendulum-mpc\n", ":::" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your solution here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "::::{solution} non-linear-inverted-pendulum-mpc\n", ":class: dropdown\n", "\n", "#### Fast Rotation\n", "\n", "We first need the environment, non-linear model and 
simulator:\n", + "\n", + ":::{code-cell} ipython3\n", + "inverted_pendulum_env = create_inverted_pendulum_environment(max_steps=200)\n", + "inverted_pendulum_nonlin_model = build_inverted_pendulum_nonlinear_model(inverted_pendulum_env)\n", + "inverted_pendulum_nonlin_simulator = Simulator(inverted_pendulum_nonlin_model)\n", + "inverted_pendulum_nonlin_simulator.set_param(t_step=inverted_pendulum_env.dt)\n", + "inverted_pendulum_nonlin_simulator.setup()\n", + ":::\n", + "\n", + "The goal is to keep the inverted pendulum upright. For that we define the following costs and force penalty:\n", + "\n", + ":::{code-cell} ipython3\n", + "rotation_cost = - 1000 * inverted_pendulum_nonlin_model.x[3]\n", + "terminal_cost = rotation_cost\n", + "stage_cost = rotation_cost\n", + "force_penalty = 0.0\n", + "print(f\"{stage_cost=}\")\n", + "print(f\"{terminal_cost=}\")\n", + ":::\n", + "\n", + "We define as well upper and lower limits for the state and force\n", + "\n", + ":::{code-cell} ipython3\n", + "x_limits = np.array([-np.inf, np.inf])\n", + "u_limits = np.array([-inverted_pendulum_env.force_max, inverted_pendulum_env.force_max])\n", + ":::\n", + "\n", + "After that, we create the controller:\n", + "\n", + ":::{code-cell} ipython3\n", + "inverted_pendulum_mpc_controller = build_mpc_controller(\n", + " model=inverted_pendulum_nonlin_model,\n", + " t_step=inverted_pendulum_env.dt,\n", + " n_horizon=100,\n", + " stage_cost=stage_cost,\n", + " terminal_cost=terminal_cost,\n", + " force_penalty=force_penalty,\n", + " x_limits=x_limits,\n", + " u_limits=u_limits,\n", + ")\n", + ":::\n", + "\n", + "#### Simulation\n", + " \n", + ":::{code-cell} ipython3\n", + "inverted_pendulum_mpc_controller.reset_history()\n", + "inverted_pendulum_nonlin_simulator.reset_history()\n", + "x0 = np.zeros((4, 1))\n", + "x0[2] = 0.01\n", + "inverted_pendulum_nonlin_simulator.x0 = x0\n", + "\n", + "for k in range(50):\n", + " u0 = inverted_pendulum_mpc_controller.make_step(x0)\n", + " x0 = inverted_pendulum_nonlin_simulator.make_step(u0)\n", + "\n", + "animate_inverted_pendulum_simulation(inverted_pendulum_mpc_controller.data)\n", + ":::\n", + "\n", + "#### Evaluation\n", + "\n", + ":::{code-cell} ipython3\n", + "class MPCController:\n", + " def __init__(self, mpc: do_mpc.controller.MPC) -> None:\n", + " self.mpc = mpc\n", + " self.mpc.reset_history()\n", + " self.mpc.x0 = np.zeros(2)\n", + " self.mpc.set_initial_guess()\n", + "\n", + " def act(self, observation: NDArray) -> NDArray:\n", + " return mpc.make_step(observation.reshape(-1, 1)).ravel()\n", + ":::\n", + "\n", + ":::{code-cell} ipython3\n", + "mpc_controller = MPCController(inverted_pendulum_mpc_controller)\n", + "results = simulate_environment(inverted_pendulum_env, max_steps=200, controller=mpc_controller)\n", + "show_video(results.frames, fps=1 / inverted_pendulum_env.dt)\n", + ":::\n", + "\n", + ":::{code-cell} ipython3\n", + "plt.close()\n", + "T = np.arange(len(results.observations)) * inverted_pendulum_env.dt\n", + "plot_inverted_pendulum_results(T=T, observations=results.observations, actions=results.actions, reference=np.inf);\n", + ":::\n", + "\n", + "::::" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "fragment" - }, - "solution2": "hidden" + "slide_type": "subslide" + } }, - "outputs": [], "source": [ - "xss = np.array([0.0, 0.0])\n", - "distance_cost = casadi.bilin(np.diag([100, 1]), inverted_pendulum_lin.x.cat - xss)\n", - "terminal_cost = distance_cost\n", - "stage_cost = 
distance_cost\n", - "print(f\"{stage_cost=}\")\n", - "print(f\"{terminal_cost=}\")\n", - "mpc.set_objective(mterm=terminal_cost, lterm=stage_cost)" + "## MPC Controller Design Challenges\n", + "\n", + "Some of the challenges we face when designing an MPC controller include:\n", + "\n", + "- Choosing the right hyper-parameters to ensure optimality as well as feasibility.\n", + "- Ensuring recursive feasibility and achieving optimality despite a short prediction horizon.\n", + "- Satisfying input and state constraints in the presence of uncertainty.\n", + "- Ensuring computational tractability by properly reformulating constraints and costs and parameterizing control policies." ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "fragment" - }, - "solution2": "hidden" + "slide_type": "slide" + } }, - "outputs": [], "source": [ - "force_penalty = 1e-4\n", - "mpc.set_rterm(force=force_penalty)" + "## Robust MPC\n", + "\n", + "Robust Model Predictive Control (RMPC) refers to a variety of methods designed to guarantee control performance using optimization algorithms while considering systems with uncertainties. It is concerned with the control of systems that are only approximately known. Usually, it is assumed that the system lies in a set of possible systems and that this set can be quantitatively characterized.\n", + "\n", + "- Robust MPC guarantees constraint satisfaction for all uncertain element realizations.\n", + "- The model is split into a nominal part and additive uncertainty in a compact set. \n", + "- The controller is designed to be robust against the uncertainty.\n", + "- The MPC cost is typically optimized for the nominal system." ] }, { @@ -889,735 +868,12 @@ "metadata": { "slideshow": { "slide_type": "subslide" - }, - "solution2": "hidden" + } }, "source": [ - "#### Constraints" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "# lower and upper bounds of the input\n", - "u_max = 3\n", - "mpc.bounds[\"lower\", \"_u\", \"force\"] = -u_max\n", - "mpc.bounds[\"upper\", \"_u\", \"force\"] = u_max" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "source": [ - "#### Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "mpc.setup()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "source": [ - "#### Simulation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "%%capture\n", - "mpc.reset_history()\n", - "linear_simulator.reset_history()\n", - "x0 = np.zeros((2, 1))\n", - "x0[0] = 0.01\n", - "linear_simulator.x0 = x0\n", - "mpc.x0 = x0\n", - "mpc.set_initial_guess()\n", - "for k in range(100):\n", - " u0 = mpc.make_step(x0)\n", - " x0 = linear_simulator.make_step(u0)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "animate_inverted_pendulum_simulation(mpc.data)" - ] - 
}, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "source": [ - "#### Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "class MPCController:\n", - " def __init__(self, mpc: do_mpc.controller.MPC) -> None:\n", - " self.mpc = mpc\n", - " self.mpc.reset_history()\n", - " self.mpc.x0 = np.zeros(2)\n", - " self.mpc.set_initial_guess()\n", - "\n", - " def act(self, observation: NDArray) -> NDArray:\n", - " return mpc.make_step(observation[[1, 3]].reshape(-1, 1)).ravel()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "%%capture\n", - "max_steps = 500\n", - "env = create_inverted_pendulum_environment(max_steps=max_steps)\n", - "controller = MPCController(mpc)\n", - "results = simulate_environment(env, max_steps=max_steps, controller=controller)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "show_video(results.frames, fps=1 / env.dt)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "animate_inverted_pendulum_simulation(mpc.data)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "## Exercise\n", - "\n", - "- Design an MPC Controller for the non-linear inverted pendulum system for two different cases:\n", - " 1. Cart at origin and upright Pendulm: Set the reference for the cart position to the origin.\n", - " 2. Pendulum Swing-up: Set the initial angle to to $-\\pi$ i.e. start with the pendulum at the bottom.\n", - " 3. Same as 2 but set the reference for the cart position to the origin.\n", - " 4. 
Make the pendulum rotate as fast as possible.\n", - " \n", - "> **Note 1** Use the following to create the environment with initial angle set to -np.pi and cutoff angle to np.inf for second > case.\n", - "> ```python\n", - "> env = create_inverted_pendulum_environment(cutoff_angle=np.inf, initial_angle=-np.pi)\n", - "> ```\n", - "\n", - "> **Note 2**: You can access the inverted pendulum's kinetic, respectively potential, energy using\n", - "> `inverted_pendulum.aux[\"E_kinetic\"]`, respectively `inverted_pendulum.aux[\"E_potential\"]`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "## Solution" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden", - "solution2_first": true - }, - "source": [ - "### Swing-up" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "source": [ - "#### Controller" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "mpc = do_mpc.controller.MPC(inverted_pendulum)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "env = create_inverted_pendulum_environment()\n", - "mpc_params = {\n", - " \"n_horizon\": 120,\n", - " \"t_step\": env.dt,\n", - " \"state_discretization\": \"collocation\",\n", - " \"collocation_type\": \"radau\",\n", - " \"collocation_deg\": 3,\n", - " \"collocation_ni\": 1,\n", - " \"store_full_solution\": True,\n", - " # Use MA27 linear solver in ipopt for faster calculations:\n", - " \"nlpsol_opts\": {\"ipopt.linear_solver\": \"mumps\"},\n", - "}\n", - "mpc.set_param(**mpc_params)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "source": [ - "#### Objective" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "# energy_cost = 0.1 * inverted_pendulum.aux[\"E_kinetic\"] - 1000 * inverted_pendulum.aux[\"E_potential\"]\n", - "energy_cost = -10000 * inverted_pendulum.aux[\"E_potential\"]\n", - "terminal_cost = energy_cost\n", - "stage_cost = energy_cost\n", - "print(f\"{stage_cost=}\")\n", - "print(f\"{terminal_cost=}\")\n", - "mpc.set_objective(mterm=terminal_cost, lterm=stage_cost)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "force_penalty = 0.0\n", - "mpc.set_rterm(force=force_penalty)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "source": [ - "#### Constraints" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "# lower and upper bounds of the position\n", - "x_max = env.x_threshold\n", - "mpc.bounds[\"lower\", \"_x\", \"position\"] = -x_max\n", - "mpc.bounds[\"upper\", \"_x\", \"position\"] = x_max\n", - "# lower and upper bounds of the input\n", - 
"u_max = env.force_max\n", - "mpc.bounds[\"lower\", \"_u\", \"force\"] = -u_max\n", - "mpc.bounds[\"upper\", \"_u\", \"force\"] = u_max" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "source": [ - "#### Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "mpc.setup()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "source": [ - "#### Simulation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "%%capture\n", - "mpc.reset_history()\n", - "nonlinear_simulator.reset_history()\n", - "x0 = np.array([0.0, -0.99 * np.pi, 0.0, 0.0])\n", - "nonlinear_simulator.x0 = x0\n", - "mpc.x0 = x0\n", - "mpc.set_initial_guess()\n", - "for k in range(100):\n", - " u0 = mpc.make_step(x0)\n", - " x0 = nonlinear_simulator.make_step(u0)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "animate_full_inverted_pendulum_simulation(mpc.data)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "source": [ - "#### Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "class MPCController:\n", - " def __init__(self, mpc: do_mpc.controller.MPC) -> None:\n", - " self.mpc = mpc\n", - " self.mpc.reset_history()\n", - " self.mpc.x0 = np.array([0.0, -0.99 * np.pi, 0.0, 0.0])\n", - " self.mpc.set_initial_guess()\n", - "\n", - " def act(self, observation: NDArray) -> NDArray:\n", - " return mpc.make_step(observation.reshape(-1, 1)).ravel()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "%%capture\n", - "max_steps = 500\n", - "env = create_inverted_pendulum_environment(\n", - " max_steps=max_steps,\n", - " theta_threshold=np.inf,\n", - " theta_initial=-np.pi,\n", - ")\n", - "controller = MPCController(mpc)\n", - "results = simulate_environment(env, max_steps=max_steps, controller=controller)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "show_video(results.frames, fps=1 / env.dt)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - }, - "solution2": "hidden" - }, - "outputs": [], - "source": [ - "animate_full_inverted_pendulum_simulation(mpc.data)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# Stochastic Optimal Control Problem" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "So far we have focused on deterministic systems with no noise or disturbances. 
This is however not the case for real systems.\n", - "The methods discussed in this part deal with the problem of controlling dynamical systems\n", - "that are subject to system constraints under uncertainty, which can affect numerous parts of the\n", - "problem formulation. The system dynamics in discrete-time is given by:\n", - "\n", - "$$\n", - "x_{k+1} = f_t(x_k, u_k, k, w_k, \\theta_t)\n", - "$$\n", - "\n", - "Where:\n", - "- $x_k \\in \\mathbf{R}^n$ is the system state at time $k$.\n", - "- $u_k \\in \\mathbf{R}^n$ is the applied input at time $k$.\n", - "- $w_k$ describes a sequence of random variables corresponding to disturbances or process noise in the system, which are often assumed to be independent and identically distributed (i.i.d.).\n", - "- $\\theta_t \\sim \\mathcal{Q}^{\\theta_t}$ is a random variable describing the parametric uncertainty of the system, which is therefore constant over time.\n", - "- The subscript $t$ is used to emphasize that these quantities represent the true system dynamics or true optimal control problem. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "The true problem therefore relates to the development of an optimal controller for a distribution of systems given by $\\mathcal{Q}^{\\theta_t}$ under random disturbances $w_k$.\n", - "\n", - "The optimality of the controller is defined with respect to a cost or objective function. In the\n", - "presence of random model uncertainties, the cost is often defined as the expectation of a sum of\n", - "potentially time-varying stage costs of the states and inputs over a possibly infinite horizon $T$:\n", - "\n", - "$$\n", - "J_t = E\\left(\\sum \\limits_{k=0}^{T} g_t(x_k, u_k, k)\\right),\n", - "$$\n", - "\n", - "where the expected value is taken with respect to all random variables." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "## Stochastic Predictive Control\n", - "\n", - "The constrained stochastic optimal control problem can be formulated as:\n", - "\n", - "$$\n", - "\\begin{array}{ll}\n", - "J_t^* = \\displaystyle\\min_{\\pi_{k}} & \n", - "E\\left[\\sum\\limits_{k=0}^{T} g_t(x_k, u_k, k)\\right]\n", - "\\\\\n", - "\\text{subject to} & x_{k + 1}= f_t(x_k, u_k, k, w_k, \\theta_t)\n", - "\\\\\n", - "& u_k = \\pi_k(x_0, \\dots, x_k)\\\\\n", - "& \\bar{W} = [w_0, \\dots, w_{N - 1}] \\sim \\mathcal{Q}^{\\bar{W}}, \\theta_t \\sim \\mathcal{Q}^{\\theta_t}\\\\\n", - "& P[\\bar{X}] = [x_0, \\dots, x_{N}] \\in \\bar{X}_j \\ge p_j, \\forall j = 1, \\dots, n_{cx}\\\\\n", - "& P[\\bar{U}] = [u_0, \\dots, u_{N - 1}] \\in \\bar{U}_j \\ge p_j, \\forall j = 1, \\dots , n_{cu}\\\\\n", - "\\end{array},\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "Optimizing over a sequence of control laws $\\{\\pi_k\\}$, which can make use of all information in the\n", - "form of state measurements $x_k$ up to time step $k$. Problems of this form are in\n", - "general very hard to solve, and direct efforts typically rely on some form of discretization in space\n", - "and approximate dynamic programming or reinforcement learning.\n", - "\n", - "A notable exception, similar to what we have seen previously, is linear systems under additive noise and quadratic stage costs in the unconstrained setting, for which an exact solution, such as the standard linear quadratic regulator (LQR)." 
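The text above notes that the unconstrained linear-quadratic case admits an exact solution, the standard LQR. As a minimal illustrative sketch of that special case (the matrices `A`, `B`, `Q`, `R` below are made up for the example and are not part of these notebooks), the optimal feedback gain can be obtained by iterating the discrete-time Riccati recursion:

```python
import numpy as np

# Made-up discrete-time double integrator and quadratic weights,
# used only to illustrate the LQR computation mentioned above.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.diag([10.0, 1.0])
R = np.array([[0.1]])

# Iterate the discrete-time Riccati recursion to (approximate) convergence.
P = Q.copy()
for _ in range(500):
    BtPA = B.T @ P @ A
    P = Q + A.T @ P @ A - BtPA.T @ np.linalg.solve(R + B.T @ P @ B, BtPA)

# Optimal state feedback: u_k = -K x_k
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print(K)
```

The resulting linear state feedback $u_k = -K x_k$ is exactly the kind of closed-form solution that is lost once constraints or more general costs are added, which motivates the MPC approximation discussed next.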
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "MPC approximates the previous problem by repeatedly solving a simplified version of the\n", - "problem initialized at the currently measured state $x_k$ over a shorter horizon $N$\n", - "in a receding- horizon fashion.\n", - "\n", - "We introduce the prediction model:\n", - "\n", - "$$\n", - "x_{i+1|k} = f(x_{i|k}, u_{i|k}, i + k, w_{i|k}, \\theta),\n", - "$$\n", - "\n", - "where $f$ is the prediction dynamics. It typically aims at approximating the true dynamics but often differs, e.g.,\n", - "for computational reasons or because a succinct description of the true dynamics is unavailable.\n", - "\n", - "We use the subscript $i|k$ to emphasize predictive quantities, where, e.g., $x_{i|k}$ is the i-step-ahead prediction of the state, initialized at $x_{0|k} = x_k$. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "The most widespread MPC formulations of are nominal MPC schemes, which do not consider any uncertainties in the prediction model but instead rely exclusively on the compensation of uncertainties via feedback and by re-solving the problem at the next sampling instance.\n", - "\n", - "In nominal MPC, the optimization can be performed over control sequences $U = [u_{0|k}, \\dots , u_{N - 1|k}]$ rather than policies, resulting in the constrained optimal control problem\n", - "\n", - "$$\n", - "\\begin{array}{ll}\n", - "J^∗ &= \\displaystyle\\min_{U} g_f(x_{N|k}, u_{N|k}, k + N) + \\sum\\limits_{i=0}^{N-1} g(x_{i|k}, u_{i|k}, i + k)\\\\\n", - "\\text{subject to} & x_{i+1|k} = f(x_{i|k}, u_{i|k}, i + k)\\\\\n", - "& U = [u_{0|k}, \\dots, u_{N|k}] \\in U_j, \\forall j = 1, \\dots, n_{cu} \\\\\n", - "& X = [x_{0|k}, \\dots, x_{N|k}] \\in X_j \\forall j = 1, \\dots, n_{cx} \\\\\n", - "& x_{N|k} \\in X_f\\\\\n", - "& x_{0|k} = x_k\\\\\n", - "\\end{array}.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "The control law is then implicitly defined through the optimization problem as:\n", - "\n", - "$$\n", - "π^{\\text{MPC}}(x_k, k) = u_{0|k}^*,\n", - "$$\n", - "\n", - "where $u_{0|k}^*$ is the first element of the computed optimal control sequence $U^∗$." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "## Control Design Challenges\n", - "\n", - "- Ensuring recursive feasibility and achieving optimality despite a short prediction horizon.\n", - "- Satisfying input and state constraints in the presence of uncertainty.\n", - "- Ensuring computational tractability by properly reformulating constraints and costs and parameterizing control policies\n", - "\n", - "There is no systematic and universal solution to the third challenge, and often the chosen approach is application dependent. Fortunately, the first and second challenges can be addressed by using data. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "## Robust MPC\n", - "\n", - "- Robust MPC guarantees constraint satisfaction for all uncertain element realizations.\n", - "- The model is split into a nominal part and additive uncertainty in a compact set. \n", - "- The controller is designed to be robust against the uncertainty.\n", - "- The MPC cost is typically optimized for the nominal system." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "## Multi-Stage MPC\n", - "\n", - "The basic idea for the multi-stage approach is to consider various scenarios, where a scenario is defined by one possible realization of all uncertain parameters at every control instant within the horizon. The family of all considered discrete scenarios can be represented as a tree structure, called the scenario tree" + "## Multi-Stage MPC\n", + "\n", + "The basic idea for the multi-stage approach is to consider various scenarios, where a scenario is defined by one possible realization of all uncertain parameters at every control instant within the horizon. The family of all considered discrete scenarios can be represented as a tree structure, called the scenario tree" ] }, { @@ -1661,10 +917,11 @@ } }, "source": [ - "## Example - Inverted Pendulum\n", + "### Inverted Pendulum\n", + "\n", + "In a real system, we cannot usually determine the model parameters exactly and this represents an important source of uncertainty.\n", "\n", - "In a real system, usually the model parameters cannot be determined exactly, what represents an important source of uncertainty. In this example, we consider that the mass of the pendulum and that of the cart are not known precisely \n", - "and vary with respect to their nominal value." + "In this example, we consider that the mass of the pole is not known precisely and is different from its nominal value but we know which possible values it may take." ] }, { @@ -1678,153 +935,17 @@ "### Model" ] }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "#### Model, States and Control inputs" - ] - }, { "cell_type": "code", "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, + "metadata": {}, "outputs": [], "source": [ - "model = do_mpc.model.Model(\"continuous\")\n", + "inverted_pendulum_env = create_inverted_pendulum_environment(max_steps=200)\n", "\n", - "pos = model.set_variable(var_type=\"_x\", var_name=\"position\")\n", - "theta = model.set_variable(var_type=\"_x\", var_name=\"theta\")\n", - "dpos = model.set_variable(var_type=\"_x\", var_name=\"velocity\")\n", - "dtheta = model.set_variable(var_type=\"_x\", var_name=\"dtheta\")\n", - "u = model.set_variable(var_type=\"_u\", var_name=\"force\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "#### Parameters" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], - "source": [ - "# Certain parameters\n", - "ip_parameters = InvertedPendulumParameters()\n", - "k = 1 / 3\n", - "l = ip_parameters.l\n", - "gamma = ip_parameters.gamma\n", - "g = ip_parameters.g\n", - "mu_p = ip_parameters.mu_p\n", - "mu_c = ip_parameters.mu_c" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], - "source": [ - "# Uncertain parameters\n", - "m = model.set_variable(\"_p\", \"m\")\n", - "M = model.set_variable(\"_p\", \"M\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "#### ODE" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], - 
"source": [ - "numerator = (\n", - " (M + m) * g * casadi.sin(theta)\n", - " - casadi.cos(theta)\n", - " * (gamma * u + m * l * dtheta**2 * casadi.sin(theta) - mu_c * dpos)\n", - " - (M + m) / (m * l) * mu_p * dtheta\n", - ")\n", - "denominator = (1 + k) * (M + m) * l - m * l * casadi.cos(theta) ** 2\n", - "ddtheta = numerator / denominator" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], - "source": [ - "numerator = (\n", - " m * g * casadi.cos(theta) * casadi.sin(theta)\n", - " - (1 + k) * (gamma * u + m * l * dtheta**2 * casadi.sin(theta) - mu_c * dpos)\n", - " - mu_p * dtheta * casadi.cos(theta) / l\n", - ")\n", - "denominator = m * casadi.cos(theta) ** 2 - (1 + k) * (M + m)\n", - "ddpos = numerator / denominator" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "outputs": [], - "source": [ - "model.set_rhs(\"position\", dpos)\n", - "model.set_rhs(\"theta\", dtheta)\n", - "model.set_rhs(\"velocity\", ddpos)\n", - "model.set_rhs(\"dtheta\", ddtheta)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Setup" + "inverted_pendulum_nonlin_model = build_inverted_pendulum_nonlinear_model(\n", + " inverted_pendulum_env, with_uncertainty=True\n", + ")" ] }, { @@ -1833,183 +954,55 @@ "metadata": {}, "outputs": [], "source": [ - "model.setup()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "### Controller" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], - "source": [ - "mpc = do_mpc.controller.MPC(model)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], - "source": [ - "env = create_inverted_pendulum_environment()\n", - "mpc_params = {\n", - " \"n_horizon\": 50,\n", - " \"n_robust\": 1,\n", - " \"t_step\": env.dt,\n", - " \"state_discretization\": \"collocation\",\n", - " \"collocation_type\": \"radau\",\n", - " \"collocation_deg\": 3,\n", - " \"collocation_ni\": 1,\n", - " \"store_full_solution\": True,\n", - " # Use MA27 linear solver in ipopt for faster calculations:\n", - " \"nlpsol_opts\": {\"ipopt.linear_solver\": \"mumps\"},\n", - "}\n", - "mpc.set_param(**mpc_params)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "#### Objective" + "m_p_values = inverted_pendulum_env.masspole * np.array([1.0, 1.30, 0.70])\n", + "uncertainty_values = {\"m_p\": m_p_values}" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, + "metadata": {}, "outputs": [], "source": [ - "env = create_inverted_pendulum_environment()\n", - "xss = np.array([0.5, 0, 0, 0])\n", - "distance_cost = casadi.bilin(np.diag([1, 100, 0, 0]), model.x.cat - xss)\n", + "setpoint = np.zeros((4, 1))\n", + "distance_cost = casadi.bilin(\n", + " np.diag([1, 0, 100, 0]), inverted_pendulum_nonlin_model.x.cat - setpoint\n", + ")\n", "terminal_cost = distance_cost\n", "stage_cost = distance_cost\n", + "force_penalty = 1e-2\n", + "display_array(\"Setpoint\", setpoint)\n", "print(f\"{stage_cost=}\")\n", - "print(f\"{terminal_cost=}\")\n", - "mpc.set_objective(mterm=terminal_cost, lterm=stage_cost)" - ] - }, - { - 
"cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], - "source": [ - "force_penalty = 0.1\n", - "mpc.set_rterm(force=force_penalty)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "#### Constraints" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], - "source": [ - "# lower and upper bounds of the position\n", - "x_max = 1\n", - "mpc.bounds[\"lower\", \"_x\", \"position\"] = -x_max\n", - "mpc.bounds[\"upper\", \"_x\", \"position\"] = x_max\n", - "# lower and upper bounds of the input\n", - "u_max = 3\n", - "mpc.bounds[\"lower\", \"_u\", \"force\"] = -u_max\n", - "mpc.bounds[\"upper\", \"_u\", \"force\"] = u_max" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "#### Parameter Uncertainty" + "print(f\"{terminal_cost=}\")" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, + "metadata": {}, "outputs": [], "source": [ - "m_values = ip_parameters.m * np.array([1.0, 1.30, 0.70])\n", - "M_values = ip_parameters.M * np.array([1.0, 1.30, 0.70])\n", - "mpc.set_uncertainty_values(m=m_values, M=M_values)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "#### Setup" + "x_limits = np.array([-np.inf, np.inf])\n", + "u_limits = np.array([-inverted_pendulum_env.force_max, inverted_pendulum_env.force_max])" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, + "metadata": {}, "outputs": [], "source": [ - "mpc.setup()" + "inverted_pendulum_mpc_controller = build_mpc_controller(\n", + " model=inverted_pendulum_nonlin_model,\n", + " t_step=inverted_pendulum_env.dt,\n", + " n_horizon=100,\n", + " stage_cost=stage_cost,\n", + " terminal_cost=terminal_cost,\n", + " force_penalty=force_penalty,\n", + " x_limits=x_limits,\n", + " u_limits=u_limits,\n", + " uncertainty_values=uncertainty_values,\n", + ")" ] }, { @@ -2041,7 +1034,7 @@ " self.mpc.set_initial_guess()\n", "\n", " def act(self, observation: NDArray) -> NDArray:\n", - " return mpc.make_step(observation.reshape(-1, 1)).ravel()" + " return self.mpc.make_step(observation.reshape(-1, 1)).ravel()" ] }, { @@ -2056,14 +1049,10 @@ "source": [ "%%capture\n", "max_steps = 100\n", - "env = create_inverted_pendulum_environment(\n", - " render_mode=render_mode,\n", - " max_steps=max_steps,\n", - " cutoff_angle=np.inf,\n", - " initial_angle=0.99 * np.pi,\n", - ")\n", - "controller = MPCController(mpc)\n", - "results = simulate_environment(env, max_steps=max_steps, controller=controller)" + "controller = MPCController(inverted_pendulum_mpc_controller)\n", + "results = simulate_environment(\n", + " inverted_pendulum_env, max_steps=max_steps, controller=controller\n", + ")" ] }, { @@ -2076,7 +1065,7 @@ }, "outputs": [], "source": [ - "show_video(results.frames, fps=1 / env.dt)" + "show_video(results.frames, fps=1 / inverted_pendulum_env.dt)" ] }, { @@ -2089,7 +1078,7 @@ }, "outputs": [], "source": [ - "animate_full_inverted_pendulum_simulation(mpc.data)" + "animate_full_inverted_pendulum_simulation(inverted_pendulum_mpc_controller.data)" ] } ], diff --git a/notebooks/nb_60_MCTS.ipynb b/notebooks/nb_60_MCTS.ipynb index 
c6dd250..37270a7 100644 --- a/notebooks/nb_60_MCTS.ipynb +++ b/notebooks/nb_60_MCTS.ipynb @@ -22,7 +22,11 @@ "outputs": [], "source": [ "%%capture\n", - "%load_ext training_ml_control" + "%load_ext autoreload\n", + "%autoreload 2\n", + "%matplotlib inline\n", + "%load_ext training_ml_control\n", + "%set_random_seed 12" ] }, { @@ -44,6 +48,46 @@ "%presentation_style" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scene__Initialization": true, + "tags": [ + "ActiveScene" + ] + }, + "outputs": [], + "source": [ + "import warnings\n", + "\n", + "warnings.simplefilter(\"ignore\", UserWarning)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scene__Initialization": true, + "tags": [ + "ActiveScene" + ] + }, + "outputs": [], + "source": [ + "%autoreload\n", + "import numpy as np\n", + "from training_ml_control.environments import (\n", + " create_grid_world_environment,\n", + " plot_grid_graph,\n", + " plot_grid_all_paths_graph,\n", + " simulate_environment,\n", + ")\n", + "from training_ml_control.nb_utils import (\n", + " show_video,\n", + ")" + ] + }, { "cell_type": "markdown", "metadata": { @@ -289,7 +333,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Your Solution Here" + "# Your solution here" ] }, { @@ -297,357 +341,10 @@ "metadata": {}, "source": [ ":::{solution} grid-world\n", - "\n", + ":class: dropdown\n", + "Work in Progress\n", ":::" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "class MCTS:\n", - " def __init__(self, player):\n", - " \"\"\"\n", - " Implementation of Monte Carlo Tree Search\n", - "\n", - " Creates a root of an MCTS tree to keep track of the information\n", - " obtained throughout the course of the game in the form of a tree\n", - " of MCTS nodes\n", - "\n", - " The data structure of a node consists of:\n", - " - the game state which it corresponds to\n", - " - w, the number of wins that have occurred at or below it in the tree\n", - " - n, the number of plays that have occurred at or below it in the tree\n", - " - expanded, whether all the children (legal moves) of the node have\n", - " been added to the tree\n", - "\n", - " To access the node attributes, use the following format. For example,\n", - " to access the attribute 'n' of the root node:\n", - " policy = MCTSPolicy()\n", - " current_node = policy.root\n", - " policy.tree.node[current_node]['n']\n", - " \"\"\"\n", - " self.digraph = nx.DiGraph()\n", - " self.player = player\n", - " self.num_simulations = 0\n", - " # Constant parameter to weight exploration vs. 
exploitation for UCT\n", - " self.uct_c = np.sqrt(2)\n", - "\n", - " self.node_counter = 0\n", - "\n", - " empty_board = GameState()\n", - " self.digraph.add_node(\n", - " self.node_counter,\n", - " attr_dict={\n", - " \"w\": 0,\n", - " \"n\": 0,\n", - " \"uct\": 0,\n", - " \"expanded\": False,\n", - " \"state\": empty_board,\n", - " },\n", - " )\n", - " empty_board_node_id = self.node_counter\n", - " self.node_counter += 1\n", - "\n", - " self.last_move = None\n", - "\n", - " if player is \"O\":\n", - " for successor in [\n", - " empty_board.transition_function(*move)\n", - " for move in empty_board.legal_moves()\n", - " ]:\n", - " self.digraph.add_node(\n", - " self.node_counter,\n", - " attr_dict={\n", - " \"w\": 0,\n", - " \"n\": 0,\n", - " \"uct\": 0,\n", - " \"expanded\": False,\n", - " \"state\": successor,\n", - " },\n", - " )\n", - " self.digraph.add_edge(empty_board_node_id, self.node_counter)\n", - " self.node_counter += 1\n", - "\n", - " def reset_game(self):\n", - " self.last_move = None\n", - "\n", - " def move(self, starting_state):\n", - " # Make a copy of the starting state so that the MCTS state can't be\n", - " # modified later from the outside\n", - " starting_state = copy.deepcopy(starting_state)\n", - " # todo: is that copy needed?\n", - "\n", - " starting_node = None\n", - "\n", - " if self.last_move is not None:\n", - " # Check if the starting state is already in the graph as a child of the last move that we made\n", - " exists = False\n", - " for child in self.digraph.successors(self.last_move):\n", - " # Check if the child has the same state attribute as the starting state\n", - " if self.digraph.node[child][\"state\"] == starting_state:\n", - " # If it does, then check if there is a link between the last move and this child state\n", - " if self.digraph.has_edge(self.last_move, child):\n", - " exists = True\n", - " starting_node = child\n", - " if not exists:\n", - " # If it wasn't found, then add the starting state and an edge to it from the last move\n", - " self.digraph.add_node(\n", - " self.node_counter,\n", - " attr_dict={\n", - " \"w\": 0,\n", - " \"n\": 0,\n", - " \"uct\": 0,\n", - " \"expanded\": False,\n", - " \"state\": starting_state,\n", - " },\n", - " )\n", - " self.digraph.add_edge(self.last_move, self.node_counter)\n", - " starting_node = self.node_counter\n", - " self.node_counter += 1\n", - " else:\n", - " for node in self.digraph.nodes():\n", - " if self.digraph.node[node][\"state\"] == starting_state:\n", - " starting_node = node\n", - "\n", - " computational_budget = 25\n", - " for i in range(computational_budget):\n", - " self.num_simulations += 1\n", - "\n", - " print(\n", - " \"Running MCTS from this starting state with node id {}:\\n{}\".format(\n", - " starting_node, starting_state\n", - " )\n", - " )\n", - "\n", - " # Until computational budget runs out, run simulated trials\n", - " # through the tree:\n", - "\n", - " # Selection: Recursively pick the best node that maximizes UCT\n", - " # until reaching an unvisited node\n", - " print(\"================ ( selection ) ================\")\n", - " selected_node = self.selection(starting_node)\n", - " print(\"selected:\\n{}\".format(self.digraph.node[selected_node][\"state\"]))\n", - "\n", - " # Check if the selected node is a terminal state, and if so, this\n", - " # iteration is finished\n", - " if self.digraph.node[selected_node][\"state\"].winner():\n", - " break\n", - "\n", - " # Expansion: Add a child node where simulation will start\n", - " print(\"================ ( expansion ) 
================\")\n", - " new_child_node = self.expansion(selected_node)\n", - " print(\"Node chosen for expansion:\\n{}\".format(new_child_node))\n", - "\n", - " # Simulation: Conduct a light playout\n", - " print(\"================ ( simulation ) ================\")\n", - " reward = self.simulation(new_child_node)\n", - " print(\"Reward obtained: {}\\n\".format(reward))\n", - "\n", - " # Backpropagation: Update the nodes on the path with the simulation results\n", - " print(\"================ ( backpropagation ) ================\")\n", - " self.backpropagation(new_child_node, reward)\n", - "\n", - " move, resulting_node = self.best(starting_node)\n", - " print(\"MCTS complete. Suggesting move: {}\\n\".format(move))\n", - "\n", - " self.last_move = resulting_node\n", - "\n", - " # If we won, reset the last move to None for future games\n", - " if self.digraph.node[resulting_node][\"state\"].winner():\n", - " self.last_move = None\n", - "\n", - " return move\n", - "\n", - " def best(self, root):\n", - " \"\"\"\n", - " Returns the action that results in the child with the highest UCT value\n", - " (An alternative strategy could also be used, where the action leading to\n", - " the child with the most number of visits is chosen\n", - " \"\"\"\n", - " # Todo: explore various strategies for choosing the best action\n", - " children = self.digraph.successors(root)\n", - "\n", - " # # Option 1: Choose the child with the highest 'n' value\n", - " # num_visits = {}\n", - " # for child_node in children:\n", - " # num_visits[child_node] = self.digraph.node[child_node]['n']\n", - " # best_child = max(num_visits.items(), key=operator.itemgetter(1))[0]\n", - "\n", - " # Option 2: Choose the child with the highest UCT value\n", - " uct_values = {}\n", - " for child_node in children:\n", - " uct_values[child_node] = self.uct(child_node)\n", - "\n", - " # Choose the child node that maximizes the expected value given by UCT\n", - " # If more than one has the same UCT value then break ties randomly\n", - " best_children = [\n", - " key\n", - " for key, val in uct_values.iteritems()\n", - " if val == max(uct_values.values())\n", - " ]\n", - " idx = np.random.randint(len(best_children))\n", - " best_child = best_children[idx]\n", - "\n", - " # Determine which action leads to this child\n", - " action = self.digraph.get_edge_data(root, best_child)[\"action\"]\n", - " return action, best_child\n", - "\n", - " def selection(self, root):\n", - " \"\"\"\n", - " Starting at root, recursively select the best node that maximizes UCT\n", - " until a node is reached that has no explored children\n", - " Keeps track of the path traversed by adding each node to path as\n", - " it is visited\n", - " :return: the node to expand\n", - " \"\"\"\n", - " # In the case that the root node is not in the graph, add it\n", - " if root not in self.digraph.nodes():\n", - " self.digraph.add_node(\n", - " self.node_counter,\n", - " attr_dict={\"w\": 0, \"n\": 0, \"uct\": 0, \"expanded\": False, \"state\": root},\n", - " )\n", - " self.node_counter += 1\n", - " return root\n", - " elif not self.digraph.node[root][\"expanded\"]:\n", - " print(\"root in digraph but not expanded\")\n", - " return root # This is the node to expand\n", - " else:\n", - " print(\"root expanded, move on to a child\")\n", - " # Handle the general case\n", - " children = self.digraph.successors(root)\n", - " uct_values = {}\n", - " for child_node in children:\n", - " uct_values[child_node] = self.uct(state=child_node)\n", - "\n", - " # Choose the child node 
that maximizes the expected value given by UCT\n", - " best_child_node = max(uct_values.items(), key=operator.itemgetter(1))[0]\n", - "\n", - " return self.selection(best_child_node)\n", - "\n", - " def expansion(self, node):\n", - " # As long as this node has at least one unvisited child, choose a legal move\n", - " children = self.digraph.successors(node)\n", - " legal_moves = self.digraph.node[node][\"state\"].legal_moves()\n", - " print(\"Legal moves: {}\".format(legal_moves))\n", - "\n", - " # Select the next unvisited child with uniform probability\n", - " unvisited_children = []\n", - " corresponding_actions = []\n", - " print(\"legal moves: {}\".format(legal_moves))\n", - " for move in legal_moves:\n", - " print(\"adding to expansion analysis with: {}\".format(move))\n", - " child = self.digraph.node[node][\"state\"].transition_function(*move)\n", - "\n", - " in_children = False\n", - " for child_node in children:\n", - " if self.digraph.node[child_node][\"state\"] == child:\n", - " in_children = True\n", - "\n", - " if not in_children:\n", - " unvisited_children.append(child)\n", - " corresponding_actions.append(move)\n", - " # Todo: why is it possible for there to be no unvisited children?\n", - " print(\"unvisited children: {}\".format(len(unvisited_children)))\n", - " if len(unvisited_children) > 0:\n", - " idx = np.random.randint(len(unvisited_children))\n", - " child, move = unvisited_children[idx], corresponding_actions[idx]\n", - "\n", - " self.digraph.add_node(\n", - " self.node_counter,\n", - " attr_dict={\"w\": 0, \"n\": 0, \"uct\": 0, \"expanded\": False, \"state\": child},\n", - " )\n", - " self.digraph.add_edge(node, self.node_counter, attr_dict={\"action\": move})\n", - " child_node_id = self.node_counter\n", - " self.node_counter += 1\n", - " else:\n", - " # Todo:\n", - " # Is this the correct behavior? The issue is, it was getting to the expansion\n", - " # expansion method with nodes that were already expanded for an unknown reason,\n", - " # so here we return the node that was passed. 
Maybe there is a case where a\n", - " # node had been expanded but not yet marked as expanded until it got here.\n", - " return node\n", - "\n", - " # If all legal moves are now children, mark this node as expanded.\n", - " if len(children) + 1 == len(legal_moves):\n", - " self.digraph.node[node][\"expanded\"] = True\n", - " print(\"node is expanded\")\n", - "\n", - " return child_node_id\n", - "\n", - " def simulation(self, node):\n", - " \"\"\"\n", - " Conducts a light playout from the specified node\n", - " :return: The reward obtained once a terminal state is reached\n", - " \"\"\"\n", - " random_policy = RandomPolicy()\n", - " current_state = self.digraph.node[node][\"state\"]\n", - " while not current_state.winner():\n", - " move = random_policy.move(current_state)\n", - " current_state = current_state.transition_function(*move)\n", - "\n", - " if current_state.winner() == self.player:\n", - " return 1\n", - " else:\n", - " return 0\n", - "\n", - " def backpropagation(self, last_visited, reward):\n", - " \"\"\"\n", - " Walk the path upwards to the root, incrementing the\n", - " 'n' and 'w' attributes of the nodes along the way\n", - " \"\"\"\n", - " current = last_visited\n", - " while True:\n", - " self.digraph.node[current][\"n\"] += 1\n", - " self.digraph.node[current][\"w\"] += reward\n", - "\n", - " print(\n", - " \"Updating to n={} and w={}:\\n{}\".format(\n", - " self.digraph.node[current][\"n\"],\n", - " self.digraph.node[current][\"w\"],\n", - " self.digraph.node[current][\"state\"],\n", - " )\n", - " )\n", - "\n", - " # Terminate when we reach the empty board\n", - " if self.digraph.node[current][\"state\"] == GameState():\n", - " break\n", - " # Todo:\n", - " # Does this handle the necessary termination conditions for both 'X' and 'O'?\n", - " # As far as we can tell, it does\n", - "\n", - " # Will throw an IndexError when we arrive at a node with no predecessors\n", - " # Todo: see if this additional check is no longer necessary\n", - " try:\n", - " current = self.digraph.predecessors(current)[0]\n", - " except IndexError:\n", - " break\n", - "\n", - " def uct(self, state):\n", - " \"\"\"\n", - " Returns the expected value of a state, calculated as a weighted sum of\n", - " its exploitation value and exploration value\n", - " \"\"\"\n", - " n = self.digraph.node[state][\"n\"] # Number of plays from this node\n", - " w = self.digraph.node[state][\"w\"] # Number of wins from this node\n", - " t = self.num_simulations\n", - " c = self.uct_c\n", - " epsilon = EPSILON\n", - "\n", - " exploitation_value = w / (n + epsilon)\n", - " exploration_value = c * np.sqrt(np.log(t) / (n + epsilon))\n", - " print(\"exploration_value: {}\".format(exploration_value))\n", - "\n", - " value = exploitation_value + exploration_value\n", - "\n", - " print(\"UCT value {:.3f} for state:\\n{}\".format(value, state))\n", - "\n", - " self.digraph.node[state][\"uct\"] = value\n", - "\n", - " return value" - ] } ], "metadata": { @@ -676,7 +373,7 @@ }, "scenes_data": { "active_scene": "Initialization", - "init_scene": null, + "init_scene": "Initialization", "scenes": [ "Initialization" ] diff --git a/notebooks/nb_70_machine_learning_control.ipynb b/notebooks/nb_70_machine_learning_control.ipynb index 883e8b1..79610a9 100644 --- a/notebooks/nb_70_machine_learning_control.ipynb +++ b/notebooks/nb_70_machine_learning_control.ipynb @@ -46,17 +46,37 @@ "%presentation_style" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e892149-098a-42c7-b554-bbb83918888d", + "metadata": { + 
"init_cell": true, + "scene__Default Scene": true, + "tags": [ + "ActiveScene" + ] + }, + "outputs": [], + "source": [ + "import warnings\n", + "\n", + "warnings.simplefilter(\"ignore\", UserWarning)" + ] + }, { "cell_type": "code", "execution_count": null, "id": "cd78d373-34af-4b57-8d16-dbf6f662331d", "metadata": { "init_cell": true, + "scene__Default Scene": true, "slideshow": { "slide_type": "skip" }, "tags": [ - "remove-cell" + "remove-cell", + "ActiveScene" ] }, "outputs": [], @@ -70,11 +90,12 @@ "\n", "import pykoopman as pk\n", "\n", - "from training_ml_control.environment import (\n", + "from training_ml_control.environments import (\n", + " create_cart_environment,\n", " create_inverted_pendulum_environment,\n", " simulate_environment,\n", ")\n", - "from training_ml_control.nb_utils import show_video\n", + "from training_ml_control.nb_utils import show_video, display_array\n", "\n", "warnings.simplefilter(\"ignore\", UserWarning)\n", "sns.set_theme()\n", @@ -108,7 +129,9 @@ "id": "77372629-782f-44b6-ad24-baedbca2fbc2", "metadata": {}, "source": [ - "# Machine Learning & Control" + "# Machine Learning & Control\n", + "\n", + "Modern machine learning provides useful tools and perspectives for control theory. Framing control problems as data modeling tasks enables powerful function approximation, estimation, and optimization techniques from machine learning to be applied." ] }, { @@ -120,7 +143,7 @@ } }, "source": [ - "## Learning-Based MPC" + "## Learning-Based Control" ] }, { @@ -132,13 +155,10 @@ } }, "source": [ - "- Learning-based MPC addresses the automated and data-driven generation or adaptation of elements of the MPC formulation to improve control performance.\n", + "- Learning-based control addresses the automated and data-driven generation or adaptation of elements of the controller formulation to improve control performance.\n", "- The learning setup can be diverse:\n", " - Offline learning involves adapting the controller between trials or episodes while collecting data.\n", - " - Online learning adjusts the controller during closed-loop operation (e.g. repetitive tasks) or using data from one task execution.\n", - "- Much research has focused on automatically improving model quality, as this clearly affects MPC performance.\n", - "- Some efforts address the MPC problem formulation directly.\n", - "- Others use MPC concepts to satisfy constraints during learning-based control i.e. Safe Learning Control." + " - Online learning adjusts the controller during closed-loop operation (e.g. repetitive tasks) or using data from one task execution." ] }, { @@ -152,9 +172,9 @@ "source": [ "## System Identification\n", "\n", - "- MPC relies on accurate system models, so one approach is learning to adjust the model either during operation or between different operational instances.\n", + "- Control relies on accurate system models, so one approach is learning to adjust the model either during operation or between different operational instances.\n", "- Traditionally models are derived offline before control using first principles and identification.\n", - "- Learning-based MPC constructs and updates models and uncertainties from data." + "- Learning-based system identification constructs and updates models and uncertainties from data." ] }, { @@ -195,6 +215,92 @@ "To do that properly for a controlled system, we need either a human expert or a well-tuned program under supervision for data collection. 
However, in many safety-critical tasks such as space exploration, there is no expert collecting data. Naturally, here comes a question: Can we safely collect data without humans in the loop, and eventually achieve an aggressive control goal? For example, landing the drone faster and faster." ] }, + { + "cell_type": "markdown", + "id": "fb7dc38b-7042-4fdb-b6d9-afa4f53773a4", + "metadata": {}, + "source": [ + "### Cart" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fb805d78-a8c1-4dd6-9aa9-24acf6fa7c34", + "metadata": {}, + "outputs": [], + "source": [ + "cart_env = create_cart_environment(goal_position=9)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0f0d34fb-9237-4150-8fdd-465358e510cd", + "metadata": {}, + "outputs": [], + "source": [ + "cart_observations = []\n", + "cart_actions = []\n", + "for _ in range(20):\n", + " result = simulate_environment(cart_env)\n", + " cart_observations.append(result.observations)\n", + " cart_actions.append(result.actions)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f2b4a5b6-ab5e-4d90-9c9e-fe4c6884badc", + "metadata": {}, + "outputs": [], + "source": [ + "fig, axes = plt.subplots(1, 2, sharex=True)\n", + "axes = axes.ravel()\n", + "for i, label in zip(range(2), [\"$x$\", r\"$\\dot{x}$\"]):\n", + " for j in range(len(cart_observations)):\n", + " t = np.arange(len(cart_observations[j][i]))\n", + " axes[i].plot(t, cart_observations[j][i], label=label)\n", + " axes[i].set_xlabel(\"Time\")\n", + " axes[i].set_title(label)\n", + "fig.tight_layout()\n", + "plt.show();" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "728e13ac-0b8e-4934-b1ab-bb83a442dfde", + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(1, 1, sharex=True)\n", + "\n", + "for j in range(len(cart_observations)):\n", + " ax.plot(cart_observations[j][0], cart_observations[j][1])\n", + " ax.set_xlabel(\"$x$\")\n", + " ax.set_ylabel(\"$\\dot{x}$\")\n", + "fig.tight_layout()\n", + "plt.show();" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cc1ea39d-5a37-49f2-bcca-d8d875ebcd36", + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots()\n", + "for j in range(len(cart_actions)):\n", + " t = np.arange(len(cart_actions[j]))\n", + " ax.plot(t, cart_actions[j])\n", + " ax.set_xlabel(\"Time\")\n", + " ax.set_title(\"$u$\")\n", + "fig.tight_layout()\n", + "plt.show();" + ] + }, { "cell_type": "markdown", "id": "9cce48e1-ffc6-4c1a-b175-2ce58e7e29e6", @@ -210,7 +316,7 @@ "metadata": {}, "outputs": [], "source": [ - "env = create_inverted_pendulum_environment(\n", + "inverted_pendulum_env = create_inverted_pendulum_environment(\n", " max_steps=100, theta_threshold=np.deg2rad(45)\n", ")" ] @@ -222,12 +328,12 @@ "metadata": {}, "outputs": [], "source": [ - "observations = []\n", - "actions = []\n", + "inverted_pendulum_observations = []\n", + "inverted_pendulum_actions = []\n", "for _ in range(20):\n", - " result = simulate_environment(env)\n", - " observations.append(result.observations)\n", - " actions.append(result.actions)" + " result = simulate_environment(inverted_pendulum_env)\n", + " inverted_pendulum_observations.append(result.observations)\n", + " inverted_pendulum_actions.append(result.actions)" ] }, { @@ -240,15 +346,31 @@ "fig, axes = plt.subplots(2, 2, sharex=True)\n", "axes = axes.ravel()\n", "for i, label in zip(range(4), [\"$x$\", r\"$\\dot{x}$\", r\"$\\theta$\", r\"$\\dot{\\theta}$\"]):\n", - " t = 
np.arange(len(observations[j][0]))\n", - " for j in range(len(observations)):\n", - " axes[i].plot(t, observations[j][i], label=label)\n", + " for j in range(len(inverted_pendulum_observations)):\n", + " t = np.arange(len(inverted_pendulum_observations[j][i]))\n", + " axes[i].plot(t, inverted_pendulum_observations[j][i], label=label)\n", " axes[i].set_xlabel(\"Time\")\n", " axes[i].set_title(label)\n", "fig.tight_layout()\n", "plt.show();" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "db98415f-cd1c-4e27-a201-1b51747523d0", + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(1, 1)\n", + "for j in range(len(inverted_pendulum_observations)):\n", + " ax.plot(inverted_pendulum_observations[j][0], inverted_pendulum_observations[j][2])\n", + " ax.set_xlabel(\"$x$\")\n", + " ax.set_ylabel(r\"$\\theta$\")\n", + "fig.tight_layout()\n", + "plt.show();" + ] + }, { "cell_type": "code", "execution_count": null, @@ -257,9 +379,9 @@ "outputs": [], "source": [ "fig, ax = plt.subplots()\n", - "for j in range(len(actions)):\n", - " t = np.arange(len(actions[j]))\n", - " ax.plot(t, actions[j])\n", + "for j in range(len(inverted_pendulum_actions)):\n", + " t = np.arange(len(inverted_pendulum_actions[j]))\n", + " ax.plot(t, inverted_pendulum_actions[j])\n", " ax.set_xlabel(\"Time\")\n", " ax.set_title(\"$u$\")\n", "fig.tight_layout()\n", @@ -286,14 +408,6 @@ "Where $\\theta$ are learnable parameters." ] }, - { - "cell_type": "code", - "execution_count": null, - "id": "895992af-8060-4bde-a766-269b81e84b67", - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "id": "5e889c64-ce65-40c7-9752-4291f03128c2", @@ -379,19 +493,30 @@ "Given $\\mathcal{F}$ a space of functions $g: \\Omega \\rightarrow \\mathbb{C}$, and $\\Omega$ the state space of our dynamical system. The Koopman operator is defined on a suitable domain $\\mathcal{D}(\\mathcal{K}) \\subset \\mathcal{F}$ via the composition formula:\n", "\n", "$$\n", - "[\\mathcal{K}g](x) = [g \\circ f](x) = g(f(x)), \\quad g \\in \\mathcal{D}(\\mathcal{K})\n", + "[\\mathcal{K}g](\\mathbf{x}) = [g \\circ f](\\mathbf{x}) = g(f(\\mathbf{x})), \\quad g \\in \\mathcal{D}(\\mathcal{K})\n", "$$\n", "\n", - "Where $x_{t+1} = f(x_t)$\n", + "Where $\\mathbf{x}_{t+1} = f(\\mathbf{x}_t)$\n", "\n", "The functions $g$, referred to as *observables*, serve as tools for indirectly measuring the state of the system.\n", "Specifically, $g(x_t)$ indirectly measures the state $x_t$.\n", "\n", - "In this context, $[\\mathcal{K}g](x_t) = g(f(x_t)) = g(x_{t+1})$ represents the measurement of the state one time\n", - "step ahead of $g(x_t)$. This process effectively captures the dynamic progression of the system.\n", + "In this context, $[\\mathcal{K}g](\\mathbf{x}_t) = g(f(\\mathbf{x}_t)) = g(\\mathbf{x}_{t+1})$ represents the measurement of the state one time\n", + "step ahead of $g(\\mathbf{x}_t)$. This process effectively captures the dynamic progression of the system.\n", + "\n", + "The key property of the Koopman operator $\\mathcal{K}$ is its *linearity*. This linearity holds irrespective of whether the system’s dynamics are linear or nonlinear. Consequently, the spectral properties of $\\mathcal{K}$ become a powerful tool in analyzing the dynamical system’s behavior.\n", "\n", - "The key property of the Koopman operator $\\mathcal{K}$ is its *linearity*. 
This linearity holds irrespective of whether the system’s dynamics are linear or nonlinear.\n", - "Consequently, the spectral properties of K become a powerful tool in analyzing the dynamical system’s behavior." + "The key property of the Koopman operator $\mathcal{K}$ is its *linearity*. This linearity holds irrespective of whether the system’s dynamics are linear or nonlinear. Consequently, the spectral properties of $\mathcal{K}$ become a powerful tool in analyzing the dynamical system’s behavior.\n", + "\n", + "If $g \in \mathcal{F}$ is an eigenfunction of $\mathcal{K}$ with eigenvalue $\lambda$, then:\n", + "\n", + "$$\n", + "g(\mathbf{x}_t) = [\mathcal{K}^t g](\mathbf{x}_0) = \lambda^t g(\mathbf{x}_0), \quad \forall t \in \mathbb{N}.\n", + "$$\n", + "\n", + "One of the most useful features of Koopman operators is the Koopman Mode Decomposition (KMD).\n", + "The KMD expresses the state $\mathbf{x}$ or an observable $g(\mathbf{x})$ as a linear combination of dominant coherent structures.\n", + "It can be considered a diagonalization of the Koopman operator.\n", + "\n", + "As a result, the KMD is invaluable for tasks such as dimensionality and model reduction. It generalizes the space-time separation of variables typically achieved through the Fourier transform or the singular value decomposition (SVD)." ] }, { @@ -415,9 +540,103 @@ "id": "319c7c50-bb03-46ce-a268-2281cd9c1f0b", "metadata": {}, "source": [ - "### Dynamic Mode Decomposition (DMD)\n", + "## Dynamic Mode Decomposition (DMD)\n", + "\n", + "Dynamic Mode Decomposition (DMD) is a popular data-driven analysis technique used to decompose complex, nonlinear systems into a set of modes, revealing underlying patterns and dynamics through spectral analysis.\n", + "\n", + "The simplest and historically first interpretation of DMD is as a linear regression.\n", + "\n", + "We consider a discrete-time dynamical system represented as:\n", + "\n", + "$$\n", + "\mathbf{x}_{t+1} = f(\mathbf{x}_t), \quad t = 0, 1, 2, \dots, \n", + "$$\n", + "\n", + "Given discrete-time snapshots of the system:\n", + "\n", + "$$\n", + "\{\mathbf{x}^{(m)}, \mathbf{y}^{(m)}\}^M_{m=1}, \quad \text{s.t.} \quad \mathbf{y}^{(m)} = f(\mathbf{x}^{(m)}), \quad m = 1, \dots , M.\n", + "$$\n", + "\n", + "We define the snapshot matrices $\mathbf{X}, \mathbf{Y} \in \mathbb{C}^{d\times M}$ as:\n", + "\n", + "$$\n", + "\mathbf{X} = \begin{bmatrix}\mathbf{x}^{(1)} & \mathbf{x}^{(2)} & \dots & \mathbf{x}^{(M)} \end{bmatrix}\n", + "$$\n", + "\n", + "$$\n", + "\mathbf{Y} = \begin{bmatrix}\mathbf{y}^{(1)} & \mathbf{y}^{(2)} & \dots & \mathbf{y}^{(M)} \end{bmatrix}\n", + "$$\n", "\n", - "Dynamic Mode Decomposition (DMD) is a popular data-driven analysis technique used to decompose complex, nonlinear systems into a set of modes, revealing underlying patterns and dynamics through spectral analysis." + "We seek a matrix $\mathbf{K}_{\text{DMD}}$ such that $\mathbf{Y} \approx \mathbf{K}_{\text{DMD}} \mathbf{X}$. We can think\n", + "of this as constructing a linear and approximate dynamical system.\n", + "\n", + "To find a suitable matrix $\mathbf{K}_{\text{DMD}}$, we consider the minimization problem:\n", + "\n", + "$$\n", + "\underset{\mathbf{K}_{\text{DMD}} \in \mathbb{C}^{d\times d} }{\min} \left\lVert \mathbf{Y} − \mathbf{K}_{\text{DMD}} \mathbf{X} \right\rVert_F ,\n", + "$$ (dmd-minimization)\n", + "\n", + "where $\left\lVert . \right\rVert_F$ denotes the Frobenius norm[^*]. Similar optimization problems will be at the heart of the\n", + "various DMD-type algorithms we consider here. 
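As a quick numerical illustration of this regression problem (a hedged sketch, not part of the notebooks: the matrix `A` and the snapshot data below are made up), the minimization can be solved directly with `numpy.linalg.lstsq` before turning to the closed-form pseudoinverse solution given next:

```python
import numpy as np

# Synthetic snapshots generated from a made-up linear map x_{t+1} = A x_t,
# used only to sanity-check the regression.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
X = rng.normal(size=(2, 100))   # columns are the snapshots x^(m)
Y = A @ X                       # columns are y^(m) = f(x^(m))

# Solve min_K ||Y - K X||_F; this is equivalent to K_DMD = Y X^+.
K_dmd = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T

print(np.allclose(K_dmd, A))     # True: the noise-free linear map is recovered exactly
print(np.linalg.eigvals(K_dmd))  # DMD eigenvalues, here approximately 0.9 and 0.8
```

On noisy or genuinely nonlinear data the fit is only approximate, which is where the SVD-based formulation and the DMD variants listed below come in.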
A solution to the problem in {eq}`dmd-minimization` is:\n", + "\n", + "[^*]: The Frobenius norm of a matrix $\\mathbf{X} \\in \\mathbb{C}^{m\\times n}$ is defined as $\\left\\lVert \\mathbf{X} \\right\\rVert_F = \\sqrt{\\sum_{i=0}^{m} \\sum_{j=0}^{n} x_{i,j}}$\n", + "\n", + "$$\n", + "\\mathbf{K}_{\\text{DMD}} = \\mathbf{Y} \\mathbf{X}^{+} ∈ \\mathbb{C}^{d\\times d},\n", + "$$\n", + "\n", + "where $^{+}$ denotes the Moore–Penrose pseudoinverse.\n", + "\n", + "In practice, this is computed using the Singular Value Decomposition (SVD) as follows:\n", + "\n", + "$$\n", + "\\begin{array}{ll}\n", + "\\mathbf{X} \\approx U \\Sigma V^∗ & \\text{(truncated SVD of rank r)}\\\\\n", + "\\tilde{\\mathbf{K}}_{\\text{DMD}} = U^∗\\mathbf{Y} V \\Sigma^{-1} & \\text{(Compute compression)}\\\\\n", + "\\tilde{\\mathbf{K}}_{\\text{DMD}} W = W \\Lambda & \\text{(Compute eigendecomposition)}\\\\\n", + "\\phi = YV\\Sigma^{-1}W & \\text{(Compute the modes)}\\\\\n", + "\\end{array}\n", + "$$ " + ] + }, + { + "cell_type": "markdown", + "id": "3b5d734b-8ae5-494d-922c-71d58083d7bf", + "metadata": {}, + "source": [ + "The core goal of DMD is to apply linear algebra and spectral techniques to the analysis, prediction, and control of nonlinear dynamical systems.\n", + "However, DMD often faces several challenges that have been a driving force for the many versions of the DMD algorithm that have appeared.\n", + "\n", + "Generally speaking, the error of DMD and its approximate KMD can be split into three types:\n", + "- The projection error is due to projecting/truncating the Koopman operator onto a finite-dimensional space of observables. This is linked to the issue of closure and lack of (or lack of knowledge of) non-trivial finite-dimensional Koopman invariant subspaces.\n", + "- The estimation error is due to estimating the matrices that represent the projected Koopman operator from a finite set of potentially noisy trajectory data.\n", + "- Numerical errors (e.g., roundoff, stability, further compression, etc.) incurred when processing the finite DMD matrix." + ] + }, + { + "cell_type": "markdown", + "id": "a349eefe-6ce2-4178-b76b-480e6f58936a", + "metadata": {}, + "source": [ + "### Variants\n", + "\n", + ":::{table} Summary of some DMD methods. *Adapted from {cite}`colbrook_multiverse_2023`*.\n", + ":widths: auto\n", + ":align: center\n", + "\n", + "| DMD Method | Challenges Overcome | Key Insight/Development |\n", + "|---|---|---|\n", + "| Forward-Backward DMD | Sensor noise bias. | Take geometric mean of forward and backward propagators for the data. |\n", + "| Total Least-Squares DMD | Sensor noise bias. | Replace least-squares problem with total least-squares problem. |\n", + "| Optimized DMD
Bagging Optimized DMD | Sensor noise bias.
Optimal collective processing of snapshots. | Exponential fitting problem, solve using variable projection method.
Statistical bagging sampling strategy. |\n", + "| Compressed Sensing | Computational efficiency.
Temporal or spatial undersampling. | Unitary invariance of DMD extended to settings of compressed sensing (e.g., RIP, sparsity-promoting regularizers). |\n", + "| Randomized DMD | Computational efficiency.
Memory usage. | Sketch data matrix for computations in reduced-dimensional space. |\n", + "| Multiresolution DMD | Multiscale dynamics. | Filtered decomposition across scales. |\n", + "| **DMD with Control** | Separation of unforced dynamics and actuation. | Generalized regression for globally linear control framework. |\n", + "| **Extended DMD** | Nonlinear observables. | Arbitrary (nonlinear) dictionaries, recasting of DMD as a Galerkin method. |\n", + "| Physics-Informed DMD | Preserving structure of dynamical systems.
Numerous instances given in general framework. | Restrict the least-squares optimization to lie on a matrix manifold. | \n", + ":::" ] }, { @@ -425,7 +644,47 @@ "id": "1e75e113-135f-4c73-bcb8-7083984f0e01", "metadata": {}, "source": [ - "### DMD with Control (DMDc)" + "### DMD with Control (DMDc)\n", + "\n", + "One of the most successful applications of the Koopman operator framework lies in control with demonstrated successes in various challenging appli-\n", + "cations. These include fluid dynamics, robotics, power grids, biology, and chemical processes.\n", + "\n", + "The key point is that Koopman operators represent nonlinear dynamics within a globally linear framework. This approach leads to tractable convex optimization problems and circumvents theoretical and computational limitations associated with nonlinearity. Moreover, it is amenable to data-driven, model-free approaches.\n", + "\n", + "DMDc extends DMD to disambiguate between unforced dynamics and the effect of actuation.\n", + "\n", + "The DMD regression is generalized to:\n", + "\n", + "$$\n", + "\\mathbf{x}_{t+1} = f(\\mathbf{x}_t, \\mathbf{u}_t) \\approx A \\mathbf{x}_t + B \\mathbf{u}_t\n", + "$$\n", + "\n", + "where $A \\in \\mathbb{C}^{d\\times d}$ and $B \\in \\mathbb{C}^{d\\times q} are unknown matrices.\n", + "\n", + "Snapshot triplets of the form $\\{\\mathbf{x}^{(m)}, \\mathbf{y}^{(m)}, \\mathbf{u}^{(m)}\\}^M_{m=1}$ are collected, where we assume that:\n", + "\n", + "$$\n", + "\\mathbf{y}^{(m)} \\approx f(\\mathbf{x}^{(m)}, \\mathbf{u}^{(m)}), \\quad m = 1, \\dots , M.\n", + "$$\n", + "\n", + "The control portion of the snapshots is arranged into the matrix $\\Upsilon = \\begin{pmatrix}u^{(1)} & u^{(2)} & \\dots & u^{(M)}\\end{pmatrix}$.\n", + "\n", + "The optimization problem in {eq}`dmd-minimization` is replaced by\n", + "\n", + "$$\n", + "\\underset{A, B}{\\min} \\left\\lVert \\mathbf{Y} − \\begin{pmatrix}A & B \\end{pmatrix}\\Omega \\right\\rVert_F^2,\\\\\n", + "\\text{where} \\quad \\Omega = \\begin{pmatrix}\\mathbf{X} \\\\ \\Upsilon\\end{pmatrix} \n", + "$$ (dmdc-minimization)\n", + "\n", + "A solution is given as $\\begin{pmatrix}A & B \\end{pmatrix} = \\mathbf{Y} \\Omega^{+}$." 
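To make the DMDc regression concrete, here is a minimal sketch assuming only NumPy and a synthetic controlled linear system (`A_true`, `B_true`, and the random excitation inputs are made-up illustration data, not the cart measurements used below): it stacks the state and input snapshots into $\Omega$ and recovers $A$ and $B$ via the pseudoinverse.

```python
import numpy as np

# Synthetic controlled linear system x_{t+1} = A_true x_t + B_true u_t, used only for illustration.
rng = np.random.default_rng(1)
A_true = np.array([[0.95, 0.10],
                   [0.00, 0.90]])
B_true = np.array([[0.0],
                   [0.5]])

d, q, M = 2, 1, 200
X = np.empty((d, M))
Y = np.empty((d, M))
Upsilon = rng.normal(size=(q, M))   # random excitation inputs u^(m)

x = rng.normal(size=d)
for m in range(M):
    X[:, m] = x
    x = A_true @ x + B_true @ Upsilon[:, m]
    Y[:, m] = x

# Stack states and inputs into Omega and solve (A  B) = Y Omega^+ as in {eq}`dmdc-minimization`.
Omega = np.vstack([X, Upsilon])     # shape (d + q, M)
AB = Y @ np.linalg.pinv(Omega)      # shape (d, d + q)
A_hat, B_hat = AB[:, :d], AB[:, d:]

# For noise-free data and persistently exciting inputs, A_true and B_true are recovered.
print(np.allclose(A_hat, A_true), np.allclose(B_hat, B_true))
```

The cells that follow fit `pykoopman`'s `DMDc` regressor on the cart data, which solves this same least-squares problem.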
+ ] + }, + { + "cell_type": "markdown", + "id": "e2ec06ac-e9e8-4918-9ea8-8107a16f7104", + "metadata": {}, + "source": [ + "#### Cart " ] }, { @@ -435,54 +694,48 @@ "metadata": {}, "outputs": [], "source": [ - "X = np.concatenate([obs[:-1] for obs in observations])\n", - "U = np.concatenate(actions)" + "X = np.concatenate([obs[:-1] for obs in cart_observations])\n", + "U = np.concatenate(cart_actions)" ] }, { "cell_type": "code", "execution_count": null, - "id": "c6774097-e963-4402-aa06-4e2b3594c179", + "id": "4f0884d8-de8d-4129-a91a-ddc0342d6a52", "metadata": {}, "outputs": [], "source": [ - "EDMDc = pk.regression.EDMDc()\n", - "centers = np.random.uniform(-1.5, 1.5, (4, 4))\n", - "RBF = pk.observables.RadialBasisFunction(\n", - " rbf_type=\"thinplate\",\n", - " n_centers=centers.shape[1],\n", - " centers=centers,\n", - " kernel_width=1,\n", - " polyharmonic_coeff=1.0,\n", - " include_state=True,\n", - ")\n", - "\n", - "model = pk.Koopman(observables=RBF, regressor=EDMDc)\n", - "model.fit(X, u=U, dt=env.dt)" + "DMDc = pk.regression.DMDc()\n", + "model = pk.Koopman(regressor=DMDc)\n", + "model.fit(X, u=U, dt=cart_env.dt)" + ] + }, + { + "cell_type": "markdown", + "id": "ef41acbf-7e9e-45b5-b366-0e041c51219d", + "metadata": {}, + "source": [ + "Once we fit the model we can access the linear state-space models matrices:" ] }, { "cell_type": "code", "execution_count": null, - "id": "1eebdf2c-17a5-4dab-85bf-0a1ad83ddefd", + "id": "c77d8a23-4f5a-4154-bbf9-446c274d893f", "metadata": {}, "outputs": [], "source": [ - "fig, axes = plt.subplots(2, 2, figsize=(10, 10))\n", - "axes = axes.ravel()\n", - "\n", - "axes[0].imshow(model.A, aspect=\"auto\", cmap=plt.get_cmap(\"magma\"))\n", - "axes[0].set(title=\"A\")\n", - "\n", - "axes[1].imshow(model.B, aspect=\"1\", cmap=plt.get_cmap(\"magma\"))\n", - "axes[1].set(title=\"B\")\n", - "\n", - "axes[2].imshow(model.C, aspect=\"auto\", cmap=plt.get_cmap(\"magma\"))\n", - "axes[2].set(title=\"C\")\n", - "\n", - "axes[3].imshow(np.real(model.W), aspect=\"auto\", cmap=plt.get_cmap(\"magma\"))\n", - "axes[3].set(title=r\"$\\mathcal{Re}(W)$\")\n", - "fig.tight_layout()" + "display_array(\"A\", model.A)\n", + "display_array(\"B\", model.B)\n", + "display_array(\"C\", model.C)" + ] + }, + { + "cell_type": "markdown", + "id": "b488a1c9-cc85-44cc-a3fc-8d61aaef119a", + "metadata": {}, + "source": [ + "After that we can use the model to simulate the system using the remaining data" ] }, { @@ -504,18 +757,19 @@ "outputs": [], "source": [ "t = np.arange(0, len(U))\n", - "fig, axs = plt.subplots(3, 1, sharex=True, tight_layout=True, figsize=(9, 6))\n", + "fig, axs = plt.subplots(3, 1, sharex=True, figsize=(16, 8))\n", "axs[0].plot(t, U, \"-k\")\n", "axs[0].set(ylabel=r\"$u$\")\n", "axs[1].plot(t, X[:, 0], \"-\", color=\"b\", label=\"True\")\n", - "axs[1].plot(t, Xkoop[:, 0], \"--r\", label=\"EDMDc\")\n", + "axs[1].plot(t, Xkoop[:, 0], \"--r\", label=\"DMDc\")\n", "axs[1].set(ylabel=r\"$x_1$\")\n", "axs[2].plot(t, X[:, 1], \"-\", color=\"b\", label=\"True\")\n", - "axs[2].plot(t, Xkoop[:, 1], \"--r\", label=\"EDMDc\")\n", + "axs[2].plot(t, Xkoop[:, 1], \"--r\", label=\"DMDc\")\n", "axs[2].set(ylabel=r\"$x_2$\", xlabel=r\"$t$\")\n", "axs[1].legend(loc=\"best\")\n", - "axs[1].set_ylim([-2.5, 2.5])\n", - "axs[2].set_ylim([-2.5, 2.5]);" + "axs[1].set_ylim([-8, 8])\n", + "axs[2].set_ylim([-8, 8])\n", + "fig.tight_layout();" ] }, { @@ -523,7 +777,7 @@ "id": "b827634c-9cf7-47e9-bc54-393190770c2d", "metadata": {}, "source": [ - "## Eigenvalues" + "#### Eigenvalues" ] }, { @@ 
-534,19 +788,70 @@ "outputs": [], "source": [ "eigval, eigvec = np.linalg.eig(model.A)\n", + "display_array(\"Eigenvalues\", eigval)\n", "fig, ax = plt.subplots()\n", "ax.plot(np.real(eigval), np.imag(eigval), \"o\", color=\"lightgrey\", label=\"DMDc\")\n", "ax.set(title=\"Eigenvalues\")\n", - "ax.legend()" + "ax.legend();" + ] + }, + { + "cell_type": "markdown", + "id": "5d0f5a15-4544-4001-8a59-d280bc21f929", + "metadata": {}, + "source": [ + ":::{exercise} Inverted Pendulum DMDc\n", + ":label: inverted-pendulum-dmdc\n", + "\n", + "Use the DMDc method to fit a model on the data collected from the inverted pendulum environment.\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "391a6e1f-86c5-4511-b026-011f2385279e", + "metadata": {}, + "source": [ + ":::{solution} inverted-pendulum-dmdc\n", + ":::" ] }, { "cell_type": "code", "execution_count": null, - "id": "91412b8d-4973-4584-bf58-ea43bbd98efe", + "id": "06b83c62-8f91-4006-bfc7-9ec691e4e995", "metadata": {}, "outputs": [], - "source": [] + "source": [ + "# Your solution here" + ] + }, + { + "cell_type": "markdown", + "id": "3830d079-29d8-45ea-ad1b-c40d288ec7a7", + "metadata": {}, + "source": [ + ":::{solution} inverted-pendulum-dmdc\n", + ":class: dropdown\n", + "**Work in Progress**\n", + "\n", + "```{code-cell}\n", + "EDMDc = pk.regression.EDMDc()\n", + "centers = np.random.uniform(-1.5, 1.5, (4, 4))\n", + "RBF = pk.observables.RadialBasisFunction(\n", + " rbf_type=\"thinplate\",\n", + " n_centers=centers.shape[1],\n", + " centers=centers,\n", + " kernel_width=1,\n", + " polyharmonic_coeff=1.0,\n", + " include_state=True,\n", + ")\n", + "\n", + "model = pk.Koopman(observables=RBF, regressor=EDMDc)\n", + "model.fit(X, u=U, dt=env.dt)\n", + "```\n", + ":::" + ] } ], "metadata": { @@ -569,7 +874,7 @@ }, "scenes_data": { "active_scene": "Default Scene", - "init_scene": "", + "init_scene": "Default Scene", "scenes": [ "Default Scene" ] diff --git a/pyproject.toml b/pyproject.toml index f5eabf9..97d48d5 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -15,7 +15,7 @@ license = "CC-BY-SA 4.0" readme = "README.md" homepage = "https://github.com/aai-institute/tfl-training-ml-control" classifiers = [ - "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.10", ] exclude = ["test/*"] packages = [ @@ -27,7 +27,7 @@ include=["src/training_ml_control/assets"] python = "~3.10" ipywidgets = "^8.1.1" notebook = "^7.1.1" -jupyterlab = "^4.1.3" +jupyterlab = "^4.2.0" jupyterlab-myst = "^2.4.2" jupyterlab-lsp = "^5.1.0" jupyterlab-scenes = "^2.0.0" @@ -60,7 +60,7 @@ pytest-cov = "*" # Source https://github.com/psf/black#configuration-format include = "\\.pyi?$" line-length = 88 -target-version = ["py311"] +target-version = ["py310"] # Black-compatible settings for isort # See https://black.readthedocs.io/en/stable/compatible_configs.html diff --git a/src/training_ml_control/control.py b/src/training_ml_control/control.py index f4703ca..34f6e87 100644 --- a/src/training_ml_control/control.py +++ b/src/training_ml_control/control.py @@ -68,6 +68,8 @@ def build_mpc_controller( x_limits: NDArray, u_limits: NDArray, force_penalty: float, + *, + uncertainty_values: dict[str, NDArray] | None = None, ) -> MPC: mpc = MPC(model) mpc_params = { @@ -90,5 +92,8 @@ def build_mpc_controller( # lower and upper bounds of the input mpc.bounds["lower", "_u", "force"] = u_limits[0] mpc.bounds["upper", "_u", "force"] = u_limits[1] + # Parameter uncertainty + if uncertainty_values is not None: + 
mpc.set_uncertainty_values(**uncertainty_values) mpc.setup() return mpc diff --git a/src/training_ml_control/environments/grid_world.py b/src/training_ml_control/environments/grid_world.py index 02cf922..bea5751 100644 --- a/src/training_ml_control/environments/grid_world.py +++ b/src/training_ml_control/environments/grid_world.py @@ -303,13 +303,21 @@ def plot_grid_graph(G: nx.Graph, *, show_start_to_target_paths: bool = False) -> F.add_nodes_from((n, deepcopy(d)) for n, d in G.nodes.items()) start_node = G.start_node target_node = G.target_node - while True: - for path in nx.all_simple_paths(G, source=start_node, target=target_node): - n1, n2 = path[0], path[1] - for n1, n2 in itertools.pairwise(path): - if (n1, n2) in F.edges or (n2, n1) in F.edges: - continue - F.add_edge(n1, n2, weight=1) + for path in nx.all_simple_paths( + G.to_undirected(), source=start_node, target=target_node, cutoff=len(G) + ): + for n1, n2 in itertools.pairwise(path): + """ + if (n1, n2) in F.edges or (n2, n1) in F.edges: + continue + """ + F.add_edge(n1, n2, weight=1) + """ + F = nx.DiGraph() + for node, next_nodes in nx.bfs_successors(G.to_undirected(), G.start_node): + for next_node in next_nodes: + F.add_edge(node, next_node, weight=1) + """ else: F = G.copy().to_undirected() plt.figure(figsize=(12, 12)) @@ -383,7 +391,7 @@ def plot_grid_all_paths_graph(G: nx.DiGraph, *, show_solution: bool = False) -> for node in F.nodes: node_labels[node] = node[-1] - fig, ax = plt.subplots(figsize=(10, 10)) + plt.subplots(figsize=(10, 10)) nx.draw_networkx_nodes(F, pos, **node_options) nx.draw_networkx_labels(F, pos, font_size=8, labels=node_labels) diff --git a/src/training_ml_control/model.py b/src/training_ml_control/model.py index 5575cb8..45f6c1d 100644 --- a/src/training_ml_control/model.py +++ b/src/training_ml_control/model.py @@ -29,7 +29,7 @@ def build_cart_model(env: CartEnv) -> LinearModel: model = LinearModel("continuous") pos = model.set_variable(var_type="_x", var_name="position") dpos = model.set_variable(var_type="_x", var_name="velocity") - model.set_variable(var_type="_u", var_name="force") # Inertia + model.set_variable(var_type="_u", var_name="force") # Energy E_kin = 0.5 * dpos**2 model.set_expression("E_kinetic", E_kin) @@ -77,7 +77,9 @@ def build_inverted_pendulum_linear_model(env: InvertedPendulumEnv) -> LinearMode return model -def build_inverted_pendulum_nonlinear_model(env: InvertedPendulumEnv) -> Model: +def build_inverted_pendulum_nonlinear_model( + env: InvertedPendulumEnv, *, with_uncertainty: bool = False +) -> Model: g, l, m_p, m_c = env.gravity, env.length, env.masspole, env.masscart model = Model("continuous") @@ -86,6 +88,9 @@ def build_inverted_pendulum_nonlinear_model(env: InvertedPendulumEnv) -> Model: theta = model.set_variable(var_type="_x", var_name="theta") dtheta = model.set_variable(var_type="_x", var_name="dtheta") force = model.set_variable(var_type="_u", var_name="force") + if with_uncertainty: + # Uncertain parameters + m_p = model.set_variable("_p", "m_p") total_mass = m_c + m_p half_length = l / 2 diff --git a/src/training_ml_control/shortest_path_problem.py b/src/training_ml_control/shortest_path_problem.py index 02d3db2..50e569b 100644 --- a/src/training_ml_control/shortest_path_problem.py +++ b/src/training_ml_control/shortest_path_problem.py @@ -1,15 +1,82 @@ import itertools +from copy import deepcopy import matplotlib.pyplot as plt import networkx as nx __all__ = [ + "plot_optimality_principle_graph", "create_shortest_path_graph", "plot_shortest_path_graph", 
"plot_all_paths_graph", ] +def plot_optimality_principle_graph( + n_stages: int = 5, n_nodes_per_stage: int = 3 +) -> nx.DiGraph: + """Plots optimality principle graph.""" + G = nx.DiGraph() + G.add_node("initial_state") + G.add_node("final_state") + + previous_stage_nodes = ["initial_state"] + next_stage_nodes = [f"stage_0_node_{i}" for i in range(n_nodes_per_stage)] + + for stage in range(1, n_stages): + for previous_node in previous_stage_nodes: + for next_node in next_stage_nodes: + G.add_edge(previous_node, next_node) + previous_stage_nodes = next_stage_nodes + next_stage_nodes = [f"stage_{stage}_node_{i}" for i in range(n_nodes_per_stage)] + for previous_node in previous_stage_nodes: + G.add_edge(previous_node, "final_state") + + for layer, nodes in enumerate(nx.topological_generations(G)): + # `multipartite_layout` expects the layer as a node attribute, so add the + # numeric layer value as a node attribute + for node in nodes: + G.nodes[node]["layer"] = layer + + shortest_path = nx.shortest_path(G, source="initial_state", target="final_state") + shortest_path_edges = list(itertools.pairwise(shortest_path)) + + options = { + "node_size": 1000, + "edgecolors": "black", + "linewidths": 3, + } + + node_color = [] + for node in G.nodes: + if node == "initial_state": + node_color.append("lightgreen") + elif node == "final_state": + node_color.append("xkcd:light red") + elif node in shortest_path: + node_color.append("lightblue") + else: + node_color.append("white") + options["node_color"] = node_color + # Compute the multipartite_layout using the "layer" node attribute + pos = nx.multipartite_layout(G, subset_key="layer", scale=2, align="vertical") + plt.figure(figsize=(14, 8)) + nx.draw_networkx_nodes(G, pos, **options) + nx.draw_networkx_edges( + G, + pos, + edgelist=shortest_path_edges, + edge_color="red", + width=5, + ) + other_edges = [edge for edge in G.edges if edge not in shortest_path_edges] + nx.draw_networkx_edges(G, pos, edgelist=other_edges, edge_color="gray", width=1) + ax = plt.gca() + ax.margins(0.20) + plt.axis("off") + plt.show() + + def create_shortest_path_graph() -> nx.DiGraph: """Create shortest-path problem graph.""" G = nx.DiGraph()