kb autocommit
Jemoka committed Nov 16, 2023
1 parent 116008f commit aa6fa71
Showing 7 changed files with 118 additions and 11 deletions.
91 changes: 90 additions & 1 deletion content/posts/KBhalpha_vector.md
@@ -4,12 +4,101 @@ author = ["Houjun Liu"]
draft = false
+++

Recall, from [conditional plan evaluation]({{< relref "KBhconditional_plan.md#conditional-plan--kbhconditional-plan-dot-md--evaluation" >}}), we had that:

\begin{equation}
U^{\pi}(b) = \sum\_{s}^{} b(s) U^{\pi}(s)
\end{equation}

let's write it as:

\begin{equation}
U^{\pi}(b) = \sum\_{s}^{} b(s) U^{\pi}(s) = {\alpha\_{\pi}}^{\top} b
\end{equation}

where \\(\alpha\_{\pi}\\) is the [utility of the policy]({{< relref "KBhpolicy_evaluation.md" >}}) at each actual state.
where \\(U^{\pi}(s)\\) is the [conditional plan evaluation]({{< relref "KBhconditional_plan.md#conditional-plan--kbhconditional-plan-dot-md--evaluation" >}}) starting at each of the initial states.

\begin{equation}
\alpha\_{\pi} = \qty[ U^{\pi}(s\_1), U^{\pi}(s\_2) ]
\end{equation}
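
For instance, here is a minimal numeric sketch (hypothetical two-state values) of treating \\(U^{\pi}(b)\\) as a dot product with \\(\alpha\_{\pi}\\):

```python
# a minimal sketch with made-up numbers: the utility of a belief under a
# conditional plan pi is the dot product of the belief with alpha_pi
import numpy as np

alpha_pi = np.array([4.0, -2.0])  # assumed per-state utilities U^pi(s_1), U^pi(s_2)
b = np.array([0.7, 0.3])          # belief over the two states (sums to 1)

U_pi_b = alpha_pi @ b             # U^pi(b) = alpha_pi^T b
print(U_pi_b)                     # 0.7*4.0 + 0.3*(-2.0) = 2.2
```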

You will notice, then, that the [utility]({{< relref "KBhutility_theory.md" >}}) of \\(b\\) is linear in \\(b\\), with a different line \\(\alpha\_{\pi}\\) for each policy:

{{< figure src="/ox-hugo/2023-11-16_09-23-10_screenshot.png" >}}

At every belief \\(b\\), there is a policy which has the highest \\(U(b)\\) at that \\(b\\), given by the [alpha vector]({{< relref "KBhalpha_vector.md" >}}) formulation.


## Additional Information {#additional-information}


### top action {#top-action}

you can just represent a [policy]({{< relref "KBhpolicy.md" >}}) out of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s by taking the top (root) action of the [conditional plan]({{< relref "KBhconditional_plan.md" >}}) whose [alpha vector]({{< relref "KBhalpha_vector.md" >}}) is on top (i.e. highest) at the current belief.
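
A minimal sketch of this representation, assuming each [alpha vector]({{< relref "KBhalpha_vector.md" >}}) is stored alongside the root action of the [conditional plan]({{< relref "KBhconditional_plan.md" >}}) it was computed from (all names and numbers are illustrative):

```python
import numpy as np

def top_action(b, alphas, root_actions):
    """Return the root action of the plan whose alpha vector is best at b."""
    utilities = [alpha @ b for alpha in alphas]
    return root_actions[int(np.argmax(utilities))]

# hypothetical alpha vectors and the root actions of their plans
alphas = [np.array([4.0, -2.0]), np.array([1.0, 3.0])]
root_actions = ["listen", "open-left"]
print(top_action(np.array([0.7, 0.3]), alphas, root_actions))  # "listen"
```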


### [optimal value function for POMDP]({{< relref "KBhconditional_plan.md#for" >}}) with [alpha vector]({{< relref "KBhalpha_vector.md" >}}) {#optimal-value-function-for-pomdp--kbhconditional-plan-dot-md--with-alpha-vector--kbhalpha-vector-dot-md}

Recall:

\begin{equation}
U^{\*}(b) = \max\_{\pi} U^{\pi}(b) = \max\_{\pi} \alpha\_{\pi}^{\top}b
\end{equation}

NOTE! This function (look at the chart above from \\(b\\) to \\(U\\)) is:

1. piecewise linear
2. convex (because we always take the "best" (highest) line, the function curves upward everywhere); a small numeric sketch follows this list
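
Here is that upper envelope on made-up alpha vectors for a two-state problem: for each belief we simply take the best line, which is exactly what makes the result piecewise linear and convex.

```python
import numpy as np

Gamma = [np.array([4.0, -2.0]), np.array([1.0, 3.0]), np.array([2.5, 2.5])]

def U_star(b):
    # U*(b) = max over alpha in Gamma of alpha^T b: the upper envelope
    return max(alpha @ b for alpha in Gamma)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, U_star(np.array([p, 1 - p])))
```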


### one-step lookahead in POMDP {#one-step-lookahead-in-pomdp}

Say you want to extract a [policy]({{< relref "KBhpolicy.md" >}}) out of a bunch of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s.

Let \\(\alpha \in \Gamma\\), a set of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s.

\begin{equation}
\pi^{\Gamma}(b) = \arg\max\_{a}\qty[R(b,a)+\gamma \qty(\sum\_{o}^{}P(o|b,a) U^{\Gamma}(update(b,a,o)))]
\end{equation}

where:

\begin{equation}
R(b,a) = \sum\_{s}^{} R(s,a)b(s)
\end{equation}

\begin{align}
P(o|b,a) &= \sum\_{s}^{} p(o|s,a) b(s) \\\\
&= \sum\_{s}^{} \sum\_{s'}^{} T(s'|s,a) O(o|s',a) b(s)
\end{align}
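
A sketch of that lookahead under assumed array layouts (\\(T[a][s, s']\\), \\(O[a][s', o]\\), \\(R[s, a]\\)) and a tiny made-up problem; `belief_update` is the usual Bayesian update written out for reference:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """update(b, a, o): Bayes' rule over the next state."""
    bp = O[a][:, o] * (T[a].T @ b)   # O(o|s',a) * sum_s T(s'|s,a) b(s)
    return bp / bp.sum()

def U_Gamma(b, Gamma):
    return max(alpha @ b for alpha in Gamma)

def one_step_lookahead(b, Gamma, T, O, R, gamma):
    best_a, best_val = None, -np.inf
    for a in range(R.shape[1]):
        val = R[:, a] @ b                        # R(b,a) = sum_s R(s,a) b(s)
        for o in range(O[a].shape[1]):
            p_o = b @ T[a] @ O[a][:, o]          # P(o|b,a)
            if p_o > 0:
                val += gamma * p_o * U_Gamma(belief_update(b, a, o, T, O), Gamma)
        if val > best_val:
            best_a, best_val = a, val
    return best_a

# hypothetical 2-state / 2-action / 2-observation problem
T = [np.array([[0.9, 0.1], [0.1, 0.9]]), np.eye(2)]
O = [np.array([[0.8, 0.2], [0.3, 0.7]]), np.full((2, 2), 0.5)]
R = np.array([[-1.0, 5.0], [-1.0, -5.0]])
Gamma = [np.array([4.0, -2.0]), np.array([1.0, 3.0])]
print(one_step_lookahead(np.array([0.7, 0.3]), Gamma, T, O, R, gamma=0.9))
```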


### [alpha vector]({{< relref "KBhalpha_vector.md" >}}) pruning {#alpha-vector--kbhalpha-vector-dot-md--pruning}

Say we had as set of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s \\(\Gamma\\):

{{< figure src="/ox-hugo/2023-11-16_09-40-27_screenshot.png" >}}

\\(\alpha\_{3}\\) isn't all that useful here. So we ask:

"Is alpha dominated by some \\(\alpha\_{i}\\) everywhere?"

We formulate this question in terms of a linear program:

\begin{equation}
\max\_{\delta, b} \delta
\end{equation}

where \\(\delta\\) is the gap between \\(\alpha\\) and the [utility]({{< relref "KBhutility_theory.md" >}}) of the best alternative [alpha vector]({{< relref "KBhalpha_vector.md" >}}) at that belief,

subject to:

\begin{align}
&1^{\top} b = 1\ \text{(b adds up to 1)} \\\\
& b\geq 0 \\\\
& \alpha^{\top} b \geq \alpha'^{\top} b + \delta, \forall \alpha' \in \Gamma
\end{align}

if \\(\delta < 0\\), then we can prune \\(\alpha\\) because it is dominated everywhere.
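
A sketch of that linear program using scipy (assumed available); the decision variables are the belief entries plus \\(\delta\\), and we maximize \\(\delta\\) by minimizing \\(-\delta\\):

```python
import numpy as np
from scipy.optimize import linprog

def is_dominated(alpha, others):
    """True if alpha can be pruned: at no belief does it beat every
    alpha' in `others` by a positive margin delta."""
    n = len(alpha)
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # maximize delta
    # alpha^T b >= alpha'^T b + delta  <=>  (alpha' - alpha)^T b + delta <= 0
    A_ub = np.array([np.append(ap - alpha, 1.0) for ap in others])
    b_ub = np.zeros(len(others))
    A_eq = np.array([np.append(np.ones(n), 0.0)])  # 1^T b = 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n + [(None, None)]         # b >= 0, delta free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1] < 0                           # delta < 0: dominated

alphas = [np.array([4.0, -2.0]), np.array([1.0, 3.0]), np.array([2.0, 0.0])]
print(is_dominated(alphas[2], alphas[:2]))  # True: the third vector is never best
```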

if each value on the top of the set
4 changes: 2 additions & 2 deletions content/posts/KBhbelief.md
@@ -42,6 +42,6 @@ there is some model which is a probability distribution over the state given obs
let orange \\(d\\) be state, the green would be the [error model](#error-model)


### [filters]({{< relref "KBhfilters.md#filter" >}}) {#filters--kbhfilters-dot-md}
### [filters]({{< relref "KBhfilters.md" >}}) {#filters--kbhfilters-dot-md}

[filters]({{< relref "KBhfilters.md#filter" >}}) are how [belief]({{< relref "KBhbelief.md" >}})s are updated from observation. "we want to perform localization"
[filters]({{< relref "KBhfilters.md" >}}) are how [belief]({{< relref "KBhbelief.md" >}})s are updated from observation. "we want to perform localization"
2 changes: 1 addition & 1 deletion content/posts/KBhconditional_plan.md
@@ -60,4 +60,4 @@ Of course, trying to actually do this is impossible because you have to iterate

This is practically untenable, because the space of \\(\pi\\) is wayyy too big. Hence, we turn to [alpha vector]({{< relref "KBhalpha_vector.md" >}})s.

See also [optimal value function for POMDP with alpha vector]({{< relref "KBhalpha_vector.md#optimal-value-function-for-pomdp--org8a609fc--with-alpha-vector--kbhalpha-vector-dot-md" >}})
See also [optimal value function for POMDP with alpha vector]({{< relref "KBhalpha_vector.md#id-a2dee193-65b1-47ed-8dbc-aa362b28b451-optimal-value-function-for-pomdp-with-id-a11af4cf-7e36-4b3f-876f-e6a26cf6817e-alpha-vector" >}})
4 changes: 2 additions & 2 deletions content/posts/KBhfilters.md
@@ -119,12 +119,12 @@ K \leftarrow \Sigma\_{p} O\_{s}^{T} (O\_{s}\Sigma\_{p}O\_{s}^{T}+\Sigma\_{O})^{-
### Additional Information {#additional-information}


#### Extended [Kalman Filter](#kalman-filter) {#extended-kalman-filter--org669cd69}
#### Extended [Kalman Filter](#kalman-filter) {#extended-kalman-filter--orgc52f45b}

[Kalman Filter](#kalman-filter), but without the linearity assumption: nonlinear dynamics are handled by linearizing through the Jacobian


#### Unscented [Kalman Filter](#kalman-filter) {#unscented-kalman-filter--org669cd69}
#### Unscented [Kalman Filter](#kalman-filter) {#unscented-kalman-filter--orgc52f45b}

?

26 changes: 22 additions & 4 deletions content/posts/KBhpartially_observable_markov_decision_process.md
@@ -16,13 +16,31 @@ draft = false

## policy representations {#policy-representations}

"what do we do knowing where we are (ish, as its a distribution?)?"
"how do we use a policy"

- [belief-state MDP]({{< relref "KBhbelief_state_mdp.md" >}})
- [conditional plan]({{< relref "KBhconditional_plan.md" >}})
- [alpha vector]({{< relref "KBhalpha_vector.md" >}})
- [alpha vector]({{< relref "KBhalpha_vector.md" >}}) + [one-step lookahead in POMDP]({{< relref "KBhalpha_vector.md#one-step-lookahead-in-pomdp" >}})
- [alpha vector]({{< relref "KBhalpha_vector.md" >}}) + just take the top action of the conditional plan the alpha-vector was computed from


## policy evaluations {#policy-evaluations}

[conditional plan evaluation]({{< relref "KBhconditional_plan.md#conditional-plan--kbhconditional-plan-dot-md--evaluation" >}})


## policy solutions {#policy-solutions}

"how do we make that better?"
"how do we make that policy better?"


### exact solutions {#exact-solutions}

- [optimal value function for POMDP]({{< relref "KBhconditional_plan.md#for-pomdp--kbhpartially-observable-markov-decision-process-dot-md" >}})
- [POMDP value-iteration]({{< relref "KBhvalue_iteration.md#pomdp--kbhpartially-observable-markov-decision-process-dot-md--value-iteration" >}})


### approximate solutions {#approximate-solutions}

- estimate an [alpha vector]({{< relref "KBhalpha_vector.md" >}}), and then use a policy representation:
- [QMDP]({{< relref "KBhqmdp.md" >}})
- [FIB]({{< relref "KBhfast_informed_bound.md" >}})
2 changes: 1 addition & 1 deletion content/posts/KBhvalue_iteration.md
@@ -69,6 +69,6 @@ where \\(S\\) is the number of states and \\(A\\) the number of actions.
## [POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}}) value-iteration {#pomdp--kbhpartially-observable-markov-decision-process-dot-md--value-iteration}

1. compute [alpha vector]({{< relref "KBhalpha_vector.md" >}})s for all one-step plans (i.e. [conditional plan]({{< relref "KBhconditional_plan.md" >}})s that do just one action and give up); a sketch of steps 1 and 3 follows this list
2. [alpha vector pruning]({{< relref "KBhalpha_vector.md#alpha-vector--kbhalpha-vector-dot-md--pruning" >}}) on any plans that are dominated
2. [alpha vector pruning]({{< relref "KBhalpha_vector.md#id-a11af4cf-7e36-4b3f-876f-e6a26cf6817e-alpha-vector-pruning" >}}) on any plans that are dominated
3. generate all possible two-step [conditional plan]({{< relref "KBhconditional_plan.md" >}})s over all actions, using combinations of non-pruned one-step plans above as **SUBPLANS** (yes, you can use a one-step plan twice)
4. repeat steps 2-3
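
A sketch of steps 1 and 3 under assumed array layouts (\\(T[a][s, s']\\), \\(O[a][s', o]\\), \\(R[s, a]\\)), using the standard recursion for a [conditional plan]({{< relref "KBhconditional_plan.md" >}})'s [alpha vector]({{< relref "KBhalpha_vector.md" >}}): take the root action, then follow the subplan selected by the observation.

```python
import numpy as np

def plan_alpha(a, sub_alphas, T, O, R, gamma):
    """alpha(s) = R(s,a) + gamma * sum_{s'} T(s'|s,a) * sum_o O(o|s',a) * sub_alphas[o](s')"""
    future = sum(O[a][:, o] * sub_alphas[o] for o in range(O[a].shape[1]))  # over s'
    return R[:, a] + gamma * (T[a] @ future)

# hypothetical 2-state / 2-action / 2-observation problem
R = np.array([[-1.0, 5.0], [-1.0, -5.0]])
T = [np.array([[0.9, 0.1], [0.1, 0.9]]), np.eye(2)]
O = [np.array([[0.8, 0.2], [0.3, 0.7]]), np.full((2, 2), 0.5)]

# step 1: a one-step plan just collects the immediate reward of its action
one_step = [R[:, a] for a in range(R.shape[1])]

# step 3: a two-step plan with root action 0 whose subplans are one-step plans
print(plan_alpha(0, [one_step[0], one_step[1]], T, O, R, gamma=0.9))
```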
Binary file added static/ox-hugo/2023-11-16_09-40-27_screenshot.png
