kb autocommit
Jemoka committed Nov 16, 2023
1 parent 116008f commit aa6fa71
Showing 7 changed files with 118 additions and 11 deletions.
91 changes: 90 additions & 1 deletion content/posts/KBhalpha_vector.md
@@ -4,12 +4,101 @@ author = ["Houjun Liu"]
draft = false
+++

Recall, from [conditional plan evaluation]({{< relref "KBhconditional_plan.md#conditional-plan--kbhconditional-plan-dot-md--evaluation" >}}), we had that:

\begin{equation}
U^{\pi}(b) = \sum\_{s}^{} b(s) U^{\pi}(s)
\end{equation}

let's write it as:

\begin{equation}
U^{\pi}(b) = \sum\_{s}^{} b(s) U^{\pi}(s) = {\alpha\_{\pi}}^{\top} b
\end{equation}

where \\(\alpha\_{\pi}\\) is the [utility of the policy]({{< relref "KBhpolicy_evaluation.md" >}}) at each actual state.
where \\(U^{\pi}(s)\\) is the [conditional plan evaluation]({{< relref "KBhconditional_plan.md#conditional-plan--kbhconditional-plan-dot-md--evaluation" >}}) starting at each of the initial states.

\begin{equation}
\alpha\_{\pi} = \qty[ U^{\pi}(s\_1), U^{\pi}(s\_2) ]
\end{equation}
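
For instance, here is a minimal numeric sketch (hypothetical two-state values) of treating \\(U^{\pi}(b)\\) as a dot product with \\(\alpha\_{\pi}\\):

```python
# a minimal sketch with made-up numbers: the utility of a belief under a
# conditional plan pi is the dot product of the belief with alpha_pi
import numpy as np

alpha_pi = np.array([4.0, -2.0])  # assumed per-state utilities U^pi(s_1), U^pi(s_2)
b = np.array([0.7, 0.3])          # belief over the two states (sums to 1)

U_pi_b = alpha_pi @ b             # U^pi(b) = alpha_pi^T b
print(U_pi_b)                     # 0.7*4.0 + 0.3*(-2.0) = 2.2
```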

You will notice, then, that the [utility]({{< relref "KBhutility_theory.md" >}}) of \\(b\\) is linear in \\(b\\), with a different line \\(\alpha\_{\pi}\\) for each policy:

{{< figure src="/ox-hugo/2023-11-16_09-23-10_screenshot.png" >}}

At every belief \\(b\\), there is a policy which has the highest \\(U(b)\\) at that \\(b\\), given by the [alpha vector]({{< relref "KBhalpha_vector.md" >}}) formulation.


## Additional Information {#additional-information}


### top action {#top-action}

you can just represent a [policy]({{< relref "KBhpolicy.md" >}}) out of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s by taking the top (root) action of the [conditional plan]({{< relref "KBhconditional_plan.md" >}}) whose [alpha vector]({{< relref "KBhalpha_vector.md" >}}) is on top (i.e. highest) at the current belief.
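
A minimal sketch of this representation, assuming each [alpha vector]({{< relref "KBhalpha_vector.md" >}}) is stored alongside the root action of the [conditional plan]({{< relref "KBhconditional_plan.md" >}}) it was computed from (all names and numbers are illustrative):

```python
import numpy as np

def top_action(b, alphas, root_actions):
    """Return the root action of the plan whose alpha vector is best at b."""
    utilities = [alpha @ b for alpha in alphas]
    return root_actions[int(np.argmax(utilities))]

# hypothetical alpha vectors and the root actions of their plans
alphas = [np.array([4.0, -2.0]), np.array([1.0, 3.0])]
root_actions = ["listen", "open-left"]
print(top_action(np.array([0.7, 0.3]), alphas, root_actions))  # "listen"
```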


### [optimal value function for POMDP]({{< relref "KBhconditional_plan.md#for" >}}) with [alpha vector]({{< relref "KBhalpha_vector.md" >}}) {#optimal-value-function-for-pomdp--kbhconditional-plan-dot-md--with-alpha-vector--kbhalpha-vector-dot-md}

Recall:

\begin{equation}
U^{\*}(b) = \max\_{\pi} U^{\pi}(b) = \max\_{\pi} \alpha\_{\pi}^{\top}b
\end{equation}

NOTE! This function (look at the chart above from \\(b\\) to \\(U\\)) is:

1. piecewise linear
2. convex (because we always take the "best" (highest) line, the function curves upward everywhere); a small numeric sketch follows this list
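
Here is that upper envelope on made-up alpha vectors for a two-state problem: for each belief we simply take the best line, which is exactly what makes the result piecewise linear and convex.

```python
import numpy as np

Gamma = [np.array([4.0, -2.0]), np.array([1.0, 3.0]), np.array([2.5, 2.5])]

def U_star(b):
    # U*(b) = max over alpha in Gamma of alpha^T b: the upper envelope
    return max(alpha @ b for alpha in Gamma)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, U_star(np.array([p, 1 - p])))
```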


### one-step lookahead in POMDP {#one-step-lookahead-in-pomdp}

Say you want to extract a [policy]({{< relref "KBhpolicy.md" >}}) out of a bunch of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s.

Let \\(\alpha \in \Gamma\\), a set of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s.

\begin{equation}
\pi^{\Gamma}(b) = \arg\max\_{a}\qty[R(b,a)+\gamma \qty(\sum\_{o}^{}P(o|b,a) U^{\Gamma}(update(b,a,o)))]
\end{equation}

where:

\begin{equation}
R(b,a) = \sum\_{s}^{} R(s,a)b(s)
\end{equation}

\begin{align}
P(o|b,a) &= \sum\_{s}^{} p(o|s,a) b(s) \\\\
&= \sum\_{s}^{} \sum\_{s'}^{} T(s'|s,a) O(o|s',a) b(s)
\end{align}
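
A sketch of that lookahead under assumed array layouts (\\(T[a][s, s']\\), \\(O[a][s', o]\\), \\(R[s, a]\\)) and a tiny made-up problem; `belief_update` is the usual Bayesian update written out for reference:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """update(b, a, o): Bayes' rule over the next state."""
    bp = O[a][:, o] * (T[a].T @ b)   # O(o|s',a) * sum_s T(s'|s,a) b(s)
    return bp / bp.sum()

def U_Gamma(b, Gamma):
    return max(alpha @ b for alpha in Gamma)

def one_step_lookahead(b, Gamma, T, O, R, gamma):
    best_a, best_val = None, -np.inf
    for a in range(R.shape[1]):
        val = R[:, a] @ b                        # R(b,a) = sum_s R(s,a) b(s)
        for o in range(O[a].shape[1]):
            p_o = b @ T[a] @ O[a][:, o]          # P(o|b,a)
            if p_o > 0:
                val += gamma * p_o * U_Gamma(belief_update(b, a, o, T, O), Gamma)
        if val > best_val:
            best_a, best_val = a, val
    return best_a

# hypothetical 2-state / 2-action / 2-observation problem
T = [np.array([[0.9, 0.1], [0.1, 0.9]]), np.eye(2)]
O = [np.array([[0.8, 0.2], [0.3, 0.7]]), np.full((2, 2), 0.5)]
R = np.array([[-1.0, 5.0], [-1.0, -5.0]])
Gamma = [np.array([4.0, -2.0]), np.array([1.0, 3.0])]
print(one_step_lookahead(np.array([0.7, 0.3]), Gamma, T, O, R, gamma=0.9))
```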


### [alpha vector]({{< relref "KBhalpha_vector.md" >}}) pruning {#alpha-vector--kbhalpha-vector-dot-md--pruning}

Say we had as set of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s \\(\Gamma\\):

{{< figure src="/ox-hugo/2023-11-16_09-40-27_screenshot.png" >}}

\\(\alpha\_{3}\\) isn't all that useful here. So we ask:

"Is alpha dominated by some \\(\alpha\_{i}\\) everywhere?"

We formulate this question in terms of a linear program:

\begin{equation}
\max\_{\delta, b} \delta
\end{equation}

where \\(\delta\\) is the gap between \\(\alpha\\) and the [utility]({{< relref "KBhutility_theory.md" >}}) of the best alternative [alpha vector]({{< relref "KBhalpha_vector.md" >}}) at that belief,

subject to:

\begin{align}
&1^{\top} b = 1\ \text{(b adds up to 1)} \\\\
& b\geq 0 \\\\
& \alpha^{\top} b \geq \alpha'^{\top} b + \delta, \forall \alpha' \in \Gamma
\end{align}

if \\(\delta < 0\\), then we can prune \\(\alpha\\) because it is dominated everywhere.
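
A sketch of that linear program using scipy (assumed available); the decision variables are the belief entries plus \\(\delta\\), and we maximize \\(\delta\\) by minimizing \\(-\delta\\):

```python
import numpy as np
from scipy.optimize import linprog

def is_dominated(alpha, others):
    """True if alpha can be pruned: at no belief does it beat every
    alpha' in `others` by a positive margin delta."""
    n = len(alpha)
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # maximize delta
    # alpha^T b >= alpha'^T b + delta  <=>  (alpha' - alpha)^T b + delta <= 0
    A_ub = np.array([np.append(ap - alpha, 1.0) for ap in others])
    b_ub = np.zeros(len(others))
    A_eq = np.array([np.append(np.ones(n), 0.0)])  # 1^T b = 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n + [(None, None)]         # b >= 0, delta free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1] < 0                           # delta < 0: dominated

alphas = [np.array([4.0, -2.0]), np.array([1.0, 3.0]), np.array([2.0, 0.0])]
print(is_dominated(alphas[2], alphas[:2]))  # True: the third vector is never best
```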

if each value on the top of the set
4 changes: 2 additions & 2 deletions content/posts/KBhbelief.md
@@ -42,6 +42,6 @@ there is some model which is a probability distribution over the state given obs
let orange \\(d\\) be state, the green would be the [error model](#error-model)


### [filters]({{< relref "KBhfilters.md#filter" >}}) {#filters--kbhfilters-dot-md}
### [filters]({{< relref "KBhfilters.md" >}}) {#filters--kbhfilters-dot-md}

[filters]({{< relref "KBhfilters.md#filter" >}}) are how [belief]({{< relref "KBhbelief.md" >}})s are updated from observation. "we want to perform localization"
[filters]({{< relref "KBhfilters.md" >}}) are how [belief]({{< relref "KBhbelief.md" >}})s are updated from observation. "we want to perform localization"
2 changes: 1 addition & 1 deletion content/posts/KBhconditional_plan.md
@@ -60,4 +60,4 @@ Of course, trying to actually do this is impossible because you have to iterate

This is practically untenable, because the space of \\(\pi\\) is wayyy too big. Hence, we turn to [alpha vector]({{< relref "KBhalpha_vector.md" >}})s.

See also [optimal value function for POMDP with alpha vector]({{< relref "KBhalpha_vector.md#optimal-value-function-for-pomdp--org8a609fc--with-alpha-vector--kbhalpha-vector-dot-md" >}})
See also [optimal value function for POMDP with alpha vector]({{< relref "KBhalpha_vector.md#id-a2dee193-65b1-47ed-8dbc-aa362b28b451-optimal-value-function-for-pomdp-with-id-a11af4cf-7e36-4b3f-876f-e6a26cf6817e-alpha-vector" >}})
4 changes: 2 additions & 2 deletions content/posts/KBhfilters.md
@@ -119,12 +119,12 @@ K \leftarrow \Sigma\_{p} O\_{s}^{T} (O\_{s}\Sigma\_{p}O\_{s}^{T}+\Sigma\_{O})^{-
### Additional Information {#additional-information}


#### Extended [Kalman Filter](#kalman-filter) {#extended-kalman-filter--org669cd69}
#### Extended [Kalman Filter](#kalman-filter) {#extended-kalman-filter--orgc52f45b}

[Kalman Filter](#kalman-filter), but without the linearity assumption: nonlinear dynamics are handled by linearizing through the Jacobian


#### Unscented [Kalman Filter](#kalman-filter) {#unscented-kalman-filter--org669cd69}
#### Unscented [Kalman Filter](#kalman-filter) {#unscented-kalman-filter--orgc52f45b}

?

26 changes: 22 additions & 4 deletions content/posts/KBhpartially_observable_markov_decision_process.md
@@ -16,13 +16,31 @@ draft = false

## policy representations {#policy-representations}

"what do we do knowing where we are (ish, as its a distribution?)?"
"how do we use a policy"

- [belief-state MDP]({{< relref "KBhbelief_state_mdp.md" >}})
- [conditional plan]({{< relref "KBhconditional_plan.md" >}})
- [alpha vector]({{< relref "KBhalpha_vector.md" >}})
- [alpha vector]({{< relref "KBhalpha_vector.md" >}}) + [one-step lookahead in POMDP]({{< relref "KBhalpha_vector.md#one-step-lookahead-in-pomdp" >}})
- [alpha vector]({{< relref "KBhalpha_vector.md" >}}) + just take the top action of the conditional plan the alpha-vector was computed from


## policy evaluations {#policy-evaluations}

[conditional plan evaluation]({{< relref "KBhconditional_plan.md#conditional-plan--kbhconditional-plan-dot-md--evaluation" >}})


## policy solutions {#policy-solutions}

"how do we make that better?"
"how do we make that policy better?"


### exact solutions {#exact-solutions}

- [optimal value function for POMDP]({{< relref "KBhconditional_plan.md#for-pomdp--kbhpartially-observable-markov-decision-process-dot-md" >}})
- [POMDP value-iteration]({{< relref "KBhvalue_iteration.md#pomdp--kbhpartially-observable-markov-decision-process-dot-md--value-iteration" >}})


### approximate solutions {#approximate-solutions}

- estimate an [alpha vector]({{< relref "KBhalpha_vector.md" >}}), and then use a policy representation:
- [QMDP]({{< relref "KBhqmdp.md" >}})
- [FIB]({{< relref "KBhfast_informed_bound.md" >}})
2 changes: 1 addition & 1 deletion content/posts/KBhvalue_iteration.md
@@ -69,6 +69,6 @@ where \\(S\\) is the number of states and \\(A\\) the number of actions.
## [POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}}) value-iteration {#pomdp--kbhpartially-observable-markov-decision-process-dot-md--value-iteration}

1. compute [alpha vector]({{< relref "KBhalpha_vector.md" >}})s for all one-step plans (i.e. [conditional plan]({{< relref "KBhconditional_plan.md" >}})s that do just one action and give up); a sketch of steps 1 and 3 follows this list
2. [alpha vector pruning]({{< relref "KBhalpha_vector.md#alpha-vector--kbhalpha-vector-dot-md--pruning" >}}) on any plans that are dominated
2. [alpha vector pruning]({{< relref "KBhalpha_vector.md#id-a11af4cf-7e36-4b3f-876f-e6a26cf6817e-alpha-vector-pruning" >}}) on any plans that are dominated
3. generate all possible two-step [conditional plan]({{< relref "KBhconditional_plan.md" >}})s over all actions, using combinations of non-pruned one-step plans above as **SUBPLANS** (yes, you can use a one-step plan twice)
4. repeat steps 2-3
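
A sketch of steps 1 and 3 under assumed array layouts (\\(T[a][s, s']\\), \\(O[a][s', o]\\), \\(R[s, a]\\)), using the standard recursion for a [conditional plan]({{< relref "KBhconditional_plan.md" >}})'s [alpha vector]({{< relref "KBhalpha_vector.md" >}}): take the root action, then follow the subplan selected by the observation.

```python
import numpy as np

def plan_alpha(a, sub_alphas, T, O, R, gamma):
    """alpha(s) = R(s,a) + gamma * sum_{s'} T(s'|s,a) * sum_o O(o|s',a) * sub_alphas[o](s')"""
    future = sum(O[a][:, o] * sub_alphas[o] for o in range(O[a].shape[1]))  # over s'
    return R[:, a] + gamma * (T[a] @ future)

# hypothetical 2-state / 2-action / 2-observation problem
R = np.array([[-1.0, 5.0], [-1.0, -5.0]])
T = [np.array([[0.9, 0.1], [0.1, 0.9]]), np.eye(2)]
O = [np.array([[0.8, 0.2], [0.3, 0.7]]), np.full((2, 2), 0.5)]

# step 1: a one-step plan just collects the immediate reward of its action
one_step = [R[:, a] for a in range(R.shape[1])]

# step 3: a two-step plan with root action 0 whose subplans are one-step plans
print(plan_alpha(0, [one_step[0], one_step[1]], T, O, R, gamma=0.9))
```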
Binary file added static/ox-hugo/2023-11-16_09-40-27_screenshot.png
