+++
title = "MOMDP"
author = ["Houjun Liu"]
draft = false
+++

A [MOMDP]({{< relref "KBhmomdp.md" >}}) is a [POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}}) in which some parts of the state are fully observable.

---


## Motivation {#motivation}

Scaling up [POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}})s is **really hard**: the [curse of dimensionality]({{< relref "KBhcurse_of_dimensionality.md" >}}) makes the belief space grow exponentially. Even discretization causes the number of beliefs to blow up.

**Some of the state isn't uncertain at all, and other parts have only bounded uncertainty: exploiting this reduces the scale of the problem a lot.**


## Solving {#solving}

Solving uses [SARSOP]({{< relref "KBhsarsop.md" >}}), or any other point-based method. Instead of sampling full beliefs, however, we sample tuples \\((x, b\_{y})\\), where \\(x\\) is the observable part of the state and \\(b\_{y}\\) is the belief over the unobservable part.
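
A minimal sketch of how such a factored belief might be updated, assuming hypothetical array-shaped models `T_y` and `O`, and simplifying so that the next observable state enters only through the observation likelihood:

```python
import numpy as np

# A minimal sketch of updating the factored belief (x, b_y); the model
# arrays T_y and O are hypothetical, and we simplify by letting the next
# observable state enter only through the observation likelihood.

def update_belief(x, b_y, a, x_next, o, T_y, O):
    """T_y[a, x] is a |Y| x |Y| matrix with P(y' | x, y, a);
    O[a, x_next] is a |Y| x |O| matrix with P(o | x', y', a)."""
    b_pred = b_y @ T_y[a, x]              # predict the hidden state
    b_next = b_pred * O[a, x_next][:, o]  # weight by observation likelihood
    return b_next / b_next.sum()          # x_next itself is observed directly
```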


## How Exactly Do We Use the Tuple? {#how-exactly-tuple}


### True Mixed Observability {#true-mixed-observability}

Split your space based on the truly observable part. Say there are \\(10\\) possible values of the observable state: you literally just initialize \\(10\\) sets of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s to represent:

\begin{equation}
V(x, b\_{y}) = \dots
\end{equation}

where all of your objectives, backups, etc. take your observable state \\(x\\) as input. Then, during inference/backup, you look up the alpha-vector set for wherever you currently are.
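
A minimal sketch of those per-\\(x\\) alpha-vector sets (the variable names and initialization are hypothetical):

```python
import numpy as np

# A minimal sketch: one alpha-vector set per observable state x
# (n_x, n_y, and the zero initialization are hypothetical placeholders).

n_x, n_y = 10, 50
Gammas = {x: [np.zeros(n_y)] for x in range(n_x)}  # 10 sets of alpha vectors

def value(x, b_y):
    # V(x, b_y): best alpha vector among those kept for this x
    return max(float(alpha @ b_y) for alpha in Gammas[x])
```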


### Pseudo-Full Observability {#pseudo-full-observability}

Train a fully observable model, and then use a [belief]({{< relref "KBhbelief.md" >}})-weighted average of its values during inference. This is where [QMDP]({{< relref "KBhqmdp.md" >}}) came from.
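
A minimal sketch of the belief-weighted average, assuming a hypothetical \\(|S| \times |A|\\) action-value table `Q` from the fully observable solve:

```python
import numpy as np

# A minimal QMDP-style sketch; Q is a hypothetical |S| x |A| table of
# action values from the fully observable (MDP) solve.

def qmdp_action(b, Q):
    q_b = b @ Q                 # Q(b, a) = sum_s b(s) Q(s, a)
    return int(np.argmax(q_b))  # act greedily on the averaged values
```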


### Bounded Uncertainty {#bounded-uncertainty}

Throw away the extra uncertainty, leaving only the belief region around your observed location.
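
A minimal sketch on a discretized 1-D state space; the window radius is a hypothetical design parameter:

```python
import numpy as np

# A minimal sketch of bounding uncertainty on a discretized 1-D state
# space: drop belief mass outside a window around the observed location.

def bound_belief(b, observed_idx, radius=3):
    bounded = np.zeros_like(b)
    lo, hi = max(0, observed_idx - radius), observed_idx + radius + 1
    bounded[lo:hi] = b[lo:hi]       # keep only the local region
    return bounded / bounded.sum()  # renormalize what's left
```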

+++
title = "POMDP-lite"
author = ["Houjun Liu"]
draft = false
+++

What if our unobservable state never changes, or changes deterministically? This happens, for instance, in localization.

(Note: as with a [MOMDP]({{< relref "KBhmomdp.md" >}}), you can tack any amount of fully observable state on top.)


## POMDP-lite {#pomdp-lite}

- \\(X\\): fully observable states
- \\(\theta\\): a hidden parameter taking finitely many values \\(\theta\_{1 \dots N}\\)
- the full state is \\(S = X \times \theta\\)

We then assume conditional independence between \\(x\\) and \\(\theta\\), so \\(T = P(x'|\theta, x, a)\\), where \\(P(\theta'|\theta,x,a) = 1\\) ("our hidden parameter is known or changes deterministically").
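
A minimal sketch of this factorization (the model functions are hypothetical stand-ins):

```python
# A minimal sketch of the POMDP-lite factorization; sample_next_x and
# next_theta are hypothetical stand-ins for the model.

def step(x, theta, a, sample_next_x, next_theta):
    x_next = sample_next_x(x, theta, a)   # stochastic: P(x' | theta, x, a)
    theta_next = next_theta(theta, x, a)  # deterministic: P(theta' | theta, x, a) = 1
    return x_next, theta_next
```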


## Solving {#solving}

**Main Idea**: if that's the case, we can split our model into a set of [MDP]({{< relref "KBhmarkov_decision_process.md" >}})s. Because \\(\theta\_{j}\\) changes deterministically, we can solve an MDP **online** over \\(X\\) and \\(T\\) for each possible initial \\(\theta\\). Then, you just take the belief over \\(\theta\\) and sample over the MDPs based on that belief. To encourage learning about \\(\theta\\), we also add:

- an information gain term
- an exploration reward bonus, which encourages exploration (this helps coordination)
- a count \\(\xi(b,x,a)\\) of the number of times \\((b,x,a)\\) has been visited---once it exceeds a threshold, clip the reward bonus (see the sketch below)

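A minimal sketch of the ensemble idea, assuming hypothetical per-\\(\theta\\) action-value tables and counts keyed by \\((x, a)\\) rather than \\((b, x, a)\\) for simplicity:

```python
import numpy as np

# A minimal sketch: Q[i] is a hypothetical |X| x |A| action-value table
# for the MDP induced by theta_i; visit counts are keyed by (x, a) here
# for simplicity (the full version counts (b, x, a)).

def act(x, b_theta, Q, xi, bonus=1.0, clip_after=10):
    # Belief-weighted action values across the per-theta MDPs.
    q_mix = sum(p * Q[i][x] for i, p in enumerate(b_theta))
    for a in range(len(q_mix)):
        if xi.get((x, a), 0) < clip_after:
            q_mix[a] += bonus  # exploration bonus, clipped once well-visited
    return int(np.argmax(q_mix))
```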


### Algorithm {#algorithm}

{{< figure src="/ox-hugo/2024-02-06_09-54-45_screenshot.png" >}}

+++
title = "rho-POMDPs"
author = ["Houjun Liu"]
draft = false
+++

[POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}})s adapted to solve the [Active Sensing Problem]({{< relref "KBhrho_pomdps.md" >}}): where **gathering information** is the explicit goal, not a means to some other end.

Directly reward the **reduction of uncertainty**: a [belief]({{< relref "KBhbelief.md" >}})-based reward framework which you can tack onto existing solvers. To do this, we define a reward directly over the belief space which assigns reward based on uncertainty reduction:

\begin{equation}
r(b,a) = \rho(b,a)
\end{equation}

where \\(\rho\\) typically includes some entropy or information measure.
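
A minimal sketch of one common choice, negative belief entropy (the exact form of \\(\rho\\) is problem-dependent):

```python
import numpy as np

# A minimal sketch of an entropy-based rho: reward the negative entropy
# of the belief, so peaked (certain) beliefs score higher.

def rho(b, a=None):
    p = b[b > 0]                         # drop zeros to avoid log(0)
    return float(np.sum(p * np.log(p)))  # -H(b)
```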


## $ρ$-POMDPs Adaptation for Piece-Wise Linear Convex ([PWLC]({{< relref "KBhrho_pomdps.md" >}})) Objectives {#and-rho-pomdps-adaption-for-piece-wise-linear-convex--pwlc-kbhrho-pomdps-dot-md---objectives}

\begin{equation}
\rho(b,a) = \max\_{\alpha \in \Gamma\_{\rho}} \sum\_{s}^{} \alpha(s) b(s)
\end{equation}

We want to use \\(R\\) extra alpha vectors to compute the reward value at a belief.
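
A minimal sketch of evaluating this PWLC \\(\rho\\), assuming a hypothetical list `Gamma_rho` of reward alpha vectors:

```python
import numpy as np

# A minimal sketch: Gamma_rho is a hypothetical list of reward alpha
# vectors (arrays over states) representing the PWLC rho.

def rho_pwlc(b, Gamma_rho):
    # rho(b) = max over the reward alpha vectors of alpha . b
    return max(float(alpha @ b) for alpha in Gamma_rho)
```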


## Non-[PWLC]({{< relref "KBhrho_pomdps.md" >}}) Objectives {#non-pwlc--kbhrho-pomdps-dot-md--objectives}

Given a certain stronger-than-[Lipschitz Condition]({{< relref "KBhuniqueness_and_existance.md#lipschitz-condition" >}}) [continuity]({{< relref "KBhuniqueness_and_existance.md#continuity" >}}) on \\(\rho\\), we can use a modified version of the Bellman updates.