Commit 86c26c3 ("kb autocommit")
Jemoka committed Feb 6, 2024 (1 parent: 92d8a42)

Showing 8 changed files with 118 additions and 1 deletion.
45 changes: 45 additions & 0 deletions content/posts/KBhmomdp.md
@@ -0,0 +1,45 @@
+++
title = "MOMDP"
author = ["Houjun Liu"]
draft = false
+++

[MOMDP]({{< relref "KBhmomdp.md" >}})s are [POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}})s where some parts of the state are fully observable.

---


## Motivation {#motivation}

Scaling up [POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}})s is **really hard**: the exponential [curse of dimensionality]({{< relref "KBhcurse_of_dimensionality.md" >}}). Even discretization will cause the number of beliefs to blow up.

**Some of the state isn't uncertain at all, and the rest has only bounded uncertainty: exploiting this reduces the scale of the problem a lot.**


## Solving {#solving}

Solving a [MOMDP]({{< relref "KBhmomdp.md" >}}) uses [SARSOP]({{< relref "KBhsarsop.md" >}}), or any other point-based method. Instead of sampling over the full belief state, however, we sample tuples \\((x, b\_{y})\\), whereby \\(x\\) is the observable part of the state and \\(b\_{y}\\) is a belief over the unobservable part.


## How Exactly Do We Use the Tuple? {#how-exactly-tuple}


### True Mixed Observability {#true-mixed-observability}

Split your space based on the truly observable part. Say there are \\(10\\) observable states; you literally just initialize 10 sets of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s to create:

\begin{equation}
V(x, b\_{y}) = \dots
\end{equation}

whereby all of your objectives, backups, etc. take the observable state \\(x\\) as input. Then, during inference/backup, you look up the [alpha vector]({{< relref "KBhalpha_vector.md" >}}) set corresponding to where you are.
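
A minimal sketch of this idea (the names `Gamma` and `value` are illustrative, not from any particular solver): keep one [alpha vector]({{< relref "KBhalpha_vector.md" >}}) set per observable state \\(x\\), and evaluate \\(V(x, b\_{y})\\) by maximizing the dot product against the belief over the hidden part only.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Gamma[x] holds the alpha vectors associated with observable state x.
// Each alpha vector has one entry per *hidden* state value y.
using AlphaVector = std::vector<double>;
using AlphaSets = std::vector<std::vector<AlphaVector>>;

// Value of the mixed state (x, b_y): maximize alpha . b_y over the
// alpha-vector set that belongs to the observable component x.
double value(const AlphaSets& Gamma, int x, const std::vector<double>& b_y) {
    double best = -std::numeric_limits<double>::infinity();
    for (const AlphaVector& alpha : Gamma[x]) {
        double dot = 0.0;
        for (std::size_t y = 0; y < b_y.size(); ++y) dot += alpha[y] * b_y[y];
        best = std::max(best, dot);
    }
    return best;
}
```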


### Pseudo-Full Observability {#pseudo-full-observability}

Train a fully observable model, and then use a [belief]({{< relref "KBhbelief.md" >}})-weighted average during inference. This is where [QMDP]({{< relref "KBhqmdp.md" >}}) came from.
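
A minimal sketch of the belief-weighted average, assuming you already have a fully observable action-value table \\(Q(s,a)\\) (say, from value iteration on the underlying [MDP]({{< relref "KBhmarkov_decision_process.md" >}})):

```cpp
#include <cstddef>
#include <vector>

// Q[s][a]: action values of the *fully observable* MDP, solved offline.
// b[s]:    current belief over states.
// Returns the belief-weighted value of action a, i.e. sum_s b(s) Q(s, a).
double qmdp_action_value(const std::vector<std::vector<double>>& Q,
                         const std::vector<double>& b, int a) {
    double q = 0.0;
    for (std::size_t s = 0; s < b.size(); ++s) q += b[s] * Q[s][a];
    return q;
}
```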


### Bounded Uncertainty {#bounded-uncertainty}

Throw away extra uncertainty, and hence leave only the region around your observed location.
4 changes: 3 additions & 1 deletion content/posts/KBhmultithreading.md
@@ -51,7 +51,7 @@ thread(myfunc, ref(myint));
Remember: ref will **SHARE MEMORY**, and you have no control over when the thread runs. So once a pointer is passed all bets are off in terms of what values things take on.
## [process]({{< relref "KBhmultiprocessing.md#process" >}})es vs [thread](#thread)s {#process--kbhmultiprocessing-dot-md--es-vs-thread--org78221a2--s}
| Processes | Threads |
|----------------------------------------------------|---------------------------------------------|
@@ -74,3 +74,5 @@ undesirable behavior caused by arbitrary execution order.
we want [atomicity]({{< relref "KBhdistributed_algorithum.md#atomicity" >}}) in the code: we want entire data viewing + modification operations to not be interrupted---otherwise, you will generate race conditions.
Recall: **C++ statements themselves are not INHERENTLY atomic**.
we want to outline a "critical section" and ensure it isn't run by more than one thread at a time.
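
A minimal sketch of fencing off a critical section with a mutex (the names here are illustrative): the lock guarantees that the read-modify-write on the shared counter is never interleaved between threads.

```cpp
#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;          // shared state
std::mutex counterLock;   // protects counter

void increment(int times) {
    for (int i = 0; i < times; ++i) {
        std::lock_guard<std::mutex> guard(counterLock);  // critical section begins
        ++counter;  // read-modify-write cannot be interrupted by the other thread
    }               // guard releases the lock here
}

int main() {
    std::thread t1(increment, 100000);
    std::thread t2(increment, 100000);
    t1.join();
    t2.join();
    std::cout << counter << std::endl;  // always 200000 with the lock in place
}
```
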
32 changes: 32 additions & 0 deletions content/posts/KBhpomdp_lite.md
@@ -0,0 +1,32 @@
+++
title = "POMDP-lite"
author = ["Houjun Liu"]
draft = false
+++

What if our unobservable state never changes, or changes deterministically? For instance, say, in localization.

(note: like a [MOMDP]({{< relref "KBhmomdp.md" >}}), you can tack any amount of fully observable state on top)


## POMDP-lite {#pomdp-lite}

- \\(X\\) fully observable states
- \\(\theta\\) hidden parameter: takes a finite number of values \\(\theta\_{1 \dots N}\\)
- where \\(S = X \times \theta\\)

we then assume conditional independence between \\(x\\) and \\(\theta\\). So: \\(T = P(x'|\theta, x, a)\\), where \\(P(\theta'|\theta,x,a) = 1\\) ("our hidden parameter is static or deterministically changing")


## Solving {#solving}

**Main Idea**: if that's the case, then we can split our model into a set of [MDP]({{< relref "KBhmarkov_decision_process.md" >}})s. Because \\(\theta\_{j}\\) changes deterministically, we can solve an MDP **ONLINE** over \\(X\\) and \\(T\\) for each possible initial \\(\theta\\). Then, you just take the belief over \\(\theta\\) and sample over the MDPs based on that belief.

- information gain
- exploration reward bonus, which encourages exploration (this helps coordinate)
- maintain a value \\(\xi(b,x,a)\\), which counts the number of times \\((b,x,a)\\) has been visited---if the count exceeds a threshold, clip the reward bonus (see the sketch below)
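
A rough sketch of this idea (a simplification, not the actual POMDP-lite algorithm; all names are illustrative, and \\(\xi\\) is counted over \\((x, a)\\) only here): solve one MDP per candidate \\(\theta\\), weight their action values by the current belief over \\(\theta\\), and add an exploration bonus that gets clipped once the visit count passes a threshold.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One action-value table per hypothesis theta_j, each solved as an ordinary MDP
// over the fully observable states X (illustrative types, not a real solver API).
using QTable = std::vector<std::vector<double>>;  // Q[x][a]

struct PomdpLiteSketch {
    std::vector<QTable> q_per_theta;   // Q_j(x, a), one table per theta_j
    std::vector<double> belief_theta;  // b(theta_j)
    double bonus_scale = 1.0;          // weight of the exploration bonus
    int visit_cap = 10;                // clip the bonus after this many visits
    std::map<std::pair<int, int>, int> visits;  // xi, simplified here to (x, a) counts

    // Belief-weighted action value plus a clipped exploration bonus.
    double action_value(int x, int a) {
        double q = 0.0;
        for (std::size_t j = 0; j < q_per_theta.size(); ++j)
            q += belief_theta[j] * q_per_theta[j][x][a];
        int& n = visits[{x, a}];
        double bonus = (n < visit_cap) ? bonus_scale : 0.0;  // clipped bonus
        ++n;  // record the visit
        return q + bonus;
    }
};
```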


### Algorithm {#algorithm}

{{< figure src="/ox-hugo/2024-02-06_09-54-45_screenshot.png" >}}
1 change: 1 addition & 0 deletions content/posts/KBhpomdps_index.md
@@ -15,6 +15,7 @@ a class about [POMDP]({{< relref "KBhpartially_observable_markov_decision_proces
| Policy Graphs | [Hansen]({{< relref "KBhhansen.md" >}}), [MCVI]({{< relref "KBhmcvi.md" >}}), [PGA]({{< relref "KBhpga.md" >}}) |
| Online Solvers | [AEMS]({{< relref "KBhaems.md" >}}), [POMCP]({{< relref "KBhpomcp.md" >}}), [DESPOT]({{< relref "KBhdespot.md" >}}) |
| Moar Online Methods | [IS-DESPOT]({{< relref "KBhis_despot.md" >}}), [POMCPOW]({{< relref "KBhpomcpow.md" >}}), [AdaOPS]({{< relref "KBhadaops.md" >}}) |
| POMDPish | [MOMDP]({{< relref "KBhmomdp.md" >}}), [POMDP-lite]({{< relref "KBhpomdp_lite.md" >}}), [rho-POMDPs]({{< relref "KBhrho_pomdps.md" >}}) |


## Other Content {#other-content}
1 change: 1 addition & 0 deletions content/posts/KBhqmdp.md
@@ -15,3 +15,4 @@ This is going to give you a set of [alpha vector]({{< relref "KBhalpha_vector.md
time complexity: \\(O(|S|^{2}|A|^{2})\\)

you will note we don't ever actually use anything partially observable in this. Once we get the [alpha vector]({{< relref "KBhalpha_vector.md" >}})s, we need to use [one-step lookahead in POMDP]({{< relref "KBhalpha_vector.md#one-step-lookahead-in-pomdp" >}}) (which does use transitions) to actually turn these [alpha vector]({{< relref "KBhalpha_vector.md" >}})s into a policy.
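
A minimal sketch of that one-step lookahead (the `Model` layout is an assumption for illustration, not any particular library's API): score each action by the immediate belief reward plus the discounted, observation-weighted value of the updated belief under the [alpha vector]({{< relref "KBhalpha_vector.md" >}})s.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative POMDP layout: S states, A actions, O observations.
struct Model {
    std::vector<std::vector<double>> R;               // R[s][a]
    std::vector<std::vector<std::vector<double>>> T;  // T[s][a][s']
    std::vector<std::vector<double>> Obs;              // Obs[s'][o] = P(o | s')
    double gamma = 0.95;
};

// Value of a belief under a set of alpha vectors: max_alpha alpha . b
double alpha_value(const std::vector<std::vector<double>>& Gamma,
                   const std::vector<double>& b) {
    double best = -std::numeric_limits<double>::infinity();
    for (const auto& alpha : Gamma) {
        double dot = 0.0;
        for (std::size_t s = 0; s < b.size(); ++s) dot += alpha[s] * b[s];
        best = std::max(best, dot);
    }
    return best;
}

// One-step lookahead: pick the action maximizing R(b,a) plus the discounted,
// observation-weighted alpha-vector value of the updated belief.
int lookahead_action(const Model& m,
                     const std::vector<std::vector<double>>& Gamma,
                     const std::vector<double>& b) {
    std::size_t S = b.size(), A = m.R[0].size(), O = m.Obs[0].size();
    int best_a = 0;
    double best_q = -std::numeric_limits<double>::infinity();
    for (std::size_t a = 0; a < A; ++a) {
        double q = 0.0;
        for (std::size_t s = 0; s < S; ++s) q += b[s] * m.R[s][a];  // R(b, a)
        for (std::size_t o = 0; o < O; ++o) {
            // Unnormalized updated belief b'(s') = P(o|s') * sum_s T(s'|s,a) b(s)
            std::vector<double> bp(S, 0.0);
            double p_o = 0.0;
            for (std::size_t sp = 0; sp < S; ++sp) {
                double pred = 0.0;
                for (std::size_t s = 0; s < S; ++s) pred += m.T[s][a][sp] * b[s];
                bp[sp] = m.Obs[sp][o] * pred;
                p_o += bp[sp];                  // accumulates P(o | b, a)
            }
            if (p_o <= 0.0) continue;
            for (double& v : bp) v /= p_o;      // normalize the belief
            q += m.gamma * p_o * alpha_value(Gamma, bp);
        }
        if (q > best_q) { best_q = q; best_a = static_cast<int>(a); }
    }
    return best_a;
}
```
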
7 changes: 7 additions & 0 deletions content/posts/KBhresearch_tips.md
@@ -107,3 +107,10 @@ minted, or---code execution---pythontex
Transitions are hard: don't tap on a slide and go "woah"; pre-cache first sentence of each slide.

Overview **AFTER** the motivation.


## Reference Handling {#reference-handling}

- biblatex: bibtex with postprocessing of the .tex
- sislstrings.bib: Mykel's conference list for .bib
- JabRef
29 changes: 29 additions & 0 deletions content/posts/KBhrho_pomdps.md
@@ -0,0 +1,29 @@
+++
title = "rho-POMDPs"
author = ["Houjun Liu"]
draft = false
+++

[POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}})s used to solve the [Active Sensing Problem]({{< relref "KBhrho_pomdps.md" >}}): where **gathering information** is the explicit goal, not a means to some other end.

Directly reward the **reduction of uncertainty**: [belief]({{< relref "KBhbelief.md" >}})-based reward framework which you can just tack onto the existing solvers. To do this, we want to define some reward directly over the belief space which assigns rewards based on uncertainty reduction:

\begin{equation}
r(b,a) = \rho(b,a)
\end{equation}

and \\(\rho\\) typically includes some entropy/information term.
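
For instance, one common choice (an assumption here, not the only option) is to reward low [belief]({{< relref "KBhbelief.md" >}}) entropy:

```cpp
#include <cmath>
#include <vector>

// One illustrative uncertainty-reduction reward: negative belief entropy.
// A peaked, confident belief has low entropy and therefore a higher reward.
double rho_neg_entropy(const std::vector<double>& b) {
    double H = 0.0;
    for (double p : b)
        if (p > 0.0) H -= p * std::log(p);
    return -H;
}
```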


## \\(\rho\\)-POMDPs adaptation for Piece-Wise Linear Convex ([PWLC]({{< relref "KBhrho_pomdps.md" >}})) Objectives {#and-rho-pomdps-adaption-for-piece-wise-linear-convex--pwlc-kbhrho-pomdps-dot-md---objectives}

\begin{equation}
\rho(b,a) = \max\_{\alpha \in \Gamma} \left[\sum\_{s}^{} ?? \right]
\end{equation}

We want to use \\(R\\) extra alpha-vectors to compute the value at a state.


## non-[PWLC]({{< relref "KBhrho_pomdps.md" >}}) objectives {#non-pwlc--kbhrho-pomdps-dot-md--objectives}

Given a certain stronger-than-[Lipschitz Condition]({{< relref "KBhuniqueness_and_existance.md#lipschitz-condition" >}}) [continuity]({{< relref "KBhuniqueness_and_existance.md#continuity" >}}) on \\(\rho\\), we can use a modified version of the Bellman updates.
Binary file added static/ox-hugo/2024-02-06_09-54-45_screenshot.png
