+++
title = "MOMDP"
author = ["Houjun Liu"]
draft = false
+++

A [MOMDP]({{< relref "KBhmomdp.md" >}}) is a [POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}}) in which some parts of the state are fully observable.

---


## Motivation {#motivation}

Scaling up [POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}})s is **really hard**: the [curse of dimensionality]({{< relref "KBhcurse_of_dimensionality.md" >}}) makes the belief space grow exponentially. Even discretization causes the number of beliefs to blow up.

**Some of the state isn't uncertain at all, and other parts have only bounded uncertainty: exploiting this reduces the scale of the problem a lot.**


## Solving {#solving}

Solving uses [SARSOP]({{< relref "KBhsarsop.md" >}}), or any other point-based method. Instead of sampling full beliefs, however, we sample tuples \\((x, b\_{y})\\), where \\(x\\) is the observable part of the state and \\(b\_{y}\\) is the belief over the unobservable part.
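
A minimal sketch of how such a factored belief might be updated, assuming hypothetical array-shaped models `T_y` and `O`, and simplifying so that the next observable state enters only through the observation likelihood:

```python
import numpy as np

# A minimal sketch of updating the factored belief (x, b_y); the model
# arrays T_y and O are hypothetical, and we simplify by letting the next
# observable state enter only through the observation likelihood.

def update_belief(x, b_y, a, x_next, o, T_y, O):
    """T_y[a, x] is a |Y| x |Y| matrix with P(y' | x, y, a);
    O[a, x_next] is a |Y| x |O| matrix with P(o | x', y', a)."""
    b_pred = b_y @ T_y[a, x]              # predict the hidden state
    b_next = b_pred * O[a, x_next][:, o]  # weight by observation likelihood
    return b_next / b_next.sum()          # x_next itself is observed directly
```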


## How Exactly Do We Use the Tuple? {#how-exactly-tuple}


### True Mixed Observability {#true-mixed-observability}

Split your space based on the truly observable part. Say there are \\(10\\) possible values of the observable state: you literally just initialize \\(10\\) sets of [alpha vector]({{< relref "KBhalpha_vector.md" >}})s to represent:

\begin{equation}
V(x, b\_{y}) = \dots
\end{equation}

where all of your objectives, backups, etc. take your observable state \\(x\\) as input. Then, during inference/backup, you look up the alpha-vector set for wherever you currently are.
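
A minimal sketch of those per-\\(x\\) alpha-vector sets (the variable names and initialization are hypothetical):

```python
import numpy as np

# A minimal sketch: one alpha-vector set per observable state x
# (n_x, n_y, and the zero initialization are hypothetical placeholders).

n_x, n_y = 10, 50
Gammas = {x: [np.zeros(n_y)] for x in range(n_x)}  # 10 sets of alpha vectors

def value(x, b_y):
    # V(x, b_y): best alpha vector among those kept for this x
    return max(float(alpha @ b_y) for alpha in Gammas[x])
```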


### Pseudo-Full Observability {#pseudo-full-observability}

Train a fully observable model, and then use a [belief]({{< relref "KBhbelief.md" >}})-weighted average of its values during inference. This is where [QMDP]({{< relref "KBhqmdp.md" >}}) came from.
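
A minimal sketch of the belief-weighted average, assuming a hypothetical \\(|S| \times |A|\\) action-value table `Q` from the fully observable solve:

```python
import numpy as np

# A minimal QMDP-style sketch; Q is a hypothetical |S| x |A| table of
# action values from the fully observable (MDP) solve.

def qmdp_action(b, Q):
    q_b = b @ Q                 # Q(b, a) = sum_s b(s) Q(s, a)
    return int(np.argmax(q_b))  # act greedily on the averaged values
```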


### Bounded Uncertainty {#bounded-uncertainty}

Throw away the extra uncertainty, leaving only the belief region around your observed location.
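
A minimal sketch on a discretized 1-D state space; the window radius is a hypothetical design parameter:

```python
import numpy as np

# A minimal sketch of bounding uncertainty on a discretized 1-D state
# space: drop belief mass outside a window around the observed location.

def bound_belief(b, observed_idx, radius=3):
    bounded = np.zeros_like(b)
    lo, hi = max(0, observed_idx - radius), observed_idx + radius + 1
    bounded[lo:hi] = b[lo:hi]       # keep only the local region
    return bounded / bounded.sum()  # renormalize what's left
```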

+++
title = "POMDP-lite"
author = ["Houjun Liu"]
draft = false
+++

What if our unobservable state never changes, or changes deterministically? This happens, for instance, in localization.

(Note: as with a [MOMDP]({{< relref "KBhmomdp.md" >}}), you can tack any amount of fully observable state on top.)


## POMDP-lite {#pomdp-lite}

- \\(X\\): fully observable states
- \\(\theta\\): a hidden parameter taking finitely many values \\(\theta\_{1 \dots N}\\)
- the full state is \\(S = X \times \theta\\)

We then assume conditional independence between \\(x\\) and \\(\theta\\), so \\(T = P(x'|\theta, x, a)\\), where \\(P(\theta'|\theta,x,a) = 1\\) ("our hidden parameter is known or changes deterministically").
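
A minimal sketch of this factorization (the model functions are hypothetical stand-ins):

```python
# A minimal sketch of the POMDP-lite factorization; sample_next_x and
# next_theta are hypothetical stand-ins for the model.

def step(x, theta, a, sample_next_x, next_theta):
    x_next = sample_next_x(x, theta, a)   # stochastic: P(x' | theta, x, a)
    theta_next = next_theta(theta, x, a)  # deterministic: P(theta' | theta, x, a) = 1
    return x_next, theta_next
```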


## Solving {#solving}

**Main Idea**: if that's the case, we can split our model into a set of [MDP]({{< relref "KBhmarkov_decision_process.md" >}})s. Because \\(\theta\_{j}\\) changes deterministically, we can solve an MDP **online** over \\(X\\) and \\(T\\) for each possible initial \\(\theta\\). Then, you just take the belief over \\(\theta\\) and sample over the MDPs based on that belief. To encourage learning about \\(\theta\\), we also add:

- an information gain term
- an exploration reward bonus, which encourages exploration (this helps coordination)
- a count \\(\xi(b,x,a)\\) of the number of times \\((b,x,a)\\) has been visited---once it exceeds a threshold, clip the reward bonus (see the sketch below)

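A minimal sketch of the ensemble idea, assuming hypothetical per-\\(\theta\\) action-value tables and counts keyed by \\((x, a)\\) rather than \\((b, x, a)\\) for simplicity:

```python
import numpy as np

# A minimal sketch: Q[i] is a hypothetical |X| x |A| action-value table
# for the MDP induced by theta_i; visit counts are keyed by (x, a) here
# for simplicity (the full version counts (b, x, a)).

def act(x, b_theta, Q, xi, bonus=1.0, clip_after=10):
    # Belief-weighted action values across the per-theta MDPs.
    q_mix = sum(p * Q[i][x] for i, p in enumerate(b_theta))
    for a in range(len(q_mix)):
        if xi.get((x, a), 0) < clip_after:
            q_mix[a] += bonus  # exploration bonus, clipped once well-visited
    return int(np.argmax(q_mix))
```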


### Algorithm {#algorithm}

{{< figure src="/ox-hugo/2024-02-06_09-54-45_screenshot.png" >}}

+++
title = "rho-POMDPs"
author = ["Houjun Liu"]
draft = false
+++

[POMDP]({{< relref "KBhpartially_observable_markov_decision_process.md" >}})s adapted to solve the [Active Sensing Problem]({{< relref "KBhrho_pomdps.md" >}}): where **gathering information** is the explicit goal, not a means to some other end.

Directly reward the **reduction of uncertainty**: a [belief]({{< relref "KBhbelief.md" >}})-based reward framework which you can tack onto existing solvers. To do this, we define a reward directly over the belief space which assigns reward based on uncertainty reduction:

\begin{equation}
r(b,a) = \rho(b,a)
\end{equation}

where \\(\rho\\) typically includes some entropy or information measure.
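
A minimal sketch of one common choice, negative belief entropy (the exact form of \\(\rho\\) is problem-dependent):

```python
import numpy as np

# A minimal sketch of an entropy-based rho: reward the negative entropy
# of the belief, so peaked (certain) beliefs score higher.

def rho(b, a=None):
    p = b[b > 0]                         # drop zeros to avoid log(0)
    return float(np.sum(p * np.log(p)))  # -H(b)
```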


## $ρ$-POMDPs Adaptation for Piece-Wise Linear Convex ([PWLC]({{< relref "KBhrho_pomdps.md" >}})) Objectives {#and-rho-pomdps-adaption-for-piece-wise-linear-convex--pwlc-kbhrho-pomdps-dot-md---objectives}

\begin{equation}
\rho(b,a) = \max\_{\alpha \in \Gamma\_{\rho}} \sum\_{s}^{} \alpha(s) b(s)
\end{equation}

We want to use \\(R\\) extra alpha vectors to compute the reward value at a belief.
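
A minimal sketch of evaluating this PWLC \\(\rho\\), assuming a hypothetical list `Gamma_rho` of reward alpha vectors:

```python
import numpy as np

# A minimal sketch: Gamma_rho is a hypothetical list of reward alpha
# vectors (arrays over states) representing the PWLC rho.

def rho_pwlc(b, Gamma_rho):
    # rho(b) = max over the reward alpha vectors of alpha . b
    return max(float(alpha @ b) for alpha in Gamma_rho)
```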


## Non-[PWLC]({{< relref "KBhrho_pomdps.md" >}}) Objectives {#non-pwlc--kbhrho-pomdps-dot-md--objectives}

Given a certain stronger-than-[Lipschitz Condition]({{< relref "KBhuniqueness_and_existance.md#lipschitz-condition" >}}) [continuity]({{< relref "KBhuniqueness_and_existance.md#continuity" >}}) on \\(\rho\\), we can use a modified version of the Bellman updates.