+++
title = "LRTDP"
author = ["Houjun Liu"]
draft = false
+++

## Real-Time Dynamic Programming {#real-time-dynamic-programming}

[RTDP](#real-time-dynamic-programming) is an asynchronous value iteration scheme. Each [RTDP](#real-time-dynamic-programming) trial performs backups of the form:

\begin{equation}
V(s) = \min\_{a \in A(s)} c(a,s) + \sum\_{s' \in S}^{} P\_{a}(s'|s)V(s')
\end{equation}

The algorithm halts when the residuals are sufficiently small.
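To make the backup concrete, here is a minimal Python sketch of one RTDP trial over a tabular shortest-path MDP. The dictionary-based `A`, `cost`, and `P` structures and all names are illustrative assumptions, not the note's own code.

```python
import random

def rtdp_trial(s0, goal, V, A, cost, P, max_depth=100):
    """One RTDP trial: greedy action selection with Bellman backups
    along the visited trajectory. A toy sketch, not the full algorithm."""
    s = s0
    for _ in range(max_depth):
        if s in goal:
            break
        # Bellman backup: V(s) = min_a [ c(a,s) + sum_s' P_a(s'|s) V(s') ]
        def q(a):
            return cost[(a, s)] + sum(p * V[s2] for s2, p in P[(a, s)].items())
        best = min(A[s], key=q)
        V[s] = q(best)
        # stochastically sample the next state under the greedy action
        states, probs = zip(*P[(best, s)].items())
        s = random.choices(states, weights=probs)[0]
    return V
```

Because backups happen only along simulated trajectories, value information concentrates on states the greedy policy actually visits.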


## Labeled [RTDP](#real-time-dynamic-programming) {#labeled-rtdp--org9a279ff}

We want to label converged states so we don't need to keep investigating them.

A state is **solved** if:

- the state's residual is less than \\(\epsilon\\)
- all states \\(s'\\) reachable from this state have residual lower than \\(\epsilon\\)


### Labelled RTDP {#labelled-rtdp}

{{< figure src="/ox-hugo/2024-02-13_10-11-32_screenshot.png" >}}

We stochastically simulate one step forward at a time; while we keep meeting states we haven't marked as "solved", we continue simulating forward and value iterating.
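The labeling test above can be sketched as follows. This is a simplified version of the solved-check (it does not relabel descendants, unlike the full CheckSolved routine); the data structures are hypothetical.

```python
def residual(s, V, A, cost, P):
    """Bellman residual at s: gap between V(s) and its one-step backup."""
    q = lambda a: cost[(a, s)] + sum(p * V[s2] for s2, p in P[(a, s)].items())
    return abs(V[s] - min(q(a) for a in A[s]))

def check_solved(s, solved, V, A, cost, P, goal, eps=1e-3):
    """Label s solved if its residual, and the residuals of all states
    reachable under the greedy policy, are below eps (simplified)."""
    if s in goal or s in solved:
        return True
    open_, seen = [s], {s}
    while open_:
        u = open_.pop()
        if u in goal or u in solved:
            continue
        if residual(u, V, A, cost, P) > eps:
            return False
        greedy = min(A[u], key=lambda a: cost[(a, u)] +
                     sum(p * V[v] for v, p in P[(a, u)].items()))
        for v in P[(greedy, u)]:
            if v not in seen:
                seen.add(v)
                open_.append(v)
    solved.add(s)
    return True
```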
+++
title = "MaxQ"
author = ["Houjun Liu"]
draft = false
+++

## Two Abstractions {#two-abstractions}

- "temporal abstractions": making decisions without considering every individual timestep / abstracting away time ([MDP]({{< relref "KBhmarkov_decision_process.md" >}}))
- "state abstractions": making decisions about groups of states at once

## Graph {#graph}

[MaxQ]({{< relref "KBhmaxq.md" >}}) formulates a policy as a graph, which represents a set of \\(n\\) policies

{{< figure src="/ox-hugo/2024-02-13_09-50-20_screenshot.png" >}}

### Max Node {#max-node}

This is a "policy node", connected to a series of \\(Q\\) nodes from which it takes the max and propagates down. If we are at a leaf max-node, the actual action is taken and control is passed back to the top of the graph.

### Q Node {#q-node}

Each node computes \\(Q(s,a)\\), the value of taking that action.

## Hierarchical Value Function {#hierachical-value-function}

{{< figure src="/ox-hugo/2024-02-13_09-51-27_screenshot.png" >}}

\begin{equation}
Q(s,a) = V\_{a}(s) + C\_{i}(s,a)
\end{equation}

The value function of the root node is the value obtained over all nodes in the graph,

where:

\begin{equation}
C\_{i}(s,a) = \sum\_{s'}^{} P(s'|s,a) V(s')
\end{equation}

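The decomposition \\(Q(s,a) = V\_{a}(s) + C\_{i}(s,a)\\) can be evaluated by recursing down the MaxQ graph. A toy sketch under assumed data structures (`pi`, `V_prim`, and `C` are hypothetical tables, not from the note):

```python
def maxq_q(node, s, pi, V_prim, C):
    """Q_i(s, a) = V_a(s) + C_i(s, a), with a = pi[node] the child
    this max node's policy picks. All structures are illustrative."""
    a = pi[node]
    return maxq_v(a, s, pi, V_prim, C) + C[(node, s, a)]

def maxq_v(node, s, pi, V_prim, C):
    """V of a node: its expected reward if primitive, else its Q value."""
    if (node, s) in V_prim:
        return V_prim[(node, s)]
    return maxq_q(node, s, pi, V_prim, C)
```

So the value of the root decomposes into the value of the primitive action eventually taken plus the completion values accumulated at every composite node along the way.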
## Learning MaxQ {#learning-maxq}

1. maintain two tables \\(C\_{i}\\) and \\(\tilde{C}\_{i}(s,a)\\) (the latter is a special completion function which corresponds to a special reward \\(\tilde{R}\\) that prevents the model from taking egregious terminating actions)
2. choose \\(a\\) according to an exploration strategy
3. execute \\(a\\), observe \\(s'\\), and compute \\(R(s'|s,a)\\)

Then, update:

{{< figure src="/ox-hugo/2024-02-13_09-54-38_screenshot.png" >}}
+++
title = "Option (MDP)"
author = ["Houjun Liu"]
draft = false
+++

an [Option (MDP)]({{< relref "KBhoption.md" >}}) represents a high-level collection of actions. Big picture: abstract away your big policy into \\(n\\) small policies, and value-iterate over the expected values of the big policies.

## Markov Option {#markov-option}

A [Markov Option](#markov-option) is given by a triple \\((I, \pi, \beta)\\)

- \\(I \subset S\\), the states from which the option may be started
- \\(\pi: S \times A \to [0,1]\\), the policy followed during that option
- \\(\beta(s)\\), the probability of the option terminating at state \\(s\\)

### one-step options {#one-step-options}

You can develop one-step options, which terminate immediately after one action:

- \\(I = \\{s:a \in A\_{s}\\}\\)
- \\(\pi(s,a) = 1\\)
- \\(\beta(s) = 1\\)

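The three conditions above can be written down directly. A minimal sketch, assuming actions are stored per state in a dict `A` (an illustrative structure, not from the note):

```python
def one_step_option(a, A):
    """Build the one-step option for a primitive action a: initiate
    wherever a is applicable, always pick a, terminate immediately."""
    return {
        "I": {s for s, acts in A.items() if a in acts},  # I = {s : a in A_s}
        "pi": lambda s, act: 1.0 if act == a else 0.0,   # pi(s, a) = 1
        "beta": lambda s: 1.0,                           # beta(s) = 1
    }
```

One-step options make primitive actions a special case of options, so a single framework covers both.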
### option value function {#option-value-fuction}

\begin{equation}
Q^{\mu}(s,o) = \mathbb{E}\qty[r\_{t} + \gamma r\_{t+1} + \dots]
\end{equation}

where \\(\mu\\) is some option selection process

### semi-markov decision process {#semi-markov-decision-process}

a [semi-markov decision process](#semi-markov-decision-process) is a system over a set of [option]({{< relref "KBhoptions.md" >}})s, with time being a factor in option transitions, but the underlying policies still being [MDP]({{< relref "KBhmarkov_decision_process.md" >}})s:

\begin{equation}
T(s', \tau | s,o)
\end{equation}

where \\(\tau\\) is the time elapsed.

Because option-level termination induces jumps between large-scale states, one backup can propagate to many states.

### intra option q-learning {#intra-option-q-learning}

\begin{equation}
Q\_{k+1} (s\_{t},o) = (1-\alpha\_{k})Q\_{k}(s\_{t}, o) + \alpha\_{k} \qty(r\_{t+1} + \gamma U\_{k}(s\_{t+1}, o))
\end{equation}

where:

\begin{equation}
U\_{k}(s,o) = (1-\beta(s))Q\_{k}(s,o) + \beta(s) \max\_{o' \in O} Q\_{k}(s,o')
\end{equation}
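A minimal sketch of the update above, with `Q` a dict keyed on (state, option) pairs and `beta` a termination-probability function; the names and structures are assumptions for illustration:

```python
def intra_option_update(Q, beta, s, o, r, s_next, options,
                        alpha=0.1, gamma=0.95):
    """One intra-option Q-learning backup:
    Q(s,o) <- (1-alpha) Q(s,o) + alpha (r + gamma U(s',o)), where
    U(s',o) = (1-beta(s')) Q(s',o) + beta(s') max_{o'} Q(s',o')."""
    cont = (1 - beta(s_next)) * Q[(s_next, o)]           # option continues
    stop = beta(s_next) * max(Q[(s_next, o2)] for o2 in options)  # terminates
    U = cont + stop
    Q[(s, o)] = (1 - alpha) * Q[(s, o)] + alpha * (r + gamma * U)
    return Q
```

Note that the same one-step transition can update every option consistent with the action taken, which is what makes the method "intra-option".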
+++
title = "STRIPS-style planning"
author = ["Houjun Liu"]
draft = false
+++

This is a precursor to [MDP]({{< relref "KBhmarkov_decision_process.md" >}}) planning:

- states: conjunctions of "fluents" (which are state variables)
- actions: transitions between fluents
- transitions: deleting the older, changed parts of fluents and adding new parts

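The add/delete transition semantics above can be sketched with sets of fluents; the operator representation below is an assumption for illustration:

```python
def applicable(state, action):
    """A state is a set of fluents; an action carries preconditions,
    a delete list, and an add list (classic STRIPS operators)."""
    return action["pre"] <= state

def apply_action(state, action):
    """Transition = remove the fluents the action deletes, add new ones."""
    if not applicable(state, action):
        raise ValueError("preconditions not satisfied")
    return (state - action["del"]) | action["add"]
```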
## Planning Domain Definition Language {#planning-domain-definition-language}

A LISP-like language used to specify a [STRIPS-style planning]({{< relref "KBhstrips_style_planning.md" >}}) problem.

## Hierarchical Task Network {#hierarchical-task-network}

1. Decompose classical planning into a hierarchy of actions
2. Leverage high-level actions to generate a coarse plan
3. Refine into smaller problems
+++
title = "Temporal Abstraction"
author = ["Houjun Liu"]
draft = false
+++