Commit

kb autocommit

Jemoka committed Feb 13, 2024
1 parent 1958cce commit b182702
Showing 11 changed files with 211 additions and 0 deletions.
32 changes: 32 additions & 0 deletions content/posts/KBhltrdp.md
@@ -0,0 +1,32 @@
+++
title = "LRTDP"
author = ["Houjun Liu"]
draft = false
+++

## Real-Time Dynamic Programming {#real-time-dynamic-programming}

[RTDP](#real-time-dynamic-programming) is an asynchronous value iteration scheme. Each [RTDP](#real-time-dynamic-programming) trial applies backups of the form:

\begin{equation}
V(s) = \min\_{a \in A(s)} c(a,s) + \sum\_{s' \in S}^{} P\_{a}(s'|s)V(s')
\end{equation}

The algorithm halts when the residuals are sufficiently small.
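
Below is a minimal sketch of one RTDP trial in Python. The `mdp` interface (`actions(s)`, `cost(s, a)`, `transition_probs(s, a)` returning a dict \\(s' \mapsto P\_{a}(s'|s)\\), and `is_goal(s)`) is a hypothetical assumption, not from the notes; `V` is a dict of value estimates (e.g. a `defaultdict(float)`).

```python
import random

def bellman_backup(V, mdp, s):
    """Greedy backup: V(s) = min_a [ c(a,s) + sum_s' P_a(s'|s) V(s') ]."""
    best_a, best_q = None, float("inf")
    for a in mdp.actions(s):
        q = mdp.cost(s, a) + sum(p * V[s2] for s2, p in mdp.transition_probs(s, a).items())
        if q < best_q:
            best_a, best_q = a, q
    return best_a, best_q

def rtdp_trial(V, mdp, s0, max_depth=1000):
    """One RTDP trial: update values along a stochastically simulated trajectory."""
    s = s0
    for _ in range(max_depth):
        if mdp.is_goal(s):
            break
        a, V[s] = bellman_backup(V, mdp, s)  # asynchronous, in-place update
        probs = mdp.transition_probs(s, a)
        s = random.choices(list(probs), weights=list(probs.values()))[0]
```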


## Labeled [RTDP](#real-time-dynamic-programming) {#labeled-rtdp--org9a279ff}

We want to label converged states so we don't need to keep investigating them.

A state is **solved** if:

- the state's residual is less than \\(\epsilon\\)
- all states \\(s'\\) reachable from this state have residuals lower than \\(\epsilon\\)


### Labeled RTDP {#labelled-rtdp}

{{< figure src="/ox-hugo/2024-02-13_10-11-32_screenshot.png" >}}

We stochastically simulate one step forward at a time; as long as the state we meet hasn't been marked as "solved", we keep simulating forward and performing value iteration backups. A sketch of the labeling check follows.
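
A sketch of the solved-label check under the same hypothetical interface, reusing `bellman_backup` from the sketch above. Note this is a simplification: the full LRTDP `checkSolved` procedure searches the entire greedy-reachable envelope, not just the immediate successors.

```python
def residual(V, mdp, s):
    """Gap between the stored V(s) and a fresh greedy backup at s."""
    _, backed_up = bellman_backup(V, mdp, s)
    return abs(V[s] - backed_up)

def try_label_solved(V, mdp, s, solved, eps=1e-3):
    """Mark s solved when it and its greedy successors all have residual < eps."""
    a, _ = bellman_backup(V, mdp, s)
    successors = mdp.transition_probs(s, a)
    if residual(V, mdp, s) < eps and all(residual(V, mdp, s2) < eps for s2 in successors):
        solved.add(s)
```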
55 changes: 55 additions & 0 deletions content/posts/KBhmaxq.md
@@ -0,0 +1,55 @@
+++
title = "MaxQ"
author = ["Houjun Liu"]
draft = false
+++

## Two Abstractions {#two-abstractions}

- "temporal abstractions": making decisions without consideration / abstracting away time ([MDP]({{< relref "KBhmarkov_decision_process.md" >}}))
- "state abstractions": making decisions about groups of states at once


## Graph {#graph}

[MaxQ]({{< relref "KBhmaxq.md" >}}) formulates a policy as a graph, which represents a set of \\(n\\) sub-policies

{{< figure src="/ox-hugo/2024-02-13_09-50-20_screenshot.png" >}}


### Max Node {#max-node}

This is a "policy node", connected to a series of \\(Q\\) nodes from which it takes the max and propegate down. If we are at a leaf max-node, the actual action is taken and control is passed back t to the top of the graph


### Q Node {#q-node}

Each node computes \\(Q(s,a)\\), the value of taking its action from the current state.


## Hierarchical Value Function {#hierachical-value-function}

{{< figure src="/ox-hugo/2024-02-13_09-51-27_screenshot.png" >}}

\begin{equation}
Q(s,a) = V\_{a}(s) + C\_{i}(s,a)
\end{equation}

The value function of the root node is the value obtained over all nodes in the graph,

where:

\begin{equation}
C\_{i}(s,a) = \sum\_{s'}^{} P(s'|s,a) V(s')
\end{equation}
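
As a sketch, this decomposition can be evaluated by mutual recursion over the graph. The node attributes (`is_primitive`, `action`, `children`), the reward table `R`, and the completion table `C` are illustrative names, not from the notes.

```python
def node_value(node, s, C, R):
    """V_a(s): a primitive max node returns its expected reward; a composite
    max node takes the max over the Q nodes below it."""
    if node.is_primitive:
        return R[(s, node.action)]
    return max(q_value(node, s, child, C, R) for child in node.children)

def q_value(node, s, child, C, R):
    """Q(s,a) = V_a(s) + C_i(s,a): value of running subtask `child` in s,
    plus the completion value of finishing `node` afterwards."""
    return node_value(child, s, C, R) + C[(node, s, child)]
```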


## Learning MaxQ {#learning-maxq}

1. maintain two tables \\(C\_{i}(s,a)\\) and \\(\tilde{C}\_{i}(s,a)\\), where the latter is a special completion function corresponding to a modified reward \\(\tilde{R}\\) that prevents the model from taking egregious terminal actions
2. choose \\(a\\) according to exploration strategy
3. execute \\(a\\), observe \\(s'\\), and compute \\(R(s'|s,a)\\)

Then, update:

{{< figure src="/ox-hugo/2024-02-13_09-54-38_screenshot.png" >}}
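
A hedged sketch of that update step, modeled on the standard MaxQ-Q completion-function update; the table keys, `alpha`, `gamma`, the step count `N` taken by the finished subtask, and the pseudo-reward `R_tilde` are all assumptions rather than names from the notes.

```python
def maxq_q_update(C, C_tilde, i, s, a, s_prime, N, children, V,
                  R_tilde, alpha=0.1, gamma=0.95):
    """One MaxQ-Q update for parent task i after subtask a ran for N steps."""
    # greedy continuation at s', chosen with the shaped table C_tilde
    a_star = max(children, key=lambda c: C_tilde[(i, s_prime, c)] + V(c, s_prime))
    base = C[(i, s_prime, a_star)] + V(a_star, s_prime)
    C[(i, s, a)] += alpha * (gamma ** N * base - C[(i, s, a)])
    # the shaped table additionally includes the pseudo-reward R_tilde
    shaped = R_tilde(i, s_prime) + C_tilde[(i, s_prime, a_star)] + V(a_star, s_prime)
    C_tilde[(i, s, a)] += alpha * (gamma ** N * shaped - C_tilde[(i, s, a)])
```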
60 changes: 60 additions & 0 deletions content/posts/KBhoption.md
@@ -0,0 +1,60 @@
+++
title = "Option (MDP)"
author = ["Houjun Liu"]
draft = false
+++

an [Option (MDP)]({{< relref "KBhoption.md" >}}) represents a high-level collection of actions. Big Picture: abstract away your big policy into \\(n\\) small policies, and value-iterate over the expected values of those small policies.


## Markov Option {#markov-option}

A [Markov Option](#markov-option) is given by a triple \\((I, \pi, \beta)\\)

- \\(I \subset S\\), the states from which the option may be started
- \\(\pi : S \times A \to [0,1]\\), the policy followed during the option
- \\(\beta(s)\\), the probability of the option terminating at state \\(s\\)


### one-step options {#one-step-options}

You can define one-step options, which terminate immediately after one action (a sketch follows the list below):

- \\(I = \\{s:a \in A\_{s}\\}\\)
- \\(\pi(s,a) = 1\\)
- \\(\beta(s) = 1\\)
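
A sketch of such a wrapper, with an illustrative `Option` container that is not from the notes:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    initiation: set            # I: states from which the option may be started
    policy: Callable           # pi(s) -> action
    termination: Callable      # beta(s) -> termination probability

def one_step_option(action, states_with_action):
    """Wrap a primitive `action` as an option that stops after one step."""
    return Option(
        initiation=set(states_with_action),  # I = {s : a in A_s}
        policy=lambda s: action,             # pi(s, a) = 1
        termination=lambda s: 1.0,           # beta(s) = 1
    )
```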


### option value function {#option-value-fuction}

\begin{equation}
Q^{\mu}(s,o) = \mathbb{E}\qty[r\_{t} + \gamma r\_{t+1} + \dots]
\end{equation}

where \\(\mu\\) is some option selection process


### semi-markov decision process {#semi-markov-decision-process}

a [semi-markov decision process](#semi-markov-decision-process) is a system over a set of [option]({{< relref "KBhoption.md" >}})s, where time is a factor in option transitions but the underlying policies are still [MDP]({{< relref "KBhmarkov_decision_process.md" >}})s.

\begin{equation}
T(s', \tau | s,o)
\end{equation}

where \\(\tau\\) is time elapsed.

Because option-level termination induces jumps between large-scale states, one backup can propagate information to many states.


### intra option q-learning {#intra-option-q-learning}

\begin{equation}
Q\_{k+1} (s\_{t},o) = (1-\alpha\_{k})Q\_{k}(s\_{t}, o) + \alpha\_{k} \qty(r\_{t+1} + \gamma U\_{k}(s\_{t+1}, o))
\end{equation}

where:

\begin{equation}
U\_{k}(s,o) = (1-\beta(s))Q\_{k}(s,o) + \beta(s) \max\_{o' \in O} Q\_{k}(s,o')
\end{equation}
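
A sketch of this backup as a single update function; the dict `Q` keyed by (state, option), the option set `options`, and the callable `beta` are illustrative names matching the notation above.

```python
def intra_option_update(Q, s, o, r, s_next, options, beta, alpha, gamma):
    """One intra-option Q-learning backup for option o along s -> s_next."""
    # U_k(s', o): keep following o if it survives, else switch to the best option
    u = (1 - beta(s_next)) * Q[(s_next, o)] \
        + beta(s_next) * max(Q[(s_next, o2)] for o2 in options)
    Q[(s, o)] = (1 - alpha) * Q[(s, o)] + alpha * (r + gamma * u)
```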
4 changes: 4 additions & 0 deletions content/posts/KBhpomdps_index.md
@@ -17,8 +17,12 @@ a class about [POMDP]({{< relref "KBhpartially_observable_markov_decision_proces
| Moar Online Methods | [IS-DESPOT]({{< relref "KBhis_despot.md" >}}), [POMCPOW]({{< relref "KBhpomcpow.md" >}}), [AdaOPS]({{< relref "KBhadaops.md" >}}) |
| POMDPish | [MOMDP]({{< relref "KBhmomdp.md" >}}), [POMDP-lite]({{< relref "KBhpomdp_lite.md" >}}), [rho-POMDPs]({{< relref "KBhrho_pomdps.md" >}}) |
| Memoryless + Policy Search | [Sarsa (Lambda)]({{< relref "KBhsarsa_lambda.md" >}}), [JSJ]({{< relref "KBhjsj.md" >}}), [Pegasus]({{< relref "KBhpegasus.md" >}}) |
| Hierarchical Decomposition | [Option]({{< relref "KBhoption.md" >}}), [MaxQ]({{< relref "KBhmaxq.md" >}}), [LRTDP]({{< relref "KBhltrdp.md" >}}) |


## Other Content {#other-content}

[Research Tips]({{< relref "KBhresearch_tips.md" >}})

- [STRIPS-style planning]({{< relref "KBhstrips_style_planning.md" >}})
- [Temporal Abstraction]({{< relref "KBhtemperal_abstraction.md" >}})
32 changes: 32 additions & 0 deletions content/posts/KBhresearch_tips.md
@@ -114,3 +114,35 @@ Overview **AFTER** the motivation.
- biblatex: bibtex with postprocessing the .tex
- sislstrings.bib: Mykel's conference list for .bib
- JabRef


## PhD Thesis {#phd-thesis}

<http://www.feron.org/Eric/PhD_characterization_2.htm>

- "Cool Theorems and New Methods"
- "Cool Methods and Predictions"
- "Beautiful Demonstrations"
- "Cool engineering ideas"


## "How to Write a Paper" {#how-to-write-a-paper}

<https://cs.stanford.edu/people/widom/paper-writing.html>

1. what's the problem
2. why is it interesting and important?
3. why is it hard?
4. why hasn't it been solved before / what's wrong with previous solutions?
5. what are the key components of my approach and results?

You want the intro to end near the end of the first page or near the end of the second page. **Always lead with the problem.**


## Mathematical Writing {#mathematical-writing}

"CS209 mathematical writing"

Don't start a sentence with a symbol.

Don't use "utilize".
23 changes: 23 additions & 0 deletions content/posts/KBhstrips_style_planning.md
@@ -0,0 +1,23 @@
+++
title = "STRIPS-style planning"
author = ["Houjun Liu"]
draft = false
+++

This is a precursor to [MDP]({{< relref "KBhmarkov_decision_process.md" >}}) planning:

- states: a conjunction of "fluents" (propositions describing the state)
- actions: transitions between fluents
- transitions: delete the older, changed parts of the fluents and add the new parts (see the sketch after this list)
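
A sketch of that delete/add transition as set operations on fluents; the `Action` container and field names are illustrative, not part of the STRIPS formalism itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    preconditions: frozenset  # fluents that must hold to apply the action
    delete_list: frozenset    # fluents the action removes
    add_list: frozenset       # fluents the action adds

def apply(state: frozenset, action: Action) -> frozenset:
    """STRIPS transition: drop the delete list, then add the add list."""
    assert action.preconditions <= state, "action is not applicable here"
    return (state - action.delete_list) | action.add_list
```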


## Planning Domain Definition Language {#planning-domain-definition-language}

A LISP-like language used to specify a [STRIPS-style planning]({{< relref "KBhstrips_style_planning.md" >}}) problem.


## Hierarchical Task Network {#hierarchical-task-network}

1. Decompose classical planning into a hierarchy of actions
2. Leverage high-level actions to generate a coarse plan
3. Refine to smaller problems
5 changes: 5 additions & 0 deletions content/posts/KBhtemperal_abstraction.md
@@ -0,0 +1,5 @@
+++
title = "Temperal Abstraction"
author = ["Houjun Liu"]
draft = false
+++
Binary file added static/ox-hugo/2024-02-13_09-50-20_screenshot.png
Binary file added static/ox-hugo/2024-02-13_09-51-27_screenshot.png
Binary file added static/ox-hugo/2024-02-13_09-54-38_screenshot.png
Binary file added static/ox-hugo/2024-02-13_10-11-32_screenshot.png
