kb autocommit

Jemoka · Feb 8, 2024 · 3dc4973
1 parent 8595bd2
commit 3dc4973

Showing 3 changed files with 114 additions and 0 deletions.
diff --git a/content/posts/KBhlm_alignment.md b/content/posts/KBhlm_alignment.md
@@ -0,0 +1,47 @@
++++
+title = "LM Alignment"
+author = ["Houjun Liu"]
+draft = false
++++
+
+Been Kim
+
+The alignment problem involves "aligning" a machine's representation space of the world with that of humans. An alternative perspective: **teach <span class="underline"><span class="underline">humans</span></span> new concepts so they can understand and communicate better**
+
+
+## feature attribution doesn't work {#feature-attribution-doesn-t-work}
+
+We take that perspective because many existing interpretability methods don't work well (feature permutation, etc.): feature-attribution-type analyses ("Impossibility Theorems", Been Kim) actually show no correlation with predictive results.
+
+
+## where a model stores information is unrelated to model-editing success {#feature-information-store-in-models-is-unrelated-to-model-edit-success}
+
+i.e.: the location of stored knowledge found with ROME's causal tracing, though it gives a sense of where information is stored, does not correlate with the success of model editing.
+
+
+## can we use ML to teach people? {#can-we-use-ml-to-teach-people}
+
+For instance, we can use AlphaZero to teach chess grandmasters new concepts, and see whether this has a quantitative impact on their play.
+
+
+### concept {#concept}
+
+A [concept](#concept) is a unit of knowledge that's **useful for a task**. Two properties:
+
+1.  **minimality**: irrelevant information has been removed
+2.  **transferability**: it can be taught atomically
+
+
+#### filtering for good [concept](#concept)s {#filtering-for-good-concept--orgcd99e8f--s}
+
+We represent a concept as a sparse vector in the latent space. To check whether a concept is transferable, we teach it to a student agent and measure, with KL divergence, how closely the student's behavior then matches the teacher's.
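A minimal sketch of the KL-divergence check, assuming the teacher (the model the concept was extracted from) and the student each output a probability distribution over the same candidate moves for a position. The function names and example distributions are hypothetical; the notes only say KL divergence is used, not exactly which distributions are compared.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same moves."""
    # Terms with p_i = 0 contribute nothing; q_i is clamped to eps so a
    # zero-probability move gives a large finite penalty instead of an error.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def transfer_gain(teacher, student_before, student_after):
    """Drop in KL(teacher || student) after teaching; positive means the student moved closer."""
    return kl_divergence(teacher, student_before) - kl_divergence(teacher, student_after)


# Hypothetical move distributions over three candidate moves for one position.
teacher = [0.7, 0.2, 0.1]
before = [0.2, 0.5, 0.3]
after = [0.6, 0.3, 0.1]
print(transfer_gain(teacher, before, after))
```

In practice one would average this gain over many held-out positions rather than trust a single one.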
+
+
+#### demonstration learning {#demonstration-learning}
+
+Instead of doing demonstration learning on machines, do it on **humans**. Filter for the [concept](#concept)s that are well operationalized.
+
+
+## AlphaZero {#alpha-zero}
+
+recap: a deep neural network embeds the board position and outputs a move policy and a value estimate, which then guide [MCTS]({{< relref "KBhmonte_carlo_tree_search.md" >}}).
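For reference, a sketch of the PUCT selection rule that AlphaZero-style MCTS uses to pick which move to explore: each edge scores \\(Q(s,a) + c\_{puct} P(s,a) \frac{\sqrt{\sum\_b N(s,b)}}{1 + N(s,a)}\\), where \\(P\\) is the network's prior. The `Edge` class, the `select_action` helper, and `c_puct = 1.5` are illustrative assumptions, not DeepMind's code; the `max(total, 1)` guard is one common way to let priors break ties before any visits.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics MCTS keeps for one (state, action) pair."""
    prior: float            # P(s, a) from the network's policy head
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a): sum of values backed up through this edge

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); 0 for an unvisited edge.
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the action maximising Q(s, a) + U(s, a) (the PUCT rule)."""
    total = sum(e.visits for e in edges.values())
    sqrt_total = math.sqrt(max(total, 1))  # guard so priors matter before any visits

    def score(edge: Edge) -> float:
        u = c_puct * edge.prior * sqrt_total / (1 + edge.visits)
        return edge.q + u

    return max(edges, key=lambda action: score(edges[action]))


# Hypothetical statistics at the root after a few simulations.
root = {
    "e4": Edge(prior=0.5, visits=10, value_sum=3.0),
    "d4": Edge(prior=0.3, visits=2, value_sum=1.2),
    "c4": Edge(prior=0.2),
}
print(select_action(root))
```

Here "d4" wins: its mean value is high and it is still lightly explored, while the unvisited "c4" gets a large exploration bonus but has a lower prior.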
diff --git a/content/posts/KBhpset_5.md b/content/posts/KBhpset_5.md
@@ -533,3 +533,69 @@ a(1-y)x - a(1-y) + a((x-1)y-(x-1)) = 0
 \end{equation}
 
 as desired.
+
+
+### Problem 14.6 {#problem-14-dot-6}
+
+
+#### Part a {#part-a}
+
+We have:
+
+\begin{equation}
+s = 1-i
+\end{equation}
+
+and that:
+
+\begin{equation}
+i'(t) = \tau s(t) i(t) - \gamma i(t)
+\end{equation}
+
+Substituting the former into the latter:
+
+\begin{equation}
+i'(t) = \tau (1-i(t))i(t) - \gamma i(t)
+\end{equation}
+
+Expanding the product:
+
+\begin{equation}
+i'(t) = \tau i(t) -\tau  i^{2}(t) - \gamma i(t)
+\end{equation}
+
+which finally means that:
+
+\begin{equation}
+i'(t) = (\tau -\gamma )i(t) - \tau i^{2}(t)
+\end{equation}
+
+Now, factoring \\(i(t)\\) out, we obtain:
+
+\begin{equation}
+i'(t) = i(t) ((\tau-\gamma) - \tau i(t))
+\end{equation}
+
+Setting \\(i'(t) = 0\\) yields two stationary points: either \\(i(t) = 0\\), or \\((\tau - \gamma) - \tau i(t) = 0\\), meaning \\(i(t) = 1 - \frac{\gamma}{\tau}\\).
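A quick numerical check of the algebra above, assuming illustrative rates \\(\tau = 0.5\\) and \\(\gamma = 0.2\\) (not from the problem statement): the substituted and factored forms agree, and both stationary points make the right-hand side vanish.

```python
def rhs_substituted(i, tau, gamma):
    """i'(t) = tau * (1 - i) * i - gamma * i, i.e. with s = 1 - i substituted."""
    return tau * (1 - i) * i - gamma * i


def rhs_factored(i, tau, gamma):
    """i'(t) = i * ((tau - gamma) - tau * i), the factored form."""
    return i * ((tau - gamma) - tau * i)


def stationary_points(tau, gamma):
    """Roots of i'(t) = 0: i = 0 and i = 1 - gamma / tau."""
    return [0.0, 1 - gamma / tau]


tau, gamma = 0.5, 0.2  # hypothetical rates
for i in [0.0, 0.1, 0.37, 0.9]:
    # The two forms must agree at every point, not just at the roots.
    assert abs(rhs_substituted(i, tau, gamma) - rhs_factored(i, tau, gamma)) < 1e-12
for i_star in stationary_points(tau, gamma):
    print(i_star, rhs_factored(i_star, tau, gamma))
```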
+
+
+#### Part b {#part-b}
+
+If \\(R\_0 = \frac{\tau}{\gamma} < 1\\), then we have that \\(\tau < \gamma\\).
+
+Hence \\(\frac{\gamma}{\tau} > 1\\), so \\(1 - \frac{\gamma}{\tau} < 0\\). The nonzero stationary point is therefore negative, which is not a valid proportion of infected individuals, so there is no positive stationary value.
+
+As for \\(i(t) = 0\\), differentiating the right-hand side \\(f(i) = (\tau - \gamma) i - \tau i^{2}\\) with respect to \\(i\\) yields \\(f'(i) = (\tau - \gamma) - 2\tau i\\). At \\(i = 0\\) the second term vanishes, leaving \\(f'(0) = \tau - \gamma < 0\\), so small perturbations decay. Hence the stationary point \\(i(t) = 0\\) is stable.
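The stability test used here (the sign of the derivative of the right-hand side at a stationary point) can be sketched as follows; the rates are hypothetical, chosen so that \\(R\_0 = \tau/\gamma = 0.5 < 1\\).

```python
def rhs_derivative(i, tau, gamma):
    """d/di of f(i) = (tau - gamma) * i - tau * i**2."""
    return (tau - gamma) - 2 * tau * i


def classify(i_star, tau, gamma):
    """Linear stability: negative slope means stable, positive means unstable."""
    slope = rhs_derivative(i_star, tau, gamma)
    if slope < 0:
        return "stable"
    if slope > 0:
        return "unstable"
    return "inconclusive"  # zero slope: linearization says nothing


tau, gamma = 0.2, 0.4  # hypothetical rates, R0 = 0.5
print(classify(0.0, tau, gamma))
```

The same classifier applies when \\(R\_0 > 1\\): for instance, \\(\tau = 0.5, \gamma = 0.2\\) makes \\(i = 0\\) unstable and \\(i = 1 - \gamma/\tau = 0.6\\) stable.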
+
+
+#### Part c {#part-c}
+
+The argument mirrors part \\(b\\), with the inequalities reversed.
+
+If \\(R\_0 = \frac{\tau}{\gamma} > 1\\), then we have that \\(\tau > \gamma\\).
+
+Now \\(1 - \frac{\gamma}{\tau} > 0\\), so the nonzero stationary point is valid. Differentiating the right-hand side \\(f(i) = (\tau - \gamma) i - \tau i^{2}\\) with respect to \\(i\\) yields \\(f'(i) = (\tau - \gamma) - 2\tau i\\). At \\(i(t) = 1 - \frac{\gamma}{\tau}\\), this simplifies to \\((\tau - \gamma) - 2\tau + 2\gamma = \gamma - \tau < 0\\), so this stationary point is stable.
+
+As for \\(i(t) = 0\\), the derivative of the right-hand side there is \\(\tau - \gamma\\), since the \\(-2\tau i\\) term vanishes; this is now positive, so small perturbations grow. Hence the stationary point \\(i(t) = 0\\) is unstable.
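To see both regimes numerically, here is a forward-Euler sketch of \\(i'(t) = (\tau - \gamma) i(t) - \tau i^{2}(t)\\); step size, horizon, and rates are arbitrary illustrative choices. Starting from a small infection level, the trajectory should settle at \\(1 - \frac{\gamma}{\tau}\\) when \\(R\_0 > 1\\) and decay to \\(0\\) when \\(R\_0 < 1\\).

```python
def simulate(i0, tau, gamma, dt=0.01, steps=20_000):
    """Forward-Euler integration of i'(t) = (tau - gamma) * i - tau * i**2."""
    i = i0
    for _ in range(steps):
        i += dt * ((tau - gamma) * i - tau * i * i)
    return i


# R0 = 2.5 > 1: starts near the unstable point i = 0, ends near 1 - gamma/tau.
print(simulate(0.01, tau=0.5, gamma=0.2), 1 - 0.2 / 0.5)

# R0 = 0.5 < 1: the infection level dies out.
print(simulate(0.3, tau=0.2, gamma=0.4))
```

Forward Euler keeps the same stationary points as the ODE, so with a small enough step it converges to exactly the values derived above.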
diff --git a/content/posts/KBhstanford_courses_index.md b/content/posts/KBhstanford_courses_index.md
@@ -44,6 +44,7 @@ Here are a list of random indicies which may end up being helpful!
 | <span class="timestamp-wrapper"><span class="timestamp">&lt;2023-11-16 Thu&gt;</span></span> | Dissociating Language and Thought | Anna Ivanova       | [Dissociating Language and Thought]({{< relref "KBhdissociating_language_and_thought.md" >}}) |
 | <span class="timestamp-wrapper"><span class="timestamp">&lt;2024-01-11 Thu&gt;</span></span> | Language Agents                   | Karthik Narasimhan | [Language Agents with Karthik]({{< relref "KBhlanguage_agents.md" >}})                        |
 | <span class="timestamp-wrapper"><span class="timestamp">&lt;2024-02-01 Thu&gt;</span></span> |                                   |                    | [Pretraining Data]({{< relref "KBhpretraining_data.md" >}})                                   |
+| <span class="timestamp-wrapper"><span class="timestamp">&lt;2024-02-08 Thu&gt;</span></span> | Value Alignment                   | Been Kim           | [LM Alignment]({{< relref "KBhlm_alignment.md" >}})                                           |
 
 
 ## Contacts {#contacts}