diff --git a/Aalto2023.Rmd b/Aalto2023.Rmd
index f804f137..92dc3301 100644
--- a/Aalto2023.Rmd
+++ b/Aalto2023.Rmd
@@ -199,7 +199,7 @@ the book chapters relaed to the next lecture and assignment.
| 7\. Hierarchical models and exchangeability | [BDA3 Chapter 5](BDA3_notes.html#ch5) | [2023 Lecture 7.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=c1014690-1133-4232-ad0f-b0a400ba228d),
[2023 Lecture 7.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=196c3a91-3ba2-4469-ab15-b0a400ca6074),
[2022 Project info](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=8f0158f9-6abf-4ada-bdb7-af3800d139de),
[Slides 7](slides/BDA_lecture_7.pdf) | [Assignment 7](assignments/assignment7.html) | `r sdate("Lecture date", "Week8")` | `r sdate("Assignment closes (23:59)", "Week8")` |
| 8\. Model checking & cross-validation | [BDA3 Chapter 6](BDA3_notes.html#ch6), [BDA3 Chapter 7](BDA3_notes.html#ch7), [Visualization in Bayesian workflow](https://doi.org/10.1111/rssa.12378), [Practical Bayesian cross-validation](https://arxiv.org/abs/1507.04544) | [2023 Lecture 8.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=785ece8a-16ef-4f64-8134-b0ab00cbd1e8),
[2023 Lecture 8.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=456afda7-0e6d-4903-b0df-b0ab00da8f1e),
[Slides 8a](slides/BDA_lecture_8a.pdf),[Slides 8b](slides/BDA_lecture_8b.pdf) | Start project work | `r sdate("Lecture date", "Week9")` | `r sdate("Assignment closes (23:59)", "Week9")` |
| 9\. Model comparison, selection, and hypothesis testing | [BDA3 Chapter 7 (not 7.2 and 7.3)](BDA3_notes.html#ch7),
[Practical Bayesian cross-validation](https://arxiv.org/abs/1507.04544) | [2023 Lecture 9.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a4961b5a-7e42-4603-8aaf-b0b200ca6295),
[2023 Lecture 9.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a4796c79-eab2-436e-b55f-b0b200dac7ce),
[Slides 9](slides/BDA_lecture_9.pdf) | [Assignment 8](assignments/assignment8.html) | `r sdate("Lecture date", "Week10")` | `r sdate("Assignment closes (23:59)", "Week10")` |
-| 10\. Decision analysis | [BDA3 Chapter 9](BDA3_notes.html#ch9) | [2022 Lecture 10.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a22aab9c-953c-4ea8-b6ec-af4d00c9fe58),
[Slides 10](slides/BDA_lecture_10.pdf) | [Assignment 9](assignments/assignment9.html) | `r sdate("Lecture date", "Week11")` | `r sdate("Assignment closes (23:59)", "Week11")` |
+| 10\. Decision analysis | [BDA3 Chapter 9](BDA3_notes.html#ch9) | [2022 Lecture 10.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a22aab9c-953c-4ea8-b6ec-af4d00c9fe58),
[Slides 10a](slides/BDA_lecture_10a.pdf), [Slides 10b](slides/BDA_lecture_10b.pdf) | [Assignment 9](assignments/assignment9.html) | `r sdate("Lecture date", "Week11")` | `r sdate("Assignment closes (23:59)", "Week11")` |
| 11\. Normal approximation, frequency properties | [BDA3 Chapter 4](BDA3_notes.html#ch4) | [2022 Lecture 11.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=8cde4d40-1b77-4110-af98-af5400ca38b5),
[2022 Lecture 11.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=d83f6553-1516-475f-8898-af5400dd7b50),
[Slides 11](slides/BDA_lecture_11.pdf) | Project work | `r sdate("Lecture date", "Week12")` | `r sdate("Assignment closes (23:59)", "Week12")` |
| 12\. Extended topics | Optional: BDA3 Chapters [8](BDA_notes.hml#ch8), [14-18](BDA_notes.hml#ch14), and 21 | Optional:
[Old Lecture 12.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=e998b5dd-bf8e-42da-9f7c-ab1700ca2702),
[Old Lecture 12.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=c43c862a-a5a4-45da-9b27-ab1700e12012),
[Slides 12](slides/BDA_lecture_12.pdf) | Project work | `r sdate("Lecture date", "Week13")` | `r sdate("Assignment closes (23:59)", "Week13")` |
| 13\. Project evaluation | | | | Project presentations: `r params$project_presentations` | Evaluation week |
@@ -526,16 +526,17 @@ variable selection with projection predictive variable selection.
`r sdate("Lecture date", "Week10", offset = 4)` 10-12.
- Start reading Chapter 9, see instructions below.
-### 10) BDA3 Ch 9, decision analysis
+### 10) BDA3 Ch 9, decision analysis + BDA3 Ch 4 Laplace approximation and asymptotics
-Decision analysis. BDA3 Ch 9.
+Decision analysis. BDA3 Ch 9. + Laplace approximation and asymptotics. BDA3 Ch 4.
-- Read Chapter 9
+- Read Chapters 9 and 4
- see [reading instructions for Chapter 9](BDA3_notes.html#ch9)
+ - see [reading instructions for Chapter 4](BDA3_notes.html#ch4)
- **Lecture `r paste(sday("Lecture date", "Week11"), sdate("Lecture date", "Week11"))` 14:15-16, hall T2, CS building**
- - [Slides 10](slides/BDA_lecture_10.pdf)
+ - [Slides 10a](slides/BDA_lecture_10a.pdf), [Slides 10b](slides/BDA_lecture_10b.pdf)
- Videos: [2022 Lecture 10.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a22aab9c-953c-4ea8-b6ec-af4d00c9fe58)
- on decision analysis. BDA3 Ch 9.
+ on decision analysis. BDA3 Ch 9, and [2022 Lecture 11.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=8cde4d40-1b77-4110-af98-af5400ca38b5) on normal approximation (Laplace approximation), large sample theory, and counter examples, BDA3 Ch 4.
- Make and submit [Assignment 9](assignments/assignment9.html).
**`r sday("Assignment closes (23:59)", "Week11")` `r sdate("Assignment closes (23:59)", "Week11")` 23:59**
- Review Assignment 8 done by your peers before 23:59
diff --git a/Aalto2023.html b/Aalto2023.html
index 857cc11e..b62a3954 100644
--- a/Aalto2023.html
+++ b/Aalto2023.html
@@ -1612,7 +1612,7 @@
Bayesian Data Analysis course - Aalto
-Page updated: 2023-11-07
+Page updated: 2023-11-13
@@ -1914,8 +1914,8 @@ Schedule overview
10. Decision analysis |
BDA3 Chapter 9 |
-Lecture 10.1, Slides
-10 |
+Lecture 10.1,
+10a, Slides 10b
Assignment 9 |
13.11. |
19.11. |
@@ -2330,21 +2330,27 @@ 9) BDA3 Ch 7, extra material, model comparison and selection
Start reading Chapter 9, see instructions below.
10) BDA3 Ch 9, decision analysis
Decision analysis. BDA3 Ch 9.
10) BDA3 Ch 9, decision analysis + BDA3 Ch 4 Laplace approximation
+and asymptotics
Decision analysis. BDA3 Ch 9. + Laplace approximation and
+asymptotics. BDA3 Ch 4.
-- Read Chapter 9
- Read Chapter 9 and 4
- Lecture Monday 13.11. 14:15-16, hall T2, CS
Make and submit Assignment
9. Sunday 19.11. 23:59
diff --git a/slides/BDA_lecture_10a.pdf b/slides/BDA_lecture_10a.pdf
new file mode 100644
index 00000000..fa39664b
Binary files /dev/null and b/slides/BDA_lecture_10a.pdf differ
diff --git a/slides/BDA_lecture_10a.tex b/slides/BDA_lecture_10a.tex
new file mode 100644
index 00000000..962cd2f4
--- /dev/null
+++ b/slides/BDA_lecture_10a.tex
@@ -0,0 +1,549 @@
+\usepackage{newtxtext} % times
+%\usepackage[scaled=.95]{cabin} % sans serif
+\usepackage[varqu,varl]{inconsolata} % typewriter
+\usefonttheme[onlymath]{serif} % beamer font theme
+% \usepackage{amsbsy}
+% \usepackage{eucal}
+ \setbeamercovered{invisible}
+ \setbeamertemplate{itemize items}[circle]
+ \setbeamercolor{frametitle}{bg=white,fg=navyblue}
+ \setbeamertemplate{navigation symbols}{}
+ \setbeamertemplate{headline}[default]{}
+ \setbeamertemplate{footline}[split]
+ % \setbeamertemplate{headline}[text line]{\insertsection}
+ % \setbeamertemplate{footline}[frame number]
+ /Title (BDA, Lecture 10)
+ /Author (Aki Vehtari) %
+ /Keywords (Bayesian data analysis)
+% Lists
+ \begin{list}{$\color{list1}\bullet$}{\itemsep=6pt}}{
+ \end{list}}
+ \begin{list}{$\includegraphics[width=5pt]{logo.eps}$}{\itemsep=6pt}}{
+ \end{list}}
+ \begin{list}{-}{\baselineskip=12pt\itemsep=2pt}}{
+ \end{list}}
+ \begin{list}{$\cdot$}{\baselineskip=15pt}}{
+ \end{list}}
+\def\o{{\mathbf o}}
+\def\t{{\mathbf \theta}}
+\def\w{{\mathbf w}}
+\def\x{{\mathbf x}}
+\def\y{{\mathbf y}}
+\def\z{{\mathbf z}}
+% \DeclareMathOperator{\Pr}{Pr}
+\def\euro{{\footnotesize \EUR\, }}
+% \def\dashxy(#1){%
+% /xydash{[#1] 0 setdash}def}
+% \def\grayxy(#1){%
+% /xycolor{#1 setgray}def}
+% \newgraphescape{D}[1]{!{\ar @*{[!\dashxy(2 2)]} "#1"}}
+% \newgraphescape{P}[1]{!{\ar "#1"}}
+% \newgraphescape{F}[1]{!{*+=<2em>[F=]{#1}="#1"}}
+% \newgraphescape{O}[1]{!{*+=<2em>[F]{#1}="#1"}}
+% \newgraphescape{V}[1]{!{*+=<2em>[o][F]{#1}="#1"}}
+% \newgraphescape{B}[3]{!{{ "#1"*+#3\frm{} }.{ "#2"*+#3\frm{} } *+[F:!\grayxy(0.75)]\frm{}}}
+\title[]{Bayesian data analysis}
+\author{Aki Vehtari}
+\begin{frame}{Chapter 9 Decision Analysis}
+ \begin{list1}
+\item 9.1 Context and basic steps (most important part)
+\item 9.2 Example
+\item 9.3 Multistage decision analysis (example)
+\item 9.4 Hierarchical decision analysis (example)
+\item 9.5 Personal vs. institutional decision analysis
+\begin{frame}{Bayesian decision theory}
+ \begin{list1}
+ \item<+-> Potential decisions ${\color{set12}d}$
+ \begin{list2}
+ \item or actions ${\color{set12}a}$
+ \end{list2}
+ \item<+-> Potential consequences ${\color{set11}x}$
+ \begin{list2}
+ \item ${\color{set11}x}$ may be categorical, ordinal, real, scalar, vector, etc.
+ \end{list2}
+ \item<+-> Probability distributions of consequences given decisions $p({\color{set11}x}\mid{\color{set12}d})$
+ \begin{list2}
+ \item in decision making the decisions are controlled and thus $p({\color{set12}d})$ does not exist
+ \end{list2}
+ \item<+-> Utility function ${\color{set13}U}({\color{set11}x})$ maps consequences to real value
+ \begin{list2}
+ \item e.g. euro or expected lifetime
+ \item instead of utility sometimes cost or loss is defined
+ \end{list2}
+ \vspace{-1mm}
+ \item<+-> Expected utility $\E[{\color{set13}U}({\color{set11}x})\mid{\color{set12}d}]=\int {\color{set13}U}({\color{set11}x}) p({\color{set11}x}\mid{\color{set12}d}) d{\color{set11}x}$
+ \item<+-> Choose decision ${\color{set12}d^*}$, which maximizes the expected utility
+ \begin{equation*}
+ {\color{set12}d^*}=\arg\max_{\color{set12}d} \E[{\color{set13}U}({\color{set11}x})\mid{\color{set12}d}]
+ \end{equation*}
+ \end{list1}
+{\Large\color{navyblue} Example of decision making: 2 choices}
+\item<+-> Helen is going to pick mushrooms in a forest when she notices a
+ paw print which could have been made by a dog or a wolf
+\item<+-> Helen measures that the length of the paw print is 14 cm and
+ goes home to Google how big the paws of dogs and wolves are, and then
+ tries to infer which animal made the paw print
+ \includegraphics[width=11cm]{hatutus_likelihoods}
+ observed length has been marked with a horizontal line
+\item<+-> Likelihood of wolf is 0.92 (alternative being dog)
+{\Large\color{navyblue} Example of decision making}
+ \begin{list1}
+ \item<+-> Helen also assumes that in the area where she lives there are about one
+ hundred times more free-running dogs than wolves, that is, the {\em a
+ priori} probability of a wolf, before the observation, is about 1\%.
+ \item<+-> Likelihood and posterior
+ \begin{center}\leavevmode
+ \begin{tabular}{| l | c c |}
+ \hline
+ Animal & Likelihood & Posterior probability \\
+ \hline
+ Wolf & 0.92 & 0.10 \\
+ Dog & 0.08 & 0.90 \\
+ \hline
+ \end{tabular}
+ \end{center}
+ \item<+-> Posterior probability of wolf is 10\%
+ \end{list1}
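+ % Worked check added for illustration, not part of the original slides.
+ % It assumes the likelihoods in the table (0.92, 0.08) and the 1\% prior stated above.
+ As a check, assuming the likelihoods and the prior above, Bayes' rule gives
+ \begin{equation*}
+ p(\mathrm{wolf}\mid y)=\frac{0.92\cdot 0.01}{0.92\cdot 0.01+0.08\cdot 0.99}
+ =\frac{0.0092}{0.0884}\approx 0.10
+ \end{equation*}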
+{\Large\color{navyblue} Example of decision making}
+ \begin{list1}
+ \item<+-> Helen has to make decision whether to go pick mushrooms
+ \item<+-> If she doesn't go to pick mushrooms utility is zero
+ \item<+-> Helen assigns positive utility 1 for getting fresh mushrooms
+ \item<+-> Helen assigns negative utility $-1000$ for the event that she goes to the forest and a wolf attacks (for some reason Helen assumes that the wolf will always attack)\\
+ \vspace{0.5\baselineskip}
+ \uncover<+->{
+ \begin{minipage}[t]{58mm}
+ \small
+ \begin{tabular}{| l | c c |}
+ \hline
+ & \multicolumn{2}{ c |}{Animal} \\
+ Decision ${\color{set12}d}$ & Wolf & Dog \\
+ \hline
+ Stay home & 0 & 0 \\
+ Go to the forest & -1000 & 1 \\
+ \hline
+ \end{tabular}\\
+ {Utility matrix ${\color{set13}U}({\color{set11}x})$}
+ \end{minipage}
+ }
+ ~\\
+ \vspace{0.5\baselineskip}
+ \uncover<+->{
+ \begin{minipage}[t]{58mm}
+ \small
+ \begin{tabular}{| l | c | }
+ \hline
+ & Expected utility \\
+ Action $d$ & $\E[{\color{set13}U}({\color{set11}x})\mid{\color{set12}d}]$ \\
+ \hline
+ Stay home & 0 \\
+ Go to the forest & -100+0.9 \\
+ \hline
+ \end{tabular}\\
+ {Utilities for different actions}
+ \end{minipage}
+ \end{list1}
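+ % Illustrative arithmetic added as a sketch, not part of the original slides.
+ % It assumes the posterior probabilities 0.10 (wolf) and 0.90 (dog) and the utility matrix above.
+ Spelled out, assuming posterior probabilities 0.10 for wolf and 0.90 for dog:
+ \begin{align*}
+ \E[U(x)\mid \mathrm{go}] &= 0.10\cdot(-1000)+0.90\cdot 1=-99.1\\
+ \E[U(x)\mid \mathrm{stay}] &= 0.10\cdot 0+0.90\cdot 0=0
+ \end{align*}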
+\begin{frame}{Example of decision making}
+ \begin{list1}
+ \item<+-> Maximum likelihood decision would be to assume that there is a wolf
+ \item<+-> Maximum posterior decision would be to assume that there is a dog
+ \item<+-> Maximum utility decision is to stay home, even if it is more likely that the animal is a dog
+ \item<+-> Example illustrates that the uncertainties (probabilities)
+ related to all consequences need to be carried along until the final
+ decision making
+ \end{list1}
+% \begin{frame}
+% {\Large\color{navyblue} Example of decision making: several choices}
+% \begin{list1}
+% \item Prof. Gelman has a jar of quarters
+% \begin{list2}
+% \item he first drew a line on the side of the jar and then
+% filled the jar up to the line, and so the number coins was not
+% chosen beforehand
+% \item Prof. Gelman does not know the number of coins in the jar
+% \item<2-> Prof. Gelman gives the class a chance to win the coins if
+% they guess the number of coins correctly (someone else has
+% counted the coins without telling Gelman)
+% \item<2-> How should the students make the decision?
+% \end{list2}
+% \end{list1}
+% \end{frame}
+\begin{frame}{Example of decision making: several choices}
+\item You decide to earn money by selling a seasonal product
+ \begin{list2}
+ \item You pay 7€ each, and sell them for 10€ each
+ \item You need to decide how many ($N$) items to buy
+ \item<2-> You ask your friends how many they used to sell and estimate a
+ distribution for how many you might sell
+ \end{list2}
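+ % Sketch added for illustration, not part of the original slides.
+ % Assumptions: x is the number of items you could sell, p(x) is the estimated
+ % sales distribution, and unsold items have no resale value.
+ One way to write this down, assuming unsold items are worth nothing and $x$ is the number you could sell:
+ \begin{equation*}
+ U(x,N)=10\min(x,N)-7N,\qquad
+ \E[U\mid N]=\sum_{x}\left(10\min(x,N)-7N\right)p(x)
+ \end{equation*}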
+\begin{frame}{Example of decision making: several choices}
+ {\includegraphics[width=9cm]{sales_dist2.pdf}}\\
+ \only<2>{\vspace{-0.5\baselineskip}\includegraphics[width=9cm]{sales_utility_20_30.pdf}}
+ \only<3>{\vspace{-0.5\baselineskip}\includegraphics[width=9cm]{sales_utilprob_20_30.pdf}}
+ \only<4>{\vspace{-0.5\baselineskip}\includegraphics[width=9cm]{sales_utilprob_exputil_20_30.pdf}}
+ \only<5>{\vspace{-0.5\baselineskip}\includegraphics[width=9cm]{sales_exputil.pdf}}
+\begin{frame}{Decision making in sales}
+ \begin{list1}
+ \item Common task in commerce and restaurants
+ \end{list1}
+\begin{frame}{Challenges in decision making}
+ \begin{list1}
+ \item Actual utility functions are rarely linear
+ \begin{list2}
+ \item<2-> with a linear utility the expected utility is 5€ for\\
+ a) 100\% chance of receiving 5€\\
+ b) 50\% chance of losing 1M€ and 50\% chance of winning 1.00001M€
+ \item<3-> most gambling has negative expected utility\\
+ (but the excitement of the game may have positive utility)
+ \end{list2}
+ \item<4-> What is the cost of human life?
+ \item<5-> Multiple parties having different utilities
+ \end{list1}
+\begin{frame}{Model selection as decision problem}
+ \begin{list1}
+ \item Choose the model that maximizes the expected utility of using
+ the model to make predictions / decisions in the future
+ \end{list1}
+\begin{frame}{Multi-stage decision making (Section 9.3)}
+ \vspace{-0.3\baselineskip}
+ \begin{list1}
+ \item<+-> A 95-year-old has a tumor that is malignant with 90\% probability
+ \item<+-> Based on statistics
+ \begin{list2}
+ \item<.-> expected lifetime is 34.8 months if no cancer
+ \item<+-> expected lifetime is 16.7 months if cancer and radiation therapy is used
+ \item<+-> expected lifetime is 20.3 months if cancer and surgery, but the probability of dying in surgery is 35\% (for a 95-year-old)
+ \item<+-> expected lifetime is 5.6 months if cancer and no treatment
+ \end{list2}
+ \item<+-> Which treatment to choose?
+ \begin{list2}
+ \item<.-> quality adjusted life time
+ \item<.-> 1 month is subtracted for the time spent in treatments
+ \end{list2}
+ \item<+-> Quality adjusted life time
+ \begin{list2}
+ \item<.-> See the book for the multi-stage decision making
+ % \item<.-> Radiotherapy: 0.9*16.7 + 0.1*34.8 - 1 = 17.5mo
+ % \item<+-> Surgery: 0.35*0 + 0.65*(0.9*20.3 + 0.1*34.8 - 1) = 13.5mo
+ % \item<+-> No treatment: 0.9*5.6 + 0.1*34.8 = 8.5mo
+ \end{list2}
+ % \item<+-> See the book for continuation of the example with
+ % additional test for cancer
+\begin{frame}{Design of experiment}
+ \begin{list1}
+ \item Which experiment would give most additional information
+ \begin{list2}
+ \item decide values $x_{n+1}$ for the next experiment
+ \item which values of $x_{n+1}$ would reduce the posterior
+ uncertainty or increase the expected utility most
+ \end{list2}
+ \item<2-> Example 1
+ \begin{list2}
+ \item biopsy in the cancer example
+ \end{list2}
+ \item<3-> Example 2
+ \begin{list2}
+ \item Imagine that in bioassay the posterior uncertainty of LD50 is too large
+ \item which dose should be used in the next experiment to reduce
+ the variance of LD50 as much as possible?
+ \begin{list3}
+ \item this way fewer experiments need to be made (and fewer animals need to be killed)
+ \end{list3}
+ \end{list2}
+ \item<4-> Example 3
+ \begin{list2}
+ \item optimal paper helicopter wing length
+ \end{list2}
+ \end{list1}
+\begin{frame}{Bayesian optimization}
+ \begin{list1}
+ \item Design of experiment
+ \item Used to optimize, for example,
+ \begin{list2}
+ \item machine learning / deep learning model structures,
+ regularization, and learning algorithm parameters
+ \item material science
+ \item engines
+ \item drug testing
+ \item part of Bayesian inference for stochastic simulators
+ \end{list2}
+ \end{list1}
+\begin{frame}{Bayesian optimization of wing length}
+\only<1>{Start with a small number of experiments\\
+\only<2>{Gaussian process model\\
+\only<3-4>{Gaussian process model -- posterior draws\\
+ \hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_initial_fit_draws.pdf}}
+\only<5>{Gaussian process model -- Thompson sampling\\
+ \hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_1.pdf}}
+\only<6>{Gaussian process model -- Thompson sampling\\
+ \hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_1.pdf}}
+ \item<4-> Thompson sampling:
+ \begin{list2}
+ \item pick one posterior draw (function)
+ \item find the wing length corresponding to the max. of that draw
+ \item make the next observation with that wing length
+ \end{list2}
+ \end{list1}
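+ % Sketch added for illustration, not part of the original slides.
+ % The notation is assumed: f is the unknown objective as a function of wing
+ % length x, D is the set of observations made so far, s indexes the draw.
+ In symbols, a sketch with assumed notation ($f$ is the unknown objective as a function of wing length $x$, $D$ the observations so far):
+ \begin{equation*}
+ f^{(s)}\sim p(f\mid D),\qquad x_{\mathrm{next}}=\arg\max_x f^{(s)}(x)
+ \end{equation*}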
+\begin{frame}{Bayesian optimization of wing length}
+ Gaussian process model -- Thompson sampling\\
+\begin{frame}{Bayesian optimization of wing length}
+ {\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_maximizing_density_2.pdf}\\
+ \uncover<2->{33 BO obs. post. Wasserstein-1 distance $\approx$ 0.77\\
+ 33 first obs. post. Wasserstein-1 distance $\approx$ 1.36}\\}
+ \uncover<3->{~\\We obtain about 50\% increase in efficiency}
+% \only<3>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_maximizing_density.pdf}
+% 33 BO obs. post. Wasserstein-1 distance $\approx$ 0.77\\
+% 5 first + 28 random obs. post. Wasserstein-1 distance $\approx$ 1.27}
+\begin{frame}{Examples of big Bayesian decision making success stories}
+ \begin{list1}
+ \item Bayesian optimization of ML algorithms
+ \item Bayesian optimization of new medical molecules
+ \item Bayesian optimization of new materials
+ \item A/B testing
+ \item Customer retention / satisfaction
+ \item Marketing
+ \end{list1}
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% End:
diff --git a/slides/BDA_lecture_10b.pdf b/slides/BDA_lecture_10b.pdf
new file mode 100644
index 00000000..d90b769a
Binary files /dev/null and b/slides/BDA_lecture_10b.pdf differ
diff --git a/slides/BDA_lecture_10b.tex b/slides/BDA_lecture_10b.tex
new file mode 100644
index 00000000..4b653170
--- /dev/null
+++ b/slides/BDA_lecture_10b.tex
@@ -0,0 +1,1177 @@
+% Uncomment if want to show notes
+% \setbeameroption{show notes}
+ % \usetheme{Copenhagen}
+ % or ...
+ %\setbeamercovered{transparent}
+ % or not
+\setbeamertemplate{itemize items}[circle]
+ bookmarksopen=true,
+ bookmarksnumbered=true,
+ pdftitle={Stan},
+ pdfsubject={Bayesian data analysis},
+ pdfauthor={Aki Vehtari},
+ pdfkeywords={},
+ pdfstartview={FitH -32768},
+ colorlinks=true,
+ linkcolor=navyblue,
+ citecolor=navyblue,
+ filecolor=navyblue,
+ urlcolor=navyblue
+% \definecolor{hutblue}{rgb}{0,0.2549,0.6784}
+% \definecolor{midnightblue}{rgb}{0.0977,0.0977,0.4375}
+% \definecolor{hutsilver}{rgb}{0.4863,0.4784,0.4784}
+% \definecolor{lightgray}{rgb}{0.95,0.95,0.95}
+% \definecolor{section}{rgb}{0,0.2549,0.6784}
+% \definecolor{list1}{rgb}{0,0.2549,0.6784}
+ /Title (Bayesian data analysis 4)
+ /Author (Aki Vehtari) %
+ /Keywords (Bayesian probability theory, Bayesian inference, Bayesian data analysis)
+\setbeamertemplate{navigation symbols}{}
+\setbeamertemplate{headline}[text line]{\insertsection}
+\setbeamertemplate{footline}[frame number]
+\def\o{{\mathbf o}}
+\def\t{{\mathbf \theta}}
+\def\w{{\mathbf w}}
+\def\x{{\mathbf x}}
+\def\y{{\mathbf y}}
+\def\z{{\mathbf z}}
+% \DeclareMathOperator{\Pr}{Pr}
+\def\euro{{\footnotesize \EUR\, }}
+\title[]{Bayesian data analysis}
+\author{Aki Vehtari}
+\begin{frame}{Chapter 4}
+ \begin{itemize}
+ \item 4.1 Normal approximation (Laplace's method)
+ \item 4.2 Large-sample theory
+ \item 4.3 Counter examples
+ \begin{itemize}
+ \item includes examples of difficult posteriors for MCMC, too
+ \end{itemize}
+ \item {\color{gray} 4.4 Frequency evaluation*}
+ \item {\color{gray} 4.5 Other statistical methods*}
+ \end{itemize}
+\begin{frame}{Normal approximation (Laplace approximation)}
+ \begin{itemize}
+ \item Often posterior converges to normal distribution when
+ $n\rightarrow \infty$
+ \begin{itemize}
+ \item bounded, non-singular, the number of parameters doesn't grow with $n$
+ % \end{itemize}
+ % \item If posterior is unimodal and close to symmetric
+ % \begin{itemize}
+ \item we can then approximate $p(\theta|y)$ with normal distribution
+ % \only<1-3>{\begin{align*}
+ % p(\theta|y)&\approx \frac{1}{\sqrt{2\pi}\sigma_\theta}\exp\left(-\frac{1}{2\sigma_\theta^2}(\theta-\hat{\theta})^2\right)
+ % \end{align*}}
+ \item<2> Laplace used this (before Gauss) to approximate the
+ posterior of binomial model to infer ratio of girls and boys born
+ % \item<3> A strict proof by LeCam in 1950's
+ \end{itemize}
+\begin{frame}{Taylor series}
+ \begin{itemize}
+ \item We can approximate $p(\theta|y)$ with normal distribution
+ \begin{align*}
+ p(\theta|y)&\approx \frac{1}{\sqrt{2\pi}\sigma_\theta}\exp\left(-\frac{1}{2\sigma_\theta^2}(\theta-\hat{\theta})^2\right)
+ \end{align*}
+ \begin{itemize}
+ \item i.e. log posterior $\log p(\theta|y)$ can be
+ approximated with a quadratic function
+ \end{itemize}
+ \begin{align*}
+ \log p(\theta|y)& \approx \alpha(\theta-\hat{\theta})^2 + C
+ \end{align*}
+ \item<2-> Corresponds to Taylor series expansion around $\theta=\hat{\theta}$
+ \begin{equation*}
+ f(\theta)=f(\hat{\theta}) {\only<3->{\color{gray}}+ f'(\hat{\theta})(\theta-\hat{\theta})} + \frac{f''(\hat{\theta})}{2!}(\theta-\hat{\theta})^2 {\only<4->{\color{gray}} +\frac{f^{(3)}(\hat{\theta})}{3!}(\theta-\hat{\theta})^3+\ldots}
+ \end{equation*}
+ \begin{itemize}
+ \item<3-> if $\hat{\theta}$ is at mode, then $f'(\hat{\theta})=0$
+ \item<4-> often when $n \rightarrow \infty$, $\frac{f^{(3)}(\hat{\theta})}{3!}(\theta-\hat{\theta})^3+\ldots$ is small
+ \end{itemize}
+ \end{itemize}
+\begin{frame}{Multivariate Taylor series}
+ \begin{itemize}
+ \item Multivariate series expansion
+ \begin{equation*}
+ f(\theta)= f(\hat{\theta}) {\color{gray} + \frac{d f(\theta')}{d \theta'}_{\theta'=\hat{\theta}}(\theta-\hat{\theta})}
+ + \frac{1}{2!}(\theta-\hat{\theta})^T \frac{d^2 f(\theta')}{d \theta'^2}_{\theta'=\hat{\theta}} (\theta-\hat{\theta}) {\color{gray} + \ldots}
+ % \sum_{j=0}^\infty\left\{\frac{1}{j!}\left[\sum_{k=1}^n(x_k-a_k)\frac{\partial}{\partial x_k'}\right]^j
+ % f(x_1',\ldots,x_n')\right\}_{x_1'=a_1,\ldots,x_n'=a_n}
+ \end{equation*}~
+ \end{itemize}
+% \note{Is there anyone who does not remember the Taylor series expansion?
+% Named after Taylor, who lived in the 1700s
+\begin{frame}{Normal approximation}
+ \begin{itemize}
+ \only<1-2>{
+ \item Taylor series expansion of the log posterior around the posterior mode
+ $\hat{\theta}$
+ \begin{equation*}
+ \log p(\theta|y) = \log p(\hat{\theta}|y) +
+ \frac{1}{2}(\theta-\hat{\theta})^T\left[\frac{d^2}{d\theta'^2}\log p(\theta'|y) \right]_{\theta'=\hat{\theta}} (\theta-\hat{\theta})+\ldots
+ \end{equation*}
+ }
+ \item<2-> Multivariate normal $\propto\left|\Sigma\right|^{-1/2} \exp\left(-\frac{1}{2}(\theta-\hat{\theta})^T\Sigma^{-1}(\theta-\hat{\theta})\right)$
+ \end{itemize}
+ \vspace{-0.5\baselineskip}
+ \only<3>{
+ \includegraphics[width=11cm]{cond_excat_norm.pdf}}
+ \only<4>{
+ \includegraphics[width=11cm]{cond_excat_norm_log.pdf}}
+\begin{frame}{Normal approximation}
+ \begin{itemize}
+ % \item Taylor series expansion of the log posterior around the posterior mode
+ % $\hat{\theta}$
+ % \begin{equation*}
+ % \log p(\theta|y) = \log p(\hat{\theta}|y) +
+ % \frac{1}{2}(\theta-\hat{\theta})^T\left[\frac{d^2}{d\theta^2}\log p(\theta'|y) \right]_{\theta'=\hat{\theta}} (\theta-\hat{\theta})+\ldots
+ % \end{equation*}
+ % \item Multivariate normal $\propto\left|\Sigma\right|^{-1/2} \exp\left(-\frac{1}{2}(\theta-\hat{\theta}^T)\Sigma^{-1}(\theta-\hat{\theta})\right)$
+ \item Normal approximation
+ \begin{equation*}
+ p(\theta|y) \approx \N(\hat{\theta},[I(\hat{\theta})]^{-1})
+ \end{equation*}
+ where $I(\theta)$ is called {\em observed information}
+ \begin{equation*}
+ I(\theta) = - \frac{d^2}{d\theta^2}\log p(\theta|y)
+ \end{equation*}
+ \uncover<2->{Hessian $H(\theta)=-I(\theta)$}
+ \end{itemize}
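+ % Derivation step added as a sketch, not part of the original slides.
+ % It drops the higher-order terms of the Taylor expansion around the mode.
+ \begin{itemize}
+ \item Why the covariance is $[I(\hat{\theta})]^{-1}$ (a sketch, dropping the higher-order Taylor terms)
+ \begin{align*}
+ \log p(\theta|y) &\approx \log p(\hat{\theta}|y) - \frac{1}{2}(\theta-\hat{\theta})^T I(\hat{\theta})(\theta-\hat{\theta})\\
+ p(\theta|y) &\approx p(\hat{\theta}|y)\exp\left(-\frac{1}{2}(\theta-\hat{\theta})^T I(\hat{\theta})(\theta-\hat{\theta})\right)
+ \end{align*}
+ which has the form of a multivariate normal density with $\Sigma^{-1}=I(\hat{\theta})$
+ \end{itemize}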
+\begin{frame}{Normal approximation}
+ \begin{itemize}
+ \item $I(\theta)$ is called {\em observed information}
+ \begin{equation*}
+ I(\theta) = - \frac{d^2}{d\theta^2}\log p(\theta|y)
+ \end{equation*}
+ \begin{itemize}
+ \item $I(\hat{\theta})$ is the negative of the second derivatives of the log posterior at the mode and
+ thus describes the curvature at the mode
+ \item if the mode is inside the parameter space, $I(\hat{\theta})$
+ is positive
+ \item if $\theta$ is a vector, then $I(\theta)$ is a matrix
+% \item Also referred to as the {\em Hessian} $H(\theta)$
+ \end{itemize}
+\begin{frame}{Normal approximation}
+ \begin{itemize}
+ \item BDA3 Ch 4 has an example where it is easy to compute the first
+ and second derivatives and there is an easy analytic solution for
+ where the first derivatives are zero
+ \end{itemize}
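+ % One-parameter illustration added as a sketch. It is not the BDA3 Ch 4 example.
+ % Assumptions: binomial model with y successes in n trials, uniform prior on theta, 0 < y < n.
+ \begin{itemize}
+ \item A simpler one-parameter illustration (not the BDA3 example; assumes a binomial model with $y$ successes in $n$ trials, a uniform prior on $\theta$, and $0<y<n$)
+ \begin{align*}
+ \log p(\theta|y) &= y\log\theta+(n-y)\log(1-\theta)+C\\
+ \frac{d}{d\theta}\log p(\theta|y) &= \frac{y}{\theta}-\frac{n-y}{1-\theta}=0 \;\Rightarrow\; \hat{\theta}=\frac{y}{n}\\
+ I(\hat{\theta}) &= \frac{y}{\hat{\theta}^2}+\frac{n-y}{(1-\hat{\theta})^2}=\frac{n}{\hat{\theta}(1-\hat{\theta})}\\
+ p(\theta|y) &\approx \N\left(\hat{\theta},\,\hat{\theta}(1-\hat{\theta})/n\right)
+ \end{align*}
+ \end{itemize}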
+% \begin{frame}{Normal approximation -- example}
+% \begin{itemize}
+% \item Normal distribution, unknown mean and variance
+% \begin{itemize}
+% \item uniform prior $(\mu,\log\sigma)$
+% \item normal approximation for the posterior of $(\mu,\log\sigma)$
+% \end{itemize}
+% \begin{eqnarray*}
+% \log
+% p(\mu,\log\sigma|y)=& \mathrm{constant}-n\log\sigma- \\
+% & \frac{1}{2\sigma^2}[(n-1)s^2 + n(\bar{y}-\mu)^2]
+% \end{eqnarray*}
+% \pause
+% first derivatives
+% \begin{eqnarray*}
+% \frac{d}{d\mu}\log p(\mu,\log\sigma|y) & = & \frac{n(\bar{y}-\mu)}{\sigma^2},\\
+% \pause
+% \frac{d}{d(\log\sigma)}\log p(\mu,\log\sigma|y) &
+% = & -n + \frac{(n-1)s^2+n(\bar{y}-\mu)^2}{\sigma^2},
+% \end{eqnarray*}
+% \pause
+% from which it is easy to compute the mode
+% \begin{eqnarray*}
+% (\hat{\mu},\log\hat{\sigma})=\left(\bar{y},\frac{1}{2}\log\left(\frac{n-1}{n}s^2\right)\right)
+% \end{eqnarray*}
+% \end{itemize}
+% \end{frame}
+% % \note{here the model is parameterized in terms of $\log\sigma$\\
+% % you probably remember that a uniform prior on $\log\sigma$
+% % corresponds to a $1/\sigma$ prior on $\sigma$, which is easy to
+% % verify with a change of variables
+% % \begin{align*}
+% % p(\log\sigma)&=|J|p(\sigma)=\frac{d \sigma}{d\log\sigma} \frac{1}{\sigma}=\sigma \frac{1}{\sigma}=1
+% % \end{align*}
+% % \begin{align*}
+% % \frac{d}{d\mu}(\bar{y}-\mu)^2=-2(\bar{y}-\mu)
+% % \end{align*}
+% % \begin{align*}
+% % \frac{d}{d\log\sigma}\sigma^{-2}=\frac{d}{d\log\sigma}\exp(\log\sigma)^{-2}=-2\exp(\log\sigma)^{-3}(\log\sigma)=-2\exp(\log\sigma)^{-2}=-2 \sigma^{-2}
+% % \end{align*}
+% % % COMPUTE the change of variables in advance!
+% % }
+% \begin{frame}{Normal approximation -- example}
+% \begin{itemize}
+% \item Normal distribution, unknown mean and variance\\
+% first derivatives
+% \begin{eqnarray*}
+% \frac{d}{d\mu}\log p(\mu,\log\sigma|y) & = & \frac{n(\bar{y}-\mu)}{\sigma^2},\\
+% \frac{d}{d(\log\sigma)}\log p(\mu,\log\sigma|y) & = & -n + \frac{(n-1)s^2+n(\bar{y}-\mu)^2}{\sigma^2}
+% \end{eqnarray*}
+% \pause
+% second derivatives
+% \begin{eqnarray*}
+% \frac{d^2}{d\mu^2}\log p(\mu,\log\sigma|y) & = & -\frac{n}{\sigma^2},\\
+% \pause \frac{d^2}{d\mu d(\log\sigma)}\log p(\mu,\log\sigma|y) &
+% = & -2n\frac{\bar{y}-\mu}{\sigma^2},\\
+% \pause \frac{d^2}{d(\log\sigma)^2}\log p(\mu,\log\sigma|y) &
+% = & -\frac{2}{\sigma^2}((n-1)s^2+n(\bar{y}-\mu)^2)
+% \end{eqnarray*}
+% \end{itemize}
+% \end{frame}
+% \begin{frame}{Normal approximation -- example}
+% \begin{itemize}
+% \item Normal distribution, unknown mean and variance\\
+% second derivatives
+% \begin{eqnarray*}
+% \frac{d^2}{d\mu^2}\log p(\mu,\log\sigma|y) & = & -\frac{n}{\sigma^2},\\
+% \frac{d^2}{d\mu(\log\sigma)}\log p(\mu,\log\sigma|y) & = & -2n\frac{\bar{y}-\mu}{\sigma^2},\\
+% \frac{d^2}{d(\log\sigma)^2}\log p(\mu,\log\sigma|y) & = & -\frac{2}{\sigma^2}((n-1)s^2+n(\bar{y}-\mu)^2)
+% \end{eqnarray*}
+% matrix of the second derivatives at $(\hat{\mu},\log\hat{\sigma})$
+% \begin{eqnarray*}
+% \begin{pmatrix}
+% -n/\hat{\sigma}^2 & 0 \\
+% 0 & -2n
+% \end{pmatrix}
+% \end{eqnarray*}
+% \end{itemize}
+% \end{frame}
+% \begin{frame}{Normal approximation -- example}
+% \begin{itemize}
+% \item Normal distribution, unknown mean and variance\\
+% posterior mode
+% \begin{eqnarray*}
+% (\hat{\mu},\log\hat{\sigma})=\left(\bar{y},\frac{1}{2}\log\left(\frac{n-1}{n}s^2\right)\right)
+% \end{eqnarray*}
+% matrix of the second derivatives at $(\hat{\mu},\log\hat{\sigma})$
+% \begin{eqnarray*}
+% \begin{pmatrix}
+% -n/\hat{\sigma}^2 & 0 \\
+% 0 & -2n
+% \end{pmatrix}
+% \end{eqnarray*}
+% normal approximation
+% \begin{equation*}
+% p(\mu,\log\sigma|y) \approx \N\left(
+% \begin{pmatrix}
+% \mu \\ \log\sigma
+% \end{pmatrix}
+% \Bigg|
+% \begin{pmatrix}
+% \bar{y} \\ \log\hat{\sigma}
+% \end{pmatrix},
+% \begin{pmatrix}
+% \hat{\sigma}^2/n & 0 \\
+% 0 & 1/(2n)
+% \end{pmatrix}
+% \right)
+% \end{equation*}
+% \end{itemize}
+% \end{frame}
+ \frametitle{Normal approximation -- numerically}
+ \begin{itemize}
+ \item Normal approximation can be computed numerically
+ \begin{itemize}
+ \item iterative optimization to find a mode (may use gradients)
+ \item autodiff or finite-difference for gradients and Hessian
+ \item<2> e.g. in R, demo4\_1.R:
+ {\scriptsize
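+# negative log posterior of the bioassay model (uniform prior on w)
+bioassayfun <- function(w, df) {
+ z <- w[1] + w[2]*df$x
+ -sum(df$y*(z) - df$n*log1p(exp(z)))
+}
+# df1: data frame with columns x, n, y (defined elsewhere in the demo)
+theta0 <- c(0,0)
+optimres <- optim(theta0, bioassayfun, gr=NULL, df1, hessian=TRUE)
+thetahat <- optimres$par
+# Hessian of the negative log posterior at the mode = observed information
+Sigma <- solve(optimres$hessian)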
+ }
+ \end{itemize}
+ \end{itemize}
+\begin{frame}{Normal approximation -- numerically}
+ \begin{itemize}
+ \item Normal approximation can be computed numerically
+ \begin{itemize}
+ \item iterative optimization to find a mode (may use gradients)
+ \item autodiff or finite-difference for gradients and Hessian
+ \end{itemize}
+ \item CmdStan(R) has Laplace algorithm
+ \uncover<2->{
+ \begin{itemize}
+ \item uses L-BFGS quasi-Newton optimization algorithm for finding the mode
+ \item uses autodiff for gradients
+ \item uses finite differences of gradients to compute Hessian
+ \begin{itemize}
+ \item<3-> second order autodiff in progress
+ \end{itemize}
+ \end{itemize}
+ }
+ \end{itemize}
+\begin{frame}{Normal approximation}
+ \begin{itemize}
+ \item Optimization and computation of Hessian usually require far
+ fewer density evaluations than MCMC
+ \item<2-> In some cases accuracy is sufficient
+ \item<3-> In some cases accuracy for a conditional distribution is
+ sufficient (Ch 13)
+ \begin{itemize}
+ \item e.g. Gaussian latent variable models, such as Gaussian
+ processes (Ch 21) and Gaussian Markov random fields
+ \item Rasmussen \& Williams: Gaussian Processes for Machine Learning
+ \item CS-E4895 - Gaussian Processes (in spring)
+ % \begin{itemize}
+ % \item CS-E4070 - Special Course in Machine Learning and Data
+ % Science: Gaussian processes - theory and applications
+ % \end{itemize}
+ \end{itemize}
+ \item<4-> Accuracy can be improved by importance sampling (Ch 10)
+ \end{itemize}
+\begin{frame}{Example: Importance sampling in Bioassay}
+ \vspace{-.5\baselineskip}
+ \makebox[12cm][t]{
+ \hspace{-0.9cm}
+ \begin{minipage}[t][12cm][t]{12cm}
+ \begin{center}
+ \makebox[0cm][t]{\hspace{-0.5cm}\rotatebox{90}{\hspace{1cm}Grid}}
+ \includegraphics[width=3.4cm]{bioassayis1d.pdf}
+ \includegraphics[width=3.4cm]{bioassayis1s.pdf}
+ \includegraphics[width=3.4cm]{bioassayis1h.pdf}\\
+ \only<2->{
+ \makebox[0cm][t]{\hspace{-0.5cm}\rotatebox{90}{\hspace{1cm}Normal}}
+ \includegraphics[width=3.4cm]{bioassayis2d.pdf}
+ \includegraphics[width=3.4cm]{bioassayis2s.pdf}
+ \includegraphics[width=3.4cm]{bioassayis2h.pdf}\\}
+ \only<3>{But the normal approximation is not that good here:\\ Grid sd(LD50) $\approx$ 0.1, Normal sd(LD50) $\approx$ .75!}
+ \only<4->{
+ \makebox[0cm][t]{\hspace{-0.5cm}\rotatebox{90}{\hspace{1cm}IS}}
+ \includegraphics[width=3.4cm]{bioassayis3d.pdf}
+ \includegraphics[width=3.4cm]{bioassayis3s.pdf}
+ \includegraphics[width=3.4cm]{bioassayis3h.pdf}\\}
+ \only<5->{Grid sd(LD50) $\approx$ 0.1, IS sd(LD50) $\approx$ 0.1}
+ \end{center}
+ \end{minipage}
+ }
+\begin{frame}{Normal approximation}
+ \begin{itemize}
+ \item<1-> Accuracy can be improved by importance sampling
+ \item<1-> Pareto-$k$ diagnostic of importance sampling weights can be
+ used to assess the reliability of the importance sampling estimate
+ \begin{itemize}
+ \item in Bioassay example $k=0.57$, which is ok
+ \end{itemize}
+ \item<2-> CmdStan(R) has Laplace algorithm
+ \begin{itemize}
+ \item since version 2.33 (2023)
+ \begin{itemize}
+ \item[+] Pareto-$k$ diagnostic via posterior package
+ \item[+] importance resampling (IR) via posterior package
+ \end{itemize}
+ \end{itemize}
+ \end{itemize}
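+\begin{frame}{Normal approximation -- importance sampling sketch}
+ A sketch of the correction, assuming draws $\theta^{(s)}$,
+ $s=1,\ldots,S$, from the normal approximation $g(\theta)$ and an
+ unnormalized posterior density $q(\theta|y)$ (the symbols $g$, $w_s$
+ and $h$ are introduced here only for illustration):
+ \begin{align*}
+ w_s &= \frac{q(\theta^{(s)}|y)}{g(\theta^{(s)})}, \\
+ \mathrm{E}[h(\theta)|y] &\approx
+ \frac{\sum_{s=1}^S w_s h(\theta^{(s)})}{\sum_{s=1}^S w_s}
+ \end{align*}
+ The Pareto-$k$ diagnostic is estimated from the largest weights
+ $w_s$, and importance resampling selects draws $\theta^{(s)}$ with
+ probabilities proportional to $w_s$.
+\end{frame}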
+\begin{frame}{Normal approximation and parameter transformations}
+ \begin{itemize}
+ \item<+-> Normal approximation is not good for parameters with
+ bounded or half-bounded support
+ \begin{itemize}
+ \item e.g. $\theta \in [0,1]$ representing a probability
+ \item<+-> Stan code can include constraints\\
+ \texttt{real<lower=0,upper=1> theta;}
+ \item<+-> for this, Stan does the inference in unconstrained space
+ using logit transformation
+ \item<+-> density of the transformed parameter needs to include
+ Jacobian of the transformation (BDA3 p. 21)
+ \end{itemize}
+ \end{itemize}
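+\begin{frame}{Normal approximation and parameter transformations -- Jacobian sketch}
+ A sketch of the change of variables for $\theta\in[0,1]$, writing
+ $\phi=\logit(\theta)$ (the symbol $\phi$ is introduced here only
+ for illustration):
+ \begin{align*}
+ \theta &= \frac{1}{1+\exp(-\phi)}, \\
+ \frac{d\theta}{d\phi} &= \theta(1-\theta), \\
+ p(\phi|y) &= p(\theta|y)\left|\frac{d\theta}{d\phi}\right|
+ = p(\theta|y)\,\theta(1-\theta)
+ \end{align*}
+ The normal approximation is then formed for $p(\phi|y)$ in the
+ unconstrained space, and draws are mapped back to $\theta$ with the
+ inverse logit.
+\end{frame}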
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ Binomial model $y \sim \Bin(\theta, N)$, with data $y=9, N=10$
+ With $\Beta(1,1)$ prior, the posterior is $\Beta(9+1,1+1)$
+ \includegraphics[width=10cm]{jacobian-6.png}
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ With $\Beta(1,1)$ prior, the posterior is $\Beta(9+1,1+1)$
+ Stan computes only the unnormalized posterior $q(\theta|y)$
+ \includegraphics[width=10cm]{jacobian-7.png}
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ With $\Beta(1,1)$ prior, the posterior is $\Beta(9+1,1+1)$
+ For illustration purposes we normalize Stan result $q(\theta|y)$
+ \includegraphics[width=10cm]{jacobian-8.png}
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ With $\Beta(1,1)$ prior, the posterior is $\Beta(9+1,1+1)$
+ $\Beta(9+1,1+1)$, but x-axis shows the unconstrained $\logit(\theta)$
+ \includegraphics[width=10cm]{jacobian-10.png}
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ ...but we need to take into account the absolute value of the determinant of the Jacobian of the transformation $\theta(1-\theta)$
+ \vspace{.58\baselineskip}
+ \includegraphics[width=10cm]{jacobian-12.png}
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ ...but we need to take into account Jacobian $\theta(1-\theta)$
+ Let's compare a wrong normal approximation...
+ \includegraphics[width=10cm]{jacobian-14.png}
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ ...but we need to take into account Jacobian $\theta(1-\theta)$
+ Let's compare a wrong normal approximation and correct one
+ \includegraphics[width=10cm]{jacobian-16.png}
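+\begin{frame}{Normal approximation and parameter transformations -- worked numbers}
+ A sketch of the two approximations for the $\Beta(9+1,1+1)$
+ posterior, assuming that the wrong approximation is the one that
+ omits the Jacobian term, and writing $\phi=\logit(\theta)$ (a symbol
+ introduced here only for illustration):
+ \begin{itemize}
+ \item without Jacobian: $\log q(\phi|y) = 9\phi - 10\log(1+e^{\phi}) + \mathrm{const}$
+ \begin{itemize}
+ \item mode at $\theta=9/10$, so $\hat{\phi}=\log 9\approx 2.20$
+ \item second derivative $-10\,\theta(1-\theta)=-0.9$, so variance $1/0.9\approx 1.11$
+ \end{itemize}
+ \item with Jacobian: $\log q(\phi|y) = 10\phi - 12\log(1+e^{\phi}) + \mathrm{const}$
+ \begin{itemize}
+ \item mode at $\theta=10/12$, so $\hat{\phi}=\log 5\approx 1.61$
+ \item second derivative $-12\,\theta(1-\theta)=-5/3$, so variance $3/5=0.6$
+ \end{itemize}
+ \end{itemize}
+ Both the location and the scale of the approximation change when
+ the Jacobian is included.
+\end{frame}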
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ Let's compare a wrong normal approximation and correct one
+ Sample from both approximations and show KDEs for draws
+ \includegraphics[width=10cm]{jacobian-17.png}
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ Let's compare a wrong normal approximation and correct one
+ Inverse transform draws and show KDEs
+ \includegraphics[width=10cm]{jacobian-18.png}
+\begin{frame}{Normal approximation and parameter transformations}
+ \vspace{-.5\baselineskip}
+ Laplace approximation can be further improved with importance resampling
+ \vspace{.42\baselineskip}
+ \includegraphics[width=10cm]{jacobian-20.png}
+\begin{frame}{Other distributional approximations}
+ \begin{itemize}
+ \item<+-> Higher order derivatives at the mode can be used
+ \item<+-> Split-normal and split-$t$ by Geweke (1989) use additional
+ scaling along different principal axes
+ \item<+-> Other distributions can be used (e.g. $t$-distribution)
+ \item<+-> Instead of mode and Hessian at mode, e.g.
+ \begin{itemize}
+ \item variational inference (Ch 13)
+ \begin{itemize}
+ \item CS-E4820 - Machine Learning: Advanced Probabilistic Methods
+ \item CS-E4895 - Gaussian Processes
+ \item Stan has the ADVI algorithm (not a very good implementation)
+ \item Stan has Pathfinder algorithm (CmdStanR github version)
+ \item instead of normal, methods with flexible flow transformations
+ \end{itemize}
+ \item expectation propagation (Ch 13)
+ \item speed of these is usually between optimization and MCMC
+ \begin{itemize}
+ \item stochastic variational inference can be even slower than MCMC
+ \end{itemize}
+ \end{itemize}
+ \end{itemize}
+\begin{frame}{Pathfinder: Parallel quasi-Newton variational inference.}
+ \vspace{-.5\baselineskip}}
+ \vspace{-.5\baselineskip}}
+\footnotesize{Zhang, Carpenter, Gelman, and Vehtari
+ (2022). Pathfinder: Parallel quasi-Newton variational
+ inference. \textit{Journal of Machine Learning Research},
+ 23(306):1--49.}
+\begin{frame}{Distributional approximations}
+ \vspace{-\baselineskip}
+{\small {\color{blue} Exact}, {\color{red} Normal at mode}, {\color{forestgreen} Normal with variational inference}}
+ \makebox[12cm][t]{
+ \hspace{-.7cm}
+ \includegraphics[width=12cm]{cond_excat_normfr2.pdf}\\
+ \includegraphics[width=12cm]{cond_excat_normfr3.pdf}
+ \end{minipage}
+ \vspace{-.5\baselineskip}
+ Grid sd(LD50) $\approx$ 0.090,\\
+ Normal sd(LD50) $\approx$ .75,
+ Normal + IR sd(LD50) $\approx$ 0.096 (Pareto-$k$ = 0.57)\\}
+ VI sd(LD50) $\approx$ 0.13,
+ VI + IR sd(LD50) $\approx$ 0.095 (Pareto-$k$ = 0.17)
+ }
+\begin{frame}{Variational inference}
+ \begin{itemize}
+ \item<+-> Variational inference includes a large number of methods
+ \item<+-> For a restricted set of models, possible to derive
+ deterministic algorithms
+ \begin{itemize}
+ \item can be fast and can be relatively accurate
+ \end{itemize}
+ \item<+-> Using stochastic (Monte Carlo) estimation of the
+ divergence, possible to derive generic black box algorithms
+ \begin{itemize}
+ \item<+-> possible to also use mini-batching
+ \item<+-> can be fast and provide better predictive distribution than
+ Laplace approximation if the posterior is far from normal
+ \item<+-> in general, unlikely to achieve accuracy of HMC with the
+ same computation cost
+ \item<+-> with increasing number of posterior dimensions, the
+ obtained approximation gets worse {\small
+ (\href{https://papers.nips.cc/paper/2020/hash/7cac11e2f46ed46c339ec3d569853759-Abstract.html}{Dhaka,
+ Catalina, Andersen, Magnusson, Huggins, and Vehtari, 2020})}
+ \item<+-> with increasing number of posterior dimensions, the
+ stochastic divergence estimate gets worse and flows have problems,
+ too {\small
+ (\href{https://proceedings.neurips.cc/paper/2021/hash/404dcc91b2aeaa7caa47487d1483e48a-Abstract.html}{Dhaka,
+ Catalina, Andersen, Welandawe, Huggins, and Vehtari, 2021})}
+ \end{itemize}
+ \end{itemize}
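+\begin{frame}{Variational inference -- stochastic divergence estimate sketch}
+ A sketch of the Monte Carlo estimate, assuming the commonly used
+ exclusive Kullback-Leibler divergence and an approximating
+ distribution $g(\theta)$ (other divergences are also used; the
+ symbols $g$ and $S$ are introduced here only for illustration):
+ \begin{align*}
+ \mathrm{KL}\left(g \,\|\, p(\cdot|y)\right)
+ &= \mathrm{E}_g\left[\log g(\theta) - \log p(\theta|y)\right] \\
+ &= \log p(y) - \mathrm{E}_g\left[\log p(y,\theta) - \log g(\theta)\right], \\
+ \mathrm{E}_g\left[\log p(y,\theta) - \log g(\theta)\right]
+ &\approx \frac{1}{S}\sum_{s=1}^S
+ \left[\log p(y,\theta^{(s)}) - \log g(\theta^{(s)})\right],
+ \quad \theta^{(s)} \sim g
+ \end{align*}
+ Since $\log p(y)$ does not depend on $g$, minimizing the divergence
+ corresponds to maximizing the estimated expectation.
+\end{frame}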
+\begin{frame}{Large sample theory}
+ \begin{itemize}
+ \item Asymptotic normality
+ \begin{itemize}
+ \item<+-> as the number of observations $n$ increases, the
+ posterior converges to a normal distribution
+ \item<+-> can be shown by showing that
+ \begin{itemize}
+ \item eventually likelihood dominates the prior
+ \item the higher order terms in Taylor series increase slower
+ than the second order term
+ \end{itemize}
+ \item<+-> see counter examples
+ \end{itemize}
+\begin{frame}{Large sample theory}
+ \begin{itemize}
+ \item Assume a ``true'' underlying data distribution $f(y)$
+ \begin{itemize}
+ \item observations $y_1,\ldots,y_n$ are independent samples from
+ the joint distribution $f(y)$
+ \item the ``true'' data distribution $f(y)$ is not always well defined
+ \item in the following we proceed as if there were a true underlying data distribution
+ \item for the theory the exact form of $f(y)$ is not important as
+ long as it satisfies certain regularity conditions
+ \end{itemize}
+ \end{itemize}
+\begin{frame}{Large sample theory}
+ \begin{itemize}
+ % \item Asymptotic normality
+ % \begin{itemize}
+ % \item as the number $n$ of observations $y_i$ from the distribution $f(y)$ increases,
+ % the posterior distribution of the parameter vector approaches a normal distribution
+ % \end{itemize}
+ % \pause
+ \item Consistency
+ \begin{itemize}
+ \item if true distribution is included in the parametric family,
+ so that $f(y)=p(y|\theta_0)$ for some $\theta_0$, then posterior
+ converges to a point $\theta_0$, when $n\rightarrow\infty$
+ % \item<2-> a point doesn't have uncertainty
+ \item<2-> the same result as for maximum likelihood estimate
+ \end{itemize}
+ \item<3-> If true distribution is not included in the parametric family,
+ then there is no true $\theta_0$
+ \begin{itemize}
+ \item true $\theta_0$ is replaced with $\theta_0$ which minimizes
+ the Kullback-Leibler divergence from $f(y)$ to $p(y_i|\theta_0)$
+ % \begin{align*}
+ % H(\theta_0)=\int f(y_i) \log\left(\frac{f(y_i)}{p(y_i|\theta_0)}\right)dy_i
+ % \end{align*}
+% \item<5-> this point doesn't have uncertainty, but it's a wrong point!
+ \item<4-> the same result as for maximum likelihood estimate
+ \end{itemize}
+\begin{frame}{Large sample theory -- counter examples}
+ \begin{itemize}
+ \item Under- and non-identifiability
+ \begin{itemize}
+ \item a model is under-identifiable, if the model has parameters or parameter combinations for which there is no information in the data
+ \item then there is no single point $\theta_0$ where
+ posterior would converge
+ \item<2-> e.g. if the model is
+ \begin{align*}
+ y \sim \N(a+b+cx, \sigma)
+ \end{align*}
+ \begin{itemize}
+ \vspace{-\baselineskip}
+ \item<3-> posterior would converge to a line with prior
+ determining the density along the line
+ \end{itemize}
+ \item<4-> e.g. if we never observe $u$ and $v$ at the same time and the model is
+ \begin{equation*}
+ \begin{pmatrix}
+ u \\ v
+ \end{pmatrix}
+ \sim
+ \N\left(
+ \begin{pmatrix}
+ 0\\0
+ \end{pmatrix},
+ \begin{pmatrix}
+ 1 & \rho \\ \rho & 1
+ \end{pmatrix}
+ \right)
+ \end{equation*}
+ then correlation $\rho$ is non-identifiable
+ \begin{itemize}
+ \item<5-> e.g. $u$ and $v$ could be the height and weight of
+ a student; if only one of them is measured for each student,
+ then $\rho$ is non-identifiable
+ \end{itemize}
+ \end{itemize}
+ \item<6-> Problem also for other inference methods like MCMC
+ \end{itemize}
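+\begin{frame}{Large sample theory -- counter examples: identifiability sketch}
+ A sketch of why the data carry no information in the two examples
+ (the shift $\delta$ is introduced here only for illustration):
+ \begin{itemize}
+ \item for $y \sim \N(a+b+cx, \sigma)$ the likelihood depends on $a$
+ and $b$ only through $a+b$, so for any $\delta$
+ \begin{equation*}
+ p(y|a+\delta, b-\delta, c, \sigma) = p(y|a, b, c, \sigma)
+ \end{equation*}
+ \item for the bivariate normal, the marginal distributions are
+ \begin{equation*}
+ u \sim \N(0, 1), \qquad v \sim \N(0, 1),
+ \end{equation*}
+ which do not depend on $\rho$, so observing only $u$ or only $v$
+ leaves the likelihood constant in $\rho$
+ \end{itemize}
+\end{frame}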
+% \note{the problem can be removed by first detecting it; if these
+% parameters really need to be estimated, collect the necessary observations}
+\begin{frame}{Asymptotic identifiability vs finite data case}
+ \begin{itemize}
+ \item If we sometimes, at random, measured both height and weight,
+ asymptotically the correlation $\rho$ would be identifiable
+ \item<2-> But a finite data set from this data generating process may
+ lack the joint height and weight observations, and thus the
+ finite data likelihood doesn't have information about $\rho$
+ \item<3-> If the likelihood is weakly informative for some
+ parameters, priors and integration are more important
+ \end{itemize}
+\begin{frame}{Large sample theory -- counter examples}
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item If the number of parameters increases as the number of
+ observations increases
+ \begin{itemize}
+ \item in some models number of parameters depends on the number
+ of observations
+ \item e.g. time series models $y_t \sim
+ \N(\theta_t,\sigma^2)$ and $\theta_t$ has prior in time
+ \item posterior of $\theta_t$ does not converge to a point, if
+ additional observations do not bring enough information
+ \end{itemize}
+ \end{itemize}
+\begin{frame}{Large sample theory -- counter examples}
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Aliasing (\emph{valetoisto} in Finnish)
+ \begin{itemize}
+ \item special case of under-identifiability where likelihood
+ repeats in separate points
+ \item e.g. mixture of normals
+ \begin{equation*}
+ p(y_i|\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\lambda)=\lambda\N(\mu_1,\sigma_1^2)+(1-\lambda)\N(\mu_2,\sigma_2^2)
+ \end{equation*}
+ \uncover<2->{
+ if $(\mu_1,\mu_2)$ are switched, $(\sigma_1^2,\sigma_2^2)$ are
+ switched, and $\lambda$ is replaced with $(1-\lambda)$, the model is
+ equivalent; the posterior would usually have two modes which are
+ mirror images of each other and the posterior does not
+ converge to a single point}
+ \end{itemize}
+ \item<3-> For MCMC, aliasing makes convergence diagnostics more difficult,
+ as it is difficult to distinguish aliasing from other multimodality
+\begin{frame}{Large sample theory -- counter examples}
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Unbounded (\emph{rajoittamaton} in Finnish) likelihood
+ \begin{itemize}
+ \item if likelihood is unbounded it is possible that there is no
+ mode in the posterior
+ \item<2-> e.g. previous normal mixture model; assume $\lambda$ to be
+ known (and not $0$ or $1$); if we set $\mu_1=y_i$ for any $i$
+ and $\sigma_1^2\rightarrow 0$, then likelihood
+ $\rightarrow\infty$
+ \item<3-> if prior for $\sigma_1^2$ does not go to zero when
+ $\sigma_1^2\rightarrow 0$, then the posterior is unbounded
+ \item<4-> when $n\rightarrow\infty$ the number of likelihood
+ modes increases
+ \end{itemize}
+ \item<5-> Problem for any inference method including MCMC
+ \begin{itemize}
+ \item can be avoided with good priors
+ \item<6-> a prior close to a prior allowing unbounded
+ posterior may produce almost unbounded posterior
+ \end{itemize}
+ \end{itemize}
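+\begin{frame}{Large sample theory -- counter examples: unbounded likelihood sketch}
+ A sketch of why the mixture likelihood is unbounded, assuming
+ $\mu_1=y_i$ for one observation $y_i$ and $\N(y|\mu,\sigma^2)$
+ denoting the normal density:
+ \begin{align*}
+ p(y_i|\mu_1=y_i,\mu_2,\sigma_1^2,\sigma_2^2,\lambda)
+ &= \frac{\lambda}{\sqrt{2\pi}\,\sigma_1}
+ + (1-\lambda)\N(y_i|\mu_2,\sigma_2^2)
+ \rightarrow \infty \quad \text{as } \sigma_1 \rightarrow 0, \\
+ p(y_j|\mu_1=y_i,\mu_2,\sigma_1^2,\sigma_2^2,\lambda)
+ &\geq (1-\lambda)\N(y_j|\mu_2,\sigma_2^2) > 0
+ \quad \text{for } j \neq i
+ \end{align*}
+ The first factor grows without bound while the other factors stay
+ bounded away from zero, so the product over observations diverges.
+\end{frame}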
+% \note{e.g. a uniform prior is not good
+% e.g. a $1/\sigma^2$ prior is not good
+% e.g. an $\Invchi2$ distribution with suitable parameters is possible}
+\begin{frame}{Large sample theory -- counter examples}
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Improper posterior
+ \begin{itemize}
+ \item asymptotic results assume that probability sums to 1
+ \item e.g. Binomial model, with $\Beta(0,0)$ prior and observation $y=n$
+ \begin{itemize}
+ \item posterior $p(\theta|n,0)\propto\theta^{n-1}(1-\theta)^{-1}$
+ \item when $\theta\rightarrow 1$, then
+ $p(\theta|n,0)\rightarrow \infty$
+ \end{itemize}
+ \end{itemize}
+ \item<2-> Problem for any inference method including MCMC
+ \begin{itemize}
+ \item can be avoided with proper priors
+ \item<3-> a prior close to an improper prior may produce
+ an almost improper posterior
+ \end{itemize}
+ \end{itemize}
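+\begin{frame}{Large sample theory -- counter examples: improper posterior sketch}
+ A sketch of why the $\Beta(0,0)$ prior with $y=n$ gives a posterior
+ that cannot be normalized. On $\theta\in[1/2,1]$ we have
+ $\theta^{n-1}\geq 2^{-(n-1)}$, so
+ \begin{align*}
+ \int_0^1 \theta^{n-1}(1-\theta)^{-1}\,d\theta
+ &\geq 2^{-(n-1)}\int_{1/2}^1 (1-\theta)^{-1}\,d\theta \\
+ &= 2^{-(n-1)}\Big[-\log(1-\theta)\Big]_{1/2}^{1} = \infty
+ \end{align*}
+ The integral diverges for every $n$, so collecting more data with
+ $y=n$ does not make the posterior proper.
+\end{frame}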
+% \note{an improper prior that produces a proper posterior is also acceptable
+% Remember that $\Beta(1,1)$ corresponds to the uniform prior and
+% $\Beta(\frac{1}{2},\frac{1}{2})$ to Jeffreys' prior, so
+% $\Beta(0,0)$ is an even vaguer prior and is in fact improper
+% posterior $\Beta(\theta|n,0)=\theta^{n-1}(1-\theta)^{-1}$\\
+% $\Beta(1|n,0)=1^{n-1}0^{-1} = 1/0 = \infty$
+% }
+\begin{frame}{Large sample theory -- counter examples}
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Prior distribution does not include the convergence point
+ \begin{itemize}
+ \item if in discrete case $p(\theta_0)=0$ or in continuous case
+ $p(\theta)=0$ in the neighborhood of $\theta_0$, then the
+ convergence results based on the dominance of the likelihood
+ do not hold
+ \end{itemize}
+ \item<2-> Should have a positive prior probability/density where needed
+\begin{frame}{Large sample theory -- counter examples}
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Convergence point at the edge of the parameter space
+ \begin{itemize}
+ \item if $\theta_0$ is on the edge of the parameter space,
+ Taylor series expansion has to be truncated, and normal
+ approximation does not necessarily hold
+ \item<2-> e.g. $y_i\sim\N(\theta,1)$ with a restriction $\theta\geq
+ 0$ and assume that $\theta_0=0$
+ \begin{itemize}
+ \item posterior of $\theta$ is left truncated normal
+ distribution with $\mu=\bar{y}$
+ \item in the limit $n\rightarrow\infty$ posterior is half
+ normal distribution \pause
+ \end{itemize}
+ \end{itemize}
+ \item Can be easy or difficult for MCMC
+ \end{itemize}
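+\begin{frame}{Large sample theory -- counter examples: edge of parameter space sketch}
+ A sketch of the truncated posterior for $y_i\sim\N(\theta,1)$ with
+ $\theta\geq 0$, assuming a uniform prior on $\theta\geq 0$ (the
+ prior is an assumption made here for illustration) and $\Phi$
+ denoting the standard normal cumulative distribution function:
+ \begin{equation*}
+ p(\theta|y) = \frac{\N(\theta|\bar{y}, 1/n)}{\Phi(\sqrt{n}\,\bar{y})},
+ \qquad \theta \geq 0
+ \end{equation*}
+ When $\theta_0=0$, $\sqrt{n}\,\bar{y}\sim\N(0,1)$ for every $n$, so
+ the truncation point stays within a few posterior standard
+ deviations of $\bar{y}$ and the truncation does not vanish as $n$
+ increases.
+\end{frame}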
+% \begin{frame}{Large sample theory -- counter examples}
+% \begin{itemize}
+% \item Tails of the distribution
+% \begin{itemize}
+% \item normal approximation may be accurate for the most of the
+% posterior mass, but still be inaccurate for the tails
+% \item e.g. parameter which is constrained to be positive; given a
+% finite $n$, normal approximation assumes non-zero probability
+% for negative values
+% \end{itemize}
+% % \item Monte Carlo has different kind of problems with the tails
+% \end{itemize}
+% \end{frame}
+\begin{frame}{Frequency evaluations}
+ \begin{itemize}
+ \item Bayesian theory has epistemic and aleatory probabilities
+ \item Frequency evaluations focus on frequency properties given
+ aleatoric repetition of an observation and modeling
+ \begin{itemize}
+ \item It is useful to examine these for Bayesian inference, too
+ % \item<2-> Consistency
+ \item<2-> Asymptotic unbiasedness
+ \begin{itemize}
+ \item not that important in Bayesian inference, small and
+ decreasing error more important
+ \end{itemize}
+ \item<3-> Asymptotic efficiency
+ \begin{itemize}
+ \item no other point estimate with smaller squared error
+ \item useful also in Bayesian inference, but should consider
+ which utility/loss is important
+ \end{itemize}
+ \item<4-> Calibration
+ \begin{itemize}
+ \item $\alpha\%$-posterior interval has the true value in
+ $\alpha\%$ of cases
+ \item $\alpha\%$-predictive interval has the true future values
+ in $\alpha\%$ of cases
+ \item approximate calibration with shorter intervals for
+ likely true values more important than exact calibration
+ with very bad intervals for all possible values
+ \end{itemize}
+ \end{itemize}
+ \end{itemize}
+ {\Large\color{navyblue} Frequentist statistics}
+ \begin{itemize}
+ \item Frequentist statistics accepts only aleatory probabilities
+ \begin{itemize}
+ \item Estimates are based on data
+ \item Uncertainty of estimates is based on all possible data
+ sets which could have been generated by the data generating
+ mechanism
+ \begin{itemize}
+ \item<2-> inference is based also on data we did not observe
+ \end{itemize}
+ \end{itemize}
+ \item<3-> Estimates are derived to fulfill frequency properties
+ \begin{itemize}
+ \item Maximum likelihood (often) fulfills asymptotic frequency
+ properties
+ \item Common finite data desiderata are 1) unbiasedness, 2)
+ minimum variance, 3) calibration of confidence interval
+ \end{itemize}
+ \end{itemize}
+ {\Large\color{navyblue} Frequentist statistics}
+ \begin{itemize}
+ \item Estimates are derived to fulfill frequency properties
+ \begin{itemize}
+ \item Maximum likelihood fulfills just asymptotic frequency
+ properties
+ \item Common desiderata are 1) unbiasedness, 2) minimum
+ variance, 3) calibration of confidence interval
+ \end{itemize}
+ \item Requirement of unbiasedness may lead to higher variance or
+ silly estimates
+ \begin{itemize}
+ \item unbiased estimate for strictly positive parameter can be
+ negative
+ \end{itemize}
+ \item<2-> Confidence interval is defined to have the true value inside the
+ interval in $\alpha\%$ of cases of repeated data generation from the
+ data generating mechanism
+ \begin{itemize}
+ % \item doesn't say how likely the true value is inside the interval
+ % given the observed data
+ \item an interval with perfect calibration is not necessarily useful
+ \end{itemize}
+ \end{itemize}
+ {\Large\color{navyblue} Frequentist vs Bayes vs others}
+ \begin{itemize}
+ \item There is a great amount of very useful frequentist statistics
+ \begin{itemize}
+ \item also, for simple models and lots of data there is not much
+ difference
+ \end{itemize}
+ \item<2-> Bayesian inference
+ \begin{itemize}
+ \item easier for complex, e.g. hierarchical, models
+ \item easier when model changes
+ \item a consistent way to add prior information
+ \end{itemize}
+ \item<3-> A lot of machine learning is not pure frequentist or
+ Bayesian
+ \end{itemize}
+%%% Local Variables:
+%%% TeX-PDF-mode: t
+%%% TeX-master: t
+%%% End:
diff --git a/slides/figs/helicopter_bo_a_1.pdf b/slides/figs/helicopter_bo_a_1.pdf
new file mode 100644
index 00000000..94afe9f6
Binary files /dev/null and b/slides/figs/helicopter_bo_a_1.pdf differ
diff --git a/slides/figs/helicopter_bo_a_10.pdf b/slides/figs/helicopter_bo_a_10.pdf
new file mode 100644
index 00000000..d59321e3
Binary files /dev/null and b/slides/figs/helicopter_bo_a_10.pdf differ
diff --git a/slides/figs/helicopter_bo_a_11.pdf b/slides/figs/helicopter_bo_a_11.pdf
new file mode 100644
index 00000000..8fd3cb8b
Binary files /dev/null and b/slides/figs/helicopter_bo_a_11.pdf differ
diff --git a/slides/figs/helicopter_bo_a_12.pdf b/slides/figs/helicopter_bo_a_12.pdf
new file mode 100644
index 00000000..fcdcf31c
Binary files /dev/null and b/slides/figs/helicopter_bo_a_12.pdf differ
diff --git a/slides/figs/helicopter_bo_a_13.pdf b/slides/figs/helicopter_bo_a_13.pdf
new file mode 100644
index 00000000..2c39934b
Binary files /dev/null and b/slides/figs/helicopter_bo_a_13.pdf differ
diff --git a/slides/figs/helicopter_bo_a_14.pdf b/slides/figs/helicopter_bo_a_14.pdf
new file mode 100644
index 00000000..e921b30b
Binary files /dev/null and b/slides/figs/helicopter_bo_a_14.pdf differ
diff --git a/slides/figs/helicopter_bo_a_15.pdf b/slides/figs/helicopter_bo_a_15.pdf
new file mode 100644
index 00000000..9cde35ea
Binary files /dev/null and b/slides/figs/helicopter_bo_a_15.pdf differ
diff --git a/slides/figs/helicopter_bo_a_16.pdf b/slides/figs/helicopter_bo_a_16.pdf
new file mode 100644
index 00000000..b3e1448d
Binary files /dev/null and b/slides/figs/helicopter_bo_a_16.pdf differ
diff --git a/slides/figs/helicopter_bo_a_17.pdf b/slides/figs/helicopter_bo_a_17.pdf
new file mode 100644
index 00000000..ef0d83ec
Binary files /dev/null and b/slides/figs/helicopter_bo_a_17.pdf differ
diff --git a/slides/figs/helicopter_bo_a_18.pdf b/slides/figs/helicopter_bo_a_18.pdf
new file mode 100644
index 00000000..0b03cb15
Binary files /dev/null and b/slides/figs/helicopter_bo_a_18.pdf differ
diff --git a/slides/figs/helicopter_bo_a_19.pdf b/slides/figs/helicopter_bo_a_19.pdf
new file mode 100644
index 00000000..c74982de
Binary files /dev/null and b/slides/figs/helicopter_bo_a_19.pdf differ
diff --git a/slides/figs/helicopter_bo_a_2.pdf b/slides/figs/helicopter_bo_a_2.pdf
new file mode 100644
index 00000000..9f510c33
Binary files /dev/null and b/slides/figs/helicopter_bo_a_2.pdf differ
diff --git a/slides/figs/helicopter_bo_a_20.pdf b/slides/figs/helicopter_bo_a_20.pdf
new file mode 100644
index 00000000..3a196489
Binary files /dev/null and b/slides/figs/helicopter_bo_a_20.pdf differ
diff --git a/slides/figs/helicopter_bo_a_21.pdf b/slides/figs/helicopter_bo_a_21.pdf
new file mode 100644
index 00000000..45995cd8
Binary files /dev/null and b/slides/figs/helicopter_bo_a_21.pdf differ
diff --git a/slides/figs/helicopter_bo_a_22.pdf b/slides/figs/helicopter_bo_a_22.pdf
new file mode 100644
index 00000000..435432d0
Binary files /dev/null and b/slides/figs/helicopter_bo_a_22.pdf differ
diff --git a/slides/figs/helicopter_bo_a_23.pdf b/slides/figs/helicopter_bo_a_23.pdf
new file mode 100644
index 00000000..73b37689
Binary files /dev/null and b/slides/figs/helicopter_bo_a_23.pdf differ
diff --git a/slides/figs/helicopter_bo_a_24.pdf b/slides/figs/helicopter_bo_a_24.pdf
new file mode 100644
index 00000000..bbd081bb
Binary files /dev/null and b/slides/figs/helicopter_bo_a_24.pdf differ
diff --git a/slides/figs/helicopter_bo_a_25.pdf b/slides/figs/helicopter_bo_a_25.pdf
new file mode 100644
index 00000000..2e0aac3f
Binary files /dev/null and b/slides/figs/helicopter_bo_a_25.pdf differ
diff --git a/slides/figs/helicopter_bo_a_26.pdf b/slides/figs/helicopter_bo_a_26.pdf
new file mode 100644
index 00000000..0b51bc1a
Binary files /dev/null and b/slides/figs/helicopter_bo_a_26.pdf differ
diff --git a/slides/figs/helicopter_bo_a_27.pdf b/slides/figs/helicopter_bo_a_27.pdf
new file mode 100644
index 00000000..b24dd186
Binary files /dev/null and b/slides/figs/helicopter_bo_a_27.pdf differ
diff --git a/slides/figs/helicopter_bo_a_28.pdf b/slides/figs/helicopter_bo_a_28.pdf
new file mode 100644
index 00000000..56b4b12e
Binary files /dev/null and b/slides/figs/helicopter_bo_a_28.pdf differ
diff --git a/slides/figs/helicopter_bo_a_29.pdf b/slides/figs/helicopter_bo_a_29.pdf
new file mode 100644
index 00000000..e6bbab2f
Binary files /dev/null and b/slides/figs/helicopter_bo_a_29.pdf differ
diff --git a/slides/figs/helicopter_bo_a_3.pdf b/slides/figs/helicopter_bo_a_3.pdf
new file mode 100644
index 00000000..d223cfcd
Binary files /dev/null and b/slides/figs/helicopter_bo_a_3.pdf differ
diff --git a/slides/figs/helicopter_bo_a_4.pdf b/slides/figs/helicopter_bo_a_4.pdf
new file mode 100644
index 00000000..75776460
Binary files /dev/null and b/slides/figs/helicopter_bo_a_4.pdf differ
diff --git a/slides/figs/helicopter_bo_a_5.pdf b/slides/figs/helicopter_bo_a_5.pdf
new file mode 100644
index 00000000..a531f0c5
Binary files /dev/null and b/slides/figs/helicopter_bo_a_5.pdf differ
diff --git a/slides/figs/helicopter_bo_a_6.pdf b/slides/figs/helicopter_bo_a_6.pdf
new file mode 100644
index 00000000..99e631bf
Binary files /dev/null and b/slides/figs/helicopter_bo_a_6.pdf differ
diff --git a/slides/figs/helicopter_bo_a_7.pdf b/slides/figs/helicopter_bo_a_7.pdf
new file mode 100644
index 00000000..5bc9db56
Binary files /dev/null and b/slides/figs/helicopter_bo_a_7.pdf differ
diff --git a/slides/figs/helicopter_bo_a_8.pdf b/slides/figs/helicopter_bo_a_8.pdf
new file mode 100644
index 00000000..6d877e90
Binary files /dev/null and b/slides/figs/helicopter_bo_a_8.pdf differ
diff --git a/slides/figs/helicopter_bo_a_9.pdf b/slides/figs/helicopter_bo_a_9.pdf
new file mode 100644
index 00000000..efef7ed2
Binary files /dev/null and b/slides/figs/helicopter_bo_a_9.pdf differ
diff --git a/slides/figs/helicopter_bo_b_1.pdf b/slides/figs/helicopter_bo_b_1.pdf
new file mode 100644
index 00000000..6c27f855
Binary files /dev/null and b/slides/figs/helicopter_bo_b_1.pdf differ
diff --git a/slides/figs/helicopter_bo_b_10.pdf b/slides/figs/helicopter_bo_b_10.pdf
new file mode 100644
index 00000000..81cc418b
Binary files /dev/null and b/slides/figs/helicopter_bo_b_10.pdf differ
diff --git a/slides/figs/helicopter_bo_b_11.pdf b/slides/figs/helicopter_bo_b_11.pdf
new file mode 100644
index 00000000..23f8a803
Binary files /dev/null and b/slides/figs/helicopter_bo_b_11.pdf differ
diff --git a/slides/figs/helicopter_bo_b_12.pdf b/slides/figs/helicopter_bo_b_12.pdf
new file mode 100644
index 00000000..c67a1f8a
Binary files /dev/null and b/slides/figs/helicopter_bo_b_12.pdf differ
diff --git a/slides/figs/helicopter_bo_b_13.pdf b/slides/figs/helicopter_bo_b_13.pdf
new file mode 100644
index 00000000..78ee6758
Binary files /dev/null and b/slides/figs/helicopter_bo_b_13.pdf differ
diff --git a/slides/figs/helicopter_bo_b_14.pdf b/slides/figs/helicopter_bo_b_14.pdf
new file mode 100644
index 00000000..78a526ce
Binary files /dev/null and b/slides/figs/helicopter_bo_b_14.pdf differ
diff --git a/slides/figs/helicopter_bo_b_15.pdf b/slides/figs/helicopter_bo_b_15.pdf
new file mode 100644
index 00000000..79ca5d11
Binary files /dev/null and b/slides/figs/helicopter_bo_b_15.pdf differ
diff --git a/slides/figs/helicopter_bo_b_16.pdf b/slides/figs/helicopter_bo_b_16.pdf
new file mode 100644
index 00000000..9b6e1b34
Binary files /dev/null and b/slides/figs/helicopter_bo_b_16.pdf differ
diff --git a/slides/figs/helicopter_bo_b_17.pdf b/slides/figs/helicopter_bo_b_17.pdf
new file mode 100644
index 00000000..41174028
Binary files /dev/null and b/slides/figs/helicopter_bo_b_17.pdf differ
diff --git a/slides/figs/helicopter_bo_b_18.pdf b/slides/figs/helicopter_bo_b_18.pdf
new file mode 100644
index 00000000..8187836b
Binary files /dev/null and b/slides/figs/helicopter_bo_b_18.pdf differ
diff --git a/slides/figs/helicopter_bo_b_19.pdf b/slides/figs/helicopter_bo_b_19.pdf
new file mode 100644
index 00000000..e928323a
Binary files /dev/null and b/slides/figs/helicopter_bo_b_19.pdf differ
diff --git a/slides/figs/helicopter_bo_b_2.pdf b/slides/figs/helicopter_bo_b_2.pdf
new file mode 100644
index 00000000..75566cd7
Binary files /dev/null and b/slides/figs/helicopter_bo_b_2.pdf differ
diff --git a/slides/figs/helicopter_bo_b_20.pdf b/slides/figs/helicopter_bo_b_20.pdf
new file mode 100644
index 00000000..72b565db
Binary files /dev/null and b/slides/figs/helicopter_bo_b_20.pdf differ
diff --git a/slides/figs/helicopter_bo_b_21.pdf b/slides/figs/helicopter_bo_b_21.pdf
new file mode 100644
index 00000000..dafa8a2d
Binary files /dev/null and b/slides/figs/helicopter_bo_b_21.pdf differ
diff --git a/slides/figs/helicopter_bo_b_22.pdf b/slides/figs/helicopter_bo_b_22.pdf
new file mode 100644
index 00000000..7b2fd634
Binary files /dev/null and b/slides/figs/helicopter_bo_b_22.pdf differ
diff --git a/slides/figs/helicopter_bo_b_23.pdf b/slides/figs/helicopter_bo_b_23.pdf
new file mode 100644
index 00000000..507cae44
Binary files /dev/null and b/slides/figs/helicopter_bo_b_23.pdf differ
diff --git a/slides/figs/helicopter_bo_b_24.pdf b/slides/figs/helicopter_bo_b_24.pdf
new file mode 100644
index 00000000..13a88558
Binary files /dev/null and b/slides/figs/helicopter_bo_b_24.pdf differ
diff --git a/slides/figs/helicopter_bo_b_25.pdf b/slides/figs/helicopter_bo_b_25.pdf
new file mode 100644
index 00000000..2d10ae91
Binary files /dev/null and b/slides/figs/helicopter_bo_b_25.pdf differ
diff --git a/slides/figs/helicopter_bo_b_26.pdf b/slides/figs/helicopter_bo_b_26.pdf
new file mode 100644
index 00000000..2180d5d4
Binary files /dev/null and b/slides/figs/helicopter_bo_b_26.pdf differ
diff --git a/slides/figs/helicopter_bo_b_27.pdf b/slides/figs/helicopter_bo_b_27.pdf
new file mode 100644
index 00000000..a2328295
Binary files /dev/null and b/slides/figs/helicopter_bo_b_27.pdf differ
diff --git a/slides/figs/helicopter_bo_b_28.pdf b/slides/figs/helicopter_bo_b_28.pdf
new file mode 100644
index 00000000..08047465
Binary files /dev/null and b/slides/figs/helicopter_bo_b_28.pdf differ
diff --git a/slides/figs/helicopter_bo_b_29.pdf b/slides/figs/helicopter_bo_b_29.pdf
new file mode 100644
index 00000000..c43d4da8
Binary files /dev/null and b/slides/figs/helicopter_bo_b_29.pdf differ
diff --git a/slides/figs/helicopter_bo_b_3.pdf b/slides/figs/helicopter_bo_b_3.pdf
new file mode 100644
index 00000000..e4a38684
Binary files /dev/null and b/slides/figs/helicopter_bo_b_3.pdf differ
diff --git a/slides/figs/helicopter_bo_b_4.pdf b/slides/figs/helicopter_bo_b_4.pdf
new file mode 100644
index 00000000..34772749
Binary files /dev/null and b/slides/figs/helicopter_bo_b_4.pdf differ
diff --git a/slides/figs/helicopter_bo_b_5.pdf b/slides/figs/helicopter_bo_b_5.pdf
new file mode 100644
index 00000000..975e829c
Binary files /dev/null and b/slides/figs/helicopter_bo_b_5.pdf differ
diff --git a/slides/figs/helicopter_bo_b_6.pdf b/slides/figs/helicopter_bo_b_6.pdf
new file mode 100644
index 00000000..7b02831d
Binary files /dev/null and b/slides/figs/helicopter_bo_b_6.pdf differ
diff --git a/slides/figs/helicopter_bo_b_7.pdf b/slides/figs/helicopter_bo_b_7.pdf
new file mode 100644
index 00000000..3be28d5f
Binary files /dev/null and b/slides/figs/helicopter_bo_b_7.pdf differ
diff --git a/slides/figs/helicopter_bo_b_8.pdf b/slides/figs/helicopter_bo_b_8.pdf
new file mode 100644
index 00000000..ec524695
Binary files /dev/null and b/slides/figs/helicopter_bo_b_8.pdf differ
diff --git a/slides/figs/helicopter_bo_b_9.pdf b/slides/figs/helicopter_bo_b_9.pdf
new file mode 100644
index 00000000..3bac8a95
Binary files /dev/null and b/slides/figs/helicopter_bo_b_9.pdf differ
diff --git a/slides/figs/helicopter_bo_initial_data.pdf b/slides/figs/helicopter_bo_initial_data.pdf
new file mode 100644
index 00000000..5b9b2a4b
Binary files /dev/null and b/slides/figs/helicopter_bo_initial_data.pdf differ
diff --git a/slides/figs/helicopter_bo_initial_fit.pdf b/slides/figs/helicopter_bo_initial_fit.pdf
new file mode 100644
index 00000000..95df83b4
Binary files /dev/null and b/slides/figs/helicopter_bo_initial_fit.pdf differ
diff --git a/slides/figs/helicopter_bo_initial_fit_draws.pdf b/slides/figs/helicopter_bo_initial_fit_draws.pdf
new file mode 100644
index 00000000..b1e4a502
Binary files /dev/null and b/slides/figs/helicopter_bo_initial_fit_draws.pdf differ
diff --git a/slides/figs/helicopter_bo_maximizing_density.pdf b/slides/figs/helicopter_bo_maximizing_density.pdf
new file mode 100644
index 00000000..282130d1
Binary files /dev/null and b/slides/figs/helicopter_bo_maximizing_density.pdf differ
diff --git a/slides/figs/helicopter_bo_maximizing_density_2.pdf b/slides/figs/helicopter_bo_maximizing_density_2.pdf
new file mode 100644
index 00000000..f0f8b9b0
Binary files /dev/null and b/slides/figs/helicopter_bo_maximizing_density_2.pdf differ
diff --git a/slides/figs/helicopter_hier_stable.pdf b/slides/figs/helicopter_hier_stable.pdf
new file mode 100644
index 00000000..da37e936
Binary files /dev/null and b/slides/figs/helicopter_hier_stable.pdf differ
diff --git a/slides/figs/helicopter_hier_time.pdf b/slides/figs/helicopter_hier_time.pdf
new file mode 100644
index 00000000..37f62d80
Binary files /dev/null and b/slides/figs/helicopter_hier_time.pdf differ
diff --git a/slides/figs/jacobian-10.png b/slides/figs/jacobian-10.png
new file mode 100644
index 00000000..e93201b7
Binary files /dev/null and b/slides/figs/jacobian-10.png differ
diff --git a/slides/figs/jacobian-12.png b/slides/figs/jacobian-12.png
new file mode 100644
index 00000000..0fb1f8f5
Binary files /dev/null and b/slides/figs/jacobian-12.png differ
diff --git a/slides/figs/jacobian-14.png b/slides/figs/jacobian-14.png
new file mode 100644
index 00000000..b3f82e33
Binary files /dev/null and b/slides/figs/jacobian-14.png differ
diff --git a/slides/figs/jacobian-16.png b/slides/figs/jacobian-16.png
new file mode 100644
index 00000000..347465fd
Binary files /dev/null and b/slides/figs/jacobian-16.png differ
diff --git a/slides/figs/jacobian-17.png b/slides/figs/jacobian-17.png
new file mode 100644
index 00000000..dab2c334
Binary files /dev/null and b/slides/figs/jacobian-17.png differ
diff --git a/slides/figs/jacobian-18.png b/slides/figs/jacobian-18.png
new file mode 100644
index 00000000..a6c04130
Binary files /dev/null and b/slides/figs/jacobian-18.png differ
diff --git a/slides/figs/jacobian-20.png b/slides/figs/jacobian-20.png
new file mode 100644
index 00000000..0f214e84
Binary files /dev/null and b/slides/figs/jacobian-20.png differ
diff --git a/slides/figs/jacobian-6.png b/slides/figs/jacobian-6.png
new file mode 100644
index 00000000..2b39b5db
Binary files /dev/null and b/slides/figs/jacobian-6.png differ
diff --git a/slides/figs/jacobian-7.png b/slides/figs/jacobian-7.png
new file mode 100644
index 00000000..47a3e40e
Binary files /dev/null and b/slides/figs/jacobian-7.png differ
diff --git a/slides/figs/jacobian-8.png b/slides/figs/jacobian-8.png
new file mode 100644
index 00000000..4e719e81
Binary files /dev/null and b/slides/figs/jacobian-8.png differ
diff --git a/slides/figs/pathfinder_funnel_example.pdf b/slides/figs/pathfinder_funnel_example.pdf
new file mode 100644
index 00000000..8d0534b7
Binary files /dev/null and b/slides/figs/pathfinder_funnel_example.pdf differ
diff --git a/slides/figs/pathfinder_logit_example.pdf b/slides/figs/pathfinder_logit_example.pdf
new file mode 100644
index 00000000..37313fd6
Binary files /dev/null and b/slides/figs/pathfinder_logit_example.pdf differ
diff --git a/slides/figs/sales_dist1.pdf b/slides/figs/sales_dist1.pdf
index ed1cec56..da1a12fa 100644
Binary files a/slides/figs/sales_dist1.pdf and b/slides/figs/sales_dist1.pdf differ
diff --git a/slides/figs/sales_dist2.pdf b/slides/figs/sales_dist2.pdf
index 42536727..f71f12b2 100644
Binary files a/slides/figs/sales_dist2.pdf and b/slides/figs/sales_dist2.pdf differ
diff --git a/slides/figs/sales_exputil.pdf b/slides/figs/sales_exputil.pdf
index 34d03719..925b909e 100644
Binary files a/slides/figs/sales_exputil.pdf and b/slides/figs/sales_exputil.pdf differ
diff --git a/slides/figs/sales_utility_20_30.pdf b/slides/figs/sales_utility_20_30.pdf
index b1cba3c3..160cfd14 100644
Binary files a/slides/figs/sales_utility_20_30.pdf and b/slides/figs/sales_utility_20_30.pdf differ
diff --git a/slides/figs/sales_utilprob_20_30.pdf b/slides/figs/sales_utilprob_20_30.pdf
index 8b2f8e58..a600126e 100644
Binary files a/slides/figs/sales_utilprob_20_30.pdf and b/slides/figs/sales_utilprob_20_30.pdf differ
diff --git a/slides/figs/sales_utilprob_exputil_20_30.pdf b/slides/figs/sales_utilprob_exputil_20_30.pdf
index d7ce4624..5890d200 100644
Binary files a/slides/figs/sales_utilprob_exputil_20_30.pdf and b/slides/figs/sales_utilprob_exputil_20_30.pdf differ
diff --git a/slides/figs/student_retention_sbinom_linpreds10.pdf b/slides/figs/student_retention_sbinom_linpreds10.pdf
new file mode 100644
index 00000000..af0163bc
Binary files /dev/null and b/slides/figs/student_retention_sbinom_linpreds10.pdf differ