diff --git a/Aalto2023.Rmd b/Aalto2023.Rmd
index f804f137..92dc3301 100644
--- a/Aalto2023.Rmd
+++ b/Aalto2023.Rmd
@@ -199,7 +199,7 @@ the book chapters related to the next lecture and assignment.
| 7\. Hierarchical models and exchangeability | [BDA3 Chapter 5](BDA3_notes.html#ch5) | [2023 Lecture 7.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=c1014690-1133-4232-ad0f-b0a400ba228d),
[2023 Lecture 7.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=196c3a91-3ba2-4469-ab15-b0a400ca6074),
[2022 Project info](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=8f0158f9-6abf-4ada-bdb7-af3800d139de),
[Slides 7](slides/BDA_lecture_7.pdf) | [Assignment 7](assignments/assignment7.html) | `r sdate("Lecture date", "Week8")` | `r sdate("Assignment closes (23:59)", "Week8")` |
| 8\. Model checking & cross-validation | [BDA3 Chapter 6](BDA3_notes.html#ch6), [BDA3 Chapter 7](BDA3_notes.html#ch7), [Visualization in Bayesian workflow](https://doi.org/10.1111/rssa.12378), [Practical Bayesian cross-validation](https://arxiv.org/abs/1507.04544) | [2023 Lecture 8.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=785ece8a-16ef-4f64-8134-b0ab00cbd1e8),
[2023 Lecture 8.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=456afda7-0e6d-4903-b0df-b0ab00da8f1e),
[Slides 8a](slides/BDA_lecture_8a.pdf),[Slides 8b](slides/BDA_lecture_8b.pdf) | Start project work | `r sdate("Lecture date", "Week9")` | `r sdate("Assignment closes (23:59)", "Week9")` |
| 9\. Model comparison, selection, and hypothesis testing | [BDA3 Chapter 7 (not 7.2 and 7.3)](BDA3_notes.html#ch7),
[Practical Bayesian cross-validation](https://arxiv.org/abs/1507.04544) | [2023 Lecture 9.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a4961b5a-7e42-4603-8aaf-b0b200ca6295),
[2023 Lecture 9.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a4796c79-eab2-436e-b55f-b0b200dac7ce),
[Slides 9](slides/BDA_lecture_9.pdf) | [Assignment 8](assignments/assignment8.html) | `r sdate("Lecture date", "Week10")` | `r sdate("Assignment closes (23:59)", "Week10")` |
-| 10\. Decision analysis | [BDA3 Chapter 9](BDA3_notes.html#ch9) | [2022 Lecture 10.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a22aab9c-953c-4ea8-b6ec-af4d00c9fe58),
[Slides 10](slides/BDA_lecture_10.pdf) | [Assignment 9](assignments/assignment9.html) | `r sdate("Lecture date", "Week11")` | `r sdate("Assignment closes (23:59)", "Week11")` |
+| 10\. Decision analysis | [BDA3 Chapter 9](BDA3_notes.html#ch9) | [2022 Lecture 10.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a22aab9c-953c-4ea8-b6ec-af4d00c9fe58),
[Slides 10a](slides/BDA_lecture_10a.pdf), [Slides 10b](slides/BDA_lecture_10b.pdf) | [Assignment 9](assignments/assignment9.html) | `r sdate("Lecture date", "Week11")` | `r sdate("Assignment closes (23:59)", "Week11")` |
| 11\. Normal approximation, frequency properties | [BDA3 Chapter 4](BDA3_notes.html#ch4) | [2022 Lecture 11.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=8cde4d40-1b77-4110-af98-af5400ca38b5),
[2022 Lecture 11.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=d83f6553-1516-475f-8898-af5400dd7b50),
[Slides 11](slides/BDA_lecture_11.pdf) | Project work | `r sdate("Lecture date", "Week12")` | `r sdate("Assignment closes (23:59)", "Week12")` |
| 12\. Extended topics | Optional: BDA3 Chapters [8](BDA3_notes.html#ch8), [14-18](BDA3_notes.html#ch14), and 21 | Optional:
[Old Lecture 12.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=e998b5dd-bf8e-42da-9f7c-ab1700ca2702),
[Old Lecture 12.2](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=c43c862a-a5a4-45da-9b27-ab1700e12012),
[Slides 12](slides/BDA_lecture_12.pdf) | Project work | `r sdate("Lecture date", "Week13")` | `r sdate("Assignment closes (23:59)", "Week13")` |
| 13\. Project evaluation | | | | Project presentations: `r params$project_presentations` | Evaluation week |
@@ -526,16 +526,17 @@ variable selection with projection predictive variable selection.
`r sdate("Lecture date", "Week10", offset = 4)` 10-12.
- Start reading Chapter 9, see instructions below.
-### 10) BDA3 Ch 9, decision analysis
+### 10) BDA3 Ch 9, decision analysis + BDA3 Ch 4 Laplace approximation and asymptotics
-Decision analysis. BDA3 Ch 9.
+Decision analysis. BDA3 Ch 9. + Laplace approximation and asymptotics. BDA3 Ch 4.
-- Read Chapter 9
+- Read Chapters 9 and 4
- see [reading instructions for Chapter 9](BDA3_notes.html#ch9)
+ - see [reading instructions for Chapter 4](BDA3_notes.html#ch4)
- **Lecture `r paste(sday("Lecture date", "Week11"), sdate("Lecture date", "Week11"))` 14:15-16, hall T2, CS building**
- - [Slides 10](slides/BDA_lecture_10.pdf)
+ - [Slides 10a](slides/BDA_lecture_10a.pdf), [Slides 10b](slides/BDA_lecture_10b.pdf)
- Videos: [2022 Lecture 10.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a22aab9c-953c-4ea8-b6ec-af4d00c9fe58)
- on decision analysis. BDA3 Ch 9.
+  on decision analysis, BDA3 Ch 9, and [2022 Lecture 11.1](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=8cde4d40-1b77-4110-af98-af5400ca38b5) on normal approximation (Laplace approximation), large-sample theory, and counterexamples, BDA3 Ch 4.
- Make and submit [Assignment 9](assignments/assignment9.html).
**`r sday("Assignment closes (23:59)", "Week11")` `r sdate("Assignment closes (23:59)", "Week11")` 23:59**
- Review Assignment 8 done by your peers before 23:59
diff --git a/Aalto2023.html b/Aalto2023.html
index 857cc11e..b62a3954 100644
--- a/Aalto2023.html
+++ b/Aalto2023.html
@@ -1612,7 +1612,7 @@
Bayesian Data Analysis course - Aalto
2023
-Page updated: 2023-11-07
+Page updated: 2023-11-13
@@ -1914,8 +1914,8 @@ Schedule overview
10. Decision analysis |
BDA3 Chapter 9 |
2022
-Lecture 10.1, Slides
-10 |
+Lecture 10.1,
Slides
+10a, Slides 10b
Assignment 9 |
13.11. |
19.11. |
@@ -2330,21 +2330,27 @@ 9) BDA3 Ch 7, extra material, model comparison and selection
Start reading Chapter 9, see instructions below.
-
-
10) BDA3 Ch 9, decision analysis
-
Decision analysis. BDA3 Ch 9.
+
+
10) BDA3 Ch 9, decision analysis + BDA3 Ch 4 Laplace approximation
+and asymptotics
+
Decision analysis. BDA3 Ch 9. + Laplace approximation and
+asymptotics. BDA3 Ch 4.
-- Read Chapter 9
+
- Read Chapters 9 and 4
- Lecture Monday 13.11. 14:15-16, hall T2, CS
building
Make and submit Assignment
9. Sunday 19.11. 23:59
diff --git a/slides/BDA_lecture_10a.pdf b/slides/BDA_lecture_10a.pdf
new file mode 100644
index 00000000..fa39664b
Binary files /dev/null and b/slides/BDA_lecture_10a.pdf differ
diff --git a/slides/BDA_lecture_10a.tex b/slides/BDA_lecture_10a.tex
new file mode 100644
index 00000000..962cd2f4
--- /dev/null
+++ b/slides/BDA_lecture_10a.tex
@@ -0,0 +1,549 @@
+\documentclass[t]{beamer}
+%\documentclass[finnish,english,handout]{beamer}
+
+\usepackage[T1]{fontenc}
+\usepackage[utf8]{inputenc}
+\usepackage{newtxtext} % times
+%\usepackage[scaled=.95]{cabin} % sans serif
+\usepackage{amsmath}
+\usepackage[varqu,varl]{inconsolata} % typewriter
+\usepackage[varg]{newtxmath}
+\usefonttheme[onlymath]{serif} % beamer font theme
+\usepackage{microtype}
+\usepackage{afterpage}
+\usepackage{url}
+\urlstyle{same}
+% \usepackage{amsbsy}
+% \usepackage{eucal}
+\usepackage{rotating}
+\usepackage{listings}
+\usepackage{lstbayes}
+\usepackage[all,poly,ps,color]{xy}
+\usepackage{eurosym}
+
+\usepackage{natbib}
+\bibliographystyle{apalike}
+
+\mode
+{
+ \setbeamercovered{invisible}
+ \setbeamertemplate{itemize items}[circle]
+ \setbeamercolor{frametitle}{bg=white,fg=navyblue}
+ \setbeamertemplate{navigation symbols}{}
+ \setbeamertemplate{headline}[default]{}
+ \setbeamertemplate{footline}[split]
+ % \setbeamertemplate{headline}[text line]{\insertsection}
+ % \setbeamertemplate{footline}[frame number]
+}
+
+\pdfinfo{
+ /Title (BDA, Lecture 10)
+ /Author (Aki Vehtari) %
+ /Keywords (Bayesian data analysis)
+}
+
+\definecolor{forestgreen}{rgb}{0.1333,0.5451,0.1333}
+\definecolor{navyblue}{rgb}{0,0,0.5}
+\definecolor{list1}{rgb}{0,0.2549,0.6784}
+\renewcommand{\emph}[1]{\textcolor{navyblue}{#1}}
+\definecolor{set11}{HTML}{E41A1C}
+\definecolor{set12}{HTML}{377EB8}
+\definecolor{set13}{HTML}{4DAF4A}
+
+\graphicspath{{./figs/}}
+
+
+\parindent=0pt
+\parskip=8pt
+\tolerance=9000
+\abovedisplayshortskip=0pt
+
+%\renewcommand{\itemsep}{0pt}
+% Lists
+\newenvironment{list1}{
+ \begin{list}{$\color{list1}\bullet$}{\itemsep=6pt}}{
+ \end{list}}
+\newenvironment{list1s}{
+ \begin{list}{$\includegraphics[width=5pt]{logo.eps}$}{\itemsep=6pt}}{
+ \end{list}}
+\newenvironment{list2}{
+ \begin{list}{-}{\baselineskip=12pt\itemsep=2pt}}{
+ \end{list}}
+\newenvironment{list3}{
+ \begin{list}{$\cdot$}{\baselineskip=15pt}}{
+ \end{list}}
+
+\def\o{{\mathbf o}}
+\def\t{{\mathbf \theta}}
+\def\w{{\mathbf w}}
+\def\x{{\mathbf x}}
+\def\y{{\mathbf y}}
+\def\z{{\mathbf z}}
+
+\def\peff{p_{\mathrm{eff}}}
+\def\eff{\mathrm{eff}}
+
+\DeclareMathOperator{\E}{E}
+\DeclareMathOperator{\Var}{Var}
+\DeclareMathOperator{\var}{var}
+\DeclareMathOperator{\Sd}{Sd}
+\DeclareMathOperator{\sd}{sd}
+\DeclareMathOperator{\Gammad}{Gamma}
+\DeclareMathOperator{\Invgamma}{Inv-gamma}
+\DeclareMathOperator{\Bin}{Bin}
+\DeclareMathOperator{\Negbin}{Neg-bin}
+\DeclareMathOperator{\Poisson}{Poisson}
+\DeclareMathOperator{\Beta}{Beta}
+\DeclareMathOperator{\logit}{logit}
+\DeclareMathOperator{\N}{N}
+\DeclareMathOperator{\U}{U}
+\DeclareMathOperator{\BF}{BF}
+\DeclareMathOperator{\Invchi2}{Inv-\chi^2}
+\DeclareMathOperator{\NInvchi2}{N-Inv-\chi^2}
+\DeclareMathOperator{\InvWishart}{Inv-Wishart}
+\DeclareMathOperator{\tr}{tr}
+% \DeclareMathOperator{\Pr}{Pr}
+\def\euro{{\footnotesize \EUR\, }}
+\DeclareMathOperator{\rep}{\mathrm{rep}}
+
+% \def\dashxy(#1){%
+% /xydash{[#1] 0 setdash}def}
+% \def\grayxy(#1){%
+% /xycolor{#1 setgray}def}
+% \newgraphescape{D}[1]{!{\ar @*{[!\dashxy(2 2)]} "#1"}}
+% \newgraphescape{P}[1]{!{\ar "#1"}}
+% \newgraphescape{F}[1]{!{*+=<2em>[F=]{#1}="#1"}}
+% \newgraphescape{O}[1]{!{*+=<2em>[F]{#1}="#1"}}
+% \newgraphescape{V}[1]{!{*+=<2em>[o][F]{#1}="#1"}}
+% \newgraphescape{B}[3]{!{{ "#1"*+#3\frm{} }.{ "#2"*+#3\frm{} } *+[F:!\grayxy(0.75)]\frm{}}}
+
+
+\title[]{Bayesian data analysis}
+\subtitle{}
+
+\author{Aki Vehtari}
+
+\institute[Aalto]{}
+
+\date[]{}
+
+%\beamerdefaultoverlayspecification{<+->}
+
+\begin{document}
+
+\begin{frame}{Chapter 9 Decision Analysis}
+
+ \begin{list1}
+\item 9.1 Context and basic steps (most important part)
+\item 9.2 Example
+\item 9.3 Multistage decision analysis (example)
+\item 9.4 Hierarchical decision analysis (example)
+\item 9.5 Personal vs. institutional decision analysis
+\end{list1}
+\end{frame}
+
+\begin{frame}{Bayesian decision theory}
+
+ \begin{list1}
+ \item<+-> Potential decisions ${\color{set12}d}$
+ \begin{list2}
+ \item or actions ${\color{set12}a}$
+ \end{list2}
+ \item<+-> Potential consequences ${\color{set11}x}$
+ \begin{list2}
+ \item ${\color{set11}x}$ may be categorical, ordinal, real, scalar, vector, etc.
+ \end{list2}
+ \item<+-> Probability distributions of consequences given decisions $p({\color{set11}x}\mid{\color{set12}d})$
+ \begin{list2}
+ \item in decision making the decisions are controlled and thus $p({\color{set12}d})$ does not exist
+ \end{list2}
+ \item<+-> Utility function ${\color{set13}U}({\color{set11}x})$ maps consequences to real value
+ \begin{list2}
+ \item e.g. euro or expected lifetime
+ \item instead of utility sometimes cost or loss is defined
+ \end{list2}
+ \vspace{-1mm}
+ \item<+-> Expected utility $\E[{\color{set13}U}({\color{set11}x})\mid{\color{set12}d}]=\int {\color{set13}U}({\color{set11}x}) p({\color{set11}x}\mid{\color{set12}d}) d{\color{set11}x}$
+ \item<+-> Choose decision ${\color{set12}d^*}$, which maximizes the expected utility
+ \begin{equation*}
+ {\color{set12}d^*}=\arg\max_{\color{set12}d} \E[{\color{set13}U}({\color{set11}x})\mid{\color{set12}d}]
+ \end{equation*}
+ \end{list1}
+
+\end{frame}
+
+\begin{frame}
+
+{\Large\color{navyblue} Example of decision making: 2 choices}
+\vspace{-0.5\baselineskip}
+\begin{list1}
+\item<+-> Helen is going to pick mushrooms in a forest, while she notices a
+  paw print which could have been made by a dog or a wolf
+\item<+-> Helen measures that the length of the paw print is 14 cm and
+  goes home to Google how big paws dogs and wolves have, and then
+  tries to infer which animal made the paw print
+ \includegraphics[width=11cm]{hatutus_likelihoods}
+  the observed length is marked with a horizontal line
+\item<+-> Likelihood of wolf is 0.92 (alternative being dog)
+\end{list1}
+
+\end{frame}
+
+\begin{frame}
+
+{\Large\color{navyblue} Example of decision making}
+
+ \begin{list1}
+ \item<+-> Helen also assumes that in her living area there are about one
+   hundred times more free-running dogs than wolves, that is, the {\em a
+   priori} probability of a wolf, before the observation, is 1\%.
+ \item<+-> Likelihood and posterior
+ \begin{center}\leavevmode
+ \begin{tabular}{| l | c c |}
+ \hline
+ Animal & Likelihood & Posterior probability \\
+ \hline
+ Wolf & 0.92 & 0.10 \\
+ Dog & 0.08 & 0.90 \\
+ \hline
+ \end{tabular}
+ \end{center}
+ \item<+-> Posterior probability of wolf is 10\%
+ \end{list1}
+
+\end{frame}
+
+\begin{frame}
+
+{\Large\color{navyblue} Example of decision making}
+
+ \begin{list1}
+ \item<+-> Helen has to decide whether to go pick mushrooms
+ \item<+-> If she doesn't go to pick mushrooms, the utility is zero
+ \item<+-> Helen assigns positive utility 1 for getting fresh mushrooms
+ \item<+-> Helen assigns negative utility -1000 for the event that she goes to the forest and the wolf attacks (for some reason Helen assumes that the wolf will always attack)\\
+ \vspace{0.5\baselineskip}
+ \uncover<+->{
+ \begin{minipage}[t]{58mm}
+ \small
+ \begin{tabular}{| l | c c |}
+ \hline
+ & \multicolumn{2}{ c |}{Animal} \\
+ Decision ${\color{set12}d}$ & Wolf & Dog \\
+ \hline
+ Stay home & 0 & 0 \\
+ Go to the forest & -1000 & 1 \\
+ \hline
+ \end{tabular}\\
+ {Utility matrix ${\color{set13}U}({\color{set11}x})$}
+ \end{minipage}
+ }
+ ~\\
+ \vspace{0.5\baselineskip}
+ \uncover<+->{
+ \begin{minipage}[t]{58mm}
+ \small
+ \begin{tabular}{| l | c | }
+ \hline
+ & Expected utility \\
+ Action $d$ & $\E[{\color{set13}U}({\color{set11}x})\mid{\color{set12}d}]$ \\
+ \hline
+ Stay home & 0 \\
+      Go to the forest & -100+0.9 = -99.1 \\
+ \hline
+ \end{tabular}\\
+ {Utilities for different actions}
+ \end{minipage}
+}
+ \end{list1}
+
+\end{frame}
+
+\begin{frame}{Example of decision making}
+
+ \begin{list1}
+ \item<+-> Maximum likelihood decision would be to assume that there is a wolf
+ \item<+-> Maximum posterior decision would be to assume that there is a dog
+ \item<+-> Maximum utility decision is to stay home, even if it is more likely that the animal is a dog
+ \item<+-> The example illustrates that the uncertainties (probabilities)
+   related to all consequences need to be carried through to the final
+   decision
+ \end{list1}
+
+\end{frame}
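Helen's decision can be checked numerically; this is a minimal sketch using only the numbers given on the slides (the slides' expected utilities use the rounded posterior 0.10/0.90, so the exact values differ slightly):

```python
# Helen's wolf-or-dog decision: Bayes update followed by
# expected-utility maximization. All numbers are from the slides.
prior = {"wolf": 0.01, "dog": 0.99}       # ~100x more free-running dogs
lik = {"wolf": 0.92, "dog": 0.08}         # likelihood of a 14 cm paw print

# Posterior p(animal | paw print) via Bayes' rule
evidence = sum(lik[a] * prior[a] for a in prior)
post = {a: lik[a] * prior[a] / evidence for a in prior}

# Utility matrix U(x) for each decision and animal
utility = {
    "stay home":        {"wolf": 0.0,     "dog": 0.0},
    "go to the forest": {"wolf": -1000.0, "dog": 1.0},
}

# Expected utility E[U(x) | d] and the maximizing decision d*
exp_util = {d: sum(utility[d][a] * post[a] for a in post) for d in utility}
best = max(exp_util, key=exp_util.get)

print(round(post["wolf"], 3))   # ~0.104 (slides round to 0.10)
print(best)                     # 'stay home'
```

Even though a dog is nine times more probable, the large negative utility of a wolf attack dominates the expectation, so the maximum-utility decision is to stay home.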
+
+
+% \begin{frame}
+
+% {\Large\color{navyblue} Example of decision making: several choices}
+
+% \begin{list1}
+% \item Prof. Gelman has a jar of quarters
+% \begin{list2}
+% \item he first drew a line on the side of the jar and then
+% filled the jar up to the line, and so the number coins was not
+% chosen beforehand
+% \item Prof. Gelman does not know the number of coins in the jar
+% \item<2-> Prof. Gelman gives the class a chance to win the coins if
+% they guess the number of coins correctly (someone else has
+% counted the coins without telling Gelman)
+% \item<2-> How should the students make the decision?
+% \end{list2}
+% \end{list1}
+
+% \end{frame}
+
+\begin{frame}{Example of decision making: several choices}
+
+\begin{list1}
+\item You decide to earn money by selling a seasonal product
+ \begin{list2}
+  \item You pay 7€ for each, and sell them for 10€ each
+ \item You need to decide how many ($N$) items to buy
+ \item<2-> You ask your friends how many they used to sell and estimate a
+ distribution for how many you might sell
+ \end{list2}
+\end{list1}
+\uncover<2>{\includegraphics[width=9.5cm]{sales_dist1.pdf}}
+
+\end{frame}
+
+\begin{frame}{Example of decision making: several choices}
+
+ {\includegraphics[width=9cm]{sales_dist2.pdf}}\\
+ \only<2>{\vspace{-0.5\baselineskip}\includegraphics[width=9cm]{sales_utility_20_30.pdf}}
+ \only<3>{\vspace{-0.5\baselineskip}\includegraphics[width=9cm]{sales_utilprob_20_30.pdf}}
+ \only<4>{\vspace{-0.5\baselineskip}\includegraphics[width=9cm]{sales_utilprob_exputil_20_30.pdf}}
+ \only<5>{\vspace{-0.5\baselineskip}\includegraphics[width=9cm]{sales_exputil.pdf}}
+
+\end{frame}
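The sales example is a newsvendor-type problem, and the expected-utility curve in the last figure can be sketched directly. The demand distribution below is an assumption (Poisson with mean 25, and unsold items as a total loss) standing in for the distribution estimated from friends' sales in the slides:

```python
# Newsvendor-style sketch of the sales example: buy n items at 7 each,
# sell at 10 each; unsold items are wasted (assumed). Demand is assumed
# Poisson(25) in place of the slides' estimated distribution.
import math

COST, PRICE, MEAN = 7.0, 10.0, 25.0

def expected_utility(n, dmax=100):
    # E[U | buy n] = sum_d p(d) * (PRICE * min(n, d) - COST * n),
    # with the Poisson pmf computed iteratively to avoid overflow
    p = math.exp(-MEAN)   # pmf at d = 0
    eu = 0.0
    for d in range(dmax):
        eu += p * (PRICE * min(n, d) - COST * n)
        p *= MEAN / (d + 1)
    return eu

eu = {n: expected_utility(n) for n in range(0, 51)}
best_n = max(eu, key=eu.get)
print(best_n, round(eu[best_n], 2))
```

With these assumed numbers the optimum is a bit below the mean demand: the 3€ margin is small relative to the 7€ loss per unsold item, so it pays to under-stock, which is the qualitative point of the slides' figures.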
+
+\begin{frame}{Decision making in sales}
+
+ \begin{list1}
+ \item Common task in commerce and restaurants
+ \end{list1}
+
+\end{frame}
+
+
+\begin{frame}{Challenges in decision making}
+
+ \begin{list1}
+ \item Actual utility functions are rarely linear
+ \begin{list2}
+      \item<2-> the expected utility is 5€ for\\
+        a) a 100\% chance of receiving 5€\\
+        b) a 50\% chance of losing 1M€ and a 50\% chance of winning 1.00001M€
+ \item<3-> most gambling has negative expected utility\\
+ (but the excitement of the game may have positive utility)
+ \end{list2}
+ \item<4-> What is the cost of a human life?
+ \item<5-> Multiple parties having different utilities
+ \end{list1}
+
+\end{frame}
+
+\begin{frame}{Model selection as decision problem}
+
+ \begin{list1}
+ \item Choose the model that maximizes the expected utility of using
+ the model to make predictions / decisions in the future
+ \end{list1}
+
+\end{frame}
+
+\begin{frame}{Multi-stage decision making (Section 9.3)}
+
+ \vspace{-0.3\baselineskip}
+ \begin{list1}
+ \item<+-> A 95-year-old has a tumor that is malignant with 90\% probability
+ \item<+-> Based on statistics
+ \begin{list2}
+ \item<.-> expected lifetime is 34.8 months if no cancer
+ \item<+-> expected lifetime is 16.7 months if cancer and radiation therapy is used
+ \item<+-> expected lifetime is 20.3 months if cancer and surgery, but the probability of dying in surgery is 35\% (for 95 year old)
+ \item<+-> expected lifetime is 5.6 months if cancer and no treatment
+ \end{list2}
+ \item<+-> Which treatment to choose?
+ \begin{list2}
+ \item<.-> quality adjusted life time
+ \item<.-> 1 month is subtracted for the time spent in treatments
+ \end{list2}
+ \item<+-> Quality adjusted life time
+ \begin{list2}
+ \item<.-> See the book for the multi-stage decision making
+ % \item<.-> Radiotherapy: 0.9*16.7 + 0.1*34.8 - 1 = 17.5mo
+ % \item<+-> Surgery: 0.35*0 + 0.65*(0.9*20.3 + 0.1*34.8 - 1) = 13.5mo
+ % \item<+-> No treatment: 0.9*5.6 + 0.1*34.8 = 8.5mo
+ \end{list2}
+ % \item<+-> See the book for continuation of the example with
+ % additional test for cancer
+\end{list1}
+
+\end{frame}
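The quality-adjusted lifetimes that the frame defers to the book follow directly from the numbers on the slide (1 month subtracted for time spent in treatment):

```python
# Quality-adjusted expected lifetimes (months) for the 95-year-old's
# treatment decision, using the slide's numbers: p(cancer) = 0.9;
# lifetimes 34.8 (no cancer), 16.7 (cancer + radiation), 20.3 (cancer +
# surviving surgery), 5.6 (cancer, no treatment); 35% chance of dying
# in surgery; 1 month subtracted for time spent in treatment.
p_cancer = 0.9

radiation = p_cancer * 16.7 + (1 - p_cancer) * 34.8 - 1
surgery = 0.35 * 0 + 0.65 * (p_cancer * 20.3 + (1 - p_cancer) * 34.8 - 1)
no_treatment = p_cancer * 5.6 + (1 - p_cancer) * 34.8

print(round(radiation, 1), round(surgery, 1), round(no_treatment, 1))
# prints: 17.5 13.5 8.5 -- radiation therapy maximizes expected lifetime
```

Surgery has the longest lifetime conditional on surviving it, but the 35% mortality risk pulls its expectation below radiation therapy, which again shows why the uncertainties must be carried through to the decision.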
+
+\begin{frame}{Design of experiment}
+
+ \begin{list1}
+  \item Which experiment would give the most additional information
+ \begin{list2}
+ \item decide values $x_{n+1}$ for the next experiment
+ \item which values of $x_{n+1}$ would reduce the posterior
+ uncertainty or increase the expected utility most
+ \end{list2}
+ \item<2-> Example 1
+ \begin{list2}
+ \item biopsy in the cancer example
+ \end{list2}
+ \item<3-> Example 2
+ \begin{list2}
+ \item Imagine that in bioassay the posterior uncertainty of LD50 is too large
+ \item which dose should be used in the next experiment to reduce
+      the variance of LD50 as much as possible?
+ \begin{list3}
+      \item this way fewer experiments need to be made (and fewer animals need to be killed)
+ \end{list3}
+ \end{list2}
+ \item<4-> Example 3
+ \begin{list2}
+ \item optimal paper helicopter wing length
+ \end{list2}
+ \end{list1}
+\end{frame}
+
+\begin{frame}{Bayesian optimization}
+
+ \begin{list1}
+ \item Design of experiment
+ \item Used to optimize, for example,
+ \begin{list2}
+ \item machine learning / deep learning model structures,
+ regularization, and learning algorithm parameters
+ \item material science
+ \item engines
+ \item drug testing
+ \item part of Bayesian inference for stochastic simulators
+ \end{list2}
+ \end{list1}
+
+\end{frame}
+
+\begin{frame}{Bayesian optimization of wing length}
+
+\only<1>{Start with a small number of experiments\\
+\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_initial_data.pdf}}
+\only<2>{Gaussian process model\\
+\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_initial_fit.pdf}}
+\only<3-4>{Gaussian process model -- posterior draws\\
+ \hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_initial_fit_draws.pdf}}
+\only<5>{Gaussian process model -- Thompson sampling\\
+ \hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_1.pdf}}
+\only<6>{Gaussian process model -- Thompson sampling\\
+ \hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_1.pdf}}
+
+{\vspace{-2\baselineskip}}
+\begin{list1}
+ \item<4-> Thompson sampling:
+ \begin{list2}
+ \item pick one posterior draw (function)
+ \item find the wing length corresponding to the max. of that draw
+ \item make the next observation with that wing length
+ \end{list2}
+ \end{list1}
+
+\end{frame}
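The three Thompson-sampling steps listed above can be written down compactly. This is a toy sketch, not the model behind the slides' figures: the "flight time" objective (peak at 6.5 cm), the RBF kernel with fixed hyperparameters, and the noise level are all assumptions for illustration:

```python
# Toy Bayesian optimization with Thompson sampling: a Gaussian process
# with a fixed RBF kernel is fit to noisy observations of a synthetic
# "flight time" curve; each iteration draws one posterior function and
# runs the next experiment at that draw's maximum.
import numpy as np

rng = np.random.default_rng(1)

def flight_time(x):                 # hypothetical true objective
    return np.exp(-0.5 * ((x - 6.5) / 2.0) ** 2)

def kernel(a, b, ell=1.5, sf=1.0):  # RBF covariance, fixed hyperparameters
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

grid = np.linspace(4.0, 12.0, 81)   # candidate wing lengths (cm)
noise = 0.05
X = np.array([4.5, 8.0, 11.0])      # initial experiments
y = flight_time(X) + noise * rng.standard_normal(X.size)

for _ in range(25):
    # GP posterior mean and covariance on the grid
    K = kernel(X, X) + noise**2 * np.eye(X.size)
    Ks = kernel(grid, X)
    mu = Ks @ np.linalg.solve(K, y)
    cov = kernel(grid, grid) - Ks @ np.linalg.solve(K, Ks.T)
    # Thompson sampling: one posterior draw, observe at its argmax
    draw = rng.multivariate_normal(mu, cov + 1e-6 * np.eye(grid.size))
    x_next = grid[np.argmax(draw)]
    X = np.append(X, x_next)
    y = np.append(y, flight_time(x_next) + noise * rng.standard_normal())

best_x = X[np.argmax(y)]
print(round(best_x, 2))             # typically lands near the 6.5 cm peak
```

Because each draw is a random function from the posterior, the loop balances exploration (draws with high uncertainty far from the data) and exploitation (draws concentrated near the current best region) without any explicit acquisition function.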
+
+\begin{frame}{Bayesian optimization of wing length}
+
+ Gaussian process model -- Thompson sampling\\
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_2.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_2.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_3.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_3.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_4.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_4.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_5.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_5.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_6.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_6.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_7.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_7.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_8.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_8.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_9.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_9.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_10.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_10.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_11.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_11.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_12.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_12.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_13.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_13.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_14.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_14.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_15.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_15.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_16.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_16.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_17.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_17.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_18.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_18.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_19.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_19.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_20.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_20.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_21.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_21.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_22.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_22.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_23.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_23.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_24.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_24.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_25.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_25.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_26.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_26.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_27.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_27.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_28.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_b_28.pdf}}
+\only<+>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_a_29.pdf}}
+
+
+\end{frame}
+
+\begin{frame}{Bayesian optimization of wing length}
+
+ {\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_maximizing_density_2.pdf}\\
+ \uncover<2->{33 BO obs. post. Wasserstein-1 distance $\approx$ 0.77\\
+ 33 first obs. post. Wasserstein-1 distance $\approx$ 1.36}\\}
+ \uncover<3->{~\\We obtain about 50\% increase in efficiency}
+% \only<3>{\hspace{-8mm}\includegraphics[width=11.5cm]{helicopter_bo_maximizing_density.pdf}
+% 33 BO obs. post. Wasserstein-1 distance $\approx$ 0.77\\
+% 5 first + 28 random obs. post. Wasserstein-1 distance $\approx$ 1.27}
+
+\end{frame}
+
+\begin{frame}{Examples of big Bayesian decision making success stories}
+
+ \begin{list1}
+ \item Bayesian optimization of ML algorithms
+ \item Bayesian optimization of new medical molecules
+ \item Bayesian optimization of new materials
+ \item A/B testing
+ \item Customer retention / satisfaction
+ \item Marketing
+ \end{list1}
+
+\end{frame}
+
+\end{document}
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% End:
diff --git a/slides/BDA_lecture_10b.pdf b/slides/BDA_lecture_10b.pdf
new file mode 100644
index 00000000..d90b769a
Binary files /dev/null and b/slides/BDA_lecture_10b.pdf differ
diff --git a/slides/BDA_lecture_10b.tex b/slides/BDA_lecture_10b.tex
new file mode 100644
index 00000000..4b653170
--- /dev/null
+++ b/slides/BDA_lecture_10b.tex
@@ -0,0 +1,1177 @@
+\documentclass[t]{beamer}
+%\documentclass[finnish,english,handout]{beamer}
+
+% Uncomment if want to show notes
+% \setbeameroption{show notes}
+
+\mode
+{
+ % \usetheme{Copenhagen}
+ % oder ...
+
+ %\setbeamercovered{transparent}
+ % oder auch nicht
+}
+\setbeamertemplate{itemize items}[circle]
+\setbeamercolor{frametitle}{bg=white,fg=navyblue}
+
+
+\usepackage[T1]{fontenc}
+\usepackage[latin1]{inputenc}
+\usepackage{times}
+\usepackage{epic,epsfig}
+\usepackage{subfigure,float}
+\usepackage{amsmath,amsfonts,amssymb}
+\usepackage{afterpage}
+\usepackage{url}
+\urlstyle{same}
+\usepackage{amsbsy}
+\usepackage{eucal}
+\usepackage{rotating}
+\usepackage{listings}
+\usepackage{lstbayes}
+\usepackage[all,poly,ps,color]{xy}
+\usepackage{eurosym}
+\usepackage{microtype}
+
+\usepackage{natbib}
+\bibliographystyle{apalike}
+
+\hypersetup{%
+ bookmarksopen=true,
+ bookmarksnumbered=true,
+ pdftitle={Stan},
+ pdfsubject={Bayesian data analysis},
+ pdfauthor={Aki Vehtari},
+ pdfkeywords={},
+ pdfstartview={FitH -32768},
+ colorlinks=true,
+ linkcolor=navyblue,
+ citecolor=navyblue,
+ filecolor=navyblue,
+ urlcolor=navyblue
+}
+
+% \definecolor{hutblue}{rgb}{0,0.2549,0.6784}
+% \definecolor{midnightblue}{rgb}{0.0977,0.0977,0.4375}
+% \definecolor{hutsilver}{rgb}{0.4863,0.4784,0.4784}
+% \definecolor{lightgray}{rgb}{0.95,0.95,0.95}
+% \definecolor{section}{rgb}{0,0.2549,0.6784}
+% \definecolor{list1}{rgb}{0,0.2549,0.6784}
+\definecolor{forestgreen}{rgb}{0.1333,0.5451,0.1333}
+\definecolor{navyblue}{rgb}{0,0,0.5}
+\renewcommand{\emph}[1]{\textcolor{navyblue}{#1}}
+
+\graphicspath{{./figs/}}
+
+\pdfinfo{
+ /Title (Bayesian data analysis 4)
+ /Author (Aki Vehtari) %
+ /Keywords (Bayesian probability theory, Bayesian inference, Bayesian data analysis)
+}
+
+
+\parindent=0pt
+\parskip=8pt
+\tolerance=9000
+\abovedisplayshortskip=0pt
+
+\setbeamertemplate{navigation symbols}{}
+\setbeamertemplate{headline}[default]{}
+\setbeamertemplate{headline}[text line]{\insertsection}
+\setbeamertemplate{footline}[frame number]
+
+
+\def\o{{\mathbf o}}
+\def\t{{\mathbf \theta}}
+\def\w{{\mathbf w}}
+\def\x{{\mathbf x}}
+\def\y{{\mathbf y}}
+\def\z{{\mathbf z}}
+
+\DeclareMathOperator{\E}{E}
+\DeclareMathOperator{\Var}{Var}
+\DeclareMathOperator{\var}{var}
+\DeclareMathOperator{\Sd}{Sd}
+\DeclareMathOperator{\sd}{sd}
+\DeclareMathOperator{\Gammad}{Gamma}
+\DeclareMathOperator{\Invgamma}{Inv-gamma}
+\DeclareMathOperator{\Bin}{Bin}
+\DeclareMathOperator{\Negbin}{Neg-bin}
+\DeclareMathOperator{\Poisson}{Poisson}
+\DeclareMathOperator{\Beta}{Beta}
+\DeclareMathOperator{\logit}{logit}
+\DeclareMathOperator{\N}{N}
+\DeclareMathOperator{\U}{U}
+\DeclareMathOperator{\BF}{BF}
+\DeclareMathOperator{\Invchi2}{Inv-\chi^2}
+\DeclareMathOperator{\NInvchi2}{N-Inv-\chi^2}
+\DeclareMathOperator{\InvWishart}{Inv-Wishart}
+\DeclareMathOperator{\tr}{tr}
+% \DeclareMathOperator{\Pr}{Pr}
+\def\euro{{\footnotesize \EUR\, }}
+\DeclareMathOperator{\rep}{\mathrm{rep}}
+
+
+
+\title[]{Bayesian data analysis}
+\subtitle{}
+
+\author{Aki Vehtari}
+
+\institute[Aalto]{}
+
+
+\begin{document}
+
+\begin{frame}{Chapter 4}
+
+ \begin{itemize}
+ \item 4.1 Normal approximation (Laplace's method)
+ \item 4.2 Large-sample theory
+ \item 4.3 Counterexamples
+ \begin{itemize}
+ \item includes examples of difficult posteriors for MCMC, too
+ \end{itemize}
+ \item {\color{gray} 4.4 Frequency evaluation*}
+ \item {\color{gray} 4.5 Other statistical methods*}
+ \end{itemize}
+
+\end{frame}
+
+\begin{frame}{Normal approximation (Laplace approximation)}
+
+ \begin{itemize}
+  \item Often the posterior converges to a normal distribution as
+    $n\rightarrow \infty$
+    \begin{itemize}
+    \item bounded, non-singular, and the number of parameters doesn't grow with $n$
+ % \end{itemize}
+ % \item If posterior is unimodal and close to symmetric
+ % \begin{itemize}
+    \item we can then approximate $p(\theta|y)$ with a normal distribution
+ % \only<1-3>{\begin{align*}
+ % p(\theta|y)&\approx \frac{1}{\sqrt{2\pi}\sigma_\theta}\exp\left(-\frac{1}{2\sigma_\theta^2}(\theta-\hat{\theta})^2\right)
+ % \end{align*}}
+    \item<2> Laplace used this (before Gauss) to approximate the
+      posterior of the binomial model to infer the ratio of girls and boys born
+ % \item<3> A strict proof by LeCam in 1950's
+ \end{itemize}
+\end{itemize}
+\vspace{-3\baselineskip}
+\uncover<3->{\hspace{-5mm}\includegraphics[width=11cm]{bioassay_norm.pdf}}
+
+\end{frame}
+
+\begin{frame}{Taylor series}
+
+ \begin{itemize}
+  \item We can approximate $p(\theta|y)$ with a normal distribution
+ \begin{align*}
+ p(\theta|y)&\approx \frac{1}{\sqrt{2\pi}\sigma_\theta}\exp\left(-\frac{1}{2\sigma_\theta^2}(\theta-\hat{\theta})^2\right)
+ \end{align*}
+ \begin{itemize}
+    \item i.e., the log posterior $\log p(\theta|y)$ can be
+      approximated with a quadratic function
+ \end{itemize}
+ \begin{align*}
+ \log p(\theta|y)& \approx \alpha(\theta-\hat{\theta})^2 + C
+ \end{align*}
+ \item<2-> Corresponds to Taylor series expansion around $\theta=\hat{\theta}$
+ \begin{equation*}
+ f(\theta)=f(\hat{\theta}) {\only<3->{\color{gray}}+ f'(\hat{\theta})(\theta-\hat{\theta})} + \frac{f''(\hat{\theta})}{2!}(\theta-\hat{\theta})^2 {\only<4->{\color{gray}} +\frac{f^{(3)}(\hat{\theta})}{3!}(\theta-\hat{\theta})^3+\ldots}
+ \end{equation*}
+ \begin{itemize}
+ \item<3-> if $\hat{\theta}$ is at mode, then $f'(\hat{\theta})=0$
+ \item<4-> often when $n \rightarrow \infty$, $\frac{f^{(3)}(\hat{\theta})}{3!}(\theta-\hat{\theta})^3+\ldots$ is small
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
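+
+% Added bridging step: matching the quadratic Taylor term with the log
+% normal density (follows directly from the equations on the previous frame)
+\begin{frame}{Taylor series and the normal approximation}
+
+ \begin{itemize}
+ \item Matching the quadratic approximation of the log posterior with
+ the log of the normal density gives
+ \begin{align*}
+ \alpha(\theta-\hat{\theta})^2 = -\frac{1}{2\sigma_\theta^2}(\theta-\hat{\theta})^2
+ \quad\Rightarrow\quad
+ \sigma_\theta^2 = -\frac{1}{2\alpha}
+ \end{align*}
+ \item with $\alpha=\frac{f''(\hat{\theta})}{2}$ from the Taylor series,
+ where $f(\theta)=\log p(\theta|y)$, this is
+ \begin{align*}
+ \sigma_\theta^2 = \left[-f''(\hat{\theta})\right]^{-1}
+ \end{align*}
+ \end{itemize}
+
+\end{frame}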
+
+\begin{frame}{Multivariate Taylor series}
+
+ \begin{itemize}
+ \item Multivariate series expansion
+ \begin{equation*}
+ f(\theta)= f(\hat{\theta}) {\color{gray} + \frac{d f(\theta')}{d \theta'}_{\theta'=\hat{\theta}}(\theta-\hat{\theta})}
+ + \frac{1}{2!}(\theta-\hat{\theta})^T \frac{d^2 f(\theta')}{d \theta'^2}_{\theta'=\hat{\theta}} (\theta-\hat{\theta}) {\color{gray} + \ldots}
+ % \sum_{j=0}^\infty\left\{\frac{1}{j!}\left[\sum_{k=1}^n(x_k-a_k)\frac{\partial}{\partial x_k'}\right]^j
+ % f(x_1',\ldots,x_n')\right\}_{x_1'=a_1,\ldots,x_n'=a_n}
+ \end{equation*}~
+ \end{itemize}
+
+\end{frame}
+
+% \note{Is there anyone who doesn't remember the Taylor series expansion?
+% Named after Taylor, who lived in the 1700s
+
+%DRAW ON THE BOARD}
+
+\begin{frame}{Normal approximation}
+
+ \begin{itemize}
+ \only<1-2>{
+ \item Taylor series expansion of the log posterior around the posterior mode
+ $\hat{\theta}$
+ \begin{equation*}
+ \log p(\theta|y) = \log p(\hat{\theta}|y) +
+ \frac{1}{2}(\theta-\hat{\theta})^T\left[\frac{d^2}{d\theta^2}\log p(\theta'|y) \right]_{\theta'=\hat{\theta}} (\theta-\hat{\theta})+\ldots
+ \end{equation*}
+ }
+ \item<2-> Multivariate normal $\propto\left|\Sigma\right|^{-1/2} \exp\left(-\frac{1}{2}(\theta-\hat{\theta})^T\Sigma^{-1}(\theta-\hat{\theta})\right)$
+ \end{itemize}
+ \vspace{-0.5\baselineskip}
+ \only<3>{
+ \includegraphics[width=11cm]{cond_excat_norm.pdf}}
+ \only<4>{
+ \includegraphics[width=11cm]{cond_excat_norm_log.pdf}}
+
+\end{frame}
+
+\begin{frame}{Normal approximation}
+
+ \begin{itemize}
+ % \item Taylor series expansion of the log posterior around the posterior mode
+ % $\hat{\theta}$
+ % \begin{equation*}
+ % \log p(\theta|y) = \log p(\hat{\theta}|y) +
+ % \frac{1}{2}(\theta-\hat{\theta})^T\left[\frac{d^2}{d\theta^2}\log p(\theta'|y) \right]_{\theta'=\hat{\theta}} (\theta-\hat{\theta})+\ldots
+ % \end{equation*}
+ % \item Multivariate normal $\propto\left|\Sigma\right|^{-1/2} \exp\left(-\frac{1}{2}(\theta-\hat{\theta})^T\Sigma^{-1}(\theta-\hat{\theta})\right)$
+
+ \item Normal approximation
+ \begin{equation*}
+ p(\theta|y) \approx \N(\hat{\theta},[I(\hat{\theta})]^{-1})
+ \end{equation*}
+ where $I(\theta)$ is called {\em observed information}
+ \begin{equation*}
+ I(\theta) = - \frac{d^2}{d\theta^2}\log p(\theta|y)
+ \end{equation*}
+ \uncover<2->{Hessian $H(\theta)=-I(\theta)$}
+ \end{itemize}
+
+\end{frame}
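+
+% Added worked example: observed information for the binomial model
+% (the model Laplace used for the ratio of girls and boys)
+\begin{frame}{Normal approximation -- binomial example}
+
+ \begin{itemize}
+ \item A simple worked example: for $y \sim \Bin(\theta, N)$ with a
+ uniform prior,
+ \begin{equation*}
+ \log p(\theta|y) = y\log\theta + (N-y)\log(1-\theta) + C
+ \end{equation*}
+ \item the mode is $\hat{\theta}=y/N$, and the observed information is
+ \begin{equation*}
+ I(\hat{\theta}) = \frac{y}{\hat{\theta}^2} + \frac{N-y}{(1-\hat{\theta})^2}
+ = \frac{N}{\hat{\theta}(1-\hat{\theta})}
+ \end{equation*}
+ \item giving the normal approximation
+ \begin{equation*}
+ p(\theta|y) \approx \N\left(\hat{\theta},\,\frac{\hat{\theta}(1-\hat{\theta})}{N}\right)
+ \end{equation*}
+ \end{itemize}
+
+\end{frame}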
+
+\begin{frame}{Normal approximation}
+
+ \begin{itemize}
+ \item $I(\theta)$ is called {\em observed information}
+ \begin{equation*}
+ I(\theta) = - \frac{d^2}{d\theta^2}\log p(\theta|y)
+ \end{equation*}
+ \begin{itemize}
+ \item $I(\hat{\theta})$ is the second derivative at the mode and
+ thus describes the curvature at the mode
+ \item if the mode is inside the parameter space, $I(\hat{\theta})$
+ is positive
+ \item if $\theta$ is a vector, then $I(\theta)$ is a matrix
+% \item Also known as the {\em Hessian} $H(\theta)$
+ \end{itemize}
+\end{itemize}
+
+\end{frame}
+
+\begin{frame}{Normal approximation}
+
+ \begin{itemize}
+ \item BDA3 Ch 4 has an example where it is easy to compute the first
+ and second derivatives, and there is an easy analytic solution for
+ where the first derivatives are zero
+ \end{itemize}
+
+\end{frame}
+
+% \begin{frame}{Normal approximation -- example}
+
+% \begin{itemize}
+% \item Normal distribution, unknown mean and variance
+% \begin{itemize}
+% \item uniform prior $(\mu,\log\sigma)$
+% \item normal approximation for the posterior of $(\mu,\log\sigma)$
+% \end{itemize}
+% \begin{eqnarray*}
+% \log
+% p(\mu,\log\sigma|y)=& \mathrm{constant}-n\log\sigma- \\
+% & \frac{1}{2\sigma^2}[(n-1)s^2 + n(\bar{y}-\mu)^2]
+% \end{eqnarray*}
+% \pause
+% first derivatives
+% \begin{eqnarray*}
+% \frac{d}{d\mu}\log p(\mu,\log\sigma|y) & = & \frac{n(\bar{y}-\mu)}{\sigma^2},\\
+% \pause
+% \frac{d}{d(\log\sigma)}\log p(\mu,\log\sigma|y) &
+% = & -n + \frac{(n-1)s^2+n(\bar{y}-\mu)^2}{\sigma^2},
+% \end{eqnarray*}
+% \pause
+% from which it is easy to compute the mode
+% \begin{eqnarray*}
+% (\hat{\mu},\log\hat{\sigma})=\left(\bar{y},\frac{1}{2}\log\left(\frac{n-1}{n}s^2\right)\right)
+% \end{eqnarray*}
+% \end{itemize}
+
+% \end{frame}
+
+% % \note{here the model is parameterized via $\log\sigma$\\
+% % you probably remember that a uniform prior on $\log\sigma$
+% % corresponds to a $1/\sigma$ prior on $\sigma$, which can easily
+% % be verified by a change of variables
+
+% % \begin{align*}
+% % p(\log\sigma)&=|J|p(\sigma)=\frac{d \sigma}{d\log\sigma} \frac{1}{\sigma}=\sigma \frac{1}{\sigma}=1
+% % \end{align*}
+
+% % \begin{align*}
+% % \frac{d}{d\mu}(\bar{y}-\mu)^2=-2(\bar{y}-\mu)
+% % \end{align*}
+
+% % \begin{align*}
+% % \frac{d}{d\log\sigma}\sigma^{-2}=\frac{d}{d\log\sigma}\exp(\log\sigma)^{-2}=-2\exp(\log\sigma)^{-3}(\log\sigma)=-2\exp(\log\sigma)^{-2}=-2 \sigma^{-2}
+% % \end{align*}
+% % % COMPUTE the change of variables in advance!
+% % }
+
+% \begin{frame}{Normal approximation -- example}
+
+% \begin{itemize}
+% \item Normal distribution, unknown mean and variance\\
+% first derivatives
+% \begin{eqnarray*}
+% \frac{d}{d\mu}\log p(\mu,\log\sigma|y) & = & \frac{n(\bar{y}-\mu)}{\sigma^2},\\
+% \frac{d}{d(\log\sigma)}\log p(\mu,\log\sigma|y) & = & -n + \frac{(n-1)s^2+n(\bar{y}-\mu)^2}{\sigma^2}
+% \end{eqnarray*}
+% \pause
+% second derivatives
+% \begin{eqnarray*}
+% \frac{d^2}{d\mu^2}\log p(\mu,\log\sigma|y) & = & -\frac{n}{\sigma^2},\\
+% \pause \frac{d^2}{d\mu d(\log\sigma)}\log p(\mu,\log\sigma|y) &
+% = & -2n\frac{\bar{y}-\mu}{\sigma^2},\\
+% \pause \frac{d^2}{d(\log\sigma)^2}\log p(\mu,\log\sigma|y) &
+% = & -\frac{2}{\sigma^2}((n-1)s^2+n(\bar{y}-\mu)^2)
+% \end{eqnarray*}
+% \end{itemize}
+
+% \end{frame}
+
+% \begin{frame}{Normal approximation -- example}
+
+% \begin{itemize}
+% \item Normal distribution, unknown mean and variance\\
+% second derivatives
+% \begin{eqnarray*}
+% \frac{d^2}{d\mu^2}\log p(\mu,\log\sigma|y) & = & -\frac{n}{\sigma^2},\\
+% \frac{d^2}{d\mu(\log\sigma)}\log p(\mu,\log\sigma|y) & = & -2n\frac{\bar{y}-\mu}{\sigma^2},\\
+% \frac{d^2}{d(\log\sigma)^2}\log p(\mu,\log\sigma|y) & = & -\frac{2}{\sigma^2}((n-1)s^2+n(\bar{y}-\mu)^2)
+% \end{eqnarray*}
+% matrix of the second derivatives at $(\hat{\mu},\log\hat{\sigma})$
+% \begin{eqnarray*}
+% \begin{pmatrix}
+% -n/\hat{\sigma}^2 & 0 \\
+% 0 & -2n
+% \end{pmatrix}
+% \end{eqnarray*}
+% \end{itemize}
+
+% \end{frame}
+
+% \begin{frame}{Normal approximation -- example}
+
+% \begin{itemize}
+% \item Normal distribution, unknown mean and variance\\
+% posterior mode
+% \begin{eqnarray*}
+% (\hat{\mu},\log\hat{\sigma})=\left(\bar{y},\frac{1}{2}\log\left(\frac{n-1}{n}s^2\right)\right)
+% \end{eqnarray*}
+% matrix of the second derivatives at $(\hat{\mu},\log\hat{\sigma})$
+% \begin{eqnarray*}
+% \begin{pmatrix}
+% -n/\hat{\sigma}^2 & 0 \\
+% 0 & -2n
+% \end{pmatrix}
+% \end{eqnarray*}
+% normal approximation
+% \begin{equation*}
+% p(\mu,\log\sigma|y) \approx \N\left(
+% \begin{pmatrix}
+% \mu \\ \log\sigma
+% \end{pmatrix}
+% \Bigg|
+% \begin{pmatrix}
+% \bar{y} \\ \log\hat{\sigma}
+% \end{pmatrix},
+% \begin{pmatrix}
+% \hat{\sigma}^2/n & 0 \\
+% 0 & 1/(2n)
+% \end{pmatrix}
+% \right)
+% \end{equation*}
+% \end{itemize}
+
+% \end{frame}
+
+\begin{frame}[fragile]
+ \frametitle{Normal approximation -- numerically}
+
+ \begin{itemize}
+ \item Normal approximation can be computed numerically
+ \begin{itemize}
+ \item iterative optimization to find a mode (may use gradients)
+ \item autodiff or finite-difference for gradients and Hessian
+ \item<2> e.g. in R, demo4\_1.R:
+ {\scriptsize
+\begin{lstlisting}[]
+# negative log posterior (with uniform prior) of the bioassay model
+bioassayfun <- function(w, df) {
+  z <- w[1] + w[2]*df$x
+  -sum(df$y*z - df$n*log1p(exp(z)))
+}
+
+w0 <- c(0,0)
+optimres <- optim(w0, bioassayfun, gr=NULL, df1, hessian=TRUE)
+thetahat <- optimres$par
+# Hessian of the negative log posterior -> covariance of the approximation
+Sigma <- solve(optimres$hessian)
+\end{lstlisting}
+ }
+ \end{itemize}
+ \end{itemize}
+\end{frame}
+
+\begin{frame}{Normal approximation -- numerically}
+
+ \begin{itemize}
+ \item Normal approximation can be computed numerically
+ \begin{itemize}
+ \item iterative optimization to find a mode (may use gradients)
+ \item autodiff or finite-difference for gradients and Hessian
+ \end{itemize}
+ \item CmdStan(R) has Laplace algorithm
+ \uncover<2->{
+ \begin{itemize}
+ \item uses L-BFGS quasi-Newton optimization algorithm for finding the mode
+ \item uses autodiff for gradients
+ \item uses finite differences of gradients to compute Hessian
+ \begin{itemize}
+ \item<3-> second order autodiff in progress
+ \end{itemize}
+ \end{itemize}
+ }
+ \end{itemize}
+\end{frame}
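+
+% Added usage sketch for the CmdStanR Laplace algorithm mentioned on the
+% previous frame; the Stan file name and data object are illustrative
+\begin{frame}[fragile]{Normal approximation -- CmdStanR}
+
+ \begin{itemize}
+ \item A sketch of calling the Laplace algorithm with CmdStanR\\
+ (model file name and data object here are illustrative)
+ {\scriptsize
+\begin{lstlisting}[]
+library(cmdstanr)
+# compile the model (file name is illustrative)
+mod <- cmdstan_model("bioassay.stan")
+# find the mode, then sample from the normal approximation
+# in the unconstrained space
+fit <- mod$laplace(data = standata)
+fit$summary()
+\end{lstlisting}
+ }
+ \end{itemize}
+\end{frame}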
+
+\begin{frame}{Normal approximation}
+
+ \begin{itemize}
+ \item Optimization and computation of the Hessian usually require far
+ fewer density evaluations than MCMC
+ \item<2-> In some cases accuracy is sufficient
+ \item<3-> In some cases accuracy for a conditional distribution is
+ sufficient (Ch 13)
+ \begin{itemize}
+ \item e.g. Gaussian latent variable models, such as Gaussian
+ processes (Ch 21) and Gaussian Markov random fields
+ \item Rasmussen \& Williams: Gaussian Processes for Machine Learning
+ \item CS-E4895 - Gaussian Processes (in spring)
+ % \begin{itemize}
+ % \item CS-E4070 - Special Course in Machine Learning and Data
+ % Science: Gaussian processes - theory and applications
+ % \end{itemize}
+ \end{itemize}
+ \item<4-> Accuracy can be improved by importance sampling (Ch 10)
+ \end{itemize}
+\end{frame}
+
+\begin{frame}{Example: Importance sampling in Bioassay}
+
+ \vspace{-.5\baselineskip}
+ \makebox[12cm][t]{
+ \hspace{-0.9cm}
+ \begin{minipage}[t][12cm][t]{12cm}
+ \begin{center}
+ \makebox[0cm][t]{\hspace{-0.5cm}\rotatebox{90}{\hspace{1cm}Grid}}
+ \includegraphics[width=3.4cm]{bioassayis1d.pdf}
+ \includegraphics[width=3.4cm]{bioassayis1s.pdf}
+ \includegraphics[width=3.4cm]{bioassayis1h.pdf}\\
+ \only<2->{
+ \makebox[0cm][t]{\hspace{-0.5cm}\rotatebox{90}{\hspace{1cm}Normal}}
+ \includegraphics[width=3.4cm]{bioassayis2d.pdf}
+ \includegraphics[width=3.4cm]{bioassayis2s.pdf}
+ \includegraphics[width=3.4cm]{bioassayis2h.pdf}\\}
+ \only<3>{But the normal approximation is not that good here:\\ Grid sd(LD50) $\approx$ 0.1, Normal sd(LD50) $\approx$ 0.75!}
+ \only<4->{
+ \makebox[0cm][t]{\hspace{-0.5cm}\rotatebox{90}{\hspace{1cm}IS}}
+ \includegraphics[width=3.4cm]{bioassayis3d.pdf}
+ \includegraphics[width=3.4cm]{bioassayis3s.pdf}
+ \includegraphics[width=3.4cm]{bioassayis3h.pdf}\\}
+ \only<5->{Grid sd(LD50) $\approx$ 0.1, IS sd(LD50) $\approx$ 0.1}
+ \end{center}
+ \end{minipage}
+ }
+
+\end{frame}
+
+\begin{frame}{Normal approximation}
+
+ \begin{itemize}
+ \item<1-> Accuracy can be improved by importance sampling
+ \item<1-> Pareto-$k$ diagnostic of the importance sampling weights can be
+ used to assess the reliability of the approximation
+ \begin{itemize}
+ \item in Bioassay example $k=0.57$, which is ok
+ \end{itemize}
+ \item<2-> CmdStan(R) has Laplace algorithm
+ \begin{itemize}
+ \item since version 2.33 (2023)
+ \begin{itemize}
+ \item[+] Pareto-$k$ diagnostic via posterior package
+ \item[+] importance resampling (IR) via posterior package
+ \end{itemize}
+ \end{itemize}
+ \end{itemize}
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \begin{itemize}
+ \item<+-> Normal approximation is not good for parameters with
+ bounded or half-bounded support
+ \begin{itemize}
+ \item e.g. $\theta \in [0,1]$ representing a probability
+ \item<+-> Stan code can include constraints\\
+ \texttt{real<lower=0, upper=1> theta;}
+ \item<+-> for this, Stan does the inference in the unconstrained space
+ using a logit transformation
+ \item<+-> the density of the transformed parameter needs to include the
+ Jacobian of the transformation (BDA3 p. 21)
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
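+
+% Added the change-of-variables formula behind the Jacobian term
+% used in the binomial example on the following frames
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \begin{itemize}
+ \item For the logit transformation $\phi = \logit(\theta)$, so that
+ $\theta = \logit^{-1}(\phi)$, the density transforms as
+ \begin{equation*}
+ p_\phi(\phi|y) = p_\theta(\logit^{-1}(\phi)|y)
+ \left|\frac{d\theta}{d\phi}\right|,
+ \qquad \frac{d\theta}{d\phi} = \theta(1-\theta)
+ \end{equation*}
+ \item this $\theta(1-\theta)$ is the Jacobian term used in the
+ binomial example
+ \end{itemize}
+
+\end{frame}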
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+ Binomial model $y \sim \Bin(\theta, N)$, with data $y=9, N=10$
+
+ With $\Beta(1,1)$ prior, the posterior is $\Beta(9+1,1+1)$
+
+ \includegraphics[width=10cm]{jacobian-6.png}
+
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+ With $\Beta(1,1)$ prior, the posterior is $\Beta(9+1,1+1)$
+
+ Stan computes only the unnormalized posterior $q(\theta|y)$
+
+ \includegraphics[width=10cm]{jacobian-7.png}
+
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+ With $\Beta(1,1)$ prior, the posterior is $\Beta(9+1,1+1)$
+
+ For illustration purposes we normalize the Stan result $q(\theta|y)$
+
+ \includegraphics[width=10cm]{jacobian-8.png}
+
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+ With $\Beta(1,1)$ prior, the posterior is $\Beta(9+1,1+1)$
+
+ $\Beta(9+1,1+1)$, but x-axis shows the unconstrained $\logit(\theta)$
+
+ \includegraphics[width=10cm]{jacobian-10.png}
+
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+
+ ...but we need to take into account the absolute value of the determinant of the Jacobian of the transformation $\theta(1-\theta)$
+
+ \vspace{.58\baselineskip}
+
+ \includegraphics[width=10cm]{jacobian-12.png}
+
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+
+ ...but we need to take into account Jacobian $\theta(1-\theta)$
+
+ Let's compare a wrong normal approximation...
+
+ \includegraphics[width=10cm]{jacobian-14.png}
+
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+
+ ...but we need to take into account Jacobian $\theta(1-\theta)$
+
+ Let's compare a wrong normal approximation and a correct one
+
+ \includegraphics[width=10cm]{jacobian-16.png}
+
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+
+ Let's compare a wrong normal approximation and a correct one
+
+ Sample from both approximations and show KDEs for draws
+
+ \includegraphics[width=10cm]{jacobian-17.png}
+
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+
+ Let's compare a wrong normal approximation and a correct one
+
+ Inverse transform draws and show KDEs
+
+ \includegraphics[width=10cm]{jacobian-18.png}
+
+\end{frame}
+
+\begin{frame}{Normal approximation and parameter transformations}
+
+ \vspace{-.5\baselineskip}
+
+ Laplace approximation can be further improved with importance resampling
+
+ \vspace{.42\baselineskip}
+
+ \includegraphics[width=10cm]{jacobian-20.png}
+
+\end{frame}
+
+\begin{frame}{Other distributional approximations}
+
+ \begin{itemize}
+ \item<+-> Higher order derivatives at the mode can be used
+ \item<+-> Split-normal and split-$t$ by Geweke (1989) use additional
+ scaling along different principal axes
+ \item<+-> Other distributions can be used (e.g. $t$-distribution)
+ \item<+-> Instead of mode and Hessian at mode, e.g.
+ \begin{itemize}
+ \item variational inference (Ch 13)
+ \begin{itemize}
+ \item CS-E4820 - Machine Learning: Advanced Probabilistic Methods
+ \item CS-E4895 - Gaussian Processes
+ \item Stan has the ADVI algorithm (not a very good implementation)
+ \item Stan has the Pathfinder algorithm (CmdStanR github version)
+ \item instead of normal, methods with flexible flow transformations
+ \end{itemize}
+ \item expectation propagation (Ch 13)
+ \item speed of these is usually between optimization and MCMC
+ \begin{itemize}
+ \item stochastic variational inference can be even slower than MCMC
+ \end{itemize}
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
+
+\begin{frame}{Pathfinder: Parallel quasi-Newton variational inference.}
+
+\vspace{-.5\baselineskip}
+
+\only<1>{\includegraphics[width=1.05\textwidth]{pathfinder_logit_example.pdf}\\
+ \vspace{-.5\baselineskip}}
+\only<2>{\includegraphics[width=1.05\textwidth]{pathfinder_funnel_example.pdf}\\
+ \vspace{-.5\baselineskip}}
+\footnotesize{Zhang, Carpenter, Gelman, and Vehtari
+ (2022). Pathfinder: Parallel quasi-Newton variational
+ inference. \textit{Journal of Machine Learning Research},
+ 23(306):1--49.}
+
+\end{frame}
+
+
+\begin{frame}{Distributional approximations}
+
+ \vspace{-\baselineskip}
+{\small {\color{blue} Exact}, {\color{red} Normal at mode}, {\color{forestgreen} Normal with variational inference}}
+
+ \makebox[12cm][t]{
+ \hspace{-.7cm}
+\begin{minipage}[t]{12cm}
+ \includegraphics[width=12cm]{cond_excat_normfr2.pdf}\\
+ \includegraphics[width=12cm]{cond_excat_normfr3.pdf}
+ \end{minipage}
+}
+
+ \vspace{-.5\baselineskip}
+\uncover<2->{\footnotesize
+ Grid sd(LD50) $\approx$ 0.090,\\
+ Normal sd(LD50) $\approx$ 0.75,
+ Normal + IR sd(LD50) $\approx$ 0.096 (Pareto-$k$ = 0.57)\\}
+\uncover<3->{\footnotesize
+ VI sd(LD50) $\approx$ 0.13,
+ VI + IR sd(LD50) $\approx$ 0.095 (Pareto-$k$ = 0.17)
+ }
+
+\end{frame}
+
+\begin{frame}{Variational inference}
+
+ \begin{itemize}
+ \item<+-> Variational inference includes a large number of methods
+ \item<+-> For a restricted set of models, possible to derive
+ deterministic algorithms
+ \begin{itemize}
+ \item can be fast and can be relatively accurate
+ \end{itemize}
+ \item<+-> Using stochastic (Monte Carlo) estimation of the
+ divergence, possible to derive generic black box algorithms
+ \begin{itemize}
+ \item<+-> possible to also use mini-batching
+ \item<+-> can be fast and provide better predictive distribution than
+ Laplace approximation if the posterior is far from normal
+ \item<+-> in general, unlikely to achieve accuracy of HMC with the
+ same computation cost
+ \item<+-> with increasing number of posterior dimensions, the
+ obtained approximation gets worse {\small
+ (\href{https://papers.nips.cc/paper/2020/hash/7cac11e2f46ed46c339ec3d569853759-Abstract.html}{Dhaka,
+ Catalina, Andersen, Magnusson, Huggins, and Vehtari, 2020})}
+ \item<+-> with increasing number of posterior dimensions, the
+ stochastic divergence estimate gets worse and flows have problems,
+ too {\small
+ (\href{https://proceedings.neurips.cc/paper/2021/hash/404dcc91b2aeaa7caa47487d1483e48a-Abstract.html}{Dhaka,
+ Catalina, Andersen, Welandawe, Huggins, and Vehtari, 2021})}
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
+
+\begin{frame}{Large sample theory}
+
+ \begin{itemize}
+ \item Asymptotic normality
+ \begin{itemize}
+ \item<+-> as the number $n$ of observations $y_i$ increases, the
+ posterior converges to a normal distribution
+ \item<+-> can be shown by showing that
+ \begin{itemize}
+ \item eventually likelihood dominates the prior
+ \item the higher order terms in Taylor series increase slower
+ than the second order term
+ \end{itemize}
+ \item<+-> see counter examples
+ \end{itemize}
+\end{itemize}
+\end{frame}
+
+\begin{frame}{Large sample theory}
+
+ \begin{itemize}
+ \item Assume "true" underlying data distribution $f(y)$
+ \begin{itemize}
+ \item observations $y_1,\ldots,y_n$ are independent samples from
+ the joint distribution $f(y)$
+ \item "true" data distribution $f(y)$ is not always well defined
+ \item in the following we proceed as if there were true underlying data distribution
+ \item for the theory the exact form of $f(y)$ is not important as
+ long as it satisfies certain regularity conditions
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
+
+\begin{frame}{Large sample theory}
+
+ \begin{itemize}
+ % \item Asymptotic normality
+ % \begin{itemize}
+ % \item as the number $n$ of observations $y_i$ from $f(y)$ grows,
+ % the posterior distribution of the parameter vector approaches a normal distribution
+ % \end{itemize}
+ % \pause
+ \item Consistency
+ \begin{itemize}
+ \item if true distribution is included in the parametric family,
+ so that $f(y)=p(y|\theta_0)$ for some $\theta_0$, then posterior
+ converges to a point $\theta_0$, when $n\rightarrow\infty$
+ % \item<2-> a point doesn't have uncertainty
+ \item<2-> the same result as for maximum likelihood estimate
+ \end{itemize}
+ \item<3-> If true distribution is not included in the parametric family,
+ then there is no true $\theta_0$
+ \begin{itemize}
+ \item the true $\theta_0$ is replaced with the $\theta_0$ that minimizes
+ the Kullback-Leibler divergence from $f(y)$ to $p(y_i|\theta_0)$
+ % \begin{align*}
+ % H(\theta_0)=\int f(y_i) \log\left(\frac{f(y_i)}{p(y_i|\theta_0)}\right)dy_i
+ % \end{align*}
+% \item<5-> this point doesn't have uncertainty, but it's a wrong point!
+ \item<4-> the same result as for maximum likelihood estimate
+ \end{itemize}
+\end{itemize}
+
+\end{frame}
+
+\begin{frame}{Large sample theory -- counter examples}
+
+ \begin{itemize}
+ \item Under- and non-identifiability
+ \begin{itemize}
+ \item a model is under-identifiable if it has parameters or parameter combinations for which there is no information in the data
+ \item then there is no single point $\theta_0$ where
+ posterior would converge
+ \item<2-> e.g. if the model is
+ \begin{align*}
+ y \sim \N(a+b+cx, \sigma)
+ \end{align*}
+ \begin{itemize}
+ \vspace{-\baselineskip}
+ \item<3-> posterior would converge to a line with prior
+ determining the density along the line
+ \end{itemize}
+ \item<4-> e.g. if we never observe $u$ and $v$ at the same time and the model is
+ \begin{equation*}
+ \begin{pmatrix}
+ u \\ v
+ \end{pmatrix}
+ \sim
+ \N\left(
+ \begin{pmatrix}
+ 0\\0
+ \end{pmatrix},
+ \begin{pmatrix}
+ 1 & \rho \\ \rho & 1
+ \end{pmatrix}
+ \right)
+ \end{equation*}
+ then correlation $\rho$ is non-identifiable
+ \begin{itemize}
+ \item<5-> e.g. $u$ and $v$ could be the height and weight of
+ a student; if only one of them is measured for each student,
+ then $\rho$ is non-identifiable
+ \end{itemize}
+ \end{itemize}
+ \item<6-> Problem also for other inference methods like MCMC
+ \end{itemize}
+
+\end{frame}
+
+% \note{the problem can be removed by detecting it; if these parameters
+% really need to be estimated, obtain the necessary observations}
+
+\begin{frame}{Asymptotic identifiability vs finite data case}
+
+ \begin{itemize}
+ \item If we randomly would measure both height and weight,
+ asymptotically the correlation $\rho$ would be identifiable
+ \item<2-> But a finite dataset from this data generating process may
+ lack joint height and weight observations, and thus the
+ finite-data likelihood doesn't have information about $\rho$
+ \item<3-> If the likelihood is weakly informative for some
+ parameters, priors and integration are more important
+ \end{itemize}
+
+\end{frame}
+
+\begin{frame}{Large sample theory -- counter examples}
+
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item If the number of parameters increases as the number of
+ observations increases
+ \begin{itemize}
+ \item in some models the number of parameters depends on the number
+ of observations
+ \item e.g. time series models $y_t \sim
+ \N(\theta_t,\sigma^2)$ and $\theta_t$ has prior in time
+ \item posterior of $\theta_t$ does not converge to a point, if
+ additional observations do not bring enough information
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
+
+\begin{frame}{Large sample theory -- counter examples}
+
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Aliasing (\emph{valetoisto} in Finnish)
+ \begin{itemize}
+ \item special case of under-identifiability where likelihood
+ repeats in separate points
+ \item e.g. mixture of normals
+ \begin{equation*}
+ p(y_i|\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\lambda)=\lambda\N(\mu_1,\sigma_1^2)+(1-\lambda)\N(\mu_2,\sigma_2^2)
+ \end{equation*}
+ \uncover<2->{
+ if $(\mu_1,\mu_2)$ and $(\sigma_1^2,\sigma_2^2)$ are switched,
+ and $\lambda$ is replaced with $(1-\lambda)$, the model is
+ equivalent; the posterior would usually have two modes which are
+ mirror images of each other, and the posterior does not
+ converge to a single point}
+ \end{itemize}
+ \item<3-> For MCMC, aliasing makes the convergence diagnostics more difficult,
+ as it is difficult to distinguish aliasing from other kinds of multimodality
+\end{itemize}
+
+\end{frame}
+
+\begin{frame}{Large sample theory -- counter examples}
+
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Unbounded (\emph{rajoittamaton} in Finnish) likelihood
+ \begin{itemize}
+ \item if likelihood is unbounded it is possible that there is no
+ mode in the posterior
+ \item<2-> e.g. previous normal mixture model; assume $\lambda$ to be
+ known (and not $0$ or $1$); if we set $\mu_1=y_i$ for any $i$
+ and $\sigma_1^2\rightarrow 0$, then likelihood
+ $\rightarrow\infty$
+ \item<3-> if prior for $\sigma_1^2$ does not go to zero when
+ $\sigma_1^2\rightarrow 0$, then the posterior is unbounded
+ \item<4-> when $n\rightarrow\infty$ the number of likelihood
+ modes increases
+ \end{itemize}
+ \item<5-> Problem for any inference method including MCMC
+ \begin{itemize}
+ \item can be avoided with good priors
+ \item<6-> a prior close to one allowing an unbounded
+ posterior may produce an almost unbounded posterior
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
+
+% \note{e.g. a uniform prior is not good
+
+% e.g. a $1/\sigma^2$ prior is not good
+
+% e.g. an $\Invchi2$ distribution with suitable parameters is possible}
+
+\begin{frame}{Large sample theory -- counter examples}
+
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Improper posterior
+ \begin{itemize}
+ \item asymptotic results assume that probability sums to 1
+ \item e.g. Binomial model, with $\Beta(0,0)$ prior and observation $y=n$
+ \begin{itemize}
+ \item posterior $p(\theta|n,0)=\theta^{n-1}(1-\theta)^{-1}$
+ \item when $\theta\rightarrow 1$, then
+ $p(\theta|n,0)\rightarrow \infty$
+ \end{itemize}
+ \end{itemize}
+ \item<2-> Problem for any inference method including MCMC
+ \begin{itemize}
+ \item can be avoided with proper priors
+ \item<3-> a prior close to an improper prior may produce an
+ almost improper posterior
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
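+
+% Added the one-line integral showing why the posterior on the
+% previous frame fails to normalize
+\begin{frame}{Large sample theory -- counter examples}
+
+ \begin{itemize}
+ \item Why the $\Beta(0,0)$ posterior with $y=n$ is improper: the
+ density $\theta^{n-1}(1-\theta)^{-1}$ is not integrable, since for
+ $\theta \geq 1/2$ we have $\theta^{n-1}\geq (1/2)^{n-1}$ and
+ \begin{equation*}
+ \int_{1/2}^{1} \theta^{n-1}(1-\theta)^{-1}\,d\theta
+ \geq \left(\frac{1}{2}\right)^{n-1}\int_{1/2}^{1}\frac{d\theta}{1-\theta}
+ = \infty
+ \end{equation*}
+ \end{itemize}
+
+\end{frame}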
+
+% \note{an improper prior that produces a proper posterior is also fine
+
+% Remember that $\Beta(1,1)$ corresponds to a uniform prior and
+% $\Beta(\frac{1}{2},\frac{1}{2})$ to the Jeffreys prior, so
+% $\Beta(0,0)$ is an even vaguer prior and is indeed improper
+
+% posterior $\Beta(\theta|n,0)=\theta^{n-1}(1-\theta)^{-1}$\\
+% $\Beta(1|n,0)=1^{n-1}0^{-1} = 1/0 = \infty$
+% }
+
+\begin{frame}{Large sample theory -- counter examples}
+
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Prior distribution does not include the convergence point
+ \begin{itemize}
+ \item if in discrete case $p(\theta_0)=0$ or in continuous case
+ $p(\theta)=0$ in the neighborhood of $\theta_0$, then the
+ convergence results based on the dominance of the likelihood
+ do not hold
+ \end{itemize}
+ \item<2-> Should have a positive prior probability/density where needed
+\end{itemize}
+
+\end{frame}
+
+\begin{frame}{Large sample theory -- counter examples}
+
+ \begin{itemize}
+ % \item Does not always hold when $n\rightarrow\infty$
+ \item Convergence point at the edge of the parameter space
+ \begin{itemize}
+ \item if $\theta_0$ is on the edge of the parameter space,
+ Taylor series expansion has to be truncated, and normal
+ approximation does not necessarily hold
+ \item<2-> e.g. $y_i\sim\N(\theta,1)$ with a restriction $\theta\geq
+ 0$ and assume that $\theta_0=0$
+ \begin{itemize}
+ \item posterior of $\theta$ is left truncated normal
+ distribution with $\mu=\bar{y}$
+ \item in the limit $n\rightarrow\infty$ posterior is half
+ normal distribution \pause
+ \end{itemize}
+ \end{itemize}
+ \item Can be easy or difficult for MCMC
+ \end{itemize}
+
+\end{frame}
+
+% \begin{frame}{Large sample theory -- counter examples}
+
+% \begin{itemize}
+% \item Tails of the distribution
+% \begin{itemize}
+% \item normal approximation may be accurate for the most of the
+% posterior mass, but still be inaccurate for the tails
+% \item e.g. parameter which is constrained to be positive; given a
+% finite $n$, normal approximation assumes non-zero probability
+% for negative values
+% \end{itemize}
+% % \item Monte Carlo has different kind of problems with the tails
+% \end{itemize}
+
+% \end{frame}
+
+\begin{frame}{Frequency evaluations}
+
+ \begin{itemize}
+ \item Bayesian theory has epistemic and aleatory probabilities
+ \item Frequency evaluations focus on frequency properties under
+ repeated (aleatory) data generation and model fitting
+ \begin{itemize}
+ \item It is useful to examine these for Bayesian inference, too
+ % \item<2-> Consistency
+ \item<2-> Asymptotic unbiasedness
+ \begin{itemize}
+ \item not that important in Bayesian inference, a small and
+ decreasing error is more important
+ \end{itemize}
+ \item<3-> Asymptotic efficiency
+ \begin{itemize}
+ \item no other point estimate with smaller squared error
+ \item useful also in Bayesian inference, but should consider
+ which utility/loss is important
+ \end{itemize}
+ \item<4-> Calibration
+ \begin{itemize}
+ \item $\alpha\%$-posterior interval has the true value in
+ $\alpha\%$ cases
+ \item $\alpha\%$-predictive interval has the true future values
+ in $\alpha\%$ cases
+ \item approximate calibration with shorter intervals for
+ likely true values is more important than exact calibration
+ with very bad intervals for all possible values
+ \end{itemize}
+ \end{itemize}
+ \end{itemize}
+
+
+\end{frame}
+
+\begin{frame}
+
+ {\Large\color{navyblue} Frequentist statistics}
+
+ \begin{itemize}
+ \item Frequentist statistics accepts only aleatory probabilities
+ \begin{itemize}
+ \item Estimates are based on data
+ \item Uncertainty of estimates is based on all possible data
+ sets which could have been generated by the data generating
+ mechanism
+ \begin{itemize}
+ \item<2-> inference is based also on data we did not observe
+ \end{itemize}
+ \end{itemize}
+ \item<3-> Estimates are derived to fulfill frequency properties
+ \begin{itemize}
+ \item Maximum likelihood (often) fulfills asymptotic frequency
+ properties
+ \item Common finite data desiderata are 1) unbiasedness, 2)
+ minimum variance, 3) calibration of confidence interval
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
+
+\begin{frame}
+
+ {\Large\color{navyblue} Frequentist statistics}
+
+ \begin{itemize}
+ \item Estimates are derived to fulfill frequency properties
+ \begin{itemize}
+ \item Maximum likelihood fulfills only asymptotic frequency
+ properties
+ \item Common desiderata are 1) unbiasedness, 2) minimum
+ variance, 3) calibration of confidence interval
+ \end{itemize}
+ \item Requirement of unbiasedness may lead to higher variance or
+ silly estimates
+ \begin{itemize}
+ \item unbiased estimate for strictly positive parameter can be
+ negative
+ \end{itemize}
+ \item<2-> Confidence interval is defined to have true value inside the
+ interval in $\alpha\%$ cases of repeated data generation from the
+ data generating mechanism
+ \begin{itemize}
+ % \item doesn't say how likely the true value is inside the interval
+ % given the observed data
+ \item perfect calibration is not necessarily useful
+ \end{itemize}
+ \end{itemize}
+
+\end{frame}
+
+\begin{frame}
+
+ {\Large\color{navyblue} Frequentist vs Bayes vs others}
+
+ \begin{itemize}
+ \item There is a great amount of very useful frequentist statistics
+ \begin{itemize}
+ \item also for simple models and lots of data there is not much
+ difference
+ \end{itemize}
+ \item<2-> Bayesian inference
+ \begin{itemize}
+ \item easier for complex, e.g. hierarchical, models
+ \item easier when model changes
+ \item a consistent way to add prior information
+ \end{itemize}
+ \item<3-> A lot of machine learning is not pure frequentist or
+ Bayesian
+ \end{itemize}
+
+\end{frame}
+
+
+\end{document}
+
+%%% Local Variables:
+%%% TeX-PDF-mode: t
+%%% TeX-master: t
+%%% End:
diff --git a/slides/figs/helicopter_bo_a_1.pdf b/slides/figs/helicopter_bo_a_1.pdf
new file mode 100644
index 00000000..94afe9f6
Binary files /dev/null and b/slides/figs/helicopter_bo_a_1.pdf differ
diff --git a/slides/figs/helicopter_bo_a_10.pdf b/slides/figs/helicopter_bo_a_10.pdf
new file mode 100644
index 00000000..d59321e3
Binary files /dev/null and b/slides/figs/helicopter_bo_a_10.pdf differ
diff --git a/slides/figs/helicopter_bo_a_11.pdf b/slides/figs/helicopter_bo_a_11.pdf
new file mode 100644
index 00000000..8fd3cb8b
Binary files /dev/null and b/slides/figs/helicopter_bo_a_11.pdf differ
diff --git a/slides/figs/helicopter_bo_a_12.pdf b/slides/figs/helicopter_bo_a_12.pdf
new file mode 100644
index 00000000..fcdcf31c
Binary files /dev/null and b/slides/figs/helicopter_bo_a_12.pdf differ
diff --git a/slides/figs/helicopter_bo_a_13.pdf b/slides/figs/helicopter_bo_a_13.pdf
new file mode 100644
index 00000000..2c39934b
Binary files /dev/null and b/slides/figs/helicopter_bo_a_13.pdf differ
diff --git a/slides/figs/helicopter_bo_a_14.pdf b/slides/figs/helicopter_bo_a_14.pdf
new file mode 100644
index 00000000..e921b30b
Binary files /dev/null and b/slides/figs/helicopter_bo_a_14.pdf differ
diff --git a/slides/figs/helicopter_bo_a_15.pdf b/slides/figs/helicopter_bo_a_15.pdf
new file mode 100644
index 00000000..9cde35ea
Binary files /dev/null and b/slides/figs/helicopter_bo_a_15.pdf differ
diff --git a/slides/figs/helicopter_bo_a_16.pdf b/slides/figs/helicopter_bo_a_16.pdf
new file mode 100644
index 00000000..b3e1448d
Binary files /dev/null and b/slides/figs/helicopter_bo_a_16.pdf differ
diff --git a/slides/figs/helicopter_bo_a_17.pdf b/slides/figs/helicopter_bo_a_17.pdf
new file mode 100644
index 00000000..ef0d83ec
Binary files /dev/null and b/slides/figs/helicopter_bo_a_17.pdf differ
diff --git a/slides/figs/helicopter_bo_a_18.pdf b/slides/figs/helicopter_bo_a_18.pdf
new file mode 100644
index 00000000..0b03cb15
Binary files /dev/null and b/slides/figs/helicopter_bo_a_18.pdf differ
diff --git a/slides/figs/helicopter_bo_a_19.pdf b/slides/figs/helicopter_bo_a_19.pdf
new file mode 100644
index 00000000..c74982de
Binary files /dev/null and b/slides/figs/helicopter_bo_a_19.pdf differ
diff --git a/slides/figs/helicopter_bo_a_2.pdf b/slides/figs/helicopter_bo_a_2.pdf
new file mode 100644
index 00000000..9f510c33
Binary files /dev/null and b/slides/figs/helicopter_bo_a_2.pdf differ
diff --git a/slides/figs/helicopter_bo_a_20.pdf b/slides/figs/helicopter_bo_a_20.pdf
new file mode 100644
index 00000000..3a196489
Binary files /dev/null and b/slides/figs/helicopter_bo_a_20.pdf differ
diff --git a/slides/figs/helicopter_bo_a_21.pdf b/slides/figs/helicopter_bo_a_21.pdf
new file mode 100644
index 00000000..45995cd8
Binary files /dev/null and b/slides/figs/helicopter_bo_a_21.pdf differ
diff --git a/slides/figs/helicopter_bo_a_22.pdf b/slides/figs/helicopter_bo_a_22.pdf
new file mode 100644
index 00000000..435432d0
Binary files /dev/null and b/slides/figs/helicopter_bo_a_22.pdf differ
diff --git a/slides/figs/helicopter_bo_a_23.pdf b/slides/figs/helicopter_bo_a_23.pdf
new file mode 100644
index 00000000..73b37689
Binary files /dev/null and b/slides/figs/helicopter_bo_a_23.pdf differ
diff --git a/slides/figs/helicopter_bo_a_24.pdf b/slides/figs/helicopter_bo_a_24.pdf
new file mode 100644
index 00000000..bbd081bb
Binary files /dev/null and b/slides/figs/helicopter_bo_a_24.pdf differ
diff --git a/slides/figs/helicopter_bo_a_25.pdf b/slides/figs/helicopter_bo_a_25.pdf
new file mode 100644
index 00000000..2e0aac3f
Binary files /dev/null and b/slides/figs/helicopter_bo_a_25.pdf differ
diff --git a/slides/figs/helicopter_bo_a_26.pdf b/slides/figs/helicopter_bo_a_26.pdf
new file mode 100644
index 00000000..0b51bc1a
Binary files /dev/null and b/slides/figs/helicopter_bo_a_26.pdf differ
diff --git a/slides/figs/helicopter_bo_a_27.pdf b/slides/figs/helicopter_bo_a_27.pdf
new file mode 100644
index 00000000..b24dd186
Binary files /dev/null and b/slides/figs/helicopter_bo_a_27.pdf differ
diff --git a/slides/figs/helicopter_bo_a_28.pdf b/slides/figs/helicopter_bo_a_28.pdf
new file mode 100644
index 00000000..56b4b12e
Binary files /dev/null and b/slides/figs/helicopter_bo_a_28.pdf differ
diff --git a/slides/figs/helicopter_bo_a_29.pdf b/slides/figs/helicopter_bo_a_29.pdf
new file mode 100644
index 00000000..e6bbab2f
Binary files /dev/null and b/slides/figs/helicopter_bo_a_29.pdf differ
diff --git a/slides/figs/helicopter_bo_a_3.pdf b/slides/figs/helicopter_bo_a_3.pdf
new file mode 100644
index 00000000..d223cfcd
Binary files /dev/null and b/slides/figs/helicopter_bo_a_3.pdf differ
diff --git a/slides/figs/helicopter_bo_a_4.pdf b/slides/figs/helicopter_bo_a_4.pdf
new file mode 100644
index 00000000..75776460
Binary files /dev/null and b/slides/figs/helicopter_bo_a_4.pdf differ
diff --git a/slides/figs/helicopter_bo_a_5.pdf b/slides/figs/helicopter_bo_a_5.pdf
new file mode 100644
index 00000000..a531f0c5
Binary files /dev/null and b/slides/figs/helicopter_bo_a_5.pdf differ
diff --git a/slides/figs/helicopter_bo_a_6.pdf b/slides/figs/helicopter_bo_a_6.pdf
new file mode 100644
index 00000000..99e631bf
Binary files /dev/null and b/slides/figs/helicopter_bo_a_6.pdf differ
diff --git a/slides/figs/helicopter_bo_a_7.pdf b/slides/figs/helicopter_bo_a_7.pdf
new file mode 100644
index 00000000..5bc9db56
Binary files /dev/null and b/slides/figs/helicopter_bo_a_7.pdf differ
diff --git a/slides/figs/helicopter_bo_a_8.pdf b/slides/figs/helicopter_bo_a_8.pdf
new file mode 100644
index 00000000..6d877e90
Binary files /dev/null and b/slides/figs/helicopter_bo_a_8.pdf differ
diff --git a/slides/figs/helicopter_bo_a_9.pdf b/slides/figs/helicopter_bo_a_9.pdf
new file mode 100644
index 00000000..efef7ed2
Binary files /dev/null and b/slides/figs/helicopter_bo_a_9.pdf differ
diff --git a/slides/figs/helicopter_bo_b_1.pdf b/slides/figs/helicopter_bo_b_1.pdf
new file mode 100644
index 00000000..6c27f855
Binary files /dev/null and b/slides/figs/helicopter_bo_b_1.pdf differ
diff --git a/slides/figs/helicopter_bo_b_10.pdf b/slides/figs/helicopter_bo_b_10.pdf
new file mode 100644
index 00000000..81cc418b
Binary files /dev/null and b/slides/figs/helicopter_bo_b_10.pdf differ
diff --git a/slides/figs/helicopter_bo_b_11.pdf b/slides/figs/helicopter_bo_b_11.pdf
new file mode 100644
index 00000000..23f8a803
Binary files /dev/null and b/slides/figs/helicopter_bo_b_11.pdf differ
diff --git a/slides/figs/helicopter_bo_b_12.pdf b/slides/figs/helicopter_bo_b_12.pdf
new file mode 100644
index 00000000..c67a1f8a
Binary files /dev/null and b/slides/figs/helicopter_bo_b_12.pdf differ
diff --git a/slides/figs/helicopter_bo_b_13.pdf b/slides/figs/helicopter_bo_b_13.pdf
new file mode 100644
index 00000000..78ee6758
Binary files /dev/null and b/slides/figs/helicopter_bo_b_13.pdf differ
diff --git a/slides/figs/helicopter_bo_b_14.pdf b/slides/figs/helicopter_bo_b_14.pdf
new file mode 100644
index 00000000..78a526ce
Binary files /dev/null and b/slides/figs/helicopter_bo_b_14.pdf differ
diff --git a/slides/figs/helicopter_bo_b_15.pdf b/slides/figs/helicopter_bo_b_15.pdf
new file mode 100644
index 00000000..79ca5d11
Binary files /dev/null and b/slides/figs/helicopter_bo_b_15.pdf differ
diff --git a/slides/figs/helicopter_bo_b_16.pdf b/slides/figs/helicopter_bo_b_16.pdf
new file mode 100644
index 00000000..9b6e1b34
Binary files /dev/null and b/slides/figs/helicopter_bo_b_16.pdf differ
diff --git a/slides/figs/helicopter_bo_b_17.pdf b/slides/figs/helicopter_bo_b_17.pdf
new file mode 100644
index 00000000..41174028
Binary files /dev/null and b/slides/figs/helicopter_bo_b_17.pdf differ
diff --git a/slides/figs/helicopter_bo_b_18.pdf b/slides/figs/helicopter_bo_b_18.pdf
new file mode 100644
index 00000000..8187836b
Binary files /dev/null and b/slides/figs/helicopter_bo_b_18.pdf differ
diff --git a/slides/figs/helicopter_bo_b_19.pdf b/slides/figs/helicopter_bo_b_19.pdf
new file mode 100644
index 00000000..e928323a
Binary files /dev/null and b/slides/figs/helicopter_bo_b_19.pdf differ
diff --git a/slides/figs/helicopter_bo_b_2.pdf b/slides/figs/helicopter_bo_b_2.pdf
new file mode 100644
index 00000000..75566cd7
Binary files /dev/null and b/slides/figs/helicopter_bo_b_2.pdf differ
diff --git a/slides/figs/helicopter_bo_b_20.pdf b/slides/figs/helicopter_bo_b_20.pdf
new file mode 100644
index 00000000..72b565db
Binary files /dev/null and b/slides/figs/helicopter_bo_b_20.pdf differ
diff --git a/slides/figs/helicopter_bo_b_21.pdf b/slides/figs/helicopter_bo_b_21.pdf
new file mode 100644
index 00000000..dafa8a2d
Binary files /dev/null and b/slides/figs/helicopter_bo_b_21.pdf differ
diff --git a/slides/figs/helicopter_bo_b_22.pdf b/slides/figs/helicopter_bo_b_22.pdf
new file mode 100644
index 00000000..7b2fd634
Binary files /dev/null and b/slides/figs/helicopter_bo_b_22.pdf differ
diff --git a/slides/figs/helicopter_bo_b_23.pdf b/slides/figs/helicopter_bo_b_23.pdf
new file mode 100644
index 00000000..507cae44
Binary files /dev/null and b/slides/figs/helicopter_bo_b_23.pdf differ
diff --git a/slides/figs/helicopter_bo_b_24.pdf b/slides/figs/helicopter_bo_b_24.pdf
new file mode 100644
index 00000000..13a88558
Binary files /dev/null and b/slides/figs/helicopter_bo_b_24.pdf differ
diff --git a/slides/figs/helicopter_bo_b_25.pdf b/slides/figs/helicopter_bo_b_25.pdf
new file mode 100644
index 00000000..2d10ae91
Binary files /dev/null and b/slides/figs/helicopter_bo_b_25.pdf differ
diff --git a/slides/figs/helicopter_bo_b_26.pdf b/slides/figs/helicopter_bo_b_26.pdf
new file mode 100644
index 00000000..2180d5d4
Binary files /dev/null and b/slides/figs/helicopter_bo_b_26.pdf differ
diff --git a/slides/figs/helicopter_bo_b_27.pdf b/slides/figs/helicopter_bo_b_27.pdf
new file mode 100644
index 00000000..a2328295
Binary files /dev/null and b/slides/figs/helicopter_bo_b_27.pdf differ
diff --git a/slides/figs/helicopter_bo_b_28.pdf b/slides/figs/helicopter_bo_b_28.pdf
new file mode 100644
index 00000000..08047465
Binary files /dev/null and b/slides/figs/helicopter_bo_b_28.pdf differ
diff --git a/slides/figs/helicopter_bo_b_29.pdf b/slides/figs/helicopter_bo_b_29.pdf
new file mode 100644
index 00000000..c43d4da8
Binary files /dev/null and b/slides/figs/helicopter_bo_b_29.pdf differ
diff --git a/slides/figs/helicopter_bo_b_3.pdf b/slides/figs/helicopter_bo_b_3.pdf
new file mode 100644
index 00000000..e4a38684
Binary files /dev/null and b/slides/figs/helicopter_bo_b_3.pdf differ
diff --git a/slides/figs/helicopter_bo_b_4.pdf b/slides/figs/helicopter_bo_b_4.pdf
new file mode 100644
index 00000000..34772749
Binary files /dev/null and b/slides/figs/helicopter_bo_b_4.pdf differ
diff --git a/slides/figs/helicopter_bo_b_5.pdf b/slides/figs/helicopter_bo_b_5.pdf
new file mode 100644
index 00000000..975e829c
Binary files /dev/null and b/slides/figs/helicopter_bo_b_5.pdf differ
diff --git a/slides/figs/helicopter_bo_b_6.pdf b/slides/figs/helicopter_bo_b_6.pdf
new file mode 100644
index 00000000..7b02831d
Binary files /dev/null and b/slides/figs/helicopter_bo_b_6.pdf differ
diff --git a/slides/figs/helicopter_bo_b_7.pdf b/slides/figs/helicopter_bo_b_7.pdf
new file mode 100644
index 00000000..3be28d5f
Binary files /dev/null and b/slides/figs/helicopter_bo_b_7.pdf differ
diff --git a/slides/figs/helicopter_bo_b_8.pdf b/slides/figs/helicopter_bo_b_8.pdf
new file mode 100644
index 00000000..ec524695
Binary files /dev/null and b/slides/figs/helicopter_bo_b_8.pdf differ
diff --git a/slides/figs/helicopter_bo_b_9.pdf b/slides/figs/helicopter_bo_b_9.pdf
new file mode 100644
index 00000000..3bac8a95
Binary files /dev/null and b/slides/figs/helicopter_bo_b_9.pdf differ
diff --git a/slides/figs/helicopter_bo_initial_data.pdf b/slides/figs/helicopter_bo_initial_data.pdf
new file mode 100644
index 00000000..5b9b2a4b
Binary files /dev/null and b/slides/figs/helicopter_bo_initial_data.pdf differ
diff --git a/slides/figs/helicopter_bo_initial_fit.pdf b/slides/figs/helicopter_bo_initial_fit.pdf
new file mode 100644
index 00000000..95df83b4
Binary files /dev/null and b/slides/figs/helicopter_bo_initial_fit.pdf differ
diff --git a/slides/figs/helicopter_bo_initial_fit_draws.pdf b/slides/figs/helicopter_bo_initial_fit_draws.pdf
new file mode 100644
index 00000000..b1e4a502
Binary files /dev/null and b/slides/figs/helicopter_bo_initial_fit_draws.pdf differ
diff --git a/slides/figs/helicopter_bo_maximizing_density.pdf b/slides/figs/helicopter_bo_maximizing_density.pdf
new file mode 100644
index 00000000..282130d1
Binary files /dev/null and b/slides/figs/helicopter_bo_maximizing_density.pdf differ
diff --git a/slides/figs/helicopter_bo_maximizing_density_2.pdf b/slides/figs/helicopter_bo_maximizing_density_2.pdf
new file mode 100644
index 00000000..f0f8b9b0
Binary files /dev/null and b/slides/figs/helicopter_bo_maximizing_density_2.pdf differ
diff --git a/slides/figs/helicopter_hier_stable.pdf b/slides/figs/helicopter_hier_stable.pdf
new file mode 100644
index 00000000..da37e936
Binary files /dev/null and b/slides/figs/helicopter_hier_stable.pdf differ
diff --git a/slides/figs/helicopter_hier_time.pdf b/slides/figs/helicopter_hier_time.pdf
new file mode 100644
index 00000000..37f62d80
Binary files /dev/null and b/slides/figs/helicopter_hier_time.pdf differ
diff --git a/slides/figs/jacobian-10.png b/slides/figs/jacobian-10.png
new file mode 100644
index 00000000..e93201b7
Binary files /dev/null and b/slides/figs/jacobian-10.png differ
diff --git a/slides/figs/jacobian-12.png b/slides/figs/jacobian-12.png
new file mode 100644
index 00000000..0fb1f8f5
Binary files /dev/null and b/slides/figs/jacobian-12.png differ
diff --git a/slides/figs/jacobian-14.png b/slides/figs/jacobian-14.png
new file mode 100644
index 00000000..b3f82e33
Binary files /dev/null and b/slides/figs/jacobian-14.png differ
diff --git a/slides/figs/jacobian-16.png b/slides/figs/jacobian-16.png
new file mode 100644
index 00000000..347465fd
Binary files /dev/null and b/slides/figs/jacobian-16.png differ
diff --git a/slides/figs/jacobian-17.png b/slides/figs/jacobian-17.png
new file mode 100644
index 00000000..dab2c334
Binary files /dev/null and b/slides/figs/jacobian-17.png differ
diff --git a/slides/figs/jacobian-18.png b/slides/figs/jacobian-18.png
new file mode 100644
index 00000000..a6c04130
Binary files /dev/null and b/slides/figs/jacobian-18.png differ
diff --git a/slides/figs/jacobian-20.png b/slides/figs/jacobian-20.png
new file mode 100644
index 00000000..0f214e84
Binary files /dev/null and b/slides/figs/jacobian-20.png differ
diff --git a/slides/figs/jacobian-6.png b/slides/figs/jacobian-6.png
new file mode 100644
index 00000000..2b39b5db
Binary files /dev/null and b/slides/figs/jacobian-6.png differ
diff --git a/slides/figs/jacobian-7.png b/slides/figs/jacobian-7.png
new file mode 100644
index 00000000..47a3e40e
Binary files /dev/null and b/slides/figs/jacobian-7.png differ
diff --git a/slides/figs/jacobian-8.png b/slides/figs/jacobian-8.png
new file mode 100644
index 00000000..4e719e81
Binary files /dev/null and b/slides/figs/jacobian-8.png differ
diff --git a/slides/figs/pathfinder_funnel_example.pdf b/slides/figs/pathfinder_funnel_example.pdf
new file mode 100644
index 00000000..8d0534b7
Binary files /dev/null and b/slides/figs/pathfinder_funnel_example.pdf differ
diff --git a/slides/figs/pathfinder_logit_example.pdf b/slides/figs/pathfinder_logit_example.pdf
new file mode 100644
index 00000000..37313fd6
Binary files /dev/null and b/slides/figs/pathfinder_logit_example.pdf differ
diff --git a/slides/figs/sales_dist1.pdf b/slides/figs/sales_dist1.pdf
index ed1cec56..da1a12fa 100644
Binary files a/slides/figs/sales_dist1.pdf and b/slides/figs/sales_dist1.pdf differ
diff --git a/slides/figs/sales_dist2.pdf b/slides/figs/sales_dist2.pdf
index 42536727..f71f12b2 100644
Binary files a/slides/figs/sales_dist2.pdf and b/slides/figs/sales_dist2.pdf differ
diff --git a/slides/figs/sales_exputil.pdf b/slides/figs/sales_exputil.pdf
index 34d03719..925b909e 100644
Binary files a/slides/figs/sales_exputil.pdf and b/slides/figs/sales_exputil.pdf differ
diff --git a/slides/figs/sales_utility_20_30.pdf b/slides/figs/sales_utility_20_30.pdf
index b1cba3c3..160cfd14 100644
Binary files a/slides/figs/sales_utility_20_30.pdf and b/slides/figs/sales_utility_20_30.pdf differ
diff --git a/slides/figs/sales_utilprob_20_30.pdf b/slides/figs/sales_utilprob_20_30.pdf
index 8b2f8e58..a600126e 100644
Binary files a/slides/figs/sales_utilprob_20_30.pdf and b/slides/figs/sales_utilprob_20_30.pdf differ
diff --git a/slides/figs/sales_utilprob_exputil_20_30.pdf b/slides/figs/sales_utilprob_exputil_20_30.pdf
index d7ce4624..5890d200 100644
Binary files a/slides/figs/sales_utilprob_exputil_20_30.pdf and b/slides/figs/sales_utilprob_exputil_20_30.pdf differ
diff --git a/slides/figs/student_retention_sbinom_linpreds10.pdf b/slides/figs/student_retention_sbinom_linpreds10.pdf
new file mode 100644
index 00000000..af0163bc
Binary files /dev/null and b/slides/figs/student_retention_sbinom_linpreds10.pdf differ