- Tim Hopper
- 2015-10-13
- Qadium
Much of the literature on Dirichlet Processes makes assertions similar to the following:
- "DP is the Dirichlet process, a distribution over distributions." (Neal, 2000)
- "[The Dirichlet process] is a distribution over distributions, i.e. each draw from a Dirichlet process is itself a distribution." (Teh, 2010)
- "The Dirichlet process (DP) is a distribution over distributions." (Gershman and Blei, 2012)
- The "Dirichlet process defines a distribution on random probability measures..." (Sudderth, 2006)
- "Dirichlet processes define a distribution over distributions..." (Ghahramani, 2005)
Michael Jordan makes an equivalent statement.
Each of these sources makes the claim that a Dirichlet Process is a distribution over probability distributions. That is, given a base distribution
Confusingly, while many sources refer to the DP as a distribution over distributions, when using the phrase "sample from a Dirichlet process", they mean a sample from
After being confused by this point for some time, I prepared these notes arguing that the Dirichlet process is a distribution over distributions. I argued that the term "sample from a Dirichlet process" should refer to a distribution sampled from the DP, not to a point sampled from the support of
In response to my notes, Dan Roy briefly argued that "The Dirichlet process is a distribution on the space of probability measures" is a misstatement. In fact, Roy argues that
Thomas Ferguson first defined the Dirichlet process in his 1973 paper. Charles Antoniak (a student of Ferguson) repeats the definition in his 1974 paper. Antoniak's definition is as follows:
Let $\Theta$ be a set, and $\mathcal{A}$ a $\sigma$-field of subsets of $\Theta$. Let $\beta$ be a finite, nonnull, nonnegative, finitely additive measure on $(\Theta, \mathcal{A})$. We say a random probability measure $P$ on $(\Theta, \mathcal{A})$ is a Dirichlet process on $(\Theta, \mathcal{A})$ with parameter $\beta$, denoted $P\in \mathcal{D}(\beta)$, if for every $k=1, 2, \ldots$ and measurable partition $B_1,\ldots,B_k$ of $\Theta$, the joint distribution of the random probabilities $(P(B_1),\ldots,P(B_k))$ is Dirichlet with parameters $(\beta(B_1),\ldots,\beta(B_k))$, denoted $(P(B_1),\ldots,P(B_k))\in \mathcal{D}(\beta(B_1),\ldots,\beta(B_k))$.
Let's unpack this dense, measure-theoretic definition piece by piece.
Let $\Theta$ be a set, and $\mathcal{A}$ a $\sigma$-field of subsets of $\Theta$. Let $\beta$ be a finite, nonnull, nonnegative, finitely additive measure on $(\Theta, \mathcal{A})$.
First, note that
We say a random probability measure $P$ on $(\Theta, \mathcal{A})$ is a Dirichlet process on $(\Theta, \mathcal{A})$ with parameter $\beta$, denoted $P\in \mathcal{D}(\beta)$
TODO
...if for every $k=1, 2, \ldots$ and measurable partition $B_1,\ldots,B_k$ of $\Theta$, the joint distribution of the random probabilities $(P(B_1),\ldots,P(B_k))$ is Dirichlet with parameters $(\beta(B_1),\ldots,\beta(B_k))$, denoted $(P(B_1),\ldots,P(B_k))\in \mathcal{D}(\beta(B_1),\ldots,\beta(B_k))$.
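The partition condition can be checked numerically. The sketch below is not part of Antoniak's definition: it assumes $\Theta = \mathbb{R}$ and a base measure $\beta = \alpha H$ with $H$ a standard normal, and it draws each random measure $P$ with a truncated stick-breaking construction, a separate and well-known representation of the Dirichlet process that the definition itself does not mention. The function names (`sample_dp`, `partition_probs`, `base_measure`) are hypothetical helpers written for this illustration. If the definition holds, then $(P(B_1), P(B_2), P(B_3))$ is Dirichlet with parameters $(\beta(B_1), \beta(B_2), \beta(B_3))$, so the average of $P(B_i)$ over many draws should be close to $\beta(B_i)/\beta(\Theta)$.

```python
import math
import random


def sample_dp(alpha, n_atoms, rng):
    """Draw one truncated stick-breaking approximation to P.

    Assumption: beta = alpha * H with H = N(0, 1). Returns a list of
    (atom, weight) pairs whose weights sum to 1.
    """
    atoms, weights = [], []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        atoms.append(rng.gauss(0.0, 1.0))  # atom location drawn from H
        remaining *= 1.0 - v
    # Truncation: put the (tiny) leftover mass on the last atom so that
    # P is a probability measure.
    weights[-1] += remaining
    return list(zip(atoms, weights))


def partition_probs(measure, edges):
    """Return [P(B_1), ..., P(B_k)] for the intervals cut at sorted `edges`."""
    probs = [0.0] * (len(edges) + 1)
    for atom, weight in measure:
        i = sum(atom >= e for e in edges)  # index of the interval holding the atom
        probs[i] += weight
    return probs


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def base_measure(alpha, edges):
    """Return [beta(B_1), ..., beta(B_k)] with beta = alpha * H."""
    cdf = [0.0] + [normal_cdf(e) for e in edges] + [1.0]
    return [alpha * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


rng = random.Random(0)
alpha = 2.0
edges = [0.0, 1.0]  # B_1 = (-inf, 0), B_2 = [0, 1), B_3 = [1, inf)

# Each element of `draws` is the vector (P(B_1), P(B_2), P(B_3)) for one P.
draws = [partition_probs(sample_dp(alpha, 100, rng), edges) for _ in range(2000)]
beta = base_measure(alpha, edges)

empirical_mean = [sum(d[i] for d in draws) / len(draws) for i in range(len(beta))]
expected_mean = [b / sum(beta) for b in beta]  # mean of Dirichlet(beta)
print("empirical:", empirical_mean)
print("expected: ", expected_mean)
```

Note what the simulation treats as a single draw: one call to `sample_dp` yields an entire discrete probability measure $P$, and `partition_probs` then reads off the random probabilities that the definition constrains. Matching means is only a necessary check, not a proof that the joint distribution is Dirichlet.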