diff --git a/stan-semantics/stan-semantics.Rmd b/stan-semantics/stan-semantics.Rmd index 4f8a6b2..0881309 100644 --- a/stan-semantics/stan-semantics.Rmd +++ b/stan-semantics/stan-semantics.Rmd @@ -1,5 +1,5 @@ --- -title: "Semantics for the Stan probabilistic programming language" +title: "Operational semantics for the Stan probabilistic programming language" author: "Bob Carpenter" date: "March 2020" output: @@ -34,7 +34,7 @@ programming language semantics # Introduction -This paper introduces a denotational and operational semantics for Stan, +This paper introduces a big-step operational semantics for Stan, a differentiable probabilistic programming language. ## About Stan @@ -133,9 +133,9 @@ generated code rather than in the language. # Overview -# MiniStan Sequent Calculus +# Step 1. Mini Stan -## Objects +## Abstract Syntax \[ L \in \textsf{NumericLiteral} @@ -149,12 +149,16 @@ $$ T \in \textsf{Type} ::= \texttt{real} $$ +$$ +F \in \textsf{Function} +$$ + $$ D \in \textsf{Declaration} \ \begin{array}[t]{rl} ::= & \epsilon \\ -\mid & V : T +\mid & V {:} T \\ \mid & D_1, D_2 \end{array} @@ -176,7 +180,7 @@ S \in \textsf{Statement} \ \begin{array}[t]{rl} ::= & \epsilon \\ -\mid & \texttt{target +=} E \\ +\mid & \texttt{target +=} \ E \\ \mid & S_1, S_2 \end{array} $$ @@ -188,63 +192,151 @@ P \in \textsf{Program} \ \, \texttt{model} \ \, S $$ +## Type inference + +*Type environments* + $$ \Gamma \in \textsf{TypeEnvironment} \ \begin{array}[t]{rl} ::= & \epsilon \\ -\mid & V : T \\ +\mid & V {:} T \\ \mid & \Gamma_1, \Gamma_2 \end{array} $$ -## Type inference +*Type calculus* $$ \frac {} - {V:T \vdash V:T} - (1) + {V{:} T \vdash V{:} T} $$ $$ \frac - {\Gamma_1 \vdash V:T} - {\Gamma_1, \, \Gamma_2 \vdash V:T} - (\textrm{2a}) + {\Gamma_1 \vdash V{:} T} + {\Gamma_1, \, \Gamma_2 \vdash V{:} T} \qquad \frac - {\Gamma_2 \vdash V:T} - {\Gamma_1, \, \Gamma_2 \vdash V:T} - (\textrm{2b}) + {\Gamma_2 \vdash V{:} T} + {\Gamma_1, \, \Gamma_2 \vdash V{:} T} $$ $$ \frac{\Gamma \vdash F:(T_1 \times \cdots \times T_N) \rightarrow T_0 \qquad - \Gamma \vdash E_1 : T_1 + \Gamma \vdash E_1 {:} T_1 \quad \cdots \quad - \Gamma \vdash E_N : T_N} - {\Gamma \vdash F(E_1, \ldots, E_N) : T_0} - (3) + \Gamma \vdash E_N {:} T_N} + {\Gamma \vdash F(E_1, \ldots, E_N) {:} T_0} $$ - -## Evaluation -### Expression evaluation +$$ +\frac + {} + {\Gamma \vdash \epsilon : ()} +$$ + +$$ +\frac + {\Gamma_1 \vdash + D_1 : T_1, \ \Gamma_2 + \qquad + \Gamma_2 \vdash + D_2 : T_2, \ \Gamma_3} + {\Gamma_1 \vdash D_1, D_2 : T_1 \times T_2, \ \Gamma_3} +$$ + + +$$ +\frac + {\Gamma_1 + \vdash D^{\textrm{dat}} : T^{\textrm{dat}}, \Gamma_2 + \qquad + \Gamma_2 + \vdash D^{\textrm{par}} : T^{\textrm{par}}, \Gamma_3 + \qquad + \Gamma_3 \vdash S : \texttt{void}} + {\Gamma_1 \vdash + \texttt{data} \ D^{\textrm{dat}}, \, + \ \texttt{parameters} \ D^{\textrm{par}}, \, + \ \texttt{model} \ S + : T^{\textrm{dat}} \rightarrow T^{\textrm{par}} \rightarrow \texttt{real}} +$$ + +## Operational semantics + +*Value domain* + +$$ +y \in \textsf{Value}_{\texttt{real}} + = \mathbb{R} +$$ + +$$ +f \in \textsf{Value}_{T_1 \rightarrow T_2} + = \textsf{Value}_{T_1} \rightarrow \textsf{Value}_{T_2} +$$ + +$$ +x \in T_1 \times T_2 + = \textsf{Value}_{T_1} \times \textsf{Value}_{T_2} +$$ + + +*Program state* + +$$ +\sigma \in \textsf{State} = \textsf{Variable} \rightharpoonup \textsf{Value} +$$ + +$$ +\sigma[V_1 = y](V_2) += \begin{cases} +y & \textrm{if} \ V_1 = V_2 +\\ +\sigma(V_2) & \textrm{otherwise} +\end{cases} +$$ + + +*Declaration calculus* $$ \frac {} - {N, \sigma \Downarrow n} - (1) + {\epsilon \Downarrow ()} +$$ + +$$ +\frac + {} + {V{:}T \Downarrow V{:}T} +$$ + +$$ +\frac + {D_1 \Downarrow \Gamma_1 + \qquad + D_2 \Downarrow \Gamma_2} + {D_1, D_2 \Downarrow \Gamma_1, \Gamma_2} +$$ + + +*Expression calculus* + +$$ +\frac + {\textrm{val}(L) = x} + {L, \sigma \Downarrow x} $$ $$ \frac {\sigma(V) = v} {V, \sigma \Downarrow v} - (2) $$ $$ @@ -254,24 +346,22 @@ $$ \quad \cdots \quad E_N, \sigma \Downarrow v_N} {F(E_1, \ldots, E_N), \sigma \Downarrow f(v_1, \ldots v_N)} - (3) $$ -### Statement evaluation + +*Statement calculus* $$ \frac {E, \sigma \Downarrow e} - {\sigma, \texttt{target}\,\texttt{+=} \ E + {\texttt{target}\,\texttt{+=} \ E, \ \sigma \Downarrow \sigma[{\small \textrm{target} = \textrm{target} + e}]} - (1) $$ $$ \frac {} {\epsilon, \sigma \Downarrow \sigma} - (2) $$ $$ @@ -279,398 +369,409 @@ $$ \qquad S_2, \sigma_2 \Downarrow \sigma_3} {(S_1, S_2), \sigma_1 \Downarrow \sigma_3} - (3) $$ -### Declaration evaluation + +*Program calculus* $$ \frac - {} - {\Gamma, \epsilon \vdash \Gamma} - (1) + {\vdash \texttt{data} \ D^{\textrm{dat}}; + \ \texttt{parameters} \ D^{\textrm{par}}; \ + \ \texttt{model} \ S : \texttt{void} + \qquad + S, \left( \sigma^{\textrm{dat}} \cup \sigma^{\textrm{par}} + \right)\![ \textrm{target} = 0 ] + \Downarrow \tau} + {\texttt{data} \ D^{\textrm{dat}}; + \ \texttt{parameters} \ D^{\textrm{par}}; + \ \texttt{model} \ S, + \ \left( \sigma^{\textrm{dat}} \cup \sigma^{\textrm{par}} + \right) \Downarrow \tau} $$ + +# Mini-Stan 2: vectors, row vectors, matrices and arrays + +## Abstract syntax + $$ -\frac - {\Gamma_1, D_1 \Downarrow \Gamma_2 - \qquad - \Gamma_2, D_2 \Downarrow \Gamma_3} - {\Gamma_1, (D_1, D_2) \Downarrow \Gamma_3} - (2) +T \in \textsf{Type} +\ \begin{array}[t]{rl} +::= & \texttt{int} +\\ +\mid & \texttt{real} +\\ +\mid & \texttt{vector} +\\ +\mid & \texttt{rowvector} +\\ +\mid & \texttt{matrix} +\\ +\mid & T\texttt{[}\,\texttt{]} +\end{array} $$ + $$ -\frac - {} - {V{:}T, \Gamma_1 \Downarrow V{:}T, \Gamma_1} - (3) +E \in \textsf{Expression} +\ \begin{array}[t]{rl} +::= & L +\\ +\mid & V +\\ +\mid & F\texttt{(}E_1, \ldots, E_n\texttt{)} \qquad [n \geq 0] +\\ +\mid & E_1[E_2] +\\ +\mid & \texttt{\{} E_1, \ldots, E_N \texttt{\}} +\\ +\mid & \texttt{[} E_1, \ldots, E_N \texttt{]} +\end{array} $$ -### Program evaluation +## Type Inference +*Type ordering by assignability* +$$ +\frac + {} + {\vdash T \sqsubseteq T} +$$ -# MiniStan +$$ +\frac + {\vdash T_1 \sqsubset T_2} + {\vdash T_1 \sqsubseteq T_2} +$$ -The design of Stan was strongly influenced by the design of BUGS. -In this section, a simple BUGS-like version of Stan is defined -that is simpler still in that it only involves univariate variables -and does not introduce local variables. This section lays out -MiniStan in the same form as the more general Stan is layed out in the -rest of the document. +$$ +\frac + {} + {\vdash \texttt{int} \sqsubset \texttt{real}} +$$ -All expressions in MiniStan denote single real quantities, so the set of -types is the singleton $$ -\textsf{Type} = \{ \texttt{real} \}. +\frac + {\vdash T_1 \sqsubset T_2} + {\vdash T_1[] \sqsubset T_2[]} $$ -Each type has a range of values expressions of that type may have. In -MiniStan, there is only one type, expressions of which take values in -the set $\textsf{Value}_{\textt{real}}$. The Stan language is -polymorphic by design and can work with value types including -floating point and automatic differentiation variables. -MiniStan includes a countably infinite set of variables. For -concreteness, and to match Stan's notion of identifier, let +*Type calculus* + $$ -\textsf{Variable} = \texttt{a-ZA-Z(a-zA-Z0-9_)*}. +\frac + {\Gamma \vdash E : T_1 + \qquad + \vdash T_1 \sqsubseteq T_2} + {\Gamma \vdash E : T_2} $$ -There is only one type, $\texttt{real}$, which is used for every -variable. The variable typing function $\textrm{type} : -\textsf{Variable} \rightarrow \textsf{Type}$ is thus constant, so that $$ -\textrm{type}(v) = \texttt{real} +\frac + {\Gamma \vdash E_1 : T + \quad \cdots \quad + \Gamma \vdash E_N : T} + {\Gamma \vdash \{ E_1, \ldots, E_N \} : T[]} $$ -for every $v \in \textsf{Variable}$. - -Expressions are mathematical formulas made up of constants, variables, -and functions that denote values. In MiniStan, the set -\textsf{Expression} is defined to be the smallest such that -* $y \in \textsf{Expression}$ if $y \in \mathsf{Value}_{\texttt{real}}$, -* $v \in \textsf{Expression}$ if $v \in \textsf{Variable}$, and -* $f(e_1, \ldots, e_N) \in \textsf{Expression}$ if - $f : \textsf{real}( - \underbrace{\textsf{real}, \ldots, \textsf{real}} - _{N \ \textrm{times}})$, - and $e_1, \ldots, e_N \in \textsf{Expression}.$ -In practice, values cannot be used directly and must be expressed -using a literal syntax, such as standard computational scientific -notation for real numbers (e.g., `-1.3e-12` for $-1.3 \times -10^{-12}$), which is then translated into the value type. +$$ +\frac + {\Gamma \vdash E_1 : \texttt{real} + \quad \cdots \quad + \Gamma \vdash E_N : \texttt{real}} + {\Gamma \vdash [ E_1, \ldots, E_N ] : \texttt{rowvector}} +$$ -An environment is a mapping from variables to values, which for -MiniStan reduces to a simple function -$\Gamma : \textsf{Variable}_{\texttt{real}} - \rightarrow \textsf{Value}_{\texttt{real}}$. +$$ +\frac + {\Gamma \vdash E_1 : \texttt{rowvector} + \quad \cdots \quad + \Gamma \vdash E_N : \texttt{rowvector}} + {\Gamma \vdash [ E_1, \ldots, E_N ] : \texttt{matrix}} +$$ -Expressions have denotations in their value types relative to an -environment $\Gamma$, defined by +$$ +\frac + {\Gamma \vdash E_1:T[] \qquad \Gamma \vdash E_2:\texttt{int}} + {\Gamma \vdash E_1[E_2] : T} +$$ -* ${}_{\Gamma}[\![ y ]\!] = y$ - if $y \in \textsf{Value}_{\texttt{real}}$, -* ${}_{\Gamma}[\![ v ]\!] = \Gamma(v)$ - if $v \in \textsf{Variable}_{\texttt{real}}$, and -* ${}_{\Gamma}[\![ f(e_1, \ldots, e_N)]\!] - = f'({}_{\Gamma}[\![ e_1 ]\!], \ldots - {}_{\Gamma}[\![ e_N ]\!])$ where - $f' : \textsf{Value}_{\texttt{real}}^N \rightarrow - \textsf{Value}_{\texttt{real}}$ is the denotation of $f$ +$$ +\frac + {\Gamma \vdash E_1:\texttt{vector} \qquad \Gamma \vdash E_2:\texttt{int}} + {\Gamma \vdash E_1[E_2] : \texttt{real}} +$$ -MiniStan is restricted to sampling statements. The set -$\textsf{Statement}$ of statements is the smallest such that +$$ +\frac + {\Gamma \vdash E_1:\texttt{rowvector} + \qquad + \Gamma \vdash E_2:\texttt{int}} + {\Gamma \vdash E_1[E_2] : \texttt{real}} +$$ -* $e_0 \sim f(e_1, \ldots, e_N)\texttt{;} \in \textsf{Statement}$ - if $e_0, e_1, \ldots, e_N \in \textsf{Expression}_{\texttt{real}}$ - and $f' : (\textsf{Value}_{\texttt{real}})^{N + 1} \rightarrow - \textsf{Value}_{\texttt{real}}.$ +$$ +\frac + {\Gamma \vdash E_1:\texttt{matrix} + \qquad + \Gamma \vdash E_2:\texttt{int}} + {\Gamma \vdash E_1[E_2] : \texttt{rowvector}} +$$ -The other content of a MiniStan program is a variable -declaration. The set $\textsf{Declaration}$ of variable declarations -is the least such that +## Operational Semantics -* $\texttt{real} \ v\texttt{;} \in \textsf{Declaration}$ if $v \in - \textsf{Variable}$. +*Value domain* -A MiniStan program consists of three blocks, +$$ +i \in \textsf{Value}_{\texttt{int}} = \mathbb{Z} +$$ -* a data block declaring variables that are known and read in -externally, -* a parameters block declaring parameters that are unknown and for -which inference is desired, and -* a model block defining a log density. +$$ +a \in \textsf{Value}_{T[]} = T^* = \cup_{n = 0}^{\infty} T^n +$$ -The set $\textsf{Program}$ is the smallest such that +$$ +v \in \textsf{Value}_{\texttt{vector}} = \mathbb{R}^* +$$ -* $\texttt{data \{} d_1, \ldots, d_M \texttt{\}}$ - $\texttt{parameters \{} u_1, \ldots, u_N \texttt{\}}$ - $\ \texttt{model \{} s_1, \ldots, s_J \texttt{\}} - \in \textsf{Program}$ if - $d_1, \ldots d_M \in \textsf{Declaration}$, - $u_1, \ldots u_N \in \textsf{Declaration}$, - $s_1, \ldots s_J \in \textsf{Statement}$, - no variable is declared more than once, and every variable - used in the model block is declared. +$$ +u \in \textsf{Value}_{\texttt{rowvector}} = \mathbb{R}^* +$$ -The denotation of a MiniStan program $p \in \textsf{Program}$ is a -function $$ -[\![p]\!]: \mathsf{Value}_{\texttt{real}}^M - \rightarrow - (\mathsf{Value}_{\texttt{real}}^N \rightarrow -\mathbb{R}) +m \in \textsf{Value}_{\texttt{matrix}} += \cup_{m = 0}^{\infty} \cup_{n = 0}^{\infty} + \mathbb{R}^m \times \mathbb{R}^n $$ -that maps a data vector $y$ to a log density function $[\![ p -]\!](y)$, which in turn maps a parameter vector $\theta$ to a log -density $[\![ p ]\!](y)(\theta)$. Letting $\pi$ be the posterior -density for parameters $\theta$ given data $y$, the denotation of a -Stan program is + +*Expression calculus* + $$ -[\![ p ]\!](y)(\theta) = \log \pi(\theta \mid y) + \textrm{const.}, +\frac{E_1, \sigma \Downarrow x + \qquad + E_2 \sigma \Downarrow n} + {E_1[E_2], \sigma \Downarrow x_n} $$ -where $\textrm{const}$ does not depend on $\theta$. -An environment is a mapping from variables to values. Given $y \in -\mathbb{R}^M$ and $\theta \in \mathbb{R}^N$, define an environment -$\Gamma$ such that -$$ -\Gamma_{y,\theta}(v) -= -\begin{cases} -y_m & \textrm{if} \ d_m = \texttt{real} \ v; -\\[4pt] -\theta_n & \textrm{if} \ u_n = \texttt{real} \ v; -\\[4pt] -\textrm{undefined} & \textrm{otherwise}. -\end{cases} $$ +\frac{E_1, \sigma \Downarrow x_1 + \quad \cdots \quad + E_N, \sigma \Downarrow x_N} + {\{ E_1, \ldots, E_N \}, \sigma \Downarrow (x_1, \ldots, x_N)} +$$ -A MiniStan program $p$ has a denotation $[\![ p ]\!]$ defined by $$ -[\![ p ]\!](y, \theta) = \sum_{j = 1}^J {}_{\Gamma_{y,\theta}}[\![ s_j ]\!], +\frac{E_1, \sigma \Downarrow x_1 + \quad \cdots \quad + E_N, \sigma \Downarrow x_N} + {[ E_1, \ldots, E_N ], \sigma + \Downarrow + \begin{bmatrix} + x_1 \\ + \vdots \\ + x_N + \end{bmatrix}} $$ -where the $j$-th statement in the program block is + +# Mini-Stan 3: local variables and assignment + +## Abstract syntax + $$ -s_j = e_{j, 0} \sim f(e_{j, 1}, \ldots, e_{j, K_j}) +S \in \textsf{Statement} +\ +\begin{array}[t]{rl} +::= & \epsilon \\ +\mid & \texttt{target +=} \ E \\ +\mid & S_1; S_2 \\ +\mid & \{ D; S \} \\ +\mid & V[E_1] \ldots [E_N] \ \texttt{=} \ E_0 +\end{array} $$ -and its contribution to the log density is given by + +## Operational semantics + +*Type calculus* + $$ -\Gamma_{y,\theta}[\![ s_j ]\!] -= -\Gamma_{y,\theta}[\![ \, f_j(e_{j, 0}, e_{j, 1}, \ldots, e_{j, K_j}) ]\!]. +\frac + {V{:}T, \Gamma \vdash S : \texttt{null}} + {\Gamma \vdash \{ V{:}T; S \} : \texttt{null}} $$ -In Bayesian applications, given a fixed data set $y$, the posterior -log density function $[\![ p ]\!](y)$ can be used with a sampler to -draw a posterior sample $$ -\theta^{(1)}, \ldots, \theta^{(S)} \sim \pi(\theta \mid y), +\frac + {\Gamma \vdash V[E_1]\cdots[E_N] : T + \quad + \Gamma \vdash E_0 : T} + {\Gamma \vdash V[E_1]\cdots[E_N] = E_0 : \texttt{null}} +$$ + + +*Statement Calculus* + $$ -or with an optimizer to find a posterior mode, +\frac + {E_1, \sigma \Downarrow i_1 + \quad \cdots \quad + E_N, \sigma \Downarrow i_N + \quad + E_0, \sigma \Downarrow y} + {V[E_1]\cdots[E_N] = E_0, \sigma + \Downarrow + \sigma[V = \sigma(V)[(i_1, \ldots, i_N) = y]]} +$$ + $$ -\theta^* = \textrm{arg max}_{\theta} \ \pi(\theta \mid y). +\frac + {S, \sigma[V = T()] \Downarrow \tau} + {\{ V{:}T; \, S \}, \sigma + \Downarrow \tau[V{\uparrow}]} $$ -With automatic differentiation variables for $\theta$, gradient-based -algorithms like Hamiltonian Monte Carlo sampling and limited-memory -quasi-Newton optimization can be used, because it is relatively -inexpensive to accurately compute $\nabla_{\theta} \log p(\theta \mid -y)$. +where $T() \in \textsf{Val}_T$ is a special value used for +initialization, +$$ +x[(i_1, \ldots, i_N) = y](j_1, \ldots, j_N) += +\begin{cases} +y & \textrm{if} \ (i_1, \ldots, i_N) = (j_1, \ldots, j_N) +\\ +x[j_1, \ldots, j_N] & \textrm{otherwise}, +\end{cases} +$$ +and +$$ +f[x{\uparrow}](y) += +\begin{cases} +\textrm{undefined} & \textrm{if} \ x = y +\\ +f(y) & \textrm{otherwise}. +\end{cases} +$$ -# Stan +# Mini-Stan 4: conditionals and loops -## Types +## Abstract syntax -Every variable in Stan is declared with a type that is known -statically. From the types of variables, the type of an expression -may be inferred. -In addition to function types, there are three classes of types in -Stan: +$$ +T \in \textsf{Type} +\ \begin{array}[t]{rl} +::= & \texttt{bool} +\\ +\mid & \texttt{int} +\\ +\mid & \texttt{real} +\\ +\mid & \texttt{vector} +\\ +\mid & \texttt{rowvector} +\\ +\mid & \texttt{matrix} +\\ +\mid & T\texttt{[}\,\texttt{]} +\end{array} +$$ -* basic types, used for function argument declarations, -* sized types, used for local variable declarations, and -* constrained types, used for block variable declarations. +$$ +\textrm{true}, \textrm{false} \in \textsf{NumericLiteral}_{\texttt{bool}} +$$ -The set $\textsf{BasicType}$ is the least such that +$$ +S \in \textsf{Statement} +\ +\begin{array}[t]{rl} +::= & \epsilon \\ +\mid & \texttt{target +=} \ E \\ +\mid & S_1; S_2 \\ +\mid & \{ D; S \} \\ +\mid & V[E_1] \ldots [E_N] \ \texttt{=} \ E_0 \\ +\mid & \texttt{if} \ (E) \ S \\ +\mid & \texttt{while} \ (E) \ S \\ +\end{array} +$$ -* $\texttt{real}, \texttt{int} \in \textsf{BasicType}$, -* $\texttt{vector}, \texttt{row_vector}, \texttt{matrix} \in - \textsf{BasicType}$, and -* $T[] \in \textsf{BasicType}$ if $T \in \textsf{BasicType}$. +## Type inference -The number of dimensions of a type is given by the function -$\textrm{dim}:\textsf{BasicType} \rightarrow \mathbb{N}$ defined -inductively by +*Type ordering by assignability* -* $\textrm{dim}(\texttt{real}) - = \textrm{dim}(\texttt{int}) - = 0$ -* $\textrm{dim}(\texttt{vector}) - = \textrm{dim}(\texttt{row_vector}) - = 1$, -* $\textrm{dim}(\texttt{matrix}) = 2$, and -* $\textrm{dim}(T[]) = 1 + \textrm{dim}(T)$. - -The set $\textsf{SizedType}$ is the least such that +$$ +\frac{}{\texttt{bool} \sqsubset \texttt{int}} +$$ -* $\texttt{real}, \texttt{int} \in \textsf{SizedType}$, -* $\texttt{vector}[n] \in \textsf{SizedType}$ if $n \in \mathbb{N}$, -* $\texttt{row_vector}[m] \in \textsf{SizedType}$ if $m \in \mathbb{N}$, -* $\texttt{matrix}[m, n] \in \textsf{SizedType}$ if $m, n \in - \mathbb{N}$, and -* $T[n] \in \textsf{SizedType}$ if $T \in \textsf{SizedType}$ and $n - \in \mathbb{N}$. -Each sized type corresponds to the basic type derived by removing the sizes. +## Operational semantics -The set $\textsf{ConstrainedType}$ is the least such that - -* $\texttt{real}, \texttt{int} \in \textsf{ConstrainedType}$, -* $\texttt{vector}[n], \texttt{unit_vector}[n]$, $\texttt{simplex}[n], - \texttt{ordered}[n]$, $\texttt{pos_ordered}[n] \in \textsf{ConstrainedType}$ - if $n \in \mathbb{N}$, -* $\texttt{row_vector}[m] \in \textsf{ConstrainedType}$ if $m \in \mathbb{N}$, -* $\texttt{matrix}[m, n] \in \textsf{ConstrainedType}$ if $m, n \in - \mathbb{N}$, and -* $\texttt{cov_matrix}[m], \texttt{corr_matrix}[n]$, - $\texttt{cholesky_factor}[n]$, $\texttt{cholesky_factor_corr}[n] \in - \textsf{ConstrainedType}$ if $m, n \in \mathbb{N}$, and -* $T[] \in \textsf{ConstrainedType}$ if $T \in \textsf{ConstrainedType}$. - -Each constrained type corresponds to a sized type that is derived by -dropping the constraint. Specifically, - -* $\texttt{vector}[n]$ is the sized type for the constrained types - $\texttt{unit_vector}[n]$, $\texttt{simplex}[n]$, - $\texttt{ordered}[n]$, and $\texttt{pos_ordered}[n]$; and -* $\texttt{matrix}[n, n]$ is the sized type for the constrained types - $\texttt{cov_matrix}[m]$, $\texttt{corr_matrix}[n]$, - $\texttt{cholesky_factor}[n]$, and $\texttt{cholesky_factor_corr}[n]$. +*Type calculus* -Stan uses simply typed functions. The set $\textsf{FunctionType}$ is -the least such that +$$ +\frac + {\Gamma \vdash E : \texttt{bool} \qquad \Gamma \vdash S : \texttt{null}} + {\Gamma \vdash \texttt{if} \ (E) \ S : \texttt{null}} +$$ -* $T_0(T_1, \ldots, T_N) \in \textsf{FunctionType}$ if - $T_0, T_1, \ldots, T_N \in \textsf{FunctionType} \cup \textsf{BasicType}$. +$$ +\frac + {\Gamma \vdash E : \texttt{bool} \qquad \Gamma \vdash S : \texttt{null}} + {\Gamma \vdash \texttt{while} \ (E) \ S : \texttt{null}} +$$ -## Variables, expressions, and type inference +*Statement calculus* -Expressions are syntactic constructs that denote values. Stan assumes -there is some computational approximation of a real-valued field (in -the algebraic sense) over which real-valued expressions take their -value. Integers are assumed to be a subset of the real values. In -practice, real values are representated using fixed-size floating -point numbers and fixed size integers for constants and data and -automatic differentiation variables for parameters. - -The set $\textsf{Variable}$ consists of a countably infinite set of -unique identifiers. - -The set $\textsf{Expression}$ is the least such that - -* $y \in \textsf{Expression}$ if $y \in \mathbb{R} \cup \{ -\infty, - +\infty, \textrm{NaN} \}$, -* $n \in \textsf{Expression}$ if $n \in \mathbb{Z}$, -* $v \in \mathsf{Expression}$ if $v \in \textsf{Variable}$, and -* $f(e_1, \ldots, e_n) \in \textsf{Expression}$ if - $f, e_1, \ldots, e_n \in \textsf{Expression}$. +$$ +\frac + {E, \sigma \Downarrow \textrm{false}} + {\texttt{if} \ (E) \ S, \sigma \Downarrow \sigma} +$$ -The *typing relation* $:$ is defined relative to a variable typing -function $\textrm{type} : \textsf{Variable} \rightarrow \textsf{Type}$ as the -smallest relation such that - -* $y : \texttt{real}$ if $y \in \mathbb{R} \cup \{ -\infty, - +\infty, \textrm{NaN} \}$, -* $n : \texttt{int}$ if $n \in \mathbb{Z}$, -* $v : T$ if $v \in \textsf{Variable}$ and $\textrm{type}(v) = T$ -* $f(e_1, \ldots, e_n) : T_0$ if - $f : T_0(T_1, \ldots, T_N)$ and $e_1:T_1, \ldots, e_N:T_n$. - -## Values of types - -Expressions take on values in a range that is determined by their -type. The sets of possible values $\textsf{Value}_T$ for each sized -type $T \in \textsf{SizedType}$ is defined jointly by - -* $\textsf{Value}_{\texttt{int}} = \mathbb{Z}$, -* $\textsf{Value}_{\texttt{real}} = \mathbb{R}$, -* $\textsf{Value}_{\texttt{vector}[n]} = \mathbb{R}^n$, -* $\textsf{Value}_{\texttt{row_vector}[m]} = \mathbb{R}^m$, -* $\textsf{Value}_{\texttt{matrix}[m, n]} = \mathbb{R}^{m \times n}$, -* $\textsf{Value}_{T[k]} = (\textsf{Value}_T)^k$, and -* $\textsf{Value}_{T_0(T_1, \ldots, T_N)} - = (\textsf{Value}_{T_1} \times \cdots \times \textsf{Value}_{T_N}) - \rightarrow \textsf{Value}_{T_N})$. - -Possible values for unsized types are the union of their sized counterparts, - -* $\textsf{Value}_{\texttt{vector}[]} - = \bigcup_{n \in \mathbb{N}} \textsf{Value}_{\texttt{vector}[n]}$ -* $\textsf{Value}_{\texttt{matrix}} - = \bigcup_{m, n \in \mathbb{N}} - \textsf{Value}_{\texttt{matrix}[m, n]}$ -* $\textsf{Value}_{T[]} - = \bigcup_{k \in \mathbb{N}} \textsf{Value}_{T[k]}$ - - -## Environments and expression denotations - -Given an expression $e$ of type $T$, its denotation will be an element -of $\textsf{Value}_T$. An *environment* is a function -$\Gamma: \textrm{Variable} \rightarrow \textrm{Value}$. Relative to -an environment $\Gamma$, an expression $e$ of type $T$ has a denotation -${}_\Gamma[\![ e ]\!] \in \textrm{Value}_T$ defined by - -* ${}_\Gamma[\![ y ]\!] - = y$ if $y \in \mathbb{R} \cup \{-\infty, \infty, \textrm{NaN} \}$, -* ${}_\Gamma[\![ n ]\!] - = n$ if $n \in \mathbb{Z}$, -* ${}_\Gamma[\![ v ]\!] - = \Gamma(v)$ if $v \in \textsf{Variable}$, and -* ${}_\Gamma[\![ f(e_1, \ldots, e_N) ]\!] - = {}_\Gamma[\![f]\!]({}_\Gamma[\![e_1]\!], \ldots, {}_\Gamma[\![e_N]\!])$ - if $f : T_0(T_1, \ldots, T_N)$ and $e_1 : T_1, \ldots, e_N: T_N$. - -## Lvalues - -Lvalues are the set of expressions that may appear on the left hand -side of an assignment statement. The set $\textsf{Lvalue}$ is defined -to be the least such that +$$ +\frac + {E, \sigma \Downarrow \textrm{true} + \qquad + S, \sigma \Downarrow \tau} + {\texttt{if} \ (E) \ S, \sigma \Downarrow \tau} +$$ -* $v \in \textsf{Lvalue}$ if $v \in \textsf{Variable}$ and -* $v[i_1, \ldots, i_N] \in \textsf{Lvalue}$ if $v \in - \textsf{Variable}$, $\textrm{dim}(\textrm{type}(v)) \geq N$, and - $i_1, \ldots, i_N \in \mathbb{N}$. -## Variable declarations +$$ +\frac + {E, \sigma \Downarrow \textrm{false}} + {\texttt{while} \ (E) \ S, \sigma \Downarrow \sigma} +$$ -The set \textsf{Declaration} of variable declarations is the smallest -such that +$$ +\frac + {E, \sigma \Downarrow \textrm{true} + \qquad + S, \sigma \Downarrow \rho + \qquad + \texttt{while} \ (E) \ S, \rho \Downarrow \tau} + {\texttt{while} \ (E) \ S, \sigma \Downarrow \tau} +$$ -* $T v \in \textsf{Declaration}$ - if $T \in \textsf{Type}$ and $v \in \textsf{Variable}$ -## Statements +# References {-} -The set $\textsf{Statement}$ is the smallest such that +There are several nice introductions to probabilistic programming +drawn from courses in the UK and Ireland. This paper leans heavily on -* $l = e \in \textsf{Statement}$
- if $l \in \textsf{Lvalue}_T$ and $e \in \textsf{Expression}_T$, -* $y \sim \ p(\theta_1, \ldots, \theta_N) \in \textsf{Statement}$
- if $p : \texttt{real}(T_0, T_1, \ldots, T_N), - y : T_0, \theta_1 : T_1, \ldots, \theta_N : T_N$, -* $\{ d_1; \cdots d_M; s_1; \cdots s_N; \} \in \textsf{Statement}$
- if $d_1, \ldots, d_M \in \textsf{Declaration}$ - and $s_1, \ldots, s_N \in \textsf{Statement}$, +* Matthew Hennessy. 2014. [Semantics of programming + languages](https://www.scss.tcd.ie/Matthew.Hennessy/splexternal2015/LectureNotes/Notes14%20copy.pdf). + CS3017 course notes. Trinity College Dublin. +* Andrew M. Pitts. 2015. [Lecture notes on semantics of programming + languages](https://www.scss.tcd.ie/Matthew.Hennessy/splexternal2015/resources/Pitts.pdf). + Computer science tripos, part IB. University of Cambridge. +* Peter Sewell. 2015. [Semantics of programming + languages](https://www.cl.cam.ac.uk/teaching/1516/Semantics/notes.pdf). + Computer science tripos, part IB. University of Cambridge. -
-
\ No newline at end of file