Chapter 7 Appendix
7.1 Main Stan Distributions Cheatsheet
Statistical modeling in Stan is powered by a flexible and expressive probabilistic language grounded in log-density functions. While the modeling blocks (model
, data
, parameters
, etc.) help structure a model, the core statistical logic is defined through distributions. This cheatsheet offers a practical summary of the most important distributions used in Stan, their syntax, required parameters, typical use cases, and examples of where they show up in statistical modeling.
Distribution | Function | Parameters | Use Case | Model Type(s) | |
---|---|---|---|---|---|
Bernoulli | `bernoulli_lpmf(y | θ)` | θ ∈ (0, 1) |
Binary outcome (0/1) | Logistic regression, classification |
Binomial | `binomial_lpmf(y | n, θ)` | n ∈ ℕ⁺ , θ ∈ (0, 1) |
# of successes in n trials |
Logistic GLMs, grouped binomial models |
Categorical | `categorical_lpmf(y | θ)` | θ : simplex vector (length K) |
Single draw from K categories | Multinomial regression |
Multinomial | `multinomial_lpmf(y | θ)` | y : int vector of counts, θ : simplex |
Category count data | Count models with category splits |
Normal | `normal_lpdf(y | μ, σ)` | μ ∈ ℝ , σ > 0 |
Gaussian noise, residuals | Linear regression, priors for real parameters |
Student’s t | `student_t_lpdf(y | ν, μ, σ)` | ν > 0 , μ ∈ ℝ , σ > 0 |
Heavy-tailed data, robust models | Robust regression, hierarchical priors |
Cauchy | `cauchy_lpdf(y | μ, σ)` | μ ∈ ℝ , σ > 0 |
Weakly informative, heavy-tailed prior | Priors on scale parameters (e.g., τ ~ cauchy(0, 2.5) ) |
Exponential | `exponential_lpdf(y | λ)` | λ > 0 |
Time to event, memoryless processes | Survival models, Poisson process modeling |
Gamma | `gamma_lpdf(y | α, β)` | α > 0 , β > 0 |
Positive skewed data | Priors on rates or shape parameters |
Inverse Gamma | `inv_gamma_lpdf(y | α, β)` | α > 0 , β > 0 |
Prior for variances | Priors on σ² , τ² , especially in hierarchies |
Lognormal | `lognormal_lpdf(y | μ, σ)` | μ ∈ ℝ , σ > 0 |
Positive, right-skewed data | Income, durations, reliability |
Beta | `beta_lpdf(y | α, β)` | α > 0 , β > 0 |
Probabilities or proportions | Priors on probabilities (θ ∈ (0, 1) ) |
Dirichlet | `dirichlet_lpdf(θ | α)` | θ : simplex, α > 0 vector |
Probabilities summing to 1 | Priors for category proportions, LDA |
Poisson | `poisson_lpmf(y | λ)` | λ > 0 |
Count data, rare event modeling | GLMs for count data |
Negative Binomial | `neg_binomial_2_lpmf(y | μ, φ)` | μ > 0 , φ > 0 |
Overdispersed count data | GLMs with extra-Poisson variation |
Ordered Logistic | `ordered_logistic_lpmf(y | η, c)` | η ∈ ℝ , c : ordered cut-points |
Ordinal outcomes | Ordinal regression |
Uniform | `uniform_lpdf(y | a, b)` | a < b |
Flat prior within range | Non-informative priors |
Pareto | `pareto_lpdf(y | y_min, α)` | y_min > 0 , α > 0 |
Heavy-tail data, power-law phenomena | Extremes, outlier modeling |
Von Mises | `von_mises_lpdf(y | μ, κ)` | μ ∈ [0, 2π) , κ ≥ 0 |
Circular data (angles, wind direction) | Directional models |
Weibull | `weibull_lpdf(y | α, σ)` | α, σ > 0 |
Survival times, failure rates | Survival models, reliability analysis |
LKJ Correlation | `lkj_corr_cholesky_lpdf(L | η)` | η > 0 , L : Cholesky factor |
Prior for correlation matrices | Hierarchical models with random slopes |
Wishart | `wishart_lpdf(S | ν, Σ)` | ν > dim-1 , Σ : scale matrix |
Prior on covariance matrices | Multivariate Gaussian models (rarely used) |
7.2 Main Stan Functions Cheatsheet
Stan is a robust platform for Bayesian statistical modeling, renowned for its Hamiltonian Monte Carlo (HMC) engine and flexible modeling language. While probability distributions like normal_lpdf
or poisson_lpmf
define priors and likelihoods, Stan’s non-distribution functions—spanning mathematical operations, matrix algebra, utility tools, and specialized solvers—are equally critical for building efficient and expressive models. These functions enable data transformations, efficient computations, and post-processing in the generated quantities
block.
This cheatsheet organizes Stan’s most commonly used non-distribution functions into categories, providing their purpose, example usage, and the model types where they’re most applicable. Whether you’re crafting linear regressions, hierarchical models, or dynamic systems, this guide will help you leverage Stan’s toolkit effectively. We’ll wrap up with an example model to bring these functions to life.
7.3 Why Non-Distribution Functions?
Stan’s non-distribution functions serve several key purposes:
- Transformations: Functions like log
, exp
, and inv_logit
map parameters to constrained spaces or perform nonlinear calculations.
- Matrix Operations: Functions like dot_product
and cholesky_decompose
enable efficient linear algebra for multivariate models.
- Utilities: Functions like to_vector
and mean
simplify data manipulation and posterior summaries.
- Specialized Tools: Solvers like ode_rk45
and integrate_1d
tackle complex systems, such as differential equations or custom likelihoods.
- Posterior Processing: Functions in the generated quantities
block, like sum
or sd
, compute diagnostics or predictions.
This cheatsheet focuses on these functions to help you streamline model specification and analysis.
7.4 Stan Functions Cheatsheet
7.4.1 1. Mathematical Functions
These functions perform scalar operations, often used in transformed parameters
or model
blocks.
Function | Purpose | Example Usage | Model Type(s) |
---|---|---|---|
abs(x) |
Absolute value | real z = abs(x); |
General computations, robust stats |
exp(x) |
Exponential (e^x) | lambda = exp(alpha); |
Rate models, transformations |
log(x) |
Natural logarithm | real l = log(y); |
Log-likelihoods, transformations |
sqrt(x) |
Square root | sigma = sqrt(variance); |
Variance computations, scaling |
lgamma(x) |
Log gamma function | lp += lgamma(alpha); |
Mixture models, custom likelihoods |
log_sum_exp(x) |
Log-sum-exp for numerical stability | lp = log_sum_exp(log_theta); |
Mixture models, marginal likelihoods |
7.4.2 2. Transformation Functions
These map parameters to constrained spaces, often in transformed parameters
.
Function | Purpose | Example Usage | Model Type(s) |
---|---|---|---|
inv_logit(x) |
Logistic sigmoid (ℝ → (0,1)) | theta = inv_logit(alpha + beta*x); |
Logistic regression, probability models |
logit(p) |
Log-odds ((0,1) → ℝ) | eta = logit(p); |
Logistic regression, probit models |
softmax(x) |
Normalize vector to simplex | theta = softmax(alpha); |
Multinomial regression, LDA |
inv(x) |
Reciprocal (1/x) | inv_sigma = inv(sigma); |
Variance transformations |
7.4.3 3. Matrix and Vector Operations
These enable efficient linear algebra, critical for multivariate and hierarchical models.
Function | Purpose | Example Usage | Model Type(s) |
---|---|---|---|
dot_product(a, b) |
Inner product of two vectors | real z = dot_product(a, b); |
Linear regression, similarity measures |
matrix_times_vector(A, v) |
Matrix-vector multiplication | eta = matrix_times_vector(X, beta); |
Multivariate regression, GLMs |
cholesky_decompose(S) |
Cholesky factorization | L = cholesky_decompose(Sigma); |
Hierarchical models, multivariate normals |
multiply_lower_tri_self_transpose(L) |
Covariance from Cholesky factor | Sigma = multiply_lower_tri_self_transpose(L); |
Multivariate normals, hierarchical models |
diag_matrix(v) |
Diagonal matrix from vector | M = diag_matrix(v); |
Covariance priors, scaling |
determinant(A) |
Matrix determinant | det = determinant(Sigma); |
Model diagnostics, multivariate priors |
7.4.4 4. Utility Functions
These simplify data manipulation and posterior summaries, often in generated quantities
.
Function | Purpose | Example Usage | Model Type(s) |
---|---|---|---|
to_vector(x) |
Convert matrix/array to vector | vec = to_vector(matrix); |
Posterior summaries, data reshaping |
to_array_1d(x) |
Convert to 1D array | arr = to_array_1d(matrix); |
Data preprocessing, summaries |
sum(x) |
Sum of elements | total = sum(y); |
Aggregations, diagnostics |
mean(x) |
Mean of elements | avg = mean(y_rep); |
Posterior summaries, diagnostics |
sd(x) |
Standard deviation | std = sd(y_rep); |
Posterior summaries, diagnostics |
int_step(x) |
Indicator (x ≥ 0 → 1, else 0) | flag = int_step(x - 1); |
Conditional logic, model diagnostics |
7.4.5 5. Specialized Solvers
These handle advanced computations like differential equations or parallel processing.
Function | Purpose | Example Usage | Model Type(s) |
---|---|---|---|
ode_rk45(fun, y0, t0, ts, ...) |
Solve ODEs (Runge-Kutta 45) | y = ode_rk45(ode_sys, y0, t0, ts, params); |
Dynamic systems, pharmacokinetics |
integrate_1d(f, a, b, ...) |
Numerical integration | val = integrate_1d(f, a, b, params); |
Custom likelihoods, marginalization |
map_rect(f, phi, ...) |
Parallel computation over data shards | results = map_rect(f, phi, theta, data); |
Large-scale hierarchical models |
7.5 Example: Hierarchical Linear Regression
Here’s a Stan model for a hierarchical linear regression, using matrix_times_vector
, to_vector
, and mean
to demonstrate practical function usage:
data {
int<lower=0> N; // Number of observations
int<lower=0> J; // Number of groups
array[N] int<lower=1,upper=J> group; // Group indicators
matrix[N, 2] X; // Design matrix (intercept + predictor)
vector[N] y; // Outcome
}
parameters {
vector[2] beta; // Fixed effects
vector[J] alpha; // Group-level intercepts
real<lower=0> sigma; // Residual standard deviation
real<lower=0> tau; // Standard deviation of group intercepts
}
model {
beta ~ normal(0, 5); // Prior on fixed effects
tau ~ cauchy(0, 2.5); // Prior on group SD
alpha ~ normal(0, tau); // Group-level priors
sigma ~ cauchy(0, 2.5); // Prior on residual SD
vector[N] mu = matrix_times_vector(X, beta) + to_vector(alpha[group]);
y ~ normal(mu, sigma); // Likelihood
}
generated quantities {
vector[N] y_rep; // Posterior predictive
real mean_y_rep; // Mean of predictions
for (n in 1:N) {
y_rep[n] = normal_rng(matrix_times_vector(X[n], beta) + alpha[group[n]], sigma);
}
mean_y_rep = mean(to_vector(y_rep)); // Summary statistic
}
This model:
- Uses matrix_times_vector
to compute the linear predictor efficiently.
- Employs to_vector
to align group-level intercepts with observations.
- Computes mean_y_rep
in generated quantities
using mean
and to_vector
for posterior diagnostics.
- Generates predictions with normal_rng
for posterior predictive checks.
7.6 Tips for Using Stan Functions
- Efficiency: Prefer vectorized operations like
matrix_times_vector
over loops for speed. - Numerical Stability: Use
log_sum_exp
for summing exponentials to avoid overflow. - Posterior Analysis: Leverage
mean
,sd
, andto_vector
ingenerated quantities
for summaries and diagnostics. - Constraints: Ensure inputs meet requirements (e.g.,
x > 0
forlog
, positive-definite matrices forcholesky_decompose
). - Advanced Modeling: Use
ode_rk45
for dynamic systems ormap_rect
for parallelized large-scale models. - Documentation: The Stan Reference Manual (e.g., version 2.33) and Stan’s GitHub examples provide detailed guidance.