Chapter 7 Appendix

7.1 Main Stan Distributions Cheatsheet

Statistical modeling in Stan is powered by a flexible and expressive probabilistic language grounded in log-density functions. While the modeling blocks (model, data, parameters, etc.) help structure a model, the core statistical logic is defined through distributions. This cheatsheet offers a practical summary of the most important distributions used in Stan, their syntax, required parameters, typical use cases, and examples of where they show up in statistical modeling.


| Distribution | Function | Parameters | Use Case | Model Type(s) |
|---|---|---|---|---|
| Bernoulli | `bernoulli_lpmf(y \| θ)` | θ ∈ (0, 1) | Binary outcome (0/1) | Logistic regression, classification |
| Binomial | `binomial_lpmf(y \| n, θ)` | n ∈ ℕ⁺, θ ∈ (0, 1) | Number of successes in n trials | Logistic GLMs, grouped binomial models |
| Categorical | `categorical_lpmf(y \| θ)` | θ: simplex vector (length K) | Single draw from K categories | Multinomial regression |
| Multinomial | `multinomial_lpmf(y \| θ)` | y: int array of counts, θ: simplex | Category count data | Count models with category splits |
| Normal | `normal_lpdf(y \| μ, σ)` | μ ∈ ℝ, σ > 0 | Gaussian noise, residuals | Linear regression, priors for real parameters |
| Student’s t | `student_t_lpdf(y \| ν, μ, σ)` | ν > 0, μ ∈ ℝ, σ > 0 | Heavy-tailed data, robust models | Robust regression, hierarchical priors |
| Cauchy | `cauchy_lpdf(y \| μ, σ)` | μ ∈ ℝ, σ > 0 | Weakly informative, heavy-tailed prior | Priors on scale parameters (e.g., τ ~ cauchy(0, 2.5), a half-Cauchy when τ is declared with `<lower=0>`) |
| Exponential | `exponential_lpdf(y \| λ)` | λ > 0 | Time to event, memoryless processes | Survival models, Poisson process modeling |
| Gamma | `gamma_lpdf(y \| α, β)` | α > 0 (shape), β > 0 (rate) | Positive skewed data | Priors on rates or shape parameters |
| Inverse Gamma | `inv_gamma_lpdf(y \| α, β)` | α > 0 (shape), β > 0 (scale) | Prior for variances | Priors on σ², τ², especially in hierarchies |
| Lognormal | `lognormal_lpdf(y \| μ, σ)` | μ ∈ ℝ, σ > 0 | Positive, right-skewed data | Income, durations, reliability |
| Beta | `beta_lpdf(y \| α, β)` | α > 0, β > 0 | Probabilities or proportions | Priors on probabilities (θ ∈ (0, 1)) |
| Dirichlet | `dirichlet_lpdf(θ \| α)` | θ: simplex, α: vector with positive entries | Probabilities summing to 1 | Priors for category proportions, LDA |
| Poisson | `poisson_lpmf(y \| λ)` | λ > 0 | Count data, rare event modeling | GLMs for count data |
| Negative Binomial | `neg_binomial_2_lpmf(y \| μ, φ)` | μ > 0, φ > 0 | Overdispersed count data | GLMs with extra-Poisson variation |
| Ordered Logistic | `ordered_logistic_lpmf(y \| η, c)` | η ∈ ℝ, c: ordered cut-points | Ordinal outcomes | Ordinal regression |
| Uniform | `uniform_lpdf(y \| a, b)` | a < b | Flat prior within range | Non-informative priors |
| Pareto | `pareto_lpdf(y \| y_min, α)` | y_min > 0, α > 0 | Heavy-tail data, power-law phenomena | Extremes, outlier modeling |
| Von Mises | `von_mises_lpdf(y \| μ, κ)` | μ ∈ [0, 2π), κ ≥ 0 | Circular data (angles, wind direction) | Directional models |
| Weibull | `weibull_lpdf(y \| α, σ)` | α > 0 (shape), σ > 0 (scale) | Survival times, failure rates | Survival models, reliability analysis |
| LKJ Correlation | `lkj_corr_cholesky_lpdf(L \| η)` | η > 0, L: Cholesky factor of a correlation matrix | Prior for correlation matrices | Hierarchical models with random slopes |
| Wishart | `wishart_lpdf(S \| ν, Σ)` | ν > dim − 1, Σ: scale matrix | Prior on covariance matrices | Multivariate Gaussian models (rarely used) |
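The functions in the table can be used in two ways: as a distribution statement with `~`, or as an explicit increment to the log density with `target +=`. The following is a minimal sketch of both forms in one model; the data names (`N`, `y`) and the prior choices are illustrative assumptions, not recommendations.

```stan
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Priors written as distribution statements
  mu ~ normal(0, 10);
  sigma ~ exponential(1);

  // Likelihood written as an explicit log-density increment.
  // This yields the same posterior as `y ~ normal(mu, sigma);`.
  // The difference is that target += with _lpdf keeps the
  // normalizing constants, while the ~ form may drop them.
  target += normal_lpdf(y | mu, sigma);
}
```

The vertical bar separates the outcome from the parameters, which is why every signature in the table has the form `name_lpdf(y | ...)` or `name_lpmf(y | ...)`.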

7.2 Main Stan Functions Cheatsheet

Stan is a platform for Bayesian statistical modeling, known for its Hamiltonian Monte Carlo (HMC) engine and flexible modeling language. While probability distributions like normal_lpdf or poisson_lpmf define priors and likelihoods, Stan’s non-distribution functions (mathematical operations, matrix algebra, utility tools, and specialized solvers) are equally important for building efficient and expressive models. These functions support data transformations, efficient computation, and post-processing in the generated quantities block.

This cheatsheet organizes Stan’s most commonly used non-distribution functions into categories, providing their purpose, example usage, and the model types where they’re most applicable. Whether you’re crafting linear regressions, hierarchical models, or dynamic systems, this guide will help you leverage Stan’s toolkit effectively. We’ll wrap up with an example model to bring these functions to life.

7.3 Why Non-Distribution Functions?

Stan’s non-distribution functions serve several key purposes:

- Transformations: Functions like log, exp, and inv_logit map parameters to constrained spaces or perform nonlinear calculations.
- Matrix Operations: Functions like dot_product and cholesky_decompose enable efficient linear algebra for multivariate models.
- Utilities: Functions like to_vector and mean simplify data manipulation and posterior summaries.
- Specialized Tools: Solvers like ode_rk45 and integrate_1d tackle complex systems, such as differential equations or custom likelihoods.
- Posterior Processing: Functions in the generated quantities block, like sum or sd, compute diagnostics or predictions.

This cheatsheet focuses on these functions to help you streamline model specification and analysis.

7.4 Stan Functions Cheatsheet

7.4.1 Mathematical Functions

These functions perform scalar operations, often used in transformed parameters or model blocks.

| Function | Purpose | Example Usage | Model Type(s) |
|---|---|---|---|
| abs(x) | Absolute value | real z = abs(x); | General computations, robust stats |
| exp(x) | Exponential (e^x) | lambda = exp(alpha); | Rate models, transformations |
| log(x) | Natural logarithm | real l = log(y); | Log-likelihoods, transformations |
| sqrt(x) | Square root | sigma = sqrt(sigma_sq); | Variance computations, scaling |
| lgamma(x) | Log gamma function | target += lgamma(alpha); | Mixture models, custom likelihoods |
| log_sum_exp(x) | Log-sum-exp for numerical stability | real lp = log_sum_exp(log_theta); | Mixture models, marginal likelihoods |
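As an illustration of `log_sum_exp` in a mixture model, here is a hedged sketch of a two-component normal mixture. The component count, variable names, and priors are assumptions chosen for the example; `log1m(theta)` is Stan's built-in for `log(1 - theta)`.

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  ordered[2] mu;                 // ordered to keep the components identifiable
  vector<lower=0>[2] sigma;      // component standard deviations
  real<lower=0, upper=1> theta;  // mixing proportion of component 1
}
model {
  mu ~ normal(0, 10);
  sigma ~ lognormal(0, 1);
  theta ~ beta(2, 2);
  for (n in 1:N) {
    // log(theta * Normal(y | mu1, s1) + (1 - theta) * Normal(y | mu2, s2)),
    // evaluated entirely on the log scale to avoid underflow
    target += log_sum_exp(
      log(theta) + normal_lpdf(y[n] | mu[1], sigma[1]),
      log1m(theta) + normal_lpdf(y[n] | mu[2], sigma[2]));
  }
}
```

Exponentiating each term and summing directly would underflow to zero for observations far from both component means; `log_sum_exp` avoids that by factoring out the largest term before exponentiating.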

7.4.2 Transformation Functions

These map parameters to constrained spaces, often in transformed parameters.

| Function | Purpose | Example Usage | Model Type(s) |
|---|---|---|---|
| inv_logit(x) | Logistic sigmoid (ℝ → (0,1)) | theta = inv_logit(alpha + beta*x); | Logistic regression, probability models |
| logit(p) | Log-odds ((0,1) → ℝ) | eta = logit(p); | Logistic regression, log-odds transformations |
| softmax(x) | Normalize vector to simplex | theta = softmax(alpha); | Multinomial regression, LDA |
| inv(x) | Reciprocal (1/x) | inv_sigma = inv(sigma); | Variance transformations |
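To show where `inv_logit` typically sits in a model, the following sketch fits a one-predictor logistic regression. The data names and priors are assumptions made for the example.

```stan
data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real alpha;
  real beta;
}
model {
  alpha ~ normal(0, 2.5);
  beta ~ normal(0, 2.5);
  // inv_logit maps the linear predictor to a probability in (0, 1)
  y ~ bernoulli(inv_logit(alpha + beta * x));
  // Equivalent, and usually more stable numerically:
  // y ~ bernoulli_logit(alpha + beta * x);
}
```

The commented-out `bernoulli_logit` form applies the same transformation internally, so in practice the explicit `inv_logit` call is most useful when the probability itself is needed, for example in generated quantities.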

7.4.3 Matrix and Vector Operations

These enable efficient linear algebra, critical for multivariate and hierarchical models.

| Function | Purpose | Example Usage | Model Type(s) |
|---|---|---|---|
| dot_product(a, b) | Inner product of two vectors | real z = dot_product(a, b); | Linear regression, similarity measures |
| A * v (operator) | Matrix-vector multiplication | eta = X * beta; | Multivariate regression, GLMs |
| cholesky_decompose(S) | Cholesky factorization | L = cholesky_decompose(Sigma); | Hierarchical models, multivariate normals |
| multiply_lower_tri_self_transpose(L) | Covariance from Cholesky factor | Sigma = multiply_lower_tri_self_transpose(L); | Multivariate normals, hierarchical models |
| diag_matrix(v) | Diagonal matrix from vector | M = diag_matrix(v); | Covariance priors, scaling |
| determinant(A) | Matrix determinant | det = determinant(Sigma); | Model diagnostics, multivariate priors |
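The Cholesky-based functions are most useful together. The sketch below parameterizes a multivariate normal through the Cholesky factor of its correlation matrix and rebuilds the covariance matrix afterwards. It relies on two built-ins that are not listed in the table, `diag_pre_multiply` and `multi_normal_cholesky`; the data layout and priors are assumptions for illustration.

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
  array[N] vector[K] y;
}
parameters {
  vector[K] mu;
  vector<lower=0>[K] tau;           // marginal standard deviations
  cholesky_factor_corr[K] L_Omega;  // Cholesky factor of the correlation matrix
}
model {
  mu ~ normal(0, 5);
  tau ~ normal(0, 2.5);
  L_Omega ~ lkj_corr_cholesky(2);
  // diag_pre_multiply(tau, L_Omega) is the Cholesky factor of the covariance
  y ~ multi_normal_cholesky(mu, diag_pre_multiply(tau, L_Omega));
}
generated quantities {
  // Recover the full covariance matrix: Sigma = L * L'
  matrix[K, K] Sigma
    = multiply_lower_tri_self_transpose(diag_pre_multiply(tau, L_Omega));
}
```

Working with the factor avoids repeated factorization of the covariance matrix during sampling; the full matrix is only formed once per draw, for reporting.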

7.4.4 Utility Functions

These simplify data manipulation and posterior summaries, often in generated quantities.

| Function | Purpose | Example Usage | Model Type(s) |
|---|---|---|---|
| to_vector(x) | Convert matrix/array to vector | vec = to_vector(M); | Posterior summaries, data reshaping |
| to_array_1d(x) | Convert to 1D array | arr = to_array_1d(M); | Data preprocessing, summaries |
| sum(x) | Sum of elements | total = sum(y); | Aggregations, diagnostics |
| mean(x) | Mean of elements | avg = mean(y_rep); | Posterior summaries, diagnostics |
| sd(x) | Standard deviation | std = sd(y_rep); | Posterior summaries, diagnostics |
| int_step(x) | Indicator (x > 0 → 1, else 0) | flag = int_step(x - 1); | Conditional logic, model diagnostics |
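A short sketch of how these utilities combine in generated quantities, assuming a simple normal model for data `y` (the model, names, and priors are placeholders for the example):

```stan
data {
  int<lower=2> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 10);
  sigma ~ exponential(1);
  y ~ normal(mu, sigma);
}
generated quantities {
  // Vectorized normal_rng returns an array of reals; to_vector converts it
  vector[N] y_rep = to_vector(normal_rng(rep_vector(mu, N), sigma));
  real mean_y_rep = mean(y_rep);
  real sd_y_rep = sd(y_rep);
  // 1 if the replicated SD is strictly larger than the observed SD, else 0
  int<lower=0, upper=1> sd_exceeds = int_step(sd_y_rep - sd(y));
}
```

Averaging `sd_exceeds` over posterior draws gives a posterior predictive p-value for the standard deviation.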

7.4.5 Specialized Solvers

These handle advanced computations like differential equations or parallel processing.

| Function | Purpose | Example Usage | Model Type(s) |
|---|---|---|---|
| ode_rk45(fun, y0, t0, ts, ...) | Solve ODEs (Runge-Kutta 45) | y = ode_rk45(ode_sys, y0, t0, ts, params); | Dynamic systems, pharmacokinetics |
| integrate_1d(f, a, b, theta, x_r, x_i) | Numerical integration | val = integrate_1d(f, a, b, theta, x_r, x_i); | Custom likelihoods, marginalization |
| map_rect(f, phi, thetas, x_rs, x_is) | Parallel computation over data shards | results = map_rect(f, phi, thetas, x_rs, x_is); | Large-scale hierarchical models |
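For a sense of how `ode_rk45` is wired into a model, here is a minimal sketch that fits a one-state exponential decay, dy/dt = -k * y, to noisy observations. The system, the function name `decay`, the data names, and the priors are all hypothetical; the variadic call shown here requires a reasonably recent Stan release.

```stan
functions {
  // Right-hand side of dy/dt = -k * y (hypothetical one-state system)
  vector decay(real t, vector y, real k) {
    return -k * y;
  }
}
data {
  int<lower=1> T;
  real t0;
  array[T] real<lower=t0> ts;  // solution times: increasing, strictly after t0
  vector[T] y_obs;
}
parameters {
  real<lower=0> k;      // decay rate
  real<lower=0> y0;     // initial state
  real<lower=0> sigma;  // measurement noise
}
model {
  // One solution vector per requested time point
  array[T] vector[1] y_hat = ode_rk45(decay, [y0]', t0, ts, k);
  k ~ lognormal(0, 1);
  y0 ~ lognormal(0, 1);
  sigma ~ exponential(1);
  for (t in 1:T) {
    y_obs[t] ~ normal(y_hat[t][1], sigma);
  }
}
```

Arguments after `ts` (here just `k`) are passed through to the ODE function in the same order, so its signature must list them after the time and state arguments.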

7.5 Example: Hierarchical Linear Regression

Here’s a Stan model for a hierarchical linear regression, using matrix-vector multiplication (X * beta), to_vector, and mean to demonstrate practical function usage:

data {
  int<lower=0> N; // Number of observations
  int<lower=0> J; // Number of groups
  array[N] int<lower=1,upper=J> group; // Group indicators
  matrix[N, 2] X; // Design matrix (intercept + predictor)
  vector[N] y; // Outcome
}
parameters {
  vector[2] beta; // Fixed effects
  vector[J] alpha; // Group-level intercepts
  real<lower=0> sigma; // Residual standard deviation
  real<lower=0> tau; // Standard deviation of group intercepts
}
model {
  // Linear predictor: fixed effects plus each observation's group intercept
  vector[N] mu = X * beta + alpha[group];
  beta ~ normal(0, 5); // Prior on fixed effects
  tau ~ cauchy(0, 2.5); // Half-Cauchy prior on group SD (tau > 0)
  alpha ~ normal(0, tau); // Group-level priors
  sigma ~ cauchy(0, 2.5); // Half-Cauchy prior on residual SD (sigma > 0)
  y ~ normal(mu, sigma); // Likelihood
}
generated quantities {
  // Posterior predictive draws: vectorized normal_rng returns an array
  // of reals, which to_vector converts to a vector
  vector[N] y_rep = to_vector(normal_rng(X * beta + alpha[group], sigma));
  real mean_y_rep = mean(y_rep); // Summary statistic
}

This model:

- Uses the matrix-vector product X * beta to compute the linear predictor without a loop. Stan has no matrix_times_vector function; the * operator performs this multiplication.
- Aligns group-level intercepts with observations through multi-indexing (alpha[group]), which already returns a vector of length N.
- Generates predictions with the vectorized normal_rng and converts the resulting array to a vector with to_vector.
- Computes mean_y_rep in generated quantities using mean, for posterior predictive checks.

7.6 Tips for Using Stan Functions

  1. Efficiency: Prefer vectorized operations like X * beta over loops for speed.
  2. Numerical Stability: Use log_sum_exp for summing exponentials to avoid overflow.
  3. Posterior Analysis: Leverage mean, sd, and to_vector in generated quantities for summaries and diagnostics.
  4. Constraints: Ensure inputs meet requirements (e.g., x > 0 for log, positive-definite matrices for cholesky_decompose).
  5. Advanced Modeling: Use ode_rk45 for dynamic systems or map_rect for parallelized large-scale models.
  6. Documentation: The Stan Functions Reference and Reference Manual (e.g., version 2.33) and Stan’s GitHub examples provide exact signatures and detailed guidance.