Bayesian Quantile Regression
As discussed in an earlier post quantile regression is a powerful statistical technique that allows for the estimation of conditional quantiles of the response variable. Unlike traditional regression models that focus on the mean, quantile regression provides insights into the distribution of the dependant variable at different quantiles. This enables a more comprehensive understanding of the relationship between predictors and the response across the entire distribution. In recent years, the Bayesian approach to quantile regression using the probabilistic programming language Stan has gained popularity due to its flexibility and ability to incorporate prior knowledge. The Bayesian framework provides a natural way to account for uncertainty in quantile regression. By specifying prior distributions for the model parameters, we can incorporate existing knowledge or beliefs about the relationships between predictors and quantiles. This is particularly useful in situations where limited data are available or when prior information from previous studies is available.
In several posts we presented Stan as a powerful probabilistic programming language that allows for flexible and efficient Bayesian modeling. It provides a user-friendly interface for specifying complex statistical models and performs efficient inference using Hamiltonian Monte Carlo (HMC) sampling. Bayesian quantile Regression uses the capabilities of Stan to estimate the conditional quantiles of a response variable. The key idea behind Bayesian quantile Regression with Stan is to model the conditional quantiles as a function of the predictor variables using a hierarchical Bayesian framework. The model assumes that the response variable follows a distribution that depends on the predictor variables and a set of latent variables. The latent variables capture the uncertainty in the estimation of the quantiles and are assigned prior distributions. In the Bayesian quantile regression framework, the model is specified by defining prior distributions for the regression coefficients and other relevant parameters. Using Markov chain Monte Carlo (MCMC) Stan provides a joint posterior distribution that characterizes the uncertainty in the estimates, allowing for probabilistic inference and hypothesis testing.
One advantage of the Bayesian approach to quantile regression is its ability to handle complex and flexible models. This is particularly important in healthcare research, where the relationships between predictors and health outcomes are often multifaceted and may vary across different quantiles. The Bayesian framework allows for modeling these complexities, enabling a more nuanced understanding of the factors influencing health outcomes. Another advantage of Bayesian quantile regression with Stan is its ability to incorporate informative priors. Prior knowledge or expert opinions can be explicitly incorporated into the model by assigning prior distributions to the parameters. This is especially valuable in healthcare research, where domain expertise can guide the specification of prior distributions, leading to more accurate and interpretable results. Informative priors help to regularize the estimation process, particularly when the data are limited or when there is a need to borrow strength from related studies.
Another advantage of Bayesian quantile Regression is its ability to provide uncertainty estimates for the estimated quantiles. The posterior distribution of the parameters obtained from the MCMC sampling provides a measure of uncertainty for the estimated quantiles. This allows for a more comprehensive understanding of the relationship between variables and provides a basis for decision-making under uncertainty. In addition to estimating the conditional quantiles, Bayesian quantile Regression with Stan also allows for hypothesis testing and model comparison. Hypothesis tests can be performed by comparing the posterior distributions of the parameters to a null hypothesis. Model comparison can be done using techniques such as the deviance information criterion (DIC) or the widely applicable information criterion (WAIC).
Bayesian quantile Regression with Stan is a powerful and flexible approach for estimating the conditional quantiles of a response variable, by incorporating prior information and providing uncertainty estimates, it allows for better decision-making under uncertainty. With its ability to incorporate prior knowledge and handle complex models, Bayesian quantile regression with Stan facilitates more accurate and nuanced analyses, enhancing our understanding of the factors influencing health outcomes. Likewise, by explicitly modeling the conditional quantiles of the response variable, the approach provides a comprehensive understanding of the relationships between predictors and health outcomes with a great promise for advancing healthcare research and informing evidence-based decision-making in clinical practice and policy settings.
To illustrate the use of Bayesian quantile Regression with Stan, let’s consider an example with synthetic heteroskedastic data.
Model Specification
Using bayesQR package
Simulate data from heteroskedastic regression
set.seed(66)
n <- 200
X <- runif(n=n,min=0,max=10)
X <- X
y <- 1 + 2*X + rnorm(n=n, mean=0, sd=.6*X)
Estimate series of quantile regressions with adaptive lasso to limit execution time of the example, ndraw is set to a very low value. Set value to 5000 for a better approximation of the posterior distirubtion.
out <- bayesQR(y~X, quantile=c(.05,.25,.5,.75,.95), alasso=TRUE, ndraw=500)
Initiate plot
plot(X, y, main="", cex=.6, xlab="X")
## Add quantile regression lines to the plot (exclude first 500 burn-in draws)
sum <- summary(out, burnin=50)
for (i in 1:length(sum)){
abline(a=sum[[i]]$betadraw[1,1],b=sum[[i]]$betadraw[2,1],lty=i,col=i)
}
outOLS = lm(y~X)
abline(outOLS,lty=1,lwd=2,col=6)
# Add legend to plot
legend(x=0,y=max(y),legend=c(.05,.25,.50,.75,.95,"OLS"),lty=c(1,2,3,4,5,1),
lwd=c(1,1,1,1,1,2),col=c(1:6),title="Quantile")
Using brms package
n <- 200
x <- runif(n = n, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(n = n, mean = 0, sd = 0.6*x)
dat <- data.frame(x, y)
# fit the 20%-quantile
fit <- brm(bf(y ~ x, quantile = 0.2), data = dat, family = asym_laplace())
summary(fit)
Backend stan model
functions {
/* helper function for asym_laplace_lpdf
* Args:
* y: the response value
* quantile: quantile parameter in (0, 1)
*/
real rho_quantile(real y, real quantile) {
if (y < 0) {
return y * (quantile - 1);
} else {
return y * quantile;
}
}
/* asymmetric laplace log-PDF for a single response
* Args:
* y: the response value
* mu: location parameter
* sigma: positive scale parameter
* quantile: quantile parameter in (0, 1)
* Returns:
* a scalar to be added to the log posterior
*/
real asym_laplace_lpdf(real y, real mu, real sigma, real quantile) {
return log(quantile * (1 - quantile)) -
log(sigma) -
rho_quantile((y - mu) / sigma, quantile);
}
/* asymmetric laplace log-CDF for a single quantile
* Args:
* y: a quantile
* mu: location parameter
* sigma: positive scale parameter
* quantile: quantile parameter in (0, 1)
* Returns:
* a scalar to be added to the log posterior
*/
real asym_laplace_lcdf(real y, real mu, real sigma, real quantile) {
if (y < mu) {
return log(quantile) + (1 - quantile) * (y - mu) / sigma;
} else {
return log1m((1 - quantile) * exp(-quantile * (y - mu) / sigma));
}
}
/* asymmetric laplace log-CCDF for a single quantile
* Args:
* y: a quantile
* mu: location parameter
* sigma: positive scale parameter
* quantile: quantile parameter in (0, 1)
* Returns:
* a scalar to be added to the log posterior
*/
real asym_laplace_lccdf(real y, real mu, real sigma, real quantile) {
if (y < mu) {
return log1m(quantile * exp((1 - quantile) * (y - mu) / sigma));
} else {
return log1m(quantile) - quantile * (y - mu) / sigma;
}
}
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
}
transformed parameters {
real quantile = 0.2; // quantile parameter
real lprior = 0; // prior contributions to the log posterior
lprior += student_t_lpdf(Intercept | 3, 11, 7.8);
lprior += student_t_lpdf(sigma | 3, 0, 7.8)
- 1 * student_t_lccdf(0 | 3, 0, 7.8);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept + Xc * b;
for (n in 1:N) {
target += asym_laplace_lpdf(Y[n] | mu[n], sigma, quantile);
}
}
// priors including constants
target += lprior;
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
This Stan code specifies a Bayesian model for asymmetric Laplace regression, where the main goal is to estimate the population-level effects and dispersion parameter of the model from the provided data. The asymmetric Laplace distribution is used as the likelihood function for the response variable.
-
The
functions
block contains three helper functions:rho_quantile
,asym_laplace_lpdf
, andasym_laplace_lccdf
. These functions are used to calculate the asymmetric Laplace log-PDF, log-CDF, and log-CCDF for a single response variable. -
The
data
block defines the input data for the model, including the total number of observationsN
, the response variableY
, the number of population-level effectsK
, the population-level design matrixX
, and a binary variableprior_only
that indicates whether to ignore the likelihood (for prior-only sampling). -
The
transformed data
block preprocesses the data. It calculates the centered version of the design matrixXc
, removes the intercept from the design matrixX
, and stores the column means ofX
before centering in the vectormeans_X
. -
The
parameters
block defines the parameters to be estimated in the model. It includes the population-level effectsb
, the temporary intercept for centered predictorsIntercept
, and the dispersion parametersigma
. -
The
transformed parameters
block calculates the quantile parameterquantile
(set to 0.2 in this case) and the prior contributions to the log posterior (lprior
). Thelprior
term includes the priors for theIntercept
andsigma
parameters, which are specified as Student’s t-distributions. -
The
model
block defines the likelihood and priors for the model. The likelihood accounts for the asymmetric Laplace distribution for the response variableY
, given the linear predictormu
(calculated using the population-level effectsb
andIntercept
) and the dispersion parametersigma
. Ifprior_only
is true, the likelihood is ignored, and the model only considers the priors. -
The
generated quantities
block computes the actual population-level interceptb_Intercept
by removing the effect of the centered predictors from the temporary interceptIntercept
.
References
- Yu, K., & Moyeed, R. A. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54(4), 437-447.
- Kottas, A., & Gelfand, A. E. (2001). Bayesian semiparametric median regression modeling. Journal of the American Statistical Association, 97(457), 109-121.
- Koenker, R., & Xiao, Z. (2006). Quantile autoregression. Journal of the American Statistical Association, 101(475), 980-990.
- Yu, K., & Moyeed, R. A. (2000). Bayesian quantile regression. Journal of the Royal Statistical Society: Series D (The Statistician), 49(3), 385-392.
- Koenker, R., & Xiao, Z. (2004). Inference on the quantile regression process. Econometrica, 72(1), 71-104.
- https://cran.r-project.org/web/packages/bayesQR/bayesQR.pdf