Chapter 5 Causal Inference in Practice IV: Instrumental Variables in Healthcare

5.1 Introduction

Establishing causal relationships in observational studies is challenging due to confounding, where unobserved factors influence both treatment assignment and outcomes, leading to biased estimates. While methods like propensity score matching or Difference-in-Differences address specific confounding scenarios, they often rely on strong assumptions, such as conditional ignorability or parallel trends, which may not hold when key confounders are unmeasured. Instrumental Variables (IV) estimation offers a robust alternative by leveraging exogenous variation to estimate causal effects, even in the presence of unobserved confounding, making it particularly valuable in settings like healthcare where randomized trials may be infeasible or unethical.

This essay explores the IV method, detailing its theoretical framework, mathematical formalism, and practical application in a healthcare context—estimating the causal effect of hospital quality on patient recovery time. Using simulated data, we implement IV estimation in R, verify key assumptions, and compare results with naive approaches. We also discuss the method’s strengths, limitations, and contrasts with other causal inference techniques, providing a clear and practical guide for researchers.

5.2 Theoretical Framework

5.2.1 Motivation and Identification

In observational data studies, unmeasured confounding violates the conditional ignorability assumption, i.e., $Y(1), Y(0) D X$, where $Y(1)$ and $Y(0)$ are potential outcomes under treatment ($D = 1$) and control ($D = 0$), and $X$ represents observed covariates. IV methods address this by using an instrument $Z$, a variable that induces variation in the treatment $D$ but affects the outcome $Y$ only through $D$. A valid instrument must satisfy three conditions:

Relevance: The instrument must be correlated with the treatment, formally $(Z, D) $.
Exclusion Restriction: The instrument affects the outcome only through the treatment, i.e., $Y = f(D, X, )$, with $Z f$.
Independence: The instrument is exogenous, independent of unobserved confounders affecting the outcome, i.e., $Z X$.

When these hold, IV identifies the Local Average Treatment Effect (LATE), the causal effect for compliers—individuals whose treatment status is influenced by the instrument. Unlike the Average Treatment Effect (ATE), which applies to the entire population, LATE is specific to this subpopulation, a trade-off for addressing unmeasured confounding.

5.2.2 Estimation

For binary instruments and treatments, the IV estimand is given by the Wald estimator:

\[ \widehat{\text{LATE}} = \frac{\mathbb{E}[Y \mid Z = 1] - \mathbb{E}[Y \mid Z = 0]}{\mathbb{E}[D \mid Z = 1] - \mathbb{E}[D \mid Z = 0]} \]

This ratio scales the reduced-form effect (instrument on outcome) by the first-stage effect (instrument on treatment). In general settings with continuous variables or covariates, two-stage least squares (2SLS) is used:

First Stage: Regress the treatment on the instrument and covariates:

\[ D_i = \pi_0 + \pi_1 Z_i + \pi_2^\top X_i + \nu_i \]
Second Stage: Regress the outcome on the predicted treatment and covariates:

\[ Y_i = \alpha_0 + \alpha_1 \hat{D}_i + \alpha_2^\top X_i + \varepsilon_i \]

The coefficient $_1$ estimates the LATE, consistent under valid IV assumptions, even with unmeasured confounding.

5.2.3 Comparison with Other Methods

IV differs from propensity score methods, which assume conditional ignorability and estimate ATE or ATT, requiring all confounders to be observed. IV, by contrast, handles unmeasured confounding but requires a valid instrument and estimates LATE. Difference-in-Differences (DiD) relies on parallel trends, assuming untreated units represent the counterfactual trend, and is suited for time-varying treatments but cannot address confounding that varies differentially over time. IV’s strength lies in its robustness to hidden bias, though it demands careful instrument selection and limits generalizability to compliers.

5.3 Healthcare Application: Hospital Quality and Recovery Time

5.3.1 Scenario

We aim to estimate the causal effect of hospital quality ($D_i = 1$ for high-quality hospitals, $D_i = 0$ for standard hospitals) on recovery time after surgery ($Y_i$, in days, lower is better). Naive regression, $Y_i = _0 + _1 D_i + _2^X_i + _i$, may be biased due to unmeasured confounders like patient health consciousness, which affects both hospital choice and recovery. We use geographic proximity to a high-quality hospital ($Z_i = 1$ if within 10 miles, $Z_i = 0$ otherwise) as an instrument, assuming patients closer to high-quality hospitals are more likely to choose them, but proximity affects recovery only through hospital choice.

5.3.2 Assumptions

Relevance: Proximity influences hospital choice ($(Z_i, D_i) $), testable via the first-stage F-statistic.
Exclusion Restriction: Proximity affects recovery only through hospital choice, not via other pathways like local healthcare quality.
Independence: Proximity is uncorrelated with unmeasured confounders (e.g., health consciousness), plausible after controlling for observables like income or education.

5.3.3 R Implementation

We simulate data for 1000 patients, with a true LATE of -5 days (high-quality hospitals reduce recovery time by 5 days for compliers). The R code below implements 2SLS, computes the Wald estimator, and compares with naive OLS.

if (!requireNamespace("AER", quietly = TRUE)) install.packages("AER")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(AER)
library(ggplot2)

# Set seed for reproducibility
set.seed(123)

# Simulate data
n <- 1000  # Number of patients
p <- 5     # Number of covariates
X <- matrix(rnorm(n * p), nrow = n)  # Covariates (e.g., age, comorbidities)
colnames(X) <- paste0("X", 1:p)

# Instrument: Z = 1 if within 10 miles
Z <- rbinom(n, 1, 0.5)

# Treatment: D = 1 for high-quality hospital
pi_1 <- 0.8  # Strong instrument effect
D <- as.numeric(runif(n) < plogis(0.5 + pi_1 * Z + 0.2 * rowSums(X[, 1:2])))

# Outcome: Recovery time (days)
true_late <- -5
Y <- 20 + true_late * D + 2 * rowSums(X[, 1:3]) + rnorm(n, sd = 2)

# Data frame
data <- data.frame(Y = Y, D = D, Z = Z, X)

# First stage: Check relevance
first_stage <- lm(D ~ Z + ., data = data[, c("D", "Z", paste0("X", 1:p))])
cat("First-stage F-statistic:", summary(first_stage)$fstatistic[1], "\n")

# 2SLS estimation
iv_model <- ivreg(Y ~ D + . | Z + ., data = data[, c("Y", "D", "Z", paste0("X", 1:p))])
iv_results <- summary(iv_model)
cat("2SLS LATE estimate:", coef(iv_results)["D", "Estimate"], "\n")
cat("Standard error:", coef(iv_results)["D", "Std. Error"], "\n")

# Wald estimator
y_z1 <- mean(data$Y[data$Z == 1])
y_z0 <- mean(data$Y[data$Z == 0])
d_z1 <- mean(data$D[data$Z == 1])
d_z0 <- mean(data$D[data$Z == 0])
wald_estimate <- (y_z1 - y_z0) / (d_z1 - d_z0)
cat("Wald estimator LATE:", wald_estimate, "\n")

# Naive OLS
ols_model <- lm(Y ~ D + ., data = data[, c("Y", "D", paste0("X", 1:p))])
ols_estimate <- coef(ols_model)["D"]
cat("Naive OLS estimate:", ols_estimate, "\n")

# Visualize results
estimates <- data.frame(
  Method = c("2SLS", "Wald", "OLS"),
  Estimate = c(coef(iv_results)["D", "Estimate"], wald_estimate, ols_estimate),
  SE = c(coef(iv_results)["D", "Std. Error"], NA, summary(ols_model)$coefficients["D", "Std. Error"]),
  Lower = c(coef(iv_results)["D", "Estimate"] - 1.96 * coef(iv_results)["D", "Std. Error"],
            NA, ols_estimate - 1.96 * summary(ols_model)$coefficients["D", "Std. Error"]),
  Upper = c(coef(iv_results)["D", "Estimate"] + 1.96 * coef(iv_results)["D", "Std. Error"],
            NA, ols_estimate + 1.96 * summary(ols_model)$coefficients["D", "Std. Error"])
)

ggplot(estimates, aes(x = Method, y = Estimate, fill = Method)) +
  geom_bar(stat = "identity", alpha = 0.6) +
  geom_errorbar(aes(ymin = Lower, ymax = Upper), width = 0.2, na.rm = TRUE) +
  geom_hline(yintercept = true_late, linetype = "dashed", color = "red") +
  labs(title = "Estimated Effect of Hospital Quality on Recovery Time",
       y = "Effect (Days)", x = "Method") +
  scale_fill_manual(values = c("2SLS" = "#1f77b4", "Wald" = "#ff7f0e", "OLS" = "#2ca02c")) +
  theme_minimal() +
  annotate("text", x = 3.5, y = true_late + 0.5, label = "True LATE", color = "red")

5.4 Interpretation

The first-stage F-statistic assesses relevance; values above 10 indicate a strong instrument. In our simulation, ($_1 = 0.8$) ensures relevance. The 2SLS estimate ($_1$) should approximate the true LATE (-5 days), with confidence intervals reflecting precision. The Wald estimator, computed manually, should align closely with 2SLS but may be less precise without covariates. Naive OLS, ignoring unmeasured confounding, often yields biased estimates, typically attenuated toward zero. The plot visualizes these estimates, with 2SLS and Wald near -5 and OLS deviating, highlighting IV’s ability to correct for confounding.

Exclusion and independence assumptions rely on domain knowledge. Proximity is assumed to affect recovery only through hospital choice, not via other channels like local healthcare quality. Independence holds if proximity is uncorrelated with unmeasured confounders, plausible after adjusting for observables.

5.5 Limitations

IV estimation faces several challenges. Weak instruments (low first-stage F-statistic) lead to biased and imprecise estimates. Violation of the exclusion restriction, e.g., if proximity affects recovery through local resources, biases the LATE. Similarly, correlation between the instrument and unmeasured confounders violates independence. The LATE applies only to compliers, limiting generalizability to the broader population. Multiple instruments or endogenous variables require additional assumptions and tests, such as the Hansen J-test for overidentification. Heterogeneous treatment effects further complicate interpretation, as the LATE may not reflect effects for non-compliers.

5.6 Conclusion

Instrumental Variables is a powerful tool for causal inference in observational studies, particularly in healthcare where unmeasured confounding is common. By leveraging an exogenous instrument like geographic proximity, IV can estimate causal effects, such as the impact of hospital quality on recovery time, even when randomized trials are impractical. The R implementation demonstrates practical application, with diagnostics ensuring assumption validity. While IV requires careful instrument selection and assumption justification, its ability to address hidden bias makes it indispensable for robust causal inference. Future research can explore extensions, such as handling heterogeneous effects or integrating IV with machine learning for improved precision.

5.7 References

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press.
Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.
Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.