Chapter 1: Foundations of Causal Statistical Analysis

1.1 Introduction

Understanding causality is one of the most enduring and fundamental challenges in science. Across disciplines—from public health and economics to education, neuroscience, and artificial intelligence—researchers are increasingly tasked not only with identifying patterns in data but with uncovering the mechanisms that generate them. While traditional statistical analysis excels at quantifying associations, scientific inquiry often aims at a deeper ambition: to infer causal relationships—to determine what would happen under specific interventions, policies, or changes to a system. The distinction between correlation and causation is more than a methodological nuance; it defines the boundary between description and explanation, and between prediction and control.

This chapter serves as the first in a multi-part series on the foundations of causal statistical analysis. It provides a panoramic overview of the most widely used techniques for estimating causal effects, each grounded in distinct theoretical frameworks and operational assumptions. These methods range from randomized controlled trials (RCTs), the epistemic gold standard, to the many quasi-experimental and model-based approaches developed to address the limitations of real-world data. In practice, researchers must often navigate data landscapes in which randomization is infeasible, treatment selection is endogenous, and temporal or structural confounding is ubiquitous. This is where modern causal inference techniques offer essential tools—not only for estimating effects, but for interrogating the validity of those estimates.

At the heart of this endeavor lies a tension between identifiability and assumptions. Every causal method rests on a set of assumptions—about how the data were generated, how variables relate, and what sources of bias are controlled or ignored. While some methods emphasize robustness through design (e.g., difference-in-differences, regression discontinuity, or instrumental variables), others attempt to model the data-generating process explicitly, drawing from structural modeling, counterfactual reasoning, or machine learning. Each method is powerful under the right conditions and misleading when applied uncritically. This underscores a central theme of the series: there is no universally “best” method for causal inference. Rather, the suitability of each technique depends on the scientific question, data structure, and the plausibility of underlying assumptions.

To guide practitioners in this complex terrain, we begin with a comparative summary table outlining the assumptions, strengths, limitations, and implementation tools for each technique. This table is not merely a catalog—it is a scaffold for deeper engagement. Subsequent chapters in the series will explore each method in detail, presenting both theoretical foundations and practical workflows using open-source statistical packages in R and Python. These installments will include visualizations, diagnostics, sensitivity analyses, and real-world case studies drawn from public health, education, and policy evaluation.

Causal analysis is both an art and a science: it demands careful reasoning, domain knowledge, and transparent methodology. As the demand for evidence-based decision-making grows—particularly in the age of big data and algorithmic governance—causal inference provides a principled framework for moving from data to action. This series is designed to empower readers to approach causal questions rigorously, critically, and creatively.

1.2 Summary Table: Techniques for Causal Statistical Analysis

| Technique | Key Assumptions | Use Cases | Strengths | Limitations | Tools/Packages |
|---|---|---|---|---|---|
| Randomized Controlled Trials (RCTs) | Random assignment ensures exchangeability | Clinical trials, A/B testing | Eliminates confounding | Often infeasible or unethical | randomizr (R); DoWhy (Py) |
| Regression Adjustment | No unmeasured confounders; correct model | Policy, health outcomes | Simple, widely used | Sensitive to omitted variables and model misspecification | lm(), glm() (R); statsmodels, sklearn (Py) |
| Propensity Score Matching (PSM) | Conditional independence given observed covariates | Observational studies | Balances covariates; intuitive | Sensitive to unmeasured confounding and poor overlap | MatchIt, twang (R); DoWhy, causalml (Py) |
| Inverse Probability Weighting (IPW) | Correct treatment model; positivity | Longitudinal data | Handles time-varying confounding | Can produce unstable weights | ipw, survey (R); zEpid (Py) |
| Difference-in-Differences (DiD) | Parallel trends | Policy reforms, natural experiments | Controls for unobserved time-invariant confounders | Vulnerable if trends diverge | fixest, did (R); linearmodels (Py) |
| Instrumental Variables (IV) | Relevance; exclusion restriction | Endogeneity correction | Addresses unmeasured confounding | Finding valid instruments is hard | ivreg, AER (R); linearmodels.iv (Py) |
| Regression Discontinuity (RDD) | Sharp cutoff; local randomization | Education, policy thresholds | Transparent identification | Limited to local effect near cutoff | rdrobust, rddtools (R); rdd, statsmodels (Py) |
| Causal Forests | Unconfoundedness; effect heterogeneity | Precision medicine, targeting | Captures treatment-effect heterogeneity | Requires large data; unmeasured-confounding risk | grf, causalTree (R); econml, causalml (Py) |
| Bayesian Structural Time Series (BSTS) | No unmeasured confounders post-intervention | Time-series interventions | Handles complex time trends | Sensitive to model and prior choices | CausalImpact, bsts (R); tfcausalimpact (Py) |
| Targeted Maximum Likelihood Estimation (TMLE) | Double robustness: only one of the treatment or outcome models needs to be correct | Epidemiology, observational data | Robust; integrates machine learning | Computationally intensive | tmle, ltmle (R); zEpid (Py) |
| G-Computation | No unmeasured confounding; correct outcome model | Mediation, marginal effects | Flexible; counterfactual-based | Model dependence | gfoRmula, ltmle (R); zEpid (Py) |
| Structural Equation Modeling (SEM) | Correct structure; no unmeasured confounding | Latent variables, mediation | Models complex relationships | Requires strong assumptions | lavaan (R); semopy, pysem (Py) |
| Directed Acyclic Graphs (DAGs) | Causal sufficiency; accurate domain knowledge | Study design, confounder control | Clarifies assumptions | Not an estimation method | dagitty, ggdag (R); causalgraphicalmodels (Py) |
| Double Machine Learning (DML) | Conditional ignorability; consistent nuisance estimation | High-dimensional observational studies | Robust to model misspecification; handles high-dimensional confounders | Requires large data; assumes no unmeasured confounding | DoubleML (R); econml, causalml (Py) |

1.3 Methodological Deep Dive with Practical Guidance

  • Randomized Controlled Trials (RCTs): RCTs are the gold standard for causal inference. Random assignment neutralizes confounding, ensuring internal validity. However, practical, ethical, or financial constraints often limit their feasibility. When viable, they deliver the most credible causal estimates.
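
A minimal sketch with simulated data, using only numpy and scipy; the trial, effect size, and noise level are all hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1_000

# Simulate a trial: random assignment makes treatment independent of
# potential outcomes. A hypothetical treatment effect of 2.0 is built in.
treated = rng.integers(0, 2, size=n)                  # coin-flip assignment
outcome = 5.0 + 2.0 * treated + rng.normal(0, 1, size=n)

# Under randomization, the simple difference in means is an unbiased
# estimate of the average treatment effect (ATE).
ate = outcome[treated == 1].mean() - outcome[treated == 0].mean()
t_stat, p_value = stats.ttest_ind(outcome[treated == 1], outcome[treated == 0])
print(f"ATE estimate: {ate:.3f} (p = {p_value:.4f})")
```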

  • Regression Adjustment: This method models the outcome as a function of treatment and covariates. While easy to implement, it assumes no unmeasured confounding and correct model specification. It’s essential to examine covariate balance and conduct robustness checks.
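
A short illustration of why adjustment matters, using statsmodels on simulated confounded data; the variable names and effect sizes are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2_000

# Simulate a confounded setting: x raises both treatment uptake and outcome.
x = rng.normal(size=n)
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(size=n)      # true effect = 2.0
df = pd.DataFrame({"y": y, "t": t, "x": x})

# The naive comparison is biased upward; adjusting for x recovers the
# effect -- but only if x captures all confounding and the model is correct.
naive = smf.ols("y ~ t", data=df).fit()
adjusted = smf.ols("y ~ t + x", data=df).fit()
print(f"naive: {naive.params['t']:.2f}, adjusted: {adjusted.params['t']:.2f}")
```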

  • Propensity Score Matching (PSM): PSM aims to mimic randomization by matching units with similar probabilities of treatment. It balances observed covariates well but fails under unmeasured confounding. Diagnostic tools like balance plots are crucial.
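
A hand-rolled sketch of 1:1 nearest-neighbor matching on an estimated propensity score, using sklearn on simulated data. Dedicated packages such as MatchIt (R) or causalml (Py) add calipers, matching with replacement, and balance diagnostics:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 3))
p_treat = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
t = (rng.uniform(size=n) < p_treat).astype(int)
y = 2.0 * t + X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

# Step 1: estimate propensity scores from observed covariates.
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the closest score.
treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_controls = control[idx.ravel()]

# Step 3: ATT = mean outcome difference within matched pairs.
att = (y[treated] - y[matched_controls]).mean()
print(f"ATT estimate: {att:.3f}")                     # true effect is 2.0
```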

  • Inverse Probability Weighting (IPW): IPW reweights samples to simulate random assignment. It handles time-varying confounding but can produce unstable weights, requiring trimming or stabilization. It’s powerful for longitudinal and panel data.
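
A minimal sketch of stabilized weights for a single time point on simulated data; the clipping threshold of 0.01 is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=(n, 2))
ps_true = 1 / (1 + np.exp(-x[:, 0]))
t = (rng.uniform(size=n) < ps_true).astype(int)
y = 1.5 * t + x[:, 0] + rng.normal(size=n)

# Estimate the treatment model and form stabilized weights.
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]
ps = np.clip(ps, 0.01, 0.99)                 # trim to tame extreme weights
sw = np.where(t == 1, t.mean() / ps, (1 - t.mean()) / (1 - ps))

# Weighted (Hajek-style) difference in means.
mu1 = np.average(y[t == 1], weights=sw[t == 1])
mu0 = np.average(y[t == 0], weights=sw[t == 0])
print(f"IPW ATE estimate: {mu1 - mu0:.3f}")  # true effect is 1.5
```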

  • Difference-in-Differences (DiD): DiD compares changes in outcomes between treated and control units over time, assuming parallel trends. It is popular for evaluating policy interventions but sensitive to trend violations. Visualizing pre-treatment trends and using placebo tests enhance credibility.
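
A sketch of the canonical two-group, two-period design, where the interaction term in an OLS regression is the DiD estimate; the data and effect sizes are simulated:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1_000

# Two groups observed before and after a policy; groups differ in level
# but, by construction, share a common trend of +1.0 per period.
rows = []
for unit in range(n):
    treated = unit < n // 2
    base = 10.0 + (2.0 if treated else 0.0) + rng.normal()
    for post in (0, 1):
        effect = 3.0 if (treated and post) else 0.0   # true DiD = 3.0
        rows.append({"treated": int(treated), "post": post,
                     "y": base + 1.0 * post + effect + rng.normal()})
df = pd.DataFrame(rows)

# The interaction coefficient is the DiD estimate of the treatment effect.
fit = smf.ols("y ~ treated * post", data=df).fit(cov_type="HC1")
print(f"DiD estimate: {fit.params['treated:post']:.3f}")
```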

  • Instrumental Variables (IV): IV methods handle endogeneity by using external variables that affect treatment but not the outcome directly. The approach hinges on the strength and validity of instruments—criteria that are difficult to verify.
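
A manual two-stage least squares sketch on simulated data. Note the comment about standard errors; a dedicated routine such as linearmodels.iv.IV2SLS handles them correctly:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 10_000

# u is an unmeasured confounder; z affects t but touches y only through t.
u = rng.normal(size=n)
z = rng.normal(size=n)                        # the instrument
t = 0.8 * z + u + rng.normal(size=n)          # endogenous treatment
y = 2.0 * t + 1.5 * u + rng.normal(size=n)    # true effect = 2.0

# Stage 1: project the treatment onto the instrument.
t_hat = sm.OLS(t, sm.add_constant(z)).fit().fittedvalues

# Stage 2: regress the outcome on the projected treatment.
# NOTE: the coefficient is consistent, but these second-stage standard
# errors are wrong; use a proper 2SLS routine in real work.
second = sm.OLS(y, sm.add_constant(t_hat)).fit()
ols_biased = sm.OLS(y, sm.add_constant(t)).fit()
print(f"OLS (biased): {ols_biased.params[1]:.2f}, 2SLS: {second.params[1]:.2f}")
```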

  • Regression Discontinuity Design (RDD): RDD exploits sharp cutoffs in treatment assignment. It provides quasi-experimental validity but estimates only local effects near the threshold. Validity depends on smoothness of potential outcomes at the cutoff and on units being unable to manipulate the running variable.
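
A local linear regression sketch around a sharp cutoff; the bandwidth of 0.5 is an arbitrary illustrative choice (packages like rdrobust select it in a data-driven way):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 4_000
cutoff, bandwidth = 0.0, 0.5

# Treatment switches on sharply at the cutoff of the running variable r.
r = rng.uniform(-1, 1, size=n)
t = (r >= cutoff).astype(int)
y = 1.0 + 2.0 * r + 1.5 * t + rng.normal(0, 0.5, size=n)  # true jump = 1.5

# Local linear regression: separate slopes on each side, within a window.
df = pd.DataFrame({"y": y, "r": r - cutoff, "t": t})
local = df[df["r"].abs() <= bandwidth]
fit = smf.ols("y ~ t + r + t:r", data=local).fit()
print(f"RDD estimate at cutoff: {fit.params['t']:.3f}")
```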

  • Causal Forests: Causal forests extend random forests to estimate heterogeneous treatment effects. They are ideal for personalized interventions but require large datasets and are vulnerable to omitted confounding.
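
A sketch assuming the econml package is installed; the data-generating process, with an effect that varies in the first covariate, is hypothetical:

```python
import numpy as np
from econml.dml import CausalForestDML   # assumes econml is installed

rng = np.random.default_rng(6)
n = 5_000
X = rng.uniform(-1, 1, size=(n, 4))
t = rng.integers(0, 2, size=n)
tau = 1.0 + 2.0 * X[:, 0]                # effect varies with the first feature
y = tau * t + X[:, 1] + rng.normal(0, 0.5, size=n)

# Fit a causal forest; nuisance models are cross-fitted internally.
cf = CausalForestDML(discrete_treatment=True, random_state=0)
cf.fit(y, t, X=X)

# Conditional average treatment effects for new covariate profiles.
X_test = np.array([[-0.9, 0, 0, 0], [0.0, 0, 0, 0], [0.9, 0, 0, 0]])
print(cf.effect(X_test))                 # should track 1 + 2 * x0
```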

  • Bayesian Structural Time Series (BSTS): BSTS combines state-space models with Bayesian inference to estimate intervention effects in time series. It accommodates trend and seasonality but is sensitive to model misspecification and prior choices.
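
A sketch assuming the tfcausalimpact package (imported as causalimpact), which mirrors the interface of the R CausalImpact package; the series and the intervention lift are simulated:

```python
import numpy as np
import pandas as pd
from causalimpact import CausalImpact    # the tfcausalimpact package

rng = np.random.default_rng(7)
n, intervention = 120, 90

# A control series x drives the target y; after t=90 we add a lift of 5.
x = 50 + np.cumsum(rng.normal(0, 1, size=n))
y = 1.2 * x + rng.normal(0, 1, size=n)
y[intervention:] += 5.0
data = pd.DataFrame({"y": y, "x": x})    # first column must be the response

# Fit the pre-period state-space model, forecast the counterfactual
# post-period series, and report the estimated causal effect.
ci = CausalImpact(data, pre_period=[0, intervention - 1],
                  post_period=[intervention, n - 1])
print(ci.summary())
```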

  • Targeted Maximum Likelihood Estimation (TMLE): TMLE integrates machine learning into causal effect estimation. It provides double robustness and efficient inference under complex data settings but can be computationally demanding.
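
A deliberately simplified, hand-rolled TMLE for a binary outcome with parametric nuisance models, meant only to expose the targeting logic; production analyses should use tmle/ltmle (R) or zEpid (Py), typically with machine-learning nuisance estimators:

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit

rng = np.random.default_rng(8)
n = 5_000
x = rng.normal(size=(n, 2))
t = (rng.uniform(size=n) < expit(x[:, 0])).astype(int)
y = (rng.uniform(size=n) < expit(-1.0 + t + 0.8 * x[:, 0])).astype(int)

def design(tt):                          # [intercept, treatment, covariates]
    return np.column_stack([np.ones(n), tt, x])

# Step 1: initial outcome model Q(T, X) and treatment model g(X).
Q_fit = sm.GLM(y, design(t), family=sm.families.Binomial()).fit()
g_fit = sm.GLM(t, np.column_stack([np.ones(n), x]),
               family=sm.families.Binomial()).fit()
g = np.clip(g_fit.fittedvalues, 0.01, 0.99)
Q1, Q0 = Q_fit.predict(design(np.ones(n))), Q_fit.predict(design(np.zeros(n)))
QT = np.where(t == 1, Q1, Q0)

# Step 2: targeting step -- fluctuate Q along the "clever covariate" H.
H = t / g - (1 - t) / (1 - g)
eps = sm.GLM(y, H.reshape(-1, 1), family=sm.families.Binomial(),
             offset=logit(QT)).fit().params[0]

# Step 3: update the counterfactual predictions and average the contrast.
Q1s, Q0s = expit(logit(Q1) + eps / g), expit(logit(Q0) - eps / (1 - g))
print(f"TMLE ATE estimate (risk difference): {(Q1s - Q0s).mean():.3f}")
```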

  • G-Computation: G-computation models potential outcomes under each treatment. It is flexible and counterfactual-based but requires accurate modeling and complete covariate adjustment.
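
A minimal sketch of the three g-computation steps on simulated data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 3_000
x = rng.normal(size=n)
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = 2.0 * t + x + 0.5 * t * x + rng.normal(size=n)
df = pd.DataFrame({"y": y, "t": t, "x": x})

# Step 1: fit an outcome model (here with a treatment-covariate interaction).
fit = smf.ols("y ~ t * x", data=df).fit()

# Step 2: predict every unit's outcome under both counterfactual treatments.
y1 = fit.predict(df.assign(t=1))
y0 = fit.predict(df.assign(t=0))

# Step 3: the marginal (population-averaged) effect is the mean contrast.
print(f"G-computation ATE estimate: {(y1 - y0).mean():.3f}")  # truth: 2.0
```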

  • Structural Equation Modeling (SEM): SEM enables the modeling of complex causal structures, including latent constructs and mediation. Its interpretability is appealing but hinges on correct model specification and the absence of unmeasured confounding.
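
A sketch assuming the semopy package, whose model syntax mirrors lavaan; the mediation structure and path coefficients are hypothetical:

```python
import numpy as np
import pandas as pd
from semopy import Model                 # assumes semopy is installed

rng = np.random.default_rng(10)
n = 2_000
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)            # mediator
y = 0.5 * m + 0.3 * x + rng.normal(size=n)  # direct + indirect paths
df = pd.DataFrame({"x": x, "m": m, "y": y})

# lavaan-style syntax: '~' declares a structural regression path.
spec = """
m ~ x
y ~ m + x
"""
model = Model(spec)
model.fit(df)
print(model.inspect())   # path estimates; true indirect effect = 0.6 * 0.5
```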

  • Directed Acyclic Graphs (DAGs): DAGs are essential for clarifying causal assumptions. While not an estimation method, they guide design and analysis by identifying confounders, mediators, and colliders.
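
A small illustration using networkx to encode a DAG and reason about adjustment. The simple parent/ancestor rule below is a heuristic that works for this particular graph, not a general backdoor algorithm; dagitty (R) automates adjustment-set discovery for arbitrary graphs:

```python
import networkx as nx

# Encode causal assumptions: x confounds the t -> y relation,
# and c is a common effect (collider) of t and y.
dag = nx.DiGraph([("x", "t"), ("x", "y"), ("t", "y"), ("t", "c"), ("y", "c")])
assert nx.is_directed_acyclic_graph(dag)

# Block the backdoor path t <- x -> y: here, parents of the treatment
# that are also ancestors of the outcome form a valid adjustment set.
adjust = set(dag.predecessors("t")) & nx.ancestors(dag, "y")
print("Adjust for:", adjust)                       # {'x'}

# c sits on the path t -> c <- y: conditioning on this collider would
# open a spurious association; descendants of t are never controls.
print("Descendants of t (do not adjust):", nx.descendants(dag, "t"))
```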

1.4 Discussion

The comparative framework presented in this foundational overview highlights both the diversity and the interdependence of causal inference techniques. A central takeaway is that no single method guarantees valid causal inference in all contexts. Rather, the validity of any technique depends critically on whether its assumptions align with the structure of the data and the theoretical understanding of the system under study. This observation has two key implications for applied researchers.

First, triangulation—the use of multiple methods to approach the same causal question—is not only desirable but often necessary. For instance, one might use propensity score matching to achieve covariate balance, regression adjustment to model outcome differences, and then compare results with those from a targeted maximum likelihood estimation (TMLE) approach. If conclusions converge, confidence in causal interpretation increases. If not, divergences can reveal sensitivity to assumptions such as model specification or unmeasured confounding. Thus, causal inference is inherently iterative, requiring both methodological flexibility and diagnostic rigor.

Second, methodological literacy is not enough; researchers must also cultivate causal reasoning. Directed Acyclic Graphs (DAGs), while not themselves estimators, play a vital role in clarifying which variables to control for and which paths to block or preserve. DAG-based thinking helps researchers navigate common pitfalls such as controlling for colliders or mediators, both of which can induce bias. The thoughtful use of DAGs, therefore, bridges qualitative theoretical insight with quantitative estimation.

Another tension arises between interpretability and complexity. Classical techniques like regression or instrumental variables are often preferred for their clarity and theoretical grounding, while modern approaches such as causal forests and TMLE offer increased flexibility and robustness in high-dimensional or non-linear settings. However, these gains often come at the cost of interpretability, especially for stakeholders or policy-makers who require transparent causal narratives. This raises an important trade-off: when should we prioritize explainability over precision, and how do we communicate these decisions to interdisciplinary audiences?

In addition, the growing use of machine learning in causal inference—exemplified by methods like causal forests and TMLE—requires new standards for validation and transparency. Unlike predictive modeling, causal questions are inherently counterfactual and cannot be validated through conventional cross-validation. Techniques such as falsification tests, placebo analyses, and sensitivity analyses become indispensable, particularly when machine learning models are involved.

Finally, equity and ethics must be central to causal analysis, especially in domains like public health, criminal justice, and education. Methods that adjust for observed variables can inadvertently perpetuate structural inequalities if those variables are themselves proxies for systemic bias. Researchers must therefore engage critically with both the data and the social contexts from which they arise, treating causal models not just as statistical tools but as ethical instruments.

The subsequent chapters in this series will explore each technique in depth, including code implementation, diagnostic strategies, and real-world case studies. By weaving together statistical rigor, domain expertise, and ethical reflexivity, we aim to equip researchers with a robust and responsible causal toolkit.
