Chapter 1: Foundations of Causal Statistical Analysis

1.1 Introduction

Understanding causality is one of the most enduring and fundamental challenges in science. Across disciplines—from public health and economics to education, neuroscience, and artificial intelligence—researchers are increasingly tasked not only with identifying patterns in data but with uncovering the mechanisms that generate them. While traditional statistical analysis excels at quantifying associations, scientific inquiry often aims at a deeper ambition: to infer causal relationships—to determine what would happen under specific interventions, policies, or changes to a system. The distinction between correlation and causation is more than a methodological nuance; it defines the boundary between description and explanation, and between prediction and control.

This chapter serves as the first in a multi-part series on the foundations of causal statistical analysis. It provides a panoramic overview of the most widely used techniques for estimating causal effects, each grounded in distinct theoretical frameworks and operational assumptions. These methods range from randomized controlled trials (RCTs), the epistemic gold standard, to the many quasi-experimental and model-based approaches developed to address the limitations of real-world data. In practice, researchers must often navigate data landscapes in which randomization is infeasible, treatment selection is endogenous, and temporal or structural confounding is ubiquitous. This is where modern causal inference techniques offer essential tools—not only for estimating effects, but for interrogating the validity of those estimates.

At the heart of this endeavor lies a tension between identifiability and assumptions. Every causal method rests on a set of assumptions—about how the data were generated, how variables relate, and what sources of bias are controlled or ignored. While some methods emphasize robustness through design (e.g., difference-in-differences, regression discontinuity, or instrumental variables), others attempt to model the data-generating process explicitly, drawing from structural modeling, counterfactual reasoning, or machine learning. Each method is powerful under the right conditions and misleading when applied uncritically. This underscores a central theme of the series: there is no universally “best” method for causal inference. Rather, the suitability of each technique depends on the scientific question, data structure, and the plausibility of underlying assumptions.

To guide practitioners in this complex terrain, we begin with a comparative summary table outlining the assumptions, strengths, limitations, and implementation tools for each technique. This table is not merely a catalog—it is a scaffold for deeper engagement. Subsequent chapters in the series will explore each method in detail, presenting both theoretical foundations and practical workflows using open-source statistical packages in R and Python. These installments will include visualizations, diagnostics, sensitivity analyses, and real-world case studies drawn from public health, education, and policy evaluation.

Causal analysis is both an art and a science: it demands careful reasoning, domain knowledge, and transparent methodology. As the demand for evidence-based decision-making grows—particularly in the age of big data and algorithmic governance—causal inference provides a principled framework for moving from data to action. This series is designed to empower readers to approach causal questions rigorously, critically, and creatively.

1.2 Summary Table: Techniques for Causal Statistical Analysis

| Technique | Key Assumptions | Use Cases | Strengths | Limitations | Tools/Packages |
|---|---|---|---|---|---|
| Randomized Controlled Trials (RCTs) | Random assignment ensures exchangeability | Clinical trials, A/B testing | Eliminates confounding | Often infeasible or unethical | randomizr (R); DoWhy (Py) |
| Regression Adjustment | No unmeasured confounders; correct model | Policy, health outcomes | Simple, widely used | Sensitive to omitted variables and model misspecification | lm(), glm() (R); statsmodels, sklearn (Py) |
| Propensity Score Matching (PSM) | Conditional independence given observed covariates | Observational studies | Balances covariates; intuitive | Sensitive to unmeasured confounding and poor overlap | MatchIt, twang (R); DoWhy, causalml (Py) |
| Inverse Probability Weighting (IPW) | Correct treatment model; positivity | Longitudinal data | Handles time-varying confounding | Can produce unstable weights | ipw, survey (R); zEpid (Py) |
| Difference-in-Differences (DiD) | Parallel trends | Policy reforms, natural experiments | Controls for unobserved time-invariant confounders | Vulnerable if trends diverge | fixest, did (R); linearmodels (Py) |
| Instrumental Variables (IV) | Relevance; exclusion restriction | Endogeneity correction | Addresses unmeasured confounding | Finding valid instruments is hard | ivreg, AER (R); linearmodels.iv (Py) |
| Regression Discontinuity (RDD) | Sharp cutoff; local randomization | Education, policy thresholds | Transparent identification | Limited to local effect near cutoff | rdrobust, rddtools (R); rdd, statsmodels (Py) |
| Causal Forests | Unconfoundedness; effect heterogeneity | Precision medicine, targeting | Captures treatment-effect heterogeneity | Requires large data; unmeasured-confounding risk | grf, causalTree (R); econml, causalml (Py) |
| Bayesian Structural Time Series (BSTS) | No unmeasured confounders post-intervention | Time-series interventions | Handles complex time trends | Sensitive to model and prior choices | CausalImpact, bsts (R); tfcausalimpact (Py) |
| Targeted Maximum Likelihood Estimation (TMLE) | Double robustness: only one of the treatment or outcome models needs to be correct | Epidemiology, observational data | Robust; integrates machine learning | Computationally intensive | tmle, ltmle (R); zEpid (Py) |
| G-Computation | No unmeasured confounding; correct outcome model | Mediation, marginal effects | Flexible; counterfactual-based | Model dependence | gfoRmula, ltmle (R); zEpid (Py) |
| Structural Equation Modeling (SEM) | Correct structure; no unmeasured confounding | Latent variables, mediation | Models complex relationships | Requires strong assumptions | lavaan (R); semopy, pysem (Py) |
| Directed Acyclic Graphs (DAGs) | Causal sufficiency; accurate domain knowledge | Study design, confounder control | Clarifies assumptions | Not an estimation method | dagitty, ggdag (R); causalgraphicalmodels (Py) |
| Double Machine Learning (DML) | Conditional ignorability; consistent nuisance estimation | High-dimensional observational studies | Robust to model misspecification; handles high-dimensional confounders | Requires large data; assumes no unmeasured confounding | DoubleML (R); econml, causalml (Py) |

1.3 Methodological Deep Dive with Practical Guidance

  • Randomized Controlled Trials (RCTs): RCTs are the gold standard for causal inference. Random assignment neutralizes confounding, ensuring internal validity. However, practical, ethical, or financial constraints often limit their feasibility. When viable, they deliver the most credible causal estimates.
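
A minimal sketch with simulated data, using only numpy and scipy; the trial, effect size, and noise level are all hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1_000

# Simulate a trial: random assignment makes treatment independent of
# potential outcomes. A hypothetical treatment effect of 2.0 is built in.
treated = rng.integers(0, 2, size=n)                  # coin-flip assignment
outcome = 5.0 + 2.0 * treated + rng.normal(0, 1, size=n)

# Under randomization, the simple difference in means is an unbiased
# estimate of the average treatment effect (ATE).
ate = outcome[treated == 1].mean() - outcome[treated == 0].mean()
t_stat, p_value = stats.ttest_ind(outcome[treated == 1], outcome[treated == 0])
print(f"ATE estimate: {ate:.3f} (p = {p_value:.4f})")
```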

  • Regression Adjustment: This method models the outcome as a function of treatment and covariates. While easy to implement, it assumes no unmeasured confounding and correct model specification. It’s essential to examine covariate balance and conduct robustness checks.
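
A short illustration of why adjustment matters, using statsmodels on simulated confounded data; the variable names and effect sizes are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2_000

# Simulate a confounded setting: x raises both treatment uptake and outcome.
x = rng.normal(size=n)
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(size=n)      # true effect = 2.0
df = pd.DataFrame({"y": y, "t": t, "x": x})

# The naive comparison is biased upward; adjusting for x recovers the
# effect -- but only if x captures all confounding and the model is correct.
naive = smf.ols("y ~ t", data=df).fit()
adjusted = smf.ols("y ~ t + x", data=df).fit()
print(f"naive: {naive.params['t']:.2f}, adjusted: {adjusted.params['t']:.2f}")
```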

  • Propensity Score Matching (PSM): PSM aims to mimic randomization by matching units with similar probabilities of treatment. It balances observed covariates well but fails under unmeasured confounding. Diagnostic tools like balance plots are crucial.
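
A hand-rolled sketch of 1:1 nearest-neighbor matching on an estimated propensity score, using sklearn on simulated data. Dedicated packages such as MatchIt (R) or causalml (Py) add calipers, matching with replacement, and balance diagnostics:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 3))
p_treat = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
t = (rng.uniform(size=n) < p_treat).astype(int)
y = 2.0 * t + X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

# Step 1: estimate propensity scores from observed covariates.
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the closest score.
treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_controls = control[idx.ravel()]

# Step 3: ATT = mean outcome difference within matched pairs.
att = (y[treated] - y[matched_controls]).mean()
print(f"ATT estimate: {att:.3f}")                     # true effect is 2.0
```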

  • Inverse Probability Weighting (IPW): IPW reweights samples to simulate random assignment. It handles time-varying confounding but can produce unstable weights, requiring trimming or stabilization. It’s powerful for longitudinal and panel data.
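
A minimal sketch of stabilized weights for a single time point on simulated data; the clipping threshold of 0.01 is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=(n, 2))
ps_true = 1 / (1 + np.exp(-x[:, 0]))
t = (rng.uniform(size=n) < ps_true).astype(int)
y = 1.5 * t + x[:, 0] + rng.normal(size=n)

# Estimate the treatment model and form stabilized weights.
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]
ps = np.clip(ps, 0.01, 0.99)                 # trim to tame extreme weights
sw = np.where(t == 1, t.mean() / ps, (1 - t.mean()) / (1 - ps))

# Weighted (Hajek-style) difference in means.
mu1 = np.average(y[t == 1], weights=sw[t == 1])
mu0 = np.average(y[t == 0], weights=sw[t == 0])
print(f"IPW ATE estimate: {mu1 - mu0:.3f}")  # true effect is 1.5
```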

  • Difference-in-Differences (DiD): DiD compares changes in outcomes between treated and control units over time, assuming parallel trends. It is popular for evaluating policy interventions but sensitive to trend violations. Visualizing pre-treatment trends and using placebo tests enhance credibility.
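
A sketch of the canonical two-group, two-period design, where the interaction term in an OLS regression is the DiD estimate; the data and effect sizes are simulated:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1_000

# Two groups observed before and after a policy; groups differ in level
# but, by construction, share a common trend of +1.0 per period.
rows = []
for unit in range(n):
    treated = unit < n // 2
    base = 10.0 + (2.0 if treated else 0.0) + rng.normal()
    for post in (0, 1):
        effect = 3.0 if (treated and post) else 0.0   # true DiD = 3.0
        rows.append({"treated": int(treated), "post": post,
                     "y": base + 1.0 * post + effect + rng.normal()})
df = pd.DataFrame(rows)

# The interaction coefficient is the DiD estimate of the treatment effect.
fit = smf.ols("y ~ treated * post", data=df).fit(cov_type="HC1")
print(f"DiD estimate: {fit.params['treated:post']:.3f}")
```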

  • Instrumental Variables (IV): IV methods handle endogeneity by using external variables that affect treatment but not the outcome directly. The approach hinges on the strength and validity of instruments—criteria that are difficult to verify.
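
A manual two-stage least squares sketch on simulated data. Note the comment about standard errors; a dedicated routine such as linearmodels.iv.IV2SLS handles them correctly:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 10_000

# u is an unmeasured confounder; z affects t but touches y only through t.
u = rng.normal(size=n)
z = rng.normal(size=n)                        # the instrument
t = 0.8 * z + u + rng.normal(size=n)          # endogenous treatment
y = 2.0 * t + 1.5 * u + rng.normal(size=n)    # true effect = 2.0

# Stage 1: project the treatment onto the instrument.
t_hat = sm.OLS(t, sm.add_constant(z)).fit().fittedvalues

# Stage 2: regress the outcome on the projected treatment.
# NOTE: the coefficient is consistent, but these second-stage standard
# errors are wrong; use a proper 2SLS routine in real work.
second = sm.OLS(y, sm.add_constant(t_hat)).fit()
ols_biased = sm.OLS(y, sm.add_constant(t)).fit()
print(f"OLS (biased): {ols_biased.params[1]:.2f}, 2SLS: {second.params[1]:.2f}")
```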

  • Regression Discontinuity Design (RDD): RDD exploits sharp cutoffs in treatment assignment. It provides quasi-experimental validity but estimates only local effects near the threshold. Validity depends on smoothness of potential outcomes at the cutoff and on units being unable to manipulate the running variable.
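
A local linear regression sketch around a sharp cutoff; the bandwidth of 0.5 is an arbitrary illustrative choice (packages like rdrobust select it in a data-driven way):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 4_000
cutoff, bandwidth = 0.0, 0.5

# Treatment switches on sharply at the cutoff of the running variable r.
r = rng.uniform(-1, 1, size=n)
t = (r >= cutoff).astype(int)
y = 1.0 + 2.0 * r + 1.5 * t + rng.normal(0, 0.5, size=n)  # true jump = 1.5

# Local linear regression: separate slopes on each side, within a window.
df = pd.DataFrame({"y": y, "r": r - cutoff, "t": t})
local = df[df["r"].abs() <= bandwidth]
fit = smf.ols("y ~ t + r + t:r", data=local).fit()
print(f"RDD estimate at cutoff: {fit.params['t']:.3f}")
```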

  • Causal Forests: Causal forests extend random forests to estimate heterogeneous treatment effects. They are ideal for personalized interventions but require large datasets and are vulnerable to omitted confounding.
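
A sketch assuming the econml package is installed; the data-generating process, with an effect that varies in the first covariate, is hypothetical:

```python
import numpy as np
from econml.dml import CausalForestDML   # assumes econml is installed

rng = np.random.default_rng(6)
n = 5_000
X = rng.uniform(-1, 1, size=(n, 4))
t = rng.integers(0, 2, size=n)
tau = 1.0 + 2.0 * X[:, 0]                # effect varies with the first feature
y = tau * t + X[:, 1] + rng.normal(0, 0.5, size=n)

# Fit a causal forest; nuisance models are cross-fitted internally.
cf = CausalForestDML(discrete_treatment=True, random_state=0)
cf.fit(y, t, X=X)

# Conditional average treatment effects for new covariate profiles.
X_test = np.array([[-0.9, 0, 0, 0], [0.0, 0, 0, 0], [0.9, 0, 0, 0]])
print(cf.effect(X_test))                 # should track 1 + 2 * x0
```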

  • Bayesian Structural Time Series (BSTS): BSTS combines state-space models with Bayesian inference to estimate intervention effects in time series. It accommodates trend and seasonality but is sensitive to model misspecification and prior choices.
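
A sketch assuming the tfcausalimpact package (imported as causalimpact), which mirrors the interface of the R CausalImpact package; the series and the intervention lift are simulated:

```python
import numpy as np
import pandas as pd
from causalimpact import CausalImpact    # the tfcausalimpact package

rng = np.random.default_rng(7)
n, intervention = 120, 90

# A control series x drives the target y; after t=90 we add a lift of 5.
x = 50 + np.cumsum(rng.normal(0, 1, size=n))
y = 1.2 * x + rng.normal(0, 1, size=n)
y[intervention:] += 5.0
data = pd.DataFrame({"y": y, "x": x})    # first column must be the response

# Fit the pre-period state-space model, forecast the counterfactual
# post-period series, and report the estimated causal effect.
ci = CausalImpact(data, pre_period=[0, intervention - 1],
                  post_period=[intervention, n - 1])
print(ci.summary())
```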

  • Targeted Maximum Likelihood Estimation (TMLE): TMLE integrates machine learning into causal effect estimation. It provides double robustness and efficient inference under complex data settings but can be computationally demanding.
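
A deliberately simplified, hand-rolled TMLE for a binary outcome with parametric nuisance models, meant only to expose the targeting logic; production analyses should use tmle/ltmle (R) or zEpid (Py), typically with machine-learning nuisance estimators:

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit

rng = np.random.default_rng(8)
n = 5_000
x = rng.normal(size=(n, 2))
t = (rng.uniform(size=n) < expit(x[:, 0])).astype(int)
y = (rng.uniform(size=n) < expit(-1.0 + t + 0.8 * x[:, 0])).astype(int)

def design(tt):                          # [intercept, treatment, covariates]
    return np.column_stack([np.ones(n), tt, x])

# Step 1: initial outcome model Q(T, X) and treatment model g(X).
Q_fit = sm.GLM(y, design(t), family=sm.families.Binomial()).fit()
g_fit = sm.GLM(t, np.column_stack([np.ones(n), x]),
               family=sm.families.Binomial()).fit()
g = np.clip(g_fit.fittedvalues, 0.01, 0.99)
Q1, Q0 = Q_fit.predict(design(np.ones(n))), Q_fit.predict(design(np.zeros(n)))
QT = np.where(t == 1, Q1, Q0)

# Step 2: targeting step -- fluctuate Q along the "clever covariate" H.
H = t / g - (1 - t) / (1 - g)
eps = sm.GLM(y, H.reshape(-1, 1), family=sm.families.Binomial(),
             offset=logit(QT)).fit().params[0]

# Step 3: update the counterfactual predictions and average the contrast.
Q1s, Q0s = expit(logit(Q1) + eps / g), expit(logit(Q0) - eps / (1 - g))
print(f"TMLE ATE estimate (risk difference): {(Q1s - Q0s).mean():.3f}")
```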

  • G-Computation: G-computation models potential outcomes under each treatment. It is flexible and counterfactual-based but requires accurate modeling and complete covariate adjustment.
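
A minimal sketch of the three g-computation steps on simulated data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 3_000
x = rng.normal(size=n)
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = 2.0 * t + x + 0.5 * t * x + rng.normal(size=n)
df = pd.DataFrame({"y": y, "t": t, "x": x})

# Step 1: fit an outcome model (here with a treatment-covariate interaction).
fit = smf.ols("y ~ t * x", data=df).fit()

# Step 2: predict every unit's outcome under both counterfactual treatments.
y1 = fit.predict(df.assign(t=1))
y0 = fit.predict(df.assign(t=0))

# Step 3: the marginal (population-averaged) effect is the mean contrast.
print(f"G-computation ATE estimate: {(y1 - y0).mean():.3f}")  # truth: 2.0
```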

  • Structural Equation Modeling (SEM): SEM enables the modeling of complex causal structures, including latent constructs and mediation. Its interpretability is appealing but hinges on correct model specification and the absence of unmeasured confounding.
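
A sketch assuming the semopy package, whose model syntax mirrors lavaan; the mediation structure and path coefficients are hypothetical:

```python
import numpy as np
import pandas as pd
from semopy import Model                 # assumes semopy is installed

rng = np.random.default_rng(10)
n = 2_000
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)            # mediator
y = 0.5 * m + 0.3 * x + rng.normal(size=n)  # direct + indirect paths
df = pd.DataFrame({"x": x, "m": m, "y": y})

# lavaan-style syntax: '~' declares a structural regression path.
spec = """
m ~ x
y ~ m + x
"""
model = Model(spec)
model.fit(df)
print(model.inspect())   # path estimates; true indirect effect = 0.6 * 0.5
```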

  • Directed Acyclic Graphs (DAGs): DAGs are essential for clarifying causal assumptions. While not an estimation method, they guide design and analysis by identifying confounders, mediators, and colliders.
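
A small illustration using networkx to encode a DAG and reason about adjustment. The simple parent/ancestor rule below is a heuristic that works for this particular graph, not a general backdoor algorithm; dagitty (R) automates adjustment-set discovery for arbitrary graphs:

```python
import networkx as nx

# Encode causal assumptions: x confounds the t -> y relation,
# and c is a common effect (collider) of t and y.
dag = nx.DiGraph([("x", "t"), ("x", "y"), ("t", "y"), ("t", "c"), ("y", "c")])
assert nx.is_directed_acyclic_graph(dag)

# Block the backdoor path t <- x -> y: here, parents of the treatment
# that are also ancestors of the outcome form a valid adjustment set.
adjust = set(dag.predecessors("t")) & nx.ancestors(dag, "y")
print("Adjust for:", adjust)                       # {'x'}

# c sits on the path t -> c <- y: conditioning on this collider would
# open a spurious association; descendants of t are never controls.
print("Descendants of t (do not adjust):", nx.descendants(dag, "t"))
```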

1.4 Discussion

The comparative framework presented in this foundational overview highlights both the diversity and the interdependence of causal inference techniques. A central takeaway is that no single method guarantees valid causal inference in all contexts. Rather, the validity of any technique depends critically on whether its assumptions align with the structure of the data and the theoretical understanding of the system under study. This observation has two key implications for applied researchers.

First, triangulation—the use of multiple methods to approach the same causal question—is not only desirable but often necessary. For instance, one might use propensity score matching to achieve covariate balance, regression adjustment to model outcome differences, and then compare results with those from a targeted maximum likelihood estimation (TMLE) approach. If conclusions converge, confidence in causal interpretation increases. If not, divergences can reveal sensitivity to assumptions such as model specification or unmeasured confounding. Thus, causal inference is inherently iterative, requiring both methodological flexibility and diagnostic rigor.

Second, methodological literacy is not enough; researchers must also cultivate causal reasoning. Directed Acyclic Graphs (DAGs), while not themselves estimators, play a vital role in clarifying which variables to control for and which paths to block or preserve. DAG-based thinking helps researchers navigate common pitfalls such as controlling for colliders or mediators, both of which can induce bias. The thoughtful use of DAGs, therefore, bridges qualitative theoretical insight with quantitative estimation.

Another tension arises between interpretability and complexity. Classical techniques like regression or instrumental variables are often preferred for their clarity and theoretical grounding, while modern approaches such as causal forests and TMLE offer increased flexibility and robustness in high-dimensional or non-linear settings. However, these gains often come at the cost of interpretability, especially for stakeholders or policy-makers who require transparent causal narratives. This raises an important trade-off: when should we prioritize explainability over precision, and how do we communicate these decisions to interdisciplinary audiences?

In addition, the growing use of machine learning in causal inference—exemplified by methods like causal forests and TMLE—requires new standards for validation and transparency. Unlike predictive modeling, causal questions are inherently counterfactual and cannot be validated through conventional cross-validation. Techniques such as falsification tests, placebo analyses, and sensitivity analyses become indispensable, particularly when machine learning models are involved.

Finally, equity and ethics must be central to causal analysis, especially in domains like public health, criminal justice, and education. Methods that adjust for observed variables can inadvertently perpetuate structural inequalities if those variables are themselves proxies for systemic bias. Researchers must therefore engage critically with both the data and the social contexts from which they arise, treating causal models not just as statistical tools but as ethical instruments.

The subsequent chapters in this series will explore each technique in depth, including code implementation, diagnostic strategies, and real-world case studies. By weaving together statistical rigor, domain expertise, and ethical reflexivity, we aim to equip researchers with a robust and responsible causal toolkit.
