When you study the effects of a treatment, the challenge is to separate the true impact of the treatment from other influencing factors. Factors that affect both who receives the treatment and the outcome are called confounders. In real-world studies, it is almost impossible to identify all confounders, especially in observational research, where treatments are not randomly assigned. Without randomization, you cannot be sure that treatment groups are comparable. Even if you list the known confounders, the data on them may be incomplete, which limits the accuracy of your analysis.
Confounding and Its Impact
Confounding is a common issue in clinical studies. Imagine a scenario where the indication for a drug (the reason why the drug is prescribed) is itself a risk factor for the outcome. This situation, known as confounding by indication, cannot always be removed, even with advanced statistical modeling, because the underlying condition is tied directly to both treatment and outcome.
For example, if you are studying the effect of a new heart drug on heart attacks, the very reason patients are prescribed the drug (heart disease) is also a risk factor for heart attacks. Unless you have a comparable control group of patients with the same indication, this confounding cannot be fully addressed.
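A small simulation can make confounding by indication concrete. The sketch below uses made-up numbers (30% of patients have heart disease, the drug halves the risk of a heart attack, and sick patients are far more likely to be prescribed it); none of these figures come from a real study. It also assumes the indication is fully recorded, which real data often cannot guarantee.

```python
import random

def simulate_confounding_by_indication(n=200_000, seed=2):
    """Return (naive risk difference, risk difference within each disease stratum)."""
    rng = random.Random(seed)
    # counts[(disease, drug)] = [number of patients, number of heart attacks]
    counts = {(d, t): [0, 0] for d in (0, 1) for t in (0, 1)}
    for _ in range(n):
        disease = 1 if rng.random() < 0.30 else 0
        # The indication drives prescribing: sick patients usually get the drug.
        drug = 1 if rng.random() < (0.80 if disease else 0.10) else 0
        # Hypothetical risks: disease raises risk tenfold, the drug halves it.
        risk = (0.20 if disease else 0.02) * (0.5 if drug else 1.0)
        attack = 1 if rng.random() < risk else 0
        counts[(disease, drug)][0] += 1
        counts[(disease, drug)][1] += attack

    def risk_in(cells):
        patients = sum(counts[c][0] for c in cells)
        attacks = sum(counts[c][1] for c in cells)
        return attacks / patients

    # Naive comparison ignores the indication entirely.
    naive = risk_in([(0, 1), (1, 1)]) - risk_in([(0, 0), (1, 0)])
    # Stratified comparison looks at patients with the same disease status.
    stratified = {d: risk_in([(d, 1)]) - risk_in([(d, 0)]) for d in (0, 1)}
    return naive, stratified

naive, stratified = simulate_confounding_by_indication()
print(f"naive risk difference (drug - no drug): {naive:+.3f}")
for disease, diff in stratified.items():
    label = "heart disease" if disease else "no heart disease"
    print(f"risk difference among patients with {label}: {diff:+.3f}")
```

Under these assumed numbers, the naive comparison makes the drug look harmful, while the comparison within each disease stratum shows a lower risk for treated patients.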
To reduce confounding, randomized controlled trials (RCTs) are preferred because random assignment balances, on average, both known and unknown factors between treatment groups. However, RCTs are not always possible due to cost, ethics, or time constraints, which is why observational studies often rely on advanced statistical techniques.
The Concept of Counterfactuals
To understand causality, researchers often use the concept of counterfactual outcomes. A counterfactual describes what would have happened to an individual if they had received a different treatment.
For an individual $i$, let:
$Y_i(1)$ = outcome if the individual receives treatment
$Y_i(0)$ = outcome if the individual receives control
The individual causal effect is defined as:
$$\tau_i = Y_i(1) - Y_i(0)$$
The problem is that we can only observe one of these outcomes for each person, either $Y_i(1)$ or $Y_i(0)$, but not both. This is known as the fundamental problem of causal inference.
Average Treatment Effect
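Since we cannot observe both outcomes for each person, we focus on the average treatment effect (ATE) for a group. For $N$ individuals, where $Y_i(1)$ and $Y_i(0)$ denote individual $i$'s outcomes under treatment and under control, the ATE is:
$$\text{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ Y_i(1) - Y_i(0) \right]$$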
If we could measure both outcomes for every individual, we could compute this directly. In practice, we approximate it using statistical methods, often by comparing groups that are as similar as possible.
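The sketch below illustrates the point with a simulated population, the only setting in which both potential outcomes exist for the same person. All numbers (a mean outcome of 10 under control and an average effect of -1) are arbitrary assumptions chosen for illustration.

```python
import random

random.seed(0)
N = 1000

# Hypothetical population. Only in a simulation can both potential outcomes
# be generated for the same individual.
y_control = [random.gauss(10.0, 2.0) for _ in range(N)]       # outcome under control
y_treated = [y + random.gauss(-1.0, 0.5) for y in y_control]  # outcome under treatment

# Individual causal effects and their average (the ATE).
individual_effects = [yt - yc for yt, yc in zip(y_treated, y_control)]
true_ate = sum(individual_effects) / N

# What a real study records: one outcome per person; the other stays unobserved.
received_treatment = [random.random() < 0.5 for _ in range(N)]
observed = [yt if t else yc
            for yt, yc, t in zip(y_treated, y_control, received_treatment)]

print(f"true ATE (computable only in a simulation): {true_ate:.2f}")
print(f"outcomes observed per person: 1 of 2 ({len(observed)} values in total)")
```

With real data, only `received_treatment` and `observed` would be available, so `true_ate` has to be estimated rather than computed.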
Stable Unit Treatment Value Assumption (SUTVA)
The counterfactual framework relies on assumptions, one of which is the Stable Unit Treatment Value Assumption (SUTVA). Its best-known component is no interference: the outcome of one individual does not depend on the treatment status of others. It also requires that there is only one version of each treatment, so that each individual's outcome under a given treatment is well defined. For example, in a vaccine study, if the vaccination of one person reduces the infection risk for others (herd immunity), the no-interference part of this assumption may be violated.
Exchangeability and Estimation
Another key assumption is exchangeability, which means that the treated and control groups are comparable, as if they were randomly assigned. In randomized trials, exchangeability holds because randomization balances confounders. When this assumption is valid, the average treatment effect can be estimated as the difference in means:
$$\widehat{\text{ATE}} = \bar{Y}_T - \bar{Y}_C$$
Here:
$\bar{Y}_T$ = average outcome of the treated group
$\bar{Y}_C$ = average outcome of the control group
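To see why randomization makes the difference in means a reasonable estimator, the following sketch simulates a trial with coin-flip assignment. The baseline factor `severity`, the effect size of -1, and the function name `simulate_rct` are all hypothetical choices for illustration.

```python
import random
from statistics import mean

def simulate_rct(n=20_000, true_effect=-1.0, seed=1):
    """Difference in group means under coin-flip treatment assignment."""
    rng = random.Random(seed)
    treated_outcomes, control_outcomes = [], []
    for _ in range(n):
        severity = rng.gauss(0.0, 1.0)  # baseline factor that affects the outcome
        outcome_control = 10.0 + 2.0 * severity + rng.gauss(0.0, 1.0)
        outcome_treated = outcome_control + true_effect
        # Assignment ignores severity, so both groups have similar severity on average.
        if rng.random() < 0.5:
            treated_outcomes.append(outcome_treated)
        else:
            control_outcomes.append(outcome_control)
    return mean(treated_outcomes) - mean(control_outcomes)

estimate = simulate_rct()
print(f"difference in means: {estimate:.2f} (simulated true effect: -1.00)")
```

Because assignment does not depend on `severity`, the estimate should land close to the simulated effect, up to sampling noise.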
Propensity Scores: Balancing Observational Data
In observational studies, treatment assignment is not random. Certain patients may have a higher chance of receiving treatment due to their characteristics. To adjust for this, researchers use the propensity score (PS), which is defined as:
$$e(X) = P(T = 1 \mid X)$$
where $T = 1$ indicates that the individual receives the treatment and $X$ is a set of observed characteristics (like age, disease stage, or prior treatments).
For example, suppose young, healthy patients are more likely to receive a new diabetes medication. If you compare treated and untreated patients directly, the results may be biased. By matching patients with similar propensity scores, you create a dataset that is balanced on the observed characteristics and, in that respect, mimics randomization.
Propensity Score Matching and Weighting
Propensity score matching pairs treated individuals with untreated individuals who have similar PS values. For example, if a treated patient has a PS of 0.7, we look for an untreated patient with a PS close to 0.7. This way, the two groups have similar characteristics.
Propensity score weighting gives different weights to observations so that the overall treated and control groups have similar distributions of characteristics. This is often used when matching is difficult due to sample size limitations.
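Both ideas can be sketched on simulated data. The example below is a minimal illustration under stated assumptions, not a production workflow: the covariates are two made-up binary flags (`young`, `healthy`), so the propensity score is estimated simply as the treated fraction within each covariate pattern instead of with a regression model; weighting uses inverse probability weights (1 divided by the propensity score for treated patients, 1 divided by one minus the score for untreated patients); and matching is done with replacement, averaging over all untreated patients at the closest score. The simulated effect is constant at -1, so the effect among the treated equals the average effect.

```python
import random
from collections import defaultdict

def simulate_patients(n=50_000, true_effect=-1.0, seed=3):
    """Hypothetical data in which young, healthy patients are treated more often."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        young = rng.random() < 0.5
        healthy = rng.random() < 0.5
        p_treat = 0.2 + 0.3 * young + 0.3 * healthy
        treated = 1 if rng.random() < p_treat else 0
        outcome = (8.0 - 1.5 * young - 2.0 * healthy
                   + true_effect * treated + rng.gauss(0.0, 1.0))
        rows.append(((young, healthy), treated, outcome))
    return rows

def estimate_propensity(rows):
    """Estimate P(treated | X) as the treated fraction within each covariate pattern."""
    total, treated_count = defaultdict(int), defaultdict(int)
    for x, t, _ in rows:
        total[x] += 1
        treated_count[x] += t
    return {x: treated_count[x] / total[x] for x in total}

def naive_difference(rows):
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

def ipw_ate(rows, ps):
    """Weighted difference in means with normalized inverse probability weights."""
    w1 = w1y = w0 = w0y = 0.0
    for x, t, y in rows:
        if t:
            w = 1.0 / ps[x]
            w1 += w
            w1y += w * y
        else:
            w = 1.0 / (1.0 - ps[x])
            w0 += w
            w0y += w * y
    return w1y / w1 - w0y / w0

def matched_att(rows, ps):
    """Compare each treated patient with untreated patients at the closest score."""
    control_by_score = defaultdict(list)
    for x, t, y in rows:
        if not t:
            control_by_score[ps[x]].append(y)
    control_mean = {s: sum(v) / len(v) for s, v in control_by_score.items()}
    diffs = []
    for x, t, y in rows:
        if t:
            nearest = min(control_mean, key=lambda s: abs(s - ps[x]))
            diffs.append(y - control_mean[nearest])
    return sum(diffs) / len(diffs)

rows = simulate_patients()
ps = estimate_propensity(rows)
naive = naive_difference(rows)
weighted = ipw_ate(rows, ps)
matched = matched_att(rows, ps)
print(f"naive: {naive:.2f}  weighted: {weighted:.2f}  matched: {matched:.2f}")
```

In this simulation the naive comparison overstates the benefit because treated patients were healthier to begin with, while the weighted and matched estimates should sit near the simulated effect of -1. Both adjustments only correct for the characteristics that were actually recorded.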
Practical Example of Treatment Effect Estimation
Imagine a study comparing a new painkiller to a standard drug.
- The average pain duration in the treated group is $\bar{Y}_T$ hours.
- The average pain duration in the control group is $\bar{Y}_C$ hours.
The estimated treatment effect is:
$$\widehat{\text{ATE}} = \bar{Y}_T - \bar{Y}_C = -1 \text{ hour}$$
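This means that, on average, pain lasts 1 hour less with the new painkiller than with the standard drug. The difference can be read as a causal effect only if the two groups are exchangeable.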
Confounding That Cannot Be Measured
Even with propensity scores, there is always a risk of unmeasured confounding. These are factors you did not or could not include in your analysis. For example, patient lifestyle choices, genetic differences, or socio-economic status might influence both treatment and outcome but may not be recorded in the study. Sensitivity analyses are often performed to check how robust the findings are to such unmeasured confounders.
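One simple form of sensitivity analysis asks how strong an unmeasured confounder would have to be to explain away an estimate. The sketch below assumes a single binary unmeasured confounder $U$ whose effect on the outcome is additive and the same in both groups, and no other source of bias; under that assumption the bias in a difference in means equals the effect of $U$ on the outcome times the difference in the prevalence of $U$ between the groups. The function name, the observed effect of -1, and the grid of values are hypothetical.

```python
def bias_adjusted_effect(observed_effect, gamma, prevalence_treated, prevalence_control):
    """Effect estimate after removing the bias from a hypothetical binary confounder U.

    gamma: assumed effect of U on the outcome (same units as the outcome)
    prevalence_treated / prevalence_control: assumed share of each group with U = 1
    """
    bias = gamma * (prevalence_treated - prevalence_control)
    return observed_effect - bias

observed_effect = -1.0  # hypothetical estimate, e.g. one hour less pain
for gamma in (-0.5, -1.0, -2.0):
    for imbalance in (0.1, 0.3, 0.5):
        adjusted = bias_adjusted_effect(observed_effect, gamma, 0.25 + imbalance, 0.25)
        print(f"gamma={gamma:+.1f}, prevalence gap={imbalance:.1f} "
              f"-> adjusted effect {adjusted:+.2f}")
```

If only implausibly strong confounders can move the adjusted effect to zero, the finding is more robust; if modest values are enough, the estimate deserves caution.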
Key Formulas Recap
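- Individual treatment effect: $\tau_i = Y_i(1) - Y_i(0)$
- Average treatment effect: $\text{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ Y_i(1) - Y_i(0) \right]$
- Difference in means (if groups are exchangeable): $\widehat{\text{ATE}} = \bar{Y}_T - \bar{Y}_C$
- Propensity score: $e(X) = P(T = 1 \mid X)$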
If you are a student or a researcher, understanding these concepts will help you design better studies and interpret results with more confidence. Causal inference is not just about computing averages; it is about understanding what would happen if a treatment or policy were applied differently. By learning counterfactuals and propensity scores, you can go beyond surface-level correlations and make causal claims that hold as long as the underlying assumptions, such as exchangeability and SUTVA, are plausible.