Comparing Two Groups with Likelihood

Research updated on January 11, 2026
Author: Santhosh Ramaraj

We often compare an outcome between two groups – for instance, disease incidence in an exposed group vs. an unexposed group. The effect can be summarized by measures such as the risk ratio, rate ratio, or odds ratio. In a previous article we introduced the log-likelihood; here, let's focus on a rate ratio example to see how likelihood helps estimate these effects and their uncertainty.

Scenario

A cohort study in Guatemala investigates acute lower respiratory infections in young children. The researchers suspect that housing conditions affect infection rates. They categorize children under 5 into two groups:

  • Exposed group: Children living in poor housing conditions (e.g. overcrowded, poor ventilation).
  • Unexposed group: Children living in good housing conditions.

Over one year, they follow both groups and count how many lower respiratory infections occur.
The results:

  • Poor housing group: 33 infections observed over 355 child-years of follow-up.
  • Good housing group: 24 infections observed over 518 child-years of follow-up.

From these data, the incidence rate in each group can be calculated as cases divided by person-time:

l_1 = 33/355 \approx 0.0930 (infections per child-year for the exposed)
l_0 = 24/518 \approx 0.0463 (infections per child-year for the unexposed)

The rate ratio (RR) comparing poor to good housing is:

RR = l_1 / l_0 \approx 0.0930 / 0.0463 \approx 2.01

So children in poor housing had about double the rate of respiratory infection as those in good housing.

This 2.01 is our point estimate (MLE) for the rate ratio.

Now, we will confirm that using a likelihood approach and see how to get a confidence interval.
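As a quick sanity check before the likelihood derivation, the arithmetic above can be reproduced in a few lines of Python (a sketch; variable names are my own):

```python
# Point estimates from the study data above
d1, t1 = 33, 355   # infections and child-years, poor housing (exposed)
d0, t0 = 24, 518   # infections and child-years, good housing (unexposed)

rate_exposed = d1 / t1      # incidence rate, exposed group
rate_unexposed = d0 / t0    # incidence rate, unexposed group
rate_ratio = rate_exposed / rate_unexposed

print(f"exposed rate   = {rate_exposed:.4f}")    # ≈ 0.0930
print(f"unexposed rate = {rate_unexposed:.4f}")  # ≈ 0.0463
print(f"rate ratio     = {rate_ratio:.2f}")      # ≈ 2.01
```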

Likelihood for Rate Ratio

To approach this with likelihoods, we need a probability model for the data.

Counts of infections over time often follow a Poisson distribution (especially when the counts are relatively low and the events are independent).

For two groups, we can think in terms of a Poisson regression or simply two Poisson likelihoods.

We have two parameters of interest:

  • The baseline rate l_0 for the unexposed (good housing) group.
  • The rate ratio \theta. The rate in the exposed group will be l_1 = \theta \times l_0.

Using properties of the Poisson distribution, the likelihood of the observed data (33 events in group 1 over T_1=355 years, and 24 events in group 0 over T_0=518 years) can be written as:

L(l_0, \theta) = \frac{e^{-l_0 T_0}(l_0 T_0)^{24}}{24!} \times \frac{e^{-\theta l_0 T_1}(\theta l_0 T_1)^{33}}{33!}

The log-likelihood \mathcal{L} is:

\mathcal{L}(l_0, \theta) = 24 \ln(l_0 T_0) - l_0 T_0 + 33 \ln(\theta l_0 T_1) - \theta l_0 T_1 + \text{const}

We can separate parts involving l_0 and \theta:

\mathcal{L}(l_0, \theta) = (24 + 33)\ln(l_0) + 33 \ln(\theta) + 24\ln T_0 + 33 \ln T_1 - l_0 (T_0 + \theta T_1) + \text{const}

Maximizing with respect to l_0, the solution is:

\hat{l}_0 = \frac{24 + 33}{T_0 + \theta T_1}

Plugging \hat{l}_0 back gives the profile log-likelihood for \theta:

\mathcal{L}_{\text{profile}}(\theta) = 33 \ln\Big(\theta \frac{T_1}{T_0}\Big) - (24+33) \ln\Big(1 + \theta \frac{T_1}{T_0}\Big) + \text{const}

The maximum occurs at \theta = 2.01, confirming the MLE for the rate ratio.
So the MLEs match the intuitive estimates:

  • \hat{l}_0 = 24/518 \approx 0.0463
  • \hat{\theta} = 2.01
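The closed-form result can also be checked numerically. The sketch below (a plain grid search over \theta, standard library only; names are illustrative) maximizes the profile log-likelihood derived above and recovers the same MLE:

```python
import math

# Profile log-likelihood for the rate ratio theta (constants dropped):
# 33*ln(theta*T1/T0) - (24+33)*ln(1 + theta*T1/T0)
d1, t1 = 33, 355   # exposed group: events, person-time
d0, t0 = 24, 518   # unexposed group: events, person-time

def profile_loglik(theta):
    r = theta * t1 / t0
    return d1 * math.log(r) - (d0 + d1) * math.log(1 + r)

# Coarse grid search over plausible rate ratios (0.1 to 10, step 0.001)
grid = [i / 1000 for i in range(100, 10000)]
theta_hat = max(grid, key=profile_loglik)
print(f"theta_hat ≈ {theta_hat:.2f}")   # ≈ 2.01
```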

Log-Scale and Confidence Interval for the Rate Ratio

For ratios, it’s often convenient to switch to the log scale.

Let u = \ln(\theta), so \theta = e^u.

Near the peak, the log-likelihood in u is approximately quadratic.

The standard error for \ln(RR) can be obtained from:

\text{SE}[\ln(RR)] = \sqrt{\frac{1}{d_1} + \frac{1}{d_0}}

where d_1=33, d_0=24. So:

\text{SE}[\ln(RR)] = \sqrt{\frac{1}{33} + \frac{1}{24}} \approx \sqrt{0.0720} = 0.268

Then:

  • Using the unrounded estimate, \ln(\hat{\theta}) = \ln(2.006) \approx 0.696
  • 95% CI for u: 0.696 \pm 1.96 \times 0.268
  • u_{\text{low}} \approx 0.171, \ u_{\text{high}} \approx 1.222
  • Back-transform: e^{0.171} \approx 1.19, e^{1.222} \approx 3.39

Thus, the 95% confidence interval for the rate ratio is approximately 1.19 to 3.39.
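These steps can be reproduced in Python (a sketch keeping full precision in the intermediate values, so the endpoints round to 1.19 and 3.39):

```python
import math

d1, t1 = 33, 355   # exposed: events, person-time
d0, t0 = 24, 518   # unexposed: events, person-time

rr = (d1 / t1) / (d0 / t0)              # point estimate, ≈ 2.01
log_rr = math.log(rr)                   # ≈ 0.696
se = math.sqrt(1 / d1 + 1 / d0)         # ≈ 0.268

lo = math.exp(log_rr - 1.96 * se)       # lower 95% limit
hi = math.exp(log_rr + 1.96 * se)       # upper 95% limit
print(f"RR = {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")   # [1.19, 3.39]
```

Note the asymmetry of the interval around 2.01 on the original scale – a direct consequence of exponentiating a symmetric interval in \ln(\theta).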

Since this interval does not include 1, the difference is statistically significant at the 5% level.

Why Likelihood is Useful Here

We used the likelihood approach conceptually to derive these results. In practice, one might fit a Poisson regression in statistical software, but under the hood it is doing the same thing – finding the l_0 and \theta that maximize the likelihood, and then using the curvature of the log-likelihood at the peak to get confidence intervals or p-values.

This example shows that:

  • The MLEs for multi-parameter models are often just the intuitive estimates (observed values).
  • Likelihood provides a unified way to get both the estimate and its uncertainty.
  • Using the log scale for ratios gives symmetrical confidence intervals on that log scale, which correspond to asymmetric intervals on the original scale.

In epidemiological papers, you’ll often see something like “Rate ratio = 2.01, 95% CI [1.19–3.39], p = 0.009”. All of that information can be traced back to a likelihood-based calculation (or an equivalent large-sample approximation).
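A p-value like the one above is the two-sided Wald p-value for the null hypothesis RR = 1 (i.e. \ln(RR) = 0), and it can be recovered from the same quantities with the standard library alone (a sketch; math.erfc supplies the normal tail probability):

```python
import math

# Wald test for H0: rate ratio = 1, i.e. ln(RR) = 0
d1, t1 = 33, 355   # exposed: events, person-time
d0, t0 = 24, 518   # unexposed: events, person-time

log_rr = math.log((d1 / t1) / (d0 / t0))   # ≈ 0.696
se = math.sqrt(1 / d1 + 1 / d0)            # ≈ 0.268

z = log_rr / se                            # Wald z-statistic, ≈ 2.60
# Two-sided p-value under the standard normal distribution
p = math.erfc(abs(z) / math.sqrt(2))       # ≈ 0.009
print(f"z = {z:.2f}, p = {p:.3f}")
```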

So, likelihood methods seamlessly extend from one-sample problems (like estimating one probability) to comparing two groups (estimating ratios, differences, etc.), and further to many groups or more complex regression models. The core idea remains: find the parameter values that best explain the data, and then determine which other values are reasonably compatible with the data by seeing how much the likelihood drops off when you move away from the best fit.

Disclaimer: This article is for educational purposes only.