When dealing with likelihoods, especially for complex data, it is common to work with the logarithm of the likelihood. The log-likelihood (often denoted ℓ) is simply:

ℓ(θ) = log L(θ)
Using the log has two main advantages:
- Simplification of Math: Multiplicative terms in the likelihood become additive in the log-likelihood. Products of probabilities turn into sums of log-probabilities, which are easier to handle (especially to find maxima by differentiation); see the short sketch after this list.
- Nice Shape: For large samples, the log-likelihood curve tends to be approximately a parabola (a quadratic shape) near its peak. This is a consequence of the Central Limit Theorem, which implies that many estimates have an approximately normal (bell-curve) distribution when sample sizes are high. A normal distribution’s log-likelihood is a perfect parabola (quadratic function).
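As a quick numerical illustration of the first point, here is a minimal Python sketch. The Bernoulli data and the candidate parameter value p = 0.6 are made up purely for illustration; the point is that multiplying probabilities and summing log-probabilities give the same answer (up to rounding).

```python
import math

# Made-up Bernoulli data (1 = event, 0 = no event) and a candidate parameter value
data = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
p = 0.6

# Likelihood: product of the individual probabilities
likelihood = math.prod(p if x == 1 else (1 - p) for x in data)

# Log-likelihood: sum of the individual log-probabilities
log_likelihood = sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

print(likelihood)            # a tiny number; products of probabilities shrink fast
print(math.log(likelihood))  # matches the sum below, up to rounding
print(log_likelihood)
```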
Log-Likelihood Ratio and Quadratic Approximation
The log-likelihood ratio for a parameter value θ is the difference between the log-likelihood at that value and the log-likelihood at the MLE θ̂:

ℓ(θ) − ℓ(θ̂)
At the MLE, this ratio is 0, because ℓ(θ̂) − ℓ(θ̂) = 0. If we move away from the MLE, the log-likelihood will decrease (or at best stay the same), so the log-likelihood ratio is never positive.
For moderate shifts, the log-likelihood often falls in a shape that can be well-approximated by a downward-opening parabola. Formally, near the peak we can often write:

ℓ(θ) − ℓ(θ̂) ≈ −½ ((θ − θ̂)/S)²

where S is a measure related to the standard error of the estimate. This is the equation of a concave quadratic (an inverted parabola) that touches the actual log-likelihood curve at the top and has the same curvature there. It is symmetric around the MLE and equals 0 at θ = θ̂. The quantity 1/S² in this formula is known in statistics as the Fisher information – essentially it quantifies how sharp the peak of the likelihood is. A lot of information (large 1/S²) means a very peaked likelihood (small S, narrow confidence interval), while little information means a flatter likelihood (larger S, wide confidence interval).
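To see how good the quadratic approximation can be, here is a small Python sketch. The binomial data (100 trials, 30 successes) are made up for illustration only; the sketch compares the exact log-likelihood ratio with the parabola −½((θ − θ̂)/S)² at a few values of θ.

```python
import math

# Hypothetical binomial data, for illustration only: k successes in n trials
n, k = 100, 30
theta_hat = k / n                               # MLE of the proportion
S = math.sqrt(theta_hat * (1 - theta_hat) / n)  # approximate standard error

def loglik(theta):
    # Binomial log-likelihood (binomial coefficient omitted; it cancels in the ratio)
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

for theta in (0.20, 0.25, 0.30, 0.35, 0.40):
    exact = loglik(theta) - loglik(theta_hat)     # exact log-likelihood ratio
    quad = -0.5 * ((theta - theta_hat) / S) ** 2  # quadratic approximation
    print(f"theta={theta:.2f}  exact={exact:8.4f}  quadratic={quad:8.4f}")
```

Near the peak the two columns agree closely; farther away they drift apart, which is exactly where the normal approximation starts to strain.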
Connection to the 95% Confidence Interval
If the log-likelihood is quadratic, finding a confidence interval is straightforward. For instance, a 95% confidence interval corresponds to the parameter values where the log-likelihood has fallen by a certain amount from its maximum. Specifically, for a 95% CI in one dimension, the log-likelihood drops by about 1.9208 units below the top. (This number comes from the fact that 1.96²/2 = 1.9208; 1.96 is the 97.5th percentile of the standard normal distribution.)
In terms of likelihood ratio (not log), this drop of 1.92 in log-likelihood corresponds to a likelihood ratio of exp(−1.9208) ≈ 0.1465 (14.65%). This is why earlier we mentioned that about 0.146 is the LR cutoff for a 95% CI.
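Both cutoffs are easy to verify numerically; a tiny Python check:

```python
import math

z = 1.96                 # 97.5th percentile of the standard normal distribution
drop = z ** 2 / 2        # log-likelihood drop for a 95% CI
print(drop)              # 1.9208
print(math.exp(-drop))   # about 0.1465, the likelihood-ratio cutoff
```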
But most people are familiar with the formula for a 95% CI:

θ̂ ± 1.96 × SE

It turns out this is equivalent to using the quadratic log-likelihood approximation. Here θ̂ is the MLE (the estimate) and SE is its standard error (which plays the role of S in the formula above). θ̂ − 1.96 × SE and θ̂ + 1.96 × SE are the lower and upper 95% confidence limits. These are exactly the points where (θ − θ̂)/S = ±1.96, and plugging that into the quadratic form −½((θ − θ̂)/S)² gives −1.9208 for the log-likelihood ratio – which matches the 95% criterion.
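A short Python sketch makes the equivalence explicit. The estimate (0.25) and standard error (0.031) are arbitrary made-up values; any pair would show the same thing.

```python
import math

# Made-up estimate and standard error, purely for illustration
theta_hat, se = 0.25, 0.031

for theta in (theta_hat - 1.96 * se, theta_hat + 1.96 * se):  # Wald 95% limits
    # At either limit, (theta - theta_hat)/SE = ±1.96, so the quadratic
    # approximation to the log-likelihood ratio equals -(1.96**2)/2 = -1.9208
    llr = -0.5 * ((theta - theta_hat) / se) ** 2
    print(f"theta = {theta:.3f}, approximate log-likelihood ratio = {llr:.4f}")
```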
So, the usual confidence interval calculation is really a shortcut from the assumption of an approximately quadratic log-likelihood shape. It’s a way of saying: “We think the estimate’s sampling distribution is roughly normal with SE = S, so we’ll take plus/minus 1.96 SE as the range.” This will be accurate if the sample size is sufficiently large or the underlying distribution is normal.
Example: Estimating a Proportion
Let’s make this concrete with a simple example. Imagine we are estimating the prevalence of a certain condition in a population (say the prevalence of diabetes in a town). Suppose in a sample of 200 people, 50 have diabetes. The MLE for the prevalence (the proportion with disease) is p̂ = 50/200 = 0.25 (25%).
The standard error (SE) for a proportion p̂ is roughly:

SE = √( p̂(1 − p̂)/n ) = √( 0.25 × 0.75 / 200 ) ≈ 0.031

If we use the normal approximation, a 95% CI would be 0.25 ± 1.96 × 0.031, which is approximately 0.25 ± 0.06, or from 0.19 to 0.31 (19% to 31%).
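In Python, this calculation might look like the following sketch (the numbers are the ones from the example above):

```python
import math

n, k = 200, 50                            # 50 of 200 people have the condition
p_hat = k / n                             # MLE: 0.25
se = math.sqrt(p_hat * (1 - p_hat) / n)   # about 0.031

lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"p_hat = {p_hat:.2f}, SE = {se:.4f}")
print(f"95% CI (normal approximation): {lower:.3f} to {upper:.3f}")  # roughly 0.19 to 0.31
```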
Now, if we were to derive the confidence interval via the likelihood ratio method: the likelihood function is

L(p) ∝ p⁵⁰ (1 − p)¹⁵⁰

(the binomial coefficient is a constant that does not affect the shape). The log-likelihood at the MLE p̂ = 0.25 is the highest. We’d find the values of p where the log-likelihood ℓ(p) is 1.92 units lower than at the peak. Solving that might be tedious by hand, but the answer should align closely with 0.19 and 0.31. In a larger sample, the alignment is typically excellent. In smaller samples, the normal approximation might falter, and one might prefer to use the likelihood ratio method directly or an exact calculation to get a confidence interval.
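One crude way to do this numerically is a grid search over candidate prevalences; a minimal Python sketch (the 0.001-wide grid is an arbitrary choice, not part of the method itself):

```python
import math

n, k = 200, 50
p_hat = k / n

def loglik(p):
    # Binomial log-likelihood; the binomial coefficient is omitted since it cancels
    return k * math.log(p) + (n - k) * math.log(1 - p)

cutoff = loglik(p_hat) - 1.96 ** 2 / 2   # peak minus 1.9208

# Keep every candidate prevalence whose log-likelihood stays within 1.92 units
# of the maximum; the extremes of that set are the likelihood-ratio confidence
# limits, which should land close to the 0.19-0.31 interval found above.
supported = [p / 1000 for p in range(1, 1000) if loglik(p / 1000) >= cutoff]
print(f"95% likelihood-ratio CI: {min(supported):.3f} to {max(supported):.3f}")
```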
Why Log-Likelihood? A Deeper Insight
By using log-likelihood and its quadratic approximation, statisticians have a unifying approach to inference: both point estimates (MLEs) and interval estimates (CIs) flow from the shape of the likelihood. This approach works even when the sampling distribution of the estimate is not exactly normal, as long as we have enough data for the normal approximation to be reasonable. It also provides a way to compute CIs in complex models (like logistic regression or Cox proportional hazards models in survival analysis) where explicit formulas for SE might be complicated. One can always, in principle, compute the log-likelihood for different parameter values and find where it drops to the required threshold.
In summary, the log-likelihood method for confidence intervals is a powerful concept in epidemiology and clinical research. It reinforces the idea that our confidence in parameter values is related to how much the likelihood deteriorates when we move away from the best estimate. The more data we have (higher information), the sharper the log-likelihood peak, and the narrower the confidence interval. With sparse data, the peak is flat and the supported range is wide, reflecting greater uncertainty. By leveraging the approximately quadratic shape of the log-likelihood, we conveniently tap into the well-understood properties of the normal distribution to draw our inference.