Assessing Compatibility of an Epidemiologic Estimate with a Reference Value

Research updated on August 10, 2025
Author: Santhosh Ramaraj

When evaluating an epidemiologic measure, a key question is: how compatible is the observed estimate with an expected or reference value? This question often guides decision making in public health studies and clinical audits.

Imagine you are monitoring the rate of hospital complications. If the actual proportion of patients experiencing a severe relapse is higher than the target, you want to know whether this is just random fluctuation or evidence of a real problem.

Defining the Problem in Statistical Terms

Let us define some variables to make the concept precise:

  •  r = observed proportion in your sample.
  •  r_0 = reference or target proportion.

For example, suppose the target is no more than 2 percent of patients having severe relapses, but your data shows 5 percent. The statistical question is:

Is the observed 5 percent compatible with the reference 2 percent?

Setting the Null Hypothesis

To answer this, you start by stating the null hypothesis:

 H_0: r = r_0

In the example:

 H_0: r = 0.02

The alternative hypothesis is that the true proportion is greater than 2 percent. Because we are only concerned about increases in complications, a one sided p value is the appropriate choice here.

Understanding the p Value in Context

The p value tells us the probability of observing a result as extreme or more extreme than our sample estimate, assuming the null hypothesis is true.

Formally, for this case:

 p = P(r \geq r_{\text{observed}} \mid r_0)

If this probability is high, it suggests that the observed value is compatible with the reference and could easily occur by chance. If it is low, the result is less compatible with the reference value and is unlikely to be due to chance alone.

Why High p Values Can Be Misleading

A high p value does not automatically mean that the null hypothesis is true. It could also mean that the sample size is too small to detect a meaningful difference.

This is why compatibility should always be interpreted alongside the study’s statistical power, which we will discuss later.

Example: Relapse Rates in Hospital Admissions

Consider this real world style example:

  • Observed data: 5 out of 100 admissions had serious relapses.
  • Observed proportion:  r = \frac{5}{100} = 0.05 or 5 percent.
  • Target value:  r_0 = 0.02 or 2 percent.

We want to know: Is 5 percent compatible with 2 percent?

Step 1: Setting the Statistical Model

The number of relapses in 100 admissions can be modeled using the binomial distribution, because each admission is an independent trial with two outcomes: relapse or no relapse.

If the random variable  K is the number of relapses, and each admission has a probability  p of relapse, then:

 K \sim B(n, p)

where:

  •  n = number of trials (here 100)
  •  p = probability of success (here relapse)

Step 2: Probability Mass Function

The binomial probability of exactly  k relapses is:

 P(K = k) = \binom{n}{k} p^k (1 - p)^{n - k}

Here, the combination term is:

 \binom{n}{k} = \frac{n!}{k!(n-k)!}

Step 3: Calculating Directly

As an illustration of the formula, the probability of exactly 2 relapses when  p = 0.05  is:

 P(K = 2) = \binom{100}{2} (0.05)^2 (0.95)^{98}

Substituting:

 \binom{100}{2} = \frac{100 \times 99}{2 \times 1} = 4,950
 (0.05)^2 = 0.0025
 (0.95)^{98} \approx 0.00656

Multiplying:

 P(K = 2) = 4,950 \times 0.0025 \times 0.00656 \approx 0.0812
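The arithmetic above is easy to check with a short script. This is a sketch in Python (the article's own worked example uses R later); it relies only on the standard library.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 2 relapses in 100 admissions when p = 0.05
print(round(binom_pmf(2, 100, 0.05), 4))  # ≈ 0.0812
```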

Step 4: Cumulative Probability

What we need for compatibility assessment is the probability of 5 or more relapses when the true rate is 2 percent. This is the cumulative probability:

 P(K \geq 5 \mid p = 0.02)

This can be calculated:

  • Directly using the binomial formula for k = 5, 6, …, 100 and summing the results.
  • Using a binomial distribution calculator.
  • Using statistical software such as R.
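The first option, summing the binomial formula directly, can be sketched in a few lines of Python (standard library only); computing the complement of the lower tail avoids summing 96 terms.

```python
from math import comb

def binom_tail(k_min: int, n: int, p: float) -> float:
    """P(K >= k_min) for K ~ Binomial(n, p), via the complement of the lower tail."""
    return 1.0 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min))

# P(K >= 5 | n = 100, p = 0.02)
print(round(binom_tail(5, 100, 0.02), 5))  # ≈ 0.05083
```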

Step 5: Using R for Exact Calculation

In R, you can run:

binom.test(5, 100, p = 0.02, alternative = "greater")

The output:

  • p value ≈ 0.05083
  • Conclusion: At a significance level  \alpha = 0.05 , we do not reject the null hypothesis.

This means the observed 5 percent can still be considered compatible with the reference 2 percent, although the p value sits only marginally above the 0.05 threshold.

Effect of Sample Size on p Values

One important insight is that p values shrink with larger sample sizes even if the difference between observed and reference proportions remains the same.

For example, with 500 admissions and the same 5 percent relapse rate (25 relapses), the one sided p value would be far below 0.05, leading you to reject the null hypothesis.

This is why statistical significance does not always mean clinical or public health significance.
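A quick recomputation makes this concrete (a sketch in Python; the n = 500 scenario is the hypothetical scale-up described above, not observed data):

```python
from math import comb

def binom_tail(k_min: int, n: int, p: float) -> float:
    """P(K >= k_min) for K ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min))

# Same 5% observed rate against the same 2% reference, two sample sizes
print(binom_tail(5, 100, 0.02))   # ~0.051: borderline, null not rejected
print(binom_tail(25, 500, 0.02))  # far below 0.05: null rejected
```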

Introducing Statistical Power

The ability to detect a real difference when it exists is called statistical power. Power is defined as:

 \text{Power} = 1 - \beta

where  \beta is the probability of making a Type II error — failing to reject a false null hypothesis.

In practice, studies are often designed to have at least 80 percent power. This means there is an 80 percent chance of detecting a meaningful difference if it exists.

Why Power Matters in Compatibility Assessment

If you have a small sample size, a high p value might simply mean you lacked the power to detect the difference. That is why compatibility assessments should be combined with power calculations during study planning.
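An exact power calculation for the relapse scenario can be sketched as follows. The assumed "true" relapse probability of 0.05 is an illustrative choice, not a value from the article: power is always computed against some specific alternative.

```python
from math import comb

def binom_tail(k_min: int, n: int, p: float) -> float:
    """P(K >= k_min) for K ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min))

n, p0, p_true, alpha = 100, 0.02, 0.05, 0.05  # p_true is an assumed alternative

# Smallest critical count k* whose one sided p value under the null is <= alpha
k_crit = next(k for k in range(n + 1) if binom_tail(k, n, p0) <= alpha)

# Power = probability of reaching k* relapses when the true rate is p_true
power = binom_tail(k_crit, n, p_true)
print(k_crit, round(power, 2))  # power ≈ 0.38 under these assumptions
```

With power this far below the conventional 80 percent target, a non-significant result at n = 100 is weak evidence of genuine compatibility.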

Summary Table

| Concept | Meaning | Formula or Key Idea | Example in Context |
| --- | --- | --- | --- |
| Point estimate | Best guess of true value | r = k/n | 5% relapse rate in 100 patients |
| Null hypothesis | No difference from reference | H_0: r = r_0 | r_0 = 0.02 |
| p value | Probability of observed or more extreme result under H_0 | p = P(r ≥ observed ∣ r_0) | ≈ 0.05083 in the relapse example |
| Type I error | Rejecting a true H_0 | α | α = 0.05 |
| Type II error | Failing to reject a false H_0 | β | β = 0.20 |
| Power | Detecting a real difference | 1 − β | 0.80 |

Key Takeaways

  • Always compare observed rates to meaningful reference values.
  • Use one sided p values when your concern is only for an increase or only for a decrease.
  • Remember that high p values can reflect low sample size rather than true compatibility.
  • Plan studies with adequate power, ideally 80 percent or more.

Disclaimer: This article is for educational purposes only.