How to Calculate Sample Size and Improve Statistical Power in Epidemiologic Research

Research updated on August 10, 2025
Author: Santhosh Ramaraj

When working with small datasets, one of the biggest limitations you will face is low statistical power. Power is the probability of detecting a true effect when it exists. In simple terms, low power means your study might miss important differences because the sample is too small.

Fortunately, there are several ways to increase power in hypothesis testing.

Ways to Increase Statistical Power

Increase the sample size while keeping other factors constant.
Increase the significance level $\alpha$, which widens the rejection region for the null hypothesis but also raises the risk of a Type I error.
Reduce the standard deviation, for example through more precise measurement or a paired design, although this is not always practical.
Target a larger effect size, since a true parameter that lies farther from the null value is easier to detect.

In practice, the most common and effective method is increasing the sample size, because this directly improves precision.
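A minimal sketch of how power responds to sample size, significance level, standard deviation, and effect size. It assumes a two-sided one-sample z test with a known standard deviation, which is a simplification chosen for illustration; the function name `z_test_power` and the numbers (effect of 2, standard deviation of 6, 35 subjects) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z test (sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Distance between the true mean and the null mean, in standard errors.
    shift = abs(effect) * sqrt(n) / sigma
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


baseline = z_test_power(effect=2.0, sigma=6.0, n=35)
larger_n = z_test_power(effect=2.0, sigma=6.0, n=70)
larger_alpha = z_test_power(effect=2.0, sigma=6.0, n=35, alpha=0.10)
smaller_sigma = z_test_power(effect=2.0, sigma=3.0, n=35)
larger_effect = z_test_power(effect=4.0, sigma=6.0, n=35)

for label, value in [
    ("baseline", baseline),
    ("double the sample size", larger_n),
    ("alpha = 0.10", larger_alpha),
    ("half the standard deviation", smaller_sigma),
    ("double the effect size", larger_effect),
]:
    print(f"{label}: power = {value:.3f}")
```

Each variation changes one input relative to the baseline, so the printed values show the direction in which every lever moves power.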

Example 1: Calculating Sample Size for Estimating a Mean

Suppose you want to estimate the mean $\mu$ of a population with a certain margin of error and a specified confidence level. The required sample size $n$ can be calculated as:

$n = \left( \frac{z_{\alpha/2} \, s}{E} \right)^2$

Where:

$z_{\alpha/2}$ = critical z score for the desired confidence level.
$s$ = population standard deviation (or an estimate of it when the true value is unknown).
$E$ = desired margin of error.

This formula comes from a chain of reasoning grounded in the Central Limit Theorem.

Step 1: Understanding the Central Limit Theorem (CLT)

The Central Limit Theorem explains why averages of many independent random variables tend to follow a normal distribution, regardless of the shape of the original distribution.

For example, if you roll a biased die a large number of times, the distribution of the average roll will approach a bell shaped curve. This is why sample means of many real world measurements, like blood pressure or cholesterol levels, can be treated as approximately normally distributed when the sample is large enough.

Two common forms are:

Classical CLT: If $X_1, X_2, ..., X_n$ are independent and identically distributed with mean $\mu$ and finite variance $\sigma^2$, then for large $n$ the sample mean $S_n = (X_1 + X_2 + ... + X_n)/n$ is approximately normally distributed with mean $\mu$ and variance $\sigma^2 / n$.
Lyapunov CLT: A more general form that applies even when the variables are not identically distributed, as long as certain variance and moment conditions are satisfied.
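The biased die example can be checked with a short simulation. This is only a sketch: the face weights, the sample size of 50 rolls, and the 5000 repetitions are hypothetical choices, and the seed is fixed purely so the run is reproducible.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so repeated runs give the same numbers

faces = [1, 2, 3, 4, 5, 6]
weights = [0.05, 0.05, 0.10, 0.10, 0.20, 0.50]  # hypothetical bias toward 6

# Theoretical mean and variance of a single roll.
mu = sum(f * w for f, w in zip(faces, weights))
sigma2 = sum(w * (f - mu) ** 2 for f, w in zip(faces, weights))


def average_roll(n):
    """Average of n independent rolls of the biased die."""
    return mean(random.choices(faces, weights=weights, k=n))


n = 50
averages = [average_roll(n) for _ in range(5000)]

print("mean of averages:", mean(averages), "theory:", mu)
print("variance of averages:", pvariance(averages), "theory:", sigma2 / n)
```

Although a single roll is heavily skewed, the simulated averages should centre near $\mu$ with variance close to $\sigma^2 / n$, as the classical CLT predicts.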

Step 2: Estimating a Proportion with a Large Sample

When your goal is to estimate a proportion, such as the proportion of smokers in a community, you need to know the variability of the sample proportion $p$ .

If the population size $N$ is much larger than the sample size $n$ (at least ten times larger), the standard deviation of the sample proportion is:

$s_p = \sqrt{\frac{P (1 - P)}{n}}$

Where $P$ is the population proportion.

If $P$ is unknown, you can substitute the sample proportion $p$ to obtain the standard error:

$SE_p = \sqrt{\frac{p (1 - p)}{n}} = \sqrt{\frac{pq}{n}}$

Here $q = 1 - p$ .
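As a small illustration, the standard error formula can be computed directly. The function name and the survey figures (60 smokers among 400 respondents) are hypothetical and not taken from any real study.

```python
from math import sqrt


def proportion_standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * q / n)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    q = 1 - p
    return sqrt(p * q / n)


# Hypothetical survey: 60 smokers among 400 respondents.
respondents = 400
p_hat = 60 / respondents
se = proportion_standard_error(p_hat, respondents)
print(f"p = {p_hat:.3f}, SE = {se:.4f}")
```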

Step 3: Understanding the Critical Value

The critical value $z_{\alpha/2}$ is the z score that marks the boundary between likely and unlikely sample results under the null hypothesis.

For a 95 percent confidence level, $z_{\alpha/2} \approx 1.96$ .

The critical value tells you how many standard errors from the hypothesized value your estimate must be to be considered statistically significant at a given alpha level.
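Critical values do not have to be looked up in a table. A brief sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `critical_z` is an illustrative choice.

```python
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical z score for a given confidence level."""
    alpha = 1 - confidence
    # The upper alpha/2 tail starts at the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

For the 95 percent level this reproduces the familiar value of about 1.96.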

Step 4: Understanding the Margin of Error

The margin of error $E$ is the maximum expected difference between the sample proportion (or mean) and the true population value, given a specified confidence level.

For a proportion:

$E = z_{\alpha/2} \sqrt{\frac{pq}{n}}$

This formula assumes the binomial distribution can be approximated by the normal distribution when both $np \geq 5$ and $nq \geq 5$ .
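The margin of error for a proportion, together with the $np \geq 5$ and $nq \geq 5$ check, might be sketched as follows. The function name and the inputs (a proportion of 0.15 from 400 respondents) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_margin_of_error(p, n, confidence=0.95):
    """Margin of error E = z * sqrt(p * q / n) for a sample proportion."""
    q = 1 - p
    # Rule of thumb from the text for using the normal approximation.
    if n * p < 5 or n * q < 5:
        raise ValueError("normal approximation needs np >= 5 and nq >= 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * q / n)


margin = proportion_margin_of_error(p=0.15, n=400)
print(f"margin of error = {margin:.4f}")
```

Raising an error when the rule of thumb fails is a design choice for this sketch; a warning would be an equally reasonable alternative.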

Step 5: Rearranging to Find the Sample Size

Starting with:

$E = \frac{z_{\alpha/2} \, s}{\sqrt{n}}$

We rearrange to get:

$n = \left( \frac{z_{\alpha/2} \, s}{E} \right)^2$

This shows that:

Larger standard deviation increases required sample size.
Smaller margin of error dramatically increases sample size (cutting $E$ in half increases $n$ fourfold).
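Both scaling rules can be verified numerically. The sketch below leaves the result unrounded so the ratios are exact; the function name and the inputs (standard deviation of 10, margin of error of 2) are hypothetical.

```python
from statistics import NormalDist


def raw_sample_size(s, margin, confidence=0.95):
    """Unrounded n = (z * s / E) ** 2 for estimating a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z * s / margin) ** 2


base = raw_sample_size(s=10, margin=2)
half_margin = raw_sample_size(s=10, margin=1)
double_sd = raw_sample_size(s=20, margin=2)

print(f"baseline n: {base:.1f}")
print(f"half the margin of error: {half_margin / base:.1f} times larger")
print(f"double the standard deviation: {double_sd / base:.1f} times larger")
```

Because $s$ and $E$ both enter the formula squared, doubling the standard deviation has the same fourfold effect as halving the margin of error.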

Practical Considerations When Standard Deviation is Unknown

Often, the population standard deviation $s$ is not known. You can:

Estimate from previous studies if similar measurements exist.
Use the range rule of thumb: $s \approx \text{range} / 4$ .
Conduct a pilot study with at least 30 participants to estimate $s$ .
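The last two options can be illustrated with a few lines of code. The pilot values below are invented hospital stay lengths in days (30 of them, matching the minimum suggested above) and do not come from a real study; `range_rule_sd` is a hypothetical helper name.

```python
from statistics import stdev


def range_rule_sd(smallest, largest):
    """Range rule of thumb: s is roughly range / 4."""
    return (largest - smallest) / 4


# Hypothetical pilot data: length of stay in days for 30 patients.
pilot_stays = [
    2, 3, 3, 4, 4, 5, 5, 5, 6, 6,
    6, 7, 7, 7, 8, 8, 8, 9, 9, 10,
    10, 11, 12, 13, 14, 15, 17, 19, 22, 26,
]

rule_of_thumb = range_rule_sd(min(pilot_stays), max(pilot_stays))
pilot_estimate = stdev(pilot_stays)  # sample standard deviation (n - 1)

print(f"range rule estimate: {rule_of_thumb:.2f}")
print(f"pilot sample standard deviation: {pilot_estimate:.2f}")
```

The two estimates will usually differ, so it can be prudent to plan around the larger one.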

Always round the sample size up to the next whole number. Having a sample slightly too large is far better than one that is too small.

Example: Hospital Stay Lengths

Suppose you want to estimate the average hospital stay with a 95 percent confidence level, a margin of error of 2 days, and a known standard deviation of 6 days.

$z_{\alpha/2} = 1.96$
$s = 6$
$E = 2$

Then:

$n = \left( \frac{1.96 \times 6}{2} \right)^2 = (5.88)^2 \approx 34.6$

Rounding up, you would need 35 patients.
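The same calculation can be scripted, which makes it easy to see how the requirement changes with the margin of error. The function name is an illustrative choice, and the 1 day and 3 day margins are hypothetical alternatives added only for comparison.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(s, margin, confidence=0.95):
    """n = (z * s / E) ** 2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * s / margin) ** 2)


# Hospital stay example: s = 6 days, E = 2 days, 95 percent confidence.
n_hospital = sample_size_for_mean(s=6, margin=2)
print("patients needed for a 2 day margin:", n_hospital)

# Hypothetical alternatives to show the sensitivity to E.
for margin in (1, 3):
    print(f"patients needed for a {margin} day margin:",
          sample_size_for_mean(s=6, margin=margin))
```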

How Sample Size Links to Power

Sample size and standard deviation, together with the effect size and the significance level, directly influence the power of your test. For most epidemiologic studies, you aim for power of at least 80 percent:

$\text{Power} = 1 - \beta$

Where $\beta$ is the probability of a Type II error (failing to reject a false null hypothesis).

If your sample size is too small, even a real effect may go undetected, leading to false reassurance.
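To plan for a target power, one common approximation for a two-sided one-sample z test is $n = \left( \frac{(z_{\alpha/2} + z_{\beta}) \, \sigma}{\delta} \right)^2$, where $\delta$ is the smallest difference worth detecting. This formula is not derived in this article, so treat the sketch below as an assumption-laden illustration; the function name and the inputs ($\delta = 2$, $\sigma = 6$) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_power(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample z test (sigma assumed known)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for 1 - beta
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


n_80 = sample_size_for_power(delta=2.0, sigma=6.0)
n_90 = sample_size_for_power(delta=2.0, sigma=6.0, power=0.90)
print("n for 80 percent power:", n_80)
print("n for 90 percent power:", n_90)
```

Note that a sample sized only to reach a given margin of error can be noticeably smaller than one sized to detect a difference of the same magnitude with 80 percent power.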

Key Takeaways for Study Design

Increasing sample size improves power and narrows confidence intervals.
Margin of error and standard deviation both influence sample size requirements.
Use the Central Limit Theorem to justify normal approximations in large samples.
Always plan for a slightly larger sample than the minimum calculated to account for dropouts or unusable data.
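One simple way to build in that buffer is to divide the calculated minimum by the expected retention rate. This adjustment is a common convention rather than something derived above, and the 10 percent dropout rate is a hypothetical figure.

```python
from math import ceil


def inflate_for_dropout(n_required, dropout_rate):
    """Enrollment target so that about n_required subjects remain."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return ceil(n_required / (1 - dropout_rate))


# 35 analyzable patients needed, assuming 10 percent dropout (hypothetical).
n_enroll = inflate_for_dropout(n_required=35, dropout_rate=0.10)
print("patients to enroll:", n_enroll)
```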
