Biostatistics, sometimes called biometry or biometrics, is the science of applying statistical methods to topics in biology and health. It plays a key role in medicine, public health, and epidemiology. This field covers the design of biological experiments, the collection and analysis of data, and the interpretation of results so they can be used to make informed decisions.
Nearly every health-related survey or disease study relies on biostatistics. From determining how many people in a community have a certain illness to figuring out what might be causing it, statistics is the bridge between raw data and clear conclusions.
A Real-World Example. Tracking Disease in a Community
Let us imagine a community in the United States where an epidemiologist is studying the spread of a specific disease. For our example, consider swine flu, also known as H1N1 influenza A. The goal is to understand how widely the disease has spread and what might be driving it.
Two main objectives guide this investigation. First, we want to know how much of the population is affected. Second, we want to explore possible causes. The first step is to measure prevalence, which is the fraction of people in a population who have the disease at a given time.
Step One. Estimating Prevalence
Suppose we take a random sample of people from the community. A random sample means each person has an equal chance of being chosen. In this example, we select N = 500 individuals and run an antibody test to see who has been infected in the past. If X = 4 people test positive, then the sample prevalence is
$$p = \frac{X}{N} = \frac{4}{500} = 0.008$$
This means that about 0.8 percent of the people in the sample have antibodies for the disease. We take this value as our estimate of prevalence in the community.
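As a minimal sketch of the calculation above, the following Python snippet computes the sample prevalence from the number of positive tests and the sample size. Python is used here only for illustration, and the function name `sample_prevalence` is hypothetical rather than part of any library.

```python
def sample_prevalence(positives: int, sample_size: int) -> float:
    """Return the fraction of sampled individuals who tested positive."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= positives <= sample_size:
        raise ValueError("positives must be between 0 and sample_size")
    return positives / sample_size


# Values from the example: X = 4 positives among N = 500 people.
p = sample_prevalence(4, 500)
print(f"Sample prevalence: {p:.3f} ({p:.1%})")
# prints "Sample prevalence: 0.008 (0.8%)"
```

The input checks are a design choice for the sketch: a prevalence only makes sense when the count of positives lies between zero and the sample size.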
Step Two. Using Probability Models
To make sense of these numbers, biostatisticians often use probability models. A common starting point is the binomial model. This model assumes that each person in the sample has only two possible outcomes: either they have the disease or they do not. It also assumes that each person's result is independent of the others and that everyone has the same probability of testing positive. The word “binomial” comes from “bi” meaning two, and “nomial” referring to terms or outcomes.
Why use such a model? It helps us understand how likely it is to get different numbers of positive cases in our sample just by chance. This in turn helps us make predictions for the entire population.
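To illustrate what "by chance" means here, the sketch below evaluates the binomial probability of seeing exactly k positives in a sample of 500. It assumes, purely for illustration, that the true probability of a positive test equals the sample prevalence of 0.008. The helper names `binomial_pmf` and `binomial_cdf` are hypothetical and written with the standard library only.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k positives among n independent people."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k positives among n independent people."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Assumption: the true prevalence equals the sample estimate, 0.008.
N = 500
p = 0.008

for k in range(9):
    print(f"P(X = {k}) = {binomial_pmf(k, N, p):.4f}")

print(f"P(X <= 4) = {binomial_cdf(4, N, p):.4f}")
```

Even with a fixed prevalence, a sample of 500 can plausibly contain anywhere from about 1 to 8 positives, which is why a single sample gives an estimate rather than an exact answer.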
Step Three. Making Predictions for the Larger Population
If our sample shows a prevalence of 0.8 percent, we can use the same probability model to estimate the expected number of cases in the whole community. Suppose the community has n = 1,000,000 people, and let p be the prevalence, estimated from the sample as 0.008. The expected value in a binomial model is given by:
$$E(\text{cases}) = n \times p$$
Substituting our values:
$$E(\text{cases}) = 1{,}000{,}000 \times 0.008 = 8{,}000$$
So, we would expect around 8,000 cases in the community, assuming the sample is representative and the conditions are similar throughout the population.
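The same arithmetic can be sketched in Python. The function name `expected_cases` is hypothetical, and the calculation carries the assumption stated above, namely that the sample prevalence applies to the whole community.

```python
def expected_cases(population_size: int, prevalence: float) -> float:
    """Expected number of cases under a binomial model: n times p."""
    if population_size < 0:
        raise ValueError("population_size must be non-negative")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    return population_size * prevalence


# Values from the example: n = 1,000,000 people, prevalence 0.008.
cases = expected_cases(1_000_000, 0.008)
print(f"Expected cases: {round(cases):,}")
# prints "Expected cases: 8,000"
```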
The Assumptions Behind the Numbers
Predictions like this depend on assumptions. First, we assume that the binomial model is a good fit for the data. Second, we assume that our single sample accurately reflects the larger population. If either assumption is wrong, our estimate may not be reliable.
In practice, public health researchers often take more than one sample. They may also use more advanced models that account for differences between subgroups, such as age, location, or exposure risk. This can make the predictions more accurate and the decisions based on them better informed.
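To see why a single sample may not reflect the population, the following sketch simulates many samples of 500 people and records the prevalence estimated from each. It assumes a true prevalence of 0.008, which is an illustrative choice borrowed from the example and not a measured value. The function name `simulate_sample_prevalences` and the seed are likewise hypothetical.

```python
import random


def simulate_sample_prevalences(true_prevalence, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return each sample's prevalence."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(n_samples):
        # Each person tests positive independently with the same probability.
        positives = sum(
            1 for _ in range(sample_size) if rng.random() < true_prevalence
        )
        estimates.append(positives / sample_size)
    return estimates


# Assumption: true prevalence is 0.008; 200 simulated samples of 500 people.
estimates = simulate_sample_prevalences(0.008, 500, 200, seed=42)
mean_estimate = sum(estimates) / len(estimates)

print(f"Smallest estimate: {min(estimates):.3f}")
print(f"Largest estimate:  {max(estimates):.3f}")
print(f"Average estimate:  {mean_estimate:.4f}")
```

Individual samples scatter around the true value, while the average across many samples sits close to it. This is one reason repeated sampling makes estimates more trustworthy.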
The Inferential Biostatistics Approach
The process we just described is a classic example of inferential biostatistics. It usually follows three key steps:
- Choose a probability model for the population. In our example, the binomial distribution was used.
- Collect a sample from the population and calculate the sample statistics.
- Use the model to make predictions or draw conclusions about the entire population.
This approach allows you to move from a small amount of data to meaningful insights that apply more broadly.
Descriptive vs Inferential Biostatistics
Not all biostatistics is about making predictions. Descriptive biostatistics is focused on summarizing and describing the data you already have. This might involve calculating averages, percentages, and ranges, or creating tables and graphs to show patterns.
For example, if you simply reported that 4 out of 500 people tested positive, without trying to predict numbers for the larger population, that would be descriptive biostatistics. It is simpler and can still be very useful, especially for initial reporting or when detailed modeling is not necessary.
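A descriptive summary of this kind can be sketched as follows. The list of test results is constructed to match the example (4 positives out of 500) and is not real data. The function name `describe_results` is hypothetical.

```python
from collections import Counter


def describe_results(results):
    """Summarize a list of True/False test results without any modeling."""
    total = len(results)
    if total == 0:
        raise ValueError("results must not be empty")
    counts = Counter(results)
    return {
        "n": total,
        "positive": counts[True],
        "negative": counts[False],
        "percent_positive": 100 * counts[True] / total,
    }


# Illustrative data matching the example: 4 positives, 496 negatives.
results = [True] * 4 + [False] * 496
summary = describe_results(results)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here predicts beyond the 500 people tested. The output only counts and summarizes what was observed.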
Why Software Matters in Biostatistics
Many of these calculations can be done by hand for small examples. However, real-world studies often involve large datasets, complex models, and repeated calculations. This is why software is essential.
The open-source statistical software R is one of the most widely used tools in the field. It is free, powerful, and supported by a large community of researchers. With R, you can run everything from basic descriptive statistics to advanced inferential models, and you can visualize the results clearly.
To summarize the workflow: you start by understanding the question you want to answer. You choose an appropriate probability model. You collect and analyze data, and then you interpret the results in context. Each step builds on the previous one, and the better your data and models, the better your conclusions will be.