Classical Test Theory (CTT) is a foundational method used to evaluate how well a questionnaire or scale performs. It focuses on two main ideas: reliability and validity. In the context of patient-reported outcome measures, CTT helps us understand how much of a person’s score reflects their actual experience and how much reflects random noise or error.
At its core, CTT assumes that any observed score (X) a person gets is made up of two parts:
- a true score (T), which reflects their actual level of the trait or condition being measured, and
- an error (E), which represents random variation.
In simple terms:
Observed Score = True Score + Error (X = T + E)
The “true score” is not something we can directly measure. Instead, it’s a theoretical value: the average score someone would get over many repeated tests under perfect conditions. In practice, we can only work with the observed score, which is a single, imperfect snapshot.
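A small simulation can make this concrete. The sketch below is illustrative only: the true score of 50, the error spread of 5, and the helper name observe are assumptions chosen for the example, not values from any real instrument. It shows that a single observed score can sit some distance from the true score, while the average over many hypothetical repeats settles close to it. It also computes the usual CTT reliability ratio, Var(T) / Var(X), across a simulated group of people; with the assumed variances (100 for true scores, 25 for error) that ratio should come out near 0.8.

```python
import random
import statistics


def observe(true_score, error_sd, rng):
    """Return one observed score X = T + E, with E drawn from a zero-mean normal."""
    return true_score + rng.gauss(0.0, error_sd)


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# --- One person, many hypothetical repeated administrations ---
TRUE_SCORE = 50.0  # assumed value; unknowable in practice
ERROR_SD = 5.0     # assumed spread of the random error
repeats = [observe(TRUE_SCORE, ERROR_SD, rng) for _ in range(10_000)]

single_snapshot = repeats[0]               # what we actually get to see
long_run_mean = statistics.fmean(repeats)  # approaches the true score

# --- Many people, one administration each ---
# Reliability in CTT is the share of observed-score variance
# that comes from true scores: Var(T) / Var(X).
true_scores = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
observed = [observe(t, ERROR_SD, rng) for t in true_scores]
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)

print(f"single observed score: {single_snapshot:.1f}")
print(f"mean of 10,000 repeats: {long_run_mean:.2f}")
print(f"simulated reliability:  {reliability:.2f}")
```

In real data the true scores are never available, so reliability has to be estimated indirectly; the simulation only works because the true scores are generated by hand.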
As someone’s true score increases (for fatigue or depression, say), we expect their answers on the related items to reflect that increase. This rests on the assumption that the items are coded so that higher scores mean more of the concept is present.
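One common way to satisfy this coding assumption is to reverse-score any item that is worded in the opposite direction before items are summed. The sketch below is a hypothetical example: the item names, the 1 to 5 response scale, and the reverse_score helper are all assumptions made for illustration.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response so that a higher value means more of the construct."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - response


# Hypothetical fatigue items answered on a 1-5 scale.
responses = {
    "feel_worn_out": 4,
    "feel_exhausted": 5,
    "feel_full_of_energy": 2,  # worded in the opposite direction
}
REVERSED_ITEMS = {"feel_full_of_energy"}

# After alignment, every item points the same way: higher = more fatigue.
aligned = {
    item: reverse_score(value) if item in REVERSED_ITEMS else value
    for item, value in responses.items()
}
total_score = sum(aligned.values())

print(aligned)
print("total fatigue score:", total_score)
```

Here a low answer on the energy item becomes a high fatigue value, so the summed score rises consistently as fatigue increases.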
The Role of Constructs and Abstract Ideas
Patient-reported outcome instruments often aim to measure abstract ideas, or constructs, like anxiety, pain, hyperactivity, or fatigue. These are not physical things we can measure with a scale or device. Instead, they are mental models: concepts we use to explain patterns in how people think, feel, or behave.
Because constructs can’t be seen or touched, we use carefully developed items or questions to capture aspects of these experiences. The underlying assumption is that we can infer someone’s level of a construct from how they respond to related items.
What Is Construct Validity?
Construct validity refers to how well a tool actually measures the concept it claims to measure. If a questionnaire is supposed to measure anxiety, for example, it should produce results that match our expectations of how anxiety behaves: how it relates to other traits, how it varies across people, and how it changes over time.
One widely accepted definition describes construct validity as the degree to which scores from a measure align with theoretical expectations, whether those concern relationships between items within a scale, correlations with other measures, or differences between groups known to vary on the construct.
If a patient-reported outcome tool doesn’t match up with the construct it’s supposed to measure, there could be a few reasons: the tool might be flawed, the theory might be incorrect, or both might need rethinking.
How Construct Validity Is Tested
To evaluate construct validity, researchers use several statistical techniques. These include:
- Descriptive statistics (e.g., means, ranges, distributions)
- Correlations between items and scales
- Plots and visual patterns
- Regression models to test predicted relationships
- Group comparisons (e.g., testing whether people with different clinical conditions score differently)
- Tracking changes over time (e.g., improvement after treatment)
These methods help determine whether the tool behaves as expected and truly reflects the construct of interest.
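As a rough illustration, three of these checks (descriptive statistics, a correlation with another measure, and a known-groups comparison) can be sketched on simulated data. Everything here is an assumption made for the example: the two groups, the "new" and "established" anxiety measures, the noise levels, and the helper functions pearson_r and cohens_d. It is not a full validation workflow, only a picture of what "behaving as expected" might look like.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups (pooled SD)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(pooled_var)


def measure(latent, rng, error_sd=0.5):
    """Simulate observed scores: latent level plus random measurement error."""
    return [t + rng.gauss(0.0, error_sd) for t in latent]


rng = random.Random(7)  # fixed seed for reproducibility
n = 500

# Hypothetical latent anxiety: the clinical group is assumed to sit higher.
clinical_latent = [rng.gauss(1.0, 1.0) for _ in range(n)]
control_latent = [rng.gauss(0.0, 1.0) for _ in range(n)]

new_clinical = measure(clinical_latent, rng)
new_control = measure(control_latent, rng)
established_all = measure(clinical_latent + control_latent, rng)
new_all = new_clinical + new_control

# 1. Descriptive statistics for the new scale.
print(f"mean={statistics.fmean(new_all):.2f}  "
      f"min={min(new_all):.2f}  max={max(new_all):.2f}")

# 2. Correlation with an established measure of the same construct.
convergent_r = pearson_r(new_all, established_all)
print(f"correlation with established measure: {convergent_r:.2f}")

# 3. Known-groups comparison: clinical vs. control.
group_effect = cohens_d(new_clinical, new_control)
print(f"known-groups effect size (Cohen's d): {group_effect:.2f}")
```

If the new scale reflects the construct, we would expect a strong positive correlation with the established measure and a clear difference between the two groups; a weak correlation or no group difference would prompt a closer look at the tool, the theory, or both.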