In quality-of-life data, outcome measures are often ordinal, e.g., points on a Likert scale. These measures are based on attributes that cannot be assessed directly, such as pain or satisfaction level. A major difficulty in analyzing ordinal outcome measures is that the data preclude arithmetic operations such as addition and subtraction. To measure change in ordinal outcomes over time in longitudinal data, one typically applies standard classical methods, such as the paired t-test, repeated-measures analyses, or effect-size statistics, to test the statistical significance of the change. However, these approaches have some critical issues.

Consider scores as outcome measures that are ordinal in nature. Being ordinal, the difference between adjacent scores is not equal at different levels. For example, the distance between ‘Excellent’ and ‘Very Good’ may not be the same as that between ‘Very Good’ and ‘Good’. Given this, it is unwise to average the levels of a question at two time points and then examine the change: doing so treats the ordinal scores as interval measures, which can lead to flawed conclusions. In the absence of an interval scale, it is possible to ascertain only whether a change has happened; the amount of change cannot be determined.
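A small sketch in Python makes this concrete. The two numeric codings below are both consistent with the ordering Poor < Fair < Good < Very Good < Excellent, yet they are otherwise arbitrary assumptions, which is exactly the point: the "mean change" depends on the coding, while the direction of change does not.

```python
# Two equally valid numeric codings of the same ordered levels.
coding_a = {"Poor": 1, "Fair": 2, "Good": 3, "Very Good": 4, "Excellent": 5}
coding_b = {"Poor": 1, "Fair": 2, "Good": 4, "Very Good": 7, "Excellent": 11}

# Hypothetical ratings for four patients at baseline and follow-up.
baseline  = ["Good", "Fair", "Very Good", "Good"]
follow_up = ["Very Good", "Good", "Excellent", "Good"]

def mean_change(coding):
    """Average change score under a given numeric coding of the levels."""
    diffs = [coding[f] - coding[b] for b, f in zip(baseline, follow_up)]
    return sum(diffs) / len(diffs)

print(mean_change(coding_a))  # 0.75
print(mean_change(coding_b))  # 2.25 -- different magnitude, same sign
```

Both codings agree that, on average, patients improved; the size of the improvement, however, is an artifact of the spacing we imposed on the categories.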

Another drawback of using standard classical test theory to measure change is that these statistics employ the mean change score and therefore capture only the overall change across the subjects/patients involved; they give no idea of change at the individual level. Individual-level change can provide a great deal of critical and useful information. For example, by studying the patients who responded to a particular treatment, one can identify the specific features of those patients that led to the treatment response. Also, since every patient has a distinct latent capability of responding to treatment, which in turn depends on the patient's disease status, the treatment need not be the same for all patients.

One effective approach to handling the issues mentioned above is Item Response Theory (IRT). From a statistical point of view, IRT is a collection of models that describe the association between the latent capability of a subject/patient and the probability of a particular response to a question or item. In a healthcare context, the latent capability can be the health status of the patient.
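As a minimal sketch of that link, the two-parameter logistic (2PL) model for a dichotomous item expresses the response probability as a logistic function of the latent trait; the parameter values below are illustrative only:

```python
import math

def p_response_2pl(theta, a, b):
    """Two-parameter logistic (2PL) IRT model: probability that a subject
    with latent trait `theta` (e.g., health status) endorses an item with
    discrimination `a` and difficulty/location `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A subject whose latent trait equals the item's location has a 50%
# response probability, regardless of the discrimination parameter.
print(p_response_2pl(theta=0.0, a=1.5, b=0.0))  # 0.5

# Higher latent trait -> higher probability of endorsement.
print(p_response_2pl(theta=2.0, a=1.5, b=0.0))  # ~0.95
```

The Rasch model cited in the references is the special case with a common discrimination (a = 1) across items.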

IRT models can be broadly classified as dichotomous or polytomous based on the response levels of the items or questions. A dichotomous model has two response levels, whereas in a polytomous model the response can have more than two levels, e.g., ratings on a scale of 1 to 5 for Likert-type items. Models can also be classified by whether they involve a single latent variable or multiple latent variables. Once candidate models have been identified for the situation at hand, one way to choose the best-suited model is to compare them on information criteria, such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC); the model with the lower AIC/BIC is chosen. Once a model is selected, one can estimate the latent capability of a patient and the probability of a particular response to an item.
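For polytomous items, a common choice is Samejima's graded response model, in which cumulative logistic curves are differenced to give the probability of each ordered category. The sketch below, with illustrative parameter values, also shows the AIC formula used for model comparison (the log-likelihoods would come from actual fitted models):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_category_probs(theta, a, thresholds):
    """Graded response model: probabilities of each ordered category for an
    item with discrimination `a` and ordered thresholds b_1 < ... < b_{K-1}.
    P(X >= k | theta) is logistic; category probs are adjacent differences."""
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2*log-likelihood (lower is better)."""
    return 2 * n_params - 2 * log_lik

# Four-category item: probabilities sum to 1 for any theta.
probs = grm_category_probs(theta=0.0, a=1.2, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])

# Hypothetical comparison: a richer model must improve the fit enough
# to justify its extra parameters.
print(aic(log_lik=-100.0, n_params=5) < aic(log_lik=-99.0, n_params=9))  # True
```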

The full methodology of implementing IRT is beyond the scope of this article. The present article merely highlights that, for analyses of change in ordinal outcome measures, classical test theory has some crucial shortcomings, and IRT-based approaches can be a better alternative. Implementing IRT concepts is considerably more complex; however, when its assumptions can be established for a dataset, IRT can be an extremely valuable addition to a clinical study. Interested readers can consult the references below.


  1. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: University of Chicago Press, 1960 (reprinted 1980).
  2. Rizopoulos D. ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software 2006; 17(5).
  3. Ueckert S. Modeling composite assessment data using item response theory. CPT Pharmacometrics Syst Pharmacol 2018; 7(4): 205–218.
  4. Wright BD, Masters GN. Rating Scale Analysis. Chicago: MESA Press, 1982.
