On the Assessment of Gain Scores by Means of Item Response Theory

Gerhard H. Fischer


The problem of measuring and statistically assessing change based on test scores, which arises in repeated measurement designs with two time points, is treated within an item response theory framework. This framework is specified by postulating a Partial Credit Model, of which the Rating Scale Model and the Rasch Model are special cases. A conditional maximum likelihood estimator of the amount of change, Clopper-Pearson and related significance tests for the change parameter, uniformly most accurate confidence intervals, and uniformly most powerful unbiased tests are presented. They are all 'exact' in the sense that no asymptotic approximations are needed. They are based on the conditional distribution of the gain score, given the sum of the scores at the two time points. These methods are quite flexible because they do not require the same test to be given on both occasions; it is necessary, though, that the items presented at the two time points be chosen from an item pool conforming to the Partial Credit Model, and that the item × category parameters of that model be known (i.e., have been estimated with sufficient precision from a previous sample of testees). Possible applications are the computation of significance tables for 'gain scores' (i.e., score differences) for fixed pretests and posttests, or the computation of the significance of a score difference for an individual in individualized (adaptive) testing. A noteworthy feature of the present methods is that all results hold for single individuals and are thus applicable in single case studies.
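To make the role of the conditional distribution concrete, the following is a minimal sketch restricted to the Rasch Model special case with dichotomous items; it is an illustration under stated assumptions, not the author's implementation. It assumes known item difficulties for the pretest and posttest items and a change parameter eta added to the person parameter at the second time point. Given the sum score r = r1 + r2, the probability of posttest score r2 is then proportional to gamma_{r - r2}(pretest) * gamma_{r2}(posttest) * exp(eta * r2), where gamma denotes the elementary symmetric functions of exp(-difficulty); the person parameter cancels. The function names and the simple one-sided tail probability are hypothetical choices made for this example.

```python
import math


def esf(eps):
    """Elementary symmetric functions gamma_0..gamma_k of the values in eps."""
    gamma = [1.0]
    for e in eps:
        nxt = gamma + [0.0]
        for r in range(len(gamma), 0, -1):
            nxt[r] += e * gamma[r - 1]
        gamma = nxt
    return gamma


def conditional_posttest_dist(beta_pre, beta_post, total, eta=0.0):
    """P(R2 = r2 | R1 + R2 = total) under the Rasch Model (assumed setup).

    beta_pre, beta_post: known item difficulties of pretest and posttest.
    eta: amount of change added to the person parameter at time point 2.
    """
    g1 = esf([math.exp(-b) for b in beta_pre])
    g2 = esf([math.exp(-b) for b in beta_post])
    delta = math.exp(eta)
    # Feasible posttest scores given the sum score and the test lengths.
    lo = max(0, total - len(beta_pre))
    hi = min(total, len(beta_post))
    weights = {
        r2: g1[total - r2] * g2[r2] * delta ** r2 for r2 in range(lo, hi + 1)
    }
    norm = sum(weights.values())
    return {r2: w / norm for r2, w in weights.items()}


def upper_tail_p(beta_pre, beta_post, r1, r2):
    """One-sided conditional p-value for H0: eta = 0 against eta > 0."""
    dist = conditional_posttest_dist(beta_pre, beta_post, r1 + r2, eta=0.0)
    return sum(p for score, p in dist.items() if score >= r2)


if __name__ == "__main__":
    # Hypothetical item difficulties; pretest and posttest need not coincide.
    pre = [-1.0, -0.5, 0.0, 0.5, 1.0]
    post = [-0.8, -0.2, 0.3, 0.9, 1.4]
    print(upper_tail_p(pre, post, r1=1, r2=4))
```

Because the gain score equals 2 * r2 - r once the sum score r is fixed, a distribution over r2 is equivalent to a distribution over the gain score. With all difficulties equal, the conditional distribution under eta = 0 reduces to a hypergeometric distribution, which provides a convenient sanity check.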