Validation of research instruments

Lately I’ve been working with colleagues on a validation study of a classroom observation instrument they developed for use in undergraduate mathematics classes. While thinking about the basic question, “What is validity?”, I came across a seminal article on the subject by Samuel Messick (1995, “Validity of psychological assessment,” American Psychologist, 50(9), 741-749). He said, “Validity is not a property of the test or assessment as such, but rather of the meaning of the test scores” (p. 741). In other words, validity is established for particular uses of an instrument and particular interpretations of its results. It’s inappropriate to say simply that a survey or test has been validated; rather, you have to specify how the instrument will be used and how the results will be interpreted.

A good example of this is a hypothetical validation study of a foot-long ruler. The relevant validation question might be: Does this instrument accurately measure distances? But this question is not focused enough. Our validation study would find that the ruler is a valid instrument for measuring distances between, say, 1/16 of an inch and 12 inches, along a more-or-less straight line on a flat surface, to an accuracy of perhaps 1/32″. So, within those parameters, the ruler is a “valid” instrument. Or rather, it will yield valid measurements within those constraints. On the other hand, it would not yield accurate or useful results if you wanted to measure distances of more than a few feet, or distances along a curved or irregular surface, or, say, the thickness of a piece of wire. So, for these uses it is not a valid instrument.

There is a tendency in the social sciences to use measurements for all kinds of purposes for which they are not valid measures. For example, standardized tests of student content knowledge may (or may not) be valid measures of the educational achievement of groups of students. Because of the technical qualities of the tests, those same scores may not be valid measures of the achievement of individual students, or of the competence of their teachers, or of the quality of their schools. Researchers and evaluators should not ask, “Is this a validated instrument?” Rather, the question is, “Is this instrument valid for my proposed purpose?”

