A report recently released by the Economic Studies program at the Brookings Institution delves into the complex process behind designing and scoring cognitive assessments. Author Brian Jacob illuminates the difficult choices developers face when creating tests, and how those choices shape test results.
Understanding exam scores should be a simple enough task: a student takes a test, answers some fraction of the questions correctly, and receives a score based on that fraction. Yet for modern cognitive assessments (think SAT, SBAC, and PARCC), the design and scoring processes are much more complicated.
Instead of simple percent-correct scoring, these tests use complex statistical models to measure and score student achievement. These models, along with other elements such as test length, alter the distribution (the spread) of reported test scores. When creating a test, designers therefore make decisions about test length and scoring models that shape exam results and, in turn, future education policy.
Test designers can choose from a variety of statistical models to create a scoring system for a cognitive assessment. Each model distributes test scores in a different way, but the purpose behind each is the same: reduce the margin of error and provide a more accurate representation of student achievement that can be analyzed at the individual and aggregate levels.
With certain models, one element designers must consider is how many parameters (or conditions) to include when measuring student performance. The parameters account for the varying difficulty of specific test items, how well items discriminate between students of different ability, and the possibility that a student answered an item correctly by guessing. A test that accounts for only one of these parameters will score student performance differently than a test that accounts for all three.
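The report does not spell out the underlying math, but the three parameters described above correspond to the standard three-parameter logistic (3PL) model from item response theory. The Python sketch below illustrates the idea; the item values (a, b, c) are hypothetical, chosen only for illustration.

```python
import math

def p_correct(theta, a, b, c):
    """Three-parameter logistic (3PL) model: the probability that a
    student with ability theta answers a given item correctly.

    a -- discrimination: how sharply the item separates ability levels
    b -- difficulty: the ability level at which the item sits on the
         steep part of its curve
    c -- guessing: the floor probability of a correct answer by chance
         (e.g., 0.25 on a four-option multiple-choice item)
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# A one-parameter (Rasch-style) model fixes a = 1 and c = 0, so items
# differ only in difficulty b; the 3PL model lets all three vary.
print(p_correct(theta=0.0, a=1.0, b=0.0, c=0.0))   # 1PL view: 0.50
print(p_correct(theta=0.0, a=1.7, b=0.5, c=0.25))  # 3PL view: ~0.47
```

Because each model assigns different probabilities to the same pattern of responses, the ability estimate a student receives, and therefore the reported score, depends on which model the designer picks.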
Test length, one of the simpler elements a designer controls, also significantly affects score distributions. Teachers and students tend to prefer shorter tests, but longer tests measure student ability more precisely. Like the choice of statistical model, added length reduces the margin of error, pulling the scores of the lowest- and highest-performing students closer to the mean and producing more accurate results.
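The report states the length effect qualitatively; one classical way to quantify it (not taken from the report) is the Spearman-Brown prophecy formula, which predicts how a test's reliability, and hence its standard error of measurement, changes as the test is lengthened. The base reliability below is a hypothetical value.

```python
import math

def lengthened_reliability(rho, k):
    """Spearman-Brown prophecy formula: the reliability of a test made
    k times as long as a base test whose reliability is rho."""
    return k * rho / (1 + (k - 1) * rho)

def standard_error(rho, sd=1.0):
    """Classical standard error of measurement for a test with
    reliability rho and score standard deviation sd."""
    return sd * math.sqrt(1 - rho)

base_reliability = 0.70  # hypothetical reliability of a short base test
for k in (1, 2, 3):
    r = lengthened_reliability(base_reliability, k)
    print(f"{k}x length: reliability = {r:.2f}, SEM = {standard_error(r):.2f}")
# 1x length: reliability = 0.70, SEM = 0.55
# 2x length: reliability = 0.82, SEM = 0.42
# 3x length: reliability = 0.88, SEM = 0.35
```

As the error term shrinks, extreme observed scores sit closer to students' true scores, which is the narrowing at the tails of the distribution described above.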
Once designers have determined a test’s length and how to measure student performance, they decide how to report the test’s results. Modern cognitive tests assign scores using numerical scales that provide an ordinal ranking of student performance within a range of possible scores.
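The report does not tie its discussion to a specific scale, but the final step is typically a simple transformation: an ability estimate is mapped linearly onto a reporting scale and clipped to the scale's floor and ceiling, which preserves the ordinal ranking of students. The constants below (a 200-800 range echoing an SAT-section-style scale) are hypothetical.

```python
def scale_score(theta, mean=500.0, sd=100.0, lo=200, hi=800):
    """Map an ability estimate theta (roughly standard-normal units)
    onto a reporting scale, rounded to the nearest 10 and clipped to
    the scale's floor and ceiling. All constants are hypothetical."""
    raw = mean + sd * theta
    rounded = int(round(raw / 10.0)) * 10
    return max(lo, min(hi, rounded))

for theta in (-3.5, -1.0, 0.0, 1.2, 3.5):
    print(f"theta = {theta:+.1f} -> scale score {scale_score(theta)}")
# theta = -3.5 -> scale score 200   (clipped to the floor)
# theta = -1.0 -> scale score 400
# theta = +0.0 -> scale score 500
# theta = +1.2 -> scale score 620
# theta = +3.5 -> scale score 800   (clipped to the ceiling)
```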
The report provides a clear and focused explanation of how modern assessments are designed and scored. Its summary of this complex process is an excellent resource for those who make decisions based on test results. Educators, administrators, and policymakers should all make it a priority to learn the intricacies of cognitive testing, and this report offers an easy-to-navigate introduction. Better understanding will lead to better decisions that benefit schools, teachers, and students.
SOURCE: Brian A. Jacob, “Student test scores: How the sausage is made and why you should care,” Brookings Institution (August 2016).