The New York Times recently gave lavish attention to a "study" conducted by Arizona State University's David C. Berliner and Audrey L. Amrein, and funded by the teachers' unions, that purports to show that high-stakes tests don't promote student learning. In fact, however, the Times has called our attention to a perfect example of how not to study high-stakes testing. Along the way, it has helped mislead the nation as to the actual impact of one of the most important education reform strategies now underway.
Senior author Berliner has long opposed high-stakes testing, and he and his colleague leave readers in no doubt as to their views on the subject. They provide a tendentious history of the high-stakes testing movement in which, to take only one example, the perception that U.S. schools aren't providing an adequate education is said to have originated not from the system's faltering performance, but rather from public fears of Soviet technological superiority after the launch of Sputnik. They also inform us that advocates of high-stakes testing "derive satisfaction" from "punishing the slackers" in the public education system. This theme will not surprise followers of Berliner's repeated Panglossian efforts to argue that U.S. K-12 schools are basically doing OK and that those who say otherwise are enemies of public education.
The authors make much of the finding that states with many low-performing poor and minority students are more likely to adopt high-stakes accountability testing. This fact, however, is as unsurprising as it is encouraging: states in greatest need of educational improvement are most likely to adopt school reforms. But in Amrein and Berliner's Looking Glass education world, such moves are cause for alarm rather than applause. "If these high-stakes tests are discovered not to have their intended effects," they warn, "the mistake will have greater consequences for America's children of color" and for "America's poorest children." By this logic, we should be very careful never to do anything to help poor minority students get a better education, because if we make a mistake it will hurt them most.
The authors even manage to portray one of high-stakes testing's greatest strengths--its ability to expose and quantify the failure of inner-city schools to provide a decent education to poor and minority students--as an oppressive menace. Acknowledging the persistence of an "achievement gap between wealthy, mostly white school districts and poor, mostly minority school districts," they accuse high-stakes exams of "testing poor students on material they have not had a sufficient opportunity to learn." Back through the Looking Glass: one could hardly imagine a more conclusive argument in favor of high-stakes tests, which expose the shameful failure of inner-city schools to provide their students with decent educational opportunities. If disadvantaged youngsters are held back because they fail a high-stakes test, but pass the next year (or the following year) after having learned basic reading and math skills that they lacked before, one might expect a sense of accomplishment: high-stakes testing has forced the school to provide those students with a real education.
The rest of the study doesn't improve much upon this unpromising start. The authors' goal is to measure whether higher scores on high-stakes tests represent improvements in "real learning," or just the transfer of school effort away from real learning and toward narrow skill training and test prep. To gauge this, they measure the performance of students on four exams--SAT, ACT, AP, and NAEP--before and after the introduction of high-stakes testing in their states. They find that the introduction of high-stakes tests is not associated with gains on any of these external exams.
The first problem with this should be obvious: three of the four exams are taken only by college-bound students. Just a third of all high school students take the SAT and fewer take the ACT and AP tests. What's more, the students that statewide regimens of high-stakes testing are mainly intended to help--low-performing youngsters who often lack even the most basic skills--are precisely those who are least apt to take college entrance exams. The main point of high-stakes testing as a reform strategy is to push schools to provide a decent education to those toward the bottom of the class, those in danger of graduating without even rudimentary reading and math skills. It's not to boost the scores of the minority of students who are already doing so well that they're applying to college.
That leaves just the NAEP scores. Here Amrein and Berliner face another problem: NAEP is given only intermittently and wasn't given at all at the state level before 1990. Several of the states they study adopted high-stakes testing well before 1990, so any beneficial effects of high-stakes testing may have occurred before NAEP existed as a state-level barometer. As for the rest of the states, the years in which high-stakes tests were adopted don't line up with the years in which NAEP was given, making it difficult to use NAEP to reliably audit the effects of other tests. (This will, to some degree, be improved in future years by the testing requirements and calendars of No Child Left Behind.) Amrein and Berliner try to paper over this problem by imputing data for years in which they have none; this, however, assumes that NAEP scores change along smooth, gradual lines from year to year--assuming away precisely the abrupt shifts that the introduction of high-stakes testing might produce, which is the very thing the study sets out to detect.
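To see concretely why this matters, consider a minimal sketch (in Python, with invented numbers; this is our illustration, not the study's data). Suppose NAEP is administered in 1992 and 1996 and a state adopts high-stakes testing in 1994. Linear imputation spreads whatever gain occurred evenly across the gap, so a one-time jump at adoption is indistinguishable from a smooth pre-existing trend:

    # Hypothetical NAEP-style scores for one state (invented for illustration).
    # NAEP is administered in 1992 and 1996; suppose high-stakes testing is
    # adopted in 1994.
    observed = {1992: 210, 1996: 222}

    def impute(y0, s0, y1, s1, year):
        """Linearly interpolate a score for a year with no NAEP administration."""
        return s0 + (s1 - s0) * (year - y0) / (y1 - y0)

    for year in range(1992, 1997):
        if year in observed:
            print(year, observed[year], "(measured)")
        else:
            print(year, impute(1992, 210, 1996, 222, year), "(imputed)")

    # The imputed series climbs a steady 3 points per year. But the true series
    # might have been flat at 210 through 1994 and then jumped after the policy
    # took effect. Imputation builds in the assumption of smooth, gradual
    # change -- the very assumption the study is supposed to test.

Whatever the true pattern, the imputed years can only ever show gradual change, so the method is incapable of crediting (or blaming) the policy for anything.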
It is possible to undertake an empirical examination of whether high-stakes tests produce "real learning" that can be detected in the results of other tests. The correct approach would be to compare scores on high-stakes tests with scores on similar broad-based standardized tests (rather than college entrance exams) given at the same time (rather than intermittently). Such an analysis could be done with school-level test data, rather than with the crude state-level test data used by Amrein and Berliner; it could measure scores every year and thus accurately track year-to-year changes; and it could compare similar student populations rather than comparing the general population to the college-bound elite. But that's not what Berliner and Amrein did. What they did, regrettably, is shoddy research that does not warrant the fawning attention of the nation's newspaper of record.
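As a sketch of what such an analysis might look like (hypothetical schools and scores of our own invention, not any published dataset), one could compare annual gains on the high-stakes exam with annual gains on a comparable low-stakes test for the same schools:

    # Hypothetical school-level data: annual scores on a high-stakes exam and
    # on a comparable low-stakes standardized test given to the same students.
    # All numbers are invented for illustration.
    schools = {
        "School A": {"high_stakes": [640, 652, 661], "low_stakes": [645, 655, 663]},
        "School B": {"high_stakes": [598, 615, 630], "low_stakes": [600, 604, 607]},
    }

    def annual_gains(scores):
        """Year-to-year score changes."""
        return [b - a for a, b in zip(scores, scores[1:])]

    def correlation(xs, ys):
        """Pearson correlation between two equal-length lists of gains."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    # Pool the annual gains across schools. A correlation near 1 would suggest
    # that high-stakes gains travel with real learning; a weak or negative one
    # (School B gains on the high-stakes test but not the low-stakes test)
    # would point to narrow test preparation rather than genuine improvement.
    hs = [g for s in schools.values() for g in annual_gains(s["high_stakes"])]
    ls = [g for s in schools.values() for g in annual_gains(s["low_stakes"])]
    print("correlation of annual gains:", round(correlation(hs, ls), 2))

Because both tests are given every year to the same populations, there is no need to impute missing data or to compare the general student body with the college-bound elite.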
Jay P. Greene is a senior fellow and Greg Forster is a senior research associate in the Manhattan Institute's Education Research Office in Davie, Florida (www.miedresearchoffice.org).
"The Impact of High-Stakes Tests on Student Academic Performance," Great Lakes Center for Education Research and Practice, December 2002
"High-Stakes Testing, Uncertainty, and Student Learning," Education Policy Analysis Archives, March 28, 2002
"Make-or-Break Exams Grow, But Big Study Doubts Value," by Greg Winter, New York Times, December 28, 2002