Donald Campbell was an American social psychologist and noted experimental social science researcher who did pioneering work on methodology and program evaluation. He has also become—posthumously—an unlikely hero of the anti-testing and accountability movement in the United States. In the hands of accountability critics, his 50 years of research on methodology and program evaluation have been boiled down to a simple retort against testing: Campbell’s Law. But a deeper reading of his work reveals a more complicated and constructive message: Measuring progress (using both quantitative and qualitative indicators) is essential; when using quantitative data for evaluation, the indicators can become distorted or manipulated; and there are concrete steps we can—and must—take to minimize data manipulation and distortion.

Campbell’s December 1976 article, “Assessing the Impact of Planned Social Change,” has become a flashpoint in the educational accountability debate. There, he argued,

The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

Foes of testing and accountability frequently invoke this “Law” to argue against the use of standardized tests and test-based accountability. In a May 25 blog post, for example, Diane Ravitch explained:

Campbell’s Law helps us understand why No Child Left Behind and Race to the Top are harmful to education… As high-stakes testing has become the main driver of our nation’s education policy, we will see more cheating, more narrowing of the curriculum, more gaming of the system.

In response to an article posted here last week, Ravitch glibly said, “Porter-Magee should google Campbell’s Law and study it.” Similarly, several people took to Twitter to invoke Campbell’s Law as a way of broadly dismissing the suggestion that standardized testing can and should have a role in school-level accountability or education policy.

But like all things, it’s not quite that simple.

For starters, Campbell wasn’t writing about testing in schools—he was writing about all of the indicators that are used to inform social decision making, from the ways we measure poverty in our cities to the indicators used to determine air quality to the reporting systems for assessing the level of discrimination in the workplace. He didn’t set out to argue that measurement in schools posed any greater challenge than measurement in any other complicated social endeavor. In fact, one of his key insights was that the challenges of using data to inform social decision making and program evaluation were universal and that we could learn a lot about how to minimize these challenges by looking across fields for patterns of failure and examples of success.

Second, to my knowledge, Campbell never argued that because numerical data are susceptible to manipulation or distortion, or because they are imperfect, we should abandon all hope of using these data to improve programs and systems. Indeed, in his conclusion, Campbell goes out of his way to emphasize,

[T]he overall picture is not as gloomy as this seems…And many of the quasi-experimental evaluations that I have scolded could have been implemented in better ways.

Even more than that, Campbell encouraged us to be wary of institutions and administrators whose impulse was to resist evaluation. Specifically, he noted, “[I]n the United States, one of the reasons why interpretable program evaluations are so rare is the widespread resistance of institutions and administrators to having their programs evaluated.” And he argued that we needed to understand “the reasons for this resistance” and identify “ways of overcoming it.”

Finally, and perhaps most importantly, Campbell argued that it was possible to protect against the “harmful” impact of using data to drive decisions by:

1.      Identifying—and institutionalizing—ways to uncover the distortion, corruption, and/or misuse of data. (In education, for instance, he highlights two instances where independent external observers were able to identify and flag inappropriate behavior that corrupted student achievement data—one in Texarkana and one in Seattle.)

2.      Using multiple measures (both quantitative and qualitative) so that numerical data doesn’t become the only way to judge a program’s effectiveness or impact.

At the same time, Campbell did warn specifically against the use of this kind of quantitative data to evaluate individuals—a useful caution for policymakers seeking to link test-score data directly and systematically with, for example, teacher evaluation.

What does that mean for education evaluation and test-based accountability? In a presentation delivered in 2010, Daniel Koretz, author of Measuring Up, offered some helpful context and suggestions.

Specifically, Koretz argues that “a primary cause of Campbell’s Law is incomplete measurement of desired outcomes.” And the way we’ve structured our testing and accountability systems has set us up for failure. For starters, we have focused the lion’s share of our accountability attention on two subjects: reading and math. That schools have focused an ever-increasing number of hours on these subjects, to the near-exclusion of all else, is a reasonable, if undesirable, response. If we value learning in other areas, we need to measure it. And that doesn’t mean simply adding testing hours but rather being more deliberate and creative about the assessments we administer and the content they measure.
What’s more, practically speaking, even within a particular content area it would be near-impossible to develop a test that measures everything students should know and be able to do. A standardized test, therefore, measures a “sample” of the total knowledge and skills. Problems arise, however, when high stakes are attached to test results and when that “sample” becomes highly predictable. After all, if teachers know precisely what will be measured—and if that changes very little from year to year—they are far more likely to narrow the focus of their instruction to the knowledge and skills that will be assessed, even at the expense of the broader content and skills that students should master to be prepared for what lies ahead.

Thus, when tests become too predictable, teachers can effectively narrow their instruction to match the particular knowledge and skills that will appear on the test. Or worse, they can “coach” students to game the test even without firm mastery of the content. In such cases, the test isn’t really measuring what it was designed to assess.

This is the kind of instruction many testing critics think of when they deride “teaching to the test.” And this has become one of the most frequently cited—and damning—arguments against the use of standardized tests in accountability.

Of course, the only reason this kind of “coaching” is possible is because state tests have become far too predictable. They test the same narrow subset of the state standards every year—and too often those standards that are easiest to assess and not necessarily those that are most important or most indicative of deep conceptual understanding or content mastery. Teachers, curriculum developers, and school leaders recognize the patterns in state tests, and then many of them focus on ensuring their students can correctly answer the questions that are asked over and over again.

But Campbell’s warning suggests not that we abandon state-level standardized assessment and accountability, but rather that we do a better job of protecting ourselves against this kind of manipulation and distortion. To that end, we should recognize three things:

1.      The greater the consequence we attach to test results, the less “predictable” the questions need to be. If we’re going to attach high stakes to tests, we need to make it hard for teachers and schools to narrow their curriculum to the “tested” content at the expense of the full range of knowledge and skills laid out in the standards. 

2.      The greater the consequence we attach to evaluations, the more we need to diversify the indicators. We need to balance numerical data with other information, including qualitative data—which paints a clearer picture of how well a school is doing and how much or how little its students are learning.

3.      The more we focus on accountability with consequences, the more we need to independently check the data. States could, for instance, invest in inspectorates whose focus is on site visits and other measures that could serve as a “reality check” on the data.

This is where the policy debate should be centered if we want to stay true to the legacy of Donald Campbell’s work. No simple retorts. Just a lot of hard work ahead to make school measurement and evaluation serve our students and schools, rather than the other way around.

Kathleen is the Superintendent and Chief Academic Officer at the Partnership for Inner-City Education and a Senior Visiting Fellow at the Thomas B. Fordham Institute. Before joining the Partnership, Kathleen served as the Senior Advisor for Policy and Instruction at the College Board, as the Director of Curriculum and Professional Development at Achievement First, and as the Director of Teacher and Principal Professional…