Test scores don't tell us everything, but they certainly tell us something about school quality and student success

Michael J. Petrilli

5.5.2016

Editor's note: This post is the fourth in an ongoing discussion between Fordham's Michael Petrilli and the University of Arkansas's Jay Greene that seeks to answer this question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results? Prior entries can be found here, here, and here.

I think we’re approaching the outline of a consensus, Jay—at least regarding the most common situations in the charter quality debate. We both agree that closing low-performing schools is something to be done with great care, and with broad deference to parents. Neither of us wants “distant regulators” to pull the trigger based on test scores alone. And we both find it unacceptable that some states still use test score levels as measures of school quality.

I think you’re right that in the vast majority of cases, charter schools that are closed by their authorizers are weak academically and financially. Parents have already started to “vote with their feet,” leaving the schools under-enrolled and financially unsustainable. Closures, then, are akin to euthanasia. That’s certainly been our experience at Fordham.

But what about the rare instances when a popular, fiscally viable charter school demonstrates lackluster results? Do such cases even exist? I queried Greg Richmond, the president and CEO of the National Association of Charter School Authorizers. “I have never seen any research on this,” he replied, “but I have to believe that hundreds of schools with wait lists have been closed.”

But Greg isn’t apologetic about such closures (which shouldn’t be surprising, since he’s a true accountability hawk). “Would we recommend keeping an academically failing school open because it has a wait list?” he asked. “No. Or ‘Hell no.’”

My own position is somewhere between yours and Greg’s. I agree with you that we should try to “defer to those with much more information about school quality, and it will be extremely rare that these more informed assessments of quality will be at odds with parental preferences.” If parents are choosing a school despite its lousy value-added results, we should ask why. Is it because of slick marketing and gimmicks, like free iPads (or, in the virtual charter schools space, free computers)? Are parents prioritizing something other than school quality? Or is there something really special going on in the school that, for some reason, doesn’t register on the test scores?

But Greg is right, too, in wanting to shut down “academically failing schools.” If the school isn’t getting the job done academically—even if it’s doing well at the other stuff that you and I value, like conferring character or non-cognitive skills—it’s not worthy of taxpayer support. Even if parents like it. Full stop. The issue, then, is how to define academic success and failure.

Which brings us back to the original question: How well can test score gains address the question of whether a school is academically effective? You argue that they are “much less reliable indicators” than many of us assume, and that only the well-publicized Raj Chetty study demonstrates a link between test scores and long-term outcomes—“a very thin reed on which to rest the entire education reform movement.” Is that right?

First, on the research: No. There was also a Chetty study on the famous Project STAR experiment on class-size reduction, which demonstrated that “students who were randomly assigned to higher-quality classrooms in grades K–3—as measured by classmates' end-of-class test scores—have higher earnings, college attendance rates, and other outcomes.” That demonstrates a link between early test scores and long-term outcomes, even though the test scores themselves faded out over time.

And there’s a new study by David Deming and colleagues in Education Next demonstrating that test score gains for Texas high school students, juiced by the state’s accountability system, “led to long-term gains for students who attended schools that were at risk of falling below a minimum performance standard”; these gains included being more likely to graduate from college and earning more money at age twenty-five. (There’s a caveat, though: Accountability pressures led higher-performing schools to exempt their low performers from the test, which did long-term harm.)

And what about the claim that test scores are “much less reliable” indicators than we assume, and that we should be careful about using them for high-stakes decisions?

I am more sympathetic to that argument when it comes to older students. Few states have an effective way of looking at student learning gains over time once teenagers get to high school. Nor is it clear how one might do so, especially if we want a system wherein students’ pathways are allowed to diverge. (Should college preparation and CTE students’ learning be measured in the exact same way? I don’t think so.) But if we just look at passing rates—on state tests, the SAT or ACT, or Advanced Placement exams—we will conflate the impact of high schools with students’ background characteristics and prior achievement. That’s a huge challenge.

Let’s acknowledge, though, that our other primary measure of high school quality—the graduation rate—is hugely problematic as well. Particularly with the widespread use of “credit recovery,” we can’t be sure that one school’s standards for graduation come close to resembling another’s. Yet several of the studies you point to use high school graduation rates as the key indicator of long-term outcomes. I like to think that school choice is helping more low-income kids graduate, but are we sure the graduation standards at their schools of choice are the same as those at the public schools? How would we know? If the relationship between test scores and graduation rates is weaker than we would expect, why question the test scores instead of the graduation standards?

At the elementary school level, though, your argument falls apart.

First, it’s going to be very hard, if not impossible, for state accountability systems to trace a causal connection between a) what students learn from the ages of five to eleven and b) what happens to them 15–20 years later. Chetty’s studies were remarkable for being able to do so, but they relied on IRS data or confidential files that nobody else has access to. No state accountability system could ever replicate them. And even if one could, it would lead to the “analysis paralysis” that I mentioned previously. Are we going to keep elementary schools open despite horrible value-added results in the hope that, fifteen years hence, the data will show that their kids did OK regardless?

Second, all of your evidence showing a disconnect between test score gains and long-term outcomes comes from the high school level. The evidence from Chetty comes from kindergarten (Project STAR) or grades 3–8 (teacher value added). Maybe gain scores for younger kids are more predictive, since they tend to vary more. (High school students in general don’t show as much progress in reading and math as little kids do, at least in America.)

Third, we do know that achievement in elementary school is predictive of achievement in later years. That’s why we’re always hearing about the importance of “reading by the third grade,” or getting a solid foundation in mathematics. Those are “long-term outcomes,” too.

How we measure progress in elementary schools really matters. If the tests are dumbed-down, low-level, and skills-centric, they could give us a false sense of whether kids are on track. In reading especially, tests could be encouraging a kind of instruction that leads to short-term bursts in achievement but not the development of broad knowledge and vocabulary that offers a real payoff down the line. That’s one reason why I don’t share your affection for nationally norm-referenced tests, and why we prefer the new assessments linked to the Common Core. But I digress.

***

So let me offer a solution that both you and I might embrace. When it comes to high schools, we should exercise caution about using test scores to evaluate school quality. Charter school authorizers and state accountability systems should work toward models that use data from real long-term outcomes instead.

For elementary and middle schools, however, test data should play a more central role. And in the rare cases when a school is fully enrolled but academically atrocious—as judged by, say, negative value-added scores over multiple years—authorizers and other regulators should dig in to find out why. The school’s educators and parents should have a chance (at an appeal, for example) to make their case, using other indicators to argue that the institution is indeed preparing its charges for success in the future. I’m doubtful that many such schools will be able to make a convincing argument, but I’m not opposed to giving them the chance to try. If they fail, let’s shutter those schools swiftly and move on.

Jay, deal or no deal?
