When trying to improve educational outcomes, it is hard not to feel a sense of urgency. We want to figure out what works now and implement changes immediately, because if we wait, kids who are in schools now will miss out. Unfortunately, this pressure to act quickly may be fundamentally at odds with our ability to measure what really works, since meaningful changes in the trajectory of student achievement are not always apparent until years later. Diane Whitmore Schanzenbach of Northwestern University provides a compelling example of exactly this conundrum.
Schanzenbach’s thesis is that education research too often assesses only an intervention’s immediate or intermediate outcomes without capturing its long-term benefits. This may be particularly relevant, she asserts, when judging the impact of early childhood investments.
Schanzenbach offers the example of two studies (both of which she co-authored) on Project STAR, the famous class size experiment conducted in Tennessee in the late 1980s. The experiment randomly assigned students to either regular-sized classes or smaller ones. The researchers behind both papers (the first by Dynarski, Hyman, and Schanzenbach and the second by Chetty, Friedman, Hilger, Saez, Schanzenbach, and Yagan) found that smaller kindergarten classes yielded an immediate bump in student test scores that year, but both papers report that this bump faded as students entered middle school.
That’s not the end of the story, though. When the students became adults, clear positive impacts reemerged for those students who had been placed in the smaller classes. Schanzenbach concludes, “We find that the actual long-run impacts were larger than what would have been predicted based on the short-run test score gains.”
The fact that test score gains faded even though the students’ later outcomes turned out to be positive may seem to confirm public skepticism about test scores as an accurate indicator of long-term achievement.
But not so fast.
Schanzenbach is right to note that the test score gains measured two to six years after the intervention, which had largely faded by then, did not track the more positive life outcomes that emerged later. However, the immediate test score gains from the year of the intervention, when students were in kindergarten, were highly predictive of students’ college attendance and degree completion. Schanzenbach admits as much, stating in the first paper that “the short-term effect of small classes on test scores, it turns out, is an excellent predictor of its long-term effect on adult outcomes.”
Schanzenbach’s theory finds stronger footing in her second publication. This study looked at both kindergarten class size and each student’s kindergarten classroom quality (as measured by the average test scores of the student’s classmates at the end of kindergarten, a proxy for a combination of peer effects, teacher effects, and other classroom characteristics). Again, small kindergarten classes correlated with higher kindergarten test scores and higher college attendance.
Moreover, although higher kindergarten test scores were correlated with higher earnings at age twenty-seven, they explained only a small, though statistically significant, portion of the difference in earnings. Thus, the short-term test score bump can barely begin to explain the benefits students derived later in life from having been assigned to a smaller or higher-quality class.
The missing piece of the statistical puzzle was students’ non-cognitive skills. When the STAR students were in fourth and eighth grades, they were assessed on non-cognitive outcomes. Students who had been in small classes showed stronger non-cognitive outcomes, even though their test score gains had faded.
Furthermore, these non-cognitive measures seem to explain a much greater share of future earnings than do the academic outcomes. When the effects of higher test scores and of stronger non-cognitive skills gained in a higher-quality kindergarten classroom are separated, the higher fourth-grade test scores would predict an additional $40 of income at age twenty-seven, while the non-cognitive skills would predict an additional $139 in earnings.
Although we think Schanzenbach’s characterization of the findings undersells the predictive power of immediate test score gains, she does raise several critical points. The first is that early childhood interventions may foster outcomes that emerge most strongly long after the initial study period has ended, thereby eluding researchers who measure only immediate and intermediate outcomes for a few years. The second is that interventions may yield effects that cannot be evaluated purely by measures of academic skills and content. As our understanding of the importance of grit and executive functioning grows, so too should our efforts to measure, alongside standardized test scores, how classroom experiences shape these skills.
Editor's note: This post was originally published in a slightly different form by NCTQ.