In the wake of dismal NAEP reading scores released earlier this year, calls for stronger education policies have grown louder. Test scores from third grade (which is likely already too late for intervention) are the earliest indication of whether students are on track or falling behind. But do third-grade test scores serve as a true indication of the quality of students’ early education? Obviously, since federally mandated standardized testing begins in grade three, school accountability ratings don’t account for student progress in grades K–2. The question is whether that fact unfairly penalizes schools that are making commendable gains with youngsters in those early grades. A recent study from Mathematica’s Walter Herring investigates.
The study uses NWEA MAP Growth test scores from ten states for the 2013–14 through 2018–19 school years, a dataset containing millions of test events across that period. Still, it is not a nationally representative sample, in part because education leaders choose to opt in and partner with NWEA for testing. Within those limits, Herring has access to spring MAP test scores and school-level demographic data and is able to calculate separate achievement and growth scores for each school. NWEA uses student growth percentiles, or SGPs, which assess the growth that a student exhibits in a given year relative to his or her peers with a similar test history. Although SGPs do not adjust for differences in student characteristics beyond prior achievement, they are commonly used in state accountability systems.
Herring calculates a school’s average test scores in grades K–2 on MAP and its average test scores in grades 3–5 on standardized assessments. He evenly weights achievement and growth to produce a combined score. To compare the combined scores in grades 3–5 with the scores schools would have received if the reporting included K–2 results, he calculates achievement and growth scores based on MAP Growth results in grades K–5, then ranks schools based on their combined scores across two different grade bands (3–5 and K–5) within each state and year. Finally, he assesses the degree to which school rankings changed by measuring the proportion of schools that changed quintiles (depending on the grade levels included in the achievement and growth distributions). He also looks at whether membership in the bottom 5 percent of ratings changed, since those are the schools in which (per ESSA) many states intervene in order to turn them around.
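The ranking comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not Herring’s code: standardizing each measure to z-scores before averaging is an assumption (the summary says only that achievement and growth are weighted evenly), the function names are hypothetical, and the data are made up. The study ranks schools within each state and year; the sketch handles a single such group.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and SD 1 (assumed scaling, not from the study)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def combined_scores(achievement, growth):
    """Weight standardized achievement and growth evenly (50/50)."""
    za, zg = zscores(achievement), zscores(growth)
    return [0.5 * a + 0.5 * g for a, g in zip(za, zg)]


def quintiles(scores):
    """Assign each school a quintile from 1 (lowest) to 5 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    result = [0] * n
    for rank, i in enumerate(order):
        result[i] = rank * 5 // n + 1
    return result


def share_changing_quintile(q_a, q_b):
    """Proportion of schools whose quintile differs between two rankings."""
    return sum(a != b for a, b in zip(q_a, q_b)) / len(q_a)


# Made-up scores for five schools in one state and year.
ach_35, growth_35 = [200, 210, 190, 220, 205], [45, 50, 40, 60, 55]
ach_k5, growth_k5 = [198, 212, 191, 215, 207], [52, 48, 41, 50, 58]

q_35 = quintiles(combined_scores(ach_35, growth_35))
q_k5 = quintiles(combined_scores(ach_k5, growth_k5))
print(share_changing_quintile(q_35, q_k5))
```

With real data, the same functions would be applied separately to each state-and-year group, once with the grades 3–5 measures and once with the K–5 measures.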
Results show that achievement scores in grades 3–5 were highly correlated with achievement scores in grades K–2. Not surprisingly, schools serving more low-income students tend to have lower average test scores, including in the early grades: Specifically, a 10 percentage-point increase in the proportion of students receiving free and reduced-price lunch is associated with a decrease of one-tenth of a standard deviation in average test scores in grades K–2. By contrast, schools’ growth scores in the upper elementary grades tended to be very different from their scores in the lower elementary grades, revealing a much weaker relationship between SGP measures across grade levels. But that’s not good news for high-poverty schools either, as schools with more low-income students had lower growth scores in the untested early grades after controlling for their growth scores in the tested grades.
In terms of how rankings might change, Herring compares schools’ rankings on the combined achievement/growth measure in grades 3–5 against their rankings after incorporating scores for all students in grades K–5. Results show that 42 percent of schools change quintiles after accounting for test scores in K–2, with 5 percent moving multiple quintiles. And 38 percent of schools that fall in the bottom 5 percent based on results in grades 3–5 no longer appear in that lowest level when grades K–2 are accounted for. Schools that dropped to a lower quintile served larger proportions of low-income students and Black children; likewise, schools that fell below the 5 percent threshold after including early elementary scores served more Black children.
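The bottom-5-percent comparison can be illustrated the same way. The sketch below is a hedged example rather than the study’s procedure: the function names are hypothetical, the cutoff is taken as a simple count of the lowest-ranked schools, and the scores are invented.

```python
def bottom_group(scores, percent=5):
    """Return the indices of the lowest-scoring `percent` of schools (at least one)."""
    k = max(1, len(scores) * percent // 100)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(order[:k])


def share_exiting(scores_35, scores_k5, percent=5):
    """Share of schools in the bottom group on grades 3-5 that leave it on K-5."""
    before = bottom_group(scores_35, percent)
    after = bottom_group(scores_k5, percent)
    return len(before - after) / len(before)


# Invented combined scores for 40 schools; school i has score i on grades 3-5.
scores_35 = list(range(40))
# On the K-5 measure, school 1 rises and school 5 falls into the bottom group.
scores_k5 = list(range(40))
scores_k5[1] = 10.5
scores_k5[5] = 0.5

print(share_exiting(scores_35, scores_k5))
```

The quantity returned by `share_exiting` corresponds to the kind of figure reported above, the fraction of schools flagged on grades 3–5 alone that would no longer be flagged once K–2 results are included.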
So yes, including K–2 results could make a difference in school ratings. But the more unfortunate takeaway is that, because most high-poverty schools see slower growth in grades K–2 than other schools do, including these early grades in state accountability systems would tend to exacerbate the ratings gap between rich and poor schools. Is this an argument for more testing in the early grades? Or for putting the very best teachers in the lowest grade levels along with solid curricula? Or for identifying low-performing schools early and providing them with strong supports for their youngest learners? How about trying all of the above?
SOURCE: Walter Herring, “The Other Half of the Story: Does Excluding the Early Grades from School Ratings Matter?” Annenberg Institute at Brown University (August 2022).