One of the most contentious debates in American education focuses on whether to group students into classrooms using some measure of prior achievement. Whole-class grouping by prior achievement or content mastery is most common for math instruction, less common for English and reading, and least common for other subjects; it appears to be more common in middle schools (48 percent) than in elementary schools (24 percent). The back and forth about grouping has become especially heated in recent years, with several high-profile states (Virginia, California) and districts (San Francisco, Seattle, New York) either eliminating grouping or considering eliminating it for some subjects, grades, or performance levels.
The research on instructional grouping, however, is more positive than grouping critics would have us believe. Complicating the careful study of this practice are the many forms that grouping can take and the roles that grade level, subject matter, and teachers’ ability to differentiate instruction play in the effectiveness of any grouping strategy or intervention. On balance, though, we find that large-scale studies and meta-analyses of grouping show evidence of positive effects for high-performing students and little downside (and often upside) for lower-performing students.
In concept, any kind of grouping by readiness or prior content mastery is a response to the fact that American students of any given age are academically diverse in what they know and can do. In some recent work, we and our colleagues found that the typical American classroom includes students who span three to seven grade levels of content mastery. This translates to a fifth-grade classroom that includes students who have yet to master second-grade math content, as well as those who have already mastered eighth-grade math content.
A separate 2021 study by Blaine Pedersen et al., using international TIMSS data, showed that the typical American fourth-grade classroom includes students achieving at all four international benchmarks in math. Although that may be hard to picture, note that there are only four international benchmarks, meaning that the entire possible range of student performance is present in the typical fourth-grade classroom! This was true before the pandemic, and recent research suggests that Covid-19 has made American students even more diverse in terms of grade-level content mastery.
Some schools respond to this diversity in academic readiness by grouping students by performance level for instruction. This was the topic of a recent National Bureau of Economic Research working paper by Kate Antonovics, Sandra E. Black, Julie Berry Cullen, and Akiva Yonah Meiselman, who looked at the effects of “tracking” on Texas students from 2010 to 2019. It’s important to note that this wasn’t a study of tracking per se. Rather, it evaluated the relationship between math achievement and being exposed to classrooms that were more or less academically diverse than the school as a whole. Under a scenario where students are assigned to classes at random, every student is taught alongside students from the entire achievement distribution present in the school. What skills a student has mastered has no bearing on class placement. We make this point because there is no way to know exactly what mechanism led to some students being in more- or less-homogeneous classrooms, though the authors did control for things like the number of classrooms available in a given school (i.e., if there’s only one classroom for a grade, then of course the classroom will include the full range of academic diversity).
The researchers came to several important conclusions, in addition to confirming that this form of instructional grouping is more common in middle than elementary school. They also report that grouping by prior test scores is much more common than any kind of grouping by race/ethnicity or socioeconomic status. Although, of course, prior achievement is correlated with demographics, this finding means that most of the observed disparities in racial, ethnic, or socioeconomic representation across classes are due to achievement and not demographics. And finally, the extent of instructional grouping is correlated with the degree of academic diversity within a school grade. Schools with a wider range of academic needs within a given grade tended to group more. In fact, this remained one of the only significant predictors of schools implementing instructional grouping. Average achievement, whether the school was a magnet, the political lean of the county, and the demographic makeup of the school were all non-significant predictors in the final model.
So schools with more academic diversity tend to do more instructional grouping. But is this good? The authors measured how exposure to more- or less-grouped classrooms influenced math achievement as measured by where students fell in comparison to their peers across the state. In other words, how does being exposed to more- or less-grouped classrooms influence student academic achievement, as compared to the rest of the state, and how do those effects differ for students who start out lower achieving versus higher achieving?
The findings are somewhat complicated because the authors used multiple rigorous estimation methods and because effects were examined for students achieving at or below the 25th percentile, as well as students at or above the 75th percentile. In the end, the analyses suggest, greater exposure to instructional grouping is associated with no change in predicted math achievement for low-achieving students, but is positively predictive of upward percentile mobility for high-achieving students. In other words, when lower-achieving kids are taught in classrooms with a narrower range of the achievement distribution than is present in the entire school, it does them no harm. They do as well as if they were taught in classrooms that didn’t involve any form of ability grouping. But when higher-achieving kids are grouped, they do better.
How much better? Not a ton. A 1 standard deviation increase in grouping exposure for kids who were in the top 25 percent of math achievers in third grade predicted a 1.3 percentile-point increase in eighth-grade test scores. Instead of scoring at the 80th percentile among eighth graders in Texas, the student would score just over the 81st percentile. And although there were no similar positive effects for lower-achieving students, the students did end up in smaller classes. The authors hypothesized that this might have helped mitigate any hidden negative effects of grouping on lower-achieving students, but there’s no way to know from these data.
This study neither supports nor contradicts a school’s decision to engage in more or less grouping by prior achievement. Instead, it simply shows that this kind of grouping does not appear to harm or hold back lower-achieving students, while it does seem to help higher-achieving students a bit. An implication of such findings is that higher- and lower-achieving students would become even more different in their math achievement, similar to what was seen in the 2022 NAEP data.
The study also raises two issues that are common in this type of research. First, flexible grouping and tracking are not the same thing. “Grouping” refers to placing students in flexible groups where membership depends on interest, subject matter, and recent performance and can change throughout the year (a mix of between- and within-class grouping); “tracking” refers to placing students in long-term, full-class, essentially permanent groups. The word “tracking” also has a negative connotation, so it tends to make people (ourselves included) nervous when it appears in the literature, regardless of the grouping arrangements being studied. Again, the data alone don’t tell us or the researchers what mechanisms led some students to be taught in more- or less-academically-diverse classrooms than others.
Second, what happens within the groups is of paramount importance, but most studies (the present one is no exception) do not look at the curriculum, instructional strategies, or quality of differentiation within the groupings of interest. Given the well-documented importance of these and other factors to student learning, it’s hard to examine the effects of any type of “grouping” without knowing what actually happened within each classroom. We understand why these facets of education aren’t included in the present study (it’s really difficult!), but it’s definitely a limiting factor in much of the grouping literature.
Our interpretation of these grouping/tracking studies (and other research on reforms that are primarily organizational in nature) is that they average out the role of curriculum and teachers, which are probably the most important factors. In other words, if we accept that high-quality instruction and curriculum are normally distributed within each group or track (which they probably aren’t), then those factors would balance each other out in most studies. If that is the case, then the results of these studies tell us something important: the organizational reform of grouping students by readiness for instruction appears to have small benefits for the brightest students and no negative impact on students at the lowest readiness levels. But they don’t tell us what the organizational strategies would do in the presence of, for example, pre-differentiated, prescriptive curricula with teachers skilled in differentiation (see here for an interesting, nuanced take).
Put differently, this latest study may show us the lower bound of what a form of instructional grouping can do (it didn’t hold any students back or restrict their learning), but it doesn’t show us the full scope of possible benefits. This is by no means a criticism of the present study, but rather a guide and call for future research that informs the types of grouping that can best facilitate learning for all students.
Editor’s note: This was also published as a guest article in an edition of “Advance,” a newsletter from the Thomas B. Fordham Institute written by Brandon Wright, our Editorial Director, and published every other week. Its purpose is to monitor the progress of gifted education in America, including legal and legislative developments, policy and leadership changes, emerging research, grassroots efforts, and more. You can subscribe on the Fordham Institute website and the newsletter’s Substack.