Everyone knows standardized tests paint an incomplete picture of schools’ impacts on students. Yet coming up with thoughtful alternatives to tests is easier said than done.
After all, any truly informative indicator of school quality must satisfy at least five criteria.
First, it must be valid. In other words, it must capture something that policymakers, parents, or “we the people” care about.
Second, it must be reliable. That is, it must be reasonably accurate and consistent from one year to the next or when averaged over multiple years.
Third, it must be timely. For example, it makes little sense to report elementary schools’ effects on college completion, since that information would be useless to parents and policymakers by the time it was available.
Fourth, it must be fair. For example, if the point is to gauge school performance, an indicator shouldn’t systematically disadvantage schools with lots of initially low-performing students (though of course, it’s also important to report students’ absolute performance).
Finally, it must be trustworthy. For example, it may be unwise to hold high schools accountable for their own graduation rates, as ESSA requires, since doing so gives schools an incentive to lower their standards.
Finding an indicator that checks all of these boxes is inherently challenging. And in practice, bureaucratic inertia, risk aversion, and the difficulty of collecting and effectively communicating new data to stakeholders can make it hard for policymakers to think outside the box.
Still, there is at least one underutilized and potentially powerful data point that nearly every school system in the country already collects—a student’s grade point average (GPA). Might it be used to evaluate school quality?
Perhaps the most obvious objection is that holding schools accountable for their own students’ GPAs incentivizes grade inflation. But what if we instead rewarded schools for improving students’ grades at the next institution they attend? More specifically, what if an elementary school’s accountability rating were partly based on its students’ GPAs in sixth grade? Similarly, what if a middle school’s accountability rating were partly based on the grades that its graduates earned in ninth grade?
A measure that is based on students’ subsequent GPAs has at least two compelling features.
First, because it captures students’ performance across all subjects, it provides parents and policymakers with valuable information about underappreciated dimensions of schools’ performance.
Second, by holding schools accountable for students’ success at their next school, it disincentivizes teaching to the test, socially promoting students, and other behaviors that yield short-lived or illusory gains.
Of course, like standardized test scores, the grades students earn reflect their lives outside of school, in addition to whatever grading standards educators choose to establish. But what if we held schools accountable for their effects on students’ subsequent grades, much as traditional test-based value-added measures hold schools accountable for their effects on achievement?
In the past decade, rigorous research has shed new light on the effects that individual schools have on non-test-based outcomes,[1] of which “subsequent GPA” is perhaps the one most closely connected to the day-to-day work of educators, with the most robust record of predicting students’ success.[2] Yet there has been no discussion of how schools could be evaluated for the additional value they might add to students’ grade point averages once factors they can’t control have been taken into account.
So, to better understand the potential of “GPA value-added” as an indicator of school quality, we asked two of the country’s most prolific education scholars, the University of Maryland’s Jing Liu (ably seconded by his research assistant, Max Anthenelli) and American University’s Seth Gershenson, to conduct parallel analyses of nearly a decade of student-level data from Maryland and North Carolina, both of which collect detailed information on students’ individual course grades.
Because of limitations in its data, the North Carolina analysis does not examine elementary schools’ GPA value-added (whereas the Maryland analysis includes both elementary and middle schools). But, overall, the results from the two states were encouraging and reasonably consistent, suggesting that the proposed indicator would likely exhibit similarly desirable properties in other jurisdictions.
So how does it work? (Warning: It’s complicated.)
To isolate a middle school’s contribution to students’ ninth grade GPAs, the proposed measure controls for a broad range of variables including but not limited to individual students’ eighth grade GPAs, eighth grade reading and math scores, and socio-demographic backgrounds, as well as the average eighth grade GPA of each individual middle school.[3] It then limits subsequent comparisons to students attending the same high school.
In practice, the inclusion of those school-average variables and the same-high-school restriction effectively accounts for the unique approaches that schools take to grading, thus ensuring that middle schools that send their students to high schools with tougher grading standards aren’t unfairly handicapped. But, of course, even within a high school, not all courses are created equal. So, to avoid penalizing middle schools for putting more students on accelerated tracks, the proposed indicator only assesses a school’s effect on students who enroll in Algebra I in ninth grade, meaning it excludes the roughly 40 percent of students who enroll in more advanced or less advanced courses.[4]
Intuitively, the resulting model relies on comparisons between “observably similar” students who attended different middle schools but the same high school (and were thus subject to the same grading standards). It also averages across lots of teachers and even more student-teacher relationships, thus ensuring that a particular school’s score won’t be unfairly high or low because Mr. Smith doesn’t like little Johnny’s attitude.
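For readers who want to see the mechanics, here is a minimal sketch of one standard way a measure like this could be computed: regression-adjust ninth grade GPAs using student controls and high school fixed effects, then average each middle school’s residuals. The column names (gpa9, gpa8, math8, read8, frpl, ms_id, hs_id, course9) are hypothetical, and the researchers’ actual estimator surely differs in its details (for example, by shrinking noisy estimates for small schools); this is an illustration of the general technique, not their exact model.

```python
# A simplified, residual-based sketch of "GPA value-added" for middle schools.
# Column names are hypothetical; this illustrates the technique only.
import pandas as pd
import statsmodels.formula.api as smf

def middle_school_gpa_value_added(df: pd.DataFrame) -> pd.Series:
    """df: one row per student, with hypothetical columns
    gpa9, gpa8, math8, read8, frpl, ms_id, hs_id, course9."""
    # Restrict to students taking Algebra I in ninth grade, per the text.
    alg1 = df[df["course9"] == "Algebra I"].copy()

    # Each middle school's average eighth grade GPA, one of the controls above.
    alg1["ms_mean_gpa8"] = alg1.groupby("ms_id")["gpa8"].transform("mean")

    # Student-level controls plus high school fixed effects, so students are
    # only compared with peers at the same high school (same grading standards).
    fit = smf.ols(
        "gpa9 ~ gpa8 + math8 + read8 + frpl + ms_mean_gpa8 + C(hs_id)",
        data=alg1,
    ).fit()

    # A middle school's "value-added" is its students' average residual: how
    # much better (or worse) they did in ninth grade than the model predicted.
    alg1["residual"] = fit.resid
    return alg1.groupby("ms_id")["residual"].mean()
```

The high school fixed effects (`C(hs_id)`) are what operationalize the same-high-school comparison; everything else is garden-variety regression adjustment.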
Principals and teachers who are familiar with test-based growth measures might best understand the new indicator as “GPA-based growth,” or the academic progress that a middle school’s students make as measured by their high school GPAs (or that an elementary school’s students make as measured by their middle school GPAs).
For parents and other guardians, an indicator of students’ “high school readiness” (or “middle school readiness”) likely makes more sense.
Now let’s look at how well the proposed indicator satisfies the five criteria outlined above.
Is it valid?
By definition, non-trivial effects on students’ subsequent GPAs are important. And in fact, the results suggest that both elementary and middle schools have sizable effects on the grades that students earn at their next schools.
Is it reliable?
Overall, the results suggest GPA-based growth is about as reliable as test-based growth. For example, the “intertemporal correlation coefficient,” which is a measure of year-to-year stability, is 0.86 for test-based value-added in North Carolina middle schools and 0.74 for GPA-based value-added.
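As a concrete illustration of what that stability statistic means, it is essentially the correlation between each school’s estimate in one year and its estimate the next. The sketch below uses hypothetical column names, and whether the researchers averaged across consecutive year pairs or used a single pair is an assumption here:

```python
# A toy illustration of year-to-year stability: correlate each school's
# value-added estimate with its own estimate one year later, then average
# across consecutive year pairs. Column names are hypothetical.
import pandas as pd

def intertemporal_correlation(estimates: pd.DataFrame) -> float:
    """estimates: one row per school-year, with columns
    school_id, year, and va (the school's value-added estimate)."""
    wide = estimates.pivot(index="school_id", columns="year", values="va")
    years = sorted(wide.columns)
    pairwise = [wide[a].corr(wide[b]) for a, b in zip(years, years[1:])]
    return sum(pairwise) / len(pairwise)
```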
Is it timely?
As noted, we propose that elementary schools be rated based on their effects on students’ sixth grade GPAs and that middle schools be rated based on students’ ninth grade GPAs,[5] meaning that information on middle and high school readiness would only be a year “out of date” by the time it reached parents and policymakers.
Is it fair?
Both the elementary measure and the two middle school measures are uncorrelated with observable demographic characteristics. For example, there is no statistically significant relationship between a Maryland middle school’s GPA-based growth and the percentage of its students who are economically disadvantaged.
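One simple way to probe a fairness claim of this kind is to regress schools’ growth estimates on a demographic characteristic and check whether the slope is statistically distinguishable from zero. The sketch below, with hypothetical column names, illustrates that check; it is not the researchers’ actual analysis:

```python
# A sketch of a fairness check: is a school's GPA-based growth estimate
# related to its share of economically disadvantaged students?
# Column names are hypothetical; this illustrates the idea only.
import pandas as pd
import statsmodels.formula.api as smf

def fairness_check(schools: pd.DataFrame) -> tuple[float, float]:
    """schools: one row per school, with columns va (growth estimate)
    and pct_disadvantaged (share of disadvantaged students, 0-100)."""
    fit = smf.ols("va ~ pct_disadvantaged", data=schools).fit()
    # A slope near zero with a large p-value is consistent with "uncorrelated."
    return fit.params["pct_disadvantaged"], fit.pvalues["pct_disadvantaged"]
```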
Is it trustworthy?
Although it’s impossible to be sure, our sense is that it would be difficult to game this new measure. After all, the typical high school student gets letter grades in 6–8 subjects, each of which is taught by a different teacher. Moreover, even if every teacher in a high school lowered his or her grading standards, the effect on feeder middle schools’ GPA-based growth would be minimal because the measure would still be relying on comparisons between students from different feeder middle schools.
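There is a mechanical reason for that last point: in a fixed-effects setup like the earlier sketch, a grading change applied uniformly within a high school is absorbed by that school’s fixed effect, leaving the comparisons among its feeder middle schools untouched. A quick synthetic check makes this concrete (it assumes the middle_school_gpa_value_added sketch above is in scope, and the data-generating numbers are arbitrary):

```python
# Synthetic check: uniformly lowering grades at one high school leaves feeder
# middle schools' value-added essentially unchanged, because the high school
# fixed effect absorbs the shift. Assumes middle_school_gpa_value_added above.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 6000
df = pd.DataFrame({
    "ms_id": rng.integers(0, 20, n),   # 20 feeder middle schools
    "hs_id": rng.integers(0, 5, n),    # 5 high schools
    "gpa8": rng.normal(3.0, 0.5, n),
    "math8": rng.normal(0.0, 1.0, n),
    "read8": rng.normal(0.0, 1.0, n),
    "frpl": rng.integers(0, 2, n),
    "course9": "Algebra I",
})
df["gpa9"] = 0.7 * df["gpa8"] + 0.1 * df["math8"] + rng.normal(0.0, 0.4, n)

before = middle_school_gpa_value_added(df)

# Every teacher at high school 0 deflates grades by half a letter grade.
deflated = df.copy()
deflated.loc[deflated["hs_id"] == 0, "gpa9"] -= 0.5
after = middle_school_gpa_value_added(deflated)

print(before.corr(after))  # ~1.0: the middle school rankings are unchanged
```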
So, before piloting or otherwise experimenting with the measure, we recommend that states explain to educators how difficult it would be to artificially boost schools’ GPA-based growth.
—
In case it’s not clear, we are not advocating for any version of the proposed measure to replace test-based growth. However, we do believe that it has potential as a supplement.
So let’s experiment with this new measure (and try to figure out how to communicate it effectively!). After all, it sends a clear message to schools that one of their core missions is to help their graduates succeed in their next step: not just in reading and math but in all subjects, and not just on tests, given the array of non-academic attributes that grades measure.
In short, it gives educators whose contributions are sometimes shortchanged by existing measures an officially sanctioned reason to do something that everyone should want them to do: teach to the best of their ability.
[1] Jackson, C. Kirabo. “What Do Test Scores Miss? The Importance of Teacher Effects on Non–Test Score Outcomes.” Journal of Political Economy 126, no. 5 (2018): 2072–107. Jackson, C. Kirabo, et al. “School Effects on Socioemotional Development, School-Based Arrests, and Educational Attainment.” American Economic Review: Insights 2, no. 4 (2020): 491–508.
[2] Bowen, William G., Matthew M. Chingos, and Michael McPherson. Crossing the Finish Line: Completing College at America’s Public Universities. Princeton, NJ: Princeton University Press, 2009. Roderick, Melissa, et al. From High School to the Future: A First Look at Chicago Public School Graduates’ College Enrollment, College Preparation, and Graduation from Four-Year Colleges. Chicago, IL: University of Chicago Consortium on Chicago School Research, 2006. https://consortium.uchicago.edu/sites/default/files/2018-10/Postsecondary.pdf. Bowers, A. J., R. Sprott, and S. A. Taff. “Do We Know Who Will Drop Out? A Review of the Predictors of Dropping Out of High School: Precision, Sensitivity, and Specificity.” The High School Journal 96, no. 2 (2013): 77–100. Brookhart, Susan M., et al. “A Century of Grading Research: Meaning and Value in the Most Common Educational Measure.” Review of Educational Research 86, no. 4 (2016): 803–48.
[3] Note that a school’s average GPA is computed as the average of its students’ GPAs across all subjects, including untested subjects. Results are similar when we consider different versions of GPA, such as one based only on core subjects or one that weights grades by the number of credits.
[4] Because ESSA mandates that accountability systems include data for all students as well as specific subgroups, the measures that are the basis for this study may not be ESSA-compliant; however, we believe that a suitably motivated state could construct analogous indicators for students who take other math courses in ninth grade, with the goal of constructing a more ESSA-compliant index.
[5] Note that ninth grade is often considered a “make or break” year for students, which is why some states and districts use the number of classes students fail in ninth grade as an early warning system.