NOTE: The publication of a recent Flypaper post arguing that growth measures (like “value added” or “student growth percentiles”) are a fairer way to evaluate schools than are proficiency measures drew quick reaction both inside and outside of Fordham. Here we present a "letter to the editor" in response to the initial blog post, lightly edited.
To the editors:
I find your argument that state accountability systems should increase the weight of growth indicators, as against proficiency indicators, perplexing. Here is a summary of why.
The most basic difficulty with the growth models you recommend is this: they attempt to estimate a school’s average contribution to students’ achievement based on past achievement within a given state and a comparison group in that state. Such a growth measure is norm-based rather than criterion-based, i.e., relative to other students in other schools as opposed to an external standard. Weighting relative growth so heavily could end up denying a school funding and other support even when its students perform far more poorly than students in schools that are identified for intervention.
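To make the norm-based point concrete, here is a minimal Python sketch. It assumes a deliberately simplified growth measure, a z-score of each student's score gain relative to the comparison group, which is not the formula any particular state uses. The gains are invented for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values against their own group mean and spread."""
    group_mean = mean(values)
    group_sd = pstdev(values)
    return [(v - group_mean) / group_sd for v in values]


# Hypothetical score gains for three students in two scenarios.
gains_small = [0.0, 5.0, 10.0]                 # little learning overall
gains_large = [g + 20.0 for g in gains_small]  # everyone gains 20 points more

z_small = z_scores(gains_small)
z_large = z_scores(gains_large)

print(z_small)
print(z_large)
```

Under this simplified assumption the two lists of z-scores match, even though every student in the second scenario gained 20 more points. A relative measure of this kind ranks students against one another and is silent on how much any of them learned.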
To focus on the details: The first problem in your recommendation is its lack of specificity. You suggest that states use growth models in their accountability systems “three, five, maybe even nine times more…than proficiency.” Given this vast range, modeling of what these different weightings might mean in identifying schools would have been helpful, even essential, because the differences will have important implications for which schools are identified under your proposed framework. For example, take a school serving high-needs children that has achieved very high proficiency rates but shows low growth. Is this truly a low-performing school? Should this school receive interventions and scarce Title I resources from the federal government?
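The kind of modeling meant here can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the three schools, their index values, the common 0-100 scale, and the simple weighted-average formula. No state's actual accountability formula is implied.

```python
# Hypothetical illustration: how the growth-to-proficiency weight ratio
# changes which school ranks lowest. All numbers are invented.

def composite_score(proficiency, growth, growth_ratio):
    """Weighted average in which growth counts `growth_ratio` times as much as proficiency."""
    return (proficiency + growth_ratio * growth) / (1.0 + growth_ratio)


def lowest_school(schools, growth_ratio):
    """Return the name of the school with the lowest composite score."""
    return min(
        schools,
        key=lambda name: composite_score(*schools[name], growth_ratio),
    )


# (proficiency index, growth index), both on an assumed 0-100 scale.
SCHOOLS = {
    "A": (85.0, 30.0),  # high proficiency, low relative growth
    "B": (35.0, 60.0),  # low proficiency, higher relative growth
    "C": (60.0, 50.0),
}

for ratio in (1, 3, 5, 9):
    print(ratio, lowest_school(SCHOOLS, ratio))
```

With these invented numbers, School B ranks lowest when growth and proficiency count equally, while School A, the high-proficiency school, ranks lowest once growth counts three or more times as much. The choice within the "three, five, nine" range is therefore not a detail.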
Second, given the large weight you want to give to growth models, it would have been important to address serious technical issues. Top of the list: school-level value-added scores vary greatly from year to year, much more so than do academic performance levels. You owe your readers data on this issue. (On the stability of teacher-level estimates, see, for example, McCaffrey et al., 2009, and Goldhaber & Hansen, 2008.) How would you address this? Should states use multiple-year averages? Without such details, states simply cannot follow your recommendation.
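As a rough illustration of the multiple-year question, the Python sketch below compares single-year estimates with a three-year rolling average. The yearly values are invented, not drawn from the studies cited, and the three-year window is only one possible choice.

```python
from statistics import mean, pstdev


def rolling_mean(values, window):
    """Average each run of `window` consecutive values."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Hypothetical single-year value-added estimates for one school.
yearly = [0.30, -0.20, 0.25, -0.15, 0.20, -0.10]

three_year = rolling_mean(yearly, 3)

print("spread of single-year estimates:", pstdev(yearly))
print("spread of three-year averages:  ", pstdev(three_year))
```

In this example the three-year averages swing far less than the single-year estimates. The trade-off is that averaging also delays recognition of a school that has genuinely improved or declined, which is one reason the choice needs to be spelled out.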
The third problem with this approach is the opportunity cost. We don’t need to think about growth measures as you propose but could, instead, use this opportunity to set clear, criterion-based standards for student achievement through the use of “Growth to Standard.” Such an indicator would be informative as to how much progress (or not) students have made towards an external standard set by the state, such as college and career readiness. States could include this element in their accountability systems, comparing the Growth to Standard for each demographic subgroup of students, and for ELL, Special Needs, or economically disadvantaged students; indeed, ESSA requires precisely that data for any state-chosen indicator. Although some would argue that such a Growth to Standard measure might be calibrated to a single, perhaps quite low, standard such as “proficiency,” the state model can, and should, be built to benchmark growth to an externally validated standard for college readiness and an advanced standard beyond that to capture the performance of truly well-educated/talented students.
Even here I recommend that a state accountability system use a modest weighting of the growth measure versus academic performance. However, a Growth to Standard framework at least tells parents something concrete about their child. A parent’s interest is surely in understanding where his or her own child really stands in terms of readiness for further education and lifetime opportunities, rather than in a comparison with other, perhaps equally unprepared, students.
The Minnesota Department of Education, in explaining growth scores, nails it: “The growth score codes of Low, Medium, and High do not represent whether a student has learned less than, about a year, or more than a year’s worth of material. There is no clear relationship between growth z-scores and information learned” (emphasis in the original).
David Steiner is executive director of the Johns Hopkins Institute for Education Policy and a professor in the School of Education at Johns Hopkins.
Goldhaber, D. D., & Hansen, M. (2008). Is it Just a Bad Class? Assessing the Stability of Measured Teacher Performance. Center on Reinventing Public Education.
McCaffrey, D. F., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal variability of teacher effect estimates. Education Finance and Policy, 4(4), 572-606.
Minnesota Department of Education, Growth Data Help Document.