Next-generation teacher evaluations: Are they living up to expectations?
Early results say no.
Chad Aldis
Over the last five years, prodded by the feds, states have adopted teacher evaluation systems. According to a recent report from the National Council on Teacher Quality, forty-one states, including Ohio, now require evaluations that include objective measures of student achievement. These aren’t the meat-axe assessments of yesteryear, though. These next-generation teacher evaluations combine classroom observations using new prescriptive protocols with quantitative evidence of learning gains on state tests (or another form of assessment) to determine each teacher’s effectiveness.
The national focus on teacher evaluations raises a couple of questions. First, why have states chosen to focus on teacher evaluations (i.e., what’s the problem that policymakers are trying to solve)? Second, are the new evaluations proving effective in solving that problem?
Let’s start with the why. Recall the evidence that the single most important in-school factor for student achievement is teacher quality. If we know that good teachers make a difference, it’s not surprising that we’ve focused on evaluating them. Such evaluations hold the potential to identify great teachers whom we can reward, retain, and hold up as models; struggling or developing teachers whom we can help improve; and ineffective teachers who should be removed from the classroom. In other words, evaluations are intended to boost the effectiveness of the teachers from whom our children learn.
That’s really only part of the answer, though. Principals have long conducted teacher evaluations, even before any law mandated them. Yet those traditional evaluations, typically based solely upon classroom observations, had little effect on teacher quality. Teachers remained in place even if they were obviously struggling, and nearly every one of them received a satisfactory (or even “outstanding”) rating. For instance, a California judge in the recently decided Vergara case found that a significant number of “grossly incompetent” teachers were allowed to remain in the classroom “because school officials don’t want to go through the time and expense to investigate and prosecute” these cases. (The court estimated that, under the state’s teacher laws, it could take between two and ten years and anywhere from $50,000 to $450,000 to investigate and potentially dismiss an ineffective teacher.)
Now for the second question: Are the new teacher evaluation systems effective in accurately identifying which teachers are making the biggest difference for students and which are struggling?
Early results from states with next-generation teacher evaluation systems suggest that the answer is a resounding no. In Florida, more than 97 percent of teachers are still rated as effective or better. In Delaware, it’s 99 percent. New evaluations are producing more of the same results.[1]
It’s important to note that there isn’t a “right” number of teachers that should be deemed ineffective. But we also need to be honest that teaching—especially teaching well—is incredibly difficult to do. So while it would be great if 99 percent of teachers were effective, it’s hard to believe that in teaching (or any profession) we could actually reach that level.
While there is some anecdotal evidence from teachers and school leaders that regular observations required under the new teacher evaluation systems have value and can improve practice, these relatively modest gains (if you believe them) have come with significant costs.
First, the controversy surrounding teacher evaluations has resulted in a moving target for educators. Here in Ohio, for instance, the GOP-controlled legislature spent the better part of six months debating changes to the nascent Ohio Teacher Evaluation System (OTES). OTES, still in its first year and without any meaningful data, was already being changed in substantive ways, including the frequency of evaluations, the percentage of the evaluation based upon student value-added, and the use of student surveys in evaluations. At the end of the day, a compromise was reached, but nothing about the debate suggests that the underlying issues have been resolved. It’s worth pondering how effective an evaluation system can be if its key components continue to be the focus of debate.
Second, whether fair or not, teacher evaluation systems are being portrayed publicly as being anti-teacher. This is harmful in at least two ways: it has a negative effect on teacher morale, and the ensuing debate becomes polarized and focused on the wrong issue. If we can’t even discuss the real problem, the chance of finding a solution is barely a dream.
Third, perhaps the least anticipated and potentially greatest cost relates to student testing. As Russ Whitehurst and Matt Chingos have noted, state assessments—usually limited to reading and math in grades 3–8—don’t give us enough information to analyze most teachers based upon student growth on objective assessments. To gauge a teacher’s impact on student learning in Ohio, we’ve devised an intricate web of additional assessments and student learning objectives that supplement the state assessments. The result has been a dramatic increase in time students spend testing and teachers spend administering tests. The overreliance on tests for every teacher’s evaluation has put the entire test-based accountability system on trial and jeopardizes the progress that’s been made over the past twenty years.
***
The move to enshrine teacher evaluations in state law, however well intended, has created a bureaucratized, divisive solution. It’s also a workaround, designed to escape the harmful effects of the laws and policies that the Vergara court found protected adults but damaged the educational prospects of children. As with most workarounds, teacher evaluations are an inefficient way of achieving an admirable and needed goal.
If statewide, uniform teacher evaluations fail to effectively identify struggling teachers, change regularly, cause debate over the value of teachers, and contribute to needless over-testing of students, then we should rethink how we can best achieve the goal of improving the quality of teachers in every classroom. Stay tuned for a future Gadfly where we’ll explore what that might look like.
[1] Ohio has not yet released the first-year results of its teacher evaluation system, so we should reserve judgment as to how effective our system is until that time.
In January, the Civil Rights Division of the Department of Justice (DOJ) and the Office for Civil Rights in the Department of Education (ED) issued a joint “Dear Colleague” letter to K–12 schools. The letter calls into question whether minority children are punished more harshly than white children for the same infractions. It notes that schools could be guilty of discrimination in one of two ways: if a student is treated differently because of his or her race, or if a neutral policy has a “disparate impact.”
While the first method of determining discrimination is clear and fair, the second method is far more open to interpretation. The letter explains that “examples of policies that can raise disparate impact concerns include policies that impose mandatory suspension, expulsion, or citation upon any student who commits a specified offense.” What the departments are suggesting here is that zero-tolerance policies, which impose a specific penalty for a specific offense, could have a disparate impact on minority students and may be discriminatory.
The disparate impact analysis forces the DOJ and ED into the murky water of differentiating between strict enforcement of zero-tolerance policies that is necessary to meet educational goals and selective enforcement that isn’t. Take, for example, what’s happening in Akron Public Schools (APS). The Akron Beacon Journal recently discovered that students in APS who commit egregious acts (like assaulting a teacher or bringing a weapon to school) have historically been immediately transferred to a different school—a de facto expulsion from their home school. While few of these transfers occur (they involve just 1.4 percent of the entire student body), three-quarters of them involved black students—even though black students make up less than half of the district’s students.
On the surface, this is exactly the kind of “disparate impact” that the ED and the DOJ are talking about—a disproportionate number of minority students are being penalized under a particular zero-tolerance policy. Disparate impact or no, schools cannot tolerate the assault of a teacher. It’s unacceptable, creates an unsafe school for students and staff, and clearly merits a zero-tolerance reaction. But only half of the APS transfers were related to teacher assaults. What about the other behavior-related transfers? Some were undoubtedly related to bringing weapons to school and other serious transgressions, but what about the rest?
I once taught at one of the lowest-performing (and, by reputation, most dangerous) high schools in Memphis. Zero tolerance was used as a blanket to cover any and every behavior infraction. I saw dozens of suspensions and expulsions that resulted not from justifiable zero-tolerance policies (as in the cases of assaults, weapons, and drugs) but from school rules that were subjectively enforced and overly harsh. Uniform violations were a major source of these punishments. Why, for example, does a student with his shirt tail untucked as he leaves the gym locker room receive a five-day suspension, while another who wears a sweatshirt that’s clearly against uniform policy (not to mention vulgar) only gets a warning? Why does a male student who refuses to remove a single stud earring receive a suspension, while another male student with two studs in his ear gets a free pass? These instances aren’t the result of a zero-tolerance policy needed to reach an academic goal; they are selective enforcement and overly harsh consequences for minor infractions. This is where disparate impact lives: not in data that the departments can study, but in how administrators make everyday judgments on discipline.
Our president, Mike Petrilli, argued earlier this month that if minority students actually misbehave at higher rates than white students, then the disproportionate numbers are merely the result of fairly applied policies. If that were true, I would agree. But research and my teaching experience say it isn’t. The problem isn’t that the departments are wrong about disparate impact. The real problem is that school discipline as a whole is a mess because so many schools are so quick to resort to exclusionary practices (suspensions and expulsions) when they don’t need to. By focusing on remedying only one consequence (albeit an awful one) of our disastrous reliance on exclusionary discipline, the departments are trying to put a Band-Aid over the hole in the ship. Unless we effectively plug it and actively work on preventing future holes, we’re going to keep sinking and we’re going to keep losing kids.
Exclusionary practices are harmful. Students who are suspended or expelled for a discretionary violation (a violation of a school rule, such as a uniform policy) are nearly three times as likely to be in contact with the juvenile justice system the following year. Students who are suspended or expelled are also more likely to be held back a grade or drop out (perhaps because they miss so much instructional time). Studies show that at-risk students do not change their behavior as a result of suspension. Furthermore, data shows that schools with higher rates of exclusionary practices tend to spend a disproportionate amount of time on disciplinary matters.
What is needed resides outside the hands of the federal government. We need to treat school discipline as an issue with the same serious implications as low achievement. (The New York Times has done that here and here.) We also need schools (like Akron Public) to closely examine their discipline practices and take the initiative, if necessary, to fix them. Schools across the nation, from Fairfax County to New York City to California, are headed in the right direction. But we can do better. If schools learned how to use discipline data effectively, they could identify struggling students and intervene (similar to what has been proven to work in ninth-grade academies). Programs that focus on restorative justice (which schools from Oakland to Chicago are utilizing and seeing results from) are a great place to start, though they have to be implemented carefully and consistently in order to be effective. Teachers should have better access to strategies that work, and teacher prep programs should devote far more time and energy to training teachers on behavior management and interventions. Partnering with community organizations that work closely with students, their families, and the school gives students and teachers extra support when behavior issues arise, but it also builds the kind of relationships that can prevent misbehavior in the first place. It’s time to take a hard look at overly harsh exclusionary practices. Educators and students deserve better.
The facility arrangements of one Ohio charter school recently came under fire in a Columbus Dispatch exposé. An investigation discovered that roughly half of the school’s budget was dedicated to rental payments, potentially shortchanging teaching and learning. But this episode isn’t an isolated case; many Buckeye charters have struggled to secure adequate facilities. How can Ohio policymakers and school leaders better ensure that charters have the facilities they need at a reasonable cost? First, they should consult this new report from the Local Initiatives Support Corporation (LISC), which contains a wealth of information on charter-school facilities funding from both private and public sources.

The report includes descriptions of the key nonprofits in charter-facilities financing, including the Charter School Growth Fund, Capital Impact Partners, the Low Income Investment Fund, and LISC. These nonprofits—twenty in all—have provided an impressive $2 billion in direct financing (e.g., loans and grants) for charter facilities.

When it comes to state support for charter facilities, Ohio has been woefully stingy. The state provided, for the first time in 2013, per-pupil funding to support the facility costs of brick-and-mortar charters (up to $100 per pupil). But other jurisdictions are far less tightfisted. For example, Washington, D.C., Arizona, and Minnesota provide more than $1,000 per pupil for facilities; four other states provide between $250 and $1,000 per pupil. To make matters worse, Ohio has not appropriated any funds to support its charter school loan program and provides no charter-facilities grants. Again, other jurisdictions do much better: eleven provide capital-grant funding and ten provide loans. New York, for example, has provided roughly $3 million per year through a competitive grant program to support facilities, while Massachusetts has provided more than $26 million in direct loans to charters.
The report calls on states to increase their charter-facilities initiatives to meet the need for high-quality seats. Ohio policymakers should take heed.
Source: Local Initiatives Support Corporation, 2014 Charter School Facility Finance Landscape (New York: Reena Abraham et al., September 2014).
The information yielded by standardized tests—and the analyses based on test results, like value-added—should form the basis for tough decisions regarding which schools (charter and district) or entire school systems require intervention. Parents need information about school quality, and taxpayers ought to know whether their resources are being put to good use. But at the same time, parents and policymakers alike have valid concerns about “overtesting” students and about how high-stakes tests change school behavior.
Over the past decade, Ohio has tested social studies and science unevenly, and will continue to do so under the new assessment program set to begin in spring 2015. Under the old system, the state administered science tests in just grades 5 and 8, while math and English language arts (ELA) were assessed in all grades 3–8. Social studies was tested for just three years (2006–07 to 2008–09) in grades 5 and 8, but it was “suspended” effective fall 2009. The new state testing program continues science assessments in grades 5 and 8 and resurrects social studies testing in grades 4 and 6.
Should Ohio test in science and social studies, in addition to ELA and math? And if so, how often? With that in mind, let’s look at the cases for and against testing in social studies and science—and then consider some policy options.
The case against testing in social studies and science
The case against social studies and science rests on this premise: The incremental costs of social studies and science testing could outweigh the additional school-quality information gained from those tests.
Let’s look first at the information gained when social studies and science are tested on top of math and ELA. Consider the chart below, which displays the close correlation between fifth-grade science and reading results. (Each blue dot represents an individual school’s results.) Remarkably, testing students in science as well as reading provides nearly the same information on student achievement. In other words, if students in a school score well on one subject-area exam, they’ll very likely perform well on the other—and vice versa. The correlations in the other grades are similar to fifth-grade science and reading; for those results, see this document. An argument thus could be made that our view of school quality, as gauged by student proficiency, changes little when social studies and science results are added on top of math and ELA.[1]
Chart: 5th Grade Science versus Reading Proficiency Rates – Ohio Schools, 2013–14
Testing in additional content areas also creates avoidable costs. For one, there is the actual price of the standardized assessment and its administration, a non-trivial expense. In fiscal year 2014, the state spent roughly $68 million on student assessments; in fiscal year 2015, the state is slated to spend roughly $88 million.[2] These budget figures include math and ELA expenses—PARCC costs roughly $25 per pupil—along with the expenses associated with science and social studies tests. (Some perspective is still needed: overall funding for K–12 education in Ohio is $20 billion per year in state, local, and federal dollars.) Meanwhile, in addition to monetary costs, concerns could be raised about whether additional time spent testing “crowds out” other beneficial educational activities.
The case for testing in social studies and science
But wait. Before we leave social studies and science testing in the dustbin, a compelling reason favors testing in social studies and science—and even increasing their frequency. Testing in these content areas could stem the “narrowing the curriculum” tide, the consequence of states designing their assessment and accountability systems around math and reading. (For example, as noted earlier, Ohio scrapped social studies testing in grades 5 and 8 starting in 2009.)
If schools have hollowed out civics, geography, science, etc., due to an overemphasis on math and ELA testing, students may be losing significant opportunities to learn. And in fact, they might be losing the chance to learn to read with understanding, since that hinges in large part on the knowledge acquired in science, social studies, and other content areas. Ramping up science and social studies testing—and holding schools accountable for those results—could spur schools to increase their emphasis on these content areas, which in turn would improve reading achievement. Meanwhile, scrapping social studies and science altogether, or continuing non-consecutive-year testing, may send the (wrong) message that those subjects are of less educational significance.
Assuming, therefore, that additional state testing in social studies and science could incentivize behavior that benefits students, the benefits of ratcheting up testing in these areas could outweigh the costs of those tests.
Policy options
Ohio has not emphasized testing and accountability for social studies and science over the past decade. And though it will increase testing under the new system set to begin in spring 2015, these subject areas are likely to play second fiddle to math and ELA testing. This leaves policymakers in something of a pickle—and here, as I see them, are four possible options:
1.) Keep the status quo. This would ensure that social studies and science are tested, but only in non-consecutive grades (e.g., science in grades 5 and 8). Yet the status quo still does not compel schools to treat these subjects as equal partners with ELA and math.
2.) Eliminate testing in social studies and science. This approach would reduce the cost of testing in these areas, which yields little new information about student achievement for school-quality purposes. However, this option would likely encourage even more focus on ELA and math, and it would require a waiver from federal statute, which presently requires science testing at least once each in elementary, middle, and high school.
3.) Increase testing in social studies and science to the same frequency as math and ELA (i.e., test these subjects annually in grades 3–8). This would balance schools’ incentives to treat each subject equally, but at the cost of more time and money. From an information perspective, although little additional information is yielded in terms of student proficiency, annual testing could help analysts construct growth (i.e., “value-added”) measures for these subjects.
4.) Decrease testing in math and ELA to non-consecutive grades to match the frequency of social studies and science (e.g., test math and ELA in grades 4 and 6, not consecutively in grades 3–8). This would also balance schools’ incentives to treat subjects equally, but at the cost of less information and accountability. It would also require federal action to grant Ohio relief from consecutive-year testing mandates in math and ELA in grades 3–8, or, more likely, a rewrite of the federal law that governs state accountability (No Child Left Behind).
My assumption is that schools respond to policies, including state assessment policies. But the current incentives have encouraged schools to focus squarely on math and ELA to the neglect of social studies and science. My view is that Ohio policymakers should increase testing of social studies and science, creating the incentive to teach those subjects with the same rigor and urgency as math and ELA. That appears to be a perilous option given the politics of the day, however, and it carries additional costs in time and money. Admittedly, none of the policy choices is perfect, yet Ohio policymakers should be aware of the viable options for science and social-studies testing—and what behaviors they incentivize through the state testing program.
[1] It’s worth noting that schools in Ohio have not received social studies and science value-added scores.
[2] The budget figures include state and federal funds allocated to student assessment. See Ohio Legislative Services Commission, “FY 2014-15, Budget in Detail – As Enacted,” pp. 30, 33.
The Carnegie Science Center recently published a multi-faceted look at STEM education in a seventeen-county area encompassing parts of Pennsylvania, West Virginia, and Ohio. The impetus for the study was a perceived "STEM gap": employers in the region report having difficulty finding individuals with the requisite technical skills to fill vacant positions. Campos Research Strategy conducted in-depth interviews with educators and business leaders, surveyed nearly 1,000 parents of school-age children in the region, held “family dialogues,” and conducted an online survey of one hundred middle and high school students. Efforts were made to balance participants among the counties and between rural and urban areas.

Despite high hopes for STEM education among business, industry, and education leaders, the study found that parents’ and students’ awareness and understanding of what STEM is and how it might benefit them is low. Awareness of STEM seems highest in urban areas in the region, but parents’ interest in STEM-related fields for their children is lowest in those same places. A majority of parents participating in the study indicated that their underlying attitudes toward education and careers aligned with many STEM fundamentals, but the typical language of STEM education and careers did not resonate with them. Anecdotes from educators indicate that adults who had never participated in “engaging, hands-on activities” during their own K–12 schooling were mistrustful of such methods—seen as key components of the type of STEM education most needed in the area—and were a barrier to their own children’s participation in them. Science and technology education in middle and high school was far more likely to be seen by parents as a precursor to college than as a career path unto itself.

Although not covered in the study, it’s hard not to think about the growing emphasis on STEM education and the additional resources being devoted to it.
Is that money going to be wasted, or will quality programs drive public opinion and build support? If educators don’t do a better job of convincing parents that STEM opens both economic and educational opportunities for their kids, the STEM investment seems likely to be squandered, and the need for technical skills will remain.
SOURCE: Campos Research Strategy, “Work to Do: The Role of STEM Education in Improving the Tri-State Region’s Workforce,” Carnegie Science Center (October 2014).