When the Ohio Teacher Evaluation System (OTES) went into effect in 2011, it was the culmination of a process that began back in 2009 with House Bill 1. This bill was a key part of Ohio’s efforts to win the second round of Race to the Top funding, which, among other things, required states to explain how they would improve teacher effectiveness.
Beyond bringing home the bacon, Ohio’s evaluation system aimed to accomplish two goals: First, to identify low-performing teachers for accountability purposes, and second, to help teachers improve their practice. Unfortunately, as we hurtle toward the end of the fourth year of OTES implementation, it’s become painfully clear that the current system hasn’t achieved either goal.
To be fair, there have been some extenuating circumstances that have crippled the system. Thanks to its ever-changing assessments, Ohio has been in safe harbor since the 2014-15 school year, which means the legislature has prohibited test scores from being used to calculate teacher evaluation ratings. As a result, the full OTES framework hasn't been used as intended since its first year of implementation in 2013-14. But even back then, OTES didn't offer much evidence of differentiation: approximately 90 percent of Ohio teachers were rated either accomplished or skilled (the two highest ratings) during the first year, and only 1 percent were deemed ineffective.
Although most teachers earn the same ratings, their experience with the system can vary wildly depending on the grade and subject they teach. To understand why, it's important to understand how the current system works. In Ohio, districts choose between two teacher evaluation frameworks. The original framework assigns teachers a summative rating based on teacher performance (classroom observations) and student academic growth (student growth measures), with each component weighted at 50 percent. The alternative framework uses the same two components but changes the weighting and adds a third: 50 percent teacher performance, 35 percent student growth, and 15 percent alternative components, such as student surveys.
Under both frameworks, there are three ways to measure student growth: value-added data (based on state tests and used for math and reading teachers in grades 4-8), approved vendor assessments (used for grade levels and subjects for which value added cannot be used), and local measures (reserved for subjects that are not measured by traditional assessments, such as art or music). Local measures include shared attribution, which evaluates non-core teachers based on test scores from the core subjects of reading and math, and Student Learning Objectives (SLOs), which are long-term academic growth targets set by teachers and measured by teacher-chosen formative and summative assessments.
Results from these frameworks have left many teachers feeling that the system, and the student growth component in particular, is unfair. They're not wrong. As our colleague Aaron Churchill wrote back in 2015, Ohio teachers whose student growth was evaluated using value-added measures were less likely to earn a top rating than teachers evaluated using other methods. A 2015 report from the Ohio Educational Research Center (OERC) found that 31 percent of Ohio teachers used shared attribution to determine their student growth rating, meaning nearly a third of teachers' ratings depended on other teachers' performance rather than their own. SLOs, meanwhile, are extremely difficult to implement consistently and rigorously, they often fail to effectively differentiate teacher performance, and they're a time-suck: A 2015 report on testing in Ohio found that SLOs contribute as much as 26 percent of total student test-taking time in a single year. In essence, OTES doesn't just fail to differentiate teacher performance; it fails to evaluate teachers fairly, period.
As far as professional development goes, the results probably haven't been much better. A quick glance at the Ohio Department of Education (ODE) template for a professional growth plan, which is used by all teachers except those who are rated ineffective or have below-average student growth, offers a clue as to why practice may not be improving: It's a one-page, fill-in-the-blank sheet. Furthermore, the performance evaluation rubric by which teachers' observation ratings are determined doesn't clearly differentiate between performance levels, offer examples of what each level looks like in practice, or outline possible sources of evidence for each indicator. In fact, in terms of providing teachers with actionable feedback, Ohio's rubric looks downright insufficient compared to other frameworks like Charlotte Danielson's Framework for Teaching.
In short, OTES has been unfair and unsuccessful in fulfilling both of its intended purposes. Luckily, there's a light at the end of the tunnel: The Every Student Succeeds Act (ESSA) has removed federal requirements for states related to teacher evaluations. This makes the time ripe for Ohio to improve its teacher evaluation system. We believe that the best way to do this is to transform OTES into a system with one specific purpose: to give quality feedback to teachers to help them improve their craft.
A series of new recommendations from Ohio’s Educator Standards Board (ESB) contains some promising proposals that could accomplish this, including a recommendation to end Ohio’s various frameworks and weighting percentages by embedding student growth measures directly into a revised observational rubric. Ohio teachers would then have their summative rating calculated based only on a revised observation rubric rather than a combination of classroom observations and student growth components. Specifically, ESB recommends that five of OTES’ ten rubric domains incorporate student growth and achievement as evidence of a teacher mastering that domain. These domains include knowledge of students, differentiation, assessment of student learning, assessment data, and professional responsibility.
Not only would teachers be required to "use available high-quality data illustrating student growth and achievement as evidence for specific indicators in the OTES rubric," they would also be required to use these data "reflectively in instructional planning and in other applicable areas of the revised OTES rubric." This will go a long way toward convincing teachers that assessment data can help improve their practice rather than just unfairly "punish" them. Most importantly, though, it reflects a solid understanding of how good teachers already use assessments and data.
The only problem with this idea is that the ESB recommends including value-added measures based on state tests as part of the new system. State tests were neither designed nor intended to measure teacher effectiveness. So rather than carrying these assessments into a revised system, we propose that the role of state tests in teacher evaluations cease completely. Removing state tests from consideration and letting districts select formative and summative assessments with real classroom purposes is a far better way to fulfill the ESB's call to "promote the use of meaningful data by teachers and districts that reflects local needs and contexts."
As with many policy proposals, there are some implementation issues that could undermine the potential of this recommendation. The revision of the rubric—and how assessments are incorporated into it—will be hugely important. If the use of student achievement and growth becomes just one of many evidence boxes to check off rather than a deciding factor for both performance ratings and which professional development opportunities to explore, then the rubric won’t be honest and won’t lead to effective professional development.
To be clear, this suggestion isn't an attempt to roll back teacher accountability. Rather, it's an acknowledgement that Ohio's current system (before and during safe harbor) doesn't actually hold anyone accountable. Ohio doesn't have a statewide law that permits the dismissal of teachers based solely on teacher evaluation ratings, so even for the small number of teachers who are identified as ineffective, there aren't meaningful consequences. Moreover, the testing framework built specifically for OTES has created its own bureaucracy and helped feed the anti-testing backlash.
Data show that well-designed evaluation systems based solely on rigorous observations can impact the quality of the teacher workforce. By transforming OTES into a system that focuses on teacher development, we don't just get improvement for teachers and better learning experiences for kids; we could also end up effectively differentiating teachers without high-stakes testing. What's not to like about that?
 A separate ESB recommendation not explored in this piece advises that the OTES rubric be updated in collaboration with a national expert in rubric design and the assessment of teaching. This revision process is likely how student growth measures would be embedded into the rubric.
 The ESB notes that “ODE will establish high-quality criteria which all growth and achievement data must meet.”