Getting observations wrong: the Phil Jackson Fallacy

David Griffith

3.23.2015

Was Phil Jackson really a great coach? Despite his reputation as the Zen master of hoops, I’ve never been convinced. After all, Kobe, Shaq, and His Airness would have made any coach look like a genius, and there’s never been a natural experiment quantifying Jackson’s impact.

Inside the classroom, a similar question lingers. In a recent study of district evaluation systems, Grover Whitehurst, Matthew Chingos, and Katharine Lindquist found that teachers with high-performing students were far more likely to be rated highly by observers than those with low-performing students. Moreover, this pattern was not the result of better teachers being matched with better students. Rather, observers were biased towards teachers with higher-performing students—the Phil Jacksons of the teaching world.

As the authors of the study make clear, eliminating this bias by adjusting for student background characteristics is relatively straightforward. So why aren’t we doing this already? A few weeks ago, Luke Kohlmoos of the Tennessee Department of Education argued against such adjustments, suggesting they were a “disservice to students and teachers” that would take us back to the bad old days of lower expectations for black and brown students. According to Kohlmoos, if we “systematize” lower expectations through classroom observations, teachers and students will stoop to meet them.

Obviously, we don’t want “lower expectations” for teachers or students, but when it comes to adjusting observation scores, it’s worth asking how those expectations are communicated and whether they are really “lower” in any meaningful sense.

Start with teachers: Why would the expectation that they use effective instructional techniques and control their classrooms to the best of their ability change if observation scores were adjusted? After all, there would still be enough variation among teachers with similar students to distinguish between high and low performers, so there would still be an incentive for teachers to compete for high scores. In fact, for teachers with low-performing students, the incentive to teach well might increase if they had a realistic shot of being rated “highly effective.”

Now students: Does the average underserved tween track changes in teacher evaluation formulae? If not, then presumably any lower expectations would be mediated by teachers. But if teachers still have an incentive to perform well, why should they expect less of their students simply because their observation scores are adjusted? Or is the fact that some students are more challenging than others an official secret that has been kept from unsuspecting teachers? Somehow, I doubt this is the case.

Kohlmoos puts great faith in the ability of training and “norming” to improve the performance of observers and eliminate the bias toward teachers with stronger students; yet the evidence to date suggests that this faith is misplaced. Repeatedly and reliably disentangling the individual teacher from his or her classroom assignment is impossible, especially for observers who are unfamiliar with the students in question.

If we don’t adjust observation scores, we are asking observers to distinguish between a teacher’s own instructional skill and pre-existing student characteristics that may adversely affect the quality of classroom interactions. In other words, we are asking them to provide us with the observational equivalent of test-based value added by answering this question: How much is this teacher’s instructional skill contributing to the quality of the interactions I’m observing?

Given how little time observers spend in a given classroom, this is not a reasonable expectation. Moreover, it ignores the critical point that what constitutes effective teaching depends greatly on context. In particular, the relative importance of subject matter expertise and skilled classroom management to a teacher’s effectiveness varies with the type of students in his or her classroom.

To be clear, I don’t mean to suggest that teachers of low-performing students shouldn’t be experts in their subject matter, or that observers should use different rubrics for high- and low-performing classrooms. But I do think it’s impossible to design an observational rubric that doesn’t implicitly favor some teachers over others, and the only way to solve this problem is by adjusting observers’ scores. Failing to do so is unfair to teachers. Moreover, it creates a dangerous incentive for good teachers to leave tough classrooms, where teacher turnover is already damagingly high.

From a purely motivational standpoint, it’s important that we infuse teachers with the belief that they control what happens in their classroom; but policymakers shouldn’t fool themselves into thinking this is actually true. At the end of the day, every teacher knows some classrooms are more difficult to teach in than others, and we do a disservice to teachers by pretending otherwise.

Too often, people in good situations get disproportionate credit for their success, while talented people in bad situations labor in obscurity. Call it the Phil Jackson Fallacy! And don’t let it permanently distort our view of teachers.
