A new study by WestEd researchers examines the validity of ratings from the Charlotte Danielson Framework for Teaching, a widely used classroom observation instrument that often serves in teacher evaluation systems.
The study is small in scope, examining the framework’s use in just one district (Nevada’s Washoe County, which we profiled a few years ago for its work implementing the Common Core). Its purpose was to determine whether the ratings differentiate among teachers, measure distinct areas of teaching practice, and link to teacher effectiveness.
The data cover 713 Washoe elementary, middle, and high school teachers (both tenured and non-tenured) who were observed on all twenty-two components of the Danielson instrument in the 2012–13 school year. The instrument covers four domains: planning and preparation, classroom environment, instruction, and professional responsibilities. Each domain has five or six components that roll up into a single four-point rating for the domain (from ineffective to highly effective).
Key findings: On nearly every one of the twenty-two components, at least 90 percent of teachers were rated effective or highly effective, with “effective” the most common rating. In other words, principals use the ratings to discriminate between effective and highly effective teachers but rarely assign the minimally effective or ineffective ratings.
Researchers also found that, within each domain, principals were consistent in their scoring: teachers who received a high rating for one component tended to receive high ratings for the others in that domain as well. Confirmatory factor analysis further showed that the domains are not measuring different aspects of teaching but rather a single dimension; the researchers therefore recommend that the district use a single overall rating.
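To see what “a single dimension” means in practice, here is a minimal Python sketch. It is not the study’s analysis: the teacher count matches the sample, but the loadings, noise level, and cut points are invented for illustration. When one latent trait drives all twenty-two component ratings, the first eigenvalue of the inter-component correlation matrix carries most of the variance, a simpler version of the signal that a confirmatory factor analysis formalizes with fit statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 713 teachers whose 22 component ratings are all driven by a
# single latent "overall practice" trait plus noise, then discretized
# onto the four-point scale (1 = ineffective ... 4 = highly effective).
# All parameter values here are invented, not taken from the study.
n_teachers, n_components = 713, 22
latent = rng.normal(size=(n_teachers, 1))                 # one underlying trait
loadings = rng.uniform(0.6, 0.9, size=(1, n_components))  # hypothetical loadings
scores = latent @ loadings + rng.normal(scale=0.5, size=(n_teachers, n_components))
ratings = np.digitize(scores, [-1.5, 0.0, 1.5]) + 1       # bin onto the 1-4 scale

# If one dimension drives every component, the first eigenvalue of the
# inter-component correlation matrix dwarfs the rest.
corr = np.corrcoef(ratings, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(f"Share of variance on the first factor: {eigvals[0] / eigvals.sum():.0%}")
```

With data like these, a one-factor model reproduces the correlation structure about as well as a four-domain model, which is the pattern behind the single-rating recommendation.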
Finally, for a subset of teachers in grades 4–8, correlations showed a statistically significant positive relationship between observation scores and student growth (for students who had attended Nevada schools for at least one previous year). Few details are provided on these analyses, and the methods are neither rigorous nor longitudinal, but they did reveal a relationship.
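The relationship reported here is a simple correlation, which is easy to illustrate. Below is a minimal sketch on hypothetical data; the sample size, score distributions, and effect size are all invented, not drawn from the study.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

# Hypothetical grade 4-8 subset: observation scores on the four-point
# scale and a student-growth measure with a modest invented relationship.
obs_scores = rng.normal(loc=3.0, scale=0.4, size=200)
growth = 0.2 * obs_scores + rng.normal(scale=0.4, size=200)

# pearsonr returns the correlation coefficient and its p-value; a small
# p-value with positive r is the kind of result the study reports.
r, p_value = pearsonr(obs_scores, growth)
print(f"r = {r:.2f}, p = {p_value:.3f}")
```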
The district intends to use the ratings in its state teacher evaluation system to identify areas of needed professional development, as well as to determine performance bonuses and tenure and retention decisions, which is a lot to ask of one instrument. Checking whether the ratings are valid for all of these purposes is a good idea, and the results here suggest that the lengthy instrument could be made much simpler for rating purposes. But because principals don’t distinguish among components, the instrument is not a great tool for identifying professional development needs. And the outcome analysis is too thin to inspire much confidence in using the ratings for retention, termination, or pay decisions, at least not on the basis of this one study. Bottom line: Users beware.
SOURCE: Andrea Lash, Loan Tran, and Min Huang, "Examining the validity of ratings from a classroom observation instrument for use in a district’s teacher evaluation system," WestEd (May 2016).