Establishing a baseline: Ohio’s education system as it enters a new era
____________
If you have questions about the book, please email Aaron Churchill.
NOTE: This is the Foreword from Fordham’s latest report, released today.
Over the past few years, states across the nation have undertaken big changes in public education—a system reboot, if you will. Policymakers have raised academic standards, toughened up exams, and demanded stronger results from schools. Like other states, Ohio has also put into place a standards and accountability framework with the clear goal of readying every student for college or career when she graduates high school.
It’s no secret that a flood of controversy has accompanied these changes. The Common Core, a set of college-and-career ready standards in math and English language arts, has been the subject of great debate. Yet the Common Core remains in place in Ohio and at least forty other states. States have also adopted next-generation assessments aligned to these standards, though the rollout of the new exams has been rocky. As a result of these transitions, Ohio policymakers have temporarily softened accountability and slowed the implementation of new school report cards.
Given the difficulty of these changes, one may ask why we conducted an overhaul in the first place. Why must states, including Ohio, see through the full and faithful implementation of educational change?
Some of the answer rests in the pages of this report. The statistics presented here bear out the stark reality that too many Ohio students have not been fully prepared for their next step after high school—whether college or career. Consider the following facts about Ohio students:
The achievement statistics for historically disadvantaged students are even bleaker. For example, while 43 percent of white students are proficient on national eighth-grade reading exams, one can say the same of just 16 percent of African-American pupils. State exams reveal similar achievement gaps between different groups of students, whether by family income, race or ethnicity, or disability.
The data in this report mark a starting point by which Ohio leaders can track our state’s progress going forward. Are more students hitting rigorous academic benchmarks as they proceed from Kindergarten to graduation? Are more students truly prepared for college or career when they leave high school? Only time will tell, of course, but the standards and accountability framework that has been implemented gives us confidence that strong educational gains will be made in the years to come.
It’s that time of year: Parents are perusing the back-to-school section with their perhaps not-so-eager-to-return-to-school children. Teachers, meanwhile, are gearing up for—or are already attending—in-service and professional development sessions that aim to prepare them for the year ahead. While studying class lists, decorating classrooms, and prepping lesson plans for a new year is exciting for teachers (trust me, walking into the teacher store before a new school year is just like coming downstairs on Christmas morning), the black cloud of professional development (PD) looms. And then it remains.
In a new report entitled The Mirage, TNTP (the nonprofit that brought us The Widget Effect) took a deep dive into teacher PD in three large traditional districts and one midsize charter network. The findings were not pleasant. In the traditional districts, an average of approximately $18,000 was spent on development per teacher per year—totaling anywhere from 5 to 11 percent of the districts’ annual operating budgets. Overall, district teachers spent about 10 percent of their typical school year in PD. Despite all that time, however, ratings showed that only three out of every ten teachers substantially improved their performance, based on the districts’ own evaluations. While beginning teachers improved the fastest—between 2.5 and 5 times faster than all other teachers—the average teacher grew less after her fifth year and practically not at all during and after her tenth (which raises the question of whether beginning teachers are growing thanks to PD or because of on-the-job experience). Furthermore, TNTP could find no type, amount, or combination of development activities that distinguished teachers who improve from those who don’t. The results out of the charter network were better (about seven out of ten teachers substantially improved), but the charter network also spent nearly twice as much as the traditional districts on PD—a whopping $33,000 per teacher.
Reaction to TNTP’s report has been swift. A headline from the Washington Post reads, “Study: Billions of dollars in annual teacher training is largely a waste.” Matt Barnum at the Seventy Four emphasized that teachers deserve training that helps them get better rather than wastes their time and the public’s money. While the study may be news, the conclusions—for those who have been paying attention—aren’t. A December 2014 study by the Boston Consulting Group examined teachers' views on professional development and found that only about 29 percent of teachers describe themselves as “highly satisfied.” The media has long lamented "useless" professional development. Reports, articles, and directories highlight the problems with PD and offer a multitude of suggestions. A March 2015 brief even called for Congress to redefine professional development and reengineer Title IIA money (approximately $2.5 billion a year that’s largely spent on PD and class size reduction) in order to focus on continuous performance improvement. According to the report’s author, teachers will perform better only when they “acquire the right knowledge and the right skills and have a chance to practice these new learnings, study the effects, and adjust accordingly.”
And there’s the rub. As a former teacher, I’m a firm believer that the lack of opportunity for teachers to practice what they’ve learned, reflect, and adjust with the help of coaches (who can be expert peers or school-designated coaches) is a significant reason why we keep failing the millions of teachers who want to get better. The Mirage recommends that teachers be given a clear, deep understanding of their own performance and progress. I agree. But that’s the purpose of evaluation, not professional development. PD should go a step further. It’s more than teachers working with coaches to review their current performance level—it’s teachers and coaches identifying strategies to raise that level, and putting those strategies into practice in the classroom. Coaches should be observing classrooms regularly, reflecting with each teacher, and suggesting adjustments or new strategies that are tailored to each teacher’s needs.
This idea comes to life in The Mirage’s examination of the charter network’s PD. Not only did seven out of ten teachers show substantial growth; that growth existed at all experience levels, newbies and veterans alike. Perhaps unsurprisingly, the students at the charter network were getting better results too. Those results existed for a variety of reasons, I’m sure, but I believe a big part of it was the regular practice and feedback cycle. TNTP found that every teacher in the charter network received a weekly observation from a coach, followed by a debrief lasting from thirty to forty-five minutes. These teachers also spent between two and three hours every week with other teachers reflecting on practices and outcomes, practicing new skills, and preparing for future adjustments. Now that’s development.[1]
In light of TNTP’s report, teacher coaching should, but likely won’t, catch fire. Yes, it is expensive (in TNTP’s report, the charter network spent almost twice as much as the districts, though most of this difference in spending is due to different allocations of time for staff and teachers rather than additional staff), but so is throwing money down a PD black hole that never shows results. And so is the price that kids pay when their teachers aren’t given the chance to get better. Anecdotally, the only PD I ever found worthwhile was that which offered me the opportunity to practice, reflect with either a coach or peer, and then adjust for the future. My days were incredibly busy, and if my coach hadn’t followed up with me on my goals and the strategies I planned to implement, I never would have implemented them—not because I didn’t want to, but because they would’ve taken a back seat to the dozens of other demands on my time. In addition, if my coach hadn’t observed my classroom regularly and pointed out habits that I didn’t realize I had, my growth would have been far slower. (After all, who among us is perfectly aware of their every strength and weakness?) This is particularly true because a coach doesn’t just point out strengths and weaknesses (which is all an evaluation does). A coach brainstorms solutions and strategies, then observes the classroom again to watch these solutions in action. If they work, move on to the next growth area. If they don’t work, try a different strategy.
There’s more than anecdotal evidence. Although TNTP found no link between certain types of development and teacher improvement, various other studies show that coaching improves student achievement in addition to improving teacher development and performance. A recent article from U.S. News & World Report explains that D.C. schools have started using teacher coaches who—surprise, surprise—aid in lesson planning, observe lessons in action, and help the teacher analyze what worked and didn’t work and how to improve. The masterminds behind the D.C. coaching program are clear about certain requirements (the coach should be an expert who provides ongoing, tailored support, for example), but other than that, the specifics can and should look different at every school.
Making PD better for teachers isn’t about giving them cutting-edge strategies or linking improvement to tangible rewards and consequences. Those are both well and good, but they miss the main point, which is that teachers learn just like their students. For PD to work, teachers must practice under the watchful eye of a coach who corrects and encourages, work with peers who point out additional strengths and weaknesses, execute what was learned, and then reflect and adjust. Professional development means developing teachers by coaching them through each step of the learning process until they’ve mastered the skill or concept. Don’t stop short, cross your fingers, and hope they figure it out on their own, which is exactly what happens when teachers are forced to “sit and get” in PD sessions and then walk away with no follow-up. We wouldn’t do that to our students. We shouldn’t do it to our teachers either.
[1] To be fair, TNTP did ask district teachers if they felt that coaching had improved their practice. Not many agreed. But TNTP also points out that meaningful PD “depends as much on the conditions in which development takes place as on the nature of the development itself.” In other words, the practice-feedback cycle of the charter network—the formal coaching from coaches and informal coaching from peers—might not be the same kind of coaching that the districts offer. As someone who experienced coaching in a district and in a charter network, I can attest to that.
There are two basic arguments for charter schools’ existence, note Michael McShane and Jenn Hatfield: First, by taking advantage of flexibility not afforded traditional public schools, they can raise student achievement. Second, they can use that freedom and deregulation to create a more diverse set of schools than might otherwise come into being. There is an increasingly robust body of evidence on charter schools’ academic performance. Far less is known about the second aspect. So how diverse is the nation’s charter sector?
The short answer is: more diverse than you might expect, but less than we might hope. McShane and Hatfield ran the numbers on 1,151 schools, which combine to educate nearly half a million students in seventeen different cities. Based on each school’s description of its own mission or model, the schools were divided into “general” or “specialized” schools. Within the latter category, schools were further divided into thirteen sub-types, including “no-excuses,” STEM, progressive, and single-sex schools. There’s an even split between general and specialized schools, with the most common specialized types being no-excuses and progressive.
The pair also found significant variation between cities. They contend that these distinctions are driven by demographics, the age and market share of each city’s charter sector, and (most interestingly) the number and type of authorizers. The higher the percentage of black residents living in a city, they found, the more students are enrolled in no-excuses schools. More poor residents tend to correlate with more specialized schools. McShane and Hatfield posit a “Maslow’s Hierarchy of Charter Schools” to explain their findings. “Academic achievement is often the primary concern for low-income communities; thus more no-excuses and STEM schools in poorer communities,” they write. “But in wealthier communities, families have the luxury of looking for specialized options such as international and foreign language schools.” Authorizers also “might be inclined to support established models” over programs that are more innovative but harder to implement (hence more KIPP). Or market diversity could simply be a function of maturity: The older a city’s charter sector—and the greater the number of authorizers—the more diverse the menu of options. “There is no ideal mix of schools for a given community,” McShane and Hatfield conclude, “or at least there is no ideal mix that can be determined by people outside of the community.” It’s not unreasonable, they observe, to make the diversity of charter offerings a “second-order concern behind school quality.” Yet as the authors note (nicely citing Fordham’s What Parents Want report), not every parent is a test score hawk. “If we require all schools to perform well across one set of metrics before we think about allowing for diversity,” McShane and Hatfield conclude, “we will most likely limit the amount of diversity that we will see.”
SOURCE: Michael Q. McShane and Jenn Hatfield, “Measuring diversity in charter school offerings,” American Enterprise Institute (July 2015).
When Governor Kasich signed the budget on June 30, two significant changes to Ohio’s assessment system became law. First, safe harbor was extended through the 2016–17 school year; second, PARCC ceased to be Ohio's state test. Soon after the ink was dry, the Ohio Department of Education (ODE) announced that the state would use tests developed in consultation with AIR for all subjects during the 2015–16 school year. (AIR provided Ohio’s science and social studies assessments in 2014–15 and also developed Ohio’s former tests—the OAA and OGT.)
Throughout the month of July, questions loomed surrounding what these tests would look like, how they would be administered, and when teachers and school leaders would receive preparation resources. Not all of those questions have been answered, but some have. Let’s take a look at what we know so far.
Test features
For many people, one of the most attractive aspects of the new ELA and math assessments is that they are shorter than PARCC tests. While PARCC tests are (depending on subject and grade level) around four or five hours each, the state tests that Ohio students will take this year will last approximately three hours for each subject. In addition, both the math and ELA tests will be divided into two parts, so districts can choose to administer each subject in a single three-hour sitting or in two ninety-minute sittings.
While Ohio will use a fixed-form test in 2015–16—the format used by PARCC and earlier with the OAAs and OGTs—an intriguing aspect of AIR is that it has experience developing adaptive assessments. AIR characterizes an adaptive test as one that adjusts to a student’s responses. Students who are doing well are given harder items, and students who are struggling are given easier items. (Other widely used K–12 tests, such as NWEA MAP and STAR, are also adaptive.) By adjusting the difficulty level of questions, the tests could be shorter and might ascertain students’ strengths and weaknesses with greater accuracy. This year, because of the extremely short turnaround time between budget passage and the start of the school year, Ohio has chosen not to pursue an adaptive assessment. However, officials from ODE have said that they are keeping adaptive testing in mind for future years. (To be clear, opting for a fixed-form test instead of an adaptive one doesn’t mean that all tests must be paper-and-pencil affairs; districts can give a fixed-form test online.)
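To make the mechanics concrete, here is a minimal sketch, in Python, of the kind of item-selection loop an adaptive test uses: after each response, the estimate of the student’s ability moves up or down, and the next question is chosen to match it. This is an illustration only, not AIR’s actual algorithm; the item bank, step size, and simulated student below are hypothetical values chosen to show the idea.

```python
import math
import random

# Toy sketch of an adaptive-testing loop. Not AIR's algorithm; all numbers are
# hypothetical and sit on an arbitrary difficulty/ability scale.

def pick_item(remaining, ability_estimate):
    """Choose the unused item whose difficulty is closest to the current estimate."""
    return min(remaining, key=lambda difficulty: abs(difficulty - ability_estimate))

def simulate_response(difficulty, true_ability):
    """A student answers correctly more often when the item sits below her true ability."""
    return random.random() < 1.0 / (1.0 + math.exp(difficulty - true_ability))

def run_adaptive_test(item_bank, true_ability, num_questions=5):
    ability_estimate = 0.0          # every student starts at an average estimate
    remaining = list(item_bank)
    for _ in range(num_questions):
        item = pick_item(remaining, ability_estimate)
        remaining.remove(item)
        correct = simulate_response(item, true_ability)
        # Harder items follow correct answers; easier items follow misses.
        ability_estimate += 0.5 if correct else -0.5
        print(f"difficulty {item:+.1f} -> {'correct' if correct else 'miss'}; "
              f"estimate is now {ability_estimate:+.1f}")
    return ability_estimate

# Hypothetical item bank, scaled from very easy (-2.0) to very hard (+2.0).
run_adaptive_test(item_bank=[-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0], true_ability=1.0)
```

Because each question is targeted near the student’s current estimate, a well-built adaptive test can get a precise read on a student with fewer items, which is why it can be both shorter and more informative than a fixed-form test.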
As for whether the tests will be administered online or in paper-and-pencil format, that’s up to districts (as it was last year). If districts administer tests online, the ELA and math tests will use the same testing platform as last spring’s science and social studies tests. This is likely music to the ears of districts that spent much of last year upset that students were forced to use two different platforms instead of one.
In terms of test preparation resources, blueprints and sample test items for the social studies and science tests are already available. ODE is in the process of creating testing blueprints for the math and ELA tests, and also plans to have them available by September. Sample test items will be available by October.
Testing windows
The new tests will be administered only once, at the end of the year (unlike PARCC’s mandate of two administrations, which caused some grief last year).[1] To provide flexibility, ODE has given districts the ability to select either the same dates for all tests or one set of dates for ELA and a different set for other content areas (math, science, and social studies). If districts do opt for two sets, the dates are allowed to overlap. Although ODE has blocked off long testing windows (in some cases up to thirty days) to give districts additional flexibility, districts are only able to select ten consecutive days for paper tests or fifteen consecutive days for online tests. They cannot use ODE’s entire testing window. Here’s what the windows look like:
[[{"fid":"114724","view_mode":"default","fields":{"format":"default"},"type":"media","attributes":{"style":"height: 325px; width: 500px;","class":"media-element file-default"},"link_text":null}]]
It’s important to remember that a testing window is just a time frame during which the test can take place. So even though the online testing window for ELA lasts from April 4 to April 29, that doesn’t mean a student is testing all day on each of those days. It means that schools have the choice to give the ELA test (which, remember, is only about three hours long) on any day within the allowed time frame and within the ten- or fifteen-day window they’ve selected for their districts. In theory, a seventh grader could take her online ELA test on April 25, her paper-and-pencil math test on May 2, and then be done with state testing for the year. A fourth grader, even though he must also take a social studies test, could have a similar schedule: social studies on April 18, ELA on April 25, and math on May 2. Three hours on each of those days (not the whole day) and he’s done.
Beyond granting districts flexibility, ODE also pushed the ELA testing window earlier because of the nature of those assessments. ELA tests include writing portions that must be graded by hand, which means they take longer to process. To comply with state law, which now requires that test results be returned within forty-five days or by June 30 (whichever comes first), the ELA window had to close before the math, science, and social studies window so that the ELA tests can be graded in time.
Test questions
Because of the short turnaround between the budget’s adoption and the start of the school year, ODE didn’t have the option to work with educators to create an Ohio test from scratch. As a result, this year’s tests will include questions from an item bank that has already been field-tested in other states. However, the items selected to appear on Ohio’s tests will be approved by review committees made up of Ohio educators. (The committees began meeting on August 17 and will continue meeting through September.) In other words, the online and paper tests administered to Buckeye students in the spring will be built by ODE and AIR from sets of questions approved by those review committees. (For more information, check out ODE’s road map of the process.)
In 2017, Ohio will begin field-testing questions written by Ohio educators. The goal is to develop item banks created by Ohio educators, from which future tests will be assembled. In the years to come, when ODE says these tests are Ohio-made, it will mean Ohio-made.
***
This year’s tests will undoubtedly encounter some bumps in the road. Luckily, because ODE has worked with AIR for many years—and because many districts and schools are already familiar with AIR’s online testing platform—the transition to the new state tests should be smoother than the one schools faced last year. But in order to earn the confidence of Buckeye families and educators, ODE must make sure that test preparation materials and resources, as well as guidance around accommodations, are timely and clear. Perhaps most importantly, a high bar for achievement must be set—questions that measure critical thinking (instead of test-bubbling ability) and rigorous cut scores are non-negotiable if ODE is committed to ending Ohio’s proficiency illusion.
[1] The only tests that will be given in the first semester of 2015–16 are those associated with third-grade reading (December) and those given to high school students on a block schedule.
Only math and reading teachers in grades 4–8 receive evaluations based on value-added test results. For all other teachers—80 percent of them in Ohio—it’s on to Plan B. To evaluate these teachers, schools are using alternative measures of student growth, which include vendor assessments (commercial, non-state exams) and student learning objectives (SLOs, or teacher-designed goals for learning). But how are these alternative measures being administered? What are their pros and cons? The research on this issue is terribly thin, but a new study from the Institute of Education Sciences casts an intriguing ray of light. Through in-depth interviews, the researchers elicited information on how eight mid-Atlantic districts (unnamed) are implementing alternative measures.
Here are the study’s four key takeaways: First, educators considered vendor assessments (with results analyzed through a form of value-added modeling) to be a fairer and more rigorous evaluation method than SLOs. Second, both alternative measures yielded greater variation in teacher performance than observational methods alone. Third, implementing SLOs in a consistent and rigorous manner was extremely difficult. In fact, the authors write, “All types of stakeholders expressed concern about the potential for some teachers to ‘game the system’ by setting easily attainable goals.” Fourth, the implementation of these alternative measures took a great deal of time and came at a financial cost. The costs related to time should be of particular concern to states wrestling with worries about over-testing. (In fact, the Ohio Department of Education recently suggested eliminating SLOs to reduce time spent on testing.) So do the benefits outweigh the disadvantages? The authors don’t render judgment, but they raise enough concerns to make this reader think that alternative measures (especially SLOs) may not be worth all the effort.
SOURCE: Moira McCullough et al., Alternative Student Growth Measures for Teacher Evaluation: Implementation Experiences of Early-Adopting Districts (Washington, D.C.: Institute of Education Sciences, July 2015).