For R & D to work in education, we must consistently secure funding from governments and philanthropies. That means presenting them with realistic, sensible ideas that can be adopted and implemented at a reasonable cost—both in money and teachers’ time. Fordham and CAP’s Moonshot for Kids competition yielded proposals for several such tools.
One of the goals of the Moonshot for Kids initiative that we at the Thomas B. Fordham Institute have been running along with our colleagues at the Center for American Progress is to make the benefits of research and development tangible. It’s one thing to say that “schools could benefit from more R & D.” That’s almost a platitude, like “fitness is good” or “climate change is bad.” But it’s also too nebulous and ethereal—appealing to good-government types, perhaps, but not concrete enough to actually get funded by governments or philanthropists. If that is to change, we need to make it real.
As I wrote earlier this month, persuading people of the value of R & D isn’t very challenging in the private sector, where venture capital firms exist to place bets on big ideas that might scale up in the real world and make all involved a ton of money. But given that K–12 education has been where so many VC-funded projects go to die, we have to make a public-goods argument, not just a get-rich pitch, if we want to unleash serious R & D investment for our schools.
We also need to be clear-eyed about what education R & D is and what it’s not. It’s not just any idea, effort, or initiative, however ingenious, that might improve our schools. There’s a whole world of “school improvement”—or “improvement science” if you like—that’s focused on helping educators identify solutions to particular problems, and figure out how to make them work in their contexts. And there’s lots of federal funding available—especially via Title I and its “school improvement” set-aside—to support such efforts, as well as any number of state-level and non-profit ventures.
Research and development, on the other hand, is about producing tools and technologies that can help educators be more effective or efficient.
And arguably, effectiveness and efficiency are two sides of the same coin. As my friend Michael Goldstein (education impresario and author of another perceptive post on tools for teachers) said to me via email recently, in K–12 education, “‘true cost’ includes teacher time, and it seems like lots of ‘research-tested’ ideas don’t stick because they are ‘expensive in teacher time.’” And thus if we want R & D efforts to lead to tools that are actually adopted, we need to avoid those that are expensive—that “take lots of teacher time, which most teachers do not want to spend.”
Now for the good news: Our Moonshot for Kids competition yielded several proposals for tools that could take work off teachers’ plates so they can be more impactful with the time available.
Take, for example, one of the competition winners, FineTune, with its proposal to automate a key part of the student writing process:
Learning to write a structured essay is much more about organizing one’s thoughts than it is finding the right words, and doing that on one’s own is a daunting task. There’s a reason most students dread writing. It’s what we at FineTune call the Blank-Page Problem. The challenge of getting started. Traditionally, for students, the solution to this problem has been to conference with one’s teacher because, ironically, doing so doesn’t involve writing. Instead, it entails the teacher asking pointed questions intended to elicit responses that move the student toward clarity of message (i.e., thesis) and flow of logic—that is, talking through one’s ideas until they achieve coherence. Clear thinking leads to clear writing. The larger problem—the one that has lacked a solution to date—is that the conference model doesn’t scale. Few, if any, teachers can devote the time required to meet with every student while managing other classroom responsibilities. But imagine if they could. Imagine if every student, regardless of ability or geography or socioeconomic status, could [via technology] derive the benefits of a conference before each writing assignment.
Or consider Bibliomatic, which wants to use Artificial Intelligence to develop speech recognition software to aid with early literacy. From its proposal:
Children need practice reading aloud to become fluent readers, but many don’t get the practice they need. Speech recognition for children could help solve this problem by allowing educational apps to “listen” to children read and give them helpful feedback. This isn’t happening yet because speech recognition for children is not yet accurate enough to be widely useable. Public and private investment could change that by accelerating the development of speech recognition for children. This proposal argues for an investment in public data sets and “common task”-style challenges with metrics for evaluating research. This approach has proven successful in adult speech recognition, where DARPA’s investment in early speech laid the groundwork for technological advances that eventually led to widespread adoption of consumer speech recognition like Siri and Alexa.
Wouldn’t it be great if bots could help tots learn to read, or teens learn to get their thoughts down on paper?
To be clear, like most white-collar workers, teachers aren’t going to be replaced by tech, not in my lifetime. But with assistance from the likes of FineTune and Bibliomatic, they would surely have more time for everything else that’s on their to-do list, including offering more personalized attention and instruction to the kids who need it most.
Those ideas—among many more—surfaced just through a pretty rudimentary think-tank contest with a grand prize of a measly $10,000. Think what would happen if billions of federal or philanthropic dollars were available for real-live education R & D.
Our schools need better tools. Investing in R & D may be the surest way to get them.
I haven’t yet got my hands on the much-discussed new book by Yuval Levin, one of the most thoughtful conservative public intellectuals and writers of our time (also editor of National Affairs and head of “social, cultural and constitutional studies” at the American Enterprise Institute). But A Time To Build holds messages for educators and ed reformers, as is evident from the author’s column in the Sunday New York Times, his presentation at the American Enterprise Institute, and Barton Swaim’s excellent review in the Wall Street Journal.
Levin explains why Americans have lost confidence in their institutions and what must be done to rekindle that confidence if we’re to “revive the American dream.” His analysis of the problem goes far beyond the conventional view that the loss of confidence that’s evident on so many fronts was the inevitable result of institutions failing to deliver, turning greedy and deceitful, and collapsing. “What stands out about our era in particular,” he writes, “is a…tendency to think of institutions not as molds of character and behavior but as platforms for performance and prominence.” In his view, well-functioning institutions mold those who work in them and for them, making those people more trustworthy for a society that needs more trust, even as those being molded cause the institutions themselves to be more effective in the work they do. Once that work is subordinated to the fame or prosperity of individuals, however, confidence in the institution falters and the dream dims.
You don’t have to look far to find innumerable examples of this, whether in government or the media, in corporations or labor unions, in Hollywood and big-time sports for sure, but even in museums and some religious groups. Instead of going about their essential work, carrying out their missions, and engaging their members in wholehearted commitment to them, almost every one of these institutions has evolved into a platform upon which individuals and their causes, their groups, and their “identities” can attract attention, gain celebrity, win influence, advance themselves, and perhaps get rich along the way.
Levin looks for exceptions—institutions that have retained their integrity and stuck to their knitting—but doesn’t find many. “The military,” he says, “is the most conspicuous exception and also the most unabashedly formative of our national institutions—molding men and women who clearly take a standard of behavior and responsibility seriously. And that can help us see what we might do to help alleviate the social crisis we confront.”
Which got me thinking about schools. What are these ubiquitous institutions “molding” nowadays? And what about their leaders and staff? To what extent are they, in Levin’s words, “letting the distinct integrities and purposes of those institutions shape [them], rather than just using them as stages from which to be seen and heard”?
There are bright spots in the K–12 realm: individual schools—district, charter, private—that are about incubating young people into adults “who clearly take a standard of behavior and responsibility seriously” and who are also literate and numerate. In these schools—I’ve visited awesome examples and you can read about more with the help of Hunter and Olson—team members are indeed shaped by and contribute to the “integrities and purposes” of the institutions to which they’re committed, not posturing and gesturing to make themselves more prominent. I’ve seen the same dynamic at work in small charter networks and school systems—generally the kind that serve as functional communities of which the schools are pillars, not only because they educate kids, but also because they employ one’s neighbors and often serve as community centers in their own right.
When you look at sprawling districts and big CMOs, however, the picture often changes. And when you look at state and national education leaders and their organizations, it changes even more. Identity politics, ideology, partisanship, and self-promotion rear their unlovely heads, and the institutions themselves begin to behave in ways that erode rather than build confidence. What else could one conclude from the shenanigans roiling the New York City school system under Messrs. de Blasio and Carranza? What else is one to think about the noisy turns toward “social justice” and woke-ness that we see in once-revered reformist outfits like KIPP and TFA? Or the identity politics—and craven board-level cowardice—that led to Steven Wilson’s recent ouster at Ascend? Is that sort of thing meant to foster confidence in the institutions themselves and in those who lead them? For that matter, organizations that strive to develop such leaders for the education realm often succumb to similar tendencies.
Neglect of the central mission of institutions, replaced by self-interest (and celebrity), can also be glimpsed in the wave of angry teacher strikes that have disrupted a host of cities and states in recent years. Nobody can argue against the right of loyal members of an institution to seek more generous compensation, but what does it say when they force that institution to cease functioning so as to add to their own pocketbooks—and, often, the celebrity of their leaders?
A happy counter-example popped up these past few weeks in response to John White’s announcement that he’s stepping down as Louisiana’s state superintendent. The many accolades showered upon him and his work were linked to his dogged and creative efforts to turn the public schools of the Bayou State into viable and effective institutions. It seems we’re still able to recognize and applaud institution builders. (Could it be because they’re now so rare?)
Current nationwide efforts to advance social-emotional learning and civics education are a mixed bag. On the plus side, they’re animated by a desire for schools to be places where children learn to “take a standard of behavior and responsibility seriously.” Yet they can slide into empty self-esteem, protest politics, and groupthink of the kind that suppresses true dialogue, disagreement, and the quest for compromise. They also provide high-visibility platforms for individuals who crave that sort of prominence.
Educators, ed reformers, and leaders of education-related organizations would do well to take Levin’s admonitions seriously:
All of us have roles to play in some institutions we care about, be they familial or communal, educational or professional, civic, political, cultural, or economic. Rebuilding trust in those institutions will require the people within them—that is, each of us—to be more trustworthy…. As a practical matter, this can mean forcing ourselves, in little moments of decision, to ask the great unasked question of our time: “Given my role in this institution, how should I behave?” That’s what people who take seriously an institution they’re involved with would ask. “As a president or a member of Congress, a teacher or a scientist, a lawyer or a doctor, a pastor or a member, a parent or a neighbor, what should I do here?”
The people you most respect these days probably ask that kind of question before they make important judgments. The people who drive you crazy, who you think are part of the problem, are most likely those who clearly fail to ask it when they should…. And asking such questions is one thing we all can do to take on the complicated social crisis we are living through and begin to rebuild the bonds of trust essential for a free society.
And now to read the book…
I owe my education career to reader’s workshop, the Teachers College Reading and Writing Project, and its founder Lucy Calkins. I started as a mid-career switcher with a two-year commitment to teach fifth grade in a South Bronx public school. Two things about my school are worth knowing: It was the lowest-performing school in New York City’s lowest-performing district. And we were devoted to Calkins’s Units of Study.
My initial response to the reading and writing “workshop model” Calkins helped make famous and ubiquitous was willing suspension of disbelief. To the degree I remembered learning to read at all, it had nothing in common with how I was expected to teach it. Next came frustration. My “TC” staff developer spoke in inscrutable koans, encouraging me to “be the author of your own teaching.” When I took that advice and gave explicit instruction, however, she shook her head and said, “That’s not teaching, that’s giving directions.” Frustration gave way to exasperation, then resistance, and finally hostility. I left the classroom determined to advocate for curriculum and instruction, thanks to Calkins and balanced literacy. My struggling fifth graders needed a lot of things, but not that.
This is all to say that I read the new report from Student Achievement Partners, “Comparing Reading Research to Program Design: An Examination of Teachers College Units of Study,” not as a neutral observer, but largely conversant with the many issues it surfaces and already a convert. Still, the report is staggering—as authoritative and thorough a dismantling as you’re likely to find of a curriculum that has been widely praised, implemented, and imitated. Well, not exactly a curriculum. As another TC staff developer insisted, “It’s not a curriculum, it’s a philosophy.” Either way, schools that are relying on the workshop model, particularly if they serve disadvantaged students and English language learners, should now feel obligated to explain why they continue to use it when the vast weight of evidence is so clearly arrayed against it.
The report begins gently enough. “The literacy expert reviewers were impressed by how beautifully crafted the Units of Study materials are.” Lessons are “charming, elegant, and highly respectful of teachers.” The reviewers, who include such bold-faced names in reading research as Tim Shanahan, Lilly Wong Fillmore, Marilyn Jager Adams, and Claude Goldenberg, agreed that the program is “organized above all on the value of loving to read and the encouragement of reading and writing as lifelong habits, both laudable and vital ambitions.”
There’s a “but” coming—lots of buts, actually—and they run for more than sixty pages across multiple dimensions of reading instruction: phonics and fluency; text complexity and language development; building background knowledge and vocabulary; English language learner supports. In none of them is Units of Study found to be anything but lacking.
The program gives insufficient time and attention to phonics skills and recommends teachers use the so-called “three-cueing system” (read: guessing) to help children get past unfamiliar words they’re unable to decode “in direct opposition to an enormous body of settled research.” There is “insufficient guidance” for teachers on how to use assessments to inform instruction. “This means any student who does not immediately master an aspect of foundational reading is at risk of never getting it.” The sternest criticism in the report is that Units of Study “fail(s) to systematically and concretely guide teachers to provide English learners (ELs) the supports they need to attain high levels of literacy development.”
Children who come to school already reading or primed to read “may integrate seamlessly into the routines of the Units of Study model and maintain a successful reading trajectory,” the report cautions. But that’s of little value to those who need additional support and instruction. “These students are not likely to get what they need from Units of Study to read, write, speak, and listen at grade level.”
The overarching conceit of the review’s process is to evaluate the program not by how well it’s aligned to standards, but by whether it encourages scientifically validated practice in reading instruction. This is a good and important lens. There may be disagreement over standards (and as a prominent advocate for Common Core, Student Achievement Partners would otherwise be vulnerable to conflict-of-interest charges), but the weight of scientific evidence is harder to challenge. Enlisting prominent outside experts to evaluate the program makes the critique stick and sting, and makes it harder to explain away.
Indeed, the review stands as a critique not just of Units of Study but of the workshop model, and of balanced literacy more broadly. “If you run a balanced literacy classroom that shares some aspects of Units of Study but not others,” the report notes, “it follows that some of the research findings in this report will apply and others may not.” That’s a good and scholarly caveat. But Units of Study is the most clearly articulated and prescriptive program of its type. It stands to reason that other flavors of balanced literacy that are even less well developed are equally or more deficient.
At present, there has been no response that I’m aware of from the Teachers College Reading and Writing Project; Heinemann, the publisher of Units of Study; or from Calkins herself. In a lengthy blog post late last year, written in response to the “phonics-centric people who are calling themselves ‘the science of reading,’” Calkins insisted that “no one interest group gets to own science.” Perhaps not. But what she really needs to own is a shovel, to dig Units of Study out from under the mountain of contrary scientific evidence it is now buried beneath.
Amid all of the hullabaloo over teacher evaluations, fewer states are now using test scores to assess the quality of their teacher workforce. Thankfully, intrepid researchers Tom Dee, Jessalynn James, and James Wyckoff ask a key question before tossing out the teacher-measurement baby with the bathwater: How has the District of Columbia’s evaluation system, IMPACT, evolved in recent years, and has this evolution continued to strengthen the teacher workforce there?
Dee and colleagues have been tracking the impact of IMPACT since not long after it began in 2009. This time they ask whether the key changes in “IMPACT 3.0” have been beneficial. These include reducing the percentage of the final rating attributed to individual value added from 50 to 35 percent; eliminating school-level value added; and allowing teachers in tested grades, just like those in non-tested grades, to choose a “teacher-selected assessment” to comprise part of their rating, rather than relying solely on scores from PARCC (D.C.’s “state” test). IMPACT 3.0 also introduced new performance-based career ladders that helped determine base pay increases, and initiated incentives to teach in the forty most demanding schools in the district.
Key to this study, IMPACT 3.0 also introduced higher performance standards for lower-performing teachers. Specifically, it added a new performance category, “Developing,” by dividing the existing Effective category in half, with the lower portion becoming the new category. (Evidence showed the prior Effective range reflected considerable variability.) As before, teachers with one or two consecutive Minimally Effective ratings were to be let go. But under IMPACT 3.0, teachers with three consecutive Developing ratings would also be let go. The analysts therefore created two data sets—one for teachers at the Minimally Effective/Developing threshold and another at the Developing/Effective threshold—and used an intent-to-treat regression discontinuity design to compare outcomes for teachers just below and just above those thresholds. The full study sample included over 17,000 teacher-by-year observations of teachers who received IMPACT ratings between 2010–11 and 2014–15.
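For readers unfamiliar with the method, the threshold comparison can be sketched as a simple regression discontinuity on simulated data. Everything below (the function, the bandwidth, the simulated retention probabilities) is illustrative only, not drawn from the study or its data; it shows the general shape of the estimator, not the authors’ actual specification:

```python
import numpy as np

def rd_estimate(score, outcome, cutoff=0.0, bandwidth=1.0):
    """Estimate the jump in `outcome` at `cutoff` with a local linear
    regression: an intercept, an indicator for being at or above the
    cutoff, and separate linear trends on each side of it."""
    mask = np.abs(score - cutoff) < bandwidth
    s, y = score[mask] - cutoff, outcome[mask]
    above = (s >= 0).astype(float)
    X = np.column_stack([np.ones_like(s), above, s, s * above])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # coefficient on the above-cutoff indicator

# Simulated data (NOT the IMPACT data): teachers just below the cutoff
# return at a rate roughly 11 points lower, mirroring the paper's headline.
rng = np.random.default_rng(0)
score = rng.uniform(-1.0, 1.0, 20_000)   # rating relative to the threshold
p_return = 0.70 + 0.11 * (score >= 0) + 0.05 * score
returned = (rng.uniform(size=score.size) < p_return).astype(float)
print(f"estimated jump in retention: {rd_estimate(score, returned):.2f}")
```

Because teachers cannot precisely manipulate which side of a rating cutoff they land on, those just below and just above it are comparable, so the jump in retention at the cutoff can be read as the causal effect of the rating itself.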
On the descriptive front, they find that under IMPACT 3.0, nearly 20 percent of all DCPS teachers leave each year and 44 percent leave over three years; but attrition among Effective (15 percent each year) and Highly Effective (10 percent each year) teachers is much lower. Among Developing, Minimally Effective, and Ineffective educators, one-year attrition is 26, 53, and 91 percent, respectively.
The key empirical finding is that teachers just below the Minimally Effective threshold are approximately 11 percentage points less likely to return the following year, an increase in attrition of approximately 40 percent, which suggests that IMPACT 3.0 is effective in inducing low-performing teachers to voluntarily exit. Teachers just below the Developing threshold who have two more years to earn an Effective rating or higher are 5 percentage points less likely to return the following year.
Recall that the Developing teachers include the band of teachers who were previously considered Effective, so the next question becomes: Did they really develop? Indeed they did. Analysts found that, among those who remained in DCPS, more than two-thirds (68 percent) of Developing teachers improved to Effective or Highly Effective two years later.
Teachers receive not only multiple observations but formal feedback and coaching following each one, as DCPS takes its responsibility to develop teachers, particularly low-performing ones, quite seriously.
Still, we know that the nation’s capital is unique in the various assets that it enjoys to make IMPACT work, such as sky-high spending and a deep pool of local talent. But it is encouraging that leadership continues to tweak the system in response to both teacher feedback and the results and challenges that IMPACT has experienced in the last decade. These continuous, thoughtful changes to the system have thus far resulted in sustained improvements in teacher effectiveness in the city. And that’s a very good thing for the kids who live there.
SOURCE: Thomas Dee, Jessalynn James, and Jim Wyckoff, “Is Effective Teacher Evaluation Sustainable? Evidence from DCPS,” Education Finance and Policy (November 26, 2019).
The American Recovery and Reinvestment Act of 2009 marked a massive federal investment in our schools, with more than $100 billion to shore up school systems in the face of the Great Recession. Along with that largesse came two grant programs meant to encourage reform with all of those resources: Race to the Top and School Improvement Grants (SIGs). While Race to the Top aimed to spur system innovation, SIG was intended to facilitate turnarounds of the nation’s lowest performing public schools. Congress appropriated $3.5 billion for the first cohort of SIGs and went on to authorize five subsequent cohorts, bringing the total investment in SIG to approximately $7 billion.
There is plenty of research that examines whether this hefty price tag was worth it. Most of it is mixed, and some of the findings are downright disappointing. Back in 2017, Andy Smarick called an IES report on SIG effects a “devastating” blow to Obama’s education legacy and noted that the report “delivered a crushing verdict: The program failed and failed badly.” On the other hand, studies that have focused on state or local results—such as this one on SIG in Ohio—have been more promising.
A new working paper from the Annenberg Institute at Brown University seeks to add to the discussion by offering the first comprehensive study of the longitudinal effects of SIG on school performance. The paper estimates SIG effects on student achievement and graduation rates for the first two program cohorts in four geographically diverse locations: two states, North Carolina and Washington, and two urban districts, San Francisco Unified School District and the pseudonymous “Beachfront County” Public Schools (the authors are still waiting for permission to use this district’s name). Sixty-six schools were awarded funding in the first cohort during the 2010–11 school year, and thirty-three schools were awarded funding in the second cohort the following year. The data span the 2007–08 school year through 2016–17 in order to include three years before the first cohort and three years after funding ceased. The researchers used state and district administrative datasets on student characteristics, state tests in both math and English language arts, graduation rates, and school contexts. They also controlled for changes in students’ demographic characteristics.
Results show gradually increasing positive effects during the intervention years in both math and ELA in grades three through eight. Effects were larger in the second and third year of the program than they were in the first. After SIG funding ended, positive effects began to decrease slightly, but were sustained in math through the third or fourth year post-policy (the sixth or seventh year after the school initially received the grant). Perhaps most significantly, effects on graduation were also positive: Four-year graduation rates steadily increased throughout the six- or seven-year period after the start of SIG interventions. Effects on students of color and low-income students were similar to overall effects and were sometimes slightly larger. Results across the four geographic locations were generally consistent but had differing magnitudes, a finding that’s in line with previous research indicating that variations could be a result of local design and implementation decisions.
The authors note that these findings suggest that SIG interventions could be one of the federal government’s most successful capacity-building investments for improving schools’ low performance. In fact, the researchers note that SIG effects on test scores in this analysis are “similar to the effects on student test scores estimated for the market-based reforms in New Orleans after Hurricane Katrina in 2005.” But the fact remains that other reputable and wide-ranging studies found far less positive results. As Chad Aldeman notes, it appears the best question regarding SIG effectiveness is “not ‘did SIG work?’ but rather ‘why did it produce results in some places and not others?’”
SOURCE: Min Sun, Alec Kennedy, Susanna Loeb, “The Longitudinal Effects of School Improvement Grants,” Annenberg Institute at Brown University (January 2020).
On this week’s podcast, Matthew Steinberg, associate professor of education policy at George Mason University, joins Mike Petrilli and David Griffith for the second installment of our Research Deep Dive series, this one focusing on school discipline reform.