Clearinghouses in education are entities that review research studies, analyze the effects of the interventions studied, and provide ratings of those interventions. States, educators, and schools increasingly rely on such ratings as sources of fully vetted, high-quality curricula and educational programs, on the assumption that the ratings rest on deep analysis verifying whether those programs work as intended to help students learn and achieve. But how exactly do education clearinghouses work? Upon what evidence are programs and curricula rated? And how often do various analyses of the same curriculum agree? A new review aims to answer these questions and more.
A trio of researchers from George Washington University and Northwestern University initially collected a roster of forty-three education clearinghouses in the United States and the United Kingdom. To make the final cut, an entity had to conduct effectiveness analyses of education programs from pre-K to college, use a rating scheme of its own rather than one borrowed from another clearinghouse, and make its ratings publicly accessible via the internet. A total of twelve clearinghouses met the criteria, all based in the U.S. The researchers collected and coded data from the clearinghouses between June 2019 and August 2020, including the types of interventions evaluated, their rating processes, funding sources, and the standards of evidence used to assess both research designs and the outcomes achieved.
A bit more about the dozen clearinghouses: Four focused exclusively on education, while the other eight covered additional areas as well. Most focused only on student-centered programs, some on programs serving children and their families, and one on programs specifically for military families. The better-known clearinghouses on the list include the National Dropout Prevention Center, the Collaborative for Academic, Social, and Emotional Learning (CASEL), the What Works Clearinghouse (WWC), and the Best Evidence Encyclopedia (BEE). It may simply be a matter of timing that the extremely well-known EdReports appears on neither the researchers’ short list nor their long one (their initial search for clearinghouses stopped in 2016, just as EdReports was ramping up its work), but its prominence in the space since then should warrant its inclusion in any future research. Funding for the clearinghouses reviewed comes from combinations of government, university, and foundation sources and varies greatly across the entities. The U.S. Department of Education, for example, reported to the authors that it had spent a whopping $100 million-plus supporting WWC; many other clearinghouses indicated that they operated on a fraction of that amount.
Available resources, along with each entity’s chosen area of concentration, shape the work of the various clearinghouses, including what and how many education programs they analyze. To list just three diverse examples: WWC employs experts who proactively search for programs to review according to an extensive protocol; the Corporation for National and Community Service Evidence Exchange reviews only programs that it funds; and the Home Visiting Evidence of Effectiveness entity looks only at programs designed to boost school readiness. Program size, study design, and publication date also serve as criteria for inclusion in any given clearinghouse’s body of work.
Because the criteria used to evaluate programs differ greatly from clearinghouse to clearinghouse, the researchers worked hard to hammer out some useful comparisons. On the upside, all twelve clearinghouses cite randomized controlled trials (RCTs) as their preferred research design and give greater weight to RCT studies and their outcomes than to quasi-experimental designs. On the downside, the same type of non-RCT design that one clearinghouse readily accepts can be downgraded or even summarily rejected for analysis by another. Of the 1,359 educational curricula and interventions analyzed by these clearinghouses during the period of the study, 83 percent were assessed by just a single entity. Among those analyzed by more than one clearinghouse, ratings agreed for only about 30 percent of the programs. Thus there’s not much “inter-rater reliability.” With many of those concurring ratings being low ones, perhaps the most comforting takeaway is that the duds seem easy enough for clearinghouses to spot.
The last third of the report is a set of case studies covering all possible combinations of outcomes for programs rated by multiple clearinghouses: programs that earned maximal agreement in ratings across clearinghouses (both high ratings and low), modest agreement, modest disagreement, and maximal disagreement. The details of each are interesting, including the insistence of some clearinghouses on replication of findings, a state of affairs long lacking in education research, but the case of maximal disagreement is worth highlighting. Five clearinghouses reviewed research on the effectiveness of the well-known dropout prevention program Communities in Schools (CIS); it received one high rating, two middling ratings, and two low ratings. Among other factors, a lack of available RCT studies and inconclusive impact findings among non-RCT studies dragged the program down for four of the clearinghouses. The one clearinghouse that gave CIS an unqualified thumbs up appeared to have its own source of RCT studies (not identified or linked) presenting enough positive outcomes to rate the program as “promising.” In short, the “evidence base” behind different evaluations of the same program is wildly inconsistent for reasons unconnected to CIS itself. Spare a thought for any charter school or state education agency trying to make quality program or curricular choices in the face of these variables.
There are numerous limitations to this research, including its reliance on clearinghouses having fully stated all of their criteria for study selection and rating on their websites. As a result, the researchers’ call for more clarity overall is sound. On the simpler side, that could mean clearinghouses add detail to their ratings by explaining, for example, that no program can earn a “recommended” rating without an RCT less than five years old, no matter what other evidence is available. Or it could mean more financial support for replication. On the more complex side, perhaps a central authority is needed, either to police the varying evidence and ratings or simply to reduce the number of clearinghouses out there, which would also concentrate resources among fewer entities and allow more programs to be rated. As it stands now, clarity is clearly lacking in the clearinghouse space.
SOURCE: Mansi Wadhwa, Jingwen Zheng, and Thomas D. Cook, “How Consistent Are Meanings of ‘Evidence-Based’? A Comparative Review of 12 Clearinghouses that Rate the Effectiveness of Educational Programs,” published in an American Educational Research Association (AERA) journal (February 2023).