In the last decade, states have experimented with many new assessment systems in math and reading. A new Bellwether brief by Bonnie O’Keefe and Brandon Lewis examines recent innovations used or proposed by states that could serve educators well. The authors highlight the drawbacks of relying heavily for accountability on summative end-of-year exams in math and reading, which test accumulated knowledge. They welcome the shift toward using lower-stakes, shorter interim exams for accountability and the increasing state investments in formative assessments, which are designed to improve student learning rather than to gather school-wide data.
The authors focus on four trends among the states: a push to replace end-of-year exams with interim ones; increased investment in formative assessments; the creation of test item banks shared across states; and the development of science and social studies exams. For each trend, they identify the benefits and risks states face in applying it to their testing systems.
There are four main takeaways. First, states are developing adaptive interim exams to replace summative end-of-year exams. Computer-adaptive exams use algorithms to select each subsequent question based on a student’s previous responses, adjusting the difficulty to gather more precise data on that student’s skill level. End-of-year tests are usually summative and uniform, making them useful for evaluating school performance. A switch to adaptive interim exams for accountability would reclaim weeks of instructional time currently consumed by long testing periods and test prep, and would let parents and principals hold teachers accountable for student performance throughout the year. Furthermore, interim exams would give teachers timely feedback that they can use to improve instruction. O’Keefe and Lewis cite Nebraska as an example. The state is transitioning from using a summative end-of-year exam in grades three through eight to using adaptive interims combined with an adaptive end-of-year exam for accountability. It will then phase out end-of-year exams and rely solely on interim exams.
Second, the authors suggest that increasing investment in formative assessments could improve teachers’ instruction. Teachers usually make their own formative tests, such as reading quizzes, though they often invest in supplemental materials. State-level and regional consortia now offer coaching and resource materials to teachers who want to design or purchase better formative assessments. One example the brief cites is a consortium in Michigan that helps teachers develop assessments. But the authors warn that states following this trend may be distracted from funding effective accountability systems.
O’Keefe and Lewis overstate the case for increased investment in formative exams. To show that schools are interested in formative tests, they cite the fact that classroom assessment spending outpaced state-level mandatory test spending nationally by $0.3 billion in 2017. But that figure does not show these assessments serve students or teachers well; perhaps the classroom assessments simply have better marketing. While the authors admit that a flooded marketplace makes it hard for school districts to discern quality, they believe state-level recommendations would resolve the issue. The increases in funding and infrastructure this would require, however, would likely outweigh the benefits.
Third, shared-item test banks now supply states with quality exam questions and designs. These resources are more cost-effective than states individually developing tests from scratch and can supplement individual states’ test designs. The brief cites Michigan, which builds on Smarter Balanced math and reading content to develop its M-STEP assessment.
Finally, O’Keefe and Lewis acknowledge the lack of valid and reliable statewide assessments in science and social studies. For example, though many states have adopted the Next Generation Science Standards (NGSS), few have developed tests aligned to those standards. One exception they cite is the New Hampshire PACE exam. It combines locally designed, live-performance science tasks with traditional testing but, as with all performance assessments, brings with it the challenge of training teachers to assess tasks consistently across districts.
In social studies, however, there are fewer models to emulate. The brief notes that some legislatures adopted the U.S. citizenship test and faults them for settling for a low bar instead of developing a proper, state-standards-aligned assessment. But O’Keefe and Lewis also laud Louisiana’s evolving assessment system, which will soon incorporate social studies questions into its English exam. They end this section with an invitation to states to design more science and social studies assessments.
The brief’s point is clear: We have some reasons to be optimistic about the future of assessments. O’Keefe and Lewis are right to stress how the last decade’s innovations have offered educators and policymakers improved designs and quality data. And this year’s package of federal testing grants could produce long-awaited scalable science assessments and facilitate the incorporation of project-based learning. Shifting from end-of-year exams to interim ones could benefit students the most, because interim exams track progress and give teachers the flexibility to use ready-made assessments that are relevant to their curricula. But for most states, more investment in formative tests is a costly distraction from improving accountability systems. We have indeed gained quality data and test designs, but new data and designs risk overwhelming local school districts and confusing parents and students as they struggle to decipher their perennially changing tests.
SOURCE: Bonnie O’Keefe and Brandon Lewis, “The State of Assessment: A Look Forward on Innovation in State Testing Systems,” Bellwether Education Partners (July 2019).