Editor’s note: This is the first in a series of blog posts that will take a closer look at the findings and implications of Evaluating the Content and Quality of Next Generation Assessments, Fordham’s new first-of-its-kind report.
Debates about testing—and state tests in particular—have reached new levels of intensity and rancor. Though it affects only a fraction of the U.S. public school population, the opt-out movement reflects a troubling trend for state and district leaders who rely on the tests to monitor their efforts to prepare all students to successfully transition to higher education or the workforce.
The recently adopted Every Student Succeeds Act (ESSA), like its predecessor, the No Child Left Behind Act (NCLB), requires annual standardized testing in English language arts/literacy and mathematics in grades three through eight and once in high school. While ESSA contains some new flexibilities and even encourages the use of much more than test scores to evaluate performance, states will continue to use—and the public will continue to debate—state tests.
And that’s exactly why Fordham’s new study and the companion study by HumRRO are so important. In a time of opt-out initiatives and heated debate, state decision-makers need to know whether a given test is worth fighting for.
The new approach taken by these two studies gives decision-makers a better lens—a more substantive set of information than has been available to date—for making that determination.
State departments of education have long commissioned alignment studies to determine whether a given test was a legally defensible measure of their state standards. Typically, panels of educators were guided through an evaluation of the items on a test form, judging each item’s match to content standards and cognitive demand.
But these prior approaches were quite limited in scope and failed to address key attributes of the Common Core State Standards, such as coverage of the standards for mathematical practice and the balance of informational and literary texts in English language arts. By contrast, the Center for Assessment’s new methodology, specifically designed to evaluate fidelity to the CCSS, requires evaluators to focus on three questions that are—or should be—of central importance to decision-makers:
- Do items (test questions) truly require the student to provide evidence of the desired skill or knowledge?
Too often in the past, alignment studies were treated as a checklist that determined whether each standard was addressed by at least one item. But an item can ask about the topic or skill of a given standard without actually requiring the student to demonstrate that knowledge or skill. Many of us have taken tests in which there was a high likelihood that we could answer items correctly through guessing, a superficial review of the item, or prior knowledge unrelated to the desired skill.[1]
This study, in contrast, asked evaluators to determine carefully what each item asked the student to do and whether a correct response required use of the desired skill or knowledge. For example, when evaluating a question that, according to the vendor, assessed criterion B.3.1 (“Item requires close reading and analysis”), reviewers needed to determine whether the question actually required analysis, as opposed to a superficial reading and restating of the text.
- Does the test, overall, place appropriate emphasis on the priority knowledge and skills of the grade?
Not all grade-level standards are of equal importance, and a dominant design feature of the Common Core was the intentional winnowing of the number of topics to be addressed at each grade level. This was done to overcome the “mile wide and inch deep” curriculum problems that had plagued schools. Fewer topics per grade would presumably allow more instructional time on the essential building blocks of the grade and, in turn, deeper understanding and mastery.
After analyzing each item, the panels were asked to determine whether each form placed adequate emphasis on the priority skills and knowledge of the grade. In mathematics, those priorities are referred to as the Major Work of the grade. In English language arts/literacy, the priorities are the same across grades and involve increasing levels of skill in the close reading of high-quality, complex texts (both literary and informational), use of evidence from texts, writing to sources, academic vocabulary, language conventions, and research skills.
- Does each test form assess higher order thinking skills, as called for by the standards of that grade?
This is an area in which state tests have not had a good track record of late, even though earlier alignment methodologies called for its evaluation. Even among the best of the state tests administered between 2000 and 2010, only about 2 percent of math items and 21 percent of ELA items assessed higher order thinking skills (Depth of Knowledge levels 3 and 4).[2] Yet some of these higher order skills, such as problem solving, are among those most highly rated by employers and colleges.[3] Either the alignment studies commissioned by states in recent years did not give sufficient emphasis to these skills, or decision-makers failed to put sufficient weight on that portion of the results.
Part of the problem in the past may have been that decision-makers were not given clear, quantitative criteria for distinguishing a “well aligned” test from a moderately or weakly aligned one. The CCSSO Criteria put forward tentative cutoffs for panels to consider when developing the final ratings of Excellent Match, Good Match, Limited/Uneven Match, and Weak Match. Our panels had concerns with some of these cutoffs and recommended changes, but they are an important starting point. I remember all too well, while serving on a state board of education in the late 1990s, being presented with the results of an alignment study showing that fewer than 60 percent of items were “well aligned” to the standard they were intended to measure. No criteria existed for interpreting those results, and the test was recommended for adoption.
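To make the value of explicit cutoffs concrete, here is a minimal sketch in Python of how a rubric of this kind might translate the share of well-aligned items on a test form into the four match categories. The thresholds shown are hypothetical placeholders for illustration only, not the CCSSO Criteria’s actual cutoffs.

```python
# Hypothetical illustration: mapping the percentage of well-aligned items on a
# test form to the four CCSSO-style rating categories. The thresholds below
# are invented for demonstration; they are NOT the CCSSO Criteria's cutoffs.

def rate_alignment(percent_well_aligned: float) -> str:
    """Return a match rating for a form, given its percent of well-aligned items."""
    if not 0 <= percent_well_aligned <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent_well_aligned >= 90:   # hypothetical cutoff
        return "Excellent Match"
    if percent_well_aligned >= 75:   # hypothetical cutoff
        return "Good Match"
    if percent_well_aligned >= 60:   # hypothetical cutoff
        return "Limited/Uneven Match"
    return "Weak Match"

# The late-1990s anecdote above: with no agreed cutoffs, "fewer than 60
# percent well aligned" was uninterpretable. Under explicit criteria like
# these, it would have been flagged immediately.
print(rate_alignment(59))  # -> Weak Match
```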
To be sure, this deeper look at the quality of a state test—the degree to which it requires students to demonstrate mastery of high-priority skills and knowledge, higher order thinking, and communication skills—is a labor-intensive task. Important work remains to be done to make the methodology more feasible for individual states to implement.
That said, its use will be worth an investment greater than that of past approaches, particularly in light of the systemic costs associated with tests that fail to focus on the right stuff, sending false signals to students, parents, educational leaders, and taxpayers about the academic preparedness of our K–12 students.
Just as the Common Core and similar college and career readiness standards have set a new, higher bar for our students, we hope decision-makers will take advantage of this study and its new methodology to set a higher bar for the tests states use to measure their progress.
[1] Prior knowledge is most often an issue within reading items. For example, if the item is being used to determine whether students can locate and cite evidence within a text for a particular claim, but the student can provide a correct answer solely through prior knowledge of the topic, then the desired skill has not been measured.
[2] K. Yuan and V. Le, Estimating the Percentage of Students Who Were Tested on Cognitively Demanding Items Through State Achievement Tests (RAND Corporation, 2012).
[3] P. Kyllonen, Measurement of 21st Century Skills Within the Common Core State Standards (Educational Testing Service, 2012).
Nancy Doorey co-authored Evaluating the Content and Quality of Next Generation Assessments. She has been deeply involved in educational reform for more than thirty years, serving as a teacher, state and district policymaker, program director, and consultant in the areas of assessment, teacher quality, and leadership.