Will NAEP wreck its reading assessment?

Chester E. Finn, Jr.

7.22.2020

At the heart of the National Assessment of Educational Progress (NAEP) are lengthy yet little known documents called “frameworks” that, for every subject NAEP touches, set forth what is to be assessed and how that’s to be done. Here’s how the National Center for Education Statistics (NCES) describes them:

Frameworks define the subject-specific content and thinking skills needed by students to deal with the complex issues they encounter in and out of the classroom. The NAEP frameworks are devised through a development process that ensures they meet current educational requirements. Assessments must be flexible and mirror changes in educational objectives and curricula. Therefore, the frameworks must be both forward-looking and responsive, balancing current teaching practices with research findings.

These are bulky documents—the current math framework runs to seventy-five pages, the U.S. history framework to sixty-five—and they’re always the product of much heavy lifting over several years by multiple committees, contractors, and reviewers before final adoption by the National Assessment Governing Board (NAGB).

NAEP is now fifty years old, and its subject frameworks periodically need revision as curricular emphases, pedagogical practices, and state standards evolve. But changing a NAEP assessment is harder than moving a cemetery. It takes years of lead time, costs lots of money, and requires endless palaver among people with divergent views of the subject. Remember, it’s a national assessment, yet (in some subjects) results are reported for every state and nearly thirty big districts, as well as both public and private schools. Since the same assessment will be taken by school kids in Oregon and Texas, in Cleveland and Miami-Dade, in Vermont and Wyoming, it’s no easy matter to reach agreement on what to test.

Moreover, changing a framework risks (in NAEP parlance) “breaking the trend line.” Inasmuch as NAEP is America’s most valued source of information about changes over time in student achievement, losing the trend line is tantamount to starting over. Yet if the new framework and the tests based upon it differ in big ways from their predecessors, that’s what usually happens. It’s akin to what happens every time the College Board “re-centers” the SAT or replaces an AP framework. They try hard to deploy fancy psychometric techniques to equate the scores and “bridge” the trend line, but that is not always possible and not always credible. (Think about it. If you test addition on three consecutive Fridays and subtraction on the next three Fridays, how do you determine whether a kid is better or worse at math in week five than in week two? You really can’t, not when what the tests are testing is so different.)

For all these reasons, NAGB doesn’t often replace its frameworks. But it’s currently in the middle of a humongous effort to do just that in the most core of all core subjects, namely reading. The reading framework dates to 2009, and the replacement effort aims to have a new one in place in time to guide the assessment in 2025 and thereafter. The replacement process commenced last year when NAGB charged newly constituted “Visioning and Development Panels” with recommending changes that would “maximize the value of NAEP to the nation” while also taking advantage of “the affordances [sic] of digital based assessment.”

We’re now at the stage where NAGB’s contractors and panels have presented a full draft of the proposed new reading framework (swollen to 149 pages), and public comment is invited through tomorrow.

Many, many comments have arrived (and more will in the coming hours), including my own gloomy assessment of what’s being proposed, which I, sorrowfully, provide here.

—

I’ve seen a lot of NAEP frameworks, revisions, and proposed revisions over the past several decades. This is, I believe, the first time I’ve ever seen one that leaves me feeling that the bad outweighs the good by a considerable margin.

To begin with a few key practicalities:

The changes proposed here are so extravagantly comprehensive that I’m certain their implementation would break the NAEP reading trend line. This is no time to do that, certainly not with Covid-caused school stoppages upon us and with ESSA just five years old and unlikely to be reauthorized for a number of years. (If Congress moves on that at the same speed it moved on NCLB, a new law might be signed in 2028 and, presumably, kick in a year or two later.) The current reading trend line (the one that incorporates accommodations) goes back to 1998, which means it spans the NCLB and ESSA eras. Going forward, it’s crucial that national, state, and TUDA performance in the ultimate core subject of reading stays on an unbroken line, not least because of the devolution of so many decisions to states that occurred with ESSA. How else will states know how they’re doing? How else will federal officials determine whether ESSA did kids (and which kids?) more good than NCLB?

The changes are so extravagantly comprehensive that implementing them would cost tons more money than continuing with the present assessment framework and test design. (Just think of the expanded samples needed to yield valid data for the subgroups of subgroups that are being urged.) NAEP’s budget is stretched so tight already that vital twelfth-grade state-level data on reading and math are missing and other key subjects cannot be assessed on a regular basis. When they are, it’s usually just a single grade and national-only. This is no time to burden the budget further! And because many of the changes proposed here will be hotly controversial in Congress and elsewhere, I see little prospect that they’ll lead to a budget increase. (The opposite is a lot more likely!) I’m sobered by the fact that the House Appropriations Committee recently rejected the administration’s request for an additional $28 million for NAEP, and I think NAGB needs to keep in mind that any changes that balloon the cost of one assessment will likely result in the bobtailing or elimination of others.

Many of the changes in the proposed reading framework are so extravagantly comprehensive (see, for example, “Shift #8”) that either all of NAEP must be changed to align with them (every subject, every grade level, etc.) or else reading will be analyzed and reported completely differently from the rest of NAEP. That’s deeply confusing for everyone and ultimately just unacceptable.

More cosmically, the developers of the present draft yearn for NAEP to be and do something more than it’s capable of and more than Congress ever assigned it to do. NAEP is more like a thermometer than a CT scan. It does one thing pretty well, which is to record the prowess of school children in large units and groups at handling the knowledge and skills they are expected to learn in key subjects. It does not explain why they’re doing that well or poorly. It’s not an experimental design, so it cannot account for causation and it cannot erase performance differences that exist, whatever the reasons may be. The most it can do is offer various correlations.

Of course children differ in their motivation, in their background knowledge, and in the opportunities they have had. At the micro-level, I see many such differences among my three grandchildren, notwithstanding that they share just about every “sociocultural” characteristic noted by the drafters. NAEP really can’t account for those things. It can, of course, distinguish among levels of kids’ reading prowess, and it can divide the student population into the kinds of “subgroups” that are common in federal statistical programs of all kinds and specified in ESSA. (ESSA’s nine subgroups: Economically disadvantaged students, Children with disabilities, English learners, African-American, American Indian/Alaska Native, Asian, Native Hawaiian/Other Pacific Islander, Hispanic or Latino, and White.) Reading is one of the subjects (for grades four and eight only) where NAEP can also “sort” students geographically, i.e., by state and TUDA district. And of course NAEP can look at student populations at various levels of performance (whether on the vertical scale or by achievement level) to see how many of which student populations are in those performance levels.

Within rather severe constraints, NAEP can also seek correlations with school and classroom characteristics, but for this it depends on teacher and principal surveys. It also gathers whatever background information can be furnished by participating student test-takers, but that’s often shaky. And some key correlate data are growing shakier, especially the SES information, as many schools now include all their students, regardless of income, in federal nutrition programs. But NAEP gets into trouble when it gets inquisitive about children’s home circumstances—privacy considerations, touchy parents, ill-informed kids—and it’s also gotten into trouble when it has attempted to “explain” too much that bears on complex societal issues and policy debates as opposed to simply reporting.

All that’s by way of saying that one of the framework developers’ key impulses is a truly worrying overreach for NAEP. I conclude by observing—with some regret—that, on balance, American education and America’s children would be better served by retaining the present reading framework.
