For the last twenty years, the world of public education has loved to debate the value of standardized testing and policies associated with the results. This debate has become an all-consuming distraction for some, as test scores are concise and easily digestible. Kids either pass or fail. Students in different demographic groups either meet standards or don’t. Little in public education could be clearer in the aggregate. Debates thrive on binary choices, which standardized tests provide. Either they’re good for purposes of accountability, or they’re bad for kids because they’ve narrowed the curriculum. They’re either good for shining a light on the gross inequities in our schools or bad because of all the statistical noise in the results. By engaging in this debate, we avoid the much more difficult questions of resource equity, the legacy of institutional racism, how to organize systems to promote principles of effective schools, the role of poverty, and issues of governance. Debates aren’t for explorations of the messy middle. As we recover from Covid, though, the messy middle is exactly where educators find themselves. A change in the terms of this debate would be a welcome effect of the pandemic-induced pause in standardized testing.
Standardized testing has been part of public education for nearly a century. From the first use of IQ tests to the redesign of school systems according to Taylorism, authorities have craved measurement. Assumptions about the utility of standardized tests are reasonably straightforward. Public education prepares young people to be productive citizens. Public dollars fund public education and must be accounted for. Standardized tests enable state and local authorities to determine the return on the investment of those dollars. It wasn’t until the advent of the standards movement in the late 1980s and early 1990s, and the resulting enactment of the No Child Left Behind Act in 2002, that state standardized testing gained its enormous foothold on K–12 public education. Test results have become the impetus for both increased investments in schools that struggle to meet state standards and public disgrace for those that fall short. While there have been recent efforts under the Every Student Succeeds Act to broaden the public’s conception of success through using multiple sources of data to gauge school effectiveness, standardized testing remains hegemonic in American public education. That hegemony rests on a set of flawed assumptions.
The first flawed assumption endemic to the standardized testing regime is one of high modernism. In a 2020 article in the Harvard Educational Review, Jack Schneider of the University of Massachusetts Lowell and Andrew Saultz of Pacific University describe how “the high modernist state … develops quantitative systems designed to measure performance. Such systems, by their nature, tend to ignore the nuances of reality on the ground.” Performance-management systems built on state tests dismiss the human element of change management. Policies based on them are grounded in a technocratic belief about how to improve teaching and learning. Their adherents’ lack of faith in the professionalism of educators and admiration of simple data over all else have not led to appreciably better results, at least as measured by the tests. Technocracy assumes that through performance-management systems that start with a standardized test, adults in schools will be spurred to action. Students take a test, scores come back, schools analyze the data to determine actions meant to improve results, new initiatives such as curriculum, technology, and interventions are funded, educators undergo professional development, the new initiative is implemented, students take the test again in the spring, and results are reviewed to determine what new actions should be taken. This pattern, repeated in school systems throughout the country ad nauseam, doesn’t take into account how adults actually learn new skills that will help them improve their practice.
The standardized-testing hegemony also assumes that test scores are valid measurements of student performance in English language arts and mathematics and that those are the two most important content areas for students to master in order to succeed in 21st-century America. Standardized tests can be useful as blunt instruments to help understand the progress, or lack thereof, that a school or system is making. They can also be the first step in a deeper inquiry into teacher effectiveness and student need. Their use within policies that affect the real life of schools, however, is specious, as they only account for one aspect of student learning. Moreover, some of those policies, such as teacher evaluations that heavily rely on scores and classifications of some schools as failing or succeeding, are simply not grounded in sound practice. State tests tell us something, but more about student demographics and factors external to a school than anything else.
The assumption that ELA and math are the most important content areas for students to master seems obvious. After all, we all need to read, write, and calculate in order to live in the world and be eligible for good-paying jobs. Yet there are other essential skills that are harder to measure in standardized ways. Critical thinking, problem solving, emotional intelligence, scientific literacy, and civics are just some of the domains that make a well-educated person. These domains are difficult to measure at scale, and definitions of them are likely to differ among districts within a state and throughout the country. Civic education in liberal Montgomery County, Maryland, where I was superintendent of schools, can lead to the Board of Education debating whether students should have excused absences for attending protests in neighboring D.C. Another board, even in a blue state like Maryland, may not agree. Efforts to address student social-emotional learning and emotional intelligence are increasing exponentially, yet measuring these competencies is a new and tenuous proposition, and there isn’t agreement throughout the country on what the terms even mean. Using them within a state, let alone as part of a federal accountability system, is fraught with difficulties. Thus, we’re left to succumb to the narrow dominance of ELA and math as indicators of success. Curriculum has been narrowed as a result, and an entire generation of students has come to believe that their value is reflected in their test scores.
We know the mixed results of the last twenty years of reform. We don’t, however, know the counterfactual. What would our schools look like today, and how well prepared would our children be, if we had focused our collective energies on what we know actually works to improve outcomes? Let’s say, for example, that the enormous effort put into convincing state legislators of the value of standardized testing had gone instead towards equitable funding formulas. Money matters in public education, especially for serving the most vulnerable students. Yet funding formulas still perpetuate gross inequities. Moreover, given what we know about the relationship between a family’s economic status and student achievement, what would have happened if the collective effort of reforms had been focused on addressing entrenched societal issues? If states and communities had invested in wraparound services, public transportation that enables the working poor to get to jobs, better housing, affordable preventative healthcare, and food security, we would expect our children to have better academic outcomes.
Within schools, what would have happened if all of the money and energy that’s gone into standardized testing had instead been invested in strengthening the profession and scaling what actually works to improve schools? We know that great schools have a foundation of internal accountability, constant and collaborative professional learning, strong family engagement, distributed leadership, and a rich curriculum (among other things). What if districts had been incentivized to organize change efforts around those elements, rather than focus solely on ELA and math achievement? Simply put, we don’t know what would have happened because such efforts have been in spite of national and state agendas, not because of them. We do know, however, that what we have invested in hasn’t brought us nearly the results that we need for our kids.
So, what can we do going forward? How might we ensure excellence, equity, and accountability without annual standardized tests? I want to be clear: I am not arguing for a collective dismissal of the use of data by educators at all levels, nor do I believe we should be shirking accountability. I actually think the opposite. We need more and better data to make decisions in schools, districts, and states, and we need better accountability metrics in order to weed out mediocrity. We must also continue to ensure that data can be disaggregated according to student demographics, given their correlation to school success and our moral imperative to address the needs of the most vulnerable. The Covid crisis has revealed to a larger public the gross inequities in our schools and the needs of our children, and it provides an opportunity to create a new baseline.
First, let’s use testing data as entry points to asking better questions. Start by ending the practice of giving annual tests to all kids in most grades. If the unit of change for improvement efforts is (or should be) the school, a sample of students can fulfill external accountability requirements while educators within schools, and with help from central office, focus on the needs of individual students and staff based on more authentic assessment of student learning. Many other nations use a sampling methodology rather than a census one. The National Assessment of Educational Progress, or NAEP, a no-stakes test considered the nation’s report card, is given to a sample of students, as is the Program for International Student Assessment, or PISA, and both are used to decry and celebrate performance. Sampling is an effective method to understand patterns and can be used as a launching pad to probe further into deficiencies and strengths. A sample of 3rd graders in every school in reading every year can be one part of a process to determine whether a school’s approach to literacy is effective. Start this spring, with a quick turnaround time for results, and then use those data at the local level to make decisions about interventions and supports. The standardized test in this scenario can allow a school and district to understand its status relative to a standard set by the district or state. A board and superintendent can then allocate the necessary resources to help the school improve its practice, without having to subject every student to sitting through a standardized test. I believe we should also test a sample of students in ELA and math in 5th and 8th grade. These tests would give a district’s leadership a sense of where the school stands in relation to similar schools and whether its leadership and improvement strategies are having the desired effect.
There would, of course, need to be a commensurate investment in formative assessments and professional learning to increase teachers’ knowledge and skills about assessment. But the return on that investment would be significant, as it would allow for teachers to more quickly address the individual needs of students. It would significantly diminish the amount of time and energy spent on preparing each child to pass a standardized test, which could then be put towards actually improving teacher practice that is more likely to lead to increased achievement.
In high schools we have enough measures to determine whether students are ready for college and careers. Industry certification tests, Advanced Placement and International Baccalaureate exams, the SAT and ACT, higher-level course-taking, writing samples, lab reports, community service, extracurricular participation, and acceptance into college without the need for remediation are just some of the indicators that can be used. Rather than spend time and energy on a standardized exam in 10th grade, focus on eliminating low-level courses and revising curriculum to be engaging, problem-based, and culturally relevant. Those steps actually have an impact on student performance. In the meantime, invest in developing teacher capacity to conduct and use formative assessments to adjust instruction.
Federal and state oversight of public schools through the high-modernist regime of standardized testing isn’t having the desired effect. If ever there were a time to reduce tests and help orient schools towards equitable instructional practices that actually increase student achievement, that time is now. Surely many will see this as a retreat from accountability and a dismissal of equity, as annual census testing is assumed to ensure that we know each child’s status and the gross inequities in our schools. But we also know it hasn’t worked for the last twenty years. While teachers, parents, communities, and schools have come to value the role of public education more than ever during the Covid crisis, we owe it to them to focus our collective efforts on what actually works, not on a theory of action that has been proved false.
Joshua P. Starr is chief executive officer of PDK International. Before that, he was superintendent of schools in Montgomery County, Maryland, and in Stamford, Connecticut.