Is PISA a Victim of Its Own Success? IES Head Calls for Change
The international assessment program faces two significant challenges
Data collection for the 2018 Program for International Student Assessment has just ended. When PISA scores are released later this year, they will attract attention across the globe. However, there are several flaws in PISA that need attending to. Getting PISA right is important because it is likely the world’s best-known education assessment.
Coordinated by the Organization for Economic Cooperation and Development, PISA has grown from testing students in 32 countries in the year 2000 to 80 countries and subnational "entities" (including some Chinese provinces) in 2018. Designed to monitor national education systems through regular assessments employing a common framework, PISA provides benchmarks against which countries can compare their own performance. The United States has supported PISA since its inception, but we in the U.S. Department of Education are now concerned that PISA is in danger of becoming a victim of its own success.
The United States sees two specific challenges potentially undermining the quality of PISA:
- First is PISA’s lack of investment in the research and development needed to ensure the quality of such a complex, wide-ranging, and high-profile assessment program.
- Second is PISA’s short, three-year testing cycle, which made sense in 2000 when PISA started but, given new testing technologies, is now expensive and unnecessary.
We believe that action is needed now to safeguard the future of PISA.
Every large-scale assessment program requires extensive, sustained research and development. For a program with the international visibility of PISA, R&D must be more than basic validation and testing; it should encompass a strategic approach that places R&D at the heart of future planning and creates an infrastructure for validating ideas.
Rather than recognizing how much R&D is needed to get things right, PISA has been trying to roll out a new "innovative" domain every three years. The sad lesson from PISA’s 2018 "global competence" assessment is that there is great risk in rushing to prepare an assessment for a brand-new construct every three years without extensive research conducted over a reasonable time period. Indeed, the global-competence assessment was of such low quality that 40 of the countries taking the digitally based version of PISA, the United States among them, simply refused to administer it.
A fully functioning R&D operation could help ensure quality by conducting the work needed to validate or modify proposed changes to PISA’s scale, scope, and design. It also could help turn investments in exploratory, one-time assessments of innovative domains into better-thought-out assessments. Clearly, this R&D function would need a large degree of independence from the OECD to resist the political and competitive pressures, including the lure of innovation, that now buffet PISA.
Development in PISA is not limited to the innovative domains. The three major domains of math, reading, and science literacy require constant research and updating. PISA essentially has two tasks: to measure evolving forms of literacy, which helps assess the mastery of skills needed to prosper in a changing world, and to produce trend data on skill levels over time. The first task requires an assessment framework that constantly evolves; the second requires reliable measurement over time. Without sufficient development work, PISA is in danger of losing the capacity to address either task adequately.
With proper R&D, PISA could measure trends against past performance, while letting the assessment framework evolve so that any changes in the nature of literacies needed could be taken into account. But meeting that challenge will require careful research and development conducted over more time than PISA now allows.
The OECD also needs to reconsider its three-year testing cycle. This cycle made sense in the early days of PISA when countries were first learning about where their education systems stood relative to others, when there were only three subjects tested, and, perhaps most critically, when accurately measuring all subjects was not feasible on every cycle.
Initially, PISA was designed to assess 15-year-old students in three core subjects (mathematics literacy, reading literacy, and scientific literacy), with one of the three administered as the "main" domain in each cycle and assessed much more accurately than the others. Because the main domain rotated from one three-year cycle to the next, in-depth data on student performance in any single domain could be reported only every nine years, which was deemed adequate for countries to monitor their education systems.
Assessment technologies have improved markedly since the launch of PISA in 2000. These improvements mean that PISA will be able to provide in-depth data for each of its core subjects after each administration, rather than waiting nine years.
Today, a five-year testing cycle would produce the information countries need almost twice as often as originally envisioned in PISA (in-depth data on every core subject every five years rather than every nine), but with a lower financial burden and less strain on each country’s capacity to administer the tests. Most importantly, a longer cycle would allow more resources to be devoted to test development and analysis rather than test administration, allow more time to ensure that PISA reporting and the assessments themselves adhere to the highest quality standards, and give the OECD more time to vet innovations.
The PISA program is currently spending too much time and too much money on secondary innovative domains and a three-year survey cycle. Those resources should be redirected into research that extends the existing PISA program to investigate system-level educational interventions that can increase our understanding of what policies are associated with improvements in learning outcomes.
OECD members and other PISA participants have long called for changes to safeguard and improve the quality of PISA. It is time for the OECD to acknowledge these calls and to recognize the need for a systematic reconsideration of PISA’s design, periodicity, and quality control.