Why What Works in One Place Doesn’t Always Work Elsewhere: Rethinking External Validity

A team of researchers had just shown that a simple intervention could reduce violence against women in one country. Soon after, government officials from another region reached out with a practical question: How should we do this so it works here?
Anna Wilke, an assistant professor of political science at New York University and one of the study’s authors, gave an answer that captured a complication at the heart of social science: they should run another experiment in their own country.
Last month, Wilke joined experts in statistics, data science, and the social sciences at Yale’s Institution for Social and Policy Studies for a conference exploring the concept of external validity. They acknowledged that science has grown adept at determining what works in a given setting. But as they delved into today’s most advanced methods, participants grappled with a harder question: Will findings travel to other places and contexts?
“External validity isn’t just a statistical issue; it’s a question that runs through how we design studies, interpret results, and make decisions in the real world,” said Melody Huang, an assistant professor of political science and of statistics and data science and a faculty fellow at ISPS, who organized the conference. “This is one of the central challenges of modern science.”
Huang compared the need for external validity in social science to the same need in medicine.
“If a medication is shown to work for adult men, can we be sure it works for women?” Huang said. “Medical studies have shown that is not always the case. And we need to think about the same level of cross-population, cross-location uncertainty in social science.”
Hongseok Namkoong of Columbia Business School argued that modern machine learning changes what uncertainty means and that generalization may come less from theory and more from exposure to diverse data.
“Uncertainty is not about hidden variables,” Namkoong said. “It’s about the fact that you have not yet seen the observations to come.”
Michael Findley of the University of Texas at Austin stressed that policymakers want guidance, not theory.
“They actually want to learn from this exercise,” Findley said. “They want to know: What’s the next practical step?”
He said the evidence points to no simple way to combine studies across contexts, and that researchers must always balance the relevance of local evidence against the stability of aggregate evidence, borrowing from both.
Cyrus Samii of New York University reframed generalization as a problem of model selection. Simple averages can hide real effects, he said, pointing to a multi-country experiment in which the average effects were not statistically significant, but a deeper analysis revealed structured patterns of behavior.
He argued that knowledge that travels is not a number but a pattern that remains stable across contexts.
“A richer model tells us more about the world,” Samii said. “The question we must answer is whether its claims hold across populations.”
Xinran Miao of the University of Wisconsin–Madison called on researchers to measure how fragile their conclusions are. She introduced a tool that gauges sensitivity by how much conclusions change when assumptions turn out to be slightly wrong.
Some studies are inherently more reliable than others, Miao said. Robustness depends on data structure, assumptions, and the precise target of statistical analysis.
Recounting her interactions with government officials, Wilke described a fundamental disconnect in how researchers and policymakers approach scientific findings.
“Researchers care about generalization,” she said. “Policymakers care about action.”
She also described a critical tension between scientists and the public.
“Communicating uncertainty without losing all trust is very hard,” Wilke said.