Beyond the Lab: Melody Huang Builds Tools to Help Experiments Hold Up in the Real World

Authored by Rick Harrison
July 22, 2025

Melody Huang sits in her office in front of a fireplace

Melody Huang describes her research as “developing robust statistical methods to credibly estimate causal effects under real-world complications.”

What does that mean? We are glad you asked.

Huang just finished her first year as a fellow at the Institution for Social and Policy Studies, joining Yale as an assistant professor of political science and statistics & data science after a stint as a postdoctoral fellow at Harvard University. She received a Ph.D. in statistics at the University of California, Berkeley and a B.S. in mathematics and economics from the University of California, Los Angeles.

“I love being a part of ISPS,” she said. “I get to talk with so many interesting and smart people, colleagues doing super-interesting research that impacts how we think about American politics. I’m always getting inspired with new methodological ideas to address challenges we have.”

We sat down for a chat with her about her work and how it can improve the accuracy and practicality of social science.

ISPS: A lot of your research involves thinking about the external validity of experiments. In your papers, I often see references to something called an “overlap violation.” What does that mean?

Melody Huang: When conducting an experiment, we can estimate the average effect of the treatment we are studying with only a few assumptions. We can say the results are internally valid because we can tease out a causal effect within the context of the study. But when studying public policies, for example, we care about treatment effects not only inside the experimental construct but also outside that very controlled setting.

ISPS: Because we live in the real world with many more complicating factors, right?

MH: Yes. To make claims about whether we can generalize our results, we rely on certain assumptions. One assumption is overlap. What we mean by overlap is that the people in the population we want a public policy to affect are represented among the study subjects.

ISPS: They need someone from their demographic group?

MH: Yes, but it’s even more complicated. One challenge is that when you run an experiment, the sample tends to be biased toward people who decide to participate in it. That ends up excluding certain subsets of people.

ISPS: Such as people who don’t have the time, resources, or perhaps interest in signing up for a study.

MH: Yes. There are many reasons you might not get a representative sample of a population for any particular experiment.

ISPS: What’s a real-world example of an overlap violation?

MH: The state of Hawaii recently reached a record lawsuit settlement over a popular blood-thinning drug that was ineffective for people of East Asian and Pacific Island ancestry. This was an overlap violation. There was a significant subset of people not represented in the trial, and yet the company marketed the medication as safe and effective for everyone.

ISPS: And this can be a problem even for social science research?

MH: In an upcoming paper, I stress the importance of addressing these types of overlap violations for biomedical and social science. When we think about policy interventions or about, say, the effectiveness of campaign canvassing, we need to understand how people respond differently. We need to know how we can accurately represent the population we are focusing on. The paper is about these violations, whether you can detect them, and how we can accurately describe how large or small our treatment effect is across the target population.

ISPS: How does your paper help researchers address this problem and enhance external validity?

MH: We’ve developed a sensitivity analysis framework.

ISPS: How does it work?

MH: So, you know these overlap violations exist. But you cannot fix them because you have already run your experiment. Instead, we can vary how much overlap violation exists and see how much that affects the results. As we fail to capture more and more individuals from the target population, we see how this impacts the causal effect we measured. It’s like having two different dials. We can see how the causal effect changes as things go wrong, which allows researchers to calibrate how far they can generalize their results to the wider population.
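To make the "two dials" intuition concrete, here is a minimal, purely illustrative Python sketch. It is not Huang's actual framework, and every name and number in it is hypothetical: it simply treats the population-level effect as a weighted average of the effect estimated inside the experiment and an unknown effect in the unrepresented group, then turns both dials.

```python
# Illustrative only: a toy version of the "two dials" idea, not the actual
# sensitivity analysis framework described in the interview.

def population_effect(sample_effect, missing_effect, missing_share):
    """Population-level average effect when a `missing_share` fraction of the
    target population is unrepresented in the experiment and responds with an
    unknown effect `missing_effect`."""
    return (1 - missing_share) * sample_effect + missing_share * missing_effect

sample_ate = 2.0  # hypothetical effect estimated inside the experiment

# Dial 1: how much of the target population the experiment failed to capture.
# Dial 2: how the unrepresented group responds to the treatment.
for missing_share in (0.0, 0.1, 0.3, 0.5):
    for missing_effect in (2.0, 0.0, -2.0):
        pop_ate = population_effect(sample_ate, missing_effect, missing_share)
        print(f"missing share = {missing_share:.0%}, "
              f"effect in missing group = {missing_effect:+.1f} "
              f"-> population effect = {pop_ate:+.2f}")
```

Reading the output this way mirrors the calibration Huang describes: if the population-level effect only changes meaningfully under an implausibly large missing share, a researcher can feel more comfortable generalizing the result.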

ISPS: What might that look like from a researcher’s perspective?

MH: Maybe you find that only for a very large overlap violation would your result substantially change. You’d feel really good about that. On the other hand, if you find that a very weak overlap violation would overturn your result, maybe you should be cautious about saying you can generalize your impact.

ISPS: And then what?

MH: Then you can maybe rescope your target population. The idea is for us to help researchers think clearly about what subset of individuals this result can be applied to.

ISPS: What is design sensitivity? What research issue does it help to address?

MH: This is something people have thought about with respect to unobserved confounding.

ISPS: Unobserved? Does this involve observational studies, which don’t randomly assign a treatment to separate groups, right?

MH: Correct. To say anything about causation requires more assumptions.

ISPS: But it’s not impossible?

MH: Right. You can adjust for potential confounders and still estimate a causal effect. Confounders are variables that influence both the supposed cause and the supposed effect of what you are studying, possibly getting in the way of a clear understanding of what’s happening. The concern is that you haven’t accounted for all possible confounders. You want to perform your analysis in a way that minimizes sensitivity to unobserved confounders.

ISPS: Ah. This is what you mean by “design sensitivity.”

MH: In general, saying something is causal is a hard thing to do. It relies on assumptions. Many assumptions do not hold upon investigation. Sensitivity analysis asks: How much do they have to be violated to make a meaningful impact? Design sensitivity then in turn asks: How do we design our observational studies to minimize sensitivity to unmeasured confounding?

ISPS: What’s a real-life example of sensitivity analysis in action?

MH: The canonical example would be how, a long time ago, people were trying to figure out if smoking caused lung cancer. For ethical reasons, they could not randomly assign healthy people to a group that would start smoking. But there was observational data showing a relationship between people who smoked and lung cancer. A 1959 paper led by Jerome Cornfield showed that a hypothetical unobserved confounder would have had to be essentially untenably strong to undermine the conclusion that smoking caused lung cancer.
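To give a rough sense of what "untenably strong" means in that classic argument, here is a hedged sketch of the Cornfield-style condition; the numbers and names below are hypothetical illustrations, not figures from the 1959 paper. The simplified condition says an unmeasured binary confounder can fully explain an observed relative risk only if it is at least that many times more prevalent among the exposed than the unexposed.

```python
# Hypothetical illustration of a Cornfield-style sensitivity argument; the
# numbers are made up for exposition, not taken from the 1959 paper.

def ruled_out(observed_rr, confounder_prevalence_ratio):
    """Simplified Cornfield condition: an unmeasured binary confounder can
    fully explain an observed relative risk only if it is at least
    `observed_rr` times more prevalent among the exposed than the unexposed.
    If it falls short, it is ruled out as the sole explanation."""
    return confounder_prevalence_ratio < observed_rr

observed_rr = 9.0  # say smokers appear roughly 9x as likely to develop lung cancer

# Even a confounder three times more common among smokers cannot, on its own,
# account for an association of that size.
print(ruled_out(observed_rr, confounder_prevalence_ratio=3.0))  # True: ruled out
print(ruled_out(observed_rr, confounder_prevalence_ratio=9.0))  # False: not ruled out
```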

ISPS: You have a preprint paper evaluating the effectiveness of something called the Public Safety Assessment (PSA), an algorithmically generated instrument that provides judges with information about an arrestee’s risk of failing to appear at subsequent court dates, new criminal activity, and new violent criminal activity. Has anyone tested this system before?

MH: Believe it or not, no. But it’s widely used. The question the paper tries to answer is particularly relevant. We have these tools being deployed in courtrooms across the country. Are they actually helping us make better decisions? If not, why?

ISPS: You and your co-authors found that AI recommendations did not significantly enhance the judges’ decision-making accuracy and that AI-alone decisions often resulted in higher false positive rates. What do you think court systems should conclude?

MH: The motivation behind using these algorithmically generated risk scores is to improve human decision making. In general, we are not very good at many kinds of decisions. We tend to be biased. There have been studies showing that judges sentence convicted defendants differently before and after lunch. The hope is that, by using an AI, you can make decisions that are more objective.

ISPS: But this one didn’t seem to solve that problem. It made it worse.

MH: One problem is that it is hard to train these systems. We also cannot examine a counterfactual world: if a judge issues a cash bond for a defendant, we don’t know what would have happened if the defendant had simply been released on a signature bond. We are comparing judges’ performance on their own to a treatment arm in which they have access to the PSA. We can’t observe the counterfactual, but we can identify some differences. If the PSA were to hypothetically make all sentencing decisions on its own without a human in the loop, the differences between its performance and the judges’ performance are quite large. It would issue a cash bond for more people than necessary.

ISPS: So maybe courts should not be using AI to assist in decision-making?

MH: I don’t think the point of the paper is to not use AI — broadly speaking. But we need to be cautious when deploying these systems. We need to provide tools to evaluate how they are doing.

ISPS: Lots of folks in social sciences are inspired to learn something new about the way we live. Your work focuses on making those findings more reliable and useful. What inspired you to pursue methodology?

MH: I did my undergraduate degree in math and economics. I was always interested in social science problems, which are hard to study. People behave in seemingly unpredictable ways. Talking to different researchers and hearing about their methodological challenges made me interested in developing tools to help.

ISPS: Particularly involving the types of assumptions we were discussing earlier.

MH: Exactly. Social scientists do not want to make more assumptions than necessary. But they sometimes do not go as far as they could. I want to give people the tools to say, yes, we have these assumptions baked into the study, but maybe the result is not as unreliable as we might have thought.

ISPS: What do you think is a common difficulty people have with statistics?

MH: There is a notion out there that there are math people and not-math people. That makes me sad. I think maybe there are good math teachers and teachers who do not always teach math in ways that are accessible for their students. I’m hopeful that there will be more training for people to become more data literate.

ISPS: Statistical processes and conclusions can be counterintuitive. What can we do to help people better understand the ways we can use data to understand our world?

MH: I don’t think anyone should naively believe results they are presented. But there is often not an easy solution for making analysis both accessible and rigorous. I think it’s on the person presenting the results to explain the context. For teachers and researchers, we need to help people understand how we can even arrive at this result. Social scientists can’t avoid making assumptions. The best we can do is be transparent about it.

ISPS: Speaking generally, do you think social scientists are designing their studies properly?

MH: We are. Social science is hard. People behave in different ways. Things change over time. Across the field, we are improving our methods. There’s always room for more improvement, but that’s true about everything. And I’m hopeful more people are learning about the power and benefits of statistics. When applied appropriately, statistics provide you with a rich set of tools to answer questions people care about.

Area of study: Methodology