How Yale Students Are Using Data Science to Predict Election Results

By Rick Harrison
October 29, 2024


Nobody knows for sure who will win the presidential election on Nov. 5. But most professional pollsters agree on one conclusion: It will be close.

As the country awaits the final results, Yale’s Institution for Social and Policy Studies (ISPS) has challenged students to see whether they can compete with political operatives, media organizations, and each other in predicting vote share and turnout for the presidential, U.S. Senate, and U.S. House of Representatives elections.

“The stakes for this election are tremendously consequential,” said ISPS Director Alan Gerber, Sterling Professor of Political Science. “At the same time, it presents a unique opportunity for students to build up their data science skills and help them better understand the nuances of geographic patterns in American voter behavior.”

More than 70 undergraduate and graduate students signed up for the election prediction contest, sponsored by Democratic Innovations, an ISPS program that identifies and tests new ideas for improving the quality of democratic representation and governance. 

ISPS faculty fellow Josh Kalla supervises Democratic Innovations’ undergraduate research group and is coordinating the election contest. In addition to the vote tallies, students must predict when each state’s presidential winner will be declared.

“It’s great to see students from different academic backgrounds work together,” Kalla said. “There are political science students who have limited exposure to computer science. And computer and data science students who are learning that to build a good predictive model, you need a qualitative understanding of American politics.”

ISPS faculty fellow Josh Kalla kicks off the 2024 Election Prediction Competition, in which students attempt to forecast vote share and turnout for the presidential, U.S. Senate, and U.S. House of Representatives races.

Bilal Kharrat, a sophomore majoring in political science and history, has only recently begun taking a course in data science for political campaigns. He has been trying to identify voter trends, using the programming language Python to analyze the data and flag outliers that could help predict outcomes for specific demographic groups.

“I’ll be honest, I’m not going into this with hopes of creating a perfectly accurate model,” Kharrat said. “I’m trying to develop my technical skills and apply it to this prediction. I’m interested in the process.”
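Flagging outliers, as Kharrat describes, can be illustrated with a simple z-score check: compute how many standard deviations each value sits from the mean and flag the ones beyond a cutoff. This is a minimal sketch under stated assumptions, not his actual code: the function name `flag_outliers`, the 2-standard-deviation threshold, and the county turnout figures are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical, made-up data for illustration only (not real election data):
# change in turnout, in percentage points, for a handful of counties.
HYPOTHETICAL_TURNOUT_SHIFT = {
    "County A": 1.0,
    "County B": 1.2,
    "County C": 0.8,
    "County D": 1.1,
    "County E": 0.9,
    "County F": 1.0,
    "County G": 5.0,
}


def flag_outliers(values: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Return the keys whose value lies more than `threshold` sample
    standard deviations from the mean (a basic z-score rule).

    Needs at least two values; statistics.stdev raises otherwise.
    """
    mu = mean(values.values())
    sigma = stdev(values.values())
    if sigma == 0:
        # All values identical: nothing stands out.
        return []
    return [key for key, v in values.items() if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    print(flag_outliers(HYPOTHETICAL_TURNOUT_SHIFT))  # prints ['County G']
```

With small samples a single extreme value also inflates the standard deviation, so a fixed cutoff like 2.0 is a judgment call rather than a rule.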

Kyle Thomas Ramos, a political science major, is competing on a team with statistics and data science students.

“When they talk about different regression models that might be best, I’m not as privy to that knowledge,” he said. “It’s nice to learn from each other.”

Ramos said he appreciates being able to apply course content to the real world in a tactical way. He singled out Kalla’s class on data science in political campaigning and a graduate-level course on American political behavior; ISPS faculty fellow Christina Kinane’s introduction to American politics and her graduate seminar on policymaking under the separation of powers; ISPS faculty fellow Kevin DeLuca’s instruction on building models of the present from past information; and the skills in the programming language R he learned in ISPS faculty fellow Shiro Kuriwaki’s class on analyzing legislative politics.

“I have been a research assistant for Professor Kinane since my first semester,” Ramos said. “She has taught me how to collect, clean, and analyze data and how to navigate different databases that really help with this current assignment.”

Contest participants include students majoring in political science, computer science, statistics and data science, and more.

ISPS predoctoral fellows held workshops for students without coding experience. Some participants are focusing on current polling. Others are attempting to update past election results with current data on different demographic groups or other variables that could affect the outcome.

“You need to have some model in your head about what is happening now and what happened before,” Kalla said. “For example, you can start with the 2020 election. But the inflation rate is different. Maybe that will affect the results. How well might that have predicted results in the past?”

Participants can watch election-night news coverage at ISPS to track the results.

“For me, I want to see people think a little bit outside the box,” said Niklas Haehn, an ISPS predoctoral fellow helping to run the contest. “Maybe use some data source we never thought of or get some feeling of what is going on beyond the polls or what we can quantify in some spreadsheet.”

The contest winner will receive a modest prize. Haehn hopes ISPS can engage and grow the group of contest participants with a new challenge on a different topic in the spring.

“It’s really great how many people have been interested in this challenge,” Haehn said. “It is not part of any course curriculum. People are just interested in the competition.”

Area of study: Political Behavior