From Big Data to Big Solutions: DISSC Empowers Social Scientists at Yale
Big data represents a big challenge for science.
Google’s cloud service defines big data as “extremely large and diverse collections of structured, unstructured, and semi-structured data that continues to grow exponentially over time,” adding that “these datasets are so huge and complex in volume, velocity, and variety, that traditional data management systems cannot store, process, and analyze them.”
And yet such huge repositories of constantly growing and evolving information offer unique, valuable opportunities to better understand our world and confront its many challenges.
In 2020, a committee composed of social scientists from across the university made a key recommendation: create a campus hub to administer and support the acquisition, security, and use of the new, often massive datasets currently transforming the work of social science disciplines such as political science, economics, psychology, and sociology.
With university infrastructure resources provided by the Office of the Provost and administrative support from the Institution for Social and Policy Studies (ISPS) and the Tobin Center for Economic Policy, Yale’s new Data-Intensive Social Science Center (DISSC) began to take shape in 2022 before opening for business last year.
Led by Ron Borzekowski, former director of economics for Amazon Web Services and former head of the federal Consumer Financial Protection Bureau’s Office of Research, the DISSC consists of three basic initiatives: supporting social science researchers, consolidating and improving access to strategic data assets, and building a community of social scientists across disciplines to learn from one another.
“The cutting edge of much social science research is now defined by accessing new data sources or matching sources in new ways,” Borzekowski said. “This type of work requires the coordination of several different entities, including negotiating data use agreements or buying the data. It also requires technical knowledge and computing infrastructure to use the data, which can be a burden on faculty members. We can do this faster and better.”
Borzekowski spent months meeting with faculty members to understand how they use large or novel datasets and what they need to better collaborate and advance their work. In partnership with ISPS, Tobin, and Yale University Library, he has designed a center that can connect Yale faculty, students, postgraduates, and staff to the resources they need at various points in the research cycle for conducting data-intensive work in the social sciences.
“Yale has many resources for faculty and students conducting social science research,” Borzekowski said. “But it can be tricky to find and access a specific dataset or source, sometimes requiring legal assistance to negotiate a data use agreement. DISSC can help you with your problem or connect you with someone who can.”
Toward this end, DISSC has launched a website to consolidate and simplify information so that users can more easily identify and access available resources across the university, such as real estate or voter datasets that the Yale libraries have licensed for broad use by individual researchers.
“As we build up the website, it will serve as a resource so that researchers can quickly and easily find what they need to accomplish their goals,” Borzekowski said.
The center has also begun to host seminars, workshops, and other events for researchers to share techniques and promote exchanges of ideas from individuals working in different disciplines. Events last semester included a panel discussion on advancing open scholarship at Yale and an information session about the Yale Federal Statistical Research Data Center that was led by Dr. Shirley H. Liu, an economist and administrator for the New York Federal Statistical Research Data Centers, Baruch Center for Economic Studies, and the U.S. Census Bureau.
Limor Peer, ISPS associate director for research and strategic initiatives, serves as a senior research support specialist for DISSC.
“One goal of DISSC is to convene people in various fields around a common interest,” Peer said. “To improve knowledge through an interdisciplinary exchange.”
DISSC has created an air-gapped computer, one not connected to the internet, so that researchers can adhere to security protocols in some data use agreements that require data to remain offline and access to the data to be strictly controlled.
In addition, DISSC is working with curators from several social science departments to launch a new scholarly data repository at Yale. The center has also negotiated the acquisition of a novel dataset exclusive to Yale that includes a substantial fraction of all U.S. credit and debit card transactions over several years, teamed with Yale’s ITS Department to build the processes necessary to support the development and launch of researcher-specific websites for sharing data, and arranged for the purchase of a powerful statistical software package for use across the university.
“ISPS is thrilled to partner with DISSC and excited about the growth and potential of its capacity to advance a core component of modern social science research,” said Alan Gerber, ISPS director and Sterling Professor of Political Science, who chaired the university-wide committee that led to the center’s creation. “Ron and his team are performing a crucial service that stands as a model for others to follow.”
Gerber and Tobin Center Faculty Director Steven Berry, David Swensen Professor of Economics, serve as faculty directors of the new center.
Joshua Kalla, ISPS faculty fellow and associate professor of political science, has been working with large data files containing information on registered voters from across the United States and welcomed DISSC’s efforts to build cloud-based software that makes it easier to employ and analyze the data.
“By giving me the tools I need, DISSC has allowed me to spend more time focusing on my research,” Kalla said.
Shiro Kuriwaki, ISPS faculty fellow and assistant professor of political science, has advised his students on original analyses they can produce using datasets obtained with DISSC’s help. Having joined Yale and ISPS in 2022, he expressed gratitude for the new center’s guidance.
“Querying large datasets is a routine part of the social scientist’s work, but we’re often left having to self-teach ourselves industry-standard tools that are proprietary and complex,” Kuriwaki said. “Being able to partner with DISSC staff through this process has been an exciting part of my time at Yale.”
DISSC has begun to hire staff to organize and support research and to align disparate efforts across the social sciences into a more effective, cohesive venture.
“I am tremendously grateful for the support we have received across campus,” Borzekowski said. “If we can have the information in one place, we can consult with any faculty member so they can get their work done.”