MacMillan-CSAP Quantitative Research Methods Workshop: Margaret Roberts (UCSD), “Matching Methods for High-Dimensional Data with Applications to Text”
Margaret Roberts, Assistant Professor of Political Science, UC San Diego
Abstract: Matching is a popular technique for preprocessing observational data to facilitate causal inference and reduce model dependence by ensuring that treated and control units are balanced along pre-treatment covariates. While most applications of matching balance on a small number of covariates, we identify situations where matching with thousands of covariates may be desirable, such as causal inference where confounders are measured with text. With high-dimensional covariates, traditional matching methods are less effective and may be difficult or impossible to implement. We characterize the problem of matching in a high-dimensional context as a tradeoff between dimension reduction and imbalance bounding. We develop a new method called Topical Inverse Regression Matching (TIRM) that optimizes this tradeoff by including both a low-dimensional projection of covariates and information about the probability of treatment. We illustrate our approach by estimating the effect of censorship on the writing of Chinese bloggers, the effects of gender on citation counts in international relations, and the effects of targeted killings and capture by counterterrorists on the popularity of jihadist writings.
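Illustration: the basic idea of matching described in the abstract can be sketched in a few lines. This is a toy example only, not TIRM or the speaker's implementation. It assumes each unit has already been reduced to two numbers (a hypothetical topic proportion and a hypothetical estimated probability of treatment), pairs each treated unit with its nearest control, and compares covariate balance before and after matching. All data values and function names are made up for illustration.

```python
from math import dist

def nearest_neighbor_match(treated, controls):
    """For each treated unit, return the index of the closest control unit
    (Euclidean distance, matching with replacement)."""
    return [
        min(range(len(controls)), key=lambda j: dist(x, controls[j]))
        for x in treated
    ]

def mean_abs_diff(group_a, group_b):
    """Absolute difference in covariate means between two groups, per column."""
    k = len(group_a[0])
    return [
        abs(sum(r[i] for r in group_a) / len(group_a)
            - sum(r[i] for r in group_b) / len(group_b))
        for i in range(k)
    ]

# Hypothetical units: (topic proportion, estimated treatment probability).
treated = [(0.8, 0.7), (0.6, 0.6), (0.9, 0.8)]
controls = [(0.1, 0.2), (0.7, 0.65), (0.85, 0.75), (0.2, 0.1), (0.65, 0.6)]

matches = nearest_neighbor_match(treated, controls)
matched_controls = [controls[j] for j in matches]

before = mean_abs_diff(treated, controls)          # imbalance vs. all controls
after = mean_abs_diff(treated, matched_controls)   # imbalance vs. matched controls

print("matched control indices:", matches)
print("imbalance before matching:", before)
print("imbalance after matching:", after)
```

In this toy data the matched comparison group resembles the treated group much more closely on both columns than the full control pool does, which is the sense in which matching "balances" pre-treatment covariates.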
Speaker Bio: Margaret Roberts is an Assistant Professor in the Department of Political Science at the University of California, San Diego. Her research interests lie at the intersection of political methodology and the politics of information, with a specific focus on methods of automated content analysis and the politics of censorship in China. Her work has appeared in the American Journal of Political Science, American Political Science Review, Political Analysis, and Science.