|
I am an Assistant Professor of Political Science at Northeastern University, and a core faculty member of both the NULab for Text, Maps and Networks and the Network Science Institute. My research examines how political opinions form and change as a result of discussion, deliberation, and argument in domains such as legislatures, campaigns, social media, and the judiciary, using techniques from machine learning, automated text analysis, and social network analysis.
|
Research
Research Interests
American Politics: Political Behavior, Political Communication, Political Psychology, Congress, Online and Social Networks
Political Methodology: Quantitative Text Analysis, Machine Learning, Bayesian Methods, Networks, Agent-based Models
Publications
Journals
"Why Keep Arguing? Predicting Engagement in Political Conversations Online," Sage Open Forthcoming, 2019. (with Sarah Shugars)
Individuals acquire more and more of their political information from social media, and ever more of their online time is spent in interpersonal, peer-to-peer communication and conversation. Yet many of these conversations can be either acrimoniously unpleasant or pleasantly uninformative. Why do we seek out and engage in these interactions? Whom do people choose to argue with, and what brings them back to repeated exchanges? In short, why do people bother arguing online? We develop a model of argument engagement using a new dataset of Twitter conversations about President Trump. The model incorporates numerous user, tweet, and thread-level features to predict user participation in conversations with over 98% accuracy. We find that users are likely to argue over wide ideological divides, and are increasingly likely to engage with those who are different from themselves. Additionally, we find that the emotional content of a tweet has important implications for user engagement, with negative and unpleasant tweets more likely to spark sustained participation. Though often negative, these extended discussions can bridge political differences and reduce information bubbles. This suggests a public appetite for engaging in prolonged political discussions that are more than just partisan potshots or trolling, and our results suggest a variety of strategies for extending and enriching these interactions.
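For intuition, here is a minimal sketch of the kind of supervised setup the abstract describes: a classifier over user-, tweet-, and thread-level features predicting whether a user engages. All feature names and data below are synthetic stand-ins, not the paper's actual variables, features, or dataset.

```python
# Illustrative sketch (not the paper's code): predict whether a user
# replies in a thread from user-, tweet-, and thread-level features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.normal(size=n),          # ideological distance between users (toy)
    rng.normal(size=n),          # tweet negativity / sentiment score (toy)
    rng.poisson(3, size=n),      # thread length so far (toy)
    rng.integers(0, 2, size=n),  # prior interaction between the pair (toy)
])
# Synthetic outcome: engagement rises with distance and negativity.
p = 1 / (1 + np.exp(-(0.8 * X[:, 0] + 1.2 * X[:, 1] + 0.3 * X[:, 3])))
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```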
"Microblog Conversation Recommendation via Joint Modeling of Topics and Discourse," Proceedings of the North American Association for Computational Linguistics 2018. (with Xingshan Zeng, Jing Li, Lu Wang, Sarah Shugars, Kam-Fai Wong)
Millions of conversations are generated every day on social media platforms. With limited attention, it is challenging for users to select which discussions they would like to participate in. Here we propose a new method for microblog conversation recommendation. While much prior work has focused on post-level recommendation, we exploit both the conversational context and users' content and behavior preferences. We propose a statistical model that jointly captures: (1) topics for representing user interests and conversation content, and (2) discourse modes for describing user replying behavior and conversation dynamics. Experimental results on two Twitter datasets demonstrate that our system outperforms methods that only model content without considering discourse.
"Winning on the Merits: The Joint Effects of Content and Style on Debate Outcomes," Transactions of the Association for Computational Linguistics 2018 (with Lu Wang, Sarah Shugars, and Kechen Qin)
Debate and deliberation play essential roles in politics and government, but most models of debate presume that debates are won mainly via superior style or agenda control. Ideally, however, debates would be won on the merits, as a function of which side has the stronger arguments. We propose a predictive model of debate that estimates both the effects of linguistic features and the latent persuasive strengths of different topics, as well as the interactions between the two. Using a dataset of 118 Oxford-style debates, our model's combination of content (as latent topics) and style (as linguistic features) allows us to predict audience-adjudicated winners with 74% accuracy, significantly outperforming linguistic features alone (66%). Our model finds that winning sides employ stronger arguments, and allows us to identify the linguistic features associated with strong or weak arguments.
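As a rough illustration of the content-plus-style idea (a sketch under toy assumptions, not the paper's model, which estimates latent topic strengths and their interactions with style jointly): combine LDA topic proportions with simple surface features in a single classifier.

```python
# Illustrative sketch only: "content" (latent topics) plus "style"
# (surface linguistic features) predicting debate winners. The texts,
# features, and labels here are toy stand-ins.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

texts = ["we should raise taxes on carbon emissions",
         "the economy will suffer under this plan",
         "evidence shows emissions fall when priced",
         "families cannot afford higher energy bills"]
won = np.array([1, 0, 1, 0])  # toy audience-adjudicated outcomes

counts = CountVectorizer().fit_transform(texts)
topics = LatentDirichletAllocation(n_components=2, random_state=0) \
    .fit_transform(counts)                      # content features
style = np.array([[len(t.split()), t.count("!")] for t in texts])  # style features

X = np.hstack([topics, style])
model = LogisticRegression().fit(X, won)
print(model.predict(X))
```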
"Predicting and Interpolating State-level Polls using Twitter Textual Data," American Journal of Political Science 2017
Spatially or temporally dense polling remains both difficult and expensive using existing survey methods. In response, there have been increasing efforts to approximate various survey measures using social media, but most of these approaches remain methodologically flawed. To remedy these flaws, this paper combines 1200 state-level polls during the 2012 presidential campaign with over 100 million state-located political Tweets; models the polls as a function of the Twitter text using a new linear regularization feature-selection method; and shows via out-of-sample testing that when properly modeled, the Twitter-based measures track and to some degree predict opinion polls, and can be extended to unpolled states and potentially sub-state regions and sub-day timescales. An examination of the most predictive textual features reveals the topics and events associated with opinion shifts, sheds light on more general theories of partisan difference in attention and information processing, and may be of use for real-time campaign strategy.
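The core regression idea can be sketched in a few lines: an L1-regularized model of poll outcomes on word frequencies, evaluated out of sample. The data below are simulated, and standard lasso serves here as a stand-in for the paper's custom linear regularization feature-selection method.

```python
# Minimal sketch of the general idea (not the paper's code): regress
# state-day poll numbers on word frequencies with an L1 penalty, then
# score predictions on held-out observations.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_obs, n_words = 300, 1000          # state-day units x vocabulary (toy sizes)
X = rng.poisson(0.5, size=(n_obs, n_words)).astype(float)
beta = np.zeros(n_words)
beta[:20] = rng.normal(size=20)     # only a few words matter
y = 50 + X @ beta + rng.normal(scale=2, size=n_obs)  # e.g., candidate share

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
lasso = Lasso(alpha=0.1, max_iter=5000).fit(X_tr, y_tr)
print("out-of-sample R^2:", round(lasso.score(X_te, y_te), 2))
print("words selected:", int((lasso.coef_ != 0).sum()))
```

The L1 penalty is what makes the high-dimensional text regression tractable: it zeroes out most of the vocabulary, leaving a small set of predictive terms that can also be inspected substantively.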
"What Terrorist Leaders Want: A Content Analysis of Terrorist Propaganda Videos," Studies in Conflict and Terrorism, 2016 (with Max Abrahms and Joseph Mroszczyk)
In recent years, a growing body of empirical research suggests that indiscriminate violence against civilian targets tends to carry substantial political risks compared to more selective violence against military targets. To better understand why terrorist groups sometimes attack politically suboptimal targets, scholars are increasingly adopting a principal-agent framework where the leaders of terrorist groups are understood as principals and lower level members as agents. According to this framework, terrorist leaders are thought to behave as essentially rational political actors, whereas lower level members are believed to harbor stronger non-political incentives for harming civilians, often in defiance of leadership preferences. We test this proposition with an original content analysis of terrorist propaganda videos. Consistent with the principal-agent framework, our analysis demonstrates statistically that terrorist leaders tend to favor significantly less indiscriminate violence than their operatives actually commit, providing unprecedented insight into the incentive structure of terrorist leaders relative to the rank-and-file.
"A Bottom-up Approach to Linguistic Persuasion in Advertising," (Research Note) The Political Methodologist, 2011.
Book Chapters
"Modeling and Measuring Deliberation Online," Book chapter, Oxford Handbook of Networked Communication, 2018.
Online communication is often characterized as predominated by antagonism or groupthink, with little in the way of meaningful interaction or persuasion. This essay examines how we might detect and measure instances of more productive conversation online, considered through the lens of deliberative theory. It begins with an examination of traditional deliberative democracy, and then explores how these concepts have been applied to online deliberation, and by those studying interpersonal conversation in social media more generally. These efforts to characterize and measure deliberative quality have resulted in a myriad of criteria, with elaborate checklists that are often as superficial as they are complex. This essay instead proposes that we target what is arguably the core deliberative process -- a mutual consideration of conceptually interrelated ideas in order to distinguish the better from the worse and to construct better conceptual structures. The essay finishes by discussing two computational models of argument quality and interdependence as templates for richer, scalable, nonpartisan measures of deliberative discussion online.
"Measuring Public Opinion with Social Media Data," Book chapter, Oxford Handbook of Polling and Polling Methods, 2017 (with Marko Klasnja, Pablo Barbera, Joshua Tucker and Jonathan Nagler)
This chapter examines the use of social networking sites such as Twitter in measuring public opinion. It first considers the opportunities and challenges that are involved in conducting public opinion surveys using social media data. Three challenges are discussed: identifying political opinion, representativeness of social media users, and aggregating from individual responses to public opinion. The chapter outlines some of the strategies for overcoming these challenges and proceeds by highlighting some of the novel uses for social media that have fewer direct analogs in traditional survey work. Finally, it suggests new directions for a research agenda in using social media for public opinion work.
Other formats
"Visualizing Biographies of Artists of the Middle East," The Amory Art Show, New York, March 2015
"The State of the Union Address in a Single Image," The Monkey Cage, Washingtonpost.com, January 2015
"A Network Analysis of the Ferguson Witness Reports," The Monkey Cage, Washingtonpost.com, December 2014
"The Ideological Position of Obama's SOTU Relative to Past Presidents," The Monkey Cage, Washingtonpost.com, January 2012
"Findings of an independent panel on allegations of statistical evidence for fraud during the 2004 Venezuelan Presidential recall referendum," in Observing the Venezuela Presidential Recall Referendum: Comprehensive Report, The Carter Center, 2004 (with Henry Brady, Richard Fowles, Aviel Rubin, and Jonathan Taylor)
Research in the news
"Moving through a 'space of hate'," NiemanLab, August 2018
"This algorithm identifies the key ingredients to winning a debate," Digital Trends, June 2018
"The Persuasion Principle," Impact: Journal of the Market Research Society, London UK, January 2016
"Inside the Message Machine that Could Make Politicians More Persuasive," NPR's All Things Considered, October 2015
"An Algorithm to Help Politicians Pander," Wired magazine, October 2015
"How to Make Your Speeches Better, Automatically," Pacific Standard magazine, September 2015
Working Papers
(Please feel free to email me for working drafts of any of these papers.)
"Climbing Mount Obamacare: Experimentally Optimized Textual Treatments,"
Advertisements, talking points, and online and social media content often take the form of short chunks of spoken or written text, yet crafting these short documents remains more art than science due to the extremely high-dimensional nature of textual content. Focus groups, A/B testing, and substantive theories of political opinion can suggest the general themes for a persuasive text, but do little to help shape it on the word or sentence level. Recent progress has been made in moving beyond binary experimental testing into higher dimensions (Hainmueller, Hopkins, and Yamamoto 2014), harkening back to earlier work outside of political science in fractional factorial design, but such approaches remain insufficient for the extreme case of free-form textual optimization. This paper instead develops a new approach and machine learning optimizer to craft short chunks of text in order to maximize their persuasive impact. First, a large collection of sentences in support of Obamacare is scraped from obamacarefacts.com and parameterized via a simple 7-topic LDA. Each text treatment comprises three sentences from this pool, parameterized in 21D-space. The persuasive effects of each treatment are assessed using Mechanical Turk subjects, and these treatments are iteratively improved using nonlinear optimization over the 21D parameter space to suggest progressively more persuasive three-sentence combinations. A new optimization algorithm is designed specifically for this class of problem and shown through extensive Monte Carlo testing to outperform the dominant existing method. Furthermore, lasso techniques allow us to assess the response surface using the samples collected during the maximization procedure, allowing more general inferences about the individual and interactive effects of the topics on opinion. Together, these procedures constitute a new approach to designing and testing new persuasive text, and not just assessing the effects of existing treatments.
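A schematic of the optimization loop described above: each treatment is three sentences, each represented by a 7-topic LDA vector, so a treatment lives in 21 dimensions. The toy response surface below stands in for noisy Mechanical Turk measurements, and a generic gradient-free optimizer stands in for the paper's custom algorithm.

```python
# Hedged sketch of optimizing a "treatment" in 21D topic space.
# The persuasion function and its optimum are invented for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
target = rng.dirichlet(np.ones(7), size=3).ravel()  # unknown optimum (toy)

def persuasion(theta):
    # Noisy response: treatments closer to `target` persuade more.
    return -np.sum((theta - target) ** 2) + rng.normal(scale=0.01)

# Gradient-free search, since each evaluation is a costly experiment.
res = minimize(lambda t: -persuasion(t),
               x0=np.full(21, 1 / 7), method="Nelder-Mead",
               options={"maxfev": 2000})
print("distance of best treatment to toy optimum:",
      round(float(np.linalg.norm(res.x - target)), 3))
```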
"Visualizing and Modeling Rhetorical Structures in Individual Documents,"
This paper develops a new tool for the visualization and analysis of individual documents, representing the progression of a document as a trajectory through its own word-space. We present two models for estimating this document trajectory, the first a slower probabilistic generative model, and the second a faster approach using principal component analysis, which produce similar results. Document trajectories are then analyzed for a large corpus of important political speeches, with the goal of identifying characteristic topological patterns that may reflect hidden structural patterns underlying substantively very different speeches. Speech trajectories are clustered into topologically similar patterns using affine transformations, revealing both the most common rhetorical patterns in these speeches, and differences in core patterns between speakers (such as Bush vs. Obama) and over time. Modeling temporal semantic structures in single documents opens the way for a new analysis of rhetorical structure in political speech, and ultimately understanding the effects of these structures on listeners.
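The faster PCA variant can be sketched simply: slide a window across the document, count words per window, and project the window vectors to 2D, so that the ordered points trace the document's path through its own word-space. The speech below is a toy example, not from the paper's corpus.

```python
# Illustrative sketch of a PCA-based document trajectory.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA

speech = ("we gather in hard times . the economy struggles . "
          "but hope endures . together we will rebuild . "
          "our future is bright . thank you .")
words = speech.split()
# Overlapping 6-word windows, stepping by 3 words.
windows = [" ".join(words[i:i + 6]) for i in range(0, len(words) - 5, 3)]

X = CountVectorizer().fit_transform(windows).toarray().astype(float)
trajectory = PCA(n_components=2).fit_transform(X)  # one 2D point per window
print(np.round(trajectory, 2))  # ordered points trace the speech's path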
"Using Text to Scale Legislatures with Uninformative Voting"
This paper shows how legislators' written and spoken text can be used to ideologically scale individuals even in the absence of informative votes, by positioning members according to their similarity to two reference texts constructed from the aggregated speech of every member of each of two major parties. The paper develops a new Bayesian scaling that is more theoretically sound than the related Wordscores approach, and a new vector-based scaling that works better than either at matching the vote-based scaling DW-Nominate in the US Senate. Unsupervised methods such as Wordfish or principal component analysis are found to do less well. Once validated in the US context, this approach is then tested in a legislature without informative voting, the UK House of Commons. There the scalings successfully separate members of different parties, order parties correctly, match expert and rebellion-based scalings reasonably well, and work across different years and even changes in leadership. The text-based scaling developed here both matches existing scalings fairly well, and may be a much more accurate window into the true ideological positions of political actors in legislatures and the many other domains where textual data are plentiful.
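A stripped-down illustration of reference-text scaling, in the spirit of the vector-based method rather than the paper's actual Bayesian model: score each member by relative similarity to the two parties' aggregated speech. All texts below are invented.

```python
# Hedged sketch: position members between two party reference texts.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

dem_speech = ["expand health care and protect workers", "raise the minimum wage"]
rep_speech = ["cut taxes and reduce regulation", "strengthen border security"]
members = ["we must protect workers and expand care",
           "taxes are too high and regulation hurts business"]

vec = TfidfVectorizer().fit(dem_speech + rep_speech + members)
ref = vec.transform([" ".join(dem_speech), " ".join(rep_speech)])
M = vec.transform(members)

sims = cosine_similarity(M, ref)          # columns: [Dem ref, Rep ref]
positions = sims[:, 1] - sims[:, 0]       # >0 leans Rep, <0 leans Dem
print(np.round(positions, 2))
```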
"A Bottom-up Approach to Linguistic Persuasion in Advertising"
This paper presents a new, bottom-up approach to modeling the effects of television advertising on vote intention. Because ads are so numerous and varied, existing studies generally begin with a specific theory of persuasion, or must simplify the data down to a few latent dimensions or effective ads. Instead, this new approach first develops a one-at-a-time regression technique to estimate the effects of hundreds of different ads on vote intention during the 2004 presidential campaign. The aggregate effect of advertising is found to be significant, though many individual ads have small or backfiring effects. To explain these varying effects, new automated text analysis procedures are developed which can predict the effects of ads based only on their text, and reveal complex and asymmetric strategies that mix affect, policies, issue ownership, negativity, and targeting. This bottom-up procedure constitutes a new method for understanding persuasion in campaigns and politics more broadly.
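The one-at-a-time design can be sketched as a loop of separate regressions, one per ad. Everything below, including the data, the variable names, and the single control, is illustrative rather than the paper's specification.

```python
# Sketch of a one-at-a-time regression design on simulated data:
# estimate each ad's effect on vote intention in its own regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, n_ads = 2000, 50
exposure = rng.poisson(1.0, size=(n, n_ads))      # ad exposure counts (toy)
partisanship = rng.normal(size=n)                 # control variable (toy)
true_fx = rng.normal(scale=0.05, size=n_ads)      # some ads backfire
vote = 0.5 * partisanship + exposure @ true_fx + rng.normal(size=n)

effects = []
for j in range(n_ads):
    X = sm.add_constant(np.column_stack([exposure[:, j], partisanship]))
    effects.append(sm.OLS(vote, X).fit().params[1])  # coef on ad j
print("estimated effects for first 5 ads:", np.round(effects[:5], 3))
```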
" 'Someone is Wrong on the Internet': Political Argument as the Exchange of Conceptually Networked Ideas"
Political opinion, and hence political behavior, is shaped largely via talk: with family, friends, and increasingly, online. But such discussions are often taken to be unstructured ideological posturing with little persuasive effect. This paper instead proposes a more deliberative model, where argument consists of the strategic exchange of topics, frames and ideas that are interconnected in a complex conceptual network. This network of ideas is inferred using new Bayesian topic modeling methods applied to a new dataset of millions of political discussions from the largest political forum online. By modeling arguments as a Markov process with the network as transition matrix, we can predict what topics arguers will deploy in response to each other: contrary to framing or expressive models of speech which predict that speakers will echo or ignore their interlocutor, this new model shows discussion to be more deliberative, where speakers offer ideas, facts, and topics relevant to, but missing from, what their interlocutor has said. In the longer term, panel vector autoregression methods reveal that a significant subset of users appear to change their views in response to what they hear, although listeners are biased against speech too unlike their own. Finally, because users can vote to recommend posts, factor analysis of this voting data reveals a strong underlying ideological dimension, largely centering (on this mainly Democratic forum) around criticism or praise of Obama. This ideological behavior is illuminated by the conceptual network: we see criticism of Obama largely based on left-wing policy issues, whereas praise is largely emotional and personal. This asymmetry is consistent with numerous psychological models of ideology, and also reveals which discussed topics may influence voting behavior in the long term. This text-based network of ideas allows us to model both the short-term dynamics of political argument and long-term opinion change, using a framework that is as rich, complex, and substantively interpretable as people have always claimed their arguments were.
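The Markov mechanism at the heart of the model is easy to state: with the conceptual network's row-normalized adjacency matrix as the transition matrix, the predicted topic mix of a reply is the previous post's topic vector multiplied through that matrix. A toy example with invented topics and transition probabilities:

```python
# Hedged sketch: conceptual network as a Markov transition matrix.
import numpy as np

topics = ["economy", "healthcare", "war", "scandal"]
# Row-stochastic matrix: P[i, j] = Pr(reply raises j | post raised i).
# Values here are invented for illustration.
P = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.3, 0.4, 0.1, 0.2],
              [0.2, 0.1, 0.5, 0.2],
              [0.1, 0.2, 0.2, 0.5]])

post = np.array([1.0, 0.0, 0.0, 0.0])   # a post entirely about the economy
reply_dist = post @ P                    # predicted topic mix of the reply
for t, p in zip(topics, reply_dist):
    print(f"{t}: {p:.2f}")
```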
"Predicting and Explaining Supreme Court Decisions Using the Texts of Briefs and Oral Arguments"
This paper finds that the decisions of the Supreme Court over the last 10 years can be systematically predicted using the text of the briefs and oral arguments that precede those decisions. This automated text analysis also sheds new light on the variety of mechanisms underlying decisions, including the tradeoffs between political and procedural decision-making. Support vector machines and ensembles of univariate regressions are first used to predict decisions of the court as a whole and of individual justices, with up to 62% (out-of-sample) accuracy -- better than many experts. These ensemble methods are then used to extract a small subset of words that are significantly associated with tendencies in the court and individual justices to vote more liberally or more conservatively. These terms reveal a large array of different decision-making strategies at work, with implications for the ongoing debates between legal reasoning and precedent on the one hand, and policy preferences and ideological attitudes on the other. The decisions of the conservative justices appear especially predictable, revealing a tendency to vote more conservatively on constitutional and criminal topics, but more liberally when briefs emphasize the intentions of Congress and statutory language. Analysis of oral transcripts reveals how the qualities of debate affect decisions, in particular a tendency for some conservative justices to vote more liberally when the conversation is intense but marked with laughter. Together, the text of briefs and oral arguments provide distinct but complementary insights into the varied decision-making of different justices. More broadly, this approach both constitutes a useful predictive tool, and sheds light on numerous existing theories of judicial decision-making while suggesting many new avenues of exploration.
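The prediction setup (though not the paper's data, features, or ensembles) can be sketched as a standard text-classification pipeline; the briefs and outcomes below are invented.

```python
# Toy sketch: classify case outcomes from brief text with a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

briefs = ["petitioner argues the statute exceeds congressional power",
          "respondent urges deference to agency interpretation",
          "the fourth amendment bars this warrantless search",
          "precedent squarely supports the government's position"]
outcomes = [1, 0, 1, 0]  # toy labels: 1 = petitioner wins

clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(briefs, outcomes)
print(clf.predict(["the agency interpretation deserves deference"]))
```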
"Blossom: A new evolutionary strategy optimizer with applications to matching and sampling"
This paper introduces a new maximization and sampling algorithm, "Blossom," along with an associated R package, which is especially well suited to rugged functions where even approximate gradient methods are infeasible, such as those encountered in highly nonlinear likelihoods and in matching for causal inference. The Blossom algorithm is an evolutionary strategy related to the Estimation of Multivariate Normal Algorithm (EMNA) or Covariance Matrix Adaptation (CMA), within the general family of Estimation of Distribution Algorithms (EDA). It works by successive iterations of sampling, selecting the highest-scoring subsample, and using the variance-covariance matrix of that subsample to generate a new sample, with various self-adapting parameters. Compared against a benchmark suite of challenging functions introduced in Yao, Liu, and Lin (1999), it finds equal or better maxima than those found by the genetic algorithm Genoud introduced in Mebane and Sekhon (2011). The algorithm is then applied to two real-world problems from political science: maximizing a difficult likelihood function combining both utility and spatial metric parameters, and two high-dimensional matching problems, where it produces better results than many existing packages in R such as GenMatch (Sekhon, 2011). Finally, to use the samples generated during maximization to sample accurately from the posterior likelihood, approximate Voronoi cells around the sample points are used to estimate numerical integrals. This sampling method is shown to produce better results than a simple Metropolis MCMC sampler on benchmark distributions such as the "banana" function.
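The sample-select-refit loop the abstract describes is compact enough to sketch directly. This is a generic EMNA-style iteration on a toy objective, not the Blossom package itself, and it omits the self-adapting parameters.

```python
# Compact sketch of an EMNA/CMA-style estimation-of-distribution loop:
# sample, keep the best subsample, refit a multivariate normal, repeat.
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: -np.sum((x - 3) ** 2, axis=1)   # toy objective; optimum at (3, 3)

mean, cov = np.zeros(2), np.eye(2) * 25.0
for _ in range(30):
    pop = rng.multivariate_normal(mean, cov, size=200)   # sample
    elite = pop[np.argsort(f(pop))[-40:]]                # keep top 20%
    mean = elite.mean(axis=0)                            # refit mean
    cov = np.cov(elite.T) + 1e-6 * np.eye(2)             # refit covariance
print("estimated maximum at:", np.round(mean, 2))
```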
"How do we combine issues? Simultaneously Estimating Spatial Metrics and Utility Functions"
Most spatial models of preference assume that the spaces in question are Euclidean, and that utility functions are quadratic. Although increasing work has recently been done in estimating utility functions from empirical data, and some theoretical work has been done with non-Euclidean spatial metrics, relatively little has been done to estimate spatial metrics from empirical data in the political context. This paper employs maximum likelihood techniques to directly estimate both spatial metrics and utility functions from ANES survey data. A simulation is also conducted to confirm that these methods can indeed accurately recover spatial and utility parameters. The results show that in the most general case, the spatial metric appears close to Euclidean, but the utility function is much less "risk-averse" than generally assumed. Furthermore, different combinations of issues produce different estimates for both spatial metrics and utility functions, although in all cases the utility functions are far from quadratic. Of practical importance, coefficients on policy variables appear to vary with different spatial metrics and utility functions, indicating that assumptions made about the metric of a space may be biasing empirical results.
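A hedged sketch of the joint estimation idea: parameterize distance by a Minkowski exponent p (p = 2 is Euclidean, p = 1 is city-block) and utility curvature by an exponent a (a = 2 is quadratic), then fit both by maximum likelihood. The choice data, functional forms, and parameter names below are illustrative, not the paper's specification.

```python
# Sketch: jointly estimate a Minkowski metric exponent p and a utility
# curvature exponent a from simulated binary approval data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 1000
voter = rng.normal(size=(n, 2))          # ideal points on two issues (toy)
cand = rng.normal(size=(n, 2))           # candidate positions (toy)

def utility(params, V, C):
    p, a = params
    d = (np.abs(V - C) ** p).sum(axis=1) ** (1 / p)   # Minkowski distance
    return -d ** a                                     # curvature a

true_u = utility([1.0, 1.0], voter, cand)              # city-block, linear loss
approve = rng.binomial(1, 1 / (1 + np.exp(-(true_u + 1))))

def negloglik(params):
    if params[0] <= 0.1 or params[1] <= 0.1:
        return np.inf                                  # keep exponents positive
    pr = 1 / (1 + np.exp(-(utility(params, voter, cand) + 1)))
    pr = np.clip(pr, 1e-9, 1 - 1e-9)
    return -(approve * np.log(pr) + (1 - approve) * np.log(1 - pr)).sum()

res = minimize(negloglik, x0=[2.0, 2.0], method="Nelder-Mead")
print("estimated (p, a):", np.round(res.x, 2))
```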
Teaching
Northeastern
Bayesian and Network Statistics, NETS 7983
Introduction to Computational Statistics, PPUA 6301 (Syllabus)
Social Network Analysis, POLS 7334 (Syllabus)
Congress, POLS 3300 and POLS 7251 (Syllabus)
Quantitative Techniques, POLS 2400
Previously
Social Networks, Columbia University, Spring 2013
Data Analysis for the Social Sciences, Columbia University, Fall 2012, Spring 2013 (Syllabus)
Math for Political Scientists, Columbia University, Fall 2012 (Syllabus)
Power and Politics in America, Teaching Assistant, NYU, Spring 2011
Math for Political Science, Teaching Assistant, NYU, Fall 2008
Game Theory I, Teaching Assistant, NYU, Spring 2008
Quantitative Methods I, Teaching Assistant, NYU, Fall 2007
Education and Employment
Assistant Professor, Department of Political Science, Northeastern University, 2013-
Lecturer in Discipline, Department of Political Science and Quantitative Methods in the Social Sciences Program, Columbia University, 2012-2013
Ph.D., Political Science, New York University, September 2012
M.A., Literature in English, Johns Hopkins University, 2001
B.A., Philosophy, English, Yale University, 1996
Nicholas Beauchamp
Department of Political Science
960A Renaissance Park
360 Huntington Avenue
Northeastern University
Boston, MA 02115
Office: RP 931 & Network Science Institute, Fl2
Email: n.beauchamp@northeastern.edu
Web: nickbeauchamp.com