|
I am an Associate Professor in the Department of Political Science and the Network Science Institute at Northeastern University. My research examines how political opinions form and change as a result of discussion, deliberation and argument in domains such as legislatures, campaigns, and social media, using techniques from natural language processing, Bayesian statistics, and network analysis.
|
Research
Research Areas
American Politics: Political Communication, Social Networks, Deliberation, Congress, Political Psychology
Political Methodology: Natural Language Processing, Network Analysis, Machine Learning, Bayesian Methods
Publications
Peer-reviewed Journals and Proceedings
"This Candle Has No Smell": Detecting the effect of Covid anosmia on Amazon reviews using Bayesian Vector Autoregression Proceedings of the International AAAI Conference on Web and Social Media (ICWSM) 16(1), 1363-1367. (N. Beauchamp) 2022.
Abstract Paper Poster Presentation Data Update to 6/1/22: reviews are now predictive
While there have been many efforts to monitor or predict Covid using digital traces such as social media, one of the most distinctive and diagnostically important symptoms of Covid -- anosmia, or loss of smell -- remains elusive due to the infrequency of discussions of smell online. It was recently hypothesized that an inadvertent indicator of this key symptom may be misplaced complaints in Amazon reviews that scented products such as candles have no smell. This paper presents a novel Bayesian vector autoregression model developed to test this hypothesis, finding that "no smell" reviews do indeed reflect changes in US Covid cases even when controlling for the seasonality of those reviews. A series of robustness checks suggests that this effect is also seen in perfume reviews, but was not present for the flu prior to Covid. These results suggest that inadvertent digital traces may be an important tool for tracking epidemics.
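The core of a vector autoregression can be sketched in a few lines. The following is a minimal numpy illustration with synthetic data: the series, lag structure, and coefficients are invented for the example, and the paper's actual model is Bayesian with seasonal controls.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic weekly series: "cases" drive "no smell" reviews with a one-step lag.
T = 200
cases = np.zeros(T)
reviews = np.zeros(T)
for t in range(1, T):
    cases[t] = 0.8 * cases[t - 1] + rng.normal()
    reviews[t] = 0.5 * reviews[t - 1] + 0.4 * cases[t - 1] + rng.normal()

# Stack the two series and fit a VAR(1) by least squares: y_t = y_{t-1} @ A + e_t
Y = np.column_stack([cases, reviews])
X, Ynext = Y[:-1], Y[1:]
A, *_ = np.linalg.lstsq(X, Ynext, rcond=None)

# The cross-lag coefficient (cases -> reviews) should land near the true 0.4.
print(np.round(A, 2))
```

A real application would add exogenous seasonal controls and place priors on A rather than using plain least squares.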
"POLITICS: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection," Proceedings of the North American Association for Computational Linguistics (NAACL, Short papers) (Y. Liu, X. F. Zhang, D. Wegsman, N. Beauchamp, L. Wang) 2022.
Abstract Paper
Ideology is at the core of political science research. Yet there still do not exist general-purpose tools to characterize and predict ideology across different genres of text. To this end, we study Pretrained Language Models using novel ideology-driven pretraining objectives that rely on the comparison of articles on the same story written by media of different ideologies. We further collect a large-scale dataset, consisting of more than 3.6M political news articles, for pretraining. Our model POLITICS outperforms strong baselines and the previous state-of-the-art models on ideology prediction and stance detection tasks. Further analyses show that POLITICS is especially good at understanding long or formally written texts, and is also robust in few-shot learning scenarios.
"A Multisource Database Tracking the Impact of the COVID-19 Pandemic on the Communities of Boston, MA" Nature Scientific Data. (A. Ristea et al.) 2022.
Abstract Paper
A pandemic, like other disasters, changes how systems work. In order to support research on how the COVID-19 pandemic impacted the dynamics of a single metropolitan area and the communities therein, we developed and made publicly available a "data-support system" for the city of Boston. We actively gathered data from multiple administrative (e.g., 911 and 311 dispatches, building permits) and internet sources (e.g., Yelp, Craigslist), capturing aspects of housing and land use, crime and disorder, and commercial activity and institutions. All the data were linked spatially through BARI's Geographical Infrastructure, enabling conjoint analysis. We curated the base records and aggregated them to construct ecometric measures (i.e., descriptors of a place) at various geographic scales, all of which were also published as part of the database. The datasets were published in an open repository, each accompanied by a detailed documentation of methods and variables. We anticipate updating the database annually to maintain the tracking of the records and associated measures.
"DebateVis: Visualizing Political Debates for Non-Expert Users." IEEE VIS Short Papers (L. South, M. Schwab, N. Beauchamp, L. Wang, J. Wihbey and M.A. Borkin) 2020.
Abstract Paper
Political debates provide an important opportunity for voters to observe candidate behavior, learn about issues, and make voting decisions. However, debates are generally broadcast late at night and last more than ninety minutes, so watching debates live can be inconvenient, if not impossible, for many potential viewers. Even voters who do watch debates may find themselves overwhelmed by a deluge of information in a substantive, issue-driven debate. Media outlets produce short summaries of debates, but these are not always effective as a method of deeply comprehending the policies candidates propose or the debate techniques they employ. In this paper we contribute reflections and results of an 18-month design study through an interdisciplinary collaboration with journalism and political science researchers. We characterize task and data abstractions for visualizing political debate transcripts for the casual user, and present a novel tool (DEBATEVIS) to help non-expert users explore and analyze debate transcripts.
"Educational Accountability and State ESSA Plans," Education Policy (J. Portz and N. Beauchamp) 2020.
Abstract Paper
This paper examines different state approaches to educational accountability in response to the Every Student Succeeds Act. Cluster analysis reveals three groups of states with similar indicator weights and rating systems, and principal component analysis identifies two dimensions underlying these clusters. We find that state-level demographics are correlated with the types of assessment policies adopted by states: policy liberalism is associated with putting greater weight on school quality and student success, while economic variables are associated with traditional performance measures, such as graduation rates and testing. These clusters reveal different approaches to measuring accountability and prioritizing different kinds of information, which can in turn influence the nature of education politics.
"Why Keep Arguing? Predicting Engagement in Political Conversations Online," Sage Open (S. Shugars and N. Beauchamp) 2019.
Abstract Paper
Individuals acquire increasingly more of their political information from social media, and ever more of that online time is spent in interpersonal, peer-to-peer communication and conversation. Yet many of these conversations can be either acrimoniously unpleasant, or pleasantly uninformative. Why do we seek out and engage in these interactions? Who do people choose to argue with, and what brings them back to repeated exchanges? In short, why do people bother arguing online? We develop a model of argument engagement using a new dataset of Twitter conversations about President Trump. The model incorporates numerous user, tweet, and thread-level features to predict user participation in conversations with over 98% accuracy. We find that users are likely to argue over wide ideological divides, and are increasingly likely to engage with those who are different from themselves. Additionally, we find that the emotional content of a tweet has important implications for user engagement, with negative and unpleasant tweets more likely to spark sustained participation. Though often negative, these extended discussions can bridge political differences and reduce information bubbles. This suggests a public appetite for engaging in prolonged political discussions that are more than just partisan potshots or trolling, and our results suggest a variety of strategies for extending and enriching these interactions.
"Microblog Conversation Recommendation via Joint Modeling of Topics and Discourse," Proceedings of the North American Association for Computational Linguistics(X. Zeng, J. Li, L. Wang, N. Beauchamp, S. Shugars and K.F. Wong) 2018.
Abstract Paper
Millions of conversations are generated every day on social media platforms. With limited attention, it is challenging for users to select which discussions they would like to participate in. Here we propose a new method for microblog conversation recommendation. While much prior work has focused on post-level recommendation, we exploit both the conversational context, and user content and behavior preferences. We propose a statistical model that jointly captures: (1) topics for representing user interests and conversation content, and (2) discourse modes for describing user replying behavior and conversation dynamics. Experimental results on two Twitter datasets demonstrate that our system outperforms methods that only model content without considering discourse.
"Winning on the Merits: The Joint Effects of Content and Style on Debate Outcomes," Transactions of the Association for Computational Linguistics (L. Wang, N. Beauchamp, S. Shugars and K. Qin) 2018.
Abstract Paper
Debate and deliberation play essential roles in politics and government, but most models of debate presume that debates are won mainly via superior style or agenda control. Ideally, however, debates would be won on the merits, as a function of which side has the stronger arguments. We propose a predictive model of debate that estimates both the effects of linguistic features and the latent persuasive strengths of different topics, as well as the interactions between the two. Using a dataset of 118 Oxford-style debates, our model's combination of content (as latent topics) and style (as linguistic features) allows us to predict audience-adjudicated winners with 74% accuracy, significantly outperforming linguistic features alone (66%). Our model finds that winning sides employ stronger arguments, and allows us to identify the linguistic features associated with strong or weak arguments.
"Predicting and Interpolating State-level Polls using Twitter Textual Data," American Journal of Political Science (N. Beauchamp) 2017.
Abstract Paper
Spatially or temporally dense polling remains both difficult and expensive using existing survey methods. In response, there have been increasing efforts to approximate various survey measures using social media, but most of these approaches remain methodologically flawed. To remedy these flaws, this paper combines 1200 state-level polls during the 2012 presidential campaign with over 100 million state-located political Tweets; models the polls as a function of the Twitter text using a new linear regularization feature-selection method; and shows via out-of-sample testing that when properly modeled, the Twitter-based measures track and to some degree predict opinion polls, and can be extended to unpolled states and potentially sub-state regions and sub-day timescales. An examination of the most predictive textual features reveals the topics and events associated with opinion shifts, sheds light on more general theories of partisan difference in attention and information processing, and may be of use for real-time campaign strategy.
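The modeling idea of regressing polls on word frequencies with regularization can be illustrated with a toy numpy example. This uses a closed-form ridge penalty as a simple stand-in for the paper's custom regularized feature-selection method, and every number below is invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 300 state-days, 50 word-frequency features,
# with only a few words actually predictive of poll support.
n, p = 300, 50
X = rng.poisson(2.0, size=(n, p)).astype(float)
true_beta = np.zeros(p)
true_beta[:3] = [0.5, -0.3, 0.2]   # hypothetical predictive words
polls = X @ true_beta + rng.normal(0, 0.5, n)

# Ridge regression, closed form: beta = (X'X + lam I)^{-1} X'y
lam = 1.0
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ polls)

# The recovered coefficients track the true ones; the noise words stay near 0.
print(np.round(beta_hat[:5], 1))
```

Out-of-sample validation, as in the paper, would fit on some polls and score held-out state-days rather than inspecting coefficients directly.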
"What Terrorist Leaders Want: A Content Analysis of Terrorist Propaganda Videos," Studies in Conflict and Terrorism (M. Abrahms, N. Beauchamp and J. Mroszczyk) 2016.
Abstract Paper
In recent years, a growing body of empirical research suggests that indiscriminate violence against civilian targets tends to carry substantial political risks compared to more selective violence against military targets. To better understand why terrorist groups sometimes attack politically suboptimal targets, scholars are increasingly adopting a principal-agent framework where the leaders of terrorist groups are understood as principals and lower level members as agents. According to this framework, terrorist leaders are thought to behave as essentially rational political actors, whereas lower level members are believed to harbor stronger non-political incentives for harming civilians, often in defiance of leadership preferences. We test this proposition with an original content analysis of terrorist propaganda videos. Consistent with the principal-agent framework, our analysis demonstrates statistically that terrorist leaders tend to favor significantly less indiscriminate violence than their operatives actually commit, providing unprecedented insight into the incentive structure of terrorist leaders relative to the rank-and-file.
"A Bottom-up Approach to Linguistic Persuasion in Advertising," (Research Note) The Political Methodologist (N. Beauchamp) 2011.
Book Chapters
"Modeling and Measuring Deliberation Online," Book chapter, Oxford Handbook of Networked Communication (N. Beauchamp) 2018.
Abstract Paper
Online communication is often characterized as predominated by antagonism or groupthink, with little in the way of meaningful interaction or persuasion. This essay examines how we might detect and measure instances of more productive conversation online, considered through the lens of deliberative theory. It begins with an examination of traditional deliberative democracy, and then explores how these concepts have been applied to online deliberation, and by those studying interpersonal conversation in social media more generally. These efforts to characterize and measure deliberative quality have resulted in a myriad of criteria, with elaborate checklists that are often as superficial as they are complex. This essay instead proposes that we target what is arguably the core deliberative process -- a mutual consideration of conceptually interrelated ideas in order to distinguish the better from the worse and to construct better conceptual structures. The essay finishes by discussing two computational models of argument quality and interdependence as templates for richer, scalable, nonpartisan measures of deliberative discussion online.
"Measuring Public Opinion with Social Media Data," Book chapter, Oxford Handbook of Polling and Polling Methods (M. Klasnja, P. Barbera, N. Beauchamp, J. Nagler and J.A. Tucker) 2017.
Abstract Paper
This chapter examines the use of social networking sites such as Twitter in measuring public opinion. It first considers the opportunities and challenges that are involved in conducting public opinion surveys using social media data. Three challenges are discussed: identifying political opinion, representativeness of social media users, and aggregating from individual responses to public opinion. The chapter outlines some of the strategies for overcoming these challenges and proceeds by highlighting some of the novel uses for social media that have fewer direct analogs in traditional survey work. Finally, it suggests new directions for a research agenda in using social media for public opinion work.
Other research works
"Visualizing Biographies of Artists of the Middle East," Exhibit, The Amory Art Show, New York, March 2015
Excerpt 1: Biography plotmaps Excerpt 2: Co-exhibition network
The State of the Union Address in a Single Image The Monkey Cage, Washingtonpost.com, January 2015
A Network Analysis of the Ferguson Witness Reports The Monkey Cage, Washingtonpost.com, December 2014
"The Ideological Position of Obama's SOTU Relative to Past Presidents," The Monkey Cage, Washingtonpost.com, January 2012
"Findings of an independent panel on allegations of statistical evidence for fraud during the 2004 Venezuelan Presidential recall referendum," in Observing the Venezuela Presidential Recall Referendum: Comprehensive Report, The Carter Center, 2004 (with Henry Brady, Richard Fowles, Aviel Rubin, and Jonathan Taylor)
PDF
Research in the news
Rachel Maddow, MSNBC opening segment; Customers are Flooding Yankee Candle's Amazon reviews with claims that the candles have no scent, but the surge in Omicron cases may be to blame Business Insider; Well, People Can't Smell their Candles Again, Gawker; Hmm, Angry Reviews of Candles are Spiking Again, Input; Brace yourselves, one-star candle reviews are spiking again, BoingBoing. 2021-2022.
Moving through a 'space of hate' NiemanLab, August 2018
This algorithm identifies the key ingredients to winning a debate Digital Trends, June 2018
"The Persuasion Principle," Impact: Journal of the Market Research Society; Inside the Message Machine that Could Make Politicians More Persuasive NPR's All Things Considered; An Algorithm to Help Politicians Pander Wired magazine; How to Make Your Speeches Better, Automatically Pacific Standard magazine, 2015-2016.
Working Papers
(Please feel free to email me for working drafts of any of these papers.)
"Wisdom over Madness: How the resilience of experts to social information improves collective intelligence," (Revise and Resubmit)
Abstract
Performance on difficult tasks such as forecasting generally benefits from the "wisdom of crowds," but multiple strands of research have suggested that while aggregating individual predictions boosts accuracy, communication among individuals may harm performance by reducing independent information. Subsequent work has shown that weighting estimates by the expertise of individuals improves collective accuracy, and recent work has shown that experts may be more resistant to peer influence, thereby effectively upweighting the contributions of experts and boosting collective accuracy even within a communicating group. The proposed mechanisms by which experts resist bad information, however, have remained somewhat schematic. To elucidate this, we construct a set of realistic event-prediction challenges and randomize the exchange of both numerical and linguistic information among individuals. This allows us to estimate a continuous nonlinear response function connecting signals and predictions, which we show is consistent with a Bayesian model that captures both the effects of expertise and a ceiling limiting the effect of implausible peer information. "Expertise" is an elusive concept whose specific mechanisms remain under-explored, and we show that its effects here are due less to self-confidence or general ability than to domain-specific skills and active efforts to resist bad information, and that the resilience of experts operates similarly with linguistic as well as numeric information. Finally, we show that our model allows us to "debias" peer influence and boost collective accuracy among non-experts to the level of experts, and suggests promising avenues for constructing new communication structures within groups.
"Conceptual Network Structures are Correlated with Ideology and Personality," (Under review)
Abstract
Every individual maintains a vast and diverse set of beliefs and opinions, with complex internal structures and correlations that are only partially captured by personality clusters like the Big 5 or low-dimensional spatial representations such as ideology. The complex logical and associative interconnections between beliefs naturally suggests a networked model of human thought. Although such models have been theorized numerous times in numerous disciplines, empirical applications have been hindered by the difficulty of measuring and validating these networks at the individual level. We develop, test and validate a variety of automated graphical and textual methods for inferring these latent networks. We show that although different approaches produce apparently very different networks, individuals exhibit characteristic network structures that persist regardless of topic or inference method. Furthermore, we find that an individual's underlying network structures correlate with their personality and ideology, with liberals exhibiting relatively more connected and hierarchical networks than conservatives. These results shed new light on how beliefs are structured and suggest that individual network structures may play important roles in behavior, dialogue, and opinion change.
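To make "more connected and hierarchical" concrete, here is a toy numpy comparison of two five-node belief networks using density and Freeman degree centralization. These are standard illustrative network measures, not necessarily the ones used in the paper.

```python
import numpy as np

def density(adj):
    """Fraction of possible undirected edges present (no self-loops)."""
    n = adj.shape[0]
    return adj.sum() / (n * (n - 1))

def degree_centralization(adj):
    """Freeman degree centralization: how star-like (hierarchical) a network is."""
    deg = adj.sum(axis=0)
    n = len(deg)
    return (deg.max() - deg).sum() / ((n - 1) * (n - 2))

# Two toy belief networks over 5 concepts (symmetric adjacency matrices):
# a fully interconnected one vs. a sparse chain.
dense = np.ones((5, 5)) - np.eye(5)
chain = np.zeros((5, 5))
for i in range(4):
    chain[i, i + 1] = chain[i + 1, i] = 1

print(density(dense), density(chain))    # 1.0 vs 0.4
print(degree_centralization(dense), degree_centralization(chain))
```

The complete network is maximally dense but has zero centralization (no node dominates), while sparser structures can score higher on hierarchy-style measures.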
"Climbing Mount Obamacare: Experimentally Optimized Textual Treatments,"
Abstract
Advertisements, talking points, and online and social media content often take the form of short chunks of spoken or written text, yet crafting these short documents remains more art than science due to the extremely high-dimensional nature of textual content. Focus groups, A/B testing, and substantive theories of political opinion can suggest the general themes for a persuasive text, but do little to help shape it on the word or sentence level. Recent progress has been made in moving beyond binary experimental testing into higher dimensions (Hainmueller et al. 2014), harkening back to earlier work outside of political science in fractional factorial design, but such approaches remain insufficient for the extreme case of free-form textual optimization. This paper instead develops a new approach and machine learning optimizer to craft short chunks of text in order to maximize their persuasive impact. First, a large collection of sentences in support of Obamacare are scraped from obamacarefacts.com and parameterized via a simple 7-topic LDA. Each text treatment comprises three sentences from this pool, parameterized in 21D space. The persuasive effects of each treatment are assessed using Mechanical Turk subjects, and these treatments are iteratively improved using nonlinear optimization over the 21D parameter space to suggest progressively more persuasive three-sentence combinations. A new optimization algorithm is designed specifically for this class of problem and shown through extensive Monte Carlo testing to outperform the dominant existing method. Furthermore, lasso techniques allow us to assess the response surface using the samples collected during the maximization procedure, allowing more general inferences about the individual and interactive effects of the topics on opinion. Together, these procedures constitute a new approach to designing and testing new persuasive text, and not just assessing the effects of existing treatments.
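The iterative "measure a treatment, perturb it, keep it if better" logic can be sketched against a synthetic response surface. The stochastic hill climber below is a generic stand-in, not the paper's custom optimizer, and the surface, noise level, and dimensions are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for the persuasion response surface over a 21-D
# topic-weight space: a concave quadratic with an unknown optimum.
d = 21
opt = rng.uniform(0, 1, d)            # the "most persuasive" topic mix

def persuasion(w, noise=0.01):
    """Noisy measured effect of a treatment with topic weights w."""
    return -np.sum((w - opt) ** 2) + rng.normal(0, noise)

# Stochastic hill climbing: propose a small perturbation of the current
# treatment, keep it only if its measured effect beats the best so far.
w = np.full(d, 0.5)
best = persuasion(w)
for _ in range(3000):
    cand = w + rng.normal(0, 0.05, d)
    score = persuasion(cand)
    if score > best:
        w, best = cand, score

# Distance to the true optimum shrinks substantially from the start point.
print(round(float(np.sum((w - opt) ** 2)), 3))
```

In the experimental setting each "evaluation" is a batch of survey subjects, so sample-efficient optimizers matter far more than in this cheap simulation.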
"Visualizing and Modeling Rhetorical Structures in Individual Documents,"
Abstract
This paper develops a new tool for the visualization and analysis of individual documents, representing the progression of a document as a trajectory through its own word-space. We present two models for estimating this document trajectory, the first a slower probabilistic generative model, and the second a faster approach using principal component analysis, which produce similar results. Document trajectories are then analyzed for a large corpus of important political speeches, with the goal of identifying characteristic topological patterns that may reflect hidden structural patterns underlying substantively very different speeches. Speech trajectories are clustered into topologically similar patterns using affine transformations, revealing both the most common rhetorical patterns in these speeches, and differences in core patterns between speakers (such as Bush vs. Obama) and over time. Modeling temporal semantic structures in single documents opens the way for a new analysis of rhetorical structure in political speech, and ultimately understanding the effects of these structures on listeners.
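The faster PCA variant of the trajectory idea can be sketched directly in numpy: vectorize each sentence over the document's own vocabulary, then project the sequence onto the first two principal components. The toy sentences are invented; the paper operates on full speeches.

```python
import numpy as np

# Toy document: a sequence of sentences, each treated as a bag of words.
sentences = [
    "we gather today in hope",
    "hope and change for the nation",
    "the nation faces hard challenges",
    "challenges we will overcome together",
]
vocab = sorted({w for s in sentences for w in s.split()})
counts = np.array([[s.split().count(w) for w in vocab] for s in sentences],
                  dtype=float)

# PCA via SVD of the centered sentence-by-word matrix: the document's
# "trajectory" is one 2-D point per sentence through its own word-space.
centered = counts - counts.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
trajectory = centered @ Vt[:2].T

print(trajectory.shape)   # (4, 2)
```

Plotting the rows of `trajectory` in order, connected by lines, gives the kind of path whose shape can then be compared across speeches.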
"Using Text to Scale Legislatures with Uninformative Voting"
Abstract
This paper shows how legislators' written and spoken text can be used to ideologically scale individuals even in the absence of informative votes, by positioning members according to their similarity to two reference texts constructed from the aggregated speech of every member of each of two major parties. The paper develops a new Bayesian scaling that is more theoretically sound than the related Wordscores approach, and a new vector-based scaling that works better than either at matching the vote-based scaling DW-NOMINATE in the US Senate. Unsupervised methods such as Wordfish or principal component analysis are found to do less well. Once validated in the US context, this approach is then tested in a legislature without informative voting, the UK House of Commons. There the scalings successfully separate members of different parties, order parties correctly, match expert and rebellion-based scalings reasonably well, and work across different years and even changes in leadership. The text-based scaling developed here both matches existing scalings fairly well, and may be a much more accurate window into the true ideological positions of political actors in legislatures and the many other domains where textual data are plentiful.
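A bare-bones version of the reference-text idea: score each member by relative cosine similarity to the two parties' aggregated word counts. The vocabulary, counts, and scoring rule below are invented for illustration, and the paper's Bayesian and vector-based scalings are more sophisticated than this sketch.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy word-count vectors over a shared 4-word vocabulary (illustrative only).
ref_left = np.array([2.0, 1.0, 9.0, 8.0])    # aggregated left-party speech
ref_right = np.array([9.0, 8.0, 2.0, 1.0])   # aggregated right-party speech

members = {
    "A": np.array([8.0, 7.0, 2.0, 1.0]),     # speaks like the right reference
    "B": np.array([5.0, 5.0, 5.0, 5.0]),     # perfectly balanced
    "C": np.array([1.0, 2.0, 8.0, 9.0]),     # speaks like the left reference
}

# Position each member by relative similarity to the two reference texts:
# positive means closer to the right-party reference, negative to the left.
scores = {name: cosine(v, ref_right) - cosine(v, ref_left)
          for name, v in members.items()}
for name, s in scores.items():
    print(name, round(float(s), 2))
```

This recovers the obvious ordering A > B > C; the value of the full method is that it does so from real speech without needing any votes.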
Teaching
Introduction to Computational Statistics, INSH 5301 (Syllabus)
Bayesian and Network Statistics, NETS 7983 (Syllabus)
Social Network Analysis, POLS 7334 (Syllabus)
Congress, POLS 3300 and POLS 7251 (Syllabus)
Bostonography, INSH 2102 (Syllabus)
Nicholas Beauchamp
Department of Political Science
960A Renaissance Park
360 Huntington Avenue
Northeastern University
Boston, MA 02115
Office: RP 931 & Network Science Institute, 208
Email: n D0T beauchamp @northeastern.edu
Web: nickbeauchamp.com