Analyzing Emotional Language in 21 Million News Articles
How We Used Off-the-Shelf Tools to Study Bias and Emotion in News
All of us have encountered a particularly emotionally charged news article, with every word betraying the author’s bias. But a single reader would have to be a dedicated follower of a news organization to really understand how much opinion is habitually betrayed in its work. To find out how carefully major news organizations moderate their language in articles on controversial topics, we at Qatar Computing Research Institute (QCRI) used computational techniques to analyze millions of articles from 15 large news organizations in the U.S. Some agencies, we found, do not shy from emotion-laden and biased rhetoric—the Huffington Post and Washington Post, for example. But we also found that, on average, the use of highly emotional language is curbed, pointing to possible self-moderation.
We looked at 15 large U.S. news media organizations, including very large-circulation ones like Reuters and USA Today and smaller ones like the Minneapolis Star Tribune and Talking Points Memo. Using NewsCred, we collected articles spanning March to September 2013. Our final list of news organizations included:
- Chicago Tribune
- CNN
- Houston Chronicle
- Honolulu Star-Advertiser
- Huffington Post
- Los Angeles Times
- Minneapolis Star Tribune
- New York Times
- Newsday
- Philadelphia Inquirer
- ProPublica
- Reuters
- Talking Points Memo
- USA Today
- Washington Post
What Is Controversial?
How can we tell that a topic is controversial? The answer is very subjective, so we took to crowdsourcing to find out. Using the crowdsourcing platform CrowdFlower, we asked hundreds of people to annotate the 2,000 most frequent nouns in a random selection of articles. To ensure high-quality data, we made sure that, on average, at least seven annotators labeled each word, and we used their agreement as an indication of our confidence in how controversial (or not) the topic is. We found 145 controversial words (our confidence in their categorization is represented in the graphic below). People felt strongly that “politics,” “immigration,” and “war” were controversial. On the other hand, there were 272 words that annotators agreed were non-controversial, like “Wednesday” and “museum.” Lastly, we found 45 words that were labeled as “weakly controversial.”
Similarly, non-controversial terms:
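The agreement-based confidence described above can be sketched in a few lines of Python. This is a minimal illustration, not CrowdFlower's own confidence measure: it simply takes the share of annotators who chose the majority label. The sample judgments and the `MIN_CONFIDENCE` cut-off are hypothetical.

```python
from collections import Counter


def label_confidence(labels):
    """Return (majority_label, share of annotators who chose it)."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


# Hypothetical judgments, seven annotators per word (not data from the study).
judgments = {
    "immigration": ["controversial"] * 6 + ["non-controversial"],
    "museum": ["non-controversial"] * 7,
    "budget": ["controversial"] * 3
    + ["weakly controversial"] * 2
    + ["non-controversial"] * 2,
}

MIN_CONFIDENCE = 0.7  # hypothetical cut-off, chosen only for illustration

results = {word: label_confidence(labels) for word, labels in judgments.items()}

# Keep only the words on which annotators agreed strongly enough.
confident = {
    word: label
    for word, (label, confidence) in results.items()
    if confidence >= MIN_CONFIDENCE
}

for word, (label, confidence) in results.items():
    print(f"{word}: {label} ({confidence:.2f})")
```

Words such as the hypothetical “budget,” where no label wins a clear majority, fall below the cut-off and would be left out of the final lists.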
Next, we collected all articles in our news sources containing at least one word in our three lists of controversial, weakly controversial, and non-controversial terms, resulting in over 21 million articles, with an average of around 3,000 and median of 1,000 per topic per news source. Reuters was the most prolific source with an average of 11,659 articles per topic, and ProPublica the least at 31 articles.
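As a rough sketch of this collection step, the snippet below counts, for each topic term and news source, how many articles mention the term at least once. It assumes articles arrive as `(source, text)` pairs and uses plain lowercase token matching with no stemming or plural handling. Both are simplifications, and the sample articles are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_articles_per_topic(articles, terms):
    """Count articles per (term, source) pair.

    articles: iterable of (source, text) pairs.
    terms: iterable of lowercase topic terms.
    """
    terms = set(terms)
    counts = defaultdict(int)
    for source, text in articles:
        # An article counts once per term, however often the term appears.
        for term in terms & set(tokenize(text)):
            counts[(term, source)] += 1
    return dict(counts)


# Invented examples, not articles from the study.
articles = [
    ("REU", "Oil prices rose on Wednesday."),
    ("REU", "The war on drugs and immigration policy."),
    ("HUF", "Immigration reform stalls; war of words continues."),
]
terms = ["immigration", "war", "wednesday", "museum"]

topic_counts = count_articles_per_topic(articles, terms)
for (term, source), n in sorted(topic_counts.items()):
    print(term, source, n)
```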
Now that we have the context of our select topics (in the form of articles), we can use quantitative techniques to find out how emotional language is used around them. Sentiment analysis is a burgeoning field, with many dictionaries available online. Here are the ones we used:
- Affective Norms for English Words (ANEW) is a set of normative emotional ratings for 2,476 English words. We use the “valence” rating, treating ratings above the mean as positive and ratings below it as negative.
- General Inquirer is a list of 1,915 words classified as positive and 2,291 words classified as negative.
- MicroWNOp is a list of 1,105 WordNet synsets (cognitive synonyms) classified as positive, negative, or neutral.
- SentiWordNet assigns to each synset of WordNet (around 117,000) a positive and a negative score determined by a diffusion process.
- Bias Lexicon is a list of 654 bias-related lemmas extracted from the edit history of Wikipedia by Recasens et al. Sentiment words are used as contributing features in the construction of this bias lexicon.
Lexicons based on WordNet often deal with synsets, which are groups of synonyms that represent some concept. One is then free to expand a dictionary with synonyms. However, to avoid ambiguity, we take only the top word, the one that represents the concept most precisely.
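To make the lexicon step concrete, here is a minimal sketch of two operations mentioned above: splitting a valence lexicon into positive and negative words around its mean, and measuring the proportion of an article's words that fall in a lexicon. The valence numbers are made up for illustration (ANEW rates words on a 1 to 9 scale), and the simple regex tokenizer is an assumption rather than the pipeline used in the study.

```python
import re
from statistics import mean

# Made-up valence ratings on a 1-9 scale; these are not real ANEW values.
valence = {
    "love": 8.7,
    "happy": 8.2,
    "peace": 7.7,
    "fear": 2.8,
    "war": 2.1,
    "death": 1.6,
}


def split_by_mean(ratings):
    """Split words into (positive, negative) sets around the mean rating."""
    m = mean(ratings.values())
    positive = {word for word, value in ratings.items() if value > m}
    negative = {word for word, value in ratings.items() if value < m}
    return positive, negative


def lexicon_proportion(text, lexicon):
    """Share of tokens in the text that belong to the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


positive, negative = split_by_mean(valence)

text = "Fear of war overshadowed the peace talks"
print(lexicon_proportion(text, positive))  # share of positive words
print(lexicon_proportion(text, negative))  # share of negative words
```

The same `lexicon_proportion` function works for any word list, so one could apply it to a bias lexicon or to the top word taken from each synset.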
With our database of news articles and sentiment dictionaries prepared, we are now ready to analyze the extent to which each news organization uses emotionally charged words when discussing controversial topics. For example, below we see the distributions of the proportion of biased, positive (+), and negative (−) words in the Huffington Post (HUF), CNN, and Reuters (REU), across controversial topics (C), weakly controversial topics (W), and non-controversial topics (N).
Take, for instance, the top-left box plot, which shows the distribution of Bias lexicon words in the articles from the Huffington Post. The controversial ones (“C”) have the highest median (the fat black bar in the middle of the box), while the non-controversial ones (“N”) have the lowest. Some plots show very wide distributions with many outliers (the dots), such as the lower-right plot of the negative SWN lexicon in Reuters, meaning that the use of these lexicon words differs widely across topics. Overall, one can see several trends:
- The use of bias terms is more likely in controversial topics than in non-controversial topics (statistically significant at p < 0.01 for all 15 news sources).
- The use of negative terms is more likely in controversial topics than in non-controversial topics (statistically significant at p < 0.01 for 46 out of the 60 combinations of source and lexicon, with 9 ties and 5 cases in which the difference is significant in the opposite direction).
- The use of positive terms is more likely in non-controversial topics than in controversial topics (statistically significant at p < 0.01 for 40 out of 60 conditions, with 16 ties and 4 cases with a significant difference in the opposite direction).
- The use of strong emotional words is less likely in controversial topics (statistically significant at p < 0.01 for 32 out of 45 conditions, with only 1 of the remaining conditions having a significant difference in the other direction).
But which media sources are the most prone to use charged language around controversial topics? Here are three rankings of the news organizations, one for each of the lexicons that best distinguish controversial from non-controversial topics. The Huffington Post, the Washington Post, and the New York Times are the ones that do not shy away from charged words when controversial topics come up.
| Huffington Post | Huffington Post | Huffington Post |
| Washington Post | Washington Post | USA Today |
| New York Times | New York Times | New York Times |
| Los Angeles Times | CNN | Washington Post |
| USA Today | Los Angeles Times | Los Angeles Times |
| Philadelphia Inquirer | USA Today | Houston Chronicle |
| Chicago Tribune | Houston Chronicle | Philadelphia Inquirer |
| Houston Chronicle | Reuters | Minneapolis Star Tribune |
| Minneapolis Star Tribune | Philadelphia Inquirer | Newsday |
| Reuters | Chicago Tribune | Chicago Tribune |
| Newsday | Talking Points Memo | Honolulu Star-Advertiser |
| Talking Points Memo | Minneapolis Star Tribune | ProPublica |
| ProPublica | ProPublica | Talking Points Memo |
Not as Controversial as You Thought
Next, we zoomed in on individual topics to see just how they were treated by the news media. To do this, we assigned a controversy score to each topic in our list of controversial and non-controversial terms by using logistic regression, resulting in a score between 0 and 1, where 0 is non-controversial, 0.5 is undecided, and 1.0 is strongly controversial. And we did it all using off-the-shelf tools.
First, we computed the features for each topic using the articles in which they appear. “Features” are a combination of lexicons, broken down by positive/negative/highly emotional categories, and by news source, resulting in a total of 195 features. For example, for the topic “immigration,” one feature could be the number of positive-labeled words in the SentiWordNet lexicon that appear in Huffington Post articles mentioning immigration.
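The Weka commands in this article read their input from ARFF files, so the per-topic features need to be written out in that format. The following is a minimal sketch of such a writer. The feature names (`HUF_swn_pos`, `REU_bias`), the class labels, and the numbers are hypothetical stand-ins, and the class is placed last because Weka conventionally treats the final attribute as the class unless told otherwise.

```python
def to_arff(relation, feature_names, rows, class_values):
    """Build an ARFF document with numeric features and a nominal class.

    rows: list of (feature_values, class_label) pairs, one per topic.
    """
    lines = [f"@relation {relation}", ""]
    for name in feature_names:
        lines.append(f"@attribute {name} numeric")
    # The nominal class attribute comes last.
    lines.append("@attribute class {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    for values, label in rows:
        lines.append(",".join(f"{v:g}" for v in values) + "," + label)
    return "\n".join(lines) + "\n"


# Hypothetical feature names and values, for illustration only.
feature_names = ["HUF_swn_pos", "REU_bias"]
rows = [
    ([0.042, 0.017], "noncontroversial"),  # e.g. the topic "museum"
    ([0.031, 0.012], "controversial"),     # e.g. the topic "immigration"
]

arff_text = to_arff(
    "topic_features", feature_names, rows, ["controversial", "noncontroversial"]
)
print(arff_text)
```

Writing `arff_text` to a file named `topic_features.arff` would give the attribute ranking step an input in the expected shape.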
We then turn to the Weka data mining software. First, we rank the features using a Chi Squared Attribute Evaluator:
java -cp weka-3-6-10/weka.jar weka.attributeSelection.ChiSquaredAttributeEval -s "weka.attributeSelection.Ranker -T -1.79769313 -N -1" -i topic_features.arff
Its output is a list of our features (i.e., attributes) ranked by the chi-squared statistic, so that the features at the top are the best predictors of controversy. In our case, those are the bias lexicon features. We select the top five features and then train a Logistic model using a command like this:
java -cp weka-3-6-10/weka.jar weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -t topic_features_select.arff -d logistic.model > logistic.model.out
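For readers who want to see what the chi-squared ranking step does, here is a small pure-Python sketch of the idea. It is a simplification and not Weka's implementation: each numeric feature is split at its median (Weka applies its own discretization), and a 2x2 chi-squared statistic without continuity correction measures how strongly the split is associated with the controversial label. Feature names and values are hypothetical.

```python
from statistics import median


def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty, so there is no association to measure
    return n * (a * d - b * c) ** 2 / denominator


def rank_features(rows, labels, names):
    """Rank features by chi-squared association with the boolean labels."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        cut = median(column)  # assumed median split
        a = sum(1 for x, y in zip(column, labels) if x > cut and y)
        b = sum(1 for x, y in zip(column, labels) if x > cut and not y)
        c = sum(1 for x, y in zip(column, labels) if x <= cut and y)
        d = sum(1 for x, y in zip(column, labels) if x <= cut and not y)
        scores[name] = chi_squared_2x2(a, b, c, d)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: six topics, two features; True means controversial.
names = ["HUF_bias", "REU_anew_pos"]
rows = [
    [0.09, 0.02],
    [0.08, 0.05],
    [0.07, 0.03],
    [0.03, 0.04],
    [0.02, 0.02],
    [0.01, 0.05],
]
labels = [True, True, True, False, False, False]

ranked = rank_features(rows, labels, names)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy data the bias feature separates the two classes perfectly and ranks first, which mirrors the article's finding only by construction.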
Now we can plot the “controversy” score produced by our classifier against the label given to the words by the human raters. The words below the 0.5 line (for controversial) and above the 0.5 line (for non-controversial) are “errors”: words that the language around them suggests are controversial (or not), but that the raters labeled the other way.
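The comparison between classifier scores and human labels can be sketched as follows. A logistic model turns a weighted sum of features into a score between 0 and 1, and a word counts as an “error” when its score falls on the opposite side of 0.5 from the raters' label. The weights, intercept, and feature values below are invented for illustration and are not the trained Weka model.

```python
import math


def controversy_score(features, weights, intercept):
    """Logistic function applied to a weighted sum of the features."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def disagreements(scores, human_labels, threshold=0.5):
    """Words whose score lands on the other side of the threshold
    from the human label (True means rated controversial)."""
    return sorted(
        word
        for word, score in scores.items()
        if (score >= threshold) != human_labels[word]
    )


# Invented model parameters and features (two features per topic).
weights = [40.0, -20.0]
intercept = -1.0
topic_features = {
    "immigration": [0.09, 0.02],
    "museum": [0.01, 0.05],
    "sex": [0.02, 0.04],
}
human_labels = {"immigration": True, "museum": False, "sex": True}

scores = {
    word: controversy_score(features, weights, intercept)
    for word, features in topic_features.items()
}
errors = disagreements(scores, human_labels)

for word, score in scores.items():
    print(f"{word}: {score:.2f}")
print("disagreements:", errors)
```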
In some cases, terms whose connotations depend on context may have led to incorrect classifications. For instance, “oil” in a financial article may be associated with objective language, whereas in a political article it may appear alongside biased or emotional language. Another example is “drug,” which can sometimes simply refer to drugstores. However, our machine-assisted classification of other, less obviously ambiguous terms, such as “sex” and “killing,” as non-controversial prompts a re-examination of these topics as controversial (although one should always watch out for ambiguity, as “sex” may mean “gender,” for example). Further work is needed to model more clearly the topics represented by these terms in order to better understand their context.
An important distinction should be made about whose sentiment it is that we are catching. It is possible the articles were simply quoting or attributing these words to other parties. The choice to do so is also an editorial decision, though, framing the issue by the words of others.
Why Sentiment Analysis?
Computational tools that can handle data on a massive scale give us a bird's-eye view of the behavior of the news media and let us pick apart how news sources differ in often subtle ways. Quantifying the use of emotional language in the news allows us to compare this trend to the success of the reporting, and even to the response to it in social media. This work can also be used to inform automatic style-guide checkers, or serve as a reference for journalists interested in maintaining objectivity and for readers wishing to monitor their news intake. Repeated over time, this analysis can also track changes in the treatment of various topics, and the rise and fall in the extent to which emotional and biased language contextualizes them for their audience.
Working directly with data providers, in our case NewsCred, has been a blessing in terms of data completeness and quality. However, not all articles are created equal, and data concerning views and clicks, as well as social media activity, would provide more information about the editorial decisions to promote certain articles, as well as the perception of the article by the public. Many studies have been done on the detection of controversy in social media (such as Garimella et al.), but the analysis of the news articles themselves may be a much greater challenge, given their careful writing and editorial oversight. As you can see from the classification plots, it is challenging to neatly identify the topics as controversial. One way to do so could be to consider only editorial articles, where more leeway is given to the author. Still, the tools we use here can all be found off the shelf, including the sentiment lexicons and data mining packages, making it possible for individuals and organizations to conduct their own quantitative sentiment monitoring of their news feeds.
Yelena Mejova (@yelenamejova) is currently a Scientist in the Social Computing Group at the Qatar Computing Research Institute HBKU. Specializing in social media analysis and mining, her work concerns the quantification of health and wellbeing signals in social media, as well as tracking of social phenomena, including politics and news consumption. Recently, she co-edited a volume on the use of Twitter for social science in Twitter: A Digital Socioscope, and a special issue of Social Science Computer Review on Quantifying Politics Using Online Data.