The Dark Side of Sentiment Analysis: An Exploratory Review Using Lexicons, Dictionaries, and a Statistical Monkey and Chimp
SSRN
Abstract
This article discusses the inconsistencies, inaccuracies, and challenges of sentiment analysis. It demonstrates problems with using sentiment analysis lexicons or dictionaries to estimate sentiment in textual artifacts by comparing multiple methods on stock market and vaccine tweets.
Overview
Sentiment analysis, an important dimension of natural language processing (NLP), has seen exponential growth in adoption across research and practitioner disciplines. Ongoing developments in NLP methods continue to improve the accuracy of sentiment analysis.
However, the plethora of sentiment analysis methods, dictionaries and lexicons, tools, open source code for machine learning based sentiment analysis, and off-the-shelf sentiment analysis solutions has led to a flurry of research and applied solutions that show insufficient concern for the limitations, context, and inaccuracies of sentiment analysis.
Research Approach
This study reviews known issues with sentiment analysis as documented by prior research and then compares the application of multiple off-the-shelf lexicon and dictionary methods to stock market and vaccine tweets.
The intention is not to improve the accuracy of sentiment analysis relative to prior benchmarks, but to identify the critical aspects of the “dark side” of sentiment analysis and develop a conceptual discussion of its characteristics.
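The kind of comparison described above can be illustrated with a minimal sketch. It assumes two tiny word lists and four made-up texts, all hypothetical and invented for illustration only. They are not the lexicons, dictionaries, or tweets used in the study, and the scoring rule (sum of word scores mapped to a label) is a simplification rather than the method of any particular off-the-shelf tool.

```python
from itertools import combinations

# Hypothetical toy lexicons (word -> polarity score). These are NOT the
# dictionaries used in the study; they only show how word lists can differ.
LEXICONS = {
    "lexicon_a": {"gain": 1, "surge": 1, "safe": 1,
                  "loss": -1, "crash": -1, "risk": -1},
    "lexicon_b": {"gain": 1, "safe": 1, "effective": 1,
                  "crash": -1, "shot": -1, "short": -1},
}

# Made-up example texts standing in for stock market and vaccine tweets.
TWEETS = [
    "stocks surge after earnings gain",
    "market crash wipes out gain",
    "got my shot today feeling safe",
    "short sellers see risk ahead",
]


def label(text, lexicon):
    """Sum the scores of known words and map the total to a polarity label."""
    score = sum(lexicon.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def label_all(tweets, lexicons):
    """Label every tweet with every lexicon."""
    return {name: [label(t, lex) for t in tweets]
            for name, lex in lexicons.items()}


def agreement(labels_a, labels_b):
    """Fraction of texts on which two label sequences match."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


labels = label_all(TWEETS, LEXICONS)
for (name_a, la), (name_b, lb) in combinations(labels.items(), 2):
    print(f"{name_a} vs {name_b}: agreement = {agreement(la, lb):.2f}")
```

With these toy inputs the two lexicons disagree on the third text: the word "shot" is scored as negative in one list and is absent from the other, so the same vaccine-related sentence is labeled positive by one lexicon and neutral by the other. Context-dependent words of this kind are one simple way in which lexicon choice alone can change the estimated sentiment.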
Key Contributions
- Comprehensive review of sentiment analysis limitations
- Empirical comparison of multiple lexicon-based methods
- Conceptual framework for understanding sentiment analysis challenges
- Recommendations for future research directions
Implications
This research helps align researcher and practitioner expectations with the limits and boundaries of natural language processing based solutions for sentiment analysis and estimation.