Data Science
Tracking Twitter Sentiment
May 11, 2017 Fred Schwaner

We have clients who need to know more about how they are perceived in the marketplace. Their current processes for assessing public perception are almost entirely manual and are generally handled by one person. This work takes up a significant amount of time, and the resulting output tends to be simple and generalized. Detailed analysis of the data to find specific insights is impossible, leaving them vulnerable to Simpson's paradox, where a trend visible in the aggregated data weakens or reverses once the data is split into subgroups. Coria Labs decided to investigate whether there was a better way to give them the data they needed.

 

For this proof-of-concept we decided to focus on Twitter posts related to a particular set of keywords. Both Microsoft Cognitive Services and Aylien were used to determine sentiment score as this would allow us to compare results. Using the geolocation data associated with each tweet (when available), we could track where the comments came from. Finally, an infographic that showed a map, volume of tweets captured (based on our keywords), and average sentiment score was generated.
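
To make the infographic inputs concrete, here is a minimal sketch of how tweet volume and average sentiment could be rolled up per location. The record layout (`location` and `sentiment` keys) and the function name are assumptions made for illustration, not the format either service returns or the code we actually ran.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_location(tweets):
    """Group scored tweets by location; report volume and average sentiment.

    Each tweet is assumed to be a dict with a numeric "sentiment" and an
    optional "location" (hypothetical field names). Tweets without
    geolocation data are grouped under "unknown".
    """
    scores_by_location = defaultdict(list)
    for tweet in tweets:
        location = tweet.get("location") or "unknown"
        scores_by_location[location].append(tweet["sentiment"])

    return {
        location: {"volume": len(scores), "average_sentiment": mean(scores)}
        for location, scores in scores_by_location.items()
    }


# Made-up sample data, purely for illustration.
sample_tweets = [
    {"location": "Austin", "sentiment": 0.5},
    {"location": "Austin", "sentiment": 1.0},
    {"location": None, "sentiment": -0.5},
]
summary = summarize_by_location(sample_tweets)
```

Keeping the untagged tweets in their own bucket, rather than dropping them, means the total volume still matches the number of tweets captured.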


Breakdown on Scoring

The two services we worked with return their answers quite differently. Microsoft Cognitive Services sentiment scores are returned as a number ranging from 0 to 1, where values near 0 are negative and values near 1 are positive. Aylien returns a sentiment of "positive", "negative", or "neutral" along with a confidence score from 0 to 1.
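
Comparing the two results means putting them on a common scale first. The sketch below shows one possible way to do that, mapping both onto a signed scale from -1 (negative) to 1 (positive). The function names are hypothetical, the numeric score's range is passed in explicitly rather than hard-coded, and treating the label's confidence as the magnitude of the score is an assumption of this example, not something either service defines.

```python
def to_signed_scale(score, low, high):
    """Linearly rescale a numeric score from [low, high] onto [-1, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= score <= high:
        raise ValueError("score is outside the stated range")
    return 2.0 * (score - low) / (high - low) - 1.0


def label_to_signed_scale(polarity, confidence):
    """Turn a polarity label plus a 0..1 confidence into a signed score.

    Assumption for this sketch: the confidence is used as the magnitude,
    and "neutral" always maps to 0 regardless of confidence.
    """
    signs = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}
    if polarity not in signs:
        raise ValueError("unknown polarity label: %r" % (polarity,))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return signs[polarity] * confidence


numeric_result = to_signed_scale(0.75, low=0.0, high=1.0)
label_result = label_to_signed_scale("negative", 0.8)
```

A confidence is not the same thing as an intensity, so scores produced this way are only roughly comparable; they are good enough to flag tweets where the two services disagree on direction.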

 

 

Difficulties with Human Speech

The complexities of speech are difficult for machines to pick up on. Sarcasm, for instance, is difficult to convey through text to another human. Generally, sarcasm is only understood in text when the reader understands the character of the person, the structure of the comment is of a standard “sarcastic” form, or the comment is aided by italicization, capitalization, or emoticons. These aren’t things language processing libraries generally look for.


Sentiment analysis becomes more difficult still because it requires topic-sentiment disambiguation. Most topic extractors work by tagging parts of speech in the sentence structure and isolating the noun phrases. This frequently results in multiple topics per sentence. Sentiment analysis, meanwhile, tries to quantify the author’s feelings regarding the subject of the sentence. Take an extended version of the familiar pangram: “The quick brown fox jumps over the lazy dog and got into the hen house.” It has three topics: a fox, a dog, and a hen house.

 

Read as a whole, the sentence carries a positive sentiment, describing a gamboling woodland creature. However, if we were searching for sentiments about dogs, we would want this to have a negative sentiment, as the object of our interest is described as lazy and outmatched. If we were searching for hen houses, then we would hope for a rather inconclusive sentiment, as the hen house is portrayed in neither a positive nor a negative light.
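
The noun-phrase approach to topic extraction can be sketched in a few lines. This is a toy illustration, not the method used by either service: the part-of-speech tags are assigned by hand (Penn Treebank style) so the example runs without a tagging library, and the rule is deliberately crude. It treats every unbroken run of determiners, adjectives, and nouns that contains at least one noun as a topic, and keeps the adjectives alongside it.

```python
# Hand-tagged tokens (an assumption of this sketch; a real pipeline would
# get these tags from a part-of-speech tagger).
TAGGED_SENTENCE = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
    ("and", "CC"), ("got", "VBD"), ("into", "IN"),
    ("the", "DT"), ("hen", "NN"), ("house", "NN"),
]


def extract_topics(tagged):
    """Return (topic, adjectives) pairs, one per noun-phrase-like run."""
    topics = []
    run = []
    # A sentinel token at the end flushes the final run.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in ("DT", "JJ") or tag.startswith("NN"):
            run.append((word.lower(), tag))
            continue
        nouns = [w for w, t in run if t.startswith("NN")]
        if nouns:
            adjectives = [w for w, t in run if t == "JJ"]
            topics.append((" ".join(nouns), adjectives))
        run = []
    return topics


topics = extract_topics(TAGGED_SENTENCE)
# topics == [("fox", ["quick", "brown"]), ("dog", ["lazy"]), ("hen house", [])]
```

Each topic ends up with different descriptive words attached to it, and the hen house has none at all. A single score for the whole sentence hides exactly that difference.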

 

Huh...Deja Vu.

This sort of integration of machine learning and the massive amount of information available on the internet reminds me of a scene near the beginning of The Matrix. While Neo is sleeping at his desk his computer is working hard searching for something, presumably on Neo’s behalf. We see articles like “Morpheus Eludes Police at Heathrow Airport” and “Manhunt Underway” appear on his computer screen, just before he is invited down the rabbit hole. This is machine learning in action, way before many of us were even aware of its existence, much less its future potential.  

 

Coria Labs gave two developers and one marketing guru two weeks to see what they could come up with. It didn’t need to be pretty...just good enough to show what we could gather. Check out the next post in this series to see what we found out!

Fred Schwaner
Machine Learning Engineer