Improve Microsoft Cognitive Services sentiment analysis?

I find the sentiment analysis to be frequently off, sometimes completely off, from what the sentiment actually should be. Here are some examples:
I told you to not do that - Sentiment 0.81 (positive) .. how is this positive?
if you drop the glass i will beat you - Sentiment 0.74 (positive) .. how in the world is beating someone positive?
You made a mess - Sentiment 0.76 (positive) .. how is this positive?
...and many other such examples!
Questions:
Is it possible to tweak this within your model?
Is it possible to have a conversation thread be analyzed for sentiment? For instance,
utterance: you made a mess, it's okay, count to ten ..
Here, "count to ten" is reassuring, i.e. it's okay, take a deep breath. The sentiment should be positive!
But saying just "count to ten" on its own should not be as positive.
How far can I take Microsoft Cognitive Services sentiment analysis? Are there better or other options?
Thanks!

Related

RNTN: stop training early on convergence?

I'm currently training some sentiment analysis models with the RNTN within CoreNLP. With the default settings, training runs for 400 iterations which takes a long time. Is there some way to stop training earlier, e.g. if the error does not get smaller? Is there code which allows this?
In the 2013 paper by Socher et al., there is a sentence stating that the RNTN converges after a few hours of training. Can I exploit this?
edit for clarification:
The paper I am referring to is "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" by Socher et al., EMNLP 2013. The RNTN I refer to is part of the Stanford CoreNLP package.
To rephrase and clarify my question:
How can I make edu.stanford.nlp.sentiment.SentimentTraining stop training when the model is "good enough" (for some criterion) instead of going through all 400 iterations?
Unfortunately, the code does not automatically detect when it is no longer improving in order to terminate the run early. However, it does output intermediate models. If you train using a dev set, you can keep the model with the highest dev score at the end of the run.
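For reference, the training invocation documented for CoreNLP's sentiment tool looks roughly like this (the -numHid value and file names are placeholders; check the class's usage message for your version):

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz

With -devPath set, the run logs dev scores as it writes out intermediate models, so you can watch those scores and stop the process manually once they plateau, keeping the best intermediate model.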

Positivity/Negativity percentage of a file in a hotel-review dataset

There's a hotel-review dataset with 1500 positive and 1500 negative files. To evaluate the accuracy of my algorithm, I first have to determine the positivity or negativity percentage of each original file in the hotel-review dataset.
I tried the basic percentage criterion:
positivity % = (number of positive words) / (number of positive words + number of negative words)
But this holds no significant ground, so I can't work with it. Is there any other method or basis on which I can work?
Example: "She's the most beautiful lady I've ever seen." should get a higher positivity percentage than "She is a nice lady."
I'm doing the work in Python.
The first thing you can try is switching from a binary category for words (positive vs. negative) to a sliding scale. The SentiWordNet project provides this.
However, on your specific example this could actually make things worse: e.g. "nice" gets P = 0.875, whereas "beautiful" only gets P = 0.75. Of course you could fix the SentiWordNet ratings if you disagree, but I'd suggest doing that kind of tuning with an automatic system, using as much domain-specific training data as you can find.
BTW, there are at least a couple of Python interfaces to SentiWordNet.
http://compprag.christopherpotts.net/code-data/sentiwordnet.py describes itself as an "Interface to SentiWordNet using the NLTK WordNet classes."
https://pypi.python.org/pypi/sentiment_classifier is a more general tool, using SentiWordNet.
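To illustrate the sliding-scale idea, here is a minimal sketch that looks up SentiWordNet scores through NLTK's sentiwordnet corpus reader (the averaging scheme and the adjective-only lookup are just illustrative choices, not the only way to do it):

from nltk.corpus import sentiwordnet as swn

# One-time setup (assumed already done):
#   import nltk; nltk.download('wordnet'); nltk.download('sentiwordnet')

def polarity(word, pos='a'):
    # Average (positive - negative) score over all senses of the word
    synsets = list(swn.senti_synsets(word, pos))
    if not synsets:
        return 0.0
    return sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets)

print(polarity('nice'))       # averaged over the adjective senses of "nice"
print(polarity('beautiful'))  # averaged over the adjective senses of "beautiful"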
Going back to your example, the key difference is the "the most [SOMETHING] I've ever seen" structure. This requires switching from a bag of words approach to actually parsing and understanding the sentence. I have no useful leads to give you there, so I'll be as delighted as you if someone says there is a ready-made open-source package already doing that :-)
I'd also like to mention the importance of context. Without any context, "She's a beautiful lady" and "She is a nice lady" are both simple and positive. But in the context of hotel reviews, and their relevance to me, maybe "nice" is more useful than "beautiful". And, for fun, compare these two:
"The receptionist was a nice lady."
"At breakfast, at a table near to me, was the most beautiful lady I've ever seen. It was a welcome distraction from the food."
That is the challenge I love about sentiment analysis; the commercial applications are just excuses to work on problems like that!

What are the existing Sentiment Analysis Algorithms?

A group of us are developing a sentiment analysis algorithm. I would like to know which ones already exist, because I want to compare them against ours. Is there any article that covers the main algorithms in this area?
Thanks in advance
Thiago
Some papers on sentiment analysis that may help you:
One of the earlier works by Bo Pang, Lillian Lee http://acl.ldc.upenn.edu/acl2002/EMNLP/pdfs/EMNLP219.pdf
A comprehensive survey of sentiment analysis techniques http://www.cse.iitb.ac.in/~pb/cs626-449-2009/prev-years-other-things-nlp/sentiment-analysis-opinion-mining-pang-lee-omsa-published.pdf
Study by Hang Cui, V Mittal, M Datar using 6-grams http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.5942&rep=rep1&type=pdf
For a quick implementation, Naive Bayes is recommended. You can find an example here: http://nlp.stanford.edu/IR-book/
We did a statistical comparison of various classifiers and found SVM to be the most accurate, though on a dataset of long documents
( http://ai.stanford.edu/~amaas/data/sentiment/ ) none of the methods worked well. Our study may not be accurate, though. Also, instead of treating sentiment analysis as a text classification problem, you can look at extracting meaning from the text, though I do not know how successful that might be.
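If you want to try the Naive Bayes route quickly, NLTK's built-in classifier is enough for a first experiment; here is a minimal sketch with made-up toy training data:

import nltk

train = [
    ("an amazing, excellent product", "pos"),
    ("fantastic, works great", "pos"),
    ("terrible, a complete waste of money", "neg"),
    ("awful quality, very disappointed", "neg"),
]

def features(text):
    # Bag-of-words presence features
    return {word: True for word in text.lower().split()}

classifier = nltk.NaiveBayesClassifier.train(
    [(features(text), label) for text, label in train])

print(classifier.classify(features("excellent and fantastic")))  # expect 'pos'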
Apparently NLTK, a Python natural language processing library, has one:
http://text-processing.com/demo/sentiment/
Probably worth having a look at it.

Comparing two English strings for similarities

So here is my problem. I have two paragraphs of text and I need to see if they are similar. Not in the sense of string metrics but in meaning. The following two paragraphs are related but I need to find out if they cover the 'same' topic. Any help or direction to solving this problem would be greatly appreciated.
Fossil fuels are fuels formed by natural processes such as anaerobic
decomposition of buried dead organisms. The age of the organisms and
their resulting fossil fuels is typically millions of years, and
sometimes exceeds 650 million years. The fossil fuels, which contain
high percentages of carbon, include coal, petroleum, and natural gas.
Fossil fuels range from volatile materials with low carbon:hydrogen
ratios like methane, to liquid petroleum to nonvolatile materials
composed of almost pure carbon, like anthracite coal. Methane can be
found in hydrocarbon fields, alone, associated with oil, or in the
form of methane clathrates. It is generally accepted that they formed
from the fossilized remains of dead plants by exposure to heat and
pressure in the Earth's crust over millions of years. This biogenic
theory was first introduced by Georg Agricola in 1556 and later by
Mikhail Lomonosov in the 18th century.
Second:
Fossil fuel reforming is a method of producing hydrogen or other
useful products from fossil fuels such as natural gas. This is
achieved in a processing device called a reformer which reacts steam
at high temperature with the fossil fuel. The steam methane reformer
is widely used in industry to make hydrogen. There is also interest in
the development of much smaller units based on similar technology to
produce hydrogen as a feedstock for fuel cells. Small-scale steam
reforming units to supply fuel cells are currently the subject of
research and development, typically involving the reforming of
methanol or natural gas but other fuels are also being considered such
as propane, gasoline, autogas, diesel fuel, and ethanol.
That's a tall order. If I were you, I'd start reading up on Natural Language Processing. NLP is a fairly large field -- I would recommend looking specifically at the things mentioned in the Wikipedia Text Analytics article's "Processes" section.
I think if you make use of information retrieval, named entity recognition, and sentiment analysis, you should be well on your way.
In general, I believe that this is still an open problem. Natural language processing is still a nascent field and while we can do a few things really well, it's still extremely difficult to do this sort of classification and categorization.
I'm not an expert in NLP, but you might want to check out these lecture slides that discuss sentiment analysis and authorship detection. The techniques you might use to do the sort of text comparison you've suggested are related to the techniques you would use for the aforementioned analyses, and you might find this to be a good starting point.
Hope this helps!
You can also have a look at the Latent Dirichlet Allocation (LDA) model from machine learning. The idea there is to find a low-dimensional representation of each document (or paragraph), simply as a distribution over some 'topics'. The model is trained in an unsupervised fashion using a collection of documents/paragraphs.
If you run LDA on your collection of paragraphs, then by looking at the similarity of their hidden topic vectors, you can find out whether two given paragraphs are related or not.
Of course, the baseline is not to use LDA at all, and instead use term frequencies (weighted with tf-idf) to measure similarity (the vector space model), as sketched below.
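A minimal sketch of that tf-idf baseline with scikit-learn (the 0.3 threshold is arbitrary and would need tuning on real data):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

para_a = "Fossil fuels are fuels formed by natural processes such as anaerobic decomposition of buried dead organisms."
para_b = "Fossil fuel reforming is a method of producing hydrogen or other useful products from fossil fuels such as natural gas."

vectors = TfidfVectorizer(stop_words="english").fit_transform([para_a, para_b])
similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]
print("related" if similarity > 0.3 else "unrelated", similarity)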

Algorithm to determine how positive or negative a statement/text is

I need an algorithm to determine if a sentence, paragraph or article is negative or positive in tone... or better yet, how negative or positive.
For instance:
Jason is the worst SO user I have ever witnessed (-10)
Jason is an SO user (0)
Jason is the best SO user I have ever seen (+10)
Jason is the best at sucking with SO (-10)
While, okay at SO, Jason is the worst at doing bad (+10)
Not easy, huh? :)
I don't expect somebody to explain this algorithm to me, but I assume there is already much work on something like this in academia somewhere. If you can point me to some articles or research, I would love it.
Thanks.
There is a sub-field of natural language processing called sentiment analysis that deals specifically with this problem domain. There is a fair amount of commercial work done in the area because consumer products are so heavily reviewed in online user forums (UGC, or user-generated content). There is also a prototype platform for text analytics called GATE from the University of Sheffield, and a Python project called NLTK. Both are considered flexible, but not very high performance. One or the other might be good for working out your own ideas.
In my company we have a product which does this and performs well. I did most of the work on it. I can give a brief idea:
You need to split the paragraph into sentences and then split each sentence into smaller sub-sentences, splitting on commas, hyphens, semicolons, colons, 'and', 'or', etc.
In some cases, each sub-sentence will exhibit a totally separate sentiment.
Some sentences, even if split, will have to be joined back together.
E.g.: The product is amazing, excellent and fantastic.
We have developed a comprehensive set of rules for which types of sentences need to be split and which shouldn't be (based on the POS tags of the words).
On the first level, you can use a bag-of-words approach (sketched after this answer): have a list of positive and negative words/phrases and check each sub-sentence against it. While doing this, also look for negation words like 'not', 'no', etc., which flip the polarity of the sentence.
If even then you can't find the sentiment, you can fall back to a Naive Bayes approach. On its own this approach is not very accurate (about 60%), but if you apply it only to the sentences that fail the first set of rules, you can easily get to 80-85% accuracy.
The important parts are the positive/negative word list and the way you split things up. If you want, you can go a level higher by implementing an HMM (Hidden Markov Model) or CRF (Conditional Random Fields). But I am not a pro in NLP, and someone else may have to fill you in on that part.
For the curious: we implemented all of this in Python with NLTK and the Reverend Bayes module.
It's pretty simple and handles most sentences. You may, however, face problems when trying to tag content from the web; most people don't write proper sentences on the web. Handling sarcasm is also very hard.
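A minimal sketch of that first-level step (the word lists and the splitting regex here are toy placeholders, not the product's actual rules):

import re

POSITIVE = {"amazing", "excellent", "fantastic", "good", "great"}
NEGATIVE = {"terrible", "awful", "bad", "mess", "worst"}
NEGATORS = {"not", "no", "never"}

def score_sentence(sentence):
    total = 0
    # Split into sub-sentences on commas, semicolons, colons, 'and', 'or'
    for clause in re.split(r"[,;:]|\band\b|\bor\b", sentence.lower()):
        words = clause.split()
        polarity = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
        if any(w in NEGATORS for w in words):
            polarity = -polarity  # a negation word flips the clause polarity
        total += polarity
    return total

print(score_sentence("The product is amazing, excellent and fantastic"))  # 3
print(score_sentence("The product is not good"))                          # -1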
This falls under the umbrella of Natural Language Processing, and so reading about that is probably a good place to start.
If you don't want to get into a very complicated problem, you can just create lists of "positive" and "negative" words (and weight them if you want) and do word counts on sections of text. Obviously this isn't a "smart" solution, but it gets you some information with very little work, where doing serious NLP would be very time consuming.
One of your examples ("Jason is the best at sucking with SO") would potentially be marked positive using this approach, when it is in fact negative, unless you happen to weight "sucking" more than "best". But this is also a small text sample; if you're looking at paragraphs or more of text, the weighting becomes more reliable, unless you have someone purposefully trying to fool your algorithm.
As pointed out, this comes under sentiment analysis within natural language processing. AFAIK, GATE doesn't have any component that does sentiment analysis.
In my experience, I have implemented, as a GATE plugin, an adaptation of the algorithm from the paper 'Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis' by Theresa Wilson, Janyce Wiebe, and Paul Hoffmann, which gives reasonably good results. It could help you bootstrap your implementation.
Depending on your application you could do it via a Bayesian Filtering algorithm (which is often used in spam filters).
One way to do it would be to have two filters: one for positive documents and another for negative documents. You would seed the positive filter with positive documents (by whatever criteria you use) and the negative filter with negative documents. The trick is finding these documents; maybe you could set it up so that your users effectively rate documents.
The positive filter (once seeded) would look for positive words. Maybe it would end up with words like love, peace, etc. The negative filter would be seeded appropriately as well.
Once your filters are set up, you run the test text through them to come up with positive and negative scores. Based on these scores and some weighting, you can come up with your numeric score.
Bayesian Filters, though simple, are surprisingly effective.
You can do it like this:
Jason is the worst SO user I have ever witnessed (-10)
worst (-), the rest is (+). so, that would be (-) + (+) = (-)
Jason is an SO user (0)
( ) + ( ) = ( )
Jason is the best SO user I have ever seen (+10)
best (+) , the rest is ( ). so, that would be (+) + ( ) = (+)
Jason is the best at sucking with SO (-10)
best (+), sucking (-). so, (+) + (-) = (-)
While, okay at SO, Jason is the worst at doing bad (+10)
worst (-), doing bad (-). so, (-) + (-) = (+)
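The combination rule above behaves like sign multiplication (two negatives make a positive). A minimal sketch of that idea, with a toy word list:

POLARITY = {"best": +1, "worst": -1, "sucking": -1, "bad": -1}

def sign_score(sentence):
    score, found = 1, False
    for word in sentence.lower().split():
        if word in POLARITY:
            score *= POLARITY[word]
            found = True
    return score if found else 0  # neutral when no sentiment words appear

print(sign_score("Jason is the best at sucking with SO"))  # (+)*(-) = -1
print(sign_score("Jason is an SO user"))                   # 0, neutral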
There are many machine learning approaches for this kind of sentiment analysis. I mostly used algorithms that are already implemented. In my case I have used:
Weka classification algorithms
SVM
Naive Bayes
J48
All you have to do is train the model on your own context, add feature vectors, and do rule-based tuning. In my case I got about 61% accuracy. So we moved to Stanford CoreNLP (they trained their model on movie reviews), used their training set, and added our own; we achieved 80-90% accuracy.
This is an old question, but I happened upon it while looking for a tool that could analyze article tone, and found Watson Tone Analyzer by IBM. It allows 1000 API calls monthly for free.
It's all about context, I think. If you're looking for the people who are best at sucking with SO, then sucking the best can be a positive thing. For determining what is bad or good, and by how much, I recommend looking into Fuzzy Logic.
It's a bit like being tall. Someone who's 1.95m can considered to be tall. If you place that person in a group with people all over 2.10m, he looks short.
Maybe essay grading software could be used to estimate tone? WIRED article.
Possible reference. (I couldn't read it.)
This report compares writing skill to the Flesch-Kincaid Grade Level needed to read it!
Page 4 of e-rater says that they look at misspelling and such. (Maybe bad posts are misspelled too!)
Slashdot article.
You could also use an email filter of some sort for negativity instead of spam-ness.
How about sarcasm:
Jason is the best SO user I have ever seen, NOT
Jason is the best SO user I have ever seen, right
Ah, I remember one Java library for this called LingPipe (commercial license) that we evaluated. It works fine on the example corpus available at the site, but for real data it sucks pretty badly.
Most sentiment analysis tools are lexicon-based, and none of them is perfect. Also, sentiment analysis can be framed as either ternary or binary sentiment classification. Moreover, it is a domain-specific task: tools that work well on a news dataset may not do a good job on informal and unstructured tweets.
I would suggest using several tools and having an aggregation or vote-based mechanism to decide the intensity of the sentiment, as sketched below. The best survey study on sentiment analysis tools that I have come across is SentiBench. You will find it helpful.
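A minimal sketch of such a vote-based aggregation (the three scorers here are hypothetical stand-ins for whatever real tools you wrap; ties are broken arbitrarily):

def vote(text, scorers):
    # Majority vote over per-tool labels: 'pos', 'neg', or 'neu'
    labels = [scorer(text) for scorer in scorers]
    return max(set(labels), key=labels.count)

# Hypothetical stand-in scorers; in practice each would wrap a real tool
def tool_a(text): return "pos"
def tool_b(text): return "neg"
def tool_c(text): return "pos"

print(vote("you made a mess, it's okay, count to ten", [tool_a, tool_b, tool_c]))  # 'pos'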
use Algorithm::NaiveBayes;
my $nb = Algorithm::NaiveBayes->new;

$nb->add_instance(
    attributes => {foo => 1, bar => 1, baz => 3},
    label      => 'sports',
);
$nb->add_instance(
    attributes => {foo => 2, blurp => 1},
    label      => ['sports', 'finance'],
);
# ... repeat for several more instances, then:
$nb->train;

# Find results for unseen instances
my $result = $nb->predict(
    attributes => {bar => 3, blurp => 2},
);
