Anomaly in sentiment analysis done by IBM Watson Discovery

When I give Discovery a document containing a positive comment, it correctly categorises it as a positive-sentiment document.
But when I give it a document with both positive and negative comments, Discovery still categorises it as a positive-sentiment document.
And when I give it a document with a negative comment (e.g. "not skilled"), Discovery categorises it as a neutral-sentiment document.
Are there any settings or configuration options in Discovery that would resolve this anomaly?

You may need more text for it to fully grasp the sentiment. It also may not be perfect: things like sarcasm and double negatives are sometimes hard for sentiment analysis algorithms.
Two things I can suggest:
1) Provide the full example you are trying to process, as well as the output Discovery is giving you for sentiment. That way we can validate whether it's a bug or something else.
2) Look at the score we are assigning for sentiment analysis. Even though the document has been labeled as Positive or Negative, there may be a lower threshold on the score that you could use.
(I'm a Watson Discovery employee)
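As a rough illustration of suggestion 2, here is a minimal Python sketch that ignores the returned label and applies its own cut-off to the score. The result dictionaries, the score range of -1 to 1 and the 0.3 threshold are all assumptions made up for the example, not real Discovery output or a recommended value.

```python
def relabel(score, threshold=0.3):
    """Map a sentiment score in [-1, 1] to a label using a custom neutral band."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical results (illustrative values only, not real Discovery output).
results = [
    {"id": "doc-a", "label": "positive", "score": 0.82},
    {"id": "doc-b", "label": "positive", "score": 0.12},
    {"id": "doc-c", "label": "negative", "score": -0.35},
]

relabelled = [relabel(r["score"]) for r in results]
print(relabelled)
```

With a band like this, a weakly positive score (such as a document mixing positive and negative comments might get) ends up as neutral instead of positive.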

Related

What is the proper way to deal with (score) dispersion in sentiment analysis on different topics in relation

I'm analyzing sentiment on a social network, using a set of related topics as input. How can we deal with the dispersion of the individual topics' scores?
For example: we are trying to score sentiment on a theme which is an event that includes different keywords, let's say the theme is Innovation week with the following topics (keywords or synonyms):
Innovation week = {"innovation week", "data solution", "emerging technologies", "august 30"...}.
What if the standard deviation of the scores is very large?
Do we question:
The sentiment analysis algorithm itself?
Our input keywords?
Or do we just take the results as they are, since they represent the different views of people at different levels of granularity that together make up a theme? The purpose, in the end, is to get a general insight into a theme.
I think the question is simple although this is a concern of any sentiment analysis study in social networks.
The short answer is both the algorithm and the input keywords, as they depend on each other.
Given the right input the dispersion would increase in any algorithm, and given the wrong algorithm the same will happen for any input.
Usually in these cases you should revise the algorithm first, as that is the cause in most situations.
You can also read this in order to understand it better:
http://www.cs.cornell.edu/home/llee/omsa/omsa-published.pdf
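Before questioning either the algorithm or the keywords, it can help to see which keyword is actually driving the dispersion. A small sketch using only the standard library; the scores below are invented for illustration and assume each post has already been scored in the range -1 to 1.

```python
from statistics import mean, pstdev

# Hypothetical per-post sentiment scores, grouped by keyword of the theme.
scores_by_keyword = {
    "innovation week": [0.6, 0.7, 0.5, 0.8],
    "data solution": [0.9, -0.8, 0.7, -0.6],
    "emerging technologies": [0.2, 0.1, 0.3, 0.2],
}


def summarize(scores):
    """Return {keyword: (mean, population standard deviation)}."""
    return {keyword: (mean(values), pstdev(values)) for keyword, values in scores.items()}


summary = summarize(scores_by_keyword)

# List keywords from most to least dispersed.
for keyword, (avg, sd) in sorted(summary.items(), key=lambda item: -item[1][1]):
    print(f"{keyword:25s} mean={avg:+.2f} sd={sd:.2f}")
```

A keyword whose scores swing between strongly positive and strongly negative is a good first candidate to inspect by hand.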
If you are not sure about your algorithm, maybe use NLTK's VADER sentiment analyzer to check the results. But it could simply be that the opinions are so different that the standard deviation of the scores is large.
Do you have test data to test your algorithm? If not, you should get some anyway, so that you can compute the standard measurements for the algorithm.
Standard Measurements
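The answer does not say which measurements it means; precision, recall and F1 are the usual ones, so this sketch assumes those. The gold and predicted labels are made-up test data.

```python
def precision_recall_f1(gold, predicted, positive="positive"):
    """Precision, recall and F1 for one class, given parallel lists of labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical hand-labelled test set and the algorithm's output for it.
gold = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```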

Sentiment towards a keyword

I have been looking around for Sentiment and text analysis services but most of them seem to analyse the whole text and provide one result for it.
Is there a way of analysing the same piece of text against two different keywords? For example, the same article could be talking about two entities, positively towards one and negatively towards the other.
How could one get these two sentiments within the same text? Is there a service or API already for that?
I have found IBM's AlchemyAPI, but it doesn't seem to return accurate results...
What you want is aspect-based sentiment analysis. There are lots of algorithms with different precisions and recalls for this aspect-based sentiment analysis.
You can use Aylien's text analysis API.

Sentence-level to document-level sentiment analysis. Analysing news

I need to perform sentiment analysis on news articles about a specific topic using the Stanford NLP tool.
This tool only allows sentence-based sentiment analysis, while I would like to extract a sentiment evaluation of whole articles with respect to my topic.
For instance, if my topic is Apple, I would like to know the sentiment of a news article with respect to Apple.
Just computing the average of the sentences in my articles won't do. For instance, I might have an article saying something along the lines of "Apple is very good at this, and this and that. While Google products are very bad for these reasons". Such an article would result in a Neutral classification using the average score of sentences, while it is actually a Very positive article about Apple.
On the other hand filtering my sentences to include only the ones containing the word Apple would miss articles along the lines of "Apple's product A is pretty good. However, it lacks the following crucial features: ...". In this case the effect of the second sentence would be lost if I were to use only the sentences containing the word Apple.
Is there a standard way of addressing this kind of problem? Is Stanford NLP the wrong tool to accomplish my goal?
Update: You might want to look into
http://blog.getprismatic.com/deeper-content-analysis-with-aspects/
This is a very active area of research, so it would be hard to find an off-the-shelf tool to do this (at least nothing is built into Stanford CoreNLP). Some pointers: look into aspect-based sentiment analysis. In this case, Apple would be an "aspect" (not really, but it can be modeled that way). Andrew McCallum's group at UMass, Bing Liu's group at UIC and Cornell's NLP group, among others, have worked on this problem.
If you want a quick fix, I would suggest extracting sentiment from the sentences that refer to Apple and its products. Use coref (check out the dcoref annotator in Stanford CoreNLP), which will increase the recall of sentences and solve the problem of sentences like "However, it lacks...".
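To make the quick fix concrete, here is a rough Python sketch of the sentence-selection step. It does not use CoreNLP or dcoref: as a loudly labelled stand-in for real coreference resolution, it simply keeps a sentence that opens with a pronoun when the previous sentence was kept. The function name and the pronoun pattern are my own assumptions.

```python
import re

# Crude stand-in for coreference: a sentence starting with "it", "they", etc.
# (optionally after "However," or "But") is assumed to continue the previous one.
PRONOUN_START = re.compile(r"^(however,\s+|but\s+)?(it|its|they|their)\b", re.IGNORECASE)


def sentences_about(text, target):
    """Keep sentences naming `target`, plus pronoun-led sentences that follow them."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    kept = []
    previous_kept = False
    for sentence in sentences:
        mentions = target.lower() in sentence.lower()
        continues = previous_kept and PRONOUN_START.match(sentence) is not None
        previous_kept = mentions or continues
        if previous_kept:
            kept.append(sentence)
    return kept


article = (
    "Apple's product A is pretty good. "
    "However, it lacks some crucial features. "
    "Google products are very bad for these reasons."
)
kept = sentences_about(article, "Apple")
print(kept)
```

The kept sentences would then be passed to the sentence-level sentiment model and averaged, so the Google sentence no longer dilutes the result for Apple.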

Is there an algorithm that extracts meaningful tags from English text

I would like to extract a reduced collection of "meaningful" tags (10 max) out of an English text of any size.
http://tagcrowd.com/ is quite interesting but the algorithm seems very basic (just word counting)
Is there any other existing algorithm to do this?
There are existing web services for this. Three examples:
Yahoo's Term Extraction API
Topicalizer
OpenCalais
When you subtract the human element (tagging), all that is left is frequency. "Ignore common English words" is the next best filter, since it deals with exclusion instead of inclusion. I tested a few sites, and it is very accurate. There really is no other way to derive "meaning", which is why the Semantic Web gets so much attention these days. It is a way to imply meaning with HTML... of course, that has a human element to it as well.
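For reference, the frequency-plus-exclusion idea fits in a few lines of Python. The stop-list here is a tiny made-up sample (a real one would be much longer), and dropping words of one or two letters is an extra assumption of mine.

```python
import re
from collections import Counter

# Tiny illustrative stop-list of common English words.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "it", "for", "on", "with", "that", "this",
}


def frequency_tags(text, max_tags=10):
    """Return the most frequent words that are not in the stop-list."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(max_tags)]


text = (
    "The cat sat on the mat. The cat chased the mouse, "
    "and the mouse hid in the mat of the cat."
)
tags = frequency_tags(text, max_tags=3)
print(tags)
```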
Basically, this is a text categorization problem/document classification problem. If you have access to a number of already tagged documents, you could analyze which (content) words trigger which tags, and then use this information for tagging new documents.
If you don't want to use a machine-learning approach and you still have a document collection, then you can use metrics like tf.idf to filter out interesting words.
Going one step further, you can use Wordnet to find synonyms and replace words by their synonym, if the frequency of the synonym is higher.
Manning & Schütze contains a lot more introduction on text categorization.
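A minimal sketch of the tf.idf filtering mentioned above, in plain Python with no external libraries. The three-document collection is invented, and the smoothed idf formula is one common variant chosen here as an assumption, not the only possible definition.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_tags(document, collection, max_tags=10):
    """Rank the words of `document` by tf.idf against `collection` (a list of texts)."""
    doc_counts = Counter(tokenize(document))
    doc_sets = [set(tokenize(d)) for d in collection]
    n_docs = len(collection)
    scores = {}
    for word, tf in doc_counts.items():
        df = sum(1 for words in doc_sets if word in words)
        # Smoothed idf: words found in every document score 0, unseen words do not crash.
        idf = math.log((1 + n_docs) / (1 + df))
        scores[word] = tf * idf
    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted(scores, key=lambda w: (-scores[w], w))[:max_tags]


collection = [
    "the match ended in a draw after the striker missed a penalty",
    "the bank raised the interest rate and the market fell",
    "the striker scored and the team won the match",
]
tags = tfidf_tags(collection[1], collection, max_tags=3)
print(tags)
```

Note that "the" is the most frequent word in the second document but gets a score of zero, because it occurs in every document of the collection.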
In text classification, this problem is known as dimensionality reduction. There are many useful algorithms in the literature on this subject.
You want to do the semantic analysis of a text.
Word frequency analysis is one of the easiest ways to do semantic analysis. Unfortunately (and obviously) it is the least accurate one. It can be improved by using special dictionaries (for synonyms or forms of a word), "stop-lists" with common words, or other texts (to find those "common" words and exclude them).
As for other algorithms they could be based on:
Syntax analysis (like trying to find the main subject and/or verb in a sentence)
Format analysis (analyzing headers, bold text, italic... where applicable)
Reference analysis (if the text is in Internet, for example, then a reference can describe it in several words... used by some search engines)
BUT... you should understand that these algorithms are merely heuristics for semantic analysis, not strict algorithms for achieving the goal.
The problem of semantic analysis is one of the main problems in Artificial Intelligence/Machine Learning studies since the first computers appeared.
Perhaps "Term Frequency - Inverse Document Frequency" TF-IDF would be useful...
You can use this in two steps:
1 - Try topic modeling algorithms:
Latent Dirichlet Allocation
Latent word Embeddings
2 - After that you can select the most representative word of every topic as a tag

Algorithm to determine how positive or negative a statement/text is

I need an algorithm to determine if a sentence, paragraph or article is negative or positive in tone... or better yet, how negative or positive.
For instance:
Jason is the worst SO user I have ever witnessed (-10)
Jason is an SO user (0)
Jason is the best SO user I have ever seen (+10)
Jason is the best at sucking with SO (-10)
While, okay at SO, Jason is the worst at doing bad (+10)
Not easy, huh? :)
I don't expect somebody to explain this algorithm to me, but I assume there is already much work on something like this in academia somewhere. If you can point me to some articles or research, I would love it.
Thanks.
There is a sub-field of natural language processing called sentiment analysis that deals specifically with this problem domain. There is a fair amount of commercial work done in the area because consumer products are so heavily reviewed in online user forums (UGC, or user-generated content). There is also a prototype platform for text analytics called GATE from the University of Sheffield, and a Python project called NLTK. Both are considered flexible, but not very high performance. One or the other might be good for working out your own ideas.
In my company we have a product which does this and also performs well. I did most of the work on it. I can give a brief idea:
You need to split the paragraph into sentences and then split each sentence into smaller sub-sentences, splitting on commas, hyphens, semicolons, colons, 'and', 'or', etc.
In some cases each sub-sentence will exhibit a totally separate sentiment.
Some sentences, even after being split, will have to be joined back together.
Eg: The product is amazing, excellent and fantastic.
We have developed a comprehensive set of rules on the type of sentences which need to be split and which shouldn't be (based on the POS tags of the words)
On the first level, you can use a bag of words approach, meaning - have a list of positive and negative words/phrases and check in every sub sentence. While doing this, also look at the negation words like 'not', 'no', etc which will change the polarity of the sentence.
If you still can't find the sentiment, you can go for a naive Bayes approach. On its own this approach is not very accurate (about 60%). But if you apply it only to the sentences that fail to pass through the first set of rules, you can easily get to 80-85% accuracy.
The important part is the positive/negative word list and the way you split things up. If you want, you can go a level higher by implementing an HMM (Hidden Markov Model) or CRF (Conditional Random Fields). But I am not a pro in NLP, and someone else may fill you in on that part.
For the curious, we implemented all of this in Python with NLTK and the Reverend Bayes module.
Pretty simple and handles most of the sentences. You may however face problems when trying to tag content from the web. Most people don't write proper sentences on the web. Also handling sarcasm is very hard.
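A rough sketch of the first level described above (sub-sentence splitting, word lists and negation handling) might look like this in plain Python. The word lists are tiny made-up samples, and the POS-based rules for deciding when not to split, as well as the naive Bayes fallback, are not reproduced here.

```python
import re

# Tiny illustrative word lists; a real system needs far larger ones.
POSITIVE = {"amazing", "excellent", "fantastic", "good", "great"}
NEGATIVE = {"bad", "poor", "slow", "terrible", "worst"}
NEGATIONS = {"not", "no", "never"}


def split_sub_sentences(text):
    """Naive split on punctuation and on the words 'and' / 'or'."""
    parts = re.split(r"[.!?;:,]|\b(?:and|or)\b", text.lower())
    return [part.strip() for part in parts if part.strip()]


def score_sub_sentence(sub_sentence):
    """Count polar words, flipping the next polar word after a negation."""
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", sub_sentence):
        if word in NEGATIONS:
            negate = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


def score_text(text):
    """Sum the sub-sentence scores; > 0 is positive, < 0 negative, 0 undecided."""
    return sum(score_sub_sentence(s) for s in split_sub_sentences(text))


print(score_text("The product is amazing, excellent and fantastic."))
print(score_text("The screen is not good, and the battery is terrible."))
```

Texts that come out at 0 would be the ones handed to the second-level classifier.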
This falls under the umbrella of Natural Language Processing, and so reading about that is probably a good place to start.
If you don't want to get in to a very complicated problem, you can just create lists of "positive" and "negative" words (and weight them if you want) and do word counts on sections of text. Obviously this isn't a "smart" solution, but it gets you some information with very little work, where doing serious NLP would be very time consuming.
One of your examples would potentially be marked positive when it was in fact negative using this approach ("Jason is the best at sucking with SO") unless you happen to weight "sucking" more than "best".... But also this is a small text sample, if you're looking at paragraphs or more of text, then weighting becomes more reliable unless you have someone purposefully trying to fool your algorithm.
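To illustrate the weighting point with the example from the question, here is a tiny sketch; the weights are arbitrary values picked for the demonstration.

```python
import re

# Hypothetical weights; choosing them well is the hard part.
WEIGHTS = {"best": 1.0, "good": 0.5, "worst": -1.0, "bad": -0.5, "sucking": -2.0}


def weighted_score(text):
    """Sum the weights of all listed words that occur in the text."""
    return sum(WEIGHTS.get(word, 0.0) for word in re.findall(r"[a-z']+", text.lower()))


example = weighted_score("Jason is the best at sucking with SO")
print(example)
```

With "sucking" weighted more heavily than "best", the sentence comes out negative; with equal weights it would cancel out to zero.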
As pointed out, this comes under sentiment analysis under natural language processing. Afaik GATE doesn't have any component that does sentiment analysis.
I have implemented an algorithm adapted from the one in the paper 'Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis' by Theresa Wilson, Janyce Wiebe and Paul Hoffmann as a GATE plugin, and it gives reasonably good results. It could help you if you want to bootstrap the implementation.
Depending on your application you could do it via a Bayesian Filtering algorithm (which is often used in spam filters).
One way to do it would be to have two filters: one for positive documents and another for negative documents. You would seed the positive filter with positive documents (whatever criteria you use) and the negative filter with negative documents. The trick would be to find these documents. Maybe you could set it up so that your users effectively rate documents.
The positive filter (once seeded) would look for positive words. Maybe it would end up with words like love, peace, etc. The negative filter would be seeded appropriately as well.
Once your filters are set up, you run the test text through them to come up with positive and negative scores. Based on these scores and some weighting, you could come up with your numeric score.
Bayesian Filters, though simple, are surprisingly effective.
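A minimal sketch of the two-filter idea, written as a single two-class naive Bayes classifier over word counts with add-one smoothing. The class name, method names and seed documents are all made up for illustration; a real filter would be seeded with many more documents.

```python
import math
import re
from collections import Counter


class BayesianFilter:
    """Two-class naive Bayes over word counts with add-one smoothing."""

    LABELS = ("positive", "negative")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def seed(self, label, text):
        """Add one labelled document to the corresponding filter."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, label, text):
        """Log prior plus smoothed log likelihood of the text under `label`."""
        counts = self.word_counts[label]
        vocabulary = set().union(*self.word_counts.values())
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in self.tokenize(text):
            score += math.log((counts[word] + 1) / (total + len(vocabulary)))
        return score

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.log_score(label, text))


bayes = BayesianFilter()
bayes.seed("positive", "I love this, it is wonderful and helpful")
bayes.seed("positive", "great answer, clear and helpful")
bayes.seed("negative", "I hate this, it is useless and rude")
bayes.seed("negative", "terrible answer, wrong and confusing")

print(bayes.classify("a wonderful and helpful answer"))
print(bayes.classify("rude and useless"))
```

The difference between the two log scores could serve as the raw material for the numeric score mentioned above.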
You can do like this:
Jason is the worst SO user I have ever witnessed (-10)
worst (-), the rest is (+). so, that would be (-) + (+) = (-)
Jason is an SO user (0)
( ) + ( ) = ( )
Jason is the best SO user I have ever seen (+10)
best (+) , the rest is ( ). so, that would be (+) + ( ) = (+)
Jason is the best at sucking with SO (-10)
best (+), sucking (-). so, (+) + (-) = (-)
While, okay at SO, Jason is the worst at doing bad (+10)
worst (-), doing bad (-). so, (-) + (-) = (+)
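The rules above amount to multiplying the signs of the polar words, with neutral words ignored. A small sketch of that reading (the word list is only what the five examples need, and "doing bad" is reduced to the single word "bad"):

```python
from math import prod

# Polarity of the words used in the five examples: +1 positive, -1 negative.
POLARITY = {"best": 1, "worst": -1, "sucking": -1, "bad": -1}


def combined_sign(text):
    """Multiply the signs of all polar words; 0 means no polar word was found."""
    words = text.lower().replace(",", " ").split()
    signs = [POLARITY[word] for word in words if word in POLARITY]
    return prod(signs) if signs else 0


examples = [
    "Jason is the worst SO user I have ever witnessed",
    "Jason is an SO user",
    "Jason is the best SO user I have ever seen",
    "Jason is the best at sucking with SO",
    "While, okay at SO, Jason is the worst at doing bad",
]
signs = [combined_sign(sentence) for sentence in examples]
print(signs)
```

This only recovers the direction of each example, not the magnitude.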
There are many machine learning approaches for this kind of sentiment analysis. I used machine learning algorithms that are already implemented. In my case I used:
Weka classification algorithms
SVM
Naive Bayes
J48
You only have to train the model for your context, add feature vectors and do some rule-based tuning. In my case I got about 61% accuracy. So we moved to Stanford CoreNLP (they trained their model on movie reviews), used their training set and added our own training set. We were able to achieve 80-90% accuracy.
This is an old question, but I happened upon it while looking for a tool that could analyze article tone, and found Watson Tone Analyzer by IBM. It allows 1000 API calls monthly for free.
It's all about context, I think. If you're looking for the people who are best at sucking with SO, sucking the best can be a positive thing. To determine what is bad or good, and how much, I would recommend looking into fuzzy logic.
It's a bit like being tall. Someone who is 1.95 m can be considered tall. If you place that person in a group of people who are all over 2.10 m, he looks short.
Maybe essay grading software could be used to estimate tone? WIRED article.
Possible reference. (I couldn't read it.)
This report compares writing skill to the Flesch-Kincaid Grade Level needed to read it!
Page 4 of e-rator says that they look at misspellings and such. (Maybe bad posts are misspelled too!)
Slashdot article.
You could also use an email filter of some sort for negativity instead of spam-ness.
How about sarcasm:
Jason is the best SO user I have ever seen, NOT
Jason is the best SO user I have ever seen, right
Ah, I remember one Java library for this called LingPipe (commercial license) that we evaluated. It would work fine for the example corpus that is available at the site, but for real data it sucks pretty bad.
Most sentiment analysis tools are lexicon-based and none of them is perfect. Also, sentiment analysis can be framed as either a trinary or a binary sentiment classification. Moreover, it is a domain-specific task, meaning that tools which work well on a news dataset may not do a good job on informal and unstructured tweets.
I would suggest using several tools and having an aggregation or vote-based mechanism to decide the intensity of the sentiment. The best survey study on sentiment analysis tools that I have come across is SentiBench. You will find it helpful.
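A vote-based mechanism can be as simple as the sketch below. The tool names and their labels are placeholders, and falling back to "neutral" on a tie is just one possible choice.

```python
from collections import Counter


def majority_vote(labels, tie_label="neutral"):
    """Return the most common label, or `tie_label` when the top two are tied."""
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


# Hypothetical outputs of three different tools for the same text.
tool_labels = {"tool_a": "negative", "tool_b": "negative", "tool_c": "positive"}
decision = majority_vote(tool_labels.values())
print(decision)
```

If the tools return numeric scores rather than labels, averaging the scores (possibly weighted by how well each tool does on your domain) is the analogous aggregation.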
use strict;
use warnings;
use Algorithm::NaiveBayes;

my $nb = Algorithm::NaiveBayes->new;

# Each training instance is a bag of attribute counts plus one or more labels.
$nb->add_instance(
    attributes => {foo => 1, bar => 1, baz => 3},
    label      => 'sports',
);
$nb->add_instance(
    attributes => {foo => 2, blurp => 1},
    label      => ['sports', 'finance'],
);

# ... repeat for several more instances, then:
$nb->train;

# Find results for unseen instances
my $result = $nb->predict(
    attributes => {bar => 3, blurp => 2},
);
