Sentiment from Stanford CoreNLP via stanza: why only 3 classes? - stanford-nlp

I'm using the Stanford CoreNLP client for sentiment analysis, with the stanza package (because I mostly work in Python). I'd like to get sentiment scores using all 5 classes (from very negative to very positive) built into the CoreNLP system. I know that using the sentiment classifier built into stanza only uses 3 classes (positive, neutral, negative; https://stanfordnlp.github.io/stanza/sentiment.html), but even when I access the CoreNLP server directly, I only get positive | negative | neutral. Why? Shouldn't the code below return sentiment scores across 5 classes, seeing as it uses CoreNLP itself?
import stanza
from stanza.server import CoreNLPClient

text = "This was the best movie ever made!"
with CoreNLPClient(annotators=['tokenize', 'ssplit', 'sentiment']) as client:
    ann = client.annotate(text)
    # print(ann)
    sentence = ann.sentence[0]
    print(sentence.sentiment)

stanza by itself is a standalone Python package, and its built-in sentiment model only has 3 classes.
The CoreNLPClient, on the other hand, calls the actual CoreNLP server. If you look at its output carefully, you will see that CoreNLP does produce 5-level scores internally, but it rounds them to 3 levels for the final sentence-level label.
Try printing sentence instead of sentence.sentiment and you'll see.
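As a rough sketch of how the two scales relate (the 5-class names below follow CoreNLP's sentimentValue convention of 0 through 4; the collapse to 3 labels mirrors the rounding described above, but the exact thresholds here are my own illustration, not CoreNLP's code):

```python
# CoreNLP's 5-class sentiment scale (sentimentValue 0-4).
FIVE_CLASS = {
    0: "Very negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very positive",
}

def collapse_to_three(sentiment_value):
    """Collapse a 5-class value to a 3-class label, as a sketch of
    the rounding that sentence.sentiment reflects."""
    if sentiment_value <= 1:
        return "Negative"
    if sentiment_value == 2:
        return "Neutral"
    return "Positive"

print(FIVE_CLASS[4], "->", collapse_to_three(4))  # Very positive -> Positive
```

To see the finer-grained value directly, you can also try requesting JSON output (client.annotate(text, output_format='json')) and inspecting each sentence's sentimentValue field, assuming your CoreNLP version exposes it there.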

Related

Cannot extract confidence level from StanfordOpenIE

I was using Stanford OpenIE for my professor on a research project.
I can successfully extract the triples using the OpenIE annotator from the Stanford NLP server.
However, the confidence score is not returned with the requested JSON, even though it is shown on the website
https://nlp.stanford.edu/software/openie.html.
Apparently that has not been implemented yet by the Stanford people.
Does anyone have a solution to this problem, or an alternative Python library I can use to extract both the expected output and its confidence level from Stanford OpenIE?
The text output has the confidences. We can add the confidences into the json for future versions.
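Until then, one workaround is to request the plain-text output and parse the confidences out of it. The sketch below assumes OpenIE's text output is one tab-separated line per triple (confidence, subject, relation, object), which is what the command-line OpenIE tool prints; the exact format may differ between versions, so check your own output first:

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["confidence", "subject", "relation", "object"])

def parse_openie_text(output):
    """Parse OpenIE's tab-separated text output into triples.
    Assumes each triple line is: confidence<TAB>subject<TAB>relation<TAB>object."""
    triples = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # skip anything that is not a triple line
        conf, subj, rel, obj = parts
        triples.append(Triple(float(conf), subj, rel, obj))
    return triples

sample = "1.0\tJohn\twent to\tLondon"
print(parse_openie_text(sample))
```

The parser deliberately skips malformed lines rather than raising, since the text output can be interleaved with other annotator output.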

Sentence segmentation with annotated corpus

I have a custom annotated corpus, in OpenNLP format. Ex:
<START:Person> John <END> went to <START:Location> London <END>. He visited <START:Organisation> ACME Co <END> in the afternoon.
What I need is to segment sentences from this corpus. But it won't always work as expected due to the annotations.
How can I do it without losing the entity annotations?
I am using OpenNLP.
If you want to create multiple NLP models for OpenNLP, you need multiple formats to train them:
The tokenizer requires a training format
The sentence detector requires a training format
The name finder requires a training format
Therefore, you need to manage these different annotation layers in some way.
I created an annotation tool and a Maven plugin which help you do this; have a look here. All the information can be stored in a single file, and the Maven plugin will generate the NLP models for you.
Let me know if you have any further questions.
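One way to keep the annotations is to separate them from the text before segmentation: convert the annotated corpus to plain text while recording each entity's character span, run the sentence detector on the plain text, and then map the spans back into the detected sentences. A minimal sketch of the first step (the detag helper and its span format are my own illustration, not part of OpenNLP):

```python
import re

# Matches one OpenNLP name-finder annotation: <START:Type> entity <END>
TAG_RE = re.compile(r"<START:(?P<type>[^>]+)>\s*(?P<entity>.*?)\s*<END>")

def detag(text):
    """Convert OpenNLP-annotated text to plain text plus a list of
    (entity_type, start, end) character spans into the plain text."""
    plain_parts = []
    entities = []
    pos = 0       # cursor in the annotated text
    out_len = 0   # length of plain text emitted so far
    for m in TAG_RE.finditer(text):
        before = text[pos:m.start()]
        plain_parts.append(before)
        out_len += len(before)
        entity = m.group("entity")
        entities.append((m.group("type"), out_len, out_len + len(entity)))
        plain_parts.append(entity)
        out_len += len(entity)
        pos = m.end()
    plain_parts.append(text[pos:])
    return "".join(plain_parts), entities

annotated = "<START:Person> John <END> went to <START:Location> London <END>."
plain, ents = detag(annotated)
print(plain)  # John went to London.
print(ents)   # [('Person', 0, 4), ('Location', 13, 19)]
```

Because the spans index into the plain text, sentence boundaries found by the detector can be intersected with them, so no entity annotation is lost.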

Stanford core NLP models for English language

I am using Stanford CoreNLP for a task. There are two model packages, "stanford-corenlp-3.6.0-models" and "stanford-english-corenlp-2016-01-10-models", on Stanford's website. I want to know the difference between these two.
According to the "Human languages supported" section of the CoreNLP Overview, the basic distribution provides model files for the analysis of well-edited English; that is the stanford-corenlp-3.6.0-models package you mentioned.
But the CoreNLP team also provides a jar that contains all of their English models, including various variant models, and in particular one optimized for working with uncased English (e.g., text that is mostly or entirely uppercase or lowercase). The newest one is stanford-english-corenlp-2016-10-31-models, and the previous one is the stanford-english-corenlp-2016-01-10-models package you mentioned.
Reference:
http://stanfordnlp.github.io/CoreNLP/index.html#programming-languages-and-operating-systems
(the Stanford CoreNLP Overview page)

"Other" Class in Stanford NLP Classifier for lines that are not related to ANY of the Trained Classes

I'm using the Stanford NLP just fine.
I made a train file with all my classes.
and it identifies the test lines just fine.
BUT what if I have an "Other" line, i.e. one that belongs to none of the classes I've trained it on?
Can I ask the algorithm to return null or similar when a line is not recognized as any of the classes?
If not, how do you recommend I create an "Other" class with "other" lines? The set of such lines could be infinite.
Thanks, Aryeh.
If I understand your question correctly, yes, you should create an "Other" / O class to capture all "null" labels. This is the standard in e.g. NER, where the majority of the tokens in the corpus receive an O label (indicating no named entity label).
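A sketch of what that looks like in a training file, assuming the tab-separated label<TAB>text layout that Stanford's ColumnDataClassifier reads by default (the column layout is configurable via its properties file, and the class names here are made up for illustration):

```python
# Training examples: in-domain classes plus a catch-all "O" class
# drawn from miscellaneous out-of-domain lines.
examples = [
    ("SPORTS", "The home team won the championship last night."),
    ("FINANCE", "Shares fell sharply after the earnings report."),
    ("O", "What time does the library close on Sundays?"),
    ("O", "Please forward this email to the whole group."),
]

# Write one tab-separated line per example: label<TAB>text
with open("train.tsv", "w", encoding="utf-8") as f:
    for label, line in examples:
        f.write(f"{label}\t{line}\n")
```

The practical point is to sample the "O" lines broadly from text resembling what the classifier will see at test time; you do not need to enumerate everything, just enough variety that the classifier learns a sensible default.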

NLP Postagger can't grok imperatives?

The Stanford NLP POS tagger release notes claim that imperative verbs were added in a recent version. I've input lots of text with abundant and obvious imperatives, but there seems to be no tag for them in the output. Must one, after all, train it for this part of speech?
There is no special tag for imperatives, they are simply tagged as VB.
The info on the website refers to the fact that we added a bunch of manually annotated imperative sentences to our training data such that the POS tagger gets more of them right, i.e. tags the verb as VB.
