Text tokenizer with Stanford NLP sentiment analysis - stanford-nlp

I saw that Stanford NLP sentiment analysis first tokenizes a sentence into phrases. How can I use this functionality on its own (i.e., given a sentence, tokenize it with the same function that Stanford NLP sentiment analysis uses)?

Both of these tools (sentence splitting and tokenization) ship as part of the Stanford CoreNLP API. See http://stanfordnlp.github.io/CoreNLP/cmdline.html for basic usage examples.
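For instance, a minimal sketch of a pipeline that runs only the tokenizer and sentence splitter might look like this (the input string is just a placeholder; the sentiment annotator consumes the same tokenization upstream in the full pipeline):

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

public class TokenizeDemo {
    public static void main(String[] args) {
        // Run only the tokenizer and sentence splitter.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("This movie was great. I loved it!");
        pipeline.annotate(doc);

        // Print each token, sentence by sentence.
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                System.out.println(token.word());
            }
        }
    }
}
```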

Related

Stanford NLP : Corpus of coreference resolution

I was simply wondering which corpus the English statistical coreference resolution system of Stanford NLP was trained on. Would it be effective if used on novels?
The coreference model is trained on the CoNLL 2012 coreference data set, which is related to the OntoNotes 5.0 data set.
Here is the link to the data:
http://conll.cemantix.org/2012/data.html

How to create a Chinese Sentiment Annotator of Stanford Core NLP

The Stanford Core NLP software has a sentiment annotator, but it only supports English. I want to create a sentiment annotator for Chinese. What should I do? Can someone give me some advice? Thank you very much!
Unfortunately, we do not have any trained model for Chinese sentiment analysis. To train a Chinese model, you'd need to construct a sentiment treebank similar to the Stanford Sentiment Treebank and then retrain the sentiment model, but this is not a small task.
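If you did build such a treebank and train a model, here is a rough sketch of how it might be wired into the pipeline. The sentiment.model property and the SentimentTraining class exist, but the Chinese model path is purely hypothetical, and getting a Chinese parser to feed the sentiment model may require additional work beyond this:

```java
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.io.IOException;
import java.util.Properties;

public class ChineseSentimentSketch {
    public static void main(String[] args) throws IOException {
        // Load the Chinese pipeline defaults (word segmenter, Chinese POS and
        // parser models) shipped in the Chinese models jar.
        Properties props = new Properties();
        props.load(IOUtils.readerFromString("StanfordCoreNLP-chinese.properties"));
        // Add the sentiment annotator; it needs binarized constituency trees.
        props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment");
        props.setProperty("parse.binaryTrees", "true");
        // Hypothetical model path: no such model exists today; you would have
        // to train one yourself (edu.stanford.nlp.sentiment.SentimentTraining)
        // on your own Chinese sentiment treebank.
        props.setProperty("sentiment.model", "/path/to/chinese-sentiment.ser.gz");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // ... annotate documents as usual ...
    }
}
```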

What treebank was used to train the Stanford CoreNLP Spanish constituency parser?

I've searched the docs and the FAQs but I have yet to find the answer. Was the IULA treebank from Pompeu Fabra University used? https://www.iula.upf.edu/recurs01_tbk_uk.htm
Thanks.
The parser was trained on a preprocessed version of the AnCora Spanish 3.0 corpus.
You can find more information about the training data and the preprocessing at
http://nlp.stanford.edu/software/spanish-faq.html .

How to get alignment between sentiment module and constituency parser in CoreNLP

I'm using Stanford CoreNLP to both parse a text and get sentiment information. The two models produce two Tree objects, but they are not related. Is there an easy way to navigate the two structures at the same time, i.e., to get a token-level alignment between them?
You can just use the sentiment tree as a model of both the grammatical parse and the sentiment — it's simply the original parse tree with extra annotations.
Explanation: If you're using the Stanford CoreNLP pipeline, the sentiment annotator draws directly from the parse annotator to build its tree. The tree provided by the sentiment annotator is then just the same binarized parse tree with extra sentiment annotations.
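As a quick sketch, once the pipeline has run with the parse and sentiment annotators, you can pull the sentiment-annotated tree off each sentence and read both the phrase structure and the predicted sentiment class from the same nodes:

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

public class SentimentTreeDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("The acting was brilliant but the plot was dull.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // The sentiment tree is the binarized parse tree plus sentiment labels.
            Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
            // Walk every subtree: each non-leaf node is a phrase from the
            // binarized parse, annotated with its own sentiment prediction
            // (0 = very negative ... 4 = very positive).
            for (Tree node : tree) {
                if (node.isLeaf()) {
                    continue; // leaves are the tokens themselves
                }
                StringBuilder phrase = new StringBuilder();
                for (Tree leaf : node.getLeaves()) {
                    phrase.append(leaf.value()).append(' ');
                }
                int sentiment = RNNCoreAnnotations.getPredictedClass(node);
                System.out.println(phrase.toString().trim() + " -> " + sentiment);
            }
        }
    }
}
```

Since the sentiment tree's leaves are exactly the sentence's tokens in order, this also gives you the token-level alignment the question asks for.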

Natural Language Parsing using Stanford NLP

How does the Stanford Natural Language Parser use the Penn Treebank for the tagging process? I want to know how it finds the POS tags for a given input.
The Stanford part-of-speech tagger uses a probabilistic sequence model to determine the most likely sequence of part-of-speech tags underlying a sentence. Some of the features provided to this model are:
Surrounding words and n-grams
Part-of-speech tags of surrounding words
"Word shapes" (e.g., "Foo5" is translated to "Xxx#")
Word suffixes and prefixes
See the ExtractorFrames class for details. The model is trained on a tagged corpus (like the Penn Treebank) which has each token annotated with its correct part of speech.
At run time, features like those mentioned above are calculated for input text and are used to build per-tag probabilities, which are then fed into an implementation of the Viterbi algorithm (ExactBestSequenceFinder), which finds the most likely arrangement of tags for the entire sequence.
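As a concrete illustration, you can run the tagger directly through the MaxentTagger class. The model path below is a sketch pointing at the standard English model shipped in the CoreNLP models jar; adjust it to wherever your model file lives:

```java
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TaggerDemo {
    public static void main(String[] args) {
        // Load a trained model; the features described above were extracted
        // from a tagged corpus (e.g., the Penn Treebank) at training time
        // and their weights are stored in this file.
        MaxentTagger tagger = new MaxentTagger(
            "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
        // tagString tokenizes the input and appends each token's best tag.
        String tagged = tagger.tagString("The quick brown fox jumps over the lazy dog.");
        System.out.println(tagged);  // e.g. The_DT quick_JJ brown_JJ fox_NN ...
    }
}
```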
For more resources on getting started with POS tagging:
Watch the Week 5 lectures of the Coursera NLP class (co-taught by the CoreNLP lead)
Check out the code in the edu.stanford.nlp.tagger.maxent package
Part-of-speech tagging in NLTK