Stanford CoreNLP Server binary parse trees - stanford-nlp

I use the Stanford CoreNLP Server to generate parse trees. By default, I get parse trees that are not binary. However, I need binary parse trees to build a Recursive Neural Tensor Network on top.
Since most of my code is in Python, I use the wrapper https://github.com/smilli/py-corenlp for the CoreNLP Java library.
What I tried so far:
setting the -binarize parameter when starting the CoreNLP server:
$ java -mx4g -cp "/home/jonasrothfuss/Downloads/CoreNLP-master/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -binarize
adding the parse.binaryTrees parameter to the properties of the POST request sent to the CoreNLP server and setting it to true:
properties = {
    'annotators': 'tokenize,ssplit,pos,parse',
    'outputFormat': 'json',
    'parse.binaryTrees': 'true'
}
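Putting the pieces together, the full request looks roughly like this (a minimal sketch; it assumes the server started above is listening on localhost:9000, and the sample sentence is just for illustration):
from pycorenlp import StanfordCoreNLP

# connect to the locally running CoreNLP server
nlp = StanfordCoreNLP('http://localhost:9000')
output = nlp.annotate('The quick brown fox jumps over the lazy dog.',
                      properties={
                          'annotators': 'tokenize,ssplit,pos,parse',
                          'outputFormat': 'json',
                          'parse.binaryTrees': 'true'
                      })
# the parse tree returned for the first sentence
print(output['sentences'][0]['parse'])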
Nonetheless, I still receive non-binary parse trees.
Does anyone know what to do so that the CoreNLP Server sends back binary parse trees? Thanks for your help!

Related

Sentiment from Stanford CoreNLP via stanza: why only 3 classes?

I'm using the Stanford CoreNLP client for sentiment analysis, via the stanza package (because I mostly work in Python). I'd like to get sentiment scores using all 5 classes (from very negative to very positive) built into the CoreNLP system. I know that the sentiment classifier built into stanza uses only 3 classes (positive, neutral, negative; https://stanfordnlp.github.io/stanza/sentiment.html), but even when I access the CoreNLP server directly, I only get positive | negative | neutral. Why? Shouldn't the code below return sentiment scores across 5 classes, seeing as it uses CoreNLP itself?
import stanza
from stanza.server import CoreNLPClient

text = "This was the best movie ever made!"
with CoreNLPClient(annotators=['tokenize', 'ssplit', 'sentiment']) as client:
    ann = client.annotate(text)
    # print(ann)
    sentence = ann.sentence[0]
    print(sentence.sentiment)
Stanza by itself is a standalone module written in Python.
The CoreNLPClient uses the actual CoreNLP. If you look at its output carefully, you will see that it does give 5-level output, but it rounds it down to 3 levels in the end.
Try printing sentence instead of sentence.sentiment and you'll see.
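For example (a minimal sketch of that inspection; the exact protobuf layout may vary across stanza/CoreNLP versions):
from stanza.server import CoreNLPClient

with CoreNLPClient(annotators=['tokenize', 'ssplit', 'sentiment']) as client:
    ann = client.annotate("This was the best movie ever made!")
    sentence = ann.sentence[0]
    print(sentence.sentiment)  # the rounded 3-class label
    print(sentence)            # the full protobuf, where the finer-grained sentiment shows up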

Cannot extract confidence level from StanfordOpenIE

I was using StanfordOpenIE for a research project with my professor.
I can successfully extract the triples using the OpenIE annotator from the Stanford NLP server.
However, the confidence scores were not returned with the requested JSON, even though they are shown on the website
https://nlp.stanford.edu/software/openie.html.
Apparently this has not been implemented yet by the Stanford team.
Does anyone have a solution to this problem, or an alternative Python library that I can use to extract both the expected output and its confidence level from Stanford OpenIE?
The text output has the confidences. We can add the confidences to the JSON in future versions.
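Until then, one workaround is to request the plain-text output from the server and read the confidences there. A minimal sketch (it assumes a CoreNLP server running on localhost:9000, and the sample sentence is just for illustration):
import json
import requests

# openie plus its prerequisite annotators; ask for text output,
# which carries the confidence scores
props = {
    'annotators': 'tokenize,ssplit,pos,lemma,depparse,natlog,openie',
    'outputFormat': 'text'
}
response = requests.post(
    'http://localhost:9000/',
    params={'properties': json.dumps(props)},
    data='Obama was born in Hawaii.'.encode('utf-8')
)
print(response.text)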

CoreNLP API equivalent to command line?

For one of our projects, we are currently using the syntax analysis component from the command line. We want to move from this approach to the CoreNLP server (for better performance).
Our command line options are as follows:
java -mx4g -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser -tokenized -escaper edu.stanford.nlp.process.PTBEscapingProcessor -sentences newline -tokenized -tagSeparator / -tokenizerFactory edu.stanford.nlp.process.WhitespaceTokenizer -tokenizerMethod newCoreLabelTokenizerFactory -outputFormat "wordsAndTags,typedDependenciesCollapsed"
I've tried a few things, but I didn't manage to find the proper options when using the CoreNLP API (with Python).
For instance, how do I specify that the text is already tokenised?
I would really appreciate any help.
In general, the server calls into CoreNLP rather than the individual NLP components, so the documentation on CoreNLP may be useful. The body of the text being annotated is sent to the server as the POST body; the properties are passed in as URL params. For example, for your case, I believe the following curl command should do the trick (and should be easy to adapt to the language of your choice):
curl -X POST -d "it's split on whitespace" \
'http://localhost:9000/?annotators=tokenize,ssplit,pos,parse&tokenize.whitespace=true&ssplit.eolonly=true'
Note that we're just passing the following properties into the server:
annotators = tokenize,ssplit,pos,parse (specifies that we want the parser, and all its prerequisites).
tokenize.whitespace = true will use the whitespace tokenizer.
ssplit.eolonly = true will split sentences on and only on newlines.
Other potentially useful options are documented on the parser annotator page.
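Since you mostly work in Python, the same request can be made with, for example, the requests library (a minimal sketch assuming the server runs on localhost:9000; here the properties are bundled into the server's JSON-encoded properties URL parameter rather than passed individually):
import json
import requests

# same properties as in the curl command above
props = {
    'annotators': 'tokenize,ssplit,pos,parse',
    'tokenize.whitespace': 'true',
    'ssplit.eolonly': 'true'
}
response = requests.post(
    'http://localhost:9000/',
    params={'properties': json.dumps(props)},
    data="it's split on whitespace".encode('utf-8')
)
print(response.text)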

Where can I find the CoreNLP lexicon, and how to differentiate between action and stative verbs

I am using CoreNLP for a semantic network project, and I want to know what words CoreNLP contains and how they are categorized.
Why don't you Google search first? http://nlp.stanford.edu/nlp/javadoc/javanlp-3.5.0/edu/stanford/nlp/parser/lexparser/BaseLexicon.html

Using Boost Property Tree to replace DOM Parser

I need to write an XML parser using Boost Property Tree that can replace an existing MSXML DOM parser. Basically, my code should return the list of child nodes, the number of child nodes, etc. Can this be achieved using Property Tree? E.g. GetfirstChild(), selectNodes(), Getlength(), etc.
I saw a lot of APIs related to Boost Property Tree, but the documentation seems bare-minimum and confusing. As of now, I am able to parse the entire XML using BOOST_FOREACH, but the path to each node is hard-coded, which will not serve my purpose.
boost::property_tree can be used to parse XML, and it is a tree, so you can use it as an XML DOM substitute, but the library is not intended to be a fully fledged XML parser and it is not compliant with the XML standard. For instance, it can successfully parse non-well-formed XML input, and it doesn't support some XML features. So it's your choice: if you want a simple interface to simple XML configuration, then yes, you should use boost::property_tree.
