Stanford CoreNLP Statistical Coref System NullPointerException - stanford-nlp

I am trying to run the statistical coreference system on a text file with the following command:
java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,cleanxml,ssplit,pos,lemma,ner,parse,coref -file input.txt
This throws the following error message:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator cleanxml
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.9 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.8 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.8 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.0 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
done [0.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
Processing file /home/xilin/Toolkits/stanford-corenlp-full-2015-12-09/input.txt ... writing to /home/xilin/Toolkits/stanford-corenlp-full-2015-12-09/input.txt.out
Annotating file /home/xilin/Toolkits/stanford-corenlp-full-2015-12-09/input.txt
Exception in thread "main" java.lang.RuntimeException: Error annotating document with coref
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate(StatisticalCorefSystem.java:86)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate(StatisticalCorefSystem.java:63)
at edu.stanford.nlp.pipeline.CorefAnnotator.annotate(CorefAnnotator.java:97)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:72)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:534)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:544)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1098)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:877)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.run(StanfordCoreNLP.java:1187)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1257)
Caused by: java.lang.NullPointerException
at edu.stanford.nlp.hcoref.Preprocessor.assignMentionIDs(Preprocessor.java:170)
at edu.stanford.nlp.hcoref.Preprocessor.initializeMentions(Preprocessor.java:153)
at edu.stanford.nlp.hcoref.Preprocessor.preprocess(Preprocessor.java:64)
at edu.stanford.nlp.hcoref.CorefDocMaker.makeDocument(CorefDocMaker.java:194)
at edu.stanford.nlp.hcoref.CorefDocMaker.makeDocument(CorefDocMaker.java:154)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate(StatisticalCorefSystem.java:68)
... 9 more
If I change the "coref" option in the above command to "dcoref", the deterministic coreference system runs smoothly. Others have pointed out that this is a bug in the 3.6.0 distribution. I am using the latest version from the GitHub repository, but the bug still seems to be there.

You need to include the mention annotator before coref. The fact that this appears as a null pointer exception is indeed a bug though. Which Git revision are you using? We've recently changed the way requirements are handled, and this may be a remnant bug from that.
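To make the ordering requirement concrete, here is a minimal sketch that checks an annotator list before building the command line from the question. The prerequisite table and the helper names (`REQUIRES`, `check_annotator_order`, `build_command`) are assumptions for illustration; only the coref/mention pair mentioned in this answer is encoded, and CoreNLP performs its own, more complete checks.

```python
import shlex

# Assumed prerequisite table: only the pair named in this thread is listed.
REQUIRES = {"coref": ["mention"]}


def check_annotator_order(annotators):
    """Return a list of problems; an empty list means the order looks fine."""
    problems = []
    seen = set()
    for name in annotators:
        for needed in REQUIRES.get(name, []):
            if needed not in seen:
                problems.append(
                    f'annotator "{name}" requires annotator "{needed}" before it'
                )
        seen.add(name)
    return problems


def build_command(annotators, input_file):
    """Build the java command line, refusing lists that fail the check."""
    problems = check_annotator_order(annotators)
    if problems:
        raise ValueError("; ".join(problems))
    return [
        "java", "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-annotators", ",".join(annotators),
        "-file", input_file,
    ]


# The list from the question, and the same list with "mention" added.
original = "tokenize,cleanxml,ssplit,pos,lemma,ner,parse,coref".split(",")
fixed = "tokenize,cleanxml,ssplit,pos,lemma,ner,parse,mention,coref".split(",")

print(check_annotator_order(original))
print(shlex.join(build_command(fixed, "input.txt")))
```

The second print shows the command with `mention` placed directly before `coref`; the rest of the command is unchanged from the question.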

No tuples are emitted or transferred by the topology in Storm UI

I am using StormCrawler 1.16 with Elasticsearch 7.2.0. I built the project with the archetype.
The command I ran to submit the topology:
storm jar target/stormcrawler-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --remote es-crawler.flux
I get this output:
Parsing file: /home/ubuntu/stormcrawler/es-crawler.flux
835 [main] INFO o.a.s.f.p.FluxParser - loading YAML from input stream...
841 [main] INFO o.a.s.f.p.FluxParser - Not performing property substitution.
841 [main] INFO o.a.s.f.p.FluxParser - Not performing environment variable substitution.
900 [main] INFO o.a.s.f.p.FluxParser - Loading includes from resource: /crawler-default.yaml
901 [main] INFO o.a.s.f.p.FluxParser - loading YAML from input stream...
903 [main] INFO o.a.s.f.p.FluxParser - Not performing property substitution.
903 [main] INFO o.a.s.f.p.FluxParser - Not performing environment variable substitution.
Configuration (interpreted):
The last lines of the output are:
2014 [main] WARN o.a.s.u.Utils - STORM-VERSION new 1.2.3 old 1.2.3
2376 [main] INFO o.a.s.StormSubmitter - Finished submitting topology: crawler
But when I check this crawler topology in the Storm UI, the topology stats show that no tuples are emitted or transferred by it.
I have attached a snapshot of the Storm UI.
How can I solve this issue?
Your POM file is probably missing the storm-crawler-elasticsearch dependency.
You could compare your code with what is generated by the storm-crawler-elasticsearch-archetype, which should give you a working configuration.
Use the archetype for Elasticsearch with:
mvn archetype:generate -DarchetypeGroupId=com.digitalpebble.stormcrawler -DarchetypeArtifactId=storm-crawler-elasticsearch-archetype -DarchetypeVersion=LATEST
You'll be asked to enter a groupId (e.g. com.mycompany.crawler), an artifactId (e.g. stormcrawler), a version and a package name.
This creates a fully formed project containing not only a POM with the dependency above but also a set of resources, configuration files and a topology class. Enter the directory you just created (it should have the same name as the artifactId you specified earlier) and follow the instructions in the README file.
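Before regenerating the project, it can be quick to confirm whether the dependency really is missing. The following is a rough sketch that scans a POM for a given artifactId; the helper name `has_dependency` and the sample POM fragments are made up for illustration, and the check is deliberately naive (it also matches entries under dependencyManagement and ignores parent POMs and profiles).

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}dependency' -> 'dependency'."""
    return tag.rsplit("}", 1)[-1]


def has_dependency(pom_xml, artifact_id, group_id=None):
    """Return True if any <dependency> element in the POM text matches."""
    root = ET.fromstring(pom_xml)
    for elem in root.iter():
        if local(elem.tag) != "dependency":
            continue
        fields = {local(child.tag): (child.text or "").strip() for child in elem}
        if fields.get("artifactId") != artifact_id:
            continue
        if group_id is None or fields.get("groupId") == group_id:
            return True
    return False


# Sample POM fragments, invented for this example.
POM_TEMPLATE = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <dependencies>
    <dependency>
      <groupId>com.digitalpebble.stormcrawler</groupId>
      <artifactId>storm-crawler-core</artifactId>
      <version>1.16</version>
    </dependency>{extra}
  </dependencies>
</project>"""

ES_DEPENDENCY = """
    <dependency>
      <groupId>com.digitalpebble.stormcrawler</groupId>
      <artifactId>storm-crawler-elasticsearch</artifactId>
      <version>1.16</version>
    </dependency>"""

POM_WITHOUT_ES = POM_TEMPLATE.format(extra="")
POM_WITH_ES = POM_TEMPLATE.format(extra=ES_DEPENDENCY)

print(has_dependency(POM_WITHOUT_ES, "storm-crawler-elasticsearch"))
print(has_dependency(POM_WITH_ES, "storm-crawler-elasticsearch"))
```

To run it against a real project, pass the contents of your own pom.xml as the first argument instead of the sample strings.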

Initializing coreNLP in R

My issue is very similar to "Cannot Initialize CoreNLP in R", but the answer provided there doesn't work: RStudio simply crashes.
To be clear, I get this:
library(coreNLP)
initCoreNLP()
initCoreNLP()
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Searching for resource: StanfordCoreNLP.properties
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.7 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.0 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Error in rJava::.jnew("edu.stanford.nlp.pipeline.StanfordCoreNLP", basename(path)) :
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
Alternatively, if I run (in a new session):
dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/lib/server/libjvm.dylib')
library(rJava)
library(coreNLP)
initCoreNLP()
it just crashes.
My versions of Java:
Williams-MacBook-Pro-2:~ WilliamLS$ /usr/libexec/java_home -V
Matching Java Virtual Machines (5):
9.0.4, x86_64: "Java SE 9.0.4" /Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home
1.8.0_162, x86_64: "Java SE 8" /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home
1.8.0_161, x86_64: "Java SE 8" /Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home
1.6.0_65-b14-468, x86_64: "Java SE 6" /Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
1.6.0_65-b14-468, i386: "Java SE 6" /Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
/Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home
What would you recommend? I also tried the cleanNLP package, but when I run the init_clean_nlp command, I get the same error as the one above. Any recommendations?

OpenIE does not give the same result as the demo

I just downloaded OpenIE and tried it on the same sentence as the demo: "Born in a small town, she took the midnight train going anywhere."
The demo page says that three triples will be extracted, but the output below is missing the triple "she took midnight train". This information is important.
Could you tell me why I can't get the same result as in the demo?
Is there a parameter that needs to be set?
Thanks.
tom@tom-Aspire-E5-572G:~/Downloads/stanford-corenlp-full-2015-12-09$ cat input.txt
Born in a small town, she took the midnight train going anywhere.
tom@tom-Aspire-E5-572G:~/Downloads/stanford-corenlp-full-2015-12-09$ java -cp "*" -Xmx1000m edu.stanford.nlp.naturalli.OpenIE ./input.txt
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.6 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 2.157 (s)
Initializing dependency parser done [5.1 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator openie
Loading clause searcher from edu/stanford/nlp/models/naturalli/clauseSearcherModel.ser.gz...done [0.73 seconds]
Processing file: ./input.txt
All files have been queued; awaiting termination...
1.0 she Born in small town
1.0 she Born in town
DONE processing files. 0 exceptions encountered.
I'm going to chalk this up to the parse tree being difficult. The underlying dependency parser -- and in fact, the underlying dependency representation -- changed between the publication of the paper and the most recent release of the code. For instance, "Born in a small town, she took the midnight train." works ok.

Dependency parsing for French with CoreNLP

I'm trying to use Stanford CoreNLP for French texts. POS tagging and parsing work fine, but with my configuration, the output dependencies do not make sense at all.
My command is:
java -mx1g -cp "~/stanford-corenlp/stanford-corenlp-full-2015-12-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props french.conf -file /tmp/file.txt -outputFormat text
where french.conf contains:
annotators = tokenize, ssplit, pos, depparse, parse
tokenize.language = fr
pos.model = edu/stanford/nlp/models/pos-tagger/french/french.tagger
parse.model = edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz
depparse.model = edu/stanford/nlp/models/parser/nndep/UD_French.gz
I'm using CoreNLP 3.6 with the French models. The log looks fine in that respect:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/french/french.tagger ... done [0,2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/UD_French.gz ...
PreComputed 100000, Elapsed Time: 1.43 (s)
Initializing dependency parser done [3,4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz ...
done [3,0 sec].
But here is the result for "Le chat mange la souris." ("The cat is eating the mouse", which has exactly the same structure):
root(ROOT-0, chat-2)
det(chat-2, Le-1)
case(souris-5, mange-3)
det(souris-5, la-4)
nmod:mange(chat-2, souris-5)
punct(chat-2, .-6)
This is just nonsense, and it is not an exception: I tested many sentences and always got this kind of output.
That's why I suspect my configuration file is wrong.
Any help would be appreciated!
For those interested, Stanford CoreNLP has since updated its models, and they now work pretty well.
This happens because the CoreNLP dependency parser expects Universal Dependencies POS tags as input, whereas the French POS tagger provided by CoreNLP outputs French Treebank POS tags.
I've made a patch that converts the French POS tagger output in order to get Universal Dependencies POS tags: https://github.com/askplatypus/CoreNLP/commit/e6215bdc5d4903bc3e2d2fb533da7e3938fa825f
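As a rough illustration of that kind of conversion (not the table used in the patch above), here is a minimal sketch that maps a handful of French Treebank-style tags to Universal Dependencies POS tags. The tag names on the left, the partial mapping `FTB_TO_UD`, and the tags assigned to the example sentence are assumptions for the sake of the example; a real conversion needs the full tagset and some context-dependent rules.

```python
# Illustrative, partial mapping (an assumption, not the patch's actual table).
FTB_TO_UD = {
    "DET": "DET",     # determiner
    "NC": "NOUN",     # common noun
    "NPP": "PROPN",   # proper noun
    "V": "VERB",      # finite verb
    "VINF": "VERB",   # infinitive
    "ADJ": "ADJ",     # adjective
    "ADV": "ADV",     # adverb
    "P": "ADP",       # preposition
    "PUNC": "PUNCT",  # punctuation
}


def to_ud(tagged_tokens, fallback="X"):
    """Convert (word, tag) pairs; tags missing from the table map to `fallback`."""
    return [(word, FTB_TO_UD.get(tag, fallback)) for word, tag in tagged_tokens]


# The sentence from the question, with assumed tagger output.
sentence = [
    ("Le", "DET"), ("chat", "NC"), ("mange", "V"),
    ("la", "DET"), ("souris", "NC"), (".", "PUNC"),
]

for word, tag in to_ud(sentence):
    print(f"{word}\t{tag}")
```

Mapping unknown tags to a fallback instead of raising keeps the sketch short; in practice you would probably want to log them so that gaps in the table are noticed.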

Stanford OpenIE with Pronoun Coreference Option

I tried to run the OpenIE module from the command line with the resolve_coref option, but got the following error:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.8 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 2.091 (s)
Initializing dependency parser done [5.6 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.6 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator entitymentions
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
Exception in thread "main" java.lang.IllegalArgumentException: annotator "coref" requires annotator "mention"
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:375)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:139)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:135)
at edu.stanford.nlp.naturalli.OpenIE.main(OpenIE.java:697)
It reports that the "mention" annotator is required, yet it had already added a different annotator, "entitymentions", so there seems to be some alias-resolution issue. I also can't find any relevant information about the "mention" annotator in the documentation.
I used the following command:
java -Xmx20g -cp stanford-corenlp-3.6.0.jar:stanford-corenlp-3.6.0-models.jar:CoreNLP-to-HTML.xsl:slf4j-api.jar:slf4j-simple.jar edu.stanford.nlp.naturalli.OpenIE -openie.resolve_coref input.txt
Oops; this is indeed a bug. Coref was refactored fairly heavily a few times between the last release and this one, and OpenIE seems to have not kept up with the changes...
It should be fixed in the GitHub version of the code, and will hopefully be incorporated into a Maven release soon.
