Initializing coreNLP in R - stanford-nlp

My issue is very similar to Cannot Initialize CoreNLP in R; however, the answer provided there doesn't work -- RStudio simply crashes.
To be clear, I get this:
library(coreNLP)
initCoreNLP()
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Searching for resource: StanfordCoreNLP.properties
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.7 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.0 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Error in rJava::.jnew("edu.stanford.nlp.pipeline.StanfordCoreNLP", basename(path)) :
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
Alternatively, if I run (in a new session):
dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/lib/server/libjvm.dylib')
library(rJava)
library(coreNLP)
initCoreNLP()
it just crashes.
My versions of Java:
Williams-MacBook-Pro-2:~ WilliamLS$ /usr/libexec/java_home -V
Matching Java Virtual Machines (5):
9.0.4, x86_64: "Java SE 9.0.4" /Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home
1.8.0_162, x86_64: "Java SE 8" /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home
1.8.0_161, x86_64: "Java SE 8" /Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home
1.6.0_65-b14-468, x86_64: "Java SE 6" /Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
1.6.0_65-b14-468, i386: "Java SE 6" /Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
/Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home
What would you recommend? I also attempted to use the cleanNLP package, but when I try to initialize it with the
init_clean_nlp
command, I get the same error as the one above. Any recommendations?
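Given that Java 9 (jdk-9.0.4) is the default in the list above and rJava-based packages are generally happier on Java 8, one thing worth trying -- a guess based on the version list, not a confirmed fix -- is to point JAVA_HOME at the Java 8 JDK and reconfigure R's Java setup before starting a fresh session:
# make Java 8 the JVM that R/rJava will pick up (macOS)
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
# rebuild R's Java configuration against that JVM
sudo R CMD javareconf
# then start a fresh R session and retry: library(rJava); library(coreNLP); initCoreNLP()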

Related

OpenIE does not give the same result as shown in the demo

I just downloaded OpenIE and tried it out, using the same sentence as in the demo: "Born in a small town, she took the midnight train going anywhere."
The demo page says three triples should be extracted, but in the output below the triple "she took midnight train" is missing. This information is important.
Could you tell me why I can't get the same result as in the demo?
Is there any parameter that needs to be set?
Thanks.
tom@tom-Aspire-E5-572G:~/Downloads/stanford-corenlp-full-2015-12-09$ cat input.txt
Born in a small town, she took the midnight train going anywhere.
tom@tom-Aspire-E5-572G:~/Downloads/stanford-corenlp-full-2015-12-09$ java -cp "*" -Xmx1000m edu.stanford.nlp.naturalli.OpenIE ./input.txt
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.6 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 2.157 (s)
Initializing dependency parser done [5.1 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator openie
Loading clause searcher from edu/stanford/nlp/models/naturalli/clauseSearcherModel.ser.gz...done [0.73 seconds]
Processing file: ./input.txt
All files have been queued; awaiting termination...
1.0 she Born in small town
1.0 she Born in town
DONE processing files. 0 exceptions encountered.
I'm going to chalk this up to the parse tree being difficult. The underlying dependency parser -- and in fact, the underlying dependency representation -- changed between the publication of the paper and the most recent release of the code. For instance, "Born in a small town, she took the midnight train." works ok.
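For reference, re-running the same command on the shortened sentence from the answer (same directory and jars as in the question) looks like this:
cd ~/Downloads/stanford-corenlp-full-2015-12-09
echo "Born in a small town, she took the midnight train." > input.txt
java -cp "*" -Xmx1000m edu.stanford.nlp.naturalli.OpenIE ./input.txt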

Dependency parsing for French with CoreNLP

I'm trying to use Stanford CoreNLP for French texts. POS tagging and parsing work fine, but with my configuration the output dependencies do not make sense at all.
My command is
java -mx1g -cp "~/stanford-corenlp/stanford-corenlp-full-2015-12-09/*"
edu.stanford.nlp.pipeline.StanfordCoreNLP -props french.conf
-file /tmp/file.txt -outputFormat text
where french.conf contains:
annotators = tokenize, ssplit, pos, depparse, parse
tokenize.language = fr
pos.model = edu/stanford/nlp/models/pos-tagger/french/french.tagger
parse.model = edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz
depparse.model = edu/stanford/nlp/models/parser/nndep/UD_French.gz
I'm using CoreNLP 3.6 with the French models found here. The log looks fine in that respect:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/french/french.tagger ... done [0,2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/UD_French.gz ...
PreComputed 100000, Elapsed Time: 1.43 (s)
Initializing dependency parser done [3,4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz ...
done [3,0 sec].
but here is the result for "Le chat mange la souris" (the cat is eating the mouse, which has exactly the same structure):
root(ROOT-0, chat-2)
det(chat-2, Le-1)
case(souris-5, mange-3)
det(souris-5, la-4)
nmod:mange(chat-2, souris-5)
punct(chat-2, .-6)
which is just nonsense; and this is not exceptional: I tested many sentences and always got this kind of output.
That's why I suspect I'm using a bad configuration file.
Any help would be appreciated!
For those who are interested, Stanford CoreNLP has since updated its models, and they now work pretty well.
This happens because the CoreNLP dependency parser expects Universal Dependencies POS tags as input, while the French POS tagger shipped with CoreNLP outputs French Treebank POS tags.
I've made a patch that converts the French POS tagger output to Universal Dependencies POS tags: https://github.com/askplatypus/CoreNLP/commit/e6215bdc5d4903bc3e2d2fb533da7e3938fa825f
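To see the mismatch described above, one quick check (reusing the jars and model paths from the question; the conll output format is simply a convenient way to inspect the tag column) is to run only the tagger and look at the POS tags -- French Treebank tags such as NC and V rather than UD tags such as NOUN and VERB:
echo "Le chat mange la souris." > /tmp/file.txt
java -mx1g -cp "~/stanford-corenlp/stanford-corenlp-full-2015-12-09/*" \
  edu.stanford.nlp.pipeline.StanfordCoreNLP \
  -annotators tokenize,ssplit,pos \
  -tokenize.language fr \
  -pos.model edu/stanford/nlp/models/pos-tagger/french/french.tagger \
  -file /tmp/file.txt -outputFormat conll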

Stanford CoreNLP Statistical Coref System NullPointerException

I am trying to apply the statistical coreference system to process a text file with the following command
java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,cleanxml,ssplit,pos,lemma,ner,parse,coref -file input.txt
This throws the following error message:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator cleanxml
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.9 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.8 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.8 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.0 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
done [0.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
Processing file /home/xilin/Toolkits/stanford-corenlp-full-2015-12-09/input.txt ... writing to /home/xilin/Toolkits/stanford-corenlp-full-2015-12-09/input.txt.out
Annotating file /home/xilin/Toolkits/stanford-corenlp-full-2015-12-09/input.txt
Exception in thread "main" java.lang.RuntimeException: Error annotating document with coref
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate(StatisticalCorefSystem.java:86)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate(StatisticalCorefSystem.java:63)
at edu.stanford.nlp.pipeline.CorefAnnotator.annotate(CorefAnnotator.java:97)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:72)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:534)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:544)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1098)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:877)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.run(StanfordCoreNLP.java:1187)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1257)
Caused by: java.lang.NullPointerException
at edu.stanford.nlp.hcoref.Preprocessor.assignMentionIDs(Preprocessor.java:170)
at edu.stanford.nlp.hcoref.Preprocessor.initializeMentions(Preprocessor.java:153)
at edu.stanford.nlp.hcoref.Preprocessor.preprocess(Preprocessor.java:64)
at edu.stanford.nlp.hcoref.CorefDocMaker.makeDocument(CorefDocMaker.java:194)
at edu.stanford.nlp.hcoref.CorefDocMaker.makeDocument(CorefDocMaker.java:154)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate(StatisticalCorefSystem.java:68)
... 9 more
If I change the "coref" option in the above command to "dcoref", the deterministic coreference system runs smoothly. Others have pointed out that this is a bug in the 3.6.0 distribution. I am using the latest version from the GitHub repository, but the bug still seems to exist.
You need to include the mention annotator before coref. The fact that this appears as a null pointer exception is indeed a bug though. Which Git revision are you using? We've recently changed the way requirements are handled, and this may be a remnant bug from that.
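Concretely, per the answer above, adding mention before coref in the annotator list should avoid the crash:
java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP \
  -annotators tokenize,cleanxml,ssplit,pos,lemma,ner,parse,mention,coref \
  -file input.txt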

Stanford OpenIE with Pronoun Coreference Option

I was trying to run the OpenIE module from the command line with the resolve_coref option, but I was getting the following error:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.8 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 2.091 (s)
Initializing dependency parser done [5.6 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.6 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator entitymentions
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
Exception in thread "main" java.lang.IllegalArgumentException: annotator "coref" requires annotator "mention"
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:375)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:139)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:135)
at edu.stanford.nlp.naturalli.OpenIE.main(OpenIE.java:697)
It reports requiring the annotator "mention", yet it previously added another annotator, "entitymentions", so there seems to be some alias-resolution issue. I also can't find any relevant information about the "mention" annotator in the documentation.
I used the following command:
java -Xmx20g -cp stanford-corenlp-3.6.0.jar:stanford-corenlp-3.6.0-models.jar:CoreNLP-to-HTML.xsl:slf4j-api.jar:slf4j-simple.jar edu.stanford.nlp.naturalli.OpenIE -openie.resolve_coref input.txt
Oops; this is indeed a bug. Coref was refactored fairly heavily a few times between the last release and this one, and OpenIE seems to have not kept up with the changes...
It should be fixed in the GitHub version of the code, and will hopefully be incorporated into the Maven version released soon.
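Until a release containing the fix is out, one way to pick up the GitHub version (the directory layout and classpath below are assumptions about the usual CoreNLP source tree, not taken from this thread) is roughly:
git clone https://github.com/stanfordnlp/CoreNLP.git
cd CoreNLP
ant            # compiles into ./classes by default
# put the freshly built classes ahead of the released jar, keeping the 3.6.0 models jar for the models
java -Xmx20g -cp "classes:lib/*:/path/to/stanford-corenlp-3.6.0-models.jar" \
  edu.stanford.nlp.naturalli.OpenIE -openie.resolve_coref input.txt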

Pig Error: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

I have just upgraded Pig from 0.12.0 to 0.13.0 on Hortonworks HDP 2.1.
I am getting the error below when I try to use XMLLoader in my script, even though I have already registered piggybank.
Script:
A = load 'EPAXMLDownload.xml' using org.apache.pig.piggybank.storage.XMLLoader('Document') as (x:chararray);
Error:
dump A
2014-08-10 23:08:56,494 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-08-10 23:08:56,496 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-08-10 23:08:56,651 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-08-10 23:08:56,727 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-08-10 23:08:57,191 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-08-10 23:08:57,199 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-08-10 23:08:57,214 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-08-10 23:08:57,223 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-08-10 23:08:57,247 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
Note that Pig decides the Hadoop version depending on which environment variable you have set:
HADOOP_HOME -> v1
HADOOP_PREFIX -> v2
If you use Hadoop 2, you need to recompile piggybank (which is compiled for Hadoop 1 by default):
Go to pig/contrib/piggybank/java and run
$ ant -Dhadoopversion=23
then copy the resulting jar over pig/lib/piggybank.jar, as in the sketch below.
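Put together (assuming a Pig source tree rooted at pig/ and that the build drops piggybank.jar in the current directory, which is where the stock build puts it):
cd pig/contrib/piggybank/java
ant -Dhadoopversion=23
cp piggybank.jar ../../../lib/piggybank.jar   # overwrite the hadoop1-compiled jar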
A few more details because the other answers didn't work for me:
Git clone the pig git mirror https://github.com/apache/pig
cd into the cloned directory
if you've already built pig in the past in this directory, you should run a clean
ant clean
build pig for hadoop 2
ant -Dhadoopversion=23
cd into piggybank
cd contrib/piggybank/java
again, if you've built piggybank before, make sure to clean out the old build files
ant clean
build piggybank for hadoop 2 (same command, different directory)
ant -Dhadoopversion=23
If you don't build pig first, piggybank will throw a bunch of "symbol not found" exceptions while compiling. In addition, since I had previously (and accidentally) built pig for Hadoop 1 without running a clean, I ran into runtime errors. The full sequence is shown below.
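For reference, the steps above as one sequence of commands:
git clone https://github.com/apache/pig
cd pig
ant clean                  # only needed if you've built in this directory before
ant -Dhadoopversion=23     # build pig itself for hadoop 2
cd contrib/piggybank/java
ant clean                  # clear out any old piggybank build
ant -Dhadoopversion=23     # build piggybank for hadoop 2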
Sometimes you may get a problem like the one below after installing Pig:
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.hcatalog.common.HCatUtil.checkJobContextIfRunningFromBackend(HCatUtil.java:88)
at org.apache.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:162)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:540)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:322)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:199)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:277)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1367)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1352)
at org.apache.pig.PigServer.execute(PigServer.java:1341)
Many blogs suggest recompiling Pig by running:
ant clean jar-all -Dhadoopversion=23
or recompiling piggybank.jar with the following steps:
cd contrib/piggybank/java
ant clean
ant -Dhadoopversion=23
But this may not solve your problem. The actual cause here is related to HCatalog; try updating it. In my case, I was using Hive 0.13 and Pig 0.13, along with the HCatalog provided with Hive 0.13.
I then updated Pig to 0.15 and used the separate hive-hcatalog-0.13.0.2.1.1.0-385 library jars, and the problem was resolved.
It turned out that it was not Pig causing the problem but rather the Hive-HCatalog libraries. Hope this helps.
I faced the same error with Hadoop 2.2.0.
The workaround is to register the following jar files from the grunt shell.
The paths below are for Hadoop 2.2.0; find the corresponding jars for your version:
/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar
/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
Register these jars with the REGISTER command, along with piggybank, as shown below.
Then run the Pig script/command again and report back if you still face any issue.
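For example, from the grunt shell (the piggybank path is whatever your installation uses; the two Hadoop paths are the 2.2.0 ones listed above):
grunt> REGISTER /hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar;
grunt> REGISTER /hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar;
grunt> REGISTER /path/to/pig/lib/piggybank.jar;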
