How to run Stanford CoreNLP? - stanford-nlp

I have followed the instructions at the following URL to set up Stanford CoreNLP:
https://stanfordnlp.github.io/CoreNLP/download.html
My $CLASSPATH is the following:
$ echo $CLASSPATH | sed 's/:/\n/g'
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/javax.json-api-1.0-sources.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/ejml-0.23.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/slf4j-api.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/stanford-corenlp-3.9.0-javadoc.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/stanford-corenlp-3.9.0-sources.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/protobuf.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/stanford-corenlp-3.9.0-models.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/joda-time.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/joda-time-2.9-sources.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/xom-1.2.10-src.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/xom.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/javax.json.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/stanford-corenlp-3.9.0.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/jollyday.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/slf4j-simple.jar
/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/jollyday-0.4.9-sources.jar
But when I run it, I get the following error. Does anybody know what is wrong? Thanks.
$ echo "the quick brown fox jumped over the lazy dog" > input.txt
$ java -mx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -outputFormat json -file input.txt
Searching for resource: StanfordCoreNLP.properties
Searching for resource: edu/stanford/nlp/pipeline/StanfordCoreNLP.properties
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Exception in thread "main" java.lang.RuntimeException: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:493)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:260)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:127)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:123)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1251)
Caused by: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:749)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:283)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:247)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:78)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:62)
at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:491)
... 5 more
Caused by: java.io.IOException: Unable to resolve "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as either class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:419)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:744)
... 10 more

I'm a newbie, but I remember that you may have to set the environment variable STANFORD_MODELS to the list of directories containing your model files (xxx-models.jar).
It seems to be covered in the online documentation.
Hope it helps.
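Alternatively, a sketch that avoids hand-maintained $CLASSPATH entries altogether (the path is taken from your $CLASSPATH listing; adjust it if your distribution lives elsewhere): point -cp at every jar in the distribution directory so the models jar is definitely picked up.
$ java -mx3g -cp "/Users/myname/doc/src/dvcs/stanford-corenlp-full-2018-01-31/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -outputFormat json -file input.txt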

Related

Log4j2 encoding issue

When I run Elasticsearch on Windows 10 with the system language set to English, everything works fine. But if I change the system language to Turkish, I get error messages like these:
2018-07-26 14:42:39,485 main ERROR Unable to locate plugin type for IfFileName
2018-07-26 14:42:39,633 main ERROR Unable to locate plugin for IfAccumulatedFileSize
2018-07-26 14:42:39,634 main ERROR Unable to locate plugin for IfFileName
2018-07-26 14:42:39,637 main ERROR Unable to invoke factory method in class org.apache.logging.log4j.core.appender.rolling.action.DeleteAction for element Delete: java.lang.NullPointerException java.lang.NullPointerException
at org.apache.logging.log4j.core.config.plugins.visitors.PluginElementVisitor.findNamedNode(PluginElementVisitor.java:103)
at org.apache.logging.log4j.core.config.plugins.visitors.PluginElementVisitor.visit(PluginElementVisitor.java:87)
at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:248)
at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:958)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:898)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:890)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:890)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:890)
at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:513)
at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:237)
at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:249)
at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:545)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:261)
at org.elasticsearch.common.logging.LogConfigurator.configure(LogConfigurator.java:163)
at org.elasticsearch.common.logging.LogConfigurator.configure(LogConfigurator.java:119)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:291)
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121)
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
at org.elasticsearch.cli.Command.main(Command.java:90)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:85)
2018-07-26 14:42:39,645 main ERROR Null object returned for Delete in DefaultRolloverStrategy.
So it seems like a charset problem. The file is encoded as UTF-8; I checked it with Notepad++. Elasticsearch has the JVM option -Dfile.encoding=UTF-8. I double-checked the log4j2.properties file, and IfFileName has no space after it.
And if I change IfFileName to ıfFileName (where ı is a Turkish character, a dotless lowercase i), the error becomes:
2018-07-26 14:54:25,819 main ERROR Unable to locate plugin type for ıfFileName
Does anyone have an idea about how to fix that?
Adding the -Duser.language=en JVM parameter fixed the problem.
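For Elasticsearch specifically, a sketch of where that parameter can go (assuming a standard Elasticsearch layout; on Windows use bin\elasticsearch.bat and the equivalent environment-variable syntax):
# permanent: append the flag to the JVM options file Elasticsearch reads at startup
echo "-Duser.language=en" >> config/jvm.options
# one-off: pass it via the ES_JAVA_OPTS environment variable
ES_JAVA_OPTS="-Duser.language=en" bin/elasticsearch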
I had the same problem but didn't know where to add -Duser.language=en. I found out it goes in sonar.properties: on the line where sonar.search.javaAdditionalOpts= is located, remove the # at the beginning, change it to sonar.search.javaAdditionalOpts=-Duser.language=en, and save the file.
This is a bug in Log4j2, which uses String#toLowerCase() without a locale parameter: in the Turkish locale IfFileName is lowercased as ıffilename (with a dotless i). I have reported this as GH issue #1281.
Until this is fixed, you can write plugin types in all-lowercase (English) letters, e.g. iffilename instead of IfFileName.
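To see the locale sensitivity behind this, here is a minimal, self-contained Java sketch (illustrative only, not Log4j2's actual code):
import java.util.Locale;

public class TurkishLowerCase {
    public static void main(String[] args) {
        String name = "IfFileName";
        // In the Turkish locale, 'I' lowercases to the dotless 'ı', so the plugin name no longer matches.
        System.out.println(name.toLowerCase(new Locale("tr", "TR"))); // prints: ıffilename
        // A locale-independent lowercase keeps the expected ASCII form.
        System.out.println(name.toLowerCase(Locale.ROOT));            // prints: iffilename
    }
}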

Error invoking "pig" in cloudera-quickstart-vm-5.8.0

I am a month into learning the Hadoop environment. I have cloudera-quickstart-vm-5.8.0 on my Windows laptop. When invoking 'pig' in the Cloudera VM, I am not able to enter the grunt shell; the error I am getting is below:
[Fatal Error] :-1:-1: Premature end of file.
2017-04-25 06:39:53,207 [main] FATAL org.apache.hadoop.conf.Configuration - error parsing conf hdfs-default.xml
org.xml.sax.SAXParseException; Premature end of file.
Kindly let me know how to resolve this.

Dependency parsing for French with CoreNLP

I'm trying to use Stanford CoreNLP for French texts. POS tagging and parsing work fine, but with my configuration the output dependencies do not make sense at all.
My command is
java -mx1g -cp "~/stanford-corenlp/stanford-corenlp-full-2015-12-09/*"
edu.stanford.nlp.pipeline.StanfordCoreNLP -props french.conf
-file /tmp/file.txt -outputFormat text
where french.conf contains:
annotators = tokenize, ssplit, pos, depparse, parse
tokenize.language = fr
pos.model = edu/stanford/nlp/models/pos-tagger/french/french.tagger
parse.model = edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz
depparse.model = edu/stanford/nlp/models/parser/nndep/UD_French.gz
I'm using CoreNLP 3.6 with the French models found here. The log looks fine in that respect:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/french/french.tagger ... done [0,2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/UD_French.gz ...
PreComputed 100000, Elapsed Time: 1.43 (s)
Initializing dependency parser done [3,4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz ...
done [3,0 sec].
but here is the result for "Le chat mange la souris." (the cat is eating the mouse, with the exact same structure):
root(ROOT-0, chat-2)
det(chat-2, Le-1)
case(souris-5, mange-3)
det(souris-5, la-4)
nmod:mange(chat-2, souris-5)
punct(chat-2, .-6)
which is just nonsense; and this is not exceptional: I tested many sentences and always got this kind of output.
That's why I suspect I'm using a bad configuration file.
Any help would be appreciated!
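For reference, the same configuration can be expressed programmatically; here is a minimal Java sketch (it assumes the CoreNLP 3.6 jars and the French models jar are on the classpath):
import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class FrenchDepParse {
    public static void main(String[] args) {
        // Mirrors french.conf above.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, depparse, parse");
        props.setProperty("tokenize.language", "fr");
        props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/french/french.tagger");
        props.setProperty("parse.model", "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz");
        props.setProperty("depparse.model", "edu/stanford/nlp/models/parser/nndep/UD_French.gz");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("Le chat mange la souris.");
        pipeline.annotate(doc);
        // Print the annotations, including the dependency parse, in the same layout
        // as the -outputFormat text command-line option.
        pipeline.prettyPrint(doc, System.out);
    }
}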
For those who might be interested: Stanford CoreNLP has since updated its models, and they now work pretty well.
This happens because the CoreNLP dependency parser expects Universal Dependencies POS tags as input, while the French POS tagger provided by CoreNLP outputs French Treebank POS tags.
I've made a patch that converts the French POS tagger output in order to get Universal Dependencies POS tags: https://github.com/askplatypus/CoreNLP/commit/e6215bdc5d4903bc3e2d2fb533da7e3938fa825f

Stanford CoreNLP Statistical Coref System NullPointerException

I am trying to apply the statistical coreference system to process a text file with the following command:
java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,cleanxml,ssplit,pos,lemma,ner,parse,coref -file input.txt
This throws the following error message:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator cleanxml
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.9 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.8 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.8 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.0 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
done [0.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
Processing file /home/xilin/Toolkits/stanford-corenlp-full-2015-12-09/input.txt ... writing to /home/xilin/Toolkits/stanford-corenlp-full-2015-12-09/input.txt.out
Annotating file /home/xilin/Toolkits/stanford-corenlp-full-2015-12-09/input.txt
Exception in thread "main" java.lang.RuntimeException: Error annotating document with coref
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate(StatisticalCorefSystem.java:86)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate(StatisticalCorefSystem.java:63)
at edu.stanford.nlp.pipeline.CorefAnnotator.annotate(CorefAnnotator.java:97)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:72)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:534)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:544)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1098)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:877)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.run(StanfordCoreNLP.java:1187)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1257)
Caused by: java.lang.NullPointerException
at edu.stanford.nlp.hcoref.Preprocessor.assignMentionIDs(Preprocessor.java:170)
at edu.stanford.nlp.hcoref.Preprocessor.initializeMentions(Preprocessor.java:153)
at edu.stanford.nlp.hcoref.Preprocessor.preprocess(Preprocessor.java:64)
at edu.stanford.nlp.hcoref.CorefDocMaker.makeDocument(CorefDocMaker.java:194)
at edu.stanford.nlp.hcoref.CorefDocMaker.makeDocument(CorefDocMaker.java:154)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate(StatisticalCorefSystem.java:68)
... 9 more
If I change the option "coref" in the above command to "dcoref", the deterministic coreference system runs smoothly. Others have pointed out that this is a bug in the 3.6.0 distribution. I am using the GitHub repository and pulling the latest version, but the bug still seems to exist.
You need to include the mention annotator before coref. The fact that this appears as a null pointer exception is indeed a bug though. Which Git revision are you using? We've recently changed the way requirements are handled, and this may be a remnant bug from that.
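For example, a version of your command with the mention annotator added (same working directory and jars as in the question; adjust as needed):
java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,cleanxml,ssplit,pos,lemma,ner,parse,mention,coref -file input.txt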

Load CSV data to HBase using pig or hive

Hi, I have created a Pig script which loads data into HBase. My CSV file is stored in HDFS at /hbase_tables/zip.csv.
Pig Script
register /home/hduser/pig-0.12.0/lib/pig-0.8.0-core.jar;
A = LOAD '/hbase_tables/zip.csv' USING PigStorage(',') as (id:chararray, zip:chararray, desc1:chararray, desc2:chararray, income:chararray);
STORE A INTO 'hbase://mydata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('zip:zip,desc:desc1,desc:desc2,income:income');
When I execute it, it gives the error below.
Pig Stack Trace
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:667)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
at org.apache.pig.PigServer.execute(PigServer.java:1190)
at org.apache.pig.PigServer.access$100(PigServer.java:128)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1517)
at org.apache.pig.PigServer.executeBatchEx(PigServer.java:362)
at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.<init>(Path.java:71)
at org.apache.hadoop.fs.Path.<init>(Path.java:45)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:470)
... 20 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:145)
... 23 more
Please let me know how I can import the CSV data file into HBase, or if you have any alternative solution.
Seems like your problem is with "Relative path in absolute URI: hbase://mydata_logs".
Are you sure the path is correct?
Probably the table mydata_logs does not exist. Start hbase shell and type list. Is your table mydata_logs on the list?
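For example, a quick check from the HBase shell (the table and column family names below are taken from your STORE statement; creating the table is only needed if it does not exist yet):
$ hbase shell
hbase> list
hbase> create 'mydata', 'zip', 'desc', 'income'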
I had the same task once and have a fully working solution (actually, I'm not sure about the commas in the third line of your code):
%default hbase_home `echo \$HBASE_HOME`;
%default tmp '/user/alexander/tmp/users_dump/k14';
set zookeeper.znode.parent '/hbase-unsecure';
set hbase.zookeeper.quorum 'dmp-hbase.local';
register $hbase_home/lib/zookeeper-3.4.5.jar;
register $hbase_home/hbase-0.94.20.jar;
UsersHdfs = LOAD '$tmp' USING PigStorage('\t', '-schema');
STORE UsersHdfs INTO 'hbase://user_test' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'id:DEFAULT id:last_modified birth:year gender:female gender:male',
        '-caster HBaseBinaryConverter'
    );
That code works for me; maybe the issue is in your HBase configs.
You could share your .csv file and we could discuss it in more detail.
