StanfordCoreNLP - Setting pipelineLanguage to German not working? - stanford-nlp

I am using the pycorenlp client in order to talk to the Stanford CoreNLP Server. In my setup I am setting pipelineLanguage to german like this:
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
text = 'Das große Auto.'
output = nlp.annotate(text, properties={
    'annotators': 'tokenize,ssplit,pos,depparse,parse',
    'outputFormat': 'json',
    'pipelineLanguage': 'german'
})
However, from the looks of it, it's not working:
output['sentences'][0]['tokens']
will return:
[{'after': ' ',
'before': '',
'characterOffsetBegin': 0,
'characterOffsetEnd': 3,
'index': 1,
'originalText': 'Das',
'pos': 'NN',
'word': 'Das'},
{'after': ' ',
'before': ' ',
'characterOffsetBegin': 4,
'characterOffsetEnd': 9,
'index': 2,
'originalText': 'große',
'pos': 'NN',
'word': 'große'},
{'after': '',
'before': ' ',
'characterOffsetBegin': 10,
'characterOffsetEnd': 14,
'index': 3,
'originalText': 'Auto',
'pos': 'NN',
'word': 'Auto'},
{'after': '',
'before': '',
'characterOffsetBegin': 14,
'characterOffsetEnd': 15,
'index': 4,
'originalText': '.',
'pos': '.',
'word': '.'}]
This should be more like
Das große Auto
POS: DT JJ NN
It seems to me that setting 'pipelineLanguage': 'german' does not work for some reason.
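A quick way to pull out just the tags for comparison, using the output dict from the annotate call above:

# Collect the POS tags of the first sentence.
print([t['pos'] for t in output['sentences'][0]['tokens']])
# Currently prints ['NN', 'NN', 'NN', '.'] rather than German tags.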
I've executed
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
in order to start the server.
I am getting the following from the logger:
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
[pool-1-thread-3] ERROR CoreNLP - Failure to load language specific properties: StanfordCoreNLP-german.properties for german
[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:60700] API call w/annotators tokenize,ssplit,pos,depparse,parse
Das große Auto.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 8.645 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [9.8 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.3 sec].
Apparently the server loads the English models instead; apart from the ERROR line about the missing German properties file, it never warns me about that.

Alright, I just downloaded the German models jar from the website and moved it into the directory where I extracted the server, e.g.
~/Downloads/stanford-corenlp-full-2017-06-09 $
After restarting the server, the German models were loaded successfully:
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/german/german-hgc.tagger ... done [5.1 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/UD_German.gz ...
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99984, Elapsed Time: 11.419 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [12.2 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/germanFactored.ser.gz ... done [1.0 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-3] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/german.conll.hgc_175m_600.crf.ser.gz ... done [0.7 sec].
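For completeness, a minimal sketch of the client call now that the German models jar sits next to the server jars (same setup as above, trimmed to the relevant annotators); the pos values should now come from the German (STTS) tagset produced by german-hgc.tagger:

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
output = nlp.annotate('Das große Auto.', properties={
    'annotators': 'tokenize,ssplit,pos',
    'outputFormat': 'json',
    'pipelineLanguage': 'german'
})
# Print (word, pos) pairs for the first sentence.
print([(t['word'], t['pos']) for t in output['sentences'][0]['tokens']])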

Related

get error 'NoneType' object has no attribute 'dumps' when load model in HAYSTACK

I'm trying to load 'bert-base-multilingual-uncased' in the Haystack FARMReader and get this error:
(huyenv) PS D:\study\DUANCNTT2\HAYSTACK\haystack_demo> & d:/study/DUANCNTT2/HAYSTACK/haystack_demo/huyenv/Scripts/python.exe d:/study/DUANCNTT2/HAYSTACK/haystack_demo/main.py
05/21/2021 00:12:58 - INFO - faiss.loader - Loading faiss.
05/21/2021 00:12:58 - INFO - faiss.loader - Loading faiss.
05/21/2021 00:12:59 - INFO - farm.modeling.prediction_head - Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
05/21/2021 00:13:00 - INFO - faiss.loader - Loading faiss.
05/21/2021 00:13:00 - INFO - faiss.loader - Loading faiss.
05/21/2021 00:13:01 - INFO - elasticsearch - HEAD http://localhost:9200/ [status:200 request:0.018s]
05/21/2021 00:13:01 - INFO - elasticsearch - HEAD http://localhost:9200/cv [status:200 request:0.005s]
05/21/2021 00:13:01 - INFO - elasticsearch - GET http://localhost:9200/cv [status:200 request:0.009s]
05/21/2021 00:13:01 - INFO - elasticsearch - PUT http://localhost:9200/cv/_mapping [status:200 request:0.041s]
05/21/2021 00:13:01 - INFO - elasticsearch - HEAD http://localhost:9200/label [status:200 request:0.008s]
05/21/2021 00:13:01 - INFO - farm.utils - Using device: CPU
05/21/2021 00:13:01 - INFO - farm.utils - Number of GPUs: 0
05/21/2021 00:13:01 - INFO - farm.utils - Distributed Training: False
05/21/2021 00:13:01 - INFO - farm.utils - Automatic Mixed Precision: None
Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForQuestionAnswering: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-multilingual-uncased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
05/21/2021 00:13:21 - WARNING - farm.utils - ML Logging is turned off. No parameters, metrics or artifacts will be logged to MLFlow.
05/21/2021 00:13:21 - INFO - farm.utils - Using device: CPU
05/21/2021 00:13:21 - INFO - farm.utils - Number of GPUs: 0
05/21/2021 00:13:21 - INFO - farm.utils - Distributed Training: False
05/21/2021 00:13:21 - INFO - farm.utils - Automatic Mixed Precision: None
05/21/2021 00:13:21 - INFO - farm.infer - Got ya 3 parallel workers to do inference ...
05/21/2021 00:13:21 - INFO - farm.infer -  0    0    0
05/21/2021 00:13:21 - INFO - farm.infer - /w\  /w\  /w\
05/21/2021 00:13:21 - INFO - farm.infer - /'\  / \  /'\
05/21/2021 00:13:21 - INFO - farm.infer -
Exception ignored in: <function Pool.__del__ at 0x000001BBA1DC9C10>
Traceback (most recent call last):
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 268, in __del__
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 362, in put
AttributeError: 'NoneType' object has no attribute 'dumps'
This is my main.py file:
from haystack.reader.farm import FARMReader
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from haystack.retriever.sparse import ElasticsearchRetriever
document_store = ElasticsearchDocumentStore(
    host="localhost",
    username="",
    password="",
    index="cv",
    embedding_dim=768,
    embedding_field="embedding")
retriever = ElasticsearchRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path='bert-base-multilingual-uncased')
NOTE: My Elasticsearch server has been started successfully!
Seems like an issue with multiprocessing on Windows. You can disable multiprocessing for the FARMReader like this:
...
reader = FARMReader(model_name_or_path='bert-base-multilingual-uncased', num_processes=0)
See also the docs for more details.
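Put together, a sketch of main.py from the question with multiprocessing disabled (same host, index and model as above):

from haystack.reader.farm import FARMReader
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from haystack.retriever.sparse import ElasticsearchRetriever

document_store = ElasticsearchDocumentStore(
    host="localhost",
    username="",
    password="",
    index="cv",
    embedding_dim=768,
    embedding_field="embedding")
retriever = ElasticsearchRetriever(document_store=document_store)

# num_processes=0 keeps inference in the main process and avoids the
# Windows multiprocessing teardown error shown above.
reader = FARMReader(model_name_or_path='bert-base-multilingual-uncased',
                    num_processes=0)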

Running Sqoop with Oozie Error: Can not create a Path from an empty string

I am trying to run a Sqoop export with Oozie. I can run simple Sqoop commands (list-tables etc.) and I can run my Sqoop export command from the command line; however, when I run it with Oozie I get the following error in my YARN logs:
Error:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/resource/hadoop/yarn/local/filecache/41235/mapreduce.tar.gz/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/resource/hadoop/yarn/local/filecache/41709/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Note: /tmp/sqoop-yarn/compile/ff5ff27843de6fb697dddfb18c85dbbb/tmp_fact_kpi_da20.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:134)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:127)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:95)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:190)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.sqoop.mapreduce.ExportJobBase.doSubmitJob(ExportJobBase.java:326)
at org.apache.sqoop.mapreduce.ExportJobBase.runJob(ExportJobBase.java:303)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:444)
at org.apache.sqoop.manager.SQLServerManager.exportTable(SQLServerManager.java:192)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:179)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
My workflow.xml is:
<workflow-app name="${jobName}" xmlns="uri:oozie:workflow:0.1">
    <start to="sqoop-export" />
    <action name="sqoop-export">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
                <property>
                    <name>oozie.action.sharelib.for.sqoop</name>
                    <value>sqoop,hive,hcatalog</value>
                </property>
                <property>
                    <name>oozie.sqoop.log.level</name>
                    <value>${debugLevel}</value>
                </property>
                <property>
                    <name>mapred.reduce.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>hive.metastore.uris</name>
                    <value>thrift://****:9083</value>
                </property>
                <property>
                    <name>hive.metastore.warehouse.dir</name>
                    <value>/apps/hive/warehouse</value>
                </property>
            </configuration>
            <command>export --hcatalog-database modeling_reporting --hcatalog-table fact_kpi_da20 --table tmp_fact_kpi_da20 --connect jdbc:sqlserver://****.database.windows.net:1433;databaseName=****;user=****;password=****
            </command>
        </sqoop>
        <ok to="end"/>
        <error to="sqoop-load-fail"/>
    </action>
    <kill name="sqoop-load-fail">
        <message>Sqoop export failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end" />
</workflow-app>
And my job.properties includes:
oozie.use.system.libpath=true
oozie.wf.application.path=/user/abc
I run the job with:
oozie job -config job.properties -run
Additional logs show the job is able to connect to my destination table and verifies that my columns match:
7222 [main] DEBUG org.apache.sqoop.orm.CompilationManager - Finished writing jar file /tmp/sqoop-yarn/compile/24e897ef3439fabb89090a4dbe4c9be1/tmp_fact_kpi_da20.jar
7235 [main] INFO org.apache.sqoop.mapreduce.ExportJobBase - Beginning export of tmp_fact_kpi_da20
7235 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
7240 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
7240 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
7240 [main] INFO org.apache.sqoop.mapreduce.ExportJobBase - Configuring HCatalog for export job
7257 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Configuring HCatalog specific details for job
7493 [main] DEBUG org.apache.sqoop.manager.SqlManager - Execute getColumnInfoRawQuery : SELECT t.* FROM [tmp_fact_kpi_da20] AS t WHERE 1=0
7493 [main] DEBUG org.apache.sqoop.manager.SqlManager - Using fetchSize for next query: 1000
7493 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM [tmp_fact_kpi_da20] AS t WHERE 1=0
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column eventdate of type [-9, 10, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column clientregion of type [-9, 4, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column clientjourney of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column eventtype of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column eventreason of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column feature of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column customerdevice of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column customerbrowser of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column customercountryiso2 of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column clientcurrencyiso3 of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column eventcount of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column uniqueeventcount of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column sales of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column salesvalue of type [3, 38, 2]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column salesvaluegbp of type [3, 38, 2]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column started_customersurveys of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column completed_customersurveys of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column started_emailsubscriptions of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column completed_emailsubscriptions of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column started_problemsolversurveys of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column completed_problemsolversurveys of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column scenario of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column abtestgroup of type [-9, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column abtestid of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column abtestiscontrol of type [-7, 1, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column appversion of type [12, 50, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column agentid of type [-5, 19, 0]
7582 [main] DEBUG org.apache.sqoop.manager.SqlManager - Found column pdate of type [12, 50, 0]
7670 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Database column names projected : [eventdate, clientregion, clientjourney, eventtype, eventreason, feature, customerdevice, customerbrowser, customercountryiso2, clientcurrencyiso3, eventcount, uniqueeventcount, sales, salesvalue, salesvaluegbp, started_customersurveys, completed_customersurveys, started_emailsubscriptions, completed_emailsubscriptions, started_problemsolversurveys, completed_problemsolversurveys, scenario, abtestgroup, abtestid, abtestiscontrol, appversion, agentid, pdate]
7670 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Database column name - info map :
started_customersurveys : [Type : -5,Precision : 19,Scale : 0]
pdate : [Type : 12,Precision : 50,Scale : 0]
uniqueeventcount : [Type : -5,Precision : 19,Scale : 0]
sales : [Type : -5,Precision : 19,Scale : 0]
customerbrowser : [Type : -9,Precision : 50,Scale : 0]
salesvalue : [Type : 3,Precision : 38,Scale : 2]
abtestiscontrol : [Type : -7,Precision : 1,Scale : 0]
feature : [Type : -9,Precision : 50,Scale : 0]
scenario : [Type : -9,Precision : 50,Scale : 0]
clientregion : [Type : -9,Precision : 4,Scale : 0]
eventcount : [Type : -5,Precision : 19,Scale : 0]
customercountryiso2 : [Type : -9,Precision : 50,Scale : 0]
completed_emailsubscriptions : [Type : -5,Precision : 19,Scale : 0]
salesvaluegbp : [Type : 3,Precision : 38,Scale : 2]
abtestid : [Type : -5,Precision : 19,Scale : 0]
agentid : [Type : -5,Precision : 19,Scale : 0]
started_emailsubscriptions : [Type : -5,Precision : 19,Scale : 0]
completed_problemsolversurveys : [Type : -5,Precision : 19,Scale : 0]
appversion : [Type : 12,Precision : 50,Scale : 0]
customerdevice : [Type : -9,Precision : 50,Scale : 0]
clientjourney : [Type : -9,Precision : 50,Scale : 0]
eventdate : [Type : -9,Precision : 10,Scale : 0]
eventreason : [Type : -9,Precision : 50,Scale : 0]
abtestgroup : [Type : -9,Precision : 50,Scale : 0]
clientcurrencyiso3 : [Type : -9,Precision : 50,Scale : 0]
completed_customersurveys : [Type : -5,Precision : 19,Scale : 0]
started_problemsolversurveys : [Type : -5,Precision : 19,Scale : 0]
eventtype : [Type : -9,Precision : 50,Scale : 0]
7834 [main] INFO org.apache.hive.hcatalog.common.HiveClientCache - Initializing cache: eviction-timeout=120 initial-capacity=50 maximum-capacity=50
7872 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://d-u2-prcs-sv-01.veproduction.dom:9083
7917 [main] INFO hive.metastore - Connected to metastore.
10113 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - HCatalog full table schema fields = [eventdate, clientregion, clientjourney, eventtype, eventreason, feature, customerdevice, customerbrowser, customercountryiso2, clientcurrencyiso3, eventcount, uniqueeventcount, sales, salesvalue, salesvaluegbp, started_customersurveys, completed_customersurveys, started_emailsubscriptions, completed_emailsubscriptions, started_problemsolversurveys, completed_problemsolversurveys, scenario, abtestgroup, abtestid, abtestiscontrol, appversion, agentid, pdate]
10849 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - HCatalog table partitioning key fields = [pdate]
10849 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - HCatalog projected schema fields = [eventdate, clientregion, clientjourney, eventtype, eventreason, feature, customerdevice, customerbrowser, customercountryiso2, clientcurrencyiso3, eventcount, uniqueeventcount, sales, salesvalue, salesvaluegbp, started_customersurveys, completed_customersurveys, started_emailsubscriptions, completed_emailsubscriptions, started_problemsolversurveys, completed_problemsolversurveys, scenario, abtestgroup, abtestid, abtestiscontrol, appversion, agentid, pdate]
10889 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - HCatalog job : Hive Home = /usr/lib/hive
10889 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - HCatalog job: HCatalog Home = /usr/lib/hcatalog
10920 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Adding jar files under /usr/lib/hcatalog/share/hcatalog to distributed cache
10920 [main] WARN org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - No files under /usr/lib/hcatalog/share/hcatalog to add to distributed cache for hcatalog job
10920 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Adding jar files under /usr/lib/hcatalog/lib to distributed cache
10920 [main] WARN org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - No files under /usr/lib/hcatalog/lib to add to distributed cache for hcatalog job
10920 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Adding jar files under /usr/lib/hive/lib to distributed cache
10920 [main] WARN org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - No files under /usr/lib/hive/lib to add to distributed cache for hcatalog job
10920 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Adding jar files under /usr/lib/hcatalog/share/hcatalog/storage-handlers to distributed cache (recursively)
10920 [main] WARN org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - No files under /usr/lib/hcatalog/share/hcatalog/storage-handlers to add to distributed cache for hcatalog job
10921 [main] DEBUG org.apache.sqoop.mapreduce.JobBase - Using InputFormat: class org.apache.sqoop.mapreduce.hcat.SqoopHCatExportFormat
10921 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Configuring HCatalog for export job
10921 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Ignoring configuration request for HCatalog info
11112 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
11112 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
11112 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11113 [main] DEBUG org.apache.sqoop.mapreduce.JobBase - Adding to job classpath: file:/mnt/resource/hadoop/yarn/local/filecache/14417/sqoop-1.4.6.2.6.2.0-205.jar
11114 [main] DEBUG org.apache.sqoop.mapreduce.JobBase - Adding to job classpath: file:/mnt/resource/hadoop/yarn/local/usercache/louiscronin/filecache/1096/mssql-jdbc-8.2.2.jre8.jar
11115 [main] DEBUG org.apache.sqoop.mapreduce.JobBase - Adding to job classpath: file:/mnt/resource/hadoop/yarn/local/filecache/14417/sqoop-1.4.6.2.6.2.0-205.jar
11116 [main] DEBUG org.apache.sqoop.mapreduce.JobBase - Adding to job classpath: file:/mnt/resource/hadoop/yarn/local/filecache/14417/sqoop-1.4.6.2.6.2.0-205.jar
11116 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
11210 [main] INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at d-u2-prcs-nm-03.veproduction.dom/172.28.50.22:10200
11330 [main] INFO org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider - Looking for the active RM in [rm1, rm2]...
11336 [main] INFO org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider - Found active RM [rm2]
11518 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/louiscronin/.staging/job_1612961662367_2162
11546 [main] WARN org.apache.hadoop.fs.azure.AzureFileSystemThreadPoolExecutor - Disabling threads for Delete operation as thread count 0 is <= 1
11554 [main] INFO org.apache.hadoop.fs.azure.AzureFileSystemThreadPoolExecutor - Time taken for Delete operation is: 9 ms with threads: 0
11603 [main] ERROR org.apache.sqoop.Sqoop - Got exception running Sqoop: java.lang.IllegalArgumentException: Can not create a Path from an empty string

Apache PIG, ELEPHANTBIRDJSON Loader

I'm trying to parse the input below (there are 2 records in this input) using the Elephant Bird JSON loader:
[{"node_disk_lnum_1":36,"node_disk_xfers_in_rate_sum":136.40000000000001,"node_disk_bytes_in_rate_22":
187392.0, "node_disk_lnum_7": 13}]
[{"node_disk_lnum_1": 36, "node_disk_xfers_in_rate_sum":
105.2,"node_disk_bytes_in_rate_22": 123084.8, "node_disk_lnum_7":13}]
Here is my syntax:
register '/home/data/Desktop/elephant-bird-pig-4.1.jar';

a = LOAD '/pig/tc1.log' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

b = FOREACH a GENERATE
    flatten(json#'node_disk_lnum_1') AS node_disk_lnum_1,
    flatten(json#'node_disk_xfers_in_rate_sum') AS node_disk_xfers_in_rate_sum,
    flatten(json#'node_disk_bytes_in_rate_22') AS node_disk_bytes_in_rate_22,
    flatten(json#'node_disk_lnum_7') AS node_disk_lnum_7;

DESCRIBE b;
The DESCRIBE b result:
b: {node_disk_lnum_1: bytearray,node_disk_xfers_in_rate_sum: bytearray,node_disk_bytes_in_rate_22: bytearray,node_disk_lnum_7: bytearray}
c = FOREACH b GENERATE node_disk_lnum_1;
DESCRIBE c;
c: {node_disk_lnum_1: bytearray}
DUMP c;
Expected Result:
36, 136.40000000000001, 187392.0, 13
36, 105.2, 123084.8, 13
Instead, it throws the error below:
2017-02-06 01:05:49,337 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2017-02-06 01:05:49,386 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-02-06 01:05:49,387 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-02-06 01:05:49,390 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Map key required for a: $0->[node_disk_lnum_1, node_disk_xfers_in_rate_sum, node_disk_bytes_in_rate_22, node_disk_lnum_7]
2017-02-06 01:05:49,395 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-02-06 01:05:49,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-02-06 01:05:49,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-02-06 01:05:49,425 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-02-06 01:05:49,426 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-02-06 01:05:49,428 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
Please help, what am I missing?
You do not have any nested data in your JSON, so remove '-nestedLoad':
a = LOAD '/pig/tc1.log' USING com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);
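As a quick sanity check (a small Python sketch, separate from the Pig job) that each record is a flat JSON object with nothing for '-nestedLoad' to unwrap:

import json

record = '[{"node_disk_lnum_1": 36, "node_disk_xfers_in_rate_sum": 105.2, "node_disk_bytes_in_rate_22": 123084.8, "node_disk_lnum_7": 13}]'
parsed = json.loads(record)[0]
# Every value is a plain number, i.e. there are no nested dicts or lists.
print(all(not isinstance(v, (dict, list)) for v in parsed.values()))  # True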

French coreference annotation using CoreNLP

Can someone help me correct my settings for performing coreference annotation for French using CoreNLP? I have tried the basic suggestion by editing the properties file:
annotators = tokenize, ssplit, pos, parse, lemma, ner, parse, depparse, mention, coref
tokenize.language = fr
pos.model = edu/stanford/nlp/models/pos-tagger/french/french.tagger
parse.model = edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz
The command:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props frenchProps.properties -file frenchFile.txt
which produces the following output log:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/french/french.tagger ... done [0.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz ...
done [2.2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.9 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFORMACIÓN: Read 83 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFORMACIÓN: Read 267 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFORMACIÓN: Read 25 rules
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 1.639 (s)
Initializing dependency parser done [6.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator mention
Using mention detector type: rule
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3097)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2892)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1646)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at java.util.HashMap.readObject(HashMap.java:1402)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:324)
at edu.stanford.nlp.scoref.SimpleLinearClassifier.<init>(SimpleLinearClassifier.java:30)
at edu.stanford.nlp.scoref.PairwiseModel.<init>(PairwiseModel.java:75)
at edu.stanford.nlp.scoref.PairwiseModel$Builder.build(PairwiseModel.java:57)
at edu.stanford.nlp.scoref.ClusteringCorefSystem.<init>(ClusteringCorefSystem.java:31)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.fromProps(StatisticalCorefSystem.java:48)
at edu.stanford.nlp.pipeline.CorefAnnotator.<init>(CorefAnnotator.java:66)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.coref(AnnotatorImplementations.java:220)
at edu.stanford.nlp.pipeline.AnnotatorFactories$13.create(AnnotatorFactories.java:515)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:375)
This made me think that there is additional configuration missing.
AFAIK CoreNLP doesn't offer coreference resolution for French. (see also http://stanfordnlp.github.io/CoreNLP/coref.html)

Stanford CoreNLP dedicated server ignoring annotators input

I'm running the CoreNLP dedicated server on AWS and trying to make a request from Ruby. The server seems to receive the request correctly, but the issue is that it ignores the annotators list in the input and always defaults to all annotators. My Ruby code to make the request looks like this:
uri = URI.parse(URI.encode('http://ec2-************.compute.amazonaws.com//?properties={"tokenize.whitespace": "true", "annotators": "tokenize,ssplit,pos", "outputFormat": "json"}'))
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new("/v1.1/auth")
request.add_field('Content-Type', 'application/json')
request.body = text
response = http.request(request)
json = JSON.parse(response.body)
In the nohup.out logs on the server I see the following:
[/38.122.182.107:53507] API call w/annotators tokenize,ssplit,pos,depparse,lemma,ner,mention,coref,natlog,openie
....
INPUT TEXT BLOCK HERE
....
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.0 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 2.259 (s)
Initializing dependency parser done [5.1 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.6 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.2 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [7.2 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Feb 22, 2016 11:37:20 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 83 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Feb 22, 2016 11:37:20 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 267 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Feb 22, 2016 11:37:20 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 25 rules
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator mention
Using mention detector type: dependency
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
etc etc.
When I run test queries using wget on the command line it seems to work fine.
wget --post-data 'the quick brown fox jumped over the lazy dog' 'ec2-*******.compute.amazonaws.com/?properties={"tokenize.whitespace": "true", "annotators": "tokenize,ssplit,pos", "outputFormat": "json"}' -O -
Any help as to why this is happening would be appreciated, thanks!
It turns out the request was being constructed incorrectly: the path, including the properties query string, should be passed as the argument to Net::HTTP::Post.new (the code above posted to an unrelated "/v1.1/auth" path instead). Corrected code below in case it helps anyone:
host = "http://ec2-***********.us-west-2.compute.amazonaws.com"
path = '/?properties={"tokenize.whitespace": "true", "annotators": "tokenize,ssplit,pos", "outputFormat": "json"}'
encoded_path = URI.encode(path)
uri = URI.parse(URI.encode(host))
http = Net::HTTP.new(uri.host, uri.port)
http.set_debug_output($stdout)
# request = Net::HTTP::Post.new("/v1.1/auth")
request = Net::HTTP::Post.new(encoded_path)
request.add_field('Content-Type', 'application/json')
request.body = text
response = http.request(request)
json = JSON.parse(response.body)
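For comparison, the same request from Python (a sketch using the requests library; the host is a placeholder exactly as in the Ruby code, and the test sentence is the one from the wget example):

import json
import requests

text = 'the quick brown fox jumped over the lazy dog'
props = {"tokenize.whitespace": "true",
         "annotators": "tokenize,ssplit,pos",
         "outputFormat": "json"}
# The server reads the pipeline properties from the 'properties' query
# parameter and annotates whatever arrives in the POST body.
response = requests.post(
    "http://ec2-***********.us-west-2.compute.amazonaws.com",
    params={"properties": json.dumps(props)},
    data=text)
print(response.json()["sentences"][0]["tokens"])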
