Solr is printing a lot of lines like the ones below in the slow-requests log:
solr_slow_requests.log.1:2023-02-03 17:30:27.084 WARN (qtp1961945640-50747) [c:products s:shard4 r:core_node47 x:products_shard4_replica_p44] o.a.s.c.S.SlowRequest slow: [products_shard4_replica_p44] webapp=/solr path=/select params={} rid=10.0.61.80-5704218 hits=9309 status=0 QTime=1280
solr_slow_requests.log.1:2023-02-03 17:30:27.157 WARN (qtp1961945640-50744) [c:products s:shard4 r:core_node47 x:products_shard4_replica_p44] o.a.s.c.S.SlowRequest slow: [products_shard4_replica_p44] webapp=/solr path=/select params={} rid=10.0.61.80-5704223 hits=9730 status=0 QTime=1508
solr_slow_requests.log.1:2023-02-03 17:30:27.325 WARN (qtp1961945640-50742) [c:products s:shard5 r:core_node59 x:products_shard5_replica_p56] o.a.s.c.S.SlowRequest slow: [products_shard5_replica_p56] webapp=/solr path=/select params={} rid=10.0.61.80-5704234 hits=9309 status=0 QTime=1993
solr_slow_requests.log.1:2023-02-03 17:30:27.326 WARN (qtp1961945640-50746) [c:products s:shard4 r:core_node47 x:products_shard4_replica_p44] o.a.s.c.S.SlowRequest slow: [products_shard4_replica_p44] webapp=/solr path=/select params={} rid=10.0.61.80-5704235 hits=9309 status=0 QTime=1994
solr_slow_requests.log.1:2023-02-03 17:30:27.657 WARN (qtp1961945640-50668) [c:products s:shard2 r:core_node23 x:products_shard2_replica_p20] o.a.s.c.S.SlowRequest slow: [products_shard2_replica_p20] webapp=/solr path=/select params={} rid=10.0.61.80-5704247 hits=9730 status=0 QTime=1140
solr_slow_requests.log.1:2023-02-03 17:30:27.700 WARN (qtp1961945640-50757) [c:products s:shard3 r:core_node35 x:products_shard3_replica_p32] o.a.s.c.S.SlowRequest slow: [products_shard3_replica_p32] webapp=/solr path=/select params={} rid=10.0.61.80-5704249 hits=9730 status=0 QTime=1068
solr_slow_requests.log.1:2023-02-03 17:30:27.720 WARN (qtp1961945640-50661) [c:products s:shard3 r:core_node35 x:products_shard3_replica_p32] o.a.s.c.S.SlowRequest slow: [products_shard3_replica_p32] webapp=/solr path=/select params={} rid=10.0.61.80-5704254 hits=9309 status=0 QTime=1023
solr_slow_requests.log.1:2023-02-03 17:30:27.816 WARN (qtp1961945640-49782) [c:products s:shard6 r:core_node71 x:products_shard6_replica_p68] o.a.s.c.S.SlowRequest slow: [products_shard6_replica_p68] webapp=/solr path=/select params={} rid=10.0.61.80-5704262 hits=9730 status=0 QTime=1246
solr_slow_requests.log.1:2023-02-03 17:30:27.825 WARN (qtp1961945640-50750) [c:products s:shard5 r:core_node59 x:products_shard5_replica_p56] o.a.s.c.S.SlowRequest slow: [products_shard5_replica_p56] webapp=/solr path=/select params={} rid=10.0.61.80-5704263 hits=9730 status=0 QTime=1847
solr_slow_requests.log.1:2023-02-03 17:30:27.888 WARN (qtp1961945640-50711) [c:products s:shard3 r:core_node35 x:products_shard3_replica_p32] o.a.s.c.S.SlowRequest slow: [products_shard3_replica_p32] webapp=/solr path=/select params={} rid=10.0.61.80-5704266 hits=9309 status=0 QTime=1150
solr_slow_requests.log.1:2023-02-03 17:30:27.995 WARN (qtp1961945640-50734) [c:products s:shard5 r:core_node59 x:products_shard5_replica_p56] o.a.s.c.S.SlowRequest slow: [products_shard5_replica_p56] webapp=/solr path=/select params={} rid=10.0.61.80-5704277 hits=9730 status=0 QTime=1481
I have two questions:
Why doesn't it print the params? Why is params always empty (params={}), even though our requests send a lot of parameters?
Why is it taking so much time when there is no CPU spike?
We have a 6-node cluster with 6 shards for the products collection. We are using 16-core machines with 32 GB of RAM each.
A sample of our query looks like this:
solr.log.1:2023-02-06 12:02:54.090 INFO (qtp1961945640-4677) [c:products s:shard1 r:core_node11 x:products_shard1_replica_p8] o.a.s.c.S.Request [products_shard1_replica_p8] webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=16388&start=0&fsv=true&fq=channel_identifier:632aff00940b4e27c80986f3&fq=zone_identifier:"_all_"&fq=is_available:True&fq=image_nature:("standard"+OR+"substandard"+OR+"default")&fq=product_online_date:[*+TO+NOW]&fq={!tag%3Dbrand_id}brand_id:("74"+OR+"235")&sort=popularity+desc+,id+asc&shard.url=http://IP:8983/solr/products_shard1_replica_p8/|http://IP:8983/solr/products_shard1_replica_n2/|http://IP:8983/solr/products_shard1_replica_p6/|http://IP:8983/solr/products_shard1_replica_t4/|http://IP:8983/solr/products_shard1_replica_p10/|http://IP:8983/solr/products_shard1_replica_n1/&rows=11550&rid=IP-225159&version=2&q=*:*&omitHeader=false&NOW=1675684973896&json={"query":+"*:*",+"params":+{"df":+"_text_",+"_route_":+"632aff00940b4e27c80986f3/2!",+"start":+11500,+"rows":+50},+"fields":+["*+score"],+"filter":+["channel_identifier:632aff00940b4e27c80986f3",+"zone_identifier:\"_all_\"",+"is_available:True",+"image_nature:(\"standard\"+OR+\"substandard\"+OR+\"default\")",+"product_online_date:[*+TO+NOW]",+{"#brand_id":+"brand_id:(\"74\"+OR+\"235\")"}],+"sort":+"popularity+desc+,id+asc"}&isShard=true&wt=javabin&_route_=632aff00940b4e27c80986f3/2!} hits=42585 status=0 QTime=193
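Each slow-log line does carry a rid, so the full parameters of a given slow request can at least be recovered by grepping the main request log in the same log directory for that rid, assuming it still covers the time window, e.g.:

# look up the full request (with params) that produced a given slow-log entry,
# using the rid from the first slow-log line above
grep "rid=10.0.61.80-5704218" /var/solr/data/logs/solr.log*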
This happens only during a load test. Some of the Solr startup parameters are listed below:
-Djetty.home=/opt/solr/server
-Djetty.port=8983
-Dlog4j2.formatMsgNoLookups=true
-Dnewrelic.environment=[]
-Dsolr.data.home=
-Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
-Dsolr.documentCache.initialSize=8339
-Dsolr.documentCache.size=8339
-Dsolr.filterCache.initialSize=8339
-Dsolr.filterCache.size=8339
-Dsolr.install.dir=/opt/solr
-Dsolr.jetty.inetaccess.excludes=
-Dsolr.jetty.inetaccess.includes=
-Dsolr.log.dir=/var/solr/data/logs
-Dsolr.log.muteconsole
-Dsolr.queryResultCache.initialSize=6671
-Dsolr.queryResultCache.size=6671
-Dsolr.solr.home=/var/solr/data/data
-Duser.timezone=UTC
-DzkClientTimeout=30000
-XX:+AggressiveOpts
-XX:+AlwaysPreTouch
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseG1GC
-XX:+UseGCLogFileRotation
-XX:+UseLargePages
-XX:-OmitStackTraceInFastThrow
-XX:-OmitStackTraceInFastThrow
-XX:ConcGCThreads=4
-XX:G1ReservePercent=10
-XX:GCLogFileSize=20M
-XX:InitiatingHeapOccupancyPercent=80
-XX:MaxGCPauseMillis=100
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/data/logs
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-Xloggc:/var/solr/data/logs/solr_gc.log
-Xms20000m
-Xmx20000m
-Xss256k
-javaagent:/opt/solr/contrib/newrelic/newrelic.jar
-verbose:gc
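On the second question: the flags above already write a GC log with application-stopped times to /var/solr/data/logs/solr_gc.log, and long GC pauses would explain high QTime without a CPU spike, so that log seems worth checking during the load test. A rough sketch (the exact line format depends on the JDK, so the pattern may need adjusting):

# list the worst application-stopped times recorded by
# -XX:+PrintGCApplicationStoppedTime (format varies by JDK; adjust if needed)
grep -oP 'threads were stopped: \K[0-9.]+' /var/solr/data/logs/solr_gc.log | sort -rn | head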
I'm migrating Logstash to an EC2 instance.
It's running Amazon Linux.
Using the command tail -f /var/log/logstash/logstash-plain.log
I'm getting the following log output cycling/repeating:
[2017-12-20T15:30:24,742][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/share/logstash/modules/netflow/configuration"}
[2017-12-20T15:30:24,745][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[2017-12-20T15:30:27,342][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://search-ivendas-sz2q3f573vro6xlncwjnvzbf2m.us-east-1.es.amazonaws.com:443/]}}
[2017-12-20T15:30:27,343][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>https://search-ivendas-sz2q3f573vro6xlncwjnvzbf2m.us-east-1.es.amazonaws.com:443/, :path=>"/"}
[2017-12-20T15:30:28,040][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"https://search-ivendas-sz2q3f573vro6xlncwjnvzbf2m.us-east-1.es.amazonaws.com:443/"}
[2017-12-20T15:30:28,175][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2017-12-20T15:30:28,185][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"#timestamp"=>{"type"=>"date", "include_in_all"=>false}, "#version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2017-12-20T15:30:28,201][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//search-ivendas-sz2q3f573vro6xlncwjnvzbf2m.us-east-1.es.amazonaws.com:443"]}
[2017-12-20T15:30:28,385][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
[2017-12-20T15:30:29,298][INFO ][logstash.pipeline ] Pipeline main started
[2017-12-20T15:30:29,502][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2017-12-20T15:30:29,979][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#<NameError: undefined local variable or method `dotfile' for #<AwesomePrint::Inspector:0x18bafa48>>, :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/1.9/gems/awesome_print-1.8.0/lib/awesome_print/inspector.rb:163:in `merge_custom_defaults!'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/awesome_print-1.8.0/lib/awesome_print/inspector.rb:50:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/awesome_print-1.8.0/lib/awesome_print/core_ext/kernel.rb:9:in `ai'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-rubydebug-3.0.5/lib/logstash/codecs/rubydebug.rb:39:in `encode_default'", "org/jruby/RubyMethod.java:120:in `call'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-rubydebug-3.0.5/lib/logstash/codecs/rubydebug.rb:35:in `encode'", "/usr/share/logstash/logstash-core/lib/logstash/codecs/base.rb:50:in `multi_encode'", "org/jruby/RubyArray.java:1613:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/codecs/base.rb:50:in `multi_encode'", "/usr/share/logstash/logstash-core/lib/logstash/outputs/base.rb:90:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/output_delegator_strategies/single.rb:15:in `multi_receive'", "org/jruby/ext/thread/Mutex.java:149:in `synchronize'", "/usr/share/logstash/logstash-core/lib/logstash/output_delegator_strategies/single.rb:14:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/output_delegator.rb:49:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:434:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:433:in `output_batch'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:381:in `worker_loop'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:342:in `start_workers'"]}
I did install the missing plugins; before that I was getting other errors.
Is there some way to get more details about the problem?
What am I missing?
This is an issue with the awesome_print gem used by the rubydebug codec. Set the HOME environment variable (export HOME=<path_to_aprc_file>), which is used to locate the .aprc configuration the gem needs. Refer to this to persist the environment variable.
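For example, if Logstash runs as a systemd-managed service (a sketch under that assumption; the HOME value below is illustrative and just needs to point at a directory that can hold the .aprc file), the variable can be persisted in a drop-in unit:

# sketch, assuming a systemd-managed Logstash service
sudo systemctl edit logstash
# add to the drop-in that opens:
#   [Service]
#   Environment="HOME=/usr/share/logstash"
sudo systemctl daemon-reload
sudo systemctl restart logstash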
I am using the pycorenlp client in order to talk to the Stanford CoreNLP Server. In my setup I am setting pipelineLanguage to german like this:
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
text = 'Das große Auto.'
output = nlp.annotate(text, properties={
    'annotators': 'tokenize,ssplit,pos,depparse,parse',
    'outputFormat': 'json',
    'pipelineLanguage': 'german'
})
However, from the looks of it, I'd say that it's not working:
output['sentences'][0]['tokens']
will return:
[{'after': ' ',
'before': '',
'characterOffsetBegin': 0,
'characterOffsetEnd': 3,
'index': 1,
'originalText': 'Das',
'pos': 'NN',
'word': 'Das'},
{'after': ' ',
'before': ' ',
'characterOffsetBegin': 4,
'characterOffsetEnd': 9,
'index': 2,
'originalText': 'große',
'pos': 'NN',
'word': 'große'},
{'after': '',
'before': ' ',
'characterOffsetBegin': 10,
'characterOffsetEnd': 14,
'index': 3,
'originalText': 'Auto',
'pos': 'NN',
'word': 'Auto'},
{'after': '',
'before': '',
'characterOffsetBegin': 14,
'characterOffsetEnd': 15,
'index': 4,
'originalText': '.',
'pos': '.',
'word': '.'}]
This should be more like
Das große Auto
POS: DT JJ NN
It seems to me that setting 'pipelineLanguage': 'german' does not work for some reason.
I've executed
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
in order to start the server.
I am getting the following from the logger:
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
[pool-1-thread-3] ERROR CoreNLP - Failure to load language specific properties: StanfordCoreNLP-german.properties for german
[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:60700] API call w/annotators tokenize,ssplit,pos,depparse,parse
Das große Auto.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 8.645 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [9.8 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.3 sec].
Apparently the server is loading the models for the English language - without warning me about that.
Alright, I just downloaded the models jar for German from the website and moved it into the directory where I extracted the server, e.g.
~/Downloads/stanford-corenlp-full-2017-06-09 $
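In other words, roughly the following (the models jar name and download URL match the 2017-06-09 release I happen to use and may differ for other versions):

# sketch: put the German models jar next to the other CoreNLP jars so the
# classpath wildcard "*" picks it up, then restart the server as before
cd ~/Downloads/stanford-corenlp-full-2017-06-09
wget http://nlp.stanford.edu/software/stanford-german-corenlp-2017-06-09-models.jar
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000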
After re-running the server, the model was successfully loaded.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/german/german-hgc.tagger ... done [5.1 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/UD_German.gz ...
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99984, Elapsed Time: 11.419 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [12.2 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/germanFactored.ser.gz ... done [1.0 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-3] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/german.conll.hgc_175m_600.crf.ser.gz ... done [0.7 sec].
Can someone help me correct my settings for performing coreference annotation for French using CoreNLP? I have tried the basic suggestion by editing the properties file:
annotators = tokenize, ssplit, pos, parse, lemma, ner, parse, depparse, mention, coref
tokenize.language = fr
pos.model = edu/stanford/nlp/models/pos-tagger/french/french.tagger
parse.model = edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz
The command:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props frenchProps.properties -file frenchFile.txt
which produces the following output log:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/french/french.tagger ... done [0.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz ...
done [2.2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.9 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFORMACIÓN: Read 83 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFORMACIÓN: Read 267 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFORMACIÓN: Read 25 rules
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 1.639 (s)
Initializing dependency parser done [6.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator mention
Using mention detector type: rule
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3097)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2892)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1646)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at java.util.HashMap.readObject(HashMap.java:1402)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:324)
at edu.stanford.nlp.scoref.SimpleLinearClassifier.<init>(SimpleLinearClassifier.java:30)
at edu.stanford.nlp.scoref.PairwiseModel.<init>(PairwiseModel.java:75)
at edu.stanford.nlp.scoref.PairwiseModel$Builder.build(PairwiseModel.java:57)
at edu.stanford.nlp.scoref.ClusteringCorefSystem.<init>(ClusteringCorefSystem.java:31)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.fromProps(StatisticalCorefSystem.java:48)
at edu.stanford.nlp.pipeline.CorefAnnotator.<init>(CorefAnnotator.java:66)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.coref(AnnotatorImplementations.java:220)
at edu.stanford.nlp.pipeline.AnnotatorFactories$13.create(AnnotatorFactories.java:515)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:375)
This made me think there is some additional configuration missing.
AFAIK CoreNLP doesn't offer coreference resolution for French. (see also http://stanfordnlp.github.io/CoreNLP/coref.html)
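Separately, the OutOfMemoryError itself is not French-specific: the trace shows it happening while the statistical coref models are being deserialized, and those models need considerably more heap than the -Xmx2g in your command. If you do run a coref pipeline in a supported language, a larger heap along these lines is more realistic (a sketch; the exact requirement depends on the CoreNLP version and models):

# same invocation, but with a heap large enough for the coref models to load
java -cp "*" -Xmx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -props frenchProps.properties -file frenchFile.txt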
I'm following this tutorial for setting up Nutch along with Elasticsearch. Whenever I try to index the data into ES, it returns an error. Here are the logs:
Command:
bin/nutch index elasticsearch -all
Logs when I set elastic.port to 9200 in conf/nutch-site.xml:
2016-05-05 13:22:49,903 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2016-05-05 13:22:49,904 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2016-05-05 13:22:49,904 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2016-05-05 13:22:49,904 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2016-05-05 13:22:49,905 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.metadata.MetadataIndexer
2016-05-05 13:22:49,906 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
2016-05-05 13:22:49,961 INFO elastic.ElasticIndexWriter - Processing remaining requests [docs = 0, length = 0, total docs = 0]
2016-05-05 13:22:49,961 INFO elastic.ElasticIndexWriter - Processing to finalize last execute
2016-05-05 13:22:54,898 INFO client.transport - [Peggy Carter] failed to get node info for [#transport#-1][ubuntu][inet[localhost/127.0.0.1:9200]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9200]][cluster:monitor/nodes/info] request_id [1] timed out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-05-05 13:22:55,682 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
2016-05-05 13:22:55,683 INFO indexer.IndexingJob - Active IndexWriters :
ElasticIndexWriter
elastic.cluster : elastic prefix cluster
elastic.host : hostname
elastic.port : port (default 9300)
elastic.index : elastic index command
elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
2016-05-05 13:22:55,711 INFO elasticsearch.plugins - [Adrian Toomes] loaded [], sites []
2016-05-05 13:23:00,763 INFO client.transport - [Adrian Toomes] failed to get node info for [#transport#-1][ubuntu][inet[localhost/127.0.0.1:92$0]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9200]][cluster:monitor/nodes/info] request_id [0] time$ out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-05-05 13:23:00,766 INFO indexer.IndexingJob - IndexingJob: done.
Logs when the default port 9300 is used:
2016-05-05 13:58:44,584 INFO elasticsearch.plugins - [Mentallo] loaded [], sites []
2016-05-05 13:58:44,673 WARN transport.netty - [Mentallo] Message not fully read (response) for [0] handler future(org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler$1#3c80f1dd), error [true], resetting
2016-05-05 13:58:44,674 INFO client.transport - [Mentallo] failed to get node info for [#transport#-1][ubuntu][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:173)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:125)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.StreamCorruptedException: Unsupported version: 1
at org.elasticsearch.common.io.ThrowableObjectInputStream.readStreamHeader(ThrowableObjectInputStream.java:46)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
at org.elasticsearch.common.io.ThrowableObjectInputStream.<init>(ThrowableObjectInputStream.java:38)
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:170)
... 23 more
2016-05-05 13:58:44,676 INFO indexer.IndexingJob - IndexingJob: done.
I've configured everything correctly as far as I can tell, and I have looked at various threads as well, but to no avail. Also, the Java version is the same for both ES and the JVM. Is there a bug here?
I'm using Nutch 2.3.1 and have tried both ES 1.4.4 and 2.3.2. I can see data in Mongo but I cannot index data into ES. Why?
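For what it's worth, the two failure modes above (a plain timeout against 9200, then "Unsupported version" on 9300) are what a transport-protocol mismatch usually looks like: 9200 speaks HTTP, while the Nutch elastic indexwriter uses the binary transport client on 9300, and that embedded client generally has to match the server's major version. A quick way to see what the server actually reports (a sketch, assuming ES runs on localhost):

# the HTTP API on 9200 returns the cluster name and the exact ES version;
# compare this against the ES client version bundled with the Nutch
# indexer-elastic plugin before retrying the transport connection on 9300
curl -s http://localhost:9200
curl -s 'http://localhost:9200/_cluster/health?pretty'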
I'm getting an error while running a query on Hive over Tez. According to the logs, Hive fails while copying the Tez jars to an HDFS location at the start of the Tez session. Below is the complete log obtained from the Hive log file:
2015-06-19 01:23:52,289 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printInfo(852)) - Query ID = saurabh_20150619012323_f52f1d6c-2adb-4edc-8ba4-b64d7d898325
2015-06-19 01:23:52,289 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printInfo(852)) - Total jobs = 1
2015-06-19 01:23:52,289 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=TimeToSubmit start=1434657232288 end=1434657232289 duration=1 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:23:52,290 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:23:52,290 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=task.TEZ.Stage-1 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:23:52,302 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printInfo(852)) - Launching Job 1 out of 1
2015-06-19 01:23:52,302 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (Driver.java:launchTask(1630)) - Starting task [Stage-1:MAPRED] in parallel
2015-06-19 01:23:52,312 INFO [Thread-21]: session.SessionState (SessionState.java:start(488)) - No Tez session required at this point. hive.execution.engine=mr.
2015-06-19 01:23:52,314 INFO [Thread-21]: tez.TezSessionPoolManager (TezSessionPoolManager.java:getSession(125)) - QueueName: null nonDefaultUser: true defaultQueuePool: null blockingQueueLength: -1
2015-06-19 01:23:52,315 INFO [Thread-21]: tez.TezSessionPoolManager (TezSessionPoolManager.java:getNewSessionState(154)) - Created a new session for queue: null session id: 85d83746-a48e-419e-a7ca-8c98faf173ea
2015-06-19 01:23:52,380 INFO [Thread-21]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
2015-06-19 01:23:52,412 INFO [Thread-21]: ql.Context (Context.java:getMRScratchDir(328)) - New scratch dir is hdfs://localhost:9000/tmp/hive/saurabh/e5a701ae-242d-488f-beec-cf18878becdc/hive_2015-06-19_01-23-49_794_2167174123575230985-2
2015-06-19 01:23:52,420 INFO [Thread-21]: exec.Task (TezTask.java:updateSession(233)) - Tez session hasn't been created yet. Opening session
2015-06-19 01:23:52,420 INFO [Thread-21]: tez.TezSessionState (TezSessionState.java:open(142)) - User of session id 85d83746-a48e-419e-a7ca-8c98faf173ea is saurabh
2015-06-19 01:23:52,433 INFO [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(950)) - Localizing resource because it does not exist: file:/usr/lib/tez/* to dest: hdfs://localhost:9000/tmp/hive/saurabh/_tez_session_dir/85d83746-a48e-419e-a7ca-8c98faf173ea/*
2015-06-19 01:23:52,433 INFO [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(954)) - Looks like another thread is writing the same file will wait.
2015-06-19 01:23:52,433 INFO [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(961)) - Number of wait attempts: 5. Wait interval: 5000
2015-06-19 01:24:17,449 ERROR [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(977)) - Could not find the jar that was being uploaded
2015-06-19 01:24:17,451 ERROR [Thread-21]: exec.Task (TezTask.java:execute(184)) - Failed to execute tez graph.
java.io.IOException: Previous writer likely failed to write hdfs://localhost:9000/tmp/hive/saurabh/_tez_session_dir/85d83746-a48e-419e-a7ca-8c98faf173ea/*. Failing because I am unlikely to write too.
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:978)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:859)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:802)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:228)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:154)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:234)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
2015-06-19 01:24:18,329 ERROR [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printError(861)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
2015-06-19 01:24:18,329 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=Driver.execute start=1434657232288 end=1434657258329 duration=26041 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,329 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,329 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=releaseLocks start=1434657258329 end=1434657258329 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,333 ERROR [HiveServer2-Background-Pool: Thread-41]: operation.Operation (SQLOperation.java:run(200)) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:147)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-19 01:24:18,342 INFO [HiveServer2-Handler-Pool: Thread-29]: exec.ListSinkOperator (Operator.java:close(595)) - 40 finished. closing...
2015-06-19 01:24:18,343 INFO [HiveServer2-Handler-Pool: Thread-29]: exec.ListSinkOperator (Operator.java:close(613)) - 40 Close done
2015-06-19 01:24:18,393 INFO [HiveServer2-Handler-Pool: Thread-29]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,394 INFO [HiveServer2-Handler-Pool: Thread-29]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=releaseLocks start=1434657258393 end=1434657258394 duration=1 from=org.apache.hadoop.hive.ql.Driver>
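What stands out in the trace is that Hive tries to localize the literal glob file:/usr/lib/tez/* into the HDFS session directory and then gives up waiting on a "previous writer". That pattern usually means the Tez libraries are being pulled from a local wildcard path instead of an already-staged HDFS location; a common setup is to upload the Tez archive to HDFS once and point tez.lib.uris at it (a sketch; the archive path and name are illustrative, only the namenode URI is taken from the log above):

# sketch: stage the Tez archive on HDFS so Tez sessions don't upload local jars
# on every start (archive path/name are illustrative)
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -put /path/to/tez-0.x.y.tar.gz /apps/tez/tez.tar.gz
# then, in tez-site.xml, set tez.lib.uris to:
#   hdfs://localhost:9000/apps/tez/tez.tar.gz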