I have been using a groovy Script as ScriptType.File. A part of my groovy Script looks like this.
def refApplicValues =_source.refApplicValue;
def lineNumbers = refApplicValues.tokenize('|');
Now Im migrating to ElasticSearch 5.2.1 which uses painless script.I have modified my script a bit to match painless syntax like:
def refApplicValues =params._source.refApplicValue;
def lineNumbers = refApplicValues.tokenize('|');
When i run my script now its throwing runtime error:
Caused by: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: ScriptException[runtime error]; nested: IllegalArgumentException[Unable to find dynamic method [tokenize] with [1] arguments for class [java.lang.String].];
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:405)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:106)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:246)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:360)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:322)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:319)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:610)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:596)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Its telling me that i cant use tokenize . Is there any relevant functionality that can be used instead?
You can use a StringTokenizer in painless
Related
Is there a way to write through Spark Streaming, reading from kafka and write in ElasticSearch?
I tried something like... As explained in the ElasticSearch docs (with very little documentation regarding pyspark):
sc = SparkContext("local[2]", appName="TwitterStreamKafka")
ssc = StreamingContext(sc, batchIntervalSeconds)
topic = url_topic
tweets = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
tweets.pprint()
conf = {"es.resource": "credentials/credential"} # assume Elasticsearch is running on localhost defaults
if tweets.count() > 0:
tweets.foreachRDD(lambda rdd: rdd.saveAsNewAPIHadoopFile(
path='-',
outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
keyClass="org.apache.hadoop.io.NullWritable",
valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
conf=conf))
ssc.start()
ssc.awaitTermination()
But it doesn't work. The error is:
17/11/10 17:16:35 ERROR Utils: Aborting task
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [127.0.0.1:9200] returned Bad Request(400) - failed to parse; Bailing out..
at org.elasticsearch.hadoop.rest.RestClient.processBulkResponse(RestClient.java:251)
at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:203)
at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:222)
at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:244)
at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:269)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.doClose(EsOutputFormat.java:214)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.close(EsOutputFormat.java:196)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:155)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:144)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:159)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:89)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:88)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/11/10 17:16:35 ERROR SparkHadoopMapReduceWriter: Task attempt_20171110171633_0003_r_000000_0 aborted.
17/11/10 17:16:35 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:178)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:89)
at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:88)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This is the command I use to execute it:
spark-submit --jars elasticsearch-hadoop-5.6.4.jar,spark-streaming-kafka-0-10-assembly_2.11-2.2.0.jar es_spark_write.py
I am using spark 2.2.0
The messages from Kafka are key-message json like these:
(u'urls', u'{"token": "secret_token", "count": 2443}')
I am using kafka + spark streaming to stream messages and do analytics, then saving to phoenix. Some spark job fail several times per day with the following error message:
org.apache.phoenix.schema.IllegalDataException:
java.lang.IllegalArgumentException: Invalid format: ""
at org.apache.phoenix.util.DateUtil$ISODateFormatParser.parseDateTime(DateUtil.java:297)
at org.apache.phoenix.util.DateUtil.parseDateTime(DateUtil.java:163)
at org.apache.phoenix.util.DateUtil.parseTimestamp(DateUtil.java:175)
at org.apache.phoenix.schema.types.PTimestamp.toObject(PTimestamp.java:95)
at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:194)
at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:172)
at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:159)
at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:979)
at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:963)
at org.apache.phoenix.parse.BindParseNode.accept(BindParseNode.java:47)
at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:832)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324)
at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245)
at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:39)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1113)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1251)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1091)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Invalid format: ""
at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:673)
at org.apache.phoenix.util.DateUtil$ISODateFormatParser.parseDateTime(DateUtil.java:295)
My code:
val myDF = sqlContext.createDataFrame(myRows, myStruct)
myDF.write
.format(sourcePhoenixSpark)
.mode("overwrite")
.options(Map("table" -> (myPhoenixNamespace + myTable), "zkUrl" -> myPhoenixZKUrl))
.save()
I am using phoenix-spark version 4.7.0-HBase-1.1. Any suggestion to solve the problem would be appreciated. Thanks
You are trying to process dirty data.
That error comes from here:
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/util/DateUtil.java#L301
Where it's trying to parse some string that is expected to be a Date in ISO format and the provided String is empty ("").
You need to prepare+clean your data before attempting to write it to storage.
Can anyone tell me what approach I need to take to solve the below exception?
[2017-05-23 15:28:48,783][DEBUG][action.search.type ] [Mister Buda] [2217] Failed to execute fetch phase
org.elasticsearch.script.groovy.GroovyScriptExecutionException: NullPointerException[null]
at org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript.run(GroovyScriptEngineService.java:274)
at org.elasticsearch.search.fetch.script.ScriptFieldsFetchSubPhase.hitExecute(ScriptFieldsFetchSubPhase.java:74)
at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:194)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:516)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:452)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:449)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I see this exception during querying an index, that query contains a script_field.
I am using elasticsearch version 1.5.
This is the script,
(( _source.rating.'${amount}'.costRating.value * costWeighting) + ( _source.rating.flexibilityRating.value * (1 - costWeighting))) * 10
Here's the image of the sample data in the index,
I'm trying to get max and min id by #query elasticsearch spring data ,so I use the following code but I get theis error
No query registered for [aggs]];
#Query(value="{\"aggs\":{\"max_id\":{\"max\":{\"field\":\"id\"}}}}")
Segment findMaxId();
stack trace
org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to
execute phase [dfs], all shards failed; shardFailures
{[QvVc1wxqRoWtkUhGSDULCg][segment][0]: SearchParseException[[segment][0]:
from[0],size[10]: Parse Failure [Failed to parse source
[{"from":0,"size":10,"query_binary":
"eyJhZ2dzIjp7Im1heF9pZCI6eyJtYXgiOnsiZmllb
GQiOiJpZCJ9fX19"}]]]; nested: QueryParsingException[[segment] No query
registered for [aggs]]; }{[QvVc1wxqRoWtkUhGSDULCg][segment][1]:
SearchParseException[[segment][1]: from[0],size[10]: Parse Failure
[Failed to parse source [{"from":0,"size":10,"query_binary":
"eyJhZ2dzIjp7Im1heF9pZCI6eyJtYXgiOnsiZmllbGQiOiJpZCJ9fX19"}]]]; nested:
QueryParsingException[[segment] No query registered for [aggs]];
}{[QvVc1wxqRoWtkUhGSDULCg][segment][2]:
SearchParseException[[segment][2]: from[0],size[10]: Parse Failure
[Failed to parse source
I see an exception when I try to index my logs into ElasticSearch (1.3.4). The root cause of the exception I see is the following (edited my initial post to provide the full stacktrace)
[2015-01-09 15:53:00,953][DEBUG][action.admin.indices.create] [perfgen04 1] [logaggr-2015.01.09] failed to create
org.elasticsearch.index.mapper.MapperParsingException: mapping [test]
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:386)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:328)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalArgumentException: Invalid format: [ISO8601]: Illegal pattern component: I
at org.elasticsearch.common.joda.Joda.forPattern(Joda.java:160)
at org.elasticsearch.common.joda.Joda.forPattern(Joda.java:37)
at org.elasticsearch.index.mapper.core.TypeParsers.parseDateTimeFormatter(TypeParsers.java:295)
at org.elasticsearch.index.mapper.core.DateFieldMapper$TypeParser.parse(DateFieldMapper.java:155)
at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:289)
at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:217)
at org.elasticsearch.index.mapper.object.RootObjectMapper$TypeParser.parse(RootObjectMapper.java:136)
at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:209)
at org.elasticsearch.index.mapper.DocumentMapperParser.parseCompressed(DocumentMapperParser.java:190)
at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:440)
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:313)
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:383)
... 5 more
Caused by: java.lang.IllegalArgumentException: Illegal pattern component: I
at org.elasticsearch.common.joda.time.format.DateTimeFormat.parsePatternTo(DateTimeFormat.java:570)
at org.elasticsearch.common.joda.time.format.DateTimeFormat.createFormatterForPattern(DateTimeFormat.java:693)
at org.elasticsearch.common.joda.time.format.DateTimeFormat.forPattern(DateTimeFormat.java:181)
at org.elasticsearch.common.joda.Joda.forPattern(Joda.java:158)
... 16 more
I am using logstash (1.4.2) to send my logs to ElasticSearch. My grok filter is pretty simple and is as follows. I am keeping the timestamp as a string "logts".
filter {
grok {
match => [ "message", "%{DATA:logts}%{SPACE}\[%{LOGLEVEL:level}%{SPACE}]%{SPACE}\[%{DATA:thread}]%{SPACE}\[%{DATA:classname}]%{SPACE}%{GREEDYDATA:details}" ]
}
}
A sample line from my log file is:
2015-01-09 14:53:07,035-0800 [ERROR] [pool-1-thread-2] [LogGenerator] invocation count=101,time=95840107816543,metric=6688916707300087716
I ran logstash with '-vv' flag and I don't see any "[ISO8601]" in the output.
Does anyone know where the invalid format is being introduced?
The Gist is available here.
I deleted my Elasticsearch installation (this was a test environment) and reinstalled and this started working again.
I suspect if I had deleted my index it would have solved the problem as well.