Error Invalid format: [ISO8601] when indexing logs in ElasticSearch - elasticsearch

I see an exception when I try to index my logs into ElasticSearch (1.3.4). The root cause of the exception I see is the following (edited my initial post to provide the full stacktrace)
[2015-01-09 15:53:00,953][DEBUG][action.admin.indices.create] [perfgen04 1] [logaggr-2015.01.09] failed to create
org.elasticsearch.index.mapper.MapperParsingException: mapping [test]
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:386)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:328)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalArgumentException: Invalid format: [ISO8601]: Illegal pattern component: I
at org.elasticsearch.common.joda.Joda.forPattern(Joda.java:160)
at org.elasticsearch.common.joda.Joda.forPattern(Joda.java:37)
at org.elasticsearch.index.mapper.core.TypeParsers.parseDateTimeFormatter(TypeParsers.java:295)
at org.elasticsearch.index.mapper.core.DateFieldMapper$TypeParser.parse(DateFieldMapper.java:155)
at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:289)
at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:217)
at org.elasticsearch.index.mapper.object.RootObjectMapper$TypeParser.parse(RootObjectMapper.java:136)
at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:209)
at org.elasticsearch.index.mapper.DocumentMapperParser.parseCompressed(DocumentMapperParser.java:190)
at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:440)
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:313)
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:383)
... 5 more
Caused by: java.lang.IllegalArgumentException: Illegal pattern component: I
at org.elasticsearch.common.joda.time.format.DateTimeFormat.parsePatternTo(DateTimeFormat.java:570)
at org.elasticsearch.common.joda.time.format.DateTimeFormat.createFormatterForPattern(DateTimeFormat.java:693)
at org.elasticsearch.common.joda.time.format.DateTimeFormat.forPattern(DateTimeFormat.java:181)
at org.elasticsearch.common.joda.Joda.forPattern(Joda.java:158)
... 16 more
I am using logstash (1.4.2) to send my logs to ElasticSearch. My grok filter is pretty simple and is as follows. I am keeping the timestamp as a string "logts".
filter {
grok {
match => [ "message", "%{DATA:logts}%{SPACE}\[%{LOGLEVEL:level}%{SPACE}]%{SPACE}\[%{DATA:thread}]%{SPACE}\[%{DATA:classname}]%{SPACE}%{GREEDYDATA:details}" ]
}
}
A sample line from my log file is:
2015-01-09 14:53:07,035-0800 [ERROR] [pool-1-thread-2] [LogGenerator] invocation count=101,time=95840107816543,metric=6688916707300087716
I ran logstash with '-vv' flag and I don't see any "[ISO8601]" in the output.
Does anyone know where the invalid format is being introduced?
The Gist is available here.

I deleted my Elasticsearch installation (this was a test environment) and reinstalled and this started working again.
I suspect if I had deleted my index it would have solved the problem as well.

Related

Hadoop Spark SQL insert failing

I'm trying to insert something around 13M rows into a new table but I'm getting the following error:
22/12/09 19:33:56 ERROR Utils: Aborting task
java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
22/12/09 19:33:57 ERROR FileFormatWriter: Job job_202212091917352650741377131539872_0020 aborted.
22/12/09 19:33:57 ERROR Executor: Exception in task 0.1 in stage 20.0 (TID 26337)
org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:298)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
The insert operation is like the following:
insert overwrite table fake_table_txt partition(partition_name)
select id, name, type, description from ( inner query )
I'm a Hadoop beginner and I have no idea what may be causing this.
Could anybody please give me any direction?
After struggling a little, I was told that increasing the property "files per task" would do the trick.
set spark.sql.maxCreatedFilesPerTask = 15;
It was defaulted to 10 previously.

What does mean the ElasticSearch exceptions: "all shards failed" and "No search context found for id"?

I'm working with ELS using the Scroll API and the following errors is persisting:
09:21:12.118] elastic-qlik-event-job [main] ERROR com.valemobi.ImportExportController - Não foi possivel recuperar pagina:
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1793)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1770)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1527)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
at
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://database.elasticsearch.*.prod:8088], URI [/_search/scroll], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"search_context_missing_exception","reason":"No search context found for id [15068451]"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":-1,"index":null,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [15068451]"}}],"caused_by":{"type":"search_context_missing_exception","reason":"No search context found for id [15068451]"}},"status":404}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:261)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
... 15 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=search_context_missing_exception, reason=No search context found for id [15068451]]
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:169)
... 18 common frames omitted
Does anybody have any idea what may be causing this and how do I fix it?

IllegalDataException from DateUtil.java when saving spark streaming dataframe to phoenix

I am using kafka + spark streaming to stream messages and do analytics, then saving to phoenix. Some spark job fail several times per day with the following error message:
org.apache.phoenix.schema.IllegalDataException:
java.lang.IllegalArgumentException: Invalid format: ""
at org.apache.phoenix.util.DateUtil$ISODateFormatParser.parseDateTime(DateUtil.java:297)
at org.apache.phoenix.util.DateUtil.parseDateTime(DateUtil.java:163)
at org.apache.phoenix.util.DateUtil.parseTimestamp(DateUtil.java:175)
at org.apache.phoenix.schema.types.PTimestamp.toObject(PTimestamp.java:95)
at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:194)
at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:172)
at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:159)
at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:979)
at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:963)
at org.apache.phoenix.parse.BindParseNode.accept(BindParseNode.java:47)
at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:832)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324)
at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245)
at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:39)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1113)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1251)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1091)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Invalid format: ""
at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:673)
at org.apache.phoenix.util.DateUtil$ISODateFormatParser.parseDateTime(DateUtil.java:295)
My code:
val myDF = sqlContext.createDataFrame(myRows, myStruct)
myDF.write
.format(sourcePhoenixSpark)
.mode("overwrite")
.options(Map("table" -> (myPhoenixNamespace + myTable), "zkUrl" -> myPhoenixZKUrl))
.save()
I am using phoenix-spark version 4.7.0-HBase-1.1. Any suggestion to solve the problem would be appreciated. Thanks
You are trying to process dirty data.
That error comes from here:
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/util/DateUtil.java#L301
Where it's trying to parse some string that is expected to be a Date in ISO format and the provided String is empty ("").
You need to prepare+clean your data before attempting to write it to storage.

Migration from groovy script to painless scriptin ElasticSearch 5.2.1

I have been using a groovy Script as ScriptType.File. A part of my groovy Script looks like this.
def refApplicValues =_source.refApplicValue;
def lineNumbers = refApplicValues.tokenize('|');
Now Im migrating to ElasticSearch 5.2.1 which uses painless script.I have modified my script a bit to match painless syntax like:
def refApplicValues =params._source.refApplicValue;
def lineNumbers = refApplicValues.tokenize('|');
When i run my script now its throwing runtime error:
Caused by: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: ScriptException[runtime error]; nested: IllegalArgumentException[Unable to find dynamic method [tokenize] with [1] arguments for class [java.lang.String].];
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:405)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:106)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:246)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:360)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:322)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:319)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:610)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:596)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Its telling me that i cant use tokenize . Is there any relevant functionality that can be used instead?
You can use a StringTokenizer in painless

How to prevent from exiting MapReduce job when an Exception throwed in Elasitcsearch Hadoop

I'm stuck with an Exception while running MapReduce.
I'm using Elasticsearch 2.1 and Elasticsearch Hadoop 2.2.0
My Problem
The type of f1 is byte
$ curl -XGET http://hostname:9200/index-name/?pretty
...
"f1": {
"type": "byte"
}
...
One of documents has value 20 on f1 field.
$ curl -XGET http://hostname:9200/index-name/type-name/doc-id?pretty
...
"f1": 20
...
But I made a mistake like this:
$ curl -XPOST http://hostname:9200/index-name/type-name/doc-id/_update -d '
{
"script": "ctx._source.f1 += \"10\";",
"upsert": {
"f1": 20
}
}'
Now, f1 became 2010 which does not fit in byte
$ curl -XGET http://hostname:9200/index-name/type-name/doc-id?pretty
...
"f1": "2010"
...
Finally, ES Hadoop throws the NumberFormatException
INFO mapreduce.Job: Task Id : attempt_1454640755387_0404_m_000020_2, Status : FAILED
Error: org.elasticsearch.hadoop.rest.EsHadoopParsingException: Cannot parse value [2010] for field [f1]
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:701)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:794)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:692)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:457)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:382)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:277)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:250)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:456)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:298)
at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:232)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NumberFormatException: Value out of range. Value:"2030" Radix:10
at java.lang.Byte.parseByte(Byte.java:150)
at java.lang.Byte.parseByte(Byte.java:174)
at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.parseByte(JdkValueReader.java:333)
at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.byteValue(JdkValueReader.java:325)
at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.readValue(JdkValueReader.java:67)
at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:714)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:699)
... 21 more
What I want is ...
I want to ignore malformed document which throws NumberFormat Exception and want to continue MapReduce.
What I did is ...
According to SO Answer, I sorrounded Mapper.map() method with try-catch block. But it didn't help me.
Thanks.
The author of Elasticsearch Hadoop said that:
ES-Hadoop is not a mapper - rather in M/R is available as an Input/OutputFormat. The issue is not the mapper but rather the data that is sent to ES.
ES-Hadoop currently has no option to ignore errors as it is fail-fast - if something goes wrong, it bails out right away.
You can however filter the incorrect data before it reaches ES.
Refer to: https://discuss.elastic.co/t/how-to-prevent-from-exiting-mapreduce-job-when-an-exception-throwed-in-elasitcsearch-hadoop/43783

Resources