I see an exception when I try to index my logs into Elasticsearch (1.3.4). The root cause of the exception is the following (I have edited my initial post to provide the full stack trace):
[2015-01-09 15:53:00,953][DEBUG][action.admin.indices.create] [perfgen04 1] [logaggr-2015.01.09] failed to create
org.elasticsearch.index.mapper.MapperParsingException: mapping [test]
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:386)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:328)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalArgumentException: Invalid format: [ISO8601]: Illegal pattern component: I
at org.elasticsearch.common.joda.Joda.forPattern(Joda.java:160)
at org.elasticsearch.common.joda.Joda.forPattern(Joda.java:37)
at org.elasticsearch.index.mapper.core.TypeParsers.parseDateTimeFormatter(TypeParsers.java:295)
at org.elasticsearch.index.mapper.core.DateFieldMapper$TypeParser.parse(DateFieldMapper.java:155)
at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:289)
at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:217)
at org.elasticsearch.index.mapper.object.RootObjectMapper$TypeParser.parse(RootObjectMapper.java:136)
at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:209)
at org.elasticsearch.index.mapper.DocumentMapperParser.parseCompressed(DocumentMapperParser.java:190)
at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:440)
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:313)
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:383)
... 5 more
Caused by: java.lang.IllegalArgumentException: Illegal pattern component: I
at org.elasticsearch.common.joda.time.format.DateTimeFormat.parsePatternTo(DateTimeFormat.java:570)
at org.elasticsearch.common.joda.time.format.DateTimeFormat.createFormatterForPattern(DateTimeFormat.java:693)
at org.elasticsearch.common.joda.time.format.DateTimeFormat.forPattern(DateTimeFormat.java:181)
at org.elasticsearch.common.joda.Joda.forPattern(Joda.java:158)
... 16 more
I am using Logstash (1.4.2) to send my logs to Elasticsearch. My grok filter is pretty simple and is shown below; I am keeping the timestamp as a string, "logts".
filter {
  grok {
    match => [ "message", "%{DATA:logts}%{SPACE}\[%{LOGLEVEL:level}%{SPACE}]%{SPACE}\[%{DATA:thread}]%{SPACE}\[%{DATA:classname}]%{SPACE}%{GREEDYDATA:details}" ]
  }
}
A sample line from my log file is:
2015-01-09 14:53:07,035-0800 [ERROR] [pool-1-thread-2] [LogGenerator] invocation count=101,time=95840107816543,metric=6688916707300087716
I ran Logstash with the '-vv' flag and I don't see "[ISO8601]" anywhere in the output.
Does anyone know where the invalid format is being introduced?
The Gist is available here.
I deleted my Elasticsearch installation (this was a test environment), reinstalled, and indexing started working again.
I suspect that deleting just the offending index would have solved the problem as well.
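For reference, deleting just the suspect index is a single REST call (the equivalent of curl -XDELETE http://localhost:9200/logaggr-2015.01.09). A minimal Java sketch of that request, assuming a node listening on the default localhost:9200:
import java.net.HttpURLConnection;
import java.net.URL;

public class DeleteSuspectIndex {
    public static void main(String[] args) throws Exception {
        // Hypothetical host; the index name is taken from the log line above.
        URL url = new URL("http://localhost:9200/logaggr-2015.01.09");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("DELETE");
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}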
I'm trying to insert around 13M rows into a new table, but I'm getting the following error:
22/12/09 19:33:56 ERROR Utils: Aborting task
java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
22/12/09 19:33:57 ERROR FileFormatWriter: Job job_202212091917352650741377131539872_0020 aborted.
22/12/09 19:33:57 ERROR Executor: Exception in task 0.1 in stage 20.0 (TID 26337)
org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:298)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
The insert operation looks like the following:
insert overwrite table fake_table_txt partition(partition_name)
select id, name, type, description from ( inner query )
I'm a Hadoop beginner and I have no idea what might be causing this.
Could anybody please point me in the right direction?
After struggling a little, I was told that increasing the "files per task" property would do the trick:
set spark.sql.maxCreatedFilesPerTask = 15;
It previously defaulted to 10.
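For completeness, a minimal sketch of applying the same setting programmatically before running the INSERT, assuming the spark.sql.maxCreatedFilesPerTask property from the workaround above is recognised by your Spark build (app and class names are placeholders):
import org.apache.spark.sql.SparkSession;

public class RaiseCreatedFilesLimit {
    public static void main(String[] args) {
        // Typically launched with spark-submit; Hive support is needed for INSERT OVERWRITE TABLE.
        SparkSession spark = SparkSession.builder()
                .appName("bulk-insert")
                .enableHiveSupport()
                .getOrCreate();

        // Same effect as running "set spark.sql.maxCreatedFilesPerTask = 15;" in the SQL shell.
        spark.sql("SET spark.sql.maxCreatedFilesPerTask=15");

        // The original INSERT OVERWRITE ... PARTITION statement would then run in this same session.
        spark.stop();
    }
}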
I'm working with Elasticsearch using the Scroll API, and the following error keeps occurring:
09:21:12.118] elastic-qlik-event-job [main] ERROR com.valemobi.ImportExportController - Could not retrieve page:
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1793)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1770)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1527)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
at
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://database.elasticsearch.*.prod:8088], URI [/_search/scroll], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"search_context_missing_exception","reason":"No search context found for id [15068451]"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":-1,"index":null,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [15068451]"}}],"caused_by":{"type":"search_context_missing_exception","reason":"No search context found for id [15068451]"}},"status":404}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:261)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
... 15 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=search_context_missing_exception, reason=No search context found for id [15068451]]
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:169)
... 18 common frames omitted
Does anybody have any idea what may be causing this and how I can fix it?
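The 404 search_context_missing_exception means the scroll context for id [15068451] was no longer alive when the next /_search/scroll request arrived, which usually happens when the keep-alive expires (or the scroll is cleared) while a page is still being processed. A minimal sketch of a scroll loop that renews the keep-alive on every round trip, assuming a recent High Level REST Client and hypothetical host and index names:
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class ScrollAllDocuments {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            SearchRequest searchRequest = new SearchRequest("my-index");
            searchRequest.source(new SearchSourceBuilder().size(1000));
            searchRequest.scroll("5m"); // keep-alive for the initial search

            SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
            String scrollId = response.getScrollId();

            while (response.getHits().getHits().length > 0) {
                // ... process response.getHits() here; keep this faster than the keep-alive ...

                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                scrollRequest.scroll("5m"); // renew the keep-alive on every page
                response = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = response.getScrollId();
            }

            // Release the scroll context once the loop is done.
            ClearScrollRequest clearScroll = new ClearScrollRequest();
            clearScroll.addScrollId(scrollId);
            client.clearScroll(clearScroll, RequestOptions.DEFAULT);
        }
    }
}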
I am using Kafka + Spark Streaming to stream messages and run analytics, then saving the results to Phoenix. Some Spark jobs fail several times per day with the following error message:
org.apache.phoenix.schema.IllegalDataException:
java.lang.IllegalArgumentException: Invalid format: ""
at org.apache.phoenix.util.DateUtil$ISODateFormatParser.parseDateTime(DateUtil.java:297)
at org.apache.phoenix.util.DateUtil.parseDateTime(DateUtil.java:163)
at org.apache.phoenix.util.DateUtil.parseTimestamp(DateUtil.java:175)
at org.apache.phoenix.schema.types.PTimestamp.toObject(PTimestamp.java:95)
at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:194)
at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:172)
at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:159)
at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:979)
at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:963)
at org.apache.phoenix.parse.BindParseNode.accept(BindParseNode.java:47)
at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:832)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324)
at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245)
at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:39)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1113)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1251)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1091)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Invalid format: ""
at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:673)
at org.apache.phoenix.util.DateUtil$ISODateFormatParser.parseDateTime(DateUtil.java:295)
My code:
val myDF = sqlContext.createDataFrame(myRows, myStruct)
myDF.write
  .format(sourcePhoenixSpark)
  .mode("overwrite")
  .options(Map("table" -> (myPhoenixNamespace + myTable), "zkUrl" -> myPhoenixZKUrl))
  .save()
I am using phoenix-spark version 4.7.0-HBase-1.1. Any suggestions for solving this problem would be appreciated. Thanks.
You are trying to process dirty data.
That error comes from here:
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/util/DateUtil.java#L301
That code tries to parse a string that is expected to be a date in ISO format, but the provided string is empty ("").
You need to prepare and clean your data before attempting to write it to storage.
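For illustration, a minimal sketch (Spark 2.x Java API, hypothetical column names) of dropping rows whose timestamp string is null or blank before the DataFrame is handed to the Phoenix writer:
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class CleanBeforePhoenixWrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("clean-before-write")
                .master("local[*]")
                .getOrCreate();

        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("id", DataTypes.StringType, false),
                DataTypes.createStructField("event_ts", DataTypes.StringType, true)));

        Dataset<Row> raw = spark.createDataFrame(Arrays.asList(
                RowFactory.create("a", "2016-05-01 10:00:00.000"),
                RowFactory.create("b", ""),      // would trigger Invalid format: "" in Phoenix
                RowFactory.create("c", null)), schema);

        // Keep only rows whose timestamp Phoenix can actually parse
        // (alternatively, convert blanks to null if the column is nullable).
        Dataset<Row> cleaned = raw.filter("event_ts IS NOT NULL AND trim(event_ts) <> ''");
        cleaned.show();

        // cleaned.write().format(...)...save() as in the question's snippet.
        spark.stop();
    }
}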
I have been using a Groovy script as ScriptType.FILE. Part of my Groovy script looks like this:
def refApplicValues = _source.refApplicValue;
def lineNumbers = refApplicValues.tokenize('|');
Now I'm migrating to Elasticsearch 5.2.1, which uses the Painless scripting language. I have modified my script a bit to match Painless syntax:
def refApplicValues = params._source.refApplicValue;
def lineNumbers = refApplicValues.tokenize('|');
When I run my script now, it throws a runtime error:
Caused by: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: ScriptException[runtime error]; nested: IllegalArgumentException[Unable to find dynamic method [tokenize] with [1] arguments for class [java.lang.String].];
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:405)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:106)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:246)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:360)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:322)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:319)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:610)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:596)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It's telling me that I can't use tokenize. Is there any equivalent functionality I can use instead?
You can use a StringTokenizer in Painless.
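Painless whitelists java.util.StringTokenizer, so the tokenize call can be replaced with an explicit token loop. A standalone Java sketch of the same calls (the sample value and class name are placeholders; in the actual script you would read params._source.refApplicValue instead of a hard-coded string):
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeWithStringTokenizer {
    public static void main(String[] args) {
        String refApplicValue = "12|34|56";   // stands in for params._source.refApplicValue
        StringTokenizer tokenizer = new StringTokenizer(refApplicValue, "|");
        List<String> lineNumbers = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            lineNumbers.add(tokenizer.nextToken());
        }
        System.out.println(lineNumbers);      // [12, 34, 56]
    }
}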
I'm stuck on an exception while running a MapReduce job.
I'm using Elasticsearch 2.1 and Elasticsearch Hadoop 2.2.0
My Problem
The type of f1 is byte:
$ curl -XGET http://hostname:9200/index-name/?pretty
...
"f1": {
"type": "byte"
}
...
One of the documents has the value 20 in the f1 field:
$ curl -XGET http://hostname:9200/index-name/type-name/doc-id?pretty
...
"f1": 20
...
But I made a mistake like this:
$ curl -XPOST http://hostname:9200/index-name/type-name/doc-id/_update -d '
{
  "script": "ctx._source.f1 += \"10\";",
  "upsert": {
    "f1": 20
  }
}'
Now f1 has become "2010", which does not fit in a byte:
$ curl -XGET http://hostname:9200/index-name/type-name/doc-id?pretty
...
"f1": "2010"
...
Finally, ES-Hadoop throws a NumberFormatException:
INFO mapreduce.Job: Task Id : attempt_1454640755387_0404_m_000020_2, Status : FAILED
Error: org.elasticsearch.hadoop.rest.EsHadoopParsingException: Cannot parse value [2010] for field [f1]
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:701)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:794)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:692)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:457)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:382)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:277)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:250)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:456)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:298)
at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:232)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NumberFormatException: Value out of range. Value:"2030" Radix:10
at java.lang.Byte.parseByte(Byte.java:150)
at java.lang.Byte.parseByte(Byte.java:174)
at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.parseByte(JdkValueReader.java:333)
at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.byteValue(JdkValueReader.java:325)
at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.readValue(JdkValueReader.java:67)
at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:714)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:699)
... 21 more
What I want is ...
I want to ignore the malformed documents that throw the NumberFormatException and continue the MapReduce job.
What I did is ...
According to an SO answer, I surrounded the Mapper.map() method with a try-catch block, but it didn't help me.
Thanks.
The author of Elasticsearch Hadoop said that:
ES-Hadoop is not a mapper; rather, in M/R it is available as an Input/OutputFormat. The issue is not the mapper but rather the data that is sent to ES.
ES-Hadoop currently has no option to ignore errors, as it is fail-fast: if something goes wrong, it bails out right away.
You can, however, filter the incorrect data before it reaches ES (a sketch of that approach follows the reference below).
Refer to: https://discuss.elastic.co/t/how-to-prevent-from-exiting-mapreduce-job-when-an-exception-throwed-in-elasitcsearch-hadoop/43783
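A hypothetical sketch of that write-side filtering (assuming one value per input line): records whose value no longer fits the mapped byte type are counted and skipped instead of being handed to EsOutputFormat, so the fail-fast connector never receives them.
import java.io.IOException;

import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleanF1Mapper extends Mapper<LongWritable, Text, NullWritable, MapWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String raw = value.toString().trim();
        try {
            byte f1 = Byte.parseByte(raw);   // a value like "2010" fails here and the record is dropped
            MapWritable doc = new MapWritable();
            doc.put(new Text("f1"), new ByteWritable(f1));
            context.write(NullWritable.get(), doc);
        } catch (NumberFormatException e) {
            context.getCounter("es-clean", "skipped_bad_f1").increment(1);
        }
    }
}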