Timestamp mapping in Spark to Elasticsearch

I am writing logs to Elasticsearch using Spark. The logs are in JSON format and have a timestamp field, for example:
{ "timestamp": "2016-11-02 21:16:06.116" }
When I write the JSON logs to the Elastic index, the timestamp is analysed as a string instead of a date. I tried setting the property in the Spark conf with sparkConf.set("es.mapping.timestamp", "timestamp"), but it throws the following error at runtime:
org.apache.spark.util.TaskCompletionListenerException: failed to parse timestamp [2016-11-03 15:46:55.1155]

You can change the timestamp format:
2016-11-02 21:16:06.116 -> 2016-11-02T21:16:06.116
Inserting 2016-11-02T21:16:06.116 into Elastic works for me.
The type mapping properties:
"create_time": {
  "format": "strict_date_optional_time||epoch_millis",
  "type": "date"
}

Related

How to save nested JSON data into Elasticsearch from Apache Nifi?

I get an error when I try to insert a geo_shape type entry into Elasticsearch from Apache Nifi. It is just a nested JSON field; for example, in Apache Nifi my FlowFile has this nested content for geo_shape:
{
"location": "{\"type\":\"polygon\",\"coordinates\":[[[3.042514,41.79673582],[3.04182089,41.79738937],[3.04299467,41.79763732],[3.042514,41.79673582]]]}"
}
In Elasticsearch the field is specified as follows:
"location": {
"type": "geo_shape"
}
When I execute PutElasticsearch1.3, I get the following error:
MapperParsingException failed to parse [location]: nested - shape must
be an object consisting of type and coordinates
How can I parse this nested JSON string in order to save it in Elasticsearch from Apache Nifi?
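For reference, a geo_shape value has to reach Elasticsearch as a real JSON object, not an escaped string, so the FlowFile content would need to look roughly like this (a sketch of the expected document shape, not a Nifi configuration):
{
  "location": {
    "type": "polygon",
    "coordinates": [[[3.042514,41.79673582],[3.04182089,41.79738937],[3.04299467,41.79763732],[3.042514,41.79673582]]]
  }
}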

Can we use a Unix timestamp as the _timestamp field in Elasticsearch?

When we create time-based indices, Elasticsearch/Kibana need a field named "_timestamp".
I found that this field should be a string, but in my log the Unix timestamp is a necessary segment.
Yes, you can store a Unix timestamp in date type fields. Just make sure you use the proper format: epoch_millis for a timestamp in milliseconds and epoch_second for a timestamp in seconds.
Example mapping for a timestamp field that stores a Unix timestamp in seconds:
PUT my-index
{
  "mappings": {
    "my-type": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "epoch_second"
        }
      }
    }
  }
}
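A document carrying the Unix timestamp in seconds can then be indexed as a plain number, for example (a hypothetical document):
PUT my-index/my-type/1
{
  "timestamp": 1478121366
}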
You can find more information here

Any Elasticsearch datatype matching a decimal timestamp?

I have a timestamp from a log file like {"ts" : "1486418325.948487"}
My infrastructure is Filebeat 5.2.0 --> Elasticsearch 5.2.
I tried mapping "ts" to "date" with format "epoch_second", but the Elasticsearch write failed in Filebeat.
PUT /auth_auditlog
{
  "mappings": {
    "auth-auditlogs": {
      "properties": {
        "ts": {
          "type": "date",
          "format": "epoch_second"
        }
      }
    }
  }
}
The Filebeat error message looks like:
WARN Can not index event (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse [ts]","caused_by":{"type":"illegal_argument_exception","reason":"Invalid format: \"1486418325.948487\""}}
Using "1486418325" works fine, so I guess ES doesn't accept decimal-format timestamps. However, Python's default timestamp output is in this format.
My goal is to get the type mapped correctly in Elasticsearch; I want to use ts as the original timestamp in Elasticsearch.
Any solution is welcome except changing the original log data!
Filebeat doesn't have a processor for this kind of transformation, so you can't replace @timestamp with the one from your log inside Filebeat itself. What you can do is send the events to Logstash and let the date filter parse the epoch value:
date {
  # "ts" is the epoch-seconds field from the log
  match => ["ts", "UNIX", "UNIX_MS"]
}
The other option would be to use an ingest node. Although I haven't used this myself, it seems it is also able to do the job. Check out the docs here.
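As a rough sketch of the ingest-node route, assuming a hypothetical pipeline name auth-auditlog-ts (whether the UNIX format tolerates the fractional part of "1486418325.948487" is worth verifying on your version):
PUT _ingest/pipeline/auth-auditlog-ts
{
  "description": "parse the epoch-seconds ts field into @timestamp",
  "processors": [
    {
      "date": {
        "field": "ts",
        "formats": ["UNIX"],
        "target_field": "@timestamp"
      }
    }
  ]
}
Documents can then be indexed with ?pipeline=auth-auditlog-ts, or Filebeat's Elasticsearch output can be pointed at the pipeline via its pipeline setting.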

ElasticSearch Mapping: is it possible to auto-truncate a date to fit its format?

On our project we're using NEST to insert data into Elasticsearch (1.7). We'd like to be able to force ES to truncate all dates to the mapped format.
Mapping example:
"dateFrom" : {
"type": "date",
"format": "dateHourMinute" // Or yyyy-MM-dd'T'HH:mm
}
Data example:
{
  "dateFrom": "2015-12-21T15:55:00.000Z"
}
Inserting this data throws an IllegalArgumentException:
Invalid format: "2015-12-21T15:55:00.000Z" is malformed at ":00.000Z"
Obviously we don't need the last part of the date. Can't we configure ES to just truncate it instead of erroring out?
Keep in mind we're using 1.7 right now, since date formatting seems to have changed in recent versions...
In order to get the data to index correctly, you could change the date format to date_optional_time (supported in 1.7):
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "date_optional_time"
        }
      }
    }
  }
}
This will allow you to submit a date with the time portion being optional,
such as:
PUT /my_index/my_type/1
{
"date": "2015-12-21"
}
or as you have it
PUT /my_index/my_type/2
{
"date": "2015-12-21T15:55:00.000Z"
}
Both are now valid submissions. I don't know of any transformation approaches within ES to support a truncation or transformation of field data at time of index. I would think if you want to parse the data and remove the time pre-submission you will need to do that outside of ES when you create the JSON object.
It appears ES is currently not capable of editing dates through a custom mapping. We ended up using JsonConverters (like this) to drop seconds and millis before inserting them into ES.

How do you transform a date that's stored as type long (epoch time) into a dateOptionalTime in Elasticsearch?

I have a field in my database that's stored as Epoch time, which is a long. I'm trying to get Elasticsearch to recognize this field as what it actually is: a date. Once indexed by Elasticsearch, I want it to be of type dateOptionalTime.
My thinking is that I need to apply a transform to convert the Epoch long into a string date.
On my index, I have a mapping that specifies the type for my field as date with a format of dateOptionalTime. Finally, this timestamp is in all of my docs, so I've added my (attempted) mapping to _default_.
The code:
"_default_", {
  "_all": {"enabled": "true"},
  "dynamic_templates": [
    {
      "date_fixer": {
        "match": "my_timestamp",
        "mapping": {
          "transform": {
            "script": "ctx._source['my_timestamp'] = new Date(ctx._source['my_timestamp']).toString()"
          },
          "type": "date"
        }
      }
    }
  ]
}
I'm brand new to Elastic, so I'll walk through what I think is happening.
I'm setting this to type _default_ which will apply this to all new types Elastic encounters.
I set _all to enabled. I want Elastic to use the default mapping for all types with the exception of my timestamp field.
Finally, I add my dynamic template that (a) converts the long into a date, and (b) applies a mapping to the timestamp field explicitly saying that it is a date.
The Problem:
When I try to add my data to the index, I get the following exception.
TransportError(400, u'MapperParsingException[failed to parse [my_timestamp]]; nested: ElasticsearchIllegalArgumentException[unknown property [$date]]; ')
My data looks like:
{
"timestamp": 8374747594,
"owner": "text",
"some_more": {
"key": "val",
"key": "val"
}
}
