Making timestamps with a Kafka Connect Single Message Transform - Elasticsearch

I am moving data from Kafka to Elasticsearch using Kafka Connect's Single Message Transforms (SMTs), more specifically TimestampConverter. I fiddled around with it for a while but couldn't get it to output a timestamp format.
When I used "Date", "Time" or "Timestamp" as the value for transforms.TimestampConverter.target.type, I couldn't get any data into Elasticsearch. Only when I set it to "string" did the values arrive in Elasticsearch as the date data type. Unfortunately, this means I only get day-level accuracy.
Here are the transform configs:
"transforms": "TimestampConverter",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.field": "UPDATED",
"transforms.TimestampConverter.format": "yyyy-MM-dd",
"transforms.TimestampConverter.target.type": "string"
Is there a known way to achieve this with a more accurate timestamp? I tried all kinds of configurations, altering the target.type and format fields.
The UPDATED value is an epoch timestamp stored as a bigint.
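One thing that may be worth trying (a sketch only, not verified against this exact setup): keep target.type as string, but give format full time precision so that Elasticsearch can map the value with sub-day accuracy. The pattern is a standard SimpleDateFormat string; everything else is copied from the question.
"transforms": "TimestampConverter",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.field": "UPDATED",
"transforms.TimestampConverter.format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"transforms.TimestampConverter.target.type": "string"
If the index already has a mapping, the date field's format would need to accept that pattern (or an ISO-style optional-time format) for the full precision to be preserved.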

Related

How to transform more than 1 field in Kafka Connect?

I am using a Kafka Connect sink config to get data from a topic and persist it to an Oracle DB. It works like a champ, and I'm doing a transformation on a timestamp column that comes in via an Avro schema as a long, which I then transform into an Oracle Timestamp column.
"transforms": "TimestampConverter",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.format": "mm/dd/yyyy HH:mm:ss",
"transforms.TimestampConverter.target.type": "Timestamp",
"transforms.TimestampConverter.field": "created_ts"
But I can't figure out how to do this for multiple timestamp fields. That is, in addition to created_ts, I also have an updated_ts that I need to transform.
I tried this:
"transforms.TimestampConverter.field": "created_ts, updated_ts"
That does not work, and I can't repeat the whole block for the other field either, because Connect only allows one entry per config key.
Lastly, I tried this:
"transforms.TimestampConverter.field.1": "created_ts",
"transforms.TimestampConverter.field.2": "updated_ts"
You would add two transforms:
"transforms": "CreatedConverter,UpdatedConverter",
"transforms.CreatedConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value"
"transforms.CreatedConverter.field": "created_ts",
...
"transforms.UpdatedConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value"
"transforms.UpdatedConverter.field": "updated_ts"
...
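Filled out in full (target.type and the format pattern are taken from the question; the rest of the elided lines are an assumption about what belongs there), the chained pair might look like this. Note that the month in a SimpleDateFormat pattern is uppercase MM; lowercase mm means minutes. Each transform in the transforms list is applied to every record, in order.
"transforms": "CreatedConverter,UpdatedConverter",
"transforms.CreatedConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.CreatedConverter.field": "created_ts",
"transforms.CreatedConverter.target.type": "Timestamp",
"transforms.CreatedConverter.format": "MM/dd/yyyy HH:mm:ss",
"transforms.UpdatedConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.UpdatedConverter.field": "updated_ts",
"transforms.UpdatedConverter.target.type": "Timestamp",
"transforms.UpdatedConverter.format": "MM/dd/yyyy HH:mm:ss"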

Reindex Elasticsearch converting unixtime to date

I have an Elasticsearch index which uses the @timestamp field to store the date in a date field.
Many records are missing the @timestamp field, but have a timestamp field containing a Unix timestamp (generated from PHP, so seconds, not milliseconds).
Note that the timestamp field is of the date type, but numeric data seems to be stored there.
How can I use a Painless script in a reindex to set @timestamp where it is missing, IF there is a numeric timestamp field with a Unix timestamp?
Here's an example record that I would want to transform.
{
  "_index": "my_log",
  "_type": "doc",
  "_id": "AWjEkbynNsX24NVXXmna",
  "_score": 1,
  "_source": {
    "name": null,
    "pid": "148651",
    "timestamp": 1549486104
  }
}
Did you have a look at the ingest module of Elasticsearch?
https://www.elastic.co/guide/en/elasticsearch/reference/current/date-processor.html
Parses dates from fields, and then uses the date or timestamp as the timestamp for the document. By default, the date processor adds the parsed date as a new field called @timestamp. You can specify a different field by setting the target_field configuration parameter. Multiple date formats are supported as part of the same date processor definition. They will be used sequentially to attempt parsing the date field, in the same order they were defined as part of the processor definition.
It does exactly what you want :) In your reindex statement you can direct documents through this ingest processor.
If you need more help let me know, then I can jump behind a computer and help out :D
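A minimal sketch of that approach (the pipeline and destination index names are made up, and the if condition assumes per-processor conditionals are available in your Elasticsearch version): the date processor's UNIX format handles a seconds-precision epoch, and _reindex routes every document through the pipeline.
PUT _ingest/pipeline/set-timestamp
{
  "processors": [
    {
      "date": {
        "if": "ctx['@timestamp'] == null && ctx.timestamp != null",
        "field": "timestamp",
        "formats": ["UNIX"],
        "target_field": "@timestamp"
      }
    }
  ]
}

POST _reindex
{
  "source": { "index": "my_log" },
  "dest": { "index": "my_log_fixed", "pipeline": "set-timestamp" }
}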

Access epoch from date field in groovy script in elastic search

My original question is here:
https://discuss.elastic.co/t/access-the-epoch-of-the-date-type-doc-in-groovy-script/53129
I need to access the stored millis (as indexed) of a date field in a Groovy script. Is this possible?
Original question:
As per my understanding, and from "understanding how elasticsearch stores dates internally", Elasticsearch stores dates internally in epoch format. Now consider that I need to access this epoch in a Groovy script, while our doc date format is date_optional_time. When I try to access it in the script, it gives me the formatted date (as it was at input time). Is there a way to access the epoch time here?
I have come up with three thoughts:
1) Convert the doc value to a date and get the millis in the script,
2) Create a new field with copy_to that stores the date in epoch format,
3) Or, if possible, directly access the epoch, but how?
Can somebody guide me on this? I need the epoch because I need to update another field based on it. For example, consider a mapping like this:
{
  "createdDate": {
    "type": "date",
    "store": true,
    "format": "dateOptionalTime"
  },
  "modifiedDate": {
    "type": "date",
    "store": true,
    "format": "dateOptionalTime"
  },
  "daysINBetween": {
    "type": "long"
  }
}
Now I need to run a script that stores (createdDate.millis - modifiedDate.millis) / (24 * 60 * 60 * 1000) into daysINBetween. I don't want to create a new date object each time, which is why I am trying to access the epoch in the script.
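For what it's worth, a sketch of option 1 in a search context (Groovy scripting on a pre-5.x cluster; the index name my_index is a placeholder): with doc values, a date field exposes its epoch milliseconds via doc['field'].date.millis, so no extra date object has to be built in the script.
GET my_index/_search
{
  "script_fields": {
    "daysINBetween": {
      "script": {
        "lang": "groovy",
        "inline": "(doc['createdDate'].date.millis - doc['modifiedDate'].date.millis) / (24 * 60 * 60 * 1000)"
      }
    }
  }
}
This computes the value at query time. To persist it into the daysINBetween field you would need an update script, where only ctx._source (the formatted string) is available, so the date would have to be parsed there or the value written at index time.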

Elasticsearch is giving error with date on bulk insert

I am trying to insert records into Elasticsearch using the bulk API and I am getting the error below:
"error": "MapperParsingException[failed to parse [created_date]]; nested: MapperParsingException[failed to parse date field [2015-07-18 13:00:22], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: \"2015-07-18 13:00:22\" is malformed at \" 13:00:22\"]; "
while I am passing the following date:
"created_date":"2015-07-18 13:00:22"
and the following mapping is used:
"created_date": {
"format": "yyyy-MM-DD HH:mm:ss",
"type": "date"
},
I can see that the date is correct and the mapping is also correct. The error occurs only for this particular record; the other records are inserted successfully. What could be the reason?
I doubt your mapping has been applied to the field you are expecting.
The log says tried both date format [dateOptionalTime], and timestamp number with locale [].
It does not say that it tried yyyy-MM-DD HH:mm:ss.
Maybe your created_date is a different created_date field?
use "created_date":"2015-07-18T13:00:22"
It may help You
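A quick way to confirm the first answer's suspicion (the index name my_index is a placeholder) is to fetch the live mapping and check which format is actually attached to the created_date field that the failing document targets; if it only shows dateOptionalTime, the custom format was never applied there. Also note that in Joda-style patterns DD is day-of-year while day-of-month is dd, so a corrected mapping would look like this:
GET my_index/_mapping

"created_date": {
  "type": "date",
  "format": "yyyy-MM-dd HH:mm:ss"
}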

Logstash inserting dates as strings instead of dateOptionalTime

I have an Elasticsearch index with the following mapping:
"pickup_datetime": {
"type": "date",
"format": "dateOptionalTime"
}
Here is an example of a date contained in the file that is being read in
"pickup_datetime": "2013-01-07 06:08:51"
I am using Logstash to read and insert data into ES with the following lines to attempt to convert the date string into the date type.
date {
match => [ "pickup_datetime", "yyyy-MM-dd HH:mm:ss" ]
target => "pickup_datetime"
}
But the match never seems to occur.
What am I doing wrong?
It turns out the date filter was placed before the csv filter, where the columns get named, so the date filter could not find the pickup_datetime column because it had not yet been named.
It might be a good idea to clearly state in the documentation that filters are applied sequentially, to save others from similar problems in the future.
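In other words, filters run in the order they appear in the config, so the csv filter has to come first (a sketch; the column list and separator are invented for illustration):
filter {
  csv {
    separator => ","
    # hypothetical column names; pickup_datetime must be among them
    columns => ["pickup_datetime", "dropoff_datetime", "fare_amount"]
  }
  date {
    match  => [ "pickup_datetime", "yyyy-MM-dd HH:mm:ss" ]
    target => "pickup_datetime"
  }
}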
