Kafka Elasticsearch Connector Timestamps

I can see this has been discussed a few times here, for instance, but I think the solutions are out of date due to breaking changes in Elasticsearch.
I'm trying to convert a long/epoch field in the JSON on my Kafka topic into an Elasticsearch date type as it is pushed through the connector.
When I try to add a dynamic mapping, my Kafka Connect updates fail because I'm trying to apply two mapping types to the index, _doc and kafkaconnect. This was a breaking change around version 6, I believe, where an index can only have one mapping type.
{
  "index_patterns": [ "depart_details" ],
  "mappings": {
    "dynamic_templates": [
      {
        "scheduled_to_date": {
          "match": "scheduled",
          "mapping": {
            "type": "date"
          }
        }
      }
    ]
  }
}
I've now focused on transforming the message at source in the connector, by converting the field to a Timestamp, Time, or Date.
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.field" : "scheduled",
"transforms.TimestampConverter.target.type": "Timestamp"
However, any messages I try to send through this transform fail with:
Caused by: org.apache.kafka.connect.errors.DataException: Java class class java.util.Date does not have corresponding schema type.
at org.apache.kafka.connect.json.JsonConverter.convertToJson(JsonConverter.java:604)
at org.apache.kafka.connect.json.JsonConverter.convertToJson(JsonConverter.java:668)
at org.apache.kafka.connect.json.JsonConverter.convertToJsonWithoutEnvelope(JsonConverter.java:574)
at org.apache.kafka.connect.json.JsonConverter.fromConnectData(JsonConverter.java:324)
at io.confluent.connect.elasticsearch.DataConverter.getPayload(DataConverter.java:181)
at io.confluent.connect.elasticsearch.DataConverter.convertRecord(DataConverter.java:163)
at io.confluent.connect.elasticsearch.ElasticsearchWriter.tryWriteRecord(ElasticsearchWriter.java:285)
at io.confluent.connect.elasticsearch.ElasticsearchWriter.write(ElasticsearchWriter.java:270)
at io.confluent.connect.elasticsearch.ElasticsearchSinkTask.put(ElasticsearchSinkTask.java:169)
This seems like a really common thing to need to do, but I don't see how to get a date or time field into Elasticsearch through this connector with version 7?

The Confluent documentation states that the ES connector is currently not supported with ES 7.
According to this issue, it might suffice to change type.name=kafkaconnect to type.name=_doc in your connector configuration.
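For reference, a minimal sketch of where that setting sits in the sink configuration, combined with the transform from the question (the connector name and connection URL below are placeholders, not from the original post):
{
  "name": "depart-details-es-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "depart_details",
    "connection.url": "http://localhost:9200",
    "type.name": "_doc",
    "transforms": "TimestampConverter",
    "transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.TimestampConverter.field": "scheduled",
    "transforms.TimestampConverter.target.type": "Timestamp"
  }
}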

Related

Kafka connect ElasticSearch sink - using if-else blocks to extract and transform fields for different topics

I have a Kafka Elasticsearch sink properties file like the following:
name=elasticsearch.sink.direct
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=16
topics=data.my_setting
connection.url=http://dev-elastic-search01:9200
type.name=logs
topic.index.map=data.my_setting:direct_my_setting_index
batch.size=2048
max.buffered.records=32768
flush.timeout.ms=60000
max.retries=10
retry.backoff.ms=1000
schema.ignore=true
transforms=InsertKey,ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=MY_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=MY_SETTING_ID
This works perfectly for a single topic (data.my_setting). I would like to use the same connector for data coming in from more than one topic. A message in a different topic will have a different key, which I'll need to transform. I was wondering if there's a way to use if-else statements, with a condition on the topic name or on a single field in the message, so that I can transform the key differently. All the incoming messages are JSON with schema and payload.
UPDATE based on the answer:
In my JDBC connector I add the key as follows:
name=data.my_setting
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
poll.interval.ms=500
tasks.max=4
mode=timestamp
query=SELECT * FROM MY_TABLE with (nolock)
timestamp.column.name=LAST_MOD_DATE
topic.prefix=investment.ed.data.app_setting
transforms=ValueToKey
transforms.ValueToKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.ValueToKey.fields=MY_SETTING_ID
However, I still get the following error when a message produced by this connector is read by the Elasticsearch sink:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
Caused by: org.apache.kafka.connect.errors.DataException: STRUCT is not supported as the document id
The payload looks like this:
{
  "schema": {
    "type": "struct",
    "fields": [{
      "type": "int32",
      "optional": false,
      "field": "MY_SETTING_ID"
    }, {
      "type": "string",
      "optional": true,
      "field": "MY_SETTING_NAME"
    }],
    "optional": false
  },
  "payload": {
    "MY_SETTING_ID": 9,
    "MY_SETTING_NAME": "setting_name"
  }
}
The Connect standalone properties file looks like this:
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/apps/{env}/logs/infrastructure/offsets/connect.offsets
rest.port=8084
plugin.path=/usr/share/java
Is there a way to achieve my goal, which is to have messages from multiple topics (in my case DB tables), each with its own unique ID (which will also be the ID of the document in ES), sent to a single ES sink?
Can I use Avro for this task? Is there a way to define the key in the Schema Registry, or will I run into the same problem?
This isn't possible. You'd need multiple Connectors if the key fields are different.
One option to think about is pre-processing your Kafka topics through a stream processor (e.g. Kafka Streams, KSQL, Spark Streaming, etc.) to standardise the key fields, so that you can then use a single connector. Whether this would be worth doing, or overkill, depends on what you're building.
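To make the stream-processor suggestion concrete, here is a minimal Kafka Streams sketch, not from the original answer: the topic names, the ".rekeyed" output suffix, and the extractId() helper are assumptions, and a real job would use a proper JSON serde rather than the crude regex below.
// Sketch: re-key each source topic onto its own id field so that a single
// Elasticsearch sink connector can consume the re-keyed topics without
// per-topic key transforms. Topic and application names are assumptions.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RekeyForEsSink {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Each table topic is re-keyed on its own id column and written to a
        // "*.rekeyed" topic, which the single ES sink then subscribes to.
        builder.<String, String>stream("data.my_setting")
               .selectKey((key, value) -> extractId(value, "MY_SETTING_ID"))
               .to("data.my_setting.rekeyed");

        builder.<String, String>stream("data.other_table")
               .selectKey((key, value) -> extractId(value, "OTHER_TABLE_ID"))
               .to("data.other_table.rekeyed");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rekey-for-es-sink");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }

    // Crude JSON field lookup to keep the sketch self-contained;
    // a real job would deserialize the value with a JSON serde.
    private static String extractId(String json, String field) {
        Matcher m = Pattern.compile("\"" + field + "\"\\s*:\\s*\"?([^,\"}]+)").matcher(json);
        return m.find() ? m.group(1) : null;
    }
}
The re-keyed topics then all carry a plain string key, so the sink connector no longer needs per-topic ValueToKey/ExtractField transforms.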

Specifying Field Types Indexing from Logstash to Elasticsearch

I have successfully ingested data using the XML filter plugin from Logstash to Elasticsearch, however all the field types are of the type "text."
Is there a way to manually or automatically specify the correct type?
I found the following technique good for my use:
Logstash can filter the data and convert a field from the default (text) to whatever type you want. The documentation can be found here. The example given in the documentation is:
filter {
  mutate {
    convert => { "fieldname" => "integer" }
  }
}
You add this to the filter section of your /etc/logstash/conf.d/02-... file. I believe the downside of this practice is that altering data on its way into Elasticsearch is generally discouraged.
After you do this you will probably run into this problem. If your DB is a test DB whose old data you can simply erase, just DELETE the index so that there is no conflict (for example, if a field that was text until now is suddenly received as a date, the old and new data will conflict). If you can't erase the old data, read the answer at the link I mentioned.
What you want to do is specify a mapping template.
PUT _template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z YYYY"
        }
      }
    }
  }
}
Change the settings to match your needs, and list under properties the fields you want mapped and the types you want them mapped to.
Setting index_patterns is especially important because it tells Elasticsearch which indices to apply this template to. You can set an array of index patterns and use * as a wildcard where appropriate. For example, Logstash's default is to rotate indices by date; they will look like logstash-2018.04.23, so your pattern could be logstash-* and any index matching it will receive the template.
If you want to match based on some pattern, then you can use dynamic templates.
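For instance, a dynamic template similar to the one in the first question above might look like this (a sketch; the template name, field pattern, and epoch_millis format are assumptions, and whether the mapping type level, "doc" here, is required depends on your Elasticsearch version):
PUT _template/dynamic_dates
{
  "index_patterns": ["logstash-*"],
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "epoch_fields_to_date": {
            "match": "*_ts",
            "mapping": {
              "type": "date",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}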
Edit: Adding a little update here: if you want Logstash to apply the template for you, here is a link to the settings you'll want to be aware of.
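Roughly, those settings live on the elasticsearch output block (a sketch; the paths and names here are assumptions):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
    manage_template => true
    template => "/etc/logstash/templates/template_1.json"
    template_name => "template_1"
    template_overwrite => true
  }
}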

Any Elasticsearch datatype matching a decimal timestamp?

I have a timestamp from a log file like {"ts" : "1486418325.948487"}
My infrastructure is Filebeat 5.2.0 --> Elasticsearch 5.2.
I tried mapping ts to date with the epoch_second format, but the Elasticsearch writes from Filebeat failed.
PUT /auth_auditlog
{
  "mappings": {
    "auth-auditlogs": {
      "properties": {
        "ts": {
          "type": "date",
          "format": "epoch_second"
        }
      }
    }
  }
}
The Filebeat error message looks like:
WARN Can not index event (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse [ts]","caused_by":{"type":"illegal_argument_exception","reason":"Invalid format: \"1486418325.948487\""}}
I tried use "1486418325" is ok so I guess es doesn't accept decimal format timestamp. However, python default output timestamp is this format.
My purpose is to type correctly in elasticsearch. I want to use ts as a original timestamp in elasticsearch.
Any solution is welcome except to change the original log data!
Filebeat doesn't have a processor for this type of transformation; you can't replace @timestamp with the one from your log in Filebeat. What you can do is send the data to Logstash and let the date filter parse the epoch value.
date {
  match => ["timestamp", "UNIX", "UNIX_MS"]
}
The other option would be to use an ingest node. Although I haven't used this myself, it seems it is also able to do the job. Check out the docs here.
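For reference, a minimal ingest-pipeline sketch (the pipeline name and target field are assumptions, and whether the UNIX format accepts the fractional seconds may depend on your Elasticsearch version):
PUT _ingest/pipeline/parse_ts
{
  "description": "Parse the epoch-seconds string in ts into @timestamp",
  "processors": [
    {
      "date": {
        "field": "ts",
        "formats": ["UNIX"],
        "target_field": "@timestamp"
      }
    }
  ]
}
Documents would then need to be indexed with the pipeline parameter (e.g. ?pipeline=parse_ts).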

Logstash upper case mapping names

I've been trying to replace my old import with Logstash. Everything works fine, but Logstash always pushes the new data with lowercase property names.
Is there a way to push the data into my Elasticsearch with uppercase property names via Logstash?
Otherwise I'll have to rewrite some applications. :(
SAMPLE
My indexed document:
{
  "ID": 1,
  "NAME": "DEMO"
}
The document as Logstash indexes it:
{
  "id": 1,
  "name": "DEMO"
}
Many thanks!
You should be able to use the rename option of the mutate filter, as sketched below.
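A minimal sketch, assuming the two fields from the sample above (adjust the hash to your actual field names):
filter {
  mutate {
    # rename the lowercased fields back to the uppercase names the applications expect
    rename => {
      "id"   => "ID"
      "name" => "NAME"
    }
  }
}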

Timestamps and documents are not detected in Kibana

I have been using Elasticsearch for a while and wanted to visualize the data with Kibana. Since I have time-series data, I created what is, from my point of view, a suitable timestamp field in the corresponding index. The relevant part of the index mappings is as follows:
[..]
"properties": {
  "@timestamp": {
    "enabled": true,
    "type": "date",
    "format": "date_hour_minute_second_millis",
    "store": true,
    "path": "@timestamp"
  },
[..]
I have played around with the "format" value because I want to visualize data with millisecond resolution. Ideally, I would just like to use the raw timestamp from my application (i.e. Unix epoch time, fractional in seconds), but I couldn't get Kibana to detect that format. Currently, I am posting data as follows:
{
  "@timestamp": "2015-03-10T14:37:42.644",
  "name": "some counter",
  "value": 91.76
}
Kibana detects the @timestamp field as a timestamp, but then tells me that it cannot find any documents stored having that field (which is not true):
This field is present in your elasticsearch mapping but not in any documents in the search results. You may still be able to visualize or search on it.
I should note that previously I used "dateOptionalTime" as the format for the timestamp, and everything was working fine in Kibana with "simple" timestamps. Now, however, I need to switch to milliseconds.
Cheers!
I was struggling with this highly ambiguous problem as well. No matter how I changed the mapping, Kibana simply would not display certain fields.
Then I found out it had something to do with the JVM heap size and Kibana being unable to display every field 'in-memory'.
Setting "doc_values" to true for those fields in the mapping fixed the issue.
In the Java API, I'm building the following mapping, which includes the doc_values option:
XContentFactory.jsonBuilder()
.startObject()
.startObject("#memory-available")
.field("type", "integer")
.field("index","not_analyzed")
.field("doc_values", true)
.endObject()
.endObject();
Read more here: Doc Values in ES
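For reference, the equivalent plain-JSON field mapping (a sketch, keeping the field name and options from the Java snippet above):
{
  "properties": {
    "#memory-available": {
      "type": "integer",
      "index": "not_analyzed",
      "doc_values": true
    }
  }
}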
