illegal_argument_exception while using kafka connector to push to elasticsearch - elasticsearch

I am trying to push logs from Kafka topics to Elasticsearch.
My message in Kafka:
{
"@timestamp": 1589549688.659166,
"log": "13:34:48.658 [pool-2-thread-1] DEBUG health check success",
"stream": "stdout",
"time": "2020-05-15T13:34:48.659166158Z",
"pod_name": "my-pod-789f8c85f4-mt62l",
"namespace_name": "services",
"pod_id": "600ca012-91f5-XXXX-XXXX-XXXXXXXXXXX",
"host": "ip-192-168-88-59.ap-south-1.compute.internal",
"container_name": "my-pod",
"docker_id": "XXXXXXXXXXXXXXXXX1435bb2870bfc9d20deb2c483ce07f8e71ec",
"container_hash": "myregistry",
"labelpod-template-hash": "9tignfe9r",
"labelsecurity.istio.io/tlsMode": "istio",
"labelservice": "my-pod",
"labelservice.istio.io/canonical-name": "my-pod",
"labelservice.istio.io/canonical-revision": "latest",
"labeltype": "my-pod",
"annotationkubernetes.io/psp": "eks.privileged",
"annotationsidecar.istio.io/status": "{\"version\":\"58dc8b12bb311f1e2f46fd56abfe876ac96a38d7ac3fc6581af3598ccca7522f\"}"
}
This is my connector config:
{
"name": "logs",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"connection.url": "http://es:9200",
"connection.username": "username",
"connection.password": "password",
"tasks.max": "10",
"topics": "my-pod",
"name": "logs",
"type.name": "_doc",
"schema.ignore": "true",
"key.ignore": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms": "routeTS",
"transforms.routeTS.type": "org.apache.kafka.connect.transforms.TimestampRouter",
"transforms.routeTS.topic.format": "${topic}-${timestamp}",
"transforms.routeTS.timestamp.format": "YYYYMMDD"
}
}
This is the error I'm getting:
cp-kafka-connect-server [2020-05-15 13:30:59,083] WARN Failed to execute batch 4830 of 18 records with attempt 4/6, will attempt retry after 539 ms. Failure reason: Bulk request failed: [{"type":"illegal_argument_exception","reason":"mapper [labelservice] of different type, current_type [text], merged_type [ObjectMapper]"}
I haven't created any mapping beforehand. I'm depending on the connector to create the index.
This is the auto-created mapping I have in Elasticsearch:
{
"mapping": {}
}

The error message is clear:
"reason":"mapper [labelservice] of different type, current_type [text], merged_type [ObjectMapper]"
It means that labelservice is mapped as text in your index, but you are also sending fields that use labelservice as a parent path:
"labelservice": "my-pod",
"labelservice.istio.io/canonical-name": "my-pod",
"labelservice.istio.io/canonical-revision": "latest",
Elasticsearch treats dots in field names as object paths, so the last two fields require labelservice to be an object, while the first one makes it a text field. This mismatch in data types is what causes the error.
You need to change your mapping and define labelservice as an object to make it work. Note that the plain "labelservice": "my-pod" value would then have to be renamed or moved as well, since a field cannot be both text and object. Refer to the object datatype documentation in Elasticsearch for more info.
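To see which fields in a message collide this way before sending them, here is a minimal Python sketch. It only imitates how dotted names expand into object paths; it is not Elasticsearch code. The function name find_mapping_conflict is made up for illustration, and the sketch assumes a flat document.

```python
def find_mapping_conflict(doc):
    """Return the first field path that is used both as a plain value and
    as an object (a parent of dotted sub-fields), or None if none clash.
    Assumes a flat document whose keys may contain dots."""
    LEAF = object()  # marker meaning "this path holds a plain value"
    root = {}
    for key in doc:
        parts = key.split(".")
        node = root
        for depth, part in enumerate(parts):
            path = ".".join(parts[: depth + 1])
            existing = node.get(part)
            if depth == len(parts) - 1:
                # Last segment: must not already be an object.
                if isinstance(existing, dict):
                    return path
                node[part] = LEAF
            else:
                # Intermediate segment: must not already be a plain value.
                if existing is LEAF:
                    return path
                node = node.setdefault(part, {})
    return None


message = {
    "labelservice": "my-pod",
    "labelservice.istio.io/canonical-name": "my-pod",
    "labeltype": "my-pod",
}
print(find_mapping_conflict(message))  # labelservice
```

Running a check like this over a sample of messages shows which label names would need to be renamed (or de-dotted) upstream.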

Related

dropfield transform sink connector: (STRUCT) type doesn't have a mapping to the SQL database column type

I created a sink connector from kafka to mysql.
After adding a transform to the sink connector's config to delete some columns, I get this error, whereas without the transform it works:
(STRUCT) type doesn't have a mapping to the SQL database column type
{
"name": "mysql-conf-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "3",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"topics": "mysql.cars.prices",
"transforms": "dropPrefix,unwrap",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "mysql.cars.prices",
"transforms.dropPrefix.replacement": "prices",
"transforms.timestamp.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.timestamp.target.type": "Timestamp",
"transforms.timestamp.field": "date_time",
"transforms.timestamp.format": "yyyy-MM-dd HH:mm:ss",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"connection.url": "jdbc:mysql://localhost:3306/product",
"connection.user": "kafka",
"connection.password": "123456",
"transforms": "ReplaceField",
"transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceField.blacklist": "id, brand",
"insert.mode": "insert",
"auto.create": "true",
"auto.evolve": "true",
"batch.size": 50000
}
}
You have put the "transforms" key more than once in your JSON, which isn't valid.
Try with one entry:
"transforms": "unwrap,ReplaceField,dropPrefix",
You are getting the error because the second entry has overridden the first, so unwrap, specifically, is no longer called and you are left with nested Structs.
The blacklist property was renamed to exclude, by the way: https://docs.confluent.io/platform/current/connect/transforms/replacefield.html#properties
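Duplicate keys are easy to miss because most JSON parsers silently keep the last one. A minimal Python sketch for catching them before posting a connector config follows. The helper name duplicate_keys and the shortened config string are illustrative only.

```python
import json
from collections import Counter


def duplicate_keys(text):
    """Return the keys that appear more than once in any JSON object in text."""
    dups = []

    def hook(pairs):
        # pairs holds every (key, value) of one object, duplicates included.
        counts = Counter(key for key, _ in pairs)
        dups.extend(key for key, n in counts.items() if n > 1)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return dups


config = (
    '{"transforms": "dropPrefix,unwrap", '
    '"insert.mode": "insert", '
    '"transforms": "ReplaceField"}'
)

print(json.loads(config)["transforms"])  # ReplaceField (the later entry wins)
print(duplicate_keys(config))            # ['transforms']
```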

Kafka connect Jdbc sink connector not auto creating tables

I am using Docker images of Kafka and Kafka Connect to test CDC with Debezium; the database is a standalone one.
My sink connector config JSON looks like this:
{
"name": "jdbc-sink-test-oracle",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"dialect.name": "OracleDatabaseDialect",
"table.name.format": "TEST",
"topics": "oracle-db-source.DBZ_SRC.TEST",
"connection.url": "jdbc:oracle:thin:@hostname:1521/DB",
"connection.user": "DBZ_TARGET",
"connection.password": "DBZ_TARGET",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"auto.create": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode": "record_key"
}
}
My source connector config JSON looks like this:
{
"name": "test-source-connector",
"config": {
"connector.class" : "io.debezium.connector.oracle.OracleConnector",
"tasks.max" : "1",
"database.server.name" : "oracle-db-source",
"database.hostname" : "hostname",
"database.port" : "1521",
"database.user" : "clogminer",
"database.password" : "clogminer",
"database.dbname" : "DB",
"database.oracle.version": "19",
"database.history.kafka.bootstrap.servers" : "kafka:9092",
"database.history.kafka.topic": "schema-changes.DBZ_SRC",
"database.connection.adapter": "logminer",
"table.include.list" : "DBZ_SRC.TEST",
"database.schema": "DBZ_SRC",
"errors.log.enable": "true",
"snapshot.lock.timeout.ms":"5000",
"include.schema.changes": "true",
"snapshot.mode":"initial",
"decimal.handling.mode": "double"
}
}
I am getting this error with the above configurations:
Error : 942, Position : 11, Sql = merge into "TEST" using (select :1 "ID", :2 "NAME", :3 "DESCRIPTION", :4 "WEIGHT" FROM dual) incoming on("TEST"."ID"=incoming."ID") when matched then update set "TEST"."NAME"=incoming."NAME","TEST"."DESCRIPTION"=incoming."DESCRIPTION","TEST"."WEIGHT"=incoming."WEIGHT" when not matched then insert("TEST"."NAME","TEST"."DESCRIPTION","TEST"."WEIGHT","TEST"."ID") values(incoming."NAME",incoming."DESCRIPTION",incoming."WEIGHT",incoming."ID"), OriginalSql = merge into "TEST" using (select ? "ID", ? "NAME", ? "DESCRIPTION", ? "WEIGHT" FROM dual) incoming on("TEST"."ID"=incoming."ID") when matched then update set "TEST"."NAME"=incoming."NAME","TEST"."DESCRIPTION"=incoming."DESCRIPTION","TEST"."WEIGHT"=incoming."WEIGHT" when not matched then insert("TEST"."NAME","TEST"."DESCRIPTION","TEST"."WEIGHT","TEST"."ID") values(incoming."NAME",incoming."DESCRIPTION",incoming."WEIGHT",incoming."ID"), Error Msg = ORA-00942: table or view does not exist
at io.confluent.connect.jdbc.sink.JdbcSinkTask.getAllMessagesException(JdbcSinkTask.java:150)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:102)
... 11 more
2022-07-28 15:01:58,644 ERROR || WorkerSinkTask{id=jdbc-sink-test-oracle-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLException: Exception chain:
java.sql.BatchUpdateException: ORA-00942: table or view does not exist
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
I believe that, given this configuration, the table should be auto-created, but the error says the table doesn't exist.
However, the following sink connector configuration works fine: it auto-creates a table named 'TEST2' and exports the data from the source into it.
{
"name": "jdbc-sink-test2-oracle",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"dialect.name": "OracleDatabaseDialect",
"table.name.format": "TEST2",
"topics": "oracle-db-source.DBZ_SRC.TEST",
"connection.url": "jdbc:oracle:thin:@hostname:1521/DB",
"connection.user": "DBZ_TARGET",
"connection.password": "DBZ_TARGET",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"auto.create": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode": "record_key"
}
}
Edit:
The sink connector works fine if a target table with the same name as the source table already exists with the same DDL, but the target table is not auto-created if it is not already present.

Timestamp in avro schema produces incompatible value validation in Kafka Connect JDBC

Error produced by JDBC sink connector:
org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type INT64: class java.util.Date for field: "some_timestamp_field"
at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:242)
at org.apache.kafka.connect.data.Struct.put(Struct.java:216)
at org.apache.kafka.connect.transforms.Cast.applyWithSchema(Cast.java:151)
at org.apache.kafka.connect.transforms.Cast.apply(Cast.java:107)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:38)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:480)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The avro schema registered by source JDBC connector (MySQL):
{
"type":"record",
"name":"ConnectDefault",
"namespace":"io.confluent.connect.avro",
"fields":[
...
{
"name":"some_timestamp_field",
"type":{
"type":"long",
"connect.version":1,
"connect.name":"org.apache.kafka.connect.data.Timestamp",
"logicalType":"timestamp-millis"
}
},
...
]
}
Looks like the exception is due to this code block: https://github.com/apache/kafka/blob/f0282498e7a312a977acb127557520def338d45c/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L239
So, in the Avro schema, the timestamp field is registered as INT64 with the correct (timestamp) logical type. But Connect reads the schema type as INT64 and compares it with the value type java.util.Date.
Is this a bug, or is there a workaround for this? Maybe I am missing something, as this looks like a standard Connect model.
Thanks in advance.
UPDATE
Sink connector config:
{
"name": "sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "topic",
"connection.url": "jdbc:postgresql://host:port/db",
"connection.user": "user",
"connection.password": "password",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://host:port",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://host:port",
"auto.create": "true",
"insert.mode": "upsert",
"pk.mode": "record_value",
"pk.fields": "id"
}
}
Deserialised data in Kafka:
{
"id":678148,
"some_timestamp_field":1543806057000,
...
}
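As a quick sanity check on the sample record: with the timestamp-millis logical type, the long value is milliseconds since the Unix epoch. A minimal Python sketch that decodes it (the variable names are illustrative):

```python
from datetime import datetime, timezone

# Sample value of some_timestamp_field from the record above.
millis = 1543806057000

# timestamp-millis counts milliseconds since 1970-01-01T00:00:00Z.
decoded = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(decoded.isoformat())  # 2018-12-03T03:00:57+00:00
```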
We have worked out a workaround for the problem. Our goal was to convert the id from BIGINT to STRING (TEXT/VARCHAR) and save the record in the downstream DB.
But due to an issue (probably https://issues.apache.org/jira/browse/KAFKA-5891), casting the id field was not working. Kafka Connect was also trying to validate the timestamp fields in the casting chain, but was reading the schema type/name wrong, resulting in a type mismatch (see the record body and error log above).
So we made a workaround as follows:
extract only the id field as the key -> execute the Cast transform on the key -> it works, as the key does not contain the timestamp field.
Here is the workaround configuration:
{
"name": "sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "topic",
"connection.url": "jdbc:postgresql://host:port/db",
"connection.user": "user",
"connection.password": "password",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://host:port",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://host:port",
"transforms": "createKey,castKeyToString",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.castKeyToString.type": "org.apache.kafka.connect.transforms.Cast$Key",
"transforms.castKeyToString.spec": "id:string",
"auto.create": "true",
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "id"
}
}
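To make the data flow of this configuration concrete, here is a rough Python imitation of the two transforms on a plain dict. It is only a sketch of the idea, not Kafka Connect code: the functions value_to_key and cast_key and the record layout are invented for illustration.

```python
def value_to_key(record, fields):
    """Imitates ValueToKey: build the key from selected value fields."""
    key = {field: record["value"][field] for field in fields}
    return {**record, "key": key}


def cast_key(record, spec):
    """Imitates Cast$Key: cast the named key fields, leaving the value alone."""
    casts = {"string": str, "int64": int}
    key = dict(record["key"])
    for field, target in spec.items():
        key[field] = casts[target](key[field])
    return {**record, "key": key}


record = {
    "key": None,
    "value": {"id": 678148, "some_timestamp_field": 1543806057000},
}

# Same order as "transforms": "createKey,castKeyToString".
result = cast_key(value_to_key(record, ["id"]), {"id": "string"})

print(result["key"])    # {'id': '678148'}
print(result["value"])  # unchanged, the timestamp field is never touched
```

The point of the workaround is visible in the last line: the cast only ever sees the key, so the timestamp field in the value is never validated by it.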
Disclaimer: This is not a proper solution, just a workaround. The bug in the Cast transform should be fixed. In my opinion, the Cast transform should only concern itself with the fields designated for casting, not with other fields in the message.
Have a good day.

ExtractField and Parse JSON in kafka-connect sink

I have a kafka-connect flow of mongodb->kafka connect->elasticsearch sending data end to end OK, but the payload document is JSON encoded. Here's my source mongodb document.
{
"_id": "1541527535911",
"enabled": true,
"price": 15.99,
"style": {
"color": "blue"
},
"tags": [
"shirt",
"summer"
]
}
And here's my mongodb source connector configuration:
{
"name": "redacted",
"config": {
"connector.class": "com.teambition.kafka.connect.mongo.source.MongoSourceConnector",
"databases": "redacted.redacted",
"initial.import": "true",
"topic.prefix": "redacted",
"tasks.max": "8",
"batch.size": "1",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.JSONSerializer",
"key.serializer.schemas.enable": false,
"value.serializer.schemas.enable": false,
"compression.type": "none",
"mongo.uri": "mongodb://redacted:27017/redacted",
"analyze.schema": false,
"schema.name": "__unused__",
"transforms": "RenameTopic",
"transforms.RenameTopic.type":
"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.RenameTopic.regex": "redacted.redacted_Redacted",
"transforms.RenameTopic.replacement": "redacted"
}
}
Over in elasticsearch, it ends up looking like this:
{
"_index" : "redacted",
"_type" : "kafka-connect",
"_id" : "{\"schema\":{\"type\":\"string\",\"optional\":true},\"payload\":\"1541527535911\"}",
"_score" : 1.0,
"_source" : {
"ts" : 1541527536,
"inc" : 2,
"id" : "1541527535911",
"database" : "redacted",
"op" : "i",
"object" : "{ \"_id\" : \"1541527535911\", \"price\" : 15.99,
\"enabled\" : true, \"tags\" : [\"shirt\", \"summer\"],
\"style\" : { \"color\" : \"blue\" } }"
}
}
I'd like to use 2 single message transforms:
ExtractField to grab object, which is a string of JSON
Something to parse that JSON into an object, or just let the normal JsonConverter handle it, as long as it ends up properly structured in Elasticsearch.
I've attempted to do it with just ExtractField in my sink config, but I see this error logged by Kafka Connect:
kafka-connect_1 | org.apache.kafka.connect.errors.ConnectException:
Bulk request failed: [{"type":"mapper_parsing_exception",
"reason":"failed to parse",
"caused_by":{"type":"not_x_content_exception",
"reason":"Compressor detection can only be called on some xcontent bytes or
compressed xcontent bytes"}}]
Here's my elasticsearch sink connector configuration. In this version, I have things working but I had to code a custom ParseJson SMT. It's working well, but if there's a better way or a way to do this with some combination of built-in stuff (converters, SMTs, whatever works), I'd love to see that.
{
"name": "redacted",
"config": {
"connector.class":
"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"batch.size": 1,
"connection.url": "http://redacted:9200",
"key.converter.schemas.enable": true,
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"schema.ignore": true,
"tasks.max": "1",
"topics": "redacted",
"transforms": "ExtractFieldPayload,ExtractFieldObject,ParseJson,ReplaceId",
"transforms.ExtractFieldPayload.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.ExtractFieldPayload.field": "payload",
"transforms.ExtractFieldObject.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.ExtractFieldObject.field": "object",
"transforms.ParseJson.type": "reaction.kafka.connect.transforms.ParseJson",
"transforms.ReplaceId.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceId.renames": "_id:id",
"type.name": "kafka-connect",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false
}
}
I am not sure about your Mongo connector. I don't recognize the class or the configurations... Most people probably use the Debezium Mongo connector.
I would set it up this way, though:
"connector.class": "com.teambition.kafka.connect.mongo.source.MongoSourceConnector",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.JSONSerializer",
"key.serializer.schemas.enable": false,
"value.serializer.schemas.enable": true,
The schemas.enable setting is important; that way the internal Connect data classes know how to convert to/from other formats.
Then, in the sink, you again need to use the JSON deserializer (via the converter) so that it creates a full object rather than a plaintext string, as you see in Elasticsearch ({\"schema\":{\"type\":\"string\").
"connector.class":
"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": false,
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": true
And if this doesn't work, then you might have to manually create your index mapping in Elasticsearch ahead of time so it knows how to actually parse the strings you are sending it.
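For reference, the result the question's transform chain is after can be imitated in a few lines of plain Python. This is only a sketch of the intended data shape, not Kafka Connect code: extract_field, parse_json and rename_field are made-up stand-ins for ExtractField, the custom ParseJson SMT and ReplaceField, and the initial payload extraction is left out.

```python
import json

# The _source as it currently lands in Elasticsearch (shortened).
es_source = {
    "id": "1541527535911",
    "op": "i",
    "object": '{ "_id" : "1541527535911", "price" : 15.99, "enabled" : true, '
              '"tags" : ["shirt", "summer"], "style" : { "color" : "blue" } }',
}


def extract_field(value, field):
    """Keep only one field of the value."""
    return value[field]


def parse_json(text):
    """Turn the JSON-encoded string into a structured object."""
    return json.loads(text)


def rename_field(doc, old, new):
    """Return a copy of doc with one field renamed."""
    renamed = dict(doc)
    renamed[new] = renamed.pop(old)
    return renamed


doc = rename_field(parse_json(extract_field(es_source, "object")), "_id", "id")
print(doc["style"]["color"])  # blue
```

Whichever combination of converters and transforms is used, this nested structure is what needs to reach Elasticsearch instead of the escaped string.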

Confluent Kafka connect ElasticSearch ID document creation

I'm using the kafka connect elasticsearch connector to write data from a topic to an ElasticSearch index. Both the key and value of the topic messages are in json format. The connector is not able to start because of the following error:
org.apache.kafka.connect.errors.DataException: MAP is not supported as the document id.
Following is the format of my messages (key | value):
{"key":"OKOK","start":1517241690000,"end":1517241695000} | {"measurement":"responses","count":9,"sum":1350.0,"max":150.0,"min":150.0,"avg":150.0}
And following is the body of the POST request I'm using to create the connector:
{
"name": "elasticsearch-sink-connector",
"config": {
"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics": "output-topic-elastic",
"connection.url": "http://elasticsearch:9200",
"type.name": "aggregator",
"schemas.enable": "false",
"topic.schema.ignore": "true",
"topic.key.ignore": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"key.ignore":"false",
"topic.index.map": "output-topic-elastic:aggregator",
"name": "elasticsearch-sink",
"transforms": "InsertKey",
"transforms.InsertKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.InsertKey.fields":"key"
}}
Any help would be really appreciated. I've found a similar question on Stack Overflow (ES document ID creation), but I've had no luck with the answers.
You also need ExtractField in there:
"transforms": "InsertKey,extractKey",
"transforms.InsertKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.InsertKey.fields":"key",
"transforms.extractKey.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKey.field":"key"
Check out this post for more details.
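The reason ExtractField helps is that the document id has to be a single primitive value, while the record key here is a map. A rough Python imitation using the key from the question follows. It is a sketch only: extract_field_from_key and check_document_id are hypothetical stand-ins for ExtractField$Key and the connector's id check.

```python
def extract_field_from_key(record, field):
    """Imitates ExtractField$Key: replace the map key with one of its fields."""
    return {**record, "key": record["key"][field]}


def check_document_id(key):
    """Hypothetical stand-in for the connector's check on the document id."""
    if isinstance(key, (dict, list)):
        raise TypeError("MAP is not supported as the document id.")
    return str(key)


record = {
    "key": {"key": "OKOK", "start": 1517241690000, "end": 1517241695000},
    "value": {"measurement": "responses", "count": 9},
}

try:
    check_document_id(record["key"])
except TypeError as error:
    print(error)  # the map key is rejected

extracted = extract_field_from_key(record, "key")
print(check_document_id(extracted["key"]))  # OKOK
```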
