Write same value to 2 different columns with JDBC Sink Connector

There are two columns that need to have the same value in the table I want to sink to. Let's say the columns are named ID and PAYLOADID. But on the Kafka side there is no separate field for each of these columns. So, how can I configure my sink connector to write these two columns from the same field in Kafka?
This is my connector config:
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.user": "${file:/pass.properties:alm_user}",
"connection.password": "${file:/pass.properties:alm_pwd}",
"connection.url": "jdbc:oracle:thin:#(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=****)(PORT=****))(CONNECT_DATA=(SERVICE_NAME=****)))",
"table.name.format": "SCHEMA_NAME.TABLE_NAME",
"topics": "MY_TOPIC",
"transforms": "TimestampConverter1",
"transforms.TimestampConverter1.target.type": "Timestamp",
"transforms.TimestampConverter1.field": "RECORDDATE",
"transforms.TimestampConverter1.format": "MM.dd.yyyy hh:mm:ss",
"transforms.TimestampConverter1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"auto.create": "false",
"insert.mode": "insert",
"transforms": "rename",
"transforms.rename.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.rename.renames": "payload:PAYLOADID, type:TYPE"

You'd need to write your own transform, or otherwise pre-process the topic, so that you can copy one field into a second field with a different name while keeping the same value.
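For illustration, here is a minimal sketch of such a custom Single Message Transform. The package, the class name DuplicateField, and the source.field/target.field config keys are all made up for this example; it assumes the record value is a schema'd Struct (which matches your JsonConverter with schemas enabled):

package com.example.kafka.transforms;

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

public class DuplicateField<R extends ConnectRecord<R>> implements Transformation<R> {

    private String sourceField;
    private String targetField;

    @Override
    public void configure(Map<String, ?> configs) {
        sourceField = (String) configs.get("source.field");  // e.g. "payload"
        targetField = (String) configs.get("target.field");  // e.g. "ID"
    }

    @Override
    public R apply(R record) {
        if (!(record.value() instanceof Struct)) {
            return record;  // this sketch only handles schema'd Struct values
        }
        Struct value = (Struct) record.value();

        // Rebuild the value schema with one extra field for the copy
        SchemaBuilder builder = SchemaBuilder.struct().name(value.schema().name());
        for (Field f : value.schema().fields()) {
            builder.field(f.name(), f.schema());
        }
        builder.field(targetField, value.schema().field(sourceField).schema());
        Schema newSchema = builder.build();

        // Copy every existing field, then write the duplicate
        Struct newValue = new Struct(newSchema);
        for (Field f : value.schema().fields()) {
            newValue.put(f.name(), value.get(f));
        }
        newValue.put(targetField, value.get(sourceField));

        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(), newSchema, newValue, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef()
                .define("source.field", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "Field to copy from")
                .define("target.field", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "Name of the copied field");
    }

    @Override
    public void close() {
    }
}

Once packaged onto the Connect worker's plugin path, it would be added to the chain alongside your existing transforms, e.g. "transforms": "duplicate,rename,TimestampConverter1" with "transforms.duplicate.type": "com.example.kafka.transforms.DuplicateField", "transforms.duplicate.source.field": "payload" and "transforms.duplicate.target.field": "ID", so that after your existing rename (payload to PAYLOADID) both columns receive the same value.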

Related

dropfield transform sink connector: (STRUCT) type doesn't have a mapping to the SQL database column type

I created a sink connector from Kafka to MySQL.
After adding a transform to the sink connector's config to delete some columns, I get this error, whereas without the transform it works:
(STRUCT) type doesn't have a mapping to the SQL database column type
{
"name": "mysql-conf-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "3",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"topics": "mysql.cars.prices",
"transforms": "dropPrefix,unwrap",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "mysql.cars.prices",
"transforms.dropPrefix.replacement": "prices",
"transforms.timestamp.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.timestamp.target.type": "Timestamp",
"transforms.timestamp.field": "date_time",
"transforms.timestamp.format": "yyyy-MM-dd HH:mm:ss",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"connection.url": "jdbc:mysql://localhost:3306/product",
"connection.user": "kafka",
"connection.password": "123456",
"transforms": "ReplaceField",
"transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceField.blacklist": "id, brand",
"insert.mode": "insert",
"auto.create": "true",
"auto.evolve": "true",
"batch.size": 50000
}
}
You have put the "transforms" key more than once in your JSON, which isn't valid.
Try with one entry
"transforms": "unwrap,ReplaceField,dropPrefix",
You are getting the error because you have overridden the value, and unwrap, specifically, is no longer called, so you still have nested Structs.
The blacklist property got renamed to exclude, by the way - https://docs.confluent.io/platform/current/connect/transforms/replacefield.html#properties
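For example, the relevant part of the config would then look something like the snippet below. Note that the transforms.unwrap.type line is an assumption, since your posted config never defines it (Debezium's ExtractNewRecordState is the usual choice for flattening CDC envelopes), and exclude replaces the deprecated blacklist:

"transforms": "unwrap,ReplaceField,dropPrefix",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "mysql.cars.prices",
"transforms.dropPrefix.replacement": "prices",
"transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceField.exclude": "id, brand"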

Is it possible to have one Elasticsearch index for one database with tables using Debezium and Kafka?

I have this connector and sink, which basically create a topic named "Test.dbo.TEST_A" and write to the ES index "Test". I have set "key.ignore": "false" so that row updates are also updated in ES, and "transforms.unwrap.add.fields": "table" to keep track of which table each document belongs to.
{
"name": "Test-connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"database.hostname": "192.168.1.234",
"database.port": "1433",
"database.user": "user",
"database.password": "pass",
"database.dbname": "Test",
"database.server.name": "MyServer",
"table.include.list": "dbo.TEST_A",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.testA",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite",
"transforms.unwrap.add.fields":"table"
}
}
{
"name": "elastic-sink-test",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics": "TEST_A",
"connection.url": "http://localhost:9200/",
"string.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schema.enable": "false",
"schema.ignore": "true",
"transforms": "topicRoute,unwrap,key",
"transforms.topicRoute.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.topicRoute.regex": "(.*).dbo.TEST_A", /* Use the database name */
"transforms.topicRoute.replacement": "$1",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.unwrap.drop.tombstones": "false",
"transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.key.field": "Id",
"key.ignore": "false",
"type.name": "TEST_A",
"behavior.on.null.values": "delete"
}
}
But when I add another connector/sink for another table, "TEST_B", from the same database, it seems that whenever an id from TEST_A and TEST_B is the same, one of the rows is deleted from ES.
Is it possible with this setup to have one index = one database, or is the only solution to have one index per table?
The reason I want one index = one database is to reduce the number of indexes as more databases are added to ES.
You are reading data changes from different Databases/Tables and writing them into the same ElasticSearch index, with the ES document ID set to the DB record ID. And as you can see, if the DB record IDs collide, the index document IDs will also collide, causing old documents to be deleted.
You have a few options here:
ElasticSearch index per DB/Table name: You can implement this with different connectors or with a custom Single Message Transform (SMT)
Globally unique DB records: If you control the schema of the source tables, you can set the primary key to a UUID. This will prevent ID collisions.
As you mentioned in the comments, set the ES document ID to DB/Table/ID. You can implement this change with an SMT; a minimal sketch follows below.
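To illustrate that last option, here is a minimal sketch of an SMT that rewrites the document key to "<table>:<id>", using the "__table" field that ExtractNewRecordState adds when "transforms.unwrap.add.fields" is "table". The package and class name are made up for this example, and it assumes it runs last in the sink's transform chain, after your ExtractField$Key step (i.e. the key is already the plain Id):

package com.example.kafka.transforms;

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

public class TablePrefixedKey<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public void configure(Map<String, ?> configs) {
        // nothing to configure in this sketch
    }

    @Override
    public R apply(R record) {
        if (!(record.value() instanceof Struct) || record.key() == null) {
            return record;  // tombstones and schemaless values pass through unchanged
        }
        Struct value = (Struct) record.value();
        String table = value.getString("__table");   // added by add.fields=table (default "__" prefix)
        String newKey = table + ":" + record.key();   // e.g. "TEST_A:42"

        return record.newRecord(record.topic(), record.kafkaPartition(),
                Schema.STRING_SCHEMA, newKey,
                record.valueSchema(), record.value(), record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void close() {
    }
}

It would then be appended to the sink's chain, e.g. "transforms": "topicRoute,unwrap,key,tableKey" with "transforms.tableKey.type": "com.example.kafka.transforms.TablePrefixedKey".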

Kafka Connect database source connector: how to copy data from a foreign key

I am using the JDBC source connector to move data from my Postgres database table to a Kafka topic. I have an orders table with a foreign key to the customers table via the customerNumber field.
Below is the connector, which copies the orders to Kafka but without the customers data in the JSON. I am looking for a way to construct the complete orders object, including the customers data, in the JSON.
The connector is:
{
"name": "SOURCE_CONNECTOR",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"connection.password": "postgres_pwd",
"transforms.cast.type": "org.apache.kafka.connect.transforms.Cast$Value",
"transforms.cast.spec": "amount:float64",
"tasks.max": "1",
"transforms": "cast,createKey,extractInt",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"batch.max.rows": "25",
"table.whitelist": "orders",
"mode": "bulk",
"topic.prefix": "data_",
"transforms.extractInt.field": "uuid",
"connection.user": "postgres_user",
"transforms.createKey.fields": "uuid",
"poll.interval.ms": "3600000",
"sql.quote.identifiers": "false",
"name": "SOURCE_CONNECTOR",
"numeric.mapping": "best_fit",
"connection.url": "url"
}
}
You can use query-based ingest. Just specify the query config option.
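For example, something like the snippet below (the customerName column is an assumption for illustration; adjust the SELECT to your schema). Note that query replaces table.whitelist (the two cannot be set together), and with a custom query the topic.prefix value becomes the full topic name:

"query": "SELECT o.*, c.customerName FROM orders o JOIN customers c ON o.customerNumber = c.customerNumber",
"mode": "bulk",
"topic.prefix": "data_orders_with_customers"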

Timestamp in Avro schema produces incompatible value validation in Kafka Connect JDBC

Error produced by JDBC sink connector:
org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type INT64: class java.util.Date for field: "some_timestamp_field"
at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:242)
at org.apache.kafka.connect.data.Struct.put(Struct.java:216)
at org.apache.kafka.connect.transforms.Cast.applyWithSchema(Cast.java:151)
at org.apache.kafka.connect.transforms.Cast.apply(Cast.java:107)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:38)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:480)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The Avro schema registered by the source JDBC connector (MySQL):
{
"type":"record",
"name":"ConnectDefault",
"namespace":"io.confluent.connect.avro",
"fields":[
...
{
"name":"some_timestamp_field",
"type":{
"type":"long",
"connect.version":1,
"connect.name":"org.apache.kafka.connect.data.Timestamp",
"logicalType":"timestamp-millis"
}
},
...
]
}
Looks like the exception is due to this code block: https://github.com/apache/kafka/blob/f0282498e7a312a977acb127557520def338d45c/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L239
So, in the Avro schema, the timestamp field is registered as INT64 with the correct (timestamp) logical type. But Connect reads the schema type as INT64 and compares it with the value type java.util.Date.
Is this a bug, or is there a workaround for this? Maybe I am missing something, as this looks like a standard Connect model.
Thanks in advance.
UPDATE
Sink connector config:
{
"name": "sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "topic",
"connection.url": "jdbc:postgresql://host:port/db",
"connection.user": "user",
"connection.password": "password",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://host:port",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://host:port",
"auto.create": "true",
"insert.mode": "upsert",
"pk.mode": "record_value",
"pk.fields": "id"
}
}
Deserialised data in Kafka:
{
"id":678148,
"some_timestamp_field":1543806057000,
...
}
We have worked out a workaround for the problem. Our goal was to convert the id from BIGINT to STRING (TEXT/VARCHAR) and save the record in the downstream DB.
But due to an issue (probably https://issues.apache.org/jira/browse/KAFKA-5891), casting the id field was not working: Kafka was also validating the timestamp fields in the casting chain, but was reading the schema type/name wrong, resulting in a type mismatch (see the record body and error log above).
So the workaround is as follows:
extract only the id field as the key -> run the cast transform on the key -> this works because the key does not contain a timestamp field.
Here is the workaround configuration:
{
"name": "sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "topic",
"connection.url": "jdbc:postgresql://host:port/db",
"connection.user": "user",
"connection.password": "password",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://host:port",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://host:port",
"transforms": "createKey,castKeyToString",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.castKeyToString.type": "org.apache.kafka.connect.transforms.Cast$Key",
"transforms.castKeyToString.spec": "id:string",
"auto.create": "true",
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "id"
}
}
Disclaimer: This is not a proper solution, just a workaround. The bug in the Cast transform should be fixed; in my opinion, the Cast transform should only concern itself with the fields designated for casting, not the other fields in the message.
Have a good day.

Confluent Kafka Connect Elasticsearch document ID creation

I'm using the Kafka Connect Elasticsearch connector to write data from a topic to an Elasticsearch index. Both the key and the value of the topic messages are in JSON format. The connector is unable to start because of the following error:
org.apache.kafka.connect.errors.DataException: MAP is not supported as the document id.
Following is the format of my messages (key | value):
{"key":"OKOK","start":1517241690000,"end":1517241695000} | {"measurement":"responses","count":9,"sum":1350.0,"max":150.0,"min":150.0,"avg":150.0}
And following is the body of the POST request I'm using to create the connector:
{
"name": "elasticsearch-sink-connector",
"config": {
"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics": "output-topic-elastic",
"connection.url": "http://elasticsearch:9200",
"type.name": "aggregator",
"schemas.enable": "false",
"topic.schema.ignore": "true",
"topic.key.ignore": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"key.ignore":"false",
"topic.index.map": "output-topic-elastic:aggregator",
"name": "elasticsearch-sink",
"transforms": "InsertKey",
"transforms.InsertKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.InsertKey.fields":"key"
}}
Any help would be really appreciated. I've found a similar question on Stack Overflow (ES document ID creation), but I've had no luck with the answers.
You also need ExtractField in there
"transforms": "InsertKey,extractKey",
"transforms.InsertKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.InsertKey.fields":"key",
"transforms.extractKey.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKey.field":"key"
Check out this post for more details.
