Debezium Outbox Event Router - error with nested objects in payload - apache-kafka-connect

I'm getting an error using the Debezium Outbox Event Router (Postgres connector) when the JSON payload has nested objects and I use the option expand.json.payload: true.
Payload example:
{"Id": "8767ee4f-ef77-4128-a4eb-9a8fc8490d64", "Data": {"Name": "John"}}
With this payload I get the following error in the connector:
Caused by: org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
Caused by: org.apache.avro.SchemaParseException: Can't redefine: io.confluent.connect.avro.ConnectDefault
If the payload has only simple properties, then there is no issue. Example:
{"Id": "8767ee4f-ef77-4128-a4eb-9a8fc8490d64", "Data": "John"}
As far as I understand, this has to do with the fact that the JSON expander does not create a named schema for the nested objects.
Is there any way to overcome this?
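For what it's worth, the collision can be illustrated outside Connect with the Avro converter's schema mapper. Below is a minimal sketch, assuming io.confluent:kafka-connect-avro-converter is on the classpath, with field names mirroring the example payload; it is an attempt to reproduce the naming clash, not Debezium's own code path:

import io.confluent.connect.avro.AvroData;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

public class ConnectDefaultCollision {
    public static void main(String[] args) {
        // Unnamed struct for the nested "Data" object, similar to what
        // expand.json.payload produces for a nested JSON object.
        Schema nested = SchemaBuilder.struct()
                .field("Name", Schema.OPTIONAL_STRING_SCHEMA)
                .build();

        // Unnamed struct holding the expanded payload fields.
        Schema outer = SchemaBuilder.struct()
                .field("Id", Schema.OPTIONAL_STRING_SCHEMA)
                .field("Data", nested)
                .build();

        // Both structs lack a schema name, so the Avro converter falls back to the
        // same default record name (io.confluent.connect.avro.ConnectDefault) for
        // each of them, and Avro refuses to "redefine" it:
        // org.apache.avro.SchemaParseException: Can't redefine: io.confluent.connect.avro.ConnectDefault
        new AvroData(10).fromConnectSchema(outer);
    }
}

In that sketch, giving either struct a distinct name (SchemaBuilder.struct().name(...)) should avoid the clash, which supports the suspicion that the expander leaves nested schemas unnamed.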

Related

Fluentd JSON string parsing with multiple data types in array

I am trying to set up a logging pipeline with Fluentd and Elasticsearch. One of my log patterns looks like the following:
{
  "key": "value",
  "inputs": [
    [
      "2023-01-16T04:45:12.238Z",
      {
        "type": "channel",
        "subtype": "profile",
        "data": {
          "firstName": "Customer"
        }
      }
    ]
  ]
}
The issue with this structure is that the first element of the inner array (the date) is a string. Whenever Fluentd tries to write the record to ES, it throws an exception with error code 400 and the following message:
#0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: illegal_argument_exception [reason]: 'can't merge a non object mapping [data.inputs] with an object mapping'" location=nil
What is the way forward?
When I remove the date from the array, it is synced to ES correctly.

Correct flatMapValues in Kafka Streams to split a single message based on a field value

Needing some guidance w.r.t. splitting messages in Kafka Streams.
I have a message whose value has fields like this:
{"name": "val1", "role": "val2"}
The key of the message is a String field, which we don't worry about here.
When the name field contains multiple values separated by a /, like {"name": "tom/dick/harry", "role": "manager"}, I want to detect those records in my stream and split or branch on that field, sending each resulting message to the output topic. So basically 1 message becomes 3 messages in this case:
{"name": "tom", "role": "manager"}
{"name": "dick", "role": "manager"}
{"name": "harry", "role": "manager"}
and send each of these to the output topic.
I have tried Kafka Streams' flatMapValues() and branch, but it doesn't work. Just looking for a one-line snippet or method I can use to achieve this.
Here is my code:
modifiedStream.filter((key, value) -> value.getPerformerName().contains("/"))
    .peek((key, value) -> log.info("Splitting this record to multiple ones..."))
    .flatMapValues(new ValueMapper<work_reg_performer_int, Iterable<?>>() {
        @Override
        public Iterable<?> apply(work_reg_performer_int value) {
            return Arrays.asList(value.getPerformerName().split("/"));
        }
    })
    .to("split_performers_topic");
Here is my consumer's stream config:
consumer:
  keySerde: org.apache.kafka.common.serialization.Serdes$StringSerde
  valueSerde: io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde
  startOffset: earliest
Running this code throws the exception stack below, which I think is because each performer name becomes its own message, without the rest of the record?
org.apache.kafka.streams.errors.StreamsException: ClassCastException while producing data to topic split_performers_topic. A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: io.confluent.kafka.streams.serdes.avro.SpecificAvroSerializer) is not compatible to the actual key or value type (key type: java.lang.String / value type: java.lang.String). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters (for example if using the DSL, `#to(String topic, Produced<K, V> produced)` with `Produced.keySerde(WindowedSerdes.timeWindowedSerdeFrom(String.class))`).
PS: I am using Spring Cloud stream for Kafka for this.
Can you update your problem statement with the Streams config values? From the error, you seem to be producing records with a String key and a String value, but the serdes configured for the destination topic are String for the key and an Avro specific record for the value.
Also, as per the code snippet you have hardcoded the output topic as "split_performers_topic", but your error complains about the producer topic APRADB_work_reg_performer. Not sure where the mismatch is coming from. Kindly check and confirm.
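If the goal is to keep producing Avro records to split_performers_topic, one way is to have flatMapValues emit copies of the original record, one per performer name, instead of bare Strings. A minimal sketch in the same style as the snippet above, assuming work_reg_performer_int is an Avro-generated class and therefore has a copying newBuilder(...) and a setPerformerName(...) setter:

// Requires java.util.Arrays and java.util.stream.Collectors.
modifiedStream
    .filter((key, value) -> value.getPerformerName().contains("/"))
    .peek((key, value) -> log.info("Splitting this record to multiple ones..."))
    .flatMapValues(value -> Arrays.stream(value.getPerformerName().split("/"))
        // Build a copy of the full Avro record for each performer name, so the
        // value type stays work_reg_performer_int rather than java.lang.String.
        .map(name -> work_reg_performer_int.newBuilder(value)
                .setPerformerName(name)
                .build())
        .collect(Collectors.toList()))
    .to("split_performers_topic");

With that change the values reaching the producer are still Avro specific records, so the SpecificAvroSerializer in the stack trace should no longer be handed plain Strings.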

PostgreSQL JDBC sink raises error: null (ARRAY) type doesn't have a mapping to the SQL database column type

I have a problem when trying to replicate my database using the Kafka JDBC sink. When I run my sink against a table which has an array data type in it, it gives this error:
...
Caused by: org.apache.kafka.connect.errors.ConnectException: null (ARRAY) type doesn't have a mapping to the SQL database column type
...
I want to retain the same array type and don't want to convert it into a string like I do for SQL Server (since SQL Server does not allow an array data type).
This is my connection config:
{"name" :"pgsink_'$topic_name'",
"config":{"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max":"1",
"topics":"'$table'",
"connection.url":"jdbc:postgresql://",
"connection.user":"",
"connection.password":"",
"transforms":"unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"delete.handling.mode":"drop",
"auto.create":"true",
"auto.evolve":"true",
"insert.mode":"upsert",
"pk.fields":" '$pk'",
"pk.mode":"record_key",
"delete.enabled":"true",
"destination.table.format":"public.'$table'",
"connection.attempts":"60",
"connection.backoff.ms":"100000"
}}
My Kafka source is Debezium; since I want to retain the same data types, I don't put an SMT into my source. This is the source config:
{
  "name": "pg_prod",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "wal2json_streaming",
    "database.hostname": "",
    "database.port": "",
    "database.user": "",
    "database.password": "",
    "database.dbname": "",
    "database.server.name": "",
    "database.history.kafka.bootstrap.servers": "",
    "database.history.kafka.topic": "",
    "transforms": "unwrap,reroute",
    "table.whitelist": "public.table",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.delete.handling.mode": "drop",
    "transforms.unwrap.drop.tombstones": "false",
    "decimal.handling.mode": "double",
    "time.precision.mode": "connect",
    "transforms.reroute.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.reroute.regex": "postgres.public.(.*)",
    "transforms.reroute.replacement": "$1",
    "errors.tolerance": "all",
    "errors.log.enable": true,
    "errors.log.include.messages": true,
    "kafkaPartition": "0",
    "snapshot.delay.ms": "1000",
    "schema.refresh.mode": "columns_diff_exclude_unchanged_toast"
  }
}
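For context, the sink is complaining about the Connect schema type rather than the Postgres type itself: Debezium maps a Postgres array column to a Connect ARRAY schema, and the sink's dialect has no SQL column type to map it to. A minimal sketch of the schema shape the sink receives (field and record names here are hypothetical):

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

public class ArrayColumnShape {
    public static void main(String[] args) {
        // Roughly what a Postgres text[] column looks like after Debezium and
        // ExtractNewRecordState: a struct field whose schema type is ARRAY.
        Schema value = SchemaBuilder.struct()
                .name("public.table.Value")   // hypothetical record name
                .field("id", Schema.INT64_SCHEMA)
                .field("tags", SchemaBuilder.array(Schema.OPTIONAL_STRING_SCHEMA).optional().build())
                .build();

        // It is this ARRAY schema type that the JDBC sink cannot map to a SQL
        // column type; the "null" in the error message is presumably the
        // (absent) schema name of the array field.
        System.out.println(value.field("tags").schema().type()); // prints ARRAY
    }
}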

Publishing Avro messages using Kafka REST Proxy throws "Conversion of JSON to Avro failed"

I am trying to publish a message which has a union for one field, as:
{
  "name": "somefield",
  "type": [
    "null",
    {
      "type": "array",
      "items": {
        "type": "record",
Publishing the message using the Kafka REST Proxy keeps throwing the following error when somefield has a populated array:
{
  "error_code": 42203,
  "message": "Conversion of JSON to Avro failed: Failed to convert JSON to Avro: Expected start-union. Got START_ARRAY"
}
Same schema with somefield: null is working fine.
The Java classes are built in the Spring Boot project from the Avro schemas using the Gradle plugin. When I publish a message with the array populated using the generated Java classes and Spring's KafkaTemplate, the message is published correctly with the correct schema (the schema is taken from the generated Avro specific record). When I copy the same JSON value and schema and publish via the REST proxy, it fails with the above error.
I have these content types in the API call
accept:application/vnd.kafka.v2+json, application/vnd.kafka+json, application/json
content-type:application/vnd.kafka.avro.v2+json
What am I missing here? Any pointers to troubleshoot the issue are appreciated.
The messages I tested were:
{
  "somefield": null
}
and
{
  "somefield": [
    {"field1": "hello"}
  ]
}
However, it should instead be passed as:
{
  "somefield": {
    "array": [
      {"field1": "hello"}
    ]
  }
}
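The wrapping requirement comes from Avro's JSON encoding of unions, which is what the REST proxy relies on when converting the request body. A minimal sketch, assuming org.apache.avro is on the classpath and using a cut-down stand-in for the real schema, showing the two encodings side by side:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

public class UnionJsonEncoding {
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Msg\",\"fields\":[{\"name\":\"somefield\",\"type\":[\"null\","
        + "{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"Item\","
        + "\"fields\":[{\"name\":\"field1\",\"type\":\"string\"}]}}]}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);

        // Avro's JSON encoding names the union branch explicitly, so this parses:
        GenericRecord ok = reader.read(null, DecoderFactory.get().jsonDecoder(schema,
                "{\"somefield\": {\"array\": [{\"field1\": \"hello\"}]}}"));
        System.out.println(ok);

        // The bare array fails with an error along the lines of
        // "Expected start-union. Got START_ARRAY":
        reader.read(null, DecoderFactory.get().jsonDecoder(schema,
                "{\"somefield\": [{\"field1\": \"hello\"}]}"));
    }
}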

Kafka Connect Elasticsearch sink - using if-else blocks to extract and transform fields for different topics

I have a Kafka Elasticsearch sink properties file like the following:
name=elasticsearch.sink.direct
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=16
topics=data.my_setting
connection.url=http://dev-elastic-search01:9200
type.name=logs
topic.index.map=data.my_setting:direct_my_setting_index
batch.size=2048
max.buffered.records=32768
flush.timeout.ms=60000
max.retries=10
retry.backoff.ms=1000
schema.ignore=true
transforms=InsertKey,ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=MY_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=MY_SETTING_ID
This works perfectly for a single topic (data.my_setting). I would like to use the same connector for data coming in from more than one topic. A message in a different topic will have a different key which I'll need to transform. I was wondering if there's a way to use if-else statements with a condition on the topic name, or on a single field in the message, so that I can transform the key differently. All the incoming messages are JSON with schema and payload.
UPDATE based on the answer:
In my JDBC connector I add the key as follows:
name=data.my_setting
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
poll.interval.ms=500
tasks.max=4
mode=timestamp
query=SELECT * FROM MY_TABLE with (nolock)
timestamp.column.name=LAST_MOD_DATE
topic.prefix=investment.ed.data.app_setting
transforms=ValueToKey
transforms.ValueToKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.ValueToKey.fields=MY_SETTING_ID
However, I still get this error when a message produced by this connector is read by the Elasticsearch sink:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
Caused by: org.apache.kafka.connect.errors.DataException: STRUCT is not supported as the document id
The payload looks like this:
{
  "schema": {
    "type": "struct",
    "fields": [{
      "type": "int32",
      "optional": false,
      "field": "MY_SETTING_ID"
    }, {
      "type": "string",
      "optional": true,
      "field": "MY_SETTING_NAME"
    }],
    "optional": false
  },
  "payload": {
    "MY_SETTING_ID": 9,
    "MY_SETTING_NAME": "setting_name"
  }
}
The Connect standalone properties file looks like this:
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/apps/{env}/logs/infrastructure/offsets/connect.offsets
rest.port=8084
plugin.path=/usr/share/java
Is there a way to achieve my goal, which is to have messages from multiple topics (in my case DB tables), each with its own unique id (which will also be the id of the document in ES), sent to a single ES sink?
Can I use Avro for this task? Is there a way to define the key in the Schema Registry, or will I run into the same problem?
This isn't possible. You'd need multiple Connectors if the key fields are different.
One option to think about is pre-processing your Kafka topics through a stream processor (e.g. Kafka Streams, KSQL, Spark Streaming, etc.) to standardise the key fields, so that you can then use a single connector. It depends on what you're building as to whether this would be worth doing, or overkill.
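If the pre-processing route looks worthwhile, here is a minimal Kafka Streams sketch of it. Topic and field names below are hypothetical, and it uses the JsonNode serializer/deserializer from connect-json since the messages are JSON with schema and payload; it simply re-keys a second topic onto its own id field so that every topic reaches the sink with a plain string key:

import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.connect.json.JsonDeserializer;
import org.apache.kafka.connect.json.JsonSerializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class RekeyForElasticsearch {
    public static void main(String[] args) {
        Serde<JsonNode> jsonSerde = Serdes.serdeFrom(new JsonSerializer(), new JsonDeserializer());

        StreamsBuilder builder = new StreamsBuilder();
        // Hypothetical second topic whose id lives in a differently named field.
        builder.stream("data.other_setting", Consumed.with(Serdes.String(), jsonSerde))
               .selectKey((key, value) -> value.get("payload").get("OTHER_SETTING_ID").asText())
               .to("data.other_setting.rekeyed", Produced.with(Serdes.String(), jsonSerde));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rekey-for-es");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}

The sink would then read the rekeyed topics and could drop the InsertKey/ExtractId transforms, since the record key is already a plain string usable as the document id.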
