Why does SetSchemaMetadata SMT work for $Value but not for $Key in my Kafka Connect connector? - jdbc

I've spent a few hours trying to work out why the SetSchemaMetadata SMT works for my value but not for my key, even though Kafka Connect detects that the SMT plugin exists for the key (i.e. no error). I have also set auto.register.schemas to true to see what schemas it registers in the Schema Registry: the value schema has the correct namespace, but the key schema has the default values (i.e. it is not being set by the SMT).
Here are my connector configs:
{
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"connection.url": "jdbc:postgresql://host.docker.internal:5432/db_name",
"connection.user": "postgres",
"connection.password": "12345",
"table.whitelist": "file",
"mode": "timestamp+incrementing",
"incrementing.column.name": "seq",
"timestamp.column.name": "last_modified_datetime",
"topic.prefix": "jdbc-source-inserted_or_updated_",
"validate.non.null": true,
"timestamp.granularity": "nanos_long",
"poll.interval.ms": "100",
"auto.register.schemas": "true",
"transforms": "AddNamespaceForKey, AddNamespaceForValue, copyFieldToKey",
"transforms.AddNamespaceForKey.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Key",
"transforms.AddNamespaceForKey.schema.name": "io.JDBC_Source_Inserted_Or_Updated_File_Key.JDBC_Source_Inserted_Or_Updated_File_Key",
"transforms.AddNamespaceForValue.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.AddNamespaceForValue.schema.name": "io.JDBC_Source_Inserted_Or_Updated_File_Value.JDBC_Source_Inserted_Or_Updated_File_Value",
"transforms.copyFieldToKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.copyFieldToKey.fields":"name",
"errors.log.enable":true,
"errors.log.include.messages":true
}
Here are the generated schemas:
Key:
{
"fields": [
{
"name": "name",
"type": "string"
}
],
"name": "ConnectDefault",
"namespace": "io.confluent.connect.avro",
"type": "record"
}
Value:
{
"connect.name": "io.JDBC_Source_Inserted_Or_Updated_File_Value.JDBC_Source_Inserted_Or_Updated_File_Value",
"fields": [
{
"name": "seq",
"type": "int"
},
{
"name": "name",
"type": "string"
},
{
"name": "url",
"type": "string"
},
{
"name": "last_modified_datetime",
"type": "long"
},
{
"name": "modified_at",
"type": "long"
},
{
"name": "created_at",
"type": "long"
}
],
"name": "JDBC_Source_Inserted_Or_Updated_File_Value",
"namespace": "io.JDBC_Source_Inserted_Or_Updated_File_Value",
"type": "record"
}
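One thing that may be worth checking, though it is not confirmed against this exact setup: Kafka Connect applies SMTs in the order they appear in transforms, so copyFieldToKey (ValueToKey) runs after AddNamespaceForKey and builds a brand-new, unnamed key struct from the value's name field. The AvroConverter then registers that unnamed struct under the default io.confluent.connect.avro.ConnectDefault name, which matches the key schema shown above. A sketch of the same transform section with the key-copy listed before the rename, all other settings unchanged:
"transforms": "copyFieldToKey, AddNamespaceForKey, AddNamespaceForValue",
"transforms.copyFieldToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.copyFieldToKey.fields": "name",
"transforms.AddNamespaceForKey.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Key",
"transforms.AddNamespaceForKey.schema.name": "io.JDBC_Source_Inserted_Or_Updated_File_Key.JDBC_Source_Inserted_Or_Updated_File_Key",
"transforms.AddNamespaceForValue.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.AddNamespaceForValue.schema.name": "io.JDBC_Source_Inserted_Or_Updated_File_Value.JDBC_Source_Inserted_Or_Updated_File_Value",
With that ordering, SetSchemaMetadata$Key renames the key schema that ValueToKey has already produced instead of being overwritten by it.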

Related

How can I transform a field of a JSON file with NiFi?

Good morning,
I am new to NiFi and I want to modify a field in a JSON file (I am using NiFi v1.12.0) and save it to another path.
This is an example of my JSON file:
{
"Id": "2b2ef24a-f3ce-4249-ad92-db9a565b5b66",
"Tipo": "AuditEvent",
"SubTipo": "Plataforma",
"Accion": "Audit.Middleware.EventData.HttpResponseSentEvent",
"IDCorrelacion": "7af48a20-587d-4e60-9c3b-02cc6a074662",
"TiempoEvento": "2020-07-30 11:45:08.315",
"Resultado": "No informado",
"ResultadoDesc": "No informado",
"Origen": {
"IDOrigen": "132403308182038429",
"Tipo": "Backend",
"Aplicacion": "fabric:/Omnicanalidad.Canales.Muro_v1",
"Servicio": "fabric:/Omnicanalidad.Canales.Muro_v1/Muro",
"Maquina": "ibsfabbe02pru",
"IP": "ibsfabbe02pru"
},
"OrigenInterno": "Audit.Middleware.AuditMiddleware",
"Agente": {
"Rol": "Sin rol asignado",
"IDUsuario": "1428",
"AltIDUsuario": "20141115",
"Localizador": "197.183.27.17",
"PropositoUso": "No informado",
"IDSession": "",
"XForwardedPort": "443",
"XForwardedFor": "162.37.0.100:30279, 162.37.0.5:10158, 172.37.0.5",
"XForwardedHost": "ebeprate.es",
"XForwardedProto": "https",
"XOriginalURL": "/test/v1/Relation/ObtieneGestor?IdUser=4355625&NiciTitular=43485326",
"XOriginalHost": "ebeprate.es",
"Referer": null,
"AuthenticationType": "AuthenticationTypes.Federation",
"UserAgent": "HttpApplicationGateway",
"Claims": "Hello World",
"AcceptedLanguage": null
},
"DatosEvento": {
"Headers": ["Content-Length: 0", "Request-Context: appId=cid-v1:d8b40be1-4838-4a94-a4f8-3ec374989b27"],
"StatusCode": 204,
"Body": ""
}
}
I want to modify the field TiempoEvento from a date string to a timestamp.
In this case, 2020-07-30 11:45:08.315 should be converted to 1596109508.
So I used this procedure:
1.- I used the GetFile processor to pick up the file. I configured its properties without any problems and everything is OK.
2.- I used the UpdateRecord processor to modify the field (this is where the problems appear).
In its properties I have 3 properties:
I read that I need to configure a schema registry if I want to work with any data in NiFi (I don't know if that is entirely true). Since I am working with a JSON file, I assumed I need one, so I set it up.
In the controller services I configured a JsonPathReader, a JsonRecordSetWriter and an AvroSchemaRegistry.
I started with the AvroSchemaRegistry.
SETTING
Name: Test
PROPERTIES
Validate Field Names -> true
test-schema ->
{
"name": "MyFirstNiFiTest",
"type": "record",
"namespace": "test.example",
"fields": [
{
"name": "Id",
"type": "string"
},
{
"name": "Tipo",
"type": "string"
},
{
"name": "SubTipo",
"type": "string"
},
{
"name": "Accion",
"type": "string"
},
{
"name": "IDCorrelacion",
"type": "string"
},
{
"name": "TiempoEvento",
"type": "string"
},
{
"name": "Resultado",
"type": "string"
},
{
"name": "ResultadoDesc",
"type": "string"
},
{
"name": "Origen",
"type": {
"name": "Origen",
"type": "record",
"fields": [
{
"name": "IDOrigen",
"type": "string"
},
{
"name": "Tipo",
"type": "string"
},
{
"name": "Aplicacion",
"type": "string"
},
{
"name": "Servicio",
"type": "string"
},
{
"name": "Maquina",
"type": "string"
},
{
"name": "IP",
"type": "string"
}
]
}
},
{
"name": "OrigenInterno",
"type": "string"
},
{
"name": "Agente",
"type": {
"name": "Agente",
"type": "record",
"fields": [
{
"name": "Rol",
"type": "string"
},
{
"name": "IDUsuario",
"type": "string"
},
{
"name": "AltIDUsuario",
"type": "string"
},
{
"name": "Localizador",
"type": "string"
},
{
"name": "PropositoUso",
"type": "string"
},
{
"name": "IDSession",
"type": "string"
},
{
"name": "XForwardedPort",
"type": "string"
},
{
"name": "XForwardedFor",
"type": "string"
},
{
"name": "XForwardedHost",
"type": "string"
},
{
"name": "XForwardedProto",
"type": "string"
},
{
"name": "XOriginalURL",
"type": "string"
},
{
"name": "XOriginalHost",
"type": "string"
},
{
"name": "Referer",
"type": [
"string",
"null"
]
},
{
"name": "AuthenticationType",
"type": [
"string",
"null"
]
},
{
"name": "UserAgent",
"type": "string"
},
{
"name": "Claims",
"type": "string"
},
{
"name": "AcceptedLanguage",
"type": [
"string",
"null"
]
}
]
}
},
{
"name": "DatosEvento",
"type": {
"name": "DatosEvento",
"type": "record",
"fields": [
{
"name": "Name",
"type": "string"
},
{
"name": "Category",
"type": "string"
},
{
"name": "EventType",
"type": "int"
},
{
"name": "Id",
"type": "int"
},
{
"name": "ApiName",
"type": "string"
},
{
"name": "Token",
"type": "string"
},
{
"name": "ApiScopes",
"type": {
"type": "array",
"items": "string"
}
},
{
"name": "TokenScopes",
"type": {
"type": "array",
"items": "string"
}
},
{
"name": "Message",
"type": "string"
},
{
"name": "ActivityId",
"type": "string"
},
{
"name": "TimeStamp",
"type": "int",
"logicalType": "date"
},
{
"name": "ProcessId",
"type": "int"
},
{
"name": "LocalIpAddress",
"type": "string"
},
{
"name": "RemoteIpAddress",
"type": "string"
}
]
}
}
]
}
I converted the JSON file to the Avro schema above.
I enabled it and everything is OK.
Then I configured the JsonRecordSetWriter:
SETTING
Name: TestRecordSetWriter
PROPERTIES
I enabled it and everything is OK.
Then I configured the JsonPathReader:
SETTING
Name: TestPathReader
PROPERTIES
At this point I get an alert that says:
'JSON Paths' is invalid because No JSON Paths were specified
so I can't enable this controller service, and I don't know what I am missing.
I don't know if there is an easier way to do this, or if I am completely on the wrong track, so I need some help.
Thank you.
I found the answer: I had a bad configuration in the JsonPathReader, because I had not added the record fields of the schema as properties.
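For reference, a minimal sketch of what the JsonPathReader configuration could look like once those record properties are added; the controller-service names and paths below are assumptions based on the schema above, not the poster's actual settings. Each field of the schema is added as a user-defined property whose value is the JsonPath expression used to read it from the incoming JSON:
SETTING
Name: TestPathReader
PROPERTIES
Schema Access Strategy -> Use 'Schema Name' Property
Schema Registry -> Test (the AvroSchemaRegistry above)
Schema Name -> test-schema
Id -> $.Id
Tipo -> $.Tipo
SubTipo -> $.SubTipo
Accion -> $.Accion
IDCorrelacion -> $.IDCorrelacion
TiempoEvento -> $.TiempoEvento
Resultado -> $.Resultado
ResultadoDesc -> $.ResultadoDesc
Origen -> $.Origen
OrigenInterno -> $.OrigenInterno
Agente -> $.Agente
DatosEvento -> $.DatosEvento
For the date-to-timestamp conversion itself, one possible (untested) UpdateRecord setup is Replacement Value Strategy -> Literal Value with a property such as:
/TiempoEvento -> ${field.value:toDate('yyyy-MM-dd HH:mm:ss.SSS'):toNumber():divide(1000)}
which parses the string as a date and emits epoch seconds; the format pattern is an assumption based on the sample value shown above.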

How to solve Kafka Connect JSONConverter "Schema must contain 'type' field"

I am trying to push a message to the JDBC sink; the message is as below:
{
"schema": {
"type": "struct",
"fields": [{
"field": "ID",
"type": {
"type": "bytes",
"scale": 0,
"precision": 64,
"connect.version": 1,
"connect.parameters": {
"scale": "0"
},
"connect.name": "org.apache.kafka.connect.data.Decimal",
"logicalType": "decimal"
}
}, {
"field": "STORE_DATE",
"type": ["null", {
"type": "long",
"connect.version": 1,
"connect.name": "org.apache.kafka.connect.data.Timestamp",
"logicalType": "timestamp-millis"
}],
"default": null
}, {
"field": "DATA",
"type": ["null", "string"],
"default": null
}],
"name": "KAFKA_STREAM"
},
"payload": {
"ID": 17,
"STORE_DATE": null,
"DATA": "THIS IS TEST DATA"
}
}
but it keeps throwing the error Caused by: org.apache.kafka.connect.errors.DataException: Schema must contain 'type' field.
This is the connector configuration I am currently using:
{
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"topics": "DEV_KAFKA_STREAM",
"connection.url": "url",
"connection.user": "user",
"connection.password": "password",
"insert.mode": "insert",
"table.name.format": "KAFKA_STREAM",
"pk.fields": "ID",
"auto.create": "false",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
}
I am not sure how to debug this or how to find the root cause, as the JSON does have a type field.
From what I can tell, "long" is not a valid schema type here.
You want "int64".
JSON Schema source code
You may also want to remove the unions; there's an optional key to designate nullable fields.
Kafka Connect JDBC sink connector not working
If you're creating that JSON in Java, you should use SchemaBuilder and the Envelope class around two JsonNode objects to make sure you correctly build the payload.
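Putting that answer together, a hedged sketch of the same envelope rewritten in the JsonConverter's schema dialect, i.e. "int64" instead of "long", an "optional" flag instead of Avro-style unions, and no logicalType keys; this is an illustration, not a payload tested against that sink:
{
  "schema": {
    "type": "struct",
    "fields": [
      { "field": "ID", "type": "bytes", "optional": false, "name": "org.apache.kafka.connect.data.Decimal", "version": 1, "parameters": { "scale": "0" } },
      { "field": "STORE_DATE", "type": "int64", "optional": true, "name": "org.apache.kafka.connect.data.Timestamp", "version": 1 },
      { "field": "DATA", "type": "string", "optional": true }
    ],
    "optional": false,
    "name": "KAFKA_STREAM"
  },
  "payload": {
    "ID": "EQ==",
    "STORE_DATE": null,
    "DATA": "THIS IS TEST DATA"
  }
}
Note that with the default decimal handling the Decimal payload value is expected as the Base64-encoded unscaled bytes ("EQ==" is 17 at scale 0) rather than the plain number 17; if that is inconvenient, it may be simpler to drop the Decimal logical type and declare ID as a plain "int64".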

How to set TAG when sending data from Kafka to Influxdb

I'm using the Confluent InfluxDB sink connector to send data to InfluxDB from Kafka. The configuration looks like this:
connector.class=io.confluent.influxdb.InfluxDBSinkConnector
influxdb.url=myurl
topics=mytopic
tasks.max=1
The schema looks like this:
{
"type": "record",
"name": "myrecord",
"fields": [
{
"name": "sn",
"type": "string"
},
{
"name": "value",
"type": "float"
},
{
"name": "tagnum",
"type": "string"
}
]
}
When the data is sent from Kafka to InfluxDB, every data item is treated as a FIELD.
How can I make the InfluxDB connector write some of the items as a TAG instead, for example setting "tagnum" as a TAG?
Your schema would look like this. The important thing is that your tags are in a map field.
{
"type": "struct",
"fields": [
{ "type": "map", "keys": { "type": "string", "optional": false }, "values": { "type": "string", "optional": false }, "optional": false, "field": "tags" },
{ "field": "sn", "optional": false, "type": "string" },
{ "field": "value", "optional": false, "type": "float" }
],
"optional": false,
"version": 1
}
Here's an example sending a JSON payload:
kafkacat -b localhost:9092 -P -t testdata-json4 <<EOF
{ "schema": { "type": "struct", "fields": [ { "type": "map", "keys": { "type": "string", "optional": false }, "values": { "type": "string", "optional": false }, "optional": false, "field": "tags" }, { "field": "sn", "optional": false, "type": "string" }, { "field": "value", "optional": false, "type": "float" } ], "optional": false, "version": 1 }, "payload": { "tags": { "tagnum": "5" }, "sn": "FOO", "value": 500.0 } }
EOF
curl -i -X PUT -H "Accept:application/json" \
-H "Content-Type:application/json" http://localhost:8083/connectors/SINK_INFLUX_01/config \
-d '{
"connector.class" : "io.confluent.influxdb.InfluxDBSinkConnector",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"key.converter" : "org.apache.kafka.connect.storage.StringConverter",
"topics" : "testdata-json4",
"influxdb.url" : "http://influxdb:8086",
"influxdb.db" : "my_db",
"measurement.name.format" : "${topic}"
}'
The result in InfluxDB:
$ influx -execute 'SELECT * FROM "testdata-json4" GROUP BY tagnum;' -database "my_db"
name: testdata-json4
tags: tagnum=5
time sn value
---- -- -----
1579713749737000000 FOO 500
Ref: https://rmoff.net/2020/01/23/notes-on-getting-data-into-influxdb-from-kafka-with-kafka-connect/

Kafka JDBC source not showing numeric values

I am deploying a Kafka Connect JDBC source connector. It connects to the database properly, but the result I am getting is this:
{
"schema": {
"type": "struct",
"fields": [
{
"type": "bytes",
"optional": false,
"name": "org.apache.kafka.connect.data.Decimal",
"version": 1,
"parameters": {
"scale": "0"
},
"field": "ID"
},
{
"type": "bytes",
"optional": false,
"name": "org.apache.kafka.connect.data.Decimal",
"version": 1,
"parameters": {
"scale": "0"
},
"field": "TENANT_ID"
},
{
"type": "bytes",
"optional": false,
"name": "org.apache.kafka.connect.data.Decimal",
"version": 1,
"parameters": {
"scale": "0"
},
"field": "IS_ACTIVE"
},
{
"type": "int64",
"optional": false,
"name": "org.apache.kafka.connect.data.Timestamp",
"version": 1,
"field": "CREATION_DATE"
},
{
"type": "int64",
"optional": true,
"name": "org.apache.kafka.connect.data.Timestamp",
"version": 1,
"field": "LAST_LOGIN"
},
{
"type": "string",
"optional": true,
"field": "NAME"
},
{
"type": "string",
"optional": true,
"field": "MOBILEPHONE"
},
{
"type": "string",
"optional": true,
"field": "EMAIL"
},
{
"type": "string",
"optional": true,
"field": "USERNAME"
},
{
"type": "string",
"optional": true,
"field": "PASSWORD"
},
{
"type": "string",
"optional": true,
"field": "EXTERNAL_ID"
}
],
"optional": false
},
"payload": {
"ID": "fdo=",
"TENANT_ID": "Uw==",
"IS_ACTIVE": "AQ==",
"CREATION_DATE": 1548987456000,
"LAST_LOGIN": 1557401837030,
"NAME": " ",
"MOBILEPHONE": " ",
"EMAIL": " ",
"USERNAME": "ES00613751",
"PASSWORD": " ",
"EXTERNAL_ID": " "
}
}
As you can see, the numeric and timestamp values are not shown in a readable form: the Decimal fields come out as Base64-encoded bytes and the timestamps as epoch milliseconds.
The config:
name=jdbc-teradata-source-connector
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=...
numeric.maping=best_fit
topic.prefix=test-2
mode=timestamp+incrementing
timestamp.column.name=LAST_LOGIN
incrementing.column.name=ID
topic=test-jdbc-oracle-source
The numeric mapping does not work, since this is Confluent 3.2.2.
I have also tried to cast the numbers to NUMERIC, but that does not work either.
Add numeric.mapping to your connector config:
"numeric.mapping": "best_fit"
You can see the full explanation here.
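For what it's worth, the config shown in the question spells the key numeric.maping (missing a "p"), so the connector would treat it as an unknown property and ignore it. A sketch of the relevant part of the source config with the corrected key, assuming a connector version recent enough to support numeric.mapping:
name=jdbc-teradata-source-connector
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=...
numeric.mapping=best_fit
mode=timestamp+incrementing
timestamp.column.name=LAST_LOGIN
incrementing.column.name=ID
topic.prefix=test-2
With best_fit, DECIMAL/NUMERIC columns whose precision and scale fit a primitive type are emitted as integers or doubles instead of the Base64-encoded Decimal bytes shown above; if the installed kafka-connect-jdbc plugin predates numeric.mapping, it would need to be upgraded first.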

Avro schema evolution: extending an existing array

Standard Avro schema evolution examples show adding a new field with a default value to a record. But what if your old schema had an array of records and you want to add a new field to the records in that array?
For example, given an array of records:
{
"type": "array",
"items": {
"name": "Loss",
"type": "record",
"fields": [
{
"name": "lossTotalAmount",
"type": [ "null", "string" ],
"default": null
},
{
"name": "lossType",
"type": [ "null", "string" ],
"default": null
},
{
"name": "lossId",
"type": [ "null", "string" ],
"default": null
},
{
"name": "vehicleLossCode",
"type": [ "null", "string" ],
"default": null
}
]
}
}
Adding a new field claimNumber:
{
"type": "array",
"items": {
"name": "Loss",
"type": "record",
"fields": [
{
"name": "lossTotalAmount",
"type": [ "null", "string" ],
"default": null
},
{
"name": "lossType",
"type": [ "null", "string" ],
"default": null
},
{
"name": "lossId",
"type": [ "null", "string" ],
"default": null
},
{
"name": "vehicleLossCode",
"type": [ "null", "string" ],
"default": null
},
{
"name": "claimNumber",
"type": [ "null", "string" ],
"default": null
}
]
}
}
Using the regular technique doesn't seem to work, as I start running into deserialization exceptions. Is there a different way to extend existing arrays in Avro, or is it impossible?
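No answer is recorded here, but one way to narrow this down, assuming the schemas are registered in Confluent Schema Registry (the subject name losses-value below is made up for the example), is to ask the registry whether the extended schema is still compatible with the latest registered version before producing with it:
# POST the new schema (escaped as a JSON string) against the latest version of the subject
curl -s -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "<the extended array schema above, escaped as a JSON string>"}' \
  http://localhost:8081/compatibility/subjects/losses-value/versions/latest
# => {"is_compatible":true} means old records can still be read with the new schema under the subject's compatibility level
If the new field is added with "default": null as shown, a common cause of deserialization errors is decoding old data with only the new schema instead of resolving the old (writer) schema against the new (reader) schema, rather than a limitation of Avro array evolution itself.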
