I am trying to use kafka-connect-jdbc to load data in incremental mode using a query. Data is getting loaded into the topic. The key column is converted to Numeric(38,0), but it is getting truncated in the topic. I suspect that some conversion is happening. The values run from 0 to 127, and after that each digit is repeated.
Thanks in advance
Please add more details on what the key looks like in the topic.
As far as the Confluent JDBC Source connector goes, there have been problems mapping the precision appropriately.
This blog sheds more light on the specifics of the Oracle numeric type with respect to Kafka Connect.
Based on the problem description, I suggest using the property below in the source properties file:
numeric.mapping=best_fit
Hello everyone. I'm learning about some NiFi processors.
I want to obtain all the data of several tables automatically.
So I used a ListDatabaseTables processor to get the names of the tables in a specific catalog.
After that, I used other processors, such as GenerateTableFetch and ReplaceText, to generate the queries. Everything works perfectly up to here.
Finally, the ExecuteSQL processor comes into play, and here an error is displayed. It says that a datetime column cannot be converted to Avro format.
The problem is that there are several tables, so specifying those columns in order to cast them would be complicated.
Is there a possible solution to fix the error?
The connection is with Microsoft SQL Server.
Here is the image of my flow :
I use Flink to read data from Kafka using FlinkKafkaConsumer, then convert the DataStream to a table, and finally sink the data back to Kafka (a Kafka connector table) with Flink SQL. To get exactly-once delivery guarantees, I set the property sink.semantic=exactly-once on the Kafka table.
When testing, I got the error "transaction timeout is larger than the maximum value allowed by the broker".
Flink's default Kafka producer transaction timeout: 1h
Kafka's default setting is transaction.max.timeout.ms=900000.
So I need to add the "transaction.timeout.ms" property to the Kafka producer. My question is: where can I add this property when using Flink SQL?
My code:
tableEnv.executeSql("INSERT INTO sink_kafka_table select * from source_table")
I know how to do it with the Table API:
tableEnv.connect(new Kafka()
    .version("")
    .topic("")
    .property("bootstrap.servers","")
    .property("transaction.timeout.ms","120000"))
    .withSchema()
    .withFormat()
    .createTemporaryTable("sink_table")
table.executeInsert("sink_table")
Modifying the Kafka config file is not a good option for me.
Any advice will help, thanks in advance.
Using the connector declaration https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/common/#connector-tables you can use the .option method to set any properties.* option; these are forwarded to the Kafka client with the properties. prefix stripped. So you'll need to set properties.transaction.timeout.ms (the producer-side setting) to a value no larger than the broker's transaction.max.timeout.ms.
You can also create the sink_table with an SQL DDL statement passing any configuration using the properties.* option as well: https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/kafka/#properties
I'm not familiar with how you are creating the table, but I think that method was deprecated and then removed in 1.14: https://nightlies.apache.org/flink/flink-docs-release-1.13/api/java/org/apache/flink/table/api/TableEnvironment.html#connect-org.apache.flink.table.descriptors.ConnectorDescriptor- The method's comment recommends creating the table by executing a SQL DDL statement.
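The DDL route could look roughly like the sketch below. It only assembles the CREATE TABLE text in Python so the options can be inspected; the resulting string would then be passed to executeSql. The column list, topic name, broker address and the json format are placeholders that are not from the question, and 120000 ms is the value from the asker's Table API snippet. The option name sink.semantic matches the connector version used in the question; I believe newer releases call it sink.delivery-guarantee, so check the docs for your version.

```python
def kafka_sink_ddl(table, topic, bootstrap_servers, txn_timeout_ms):
    """Build a Flink SQL CREATE TABLE statement for an exactly-once Kafka sink.

    Keys starting with 'properties.' are handed to the Kafka client with that
    prefix removed, so the producer sees 'transaction.timeout.ms'.
    """
    options = {
        "connector": "kafka",
        "topic": topic,                                    # placeholder
        "properties.bootstrap.servers": bootstrap_servers,  # placeholder
        "properties.transaction.timeout.ms": str(txn_timeout_ms),
        "format": "json",                                  # assumption
        "sink.semantic": "exactly-once",
    }
    with_clause = ",\n".join(f"  '{k}' = '{v}'" for k, v in options.items())
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT,\n"        # hypothetical columns
        "  payload STRING\n"
        ") WITH (\n"
        f"{with_clause}\n"
        ")"
    )


ddl = kafka_sink_ddl("sink_kafka_table", "my_topic", "localhost:9092", 120000)
print(ddl)
```

With 120000 ms the producer timeout stays below the broker default of 900000 ms mentioned in the question, so no broker config change is needed.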
I have created a few topics using the Kafka Postgres source connector. Their names are as below:
server1.public.table1
server1.public.table2
server1.public.table3
I am using the JDBC sink connector to load this data into Postgres at a different location. I want the Postgres tables to be created with the names "table1", "table2" and "table3".
I used the properties below in the sink.json file:
"topics.regex": "server1.public.(.*)",
"transforms": "route",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$3",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter"
but they are not working. How can I pass the output of the above transforms to "table.name.format"?
Or is there another method to do that?
I know it's been a long time, but I ran into the same issue as you.
It worked for me with almost the same regex as yours, but with just one backslash:
([^.]+)\.([^.]+)\.([^.]+)
I recommend using a regex editor to check that it matches! I hope this helps someone, since it's been 2 years without an answer 😋
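Instead of a regex editor, the pattern can also be checked with a few lines of Python. This is only a sketch of the matching logic, not the connector itself: the route function is a hypothetical stand-in that approximates RegexRouter by renaming a topic only when the whole name matches. It also shows what the doubled backslash from sink.json turns into once the JSON is decoded. Note that Python writes the third group as \3 where the connector config uses $3.

```python
import json
import re

# The value as written in sink.json; JSON decoding turns each "\\" into one "\".
raw_json = r'{"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)"}'
pattern = json.loads(raw_json)["transforms.route.regex"]


def route(topic, regex, replacement):
    """Hypothetical stand-in for RegexRouter: rename only on a full match."""
    m = re.fullmatch(regex, topic)
    return m.expand(replacement) if m else topic


topics = [
    "server1.public.table1",
    "server1.public.table2",
    "server1.public.table3",
]
renamed = [route(t, pattern, r"\3") for t in topics]

print(pattern)
print(renamed)
```

As far as I know, the JDBC sink's table.name.format defaults to ${topic}, so the renamed topic is what ends up as the table name and nothing has to be passed along explicitly.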
I have installed Kafka Connect using confluent-4.0.0.
Using the HDFS connector I am able to save Avro records received from a Kafka topic to Hive.
I would like to know if there is any way to modify the records before writing them to the HDFS sink.
My requirement is to make small modifications to the values of the record, for example performing arithmetic operations on integers or manipulating strings.
Please suggest if there is any way to achieve this.
You have several options.
Single Message Transforms, which you can see in action here. Great for light-weight changes as messages pass through Connect. Configuration-file based, and extensible using the provided API if there's not an existing transform that does what you want.
See the discussion here on when SMT are suitable for a given requirement.
KSQL is a streaming SQL engine for Kafka. You can use it to modify your streams of data before sending them to HDFS. See this example here.
KSQL is built on the Kafka Streams API, which is a Java library and gives you the power to transform your data as much as you'd like. Here's an example.
Take a look at Kafka Connect transforms [1] & [2]. You can build a custom transform library and use it in the connector.
[1] http://kafka.apache.org/documentation.html#connect_transforms
[2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
I'm connecting Oracle to Kafka by using a JDBC connector. When data comes in from Oracle, it is converted correctly except for the Oracle columns of type NUMBER. For such columns, the data is not decoded. The following is an example:
{"ID":"\u0004{","TYPE":"\u0000Ù","MODE":{"bytes":"\u0007"},"STAT_TEMP":{"string":"TESTING"}}
I should mention that I'm also connecting Kafka to Spark, and I get the same output in Spark.
I'm wondering what the best way to convert the data is: should it be done in Kafka or in Spark?
If in Kafka, how would you suggest converting it?
Add numeric.mapping to your connector config:
"numeric.mapping":"best_fit"
For more explanation, see here.
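For records that are already in the topic, the odd-looking strings can still be decoded by hand. Without numeric.mapping, the connector emits such columns as Connect's Decimal logical type, which is stored as the big-endian two's-complement bytes of the unscaled value. The sketch below decodes the three values from the question's sample; the function name is made up for illustration, and scale=0 is an assumption, since the real scale lives in the field's schema.

```python
from decimal import Decimal


def decode_connect_decimal(text, scale=0):
    """Decode a Connect Decimal that is displayed as an Avro-JSON bytes string.

    Avro's JSON encoding shows each byte as one character, so latin-1
    recovers the raw bytes. scale=0 is an assumption; read it from the schema.
    """
    raw = text.encode("latin-1")
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# Values copied from the sample record in the question.
record = {"ID": "\u0004{", "TYPE": "\u0000Ù", "MODE": "\u0007"}
decoded = {name: decode_connect_decimal(value) for name, value in record.items()}
print(decoded)
```

The same logic can be ported to a Spark UDF if changing the connector config is not possible, but setting numeric.mapping at the source avoids the extra step.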