I am trying to insert data from Kafka to Teradata. The payload has some null values, and the JDBC sink is throwing the following error.
[Teradata JDBC Driver] [TeraJDBC 16.20.00.10] [Error 1063] [SQLState HY000] null is not supported as a data value with this variant of the setObject method; use the setNull method or the setObject method with a targetSqlType parameter
My connector config:
name=teradata-sink-K_C_OSUSR_DGL_DFORM_I1-V2
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
connection.url=connectionString
topics=POPS-P-OSUSR_DGL_DFORM_I1-J-V2-CAL-OUT
topic.prefix=
table.name.format=K_C_OSUSR_DGL_DFORM_I1_V2
batch.size=50000
errors.tolerance=all
errors.deadletterqueue.topic.name=POPS-P-OSUSR_DGL_DFORM_I1-V2-CAL-DEAD
errors.deadletterqueue.topic.replication.factor=1
Is there a way to handle this? I do not know whether I have to change some code in the sink or just change the connector config.
You are getting the error from a line that most likely looks like this:
ps.setObject(1, val);
This call throws an exception if the val you try to insert is null.
The error is telling you that you must specify the data type of the incoming null values. You could do this:
ps.setObject(1, val, Types.VARCHAR);
This way you are casting NULL to a VARCHAR, one of the supported targetSqlTypes.
Another option for the same purpose:
ps.setNull(1, Types.VARCHAR);
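For reference, a generic writer usually handles this by branching on null so that the driver always receives a target SQL type. This is only a sketch of the pattern (the method name, index and type are placeholders), not the connector's actual code:
import java.sql.PreparedStatement;
import java.sql.SQLException;

class NullSafeBinding {
    // Null-safe parameter binding: always give the driver an explicit target SQL type.
    static void bindNullable(PreparedStatement ps, int index, Object val, int sqlType) throws SQLException {
        if (val == null) {
            ps.setNull(index, sqlType);          // e.g. java.sql.Types.VARCHAR
        } else {
            ps.setObject(index, val, sqlType);   // type-qualified setObject, accepted by the Teradata driver
        }
    }
}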
The problem is that we are using standard Kafka Connect to create the sink (we are not writing any custom connector code).
We have configured the .properties files for both the worker and the connector to link the topic to the Teradata table, and we run it using
.../confluent/bin/connect-standalone <worker.cfg> <connector.cfg>
When we create a message with "null" values and send it to the topic, the sink connector is unable to insert the record into the Teradata table.
Related
Recently, when I set up the Kafka sink connector and ingested data into the database, I noticed that certain values caused errors for one specific column; that column only allows certain values to be ingested. Every time the column rejects a value, it throws this error:
java.sql.SQLException: Exception Chain
java.sql.BatchUpdateException: ORA-00001: unique constraint (db.table) violated
java.sql.SQLIntegrityConstraintViolationException: ORA-00001: unique constraint (db.table) violated
Error : 1, Position : 0, Sql = INSERT INTO TABLE(table_id, system_timestamp)
I know the value could be forced to null on the database side, but the database admin just wants to leave the situation alone. Meanwhile the error is getting out of control: the logs keep generating this message every few seconds. I read that the Kafka sink connector can filter on a field and only allow certain values into the database, but when I tried it, Kafka rejected my expression. Am I writing it correctly? The table_id column only accepts 19212 and 19213 as values; other integers or strings are not allowed. Is there a way for Kafka to accept only those two values, and otherwise emit a warning and a null result? Here is my config:
"transforms": filterTableId",
"transforms.filterTableId.type": "io.cofluent.connect.transforms.Filter$Value",
"transforms.filterTableId.filter.condition": "$.payload.after[?(#.nestedKey == "32512" || #.nestedKey == "32513")]",
"transforms.filterTableId.filter.type": "include",
"transforms.filterTableId.missing.or.null.behavior": "fail"
from https://docs.confluent.io/platform/current/connect/transforms/filter-confluent.html#properties
Any suggestions on what I did wrong? Or is it that my Kafka sink connector does not support the Confluent transform? I also tried org.apache.kafka.connect as the transform type and that failed as well.
I am working on my local Windows IIB/MQ server. What I am trying to do is place a message on a queue through the JMS Output node.
For that, I created the JMS administered objects: an initial context and, within it, a destination queue and a connection factory, using the file-system option. A .bindings file was created in the PROVIDER_URL path specified below.
In the JMS Output node properties, I have set the JMS Provider name to
Websphere MQ
and initial context factory to
com.sun.jndi.fscontext.RefFSContextFactory
All the other options are unfilled.
Please note that the JMSAdmin.config file has the following uncommented properties:
PROVIDER_URL=file:/C:/JNDI
INITIAL_CONTEXT_FACTORY=com.sun.jndi.fscontext.RefFSContextFactory
Now when I try to put a message on JMS Output node, I get the following exception:
ExceptionList
RecoverableException
File:CHARACTER:F:\build\S1000_slot1\S1000_P\src\DataFlowEngine\MessageServices\ImbDataFlowNode.cpp
Line:INTEGER:1251
Function:CHARACTER:ImbDataFlowNode::createExceptionList
Type:CHARACTER:ComIbmJMSClientOutputNode
Name:CHARACTER:test#FCMComposite_1_4
Label:CHARACTER:test.JMS Output
Catalog:CHARACTER:BIPmsgs
Severity:INTEGER:3
Number:INTEGER:2230
Text:CHARACTER:Node throwing exception
Insert
Type:INTEGER:14
Text:CHARACTER:test.JMS Output
RecoverableException
File:CHARACTER:JMSClientErrors.java
Line:INTEGER:771
Function:CHARACTER:JMSClientErrors:handleJNDIException()
Type:CHARACTER:
Name:CHARACTER:
Label:CHARACTER:
Catalog:CHARACTER:BIPmsgs
Severity:INTEGER:3
Number:INTEGER:4640
Text:CHARACTER:Failure to obtain JNDI administered objects
Insert
Type:INTEGER:5
Text:CHARACTER:Broker 'LOCALBK10'; Execution Group 'Test'; Message Flow 'test'; Node 'ComIbmJMSClientOutputNode'
Insert
Type:INTEGER:5
Text:CHARACTER:com.sun.jndi.fscontext.RefFSContextFactory
Insert
Type:INTEGER:5
Text:CHARACTER:
Insert
Type:INTEGER:5
Text:CHARACTER:
Insert
Type:INTEGER:5
Text:CHARACTER:Hello
Insert
Type:INTEGER:5
Text:CHARACTER: Cause:java.net.MalformedURLException: no protocol:
Insert
Type:INTEGER:5
Text:CHARACTER: , Failure to obtain JNDI administered objects
Any help would be highly appreciated.
Towards the end of the above stack trace you see this:
Cause:java.net.MalformedURLException: no protocol
This is because you did not set a value for the node property "Location JNDI bindings". It must have the same value as configured in JMSAdmin.config, i.e. file:/C:/JNDI.
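For illustration only, this is roughly the JNDI lookup the node performs against the file-system context; the lookup names (jms/MyCF, jms/MyQueue) are hypothetical placeholders for whatever you defined with JMSAdmin. If the provider URL is left empty, RefFSContextFactory has no protocol to parse, which is exactly the MalformedURLException in the trace:
import java.util.Hashtable;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.naming.Context;
import javax.naming.InitialContext;

public class JndiLookupSketch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.fscontext.RefFSContextFactory");
        // This is the value the "Location JNDI bindings" property supplies; leaving it empty leads to "no protocol".
        env.put(Context.PROVIDER_URL, "file:/C:/JNDI");
        Context ctx = new InitialContext(env);
        // Lookup names are placeholders; use the names you created with JMSAdmin.
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/MyCF");
        Destination queue = (Destination) ctx.lookup("jms/MyQueue");
        System.out.println("Resolved " + cf + " and " + queue);
    }
}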
My requirement is to read data from one database, aggregate it, convert it to bytes, and stream it into a BLOB column in another database (Oracle).
Oracle requires JDBC autocommit to be disabled when streaming to a BLOB column, and Connection#commit to be called when finished.
I currently have 3 steps.
Step 1 (Tasklet):
It has two SQL queries. one to initialize the column (UPDATE DATABASEUSER.TABLENAME SET payload = empty_blob() WHERE PrimaryKey= ?)
the second one returns the Blob locator (SELECT payload AS payload FROM DATABASEUSER.TABLENAME WHERE PrimaryKey = ? FOR UPDATE)
I also get the connection from the DataSource so that I can disable autocommit.
Step 2 (Chunk)
I have an ItemReader that reads data from the source DB in a generic way, and a processor that converts the rows to CSV format as bytes. Then I have a custom ItemWriter that streams the data to the BLOB column.
Step 3 (Tasklet)
This is where I clean up and commit the connection.
Question: Is this the correct strategy? I'd appreciate any direction, as I'm unsure.
I solved it.
I used ResourcelessTransactionManager as the transaction manager in all my steps. In step 1 I get a connection from the DataSource and disable autocommit; I call commit in the final step. I use the same connection in all steps.
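For what it's worth, here is a minimal plain-JDBC sketch of that flow; the table, column and key names are taken from the queries above, while dataSource, key and csvBytes are placeholder inputs. In the actual job, step 1 and step 3 would live in tasklets and step 2 in the chunk writer, all sharing this one connection:
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class BlobStreamSketch {
    public void writePayload(DataSource dataSource, long key, byte[] csvBytes) throws Exception {
        Connection conn = dataSource.getConnection(); // the one connection shared by all steps
        conn.setAutoCommit(false);                    // Oracle requires this for BLOB streaming
        try {
            // Step 1: initialise the column with an empty BLOB
            try (PreparedStatement init = conn.prepareStatement(
                    "UPDATE DATABASEUSER.TABLENAME SET payload = empty_blob() WHERE PrimaryKey = ?")) {
                init.setLong(1, key);
                init.executeUpdate();
            }
            // Step 2: lock the row, obtain the BLOB locator, stream the aggregated bytes into it
            try (PreparedStatement sel = conn.prepareStatement(
                    "SELECT payload FROM DATABASEUSER.TABLENAME WHERE PrimaryKey = ? FOR UPDATE")) {
                sel.setLong(1, key);
                try (ResultSet rs = sel.executeQuery()) {
                    rs.next();
                    Blob blob = rs.getBlob("payload");
                    try (OutputStream out = blob.setBinaryStream(1)) {
                        out.write(csvBytes); // bytes produced by the processor
                    }
                }
            }
            // Step 3: commit once everything has been written
            conn.commit();
        } finally {
            conn.close();
        }
    }
}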
I am using CDH 5.5 and Elasticsearch 2.4.1.
I have created a Hive table and am trying to push its data to Elasticsearch using the query below.
CREATE EXTERNAL TABLE test1_es(
id string,
timestamp string,
dept string)<br>
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
LOCATION
'hdfs://quickstart.cloudera:8020/user/cloudera/elasticsearch/test1_es'
TBLPROPERTIES ( 'es.nodes'='localhost',
'es.resource'='sample/test1',
'es.mapping.names' = 'timestamp:#timestamp',
'es.port' = '9200',
'es.input.json' = 'false',
'es.write.operation' = 'index',
'es.index.auto.create' = 'yes'
);<br>
INSERT INTO TABLE default.test1_es select id,timestamp,dept from test1_hive;
I'm getting the error below at the Job Tracker URL:
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured.
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
In the Hive terminal it throws "FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask".
I tried all the steps mentioned in forums, such as referencing /usr/lib/hive/bin/elasticsearch-hadoop-2.0.2.jar in hive-site.xml, adding the ES-Hadoop jar to HIVE_AUX_JARS_PATH, and copying the YARN jar to /usr/lib/hadoop/elasticsearch-yarn-2.1.0.Beta3.jar. Please suggest how to fix the error.
Thanks in Advance,
Sreenath
I'm dealing with the same problem, and I found that the execution error thrown by Hive is caused by a timestamp field of string type that could not be parsed. I'm wondering whether timestamp fields of string type can be properly mapped to ES; if not, this could be the root cause.
BTW, you should check the Hadoop MR logs for more details about the error.
CREATE EXTERNAL TABLE test1_es(
id string,
timestamp string,
dept string)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ...........
You don't need the LOCATION clause.
I just started using Spark SQL to load data from an H2 database. Here is what I did, following the Spark SQL documentation:
>>> sqlContext = SQLContext(sc)
>>> df = sqlContext.load(source="jdbc",driver="org.h2.Driver", url="jdbc:h2:~/test", dbtable="RAWVECTOR")
But it didn't work and gave errors. I think the problem is that the username and password are not specified in the call.
These are the parameters from the Spark SQL 1.3.1 documentation:
url
The JDBC URL to connect to.
dbtable
The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.
driver
The class name of the JDBC driver needed to connect to this URL. This class will be loaded on the master and workers before running any JDBC commands, to allow the driver to register itself with the JDBC subsystem.
partitionColumn, lowerBound, upperBound, numPartitions
These options must all be specified if any of them is specified. They describe how to partition the table when reading in parallel from multiple workers. partitionColumn must be a numeric column from the table in question.
But I couldn't find any clue about how to pass the database username and password to the sqlContext.load function.
Has anyone had a similar case, or any clues?
Thanks.
I figured it out. Just do
df = sqlContext.load(
source="jdbc",driver="org.h2.Driver",
url="jdbc:h2:tcp://localhost/~/test?user=sa&password=1234",
dbtable="RAWVECTOR"
)
And when you create the database, use the same pattern:
conn = DriverManager.getConnection(
"jdbc:h2:tcp://localhost/~/"+dbName+"?user=sa&password=1234", null, null
);
And, here is a blog about how to use the API.