Snowflake to Kafka topic using JdbcSourceConnector timezone issue - jdbc

I am working on a POC use case: setting up a JDBC source connector to fetch data from a Snowflake database and push it to a Kafka topic. I am running into a timezone-related issue, and the exception is "Kafka Connect Date type should not have any time fields set to non-zero values."
I am using a Snowflake view to fetch the data; the view's column data types are listed below. LOADDATE is a timestamp column of type TIMESTAMP_NTZ(9), where NTZ stands for no time zone (i.e., UTC).
AGENTFIRSTNAME VARCHAR(200)
AGENTMIDDLENAME VARCHAR(200)
AGENTLASTNAME VARCHAR(200)
AGENTNAME VARCHAR(400)
AGENTKEY NUMBER(38,0)
ISAGENCY BOOLEAN
NPN VARCHAR(50)
AGENTNUMBER VARCHAR(100)
AGENTSTATE VARCHAR(50)
VUENAME VARCHAR(100)
GROUPNAME VARCHAR(200)
TYPENAME VARCHAR(80)
SUBMITDATE DATE
CONFNUMBER VARCHAR(100)
SRCE VARCHAR(100)
INSFIRSTNAME VARCHAR(16777216)
INSLASTNAME VARCHAR(16777216)
INSCITY VARCHAR(16777216)
INSSTATE VARCHAR(16777216)
INSBIRTHDATE DATE
LOADDATE TIMESTAMP_NTZ(9)
I have looked at Stack Overflow and other online forums, and to fix the error "Kafka Connect Date type should not have any time fields set to non-zero values", the suggestion is to set the db.timezone parameter in the connector config.
I ran SHOW PARAMETERS LIKE 'TIMEZONE' IN ACCOUNT; in the Snowflake query window to find out the account timezone, and it returned America/Los_Angeles.
I tried America/Los_Angeles as the timezone, and then also UTC, in my sfconnector.json, rebuilt the Docker image, and ran the Docker container, but so far I do not see any messages in the Kafka topic.
The exception from the container log is DataException: Kafka Connect Date type should not have any time fields set to non-zero values.
Exception:
2023-01-27 09:34:50 (org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig)
2023-01-27 09:35:01 [2023-01-27 15:35:01,039] ERROR Error encountered in task jdbc-snowflake-source-0. Executing stage 'VALUE_CONVERTER' with class 'io.confluent.connect.avro.AvroConverter', where source record is = SourceRecord{sourcePartition={protocol=1, table=DEV_ED.DBO.VW_CENTERENROLLMENT_IC}, sourceOffset={}} ConnectRecord{topic='VW_CENTERENROLLMENT_IC', kafkaPartition=null, key=null, keySchema=null, value=Struct{AGENTFIRSTNAME=GRECHEN,AGENTMIDDLENAME=LYNN,AGENTLASTNAME=SOWELL,AGENTNAME=GRECHEN LYNN SOWELL,AGENTKEY=690042,ISAGENCY=false,NPN=8991475,AGENTNUMBER=8991475,AGENTSTATE=MI,VUENAME=POE,GROUPNAME=Ana,TYPENAME=PDP,SUBMITDATE=2020-10-20,CONFNUMBER=SMVW8PDYWC,SOURCE=Sunfest,INSFIRSTNAME=Mary,INSLASTNAME=Hoefler,INSCITY=Grand Rapids,INSSTATE=MI,INSBIRTHDATE=1970-09-21,LOADDATE=2020-12-09 16:29:51.0}, valueSchema=Schema{VW_CENTERENROLLMENT_IC:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}. (org.apache.kafka.connect.runtime.errors.LogReporter)
2023-01-27 09:35:01 org.apache.kafka.connect.errors.DataException: Kafka Connect Date type should not have any time fields set to non-zero values.
Attached are the sfconnector config and the full container log.
sfconnector config:
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "mode": "timestamp",
  "timestamp.column.name": "LOADDATE",
  "db.timezone": "America/Los_Angeles",
  "numeric.mapping": "best_fit",
  "errors.log.include.messages": "true",
  "tasks.max": "1",
  "validate.non.null": "false",
  "connection.url": "jdbc:snowflake://dp8881.central-us.azure.snowflakecomputing.com/?warehouse=ED_WH&db=DEV_ED&role=FR_IC_ANLYST&schema=DBO&user=IC_SERVICE_ACT&private_key_file=/tmp/snowflake_key.p8",
  "errors.log.enable": "true",
  "table.whitelist": "DEV_ED.DBO.VW_CENTERENROLLMENT_IC",
  "table.types": "VIEW"
}
Thanks
Vamshi

While troubleshooting, I found that the issue is caused by the timestamp and DATE columns in the view. I had to convert the timestamp column and all DATE columns to UTC in the view using convert_timezone:
convert_timezone('UTC', Loaddate) as Loaddate, convert_timezone('UTC', Submitdate) as Submitdate, convert_timezone('UTC', Insbirthdate) as Insbirthdate
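A minimal sketch of what the updated view definition looks like (the underlying table name is a placeholder and the remaining columns are omitted for brevity):

CREATE OR REPLACE VIEW DEV_ED.DBO.VW_CENTERENROLLMENT_IC AS
SELECT
    AGENTFIRSTNAME,
    AGENTMIDDLENAME,
    -- ... remaining columns unchanged ...
    convert_timezone('UTC', SUBMITDATE)   AS SUBMITDATE,
    convert_timezone('UTC', INSBIRTHDATE) AS INSBIRTHDATE,
    convert_timezone('UTC', LOADDATE)     AS LOADDATE
FROM UNDERLYING_TABLE;  -- placeholder for the actual source table/join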
After these view updates, the Kafka topic is created and messages are stored in it, but all three convert_timezone columns (Loaddate, Submitdate, Insbirthdate) are missing/not recorded in the Kafka topic.
Per the Docker container logs, the warning/error message recorded is "TIMESTAMPTZ not currently supported".
But when I converted these three date columns to CHAR, those values were also stored in the Kafka topic.
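For reference, the CHAR conversion in the view was along these lines (a sketch only; TO_CHAR and the format strings are illustrative, not the exact expressions):

to_char(SUBMITDATE, 'YYYY-MM-DD') as Submitdate, to_char(INSBIRTHDATE, 'YYYY-MM-DD') as Insbirthdate, to_char(LOADDATE, 'YYYY-MM-DD HH24:MI:SS.FF3') as Loaddate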
Thanks
Vamshi

Thought to share the solution for my post above:
I was using the latest version of the Snowflake JDBC driver, 3.13.26. With this newer driver there are timestamp issues: the column type comes through as TIMESTAMP_TZ, which is not supported by the generic dialect of the JDBC source connector, hence we get exceptions/warnings like
a) WARN JDBC type 2014 (TIMESTAMPTZ) not currently supported.
b) TIMESTAMP column not found.
With the older version 3.13.16, the column comes through as TIMESTAMP.
So I used the 3.13.16 JDBC driver installation command in my Dockerfile and rebuilt the Docker image, which fixed the issues in my POC use case.
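For anyone hitting the same issue, a minimal sketch of the Dockerfile change (the base image tag, connector plugin version, and install path are assumptions; adjust them to your own image):

# Sketch only: base image and plugin path are assumptions for a typical Confluent Connect image.
FROM confluentinc/cp-kafka-connect:7.3.0

# Install the JDBC source/sink connector plugin.
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.6.0

# Pin the Snowflake JDBC driver to 3.13.16 instead of the latest 3.13.26, since the newer
# driver reports the timestamp column as TIMESTAMP_TZ (JDBC type 2014), which the
# connector's generic dialect does not support.
RUN curl -fsSL -o /usr/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib/snowflake-jdbc-3.13.16.jar \
      https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/3.13.16/snowflake-jdbc-3.13.16.jar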
Thanks
Vamshi

Related

Mapping TIMESTAMPTZ to a ClickHouse data type

I have a TIMESTAMPTZ column in a CockroachDB source. Using a ClickHouse Kafka consumer to read from the CockroachDB changefeed, I stored the TIMESTAMPTZ fields as DateTime; however, this resulted in inaccurate data, something like:
1970-01-01 00:00:00
How do I map TIMESTAMPTZ to the accurate date type in ClickHouse?
Welcome to the CRDB community!
I'm not very familiar with changefeeds or ClickHouse, but I'll try my best to help.
I tried to set up a CRDB changefeed on a table with a TIMESTAMPTZ column:
create table t (i int primary key, j timestamptz);
insert into t values (1, now());
The output string of this TIMESTAMPTZ column uses ISO 8601 format:
root#localhost:26257/defaultdb> EXPERIMENTAL CHANGEFEED FOR t;
{"key":"[1]","table":"t","value":"{\"after\": {\"i\": 1, \"j\": \"2023-01-16T14:44:01.337341Z\"}}"}
So following @Denny Crane's lead, it does seem like using best_effort will allow ClickHouse to parse input date/time in ISO 8601 format.
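For example, something like this might work (a sketch only; the table name t_from_crdb and the DateTime64 precision are illustrative, based on the microsecond timestamps in the changefeed output above):

-- DateTime64 keeps the sub-second part, and best_effort parsing accepts ISO 8601
-- strings such as 2023-01-16T14:44:01.337341Z.
CREATE TABLE t_from_crdb
(
    i Int64,
    j DateTime64(6, 'UTC')
)
ENGINE = MergeTree
ORDER BY i;

SET date_time_input_format = 'best_effort';  -- or set this on the consumer/insert side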
Can you try it and let us know whether it helps? If not, I can engage a colleague who has more expertise on this matter.

Bad performance while writing to Oracle from Kafka using Kafka Connect

We are using Kafka Connect to write to Oracle and are having performance issues while writing. Based on our understanding so far, the performance actually depends on the data and the data types we have defined in Kafka Connect and Oracle.
Test 1:
We got good write performance with a sample data set (Dataset 1) in this test.
Test 2:
With another sample data set (Dataset 2), write performance was almost half of what it was in the first test.
Upon diving deeper, we found that the difference between the two data sets was that one of the fields (STUDENT_ID) had all values populated (not null) in Dataset 1, whereas in Dataset 2 it was a mix of NULL and NOT NULL values.
We also found that the STUDENT_ID field had a data type mismatch between Kafka Connect and Oracle: Kafka Connect had INT64 whereas Oracle had VARCHAR(50). So we changed the Kafka Connect data type to STRING, and with this change we were able to get good write performance even with Dataset 2.
Test 3:
During the production run we again ended up with poor write performance, and this time we have no clue which field (and which values) is causing it.
Below is the mapping of data types between Kafka Connect and Oracle.
Oracle | Kafka Connect
NUMBER(19,0) | int64
NUMBER | bytes (org.apache.kafka.connect.data.Decimal)
VARCHAR2(10) | string
VARCHAR2(1) | string
CHAR(4) | string
VARCHAR2(10) | string
TIMESTAMP | int64 (org.apache.kafka.connect.data.Timestamp)
CLOB | string
Please let me know if you have any suggestions on how to find the root cause for slow performance in Test 3.
Note: While comparing Test 1 and Test 2, we suspected that the CLOB field might be the cause of the poor write performance, but that wasn't the case. Even after removing the CLOB field we got bad performance; only once we corrected the data type of the STUDENT_ID field on the Kafka side did we get good performance.

Unable to connect to SQL shell server

Recently I've installed SQL Shell to run some SQL code, and the first thing the shell shows is a series of connection prompts.
Being a newbie in SQL, I have NO IDEA what to fill in for each option, so I've spent many hours looking for a solution, but every time I type in the password I get a connection error.
I'd like to run the following code:
CREATE TABLE flights (
id SERIAL PRIMARY KEY,
origin VARCHAR NOT NULL,
destination VARCHAR NOT NULL,
duration INTEGER NOT NULL
);
to get some idea of how the platform works, but the server connection process is hindering my progress. Would anyone please offer a kind suggestion?

Pulling a timestamp across a database link from postgres to oracle

The company I work for is in the process of switching from Oracle to EnterpriseDB, and I'm trying to update a query that uses a timestamp from a table, but whenever I try to pull that timestamp I get:
[Devart][ODBC][PostgreSQL]Invalid TIMESTAMP string {HY000}
I've tried casting it as varchar2, date, and timestamp, and using to_date, but nothing has worked.
The query is:
select "ship_date" from "promotion"#pgProd
In Postgres, ship_date is just a timestamp.
Any information about how this can be accomplished would be appreciated.
EDIT: To clarify, this query is being run in Oracle, pulling data from Postgres.
The Oracle version is 11g.
The relevant line of the creation script is:
ship_date timestamp without time zone NOT NULL

Time zone issue when saving data to Hbase using Phoenix Driver

I am new to HBase and am using the Phoenix driver to connect to HBase through the SQuirreL client. The query below describes my table structure; it has a composite primary key of "alert id" (VARCHAR) and "alert start time" (ROW_TIMESTAMP).
CREATE TABLE ALERT_DETAILS (ALERTID VARCHAR,MACHINENAME VARCHAR(100),PLACE VARCHAR(100),ALERTTYPE VARCHAR(32),ALERTSTARTTIME TIMESTAMP NOT NULL CONSTRAINT CTKEY PRIMARY KEY (ALERTID, ALERTSTARTTIME ROW_TIMESTAMP));
When I insert data using the query below, I am not able to see the timestamp value that I provided in the query; it is changed to a different value (5 hours earlier).
upsert into ALERT_DETAILS values('956dbd63fc586e35bccb0cac18d2cef0','machineone','AUS','CRITICAL ALERT','2016-12-22 11:30:23.0')
After executing the query, the timestamp value changes from '2016-12-22 11:30:23.0' to '2016-12-22 06:30:23.0'.
My system time zone is EST. Please help me with how to change the configuration of Phoenix and HBase.
Phoenix uses the system time zone.
Run tzselect and follow the prompts. It will output an environment variable that you can set in your .bash_profile or at system startup, e.g.:
TZ='America/New_York'; export TZ
