Kafka Connect sink to Redshift table not in public schema - jdbc

I am unable to make a Kafka Connect sink work for a table that is not in the public schema.
I am using Kafka Connect to send records to a Redshift database via a sink operation using JdbcSinkConnector.
I have created my destination table in Redshift, but it is not in the public schema: my_schema.test_table. (Note: auto.create and auto.evolve are off in the connector configuration.)
When I attempt to specify the table's location in the connector config, like so...
"table.name.format": "my_schema.test_table",
...the sink connector's task fails with this error when it starts up:
"Table my_schema.test_table is missing and auto-creation is disabled"
from this stack trace:
Caused by: org.apache.kafka.connect.errors.ConnectException: Table my_schema.test_table is missing and auto-creation is disabled
at io.confluent.connect.jdbc.sink.DbStructure.create(DbStructure.java:86)
at io.confluent.connect.jdbc.sink.DbStructure.createOrAmendIfNecessary(DbStructure.java:63)
at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:78)
...
I have tried the following formats for supplying the table name:
my_schema.test_table
dev.my_schema.test_table
test_table <-- in this case I get past the existence check that stops the others, but then run into this error every time Kafka Connect attempts to write a row:
"org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: java.sql.SQLException: Amazon Invalid operation: relation "test_table" does not exist;"
Likely because test_table is not in the public schema. : (
And it seems like the code is attempting to parse this table name correctly, but unfortunately it doesn't log its results.
This is my connection string: "connection.url": "jdbc:redshift://xxx.xxx.xxx.xxx:5439/dev"
I have mucked around with attempting to specify currentSchema=my_schema in the connection string, both for the Redshift JDBC driver as well as PostgreSQL. No luck.
I'm using Kafka Connect version 1.1.0
Redshift JDBC JAR: RedshiftJDBC42-1.2.16.1027.jar
I am able to get data flowing by putting the table in the public schema and specifying table name with no schema: "table.name.format": "test_table".
Unfortunately, that's not where we need the data to be.
Any help much appreciated.

I noticed that the source code seemed to be trying to do the right thing… and then realized that the version of the JDBC sink connector we were using did not have those modifications, which are fairly recent. I moved from version 4.1.0 of the JDBC sink connector JAR to version 5.0.0 and, voilà, data is flowing into a table in the schema I specified. 🙃
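For completeness, here is a minimal sketch of the kind of sink configuration described above, as it would be submitted to the Connect REST API. Only connection.url and table.name.format are taken from the question; the connector name, topic, and credentials are placeholders.
{
  "name": "redshift-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "test_topic",
    "connection.url": "jdbc:redshift://xxx.xxx.xxx.xxx:5439/dev",
    "connection.user": "<user>",
    "connection.password": "<password>",
    "table.name.format": "my_schema.test_table",
    "auto.create": "false",
    "auto.evolve": "false",
    "insert.mode": "insert"
  }
}
With version 5.0.0 of the sink connector, the schema-qualified value in table.name.format was honoured both by the existence check and when writing rows, which is what resolved the issue above.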

Related

Kafka Connect JDBC dblink

I'm starting to study Apache Kafka and Kafka Connect.
I'm trying to get data from a remote Oracle database where my user only has read privileges and can't list tables (I don't have permission to change that). Every query has to go through a dblink, but I didn't find an option in the JDBC connector to pass a dblink.
I can get the data if I pass a specific query in the connector configuration, but I want to fetch a lot of tables, and specifying the query per connector would force me to create a lot of connectors.
Is there a way to pass the dblink in the connector configuration or in the JDBC URL?
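As a sketch of the per-query workaround mentioned above, the JDBC source connector can be given an explicit query that selects through the dblink. The host, table, dblink name, and topic prefix below are hypothetical placeholders, and each table would still need its own connector configured this way; as far as I know there is no dedicated dblink option in the connector configuration or the JDBC URL itself.
{
  "name": "oracle-dblink-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//db-host:1521/ORCL",
    "connection.user": "<user>",
    "connection.password": "<password>",
    "mode": "bulk",
    "query": "SELECT * FROM some_table@my_dblink",
    "topic.prefix": "oracle-some_table",
    "poll.interval.ms": "86400000"
  }
}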

Using Snowflake JDBC driver with Presto

I want to read data from the Snowflake data store into my app via Presto. I would like to make Snowflake one of the data sources for Presto. Can I use the Snowflake-provided JDBC driver with Presto? Thanks.
Based on the GitHub repository
https://github.com/prestodb/presto
I see presto-mysql, presto-spark, presto-redshift, etc., but I don't see presto-snowflake.
I also tried adding snowflake.properties under /usr/local/Cellar/prestodb/0.263/libexec/etc/catalog on my Mac, but the Presto server failed to start with this error:
2021-10-16T11:08:25.505+1100 ERROR main com.facebook.presto.server.PrestoServer No factory for connector snowflake
java.lang.IllegalArgumentException: No factory for connector snowflake
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:208)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:123)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:98)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:80)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:68)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:150)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:85)
It does not look like Presto supports Snowflake yet.
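For context, a Presto catalog file is a properties file whose connector.name must match a connector factory registered with the server; since prestodb ships no snowflake factory, a file like the illustrative one below triggers exactly the "No factory for connector snowflake" error at startup. The connection property names here only follow the pattern of Presto's other JDBC connectors and are hypothetical, since no such connector exists.
# etc/catalog/snowflake.properties (illustrative only -- no snowflake connector exists in prestodb)
connector.name=snowflake
connection-url=jdbc:snowflake://xyz.ap-southeast-2.snowflakecomputing.com
connection-user=<user>
connection-password=<password>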

When connecting to Snowflake internal stage I am seeing it connect to a different db

I am connecting through the newly released Snowflake Kafka connector in standalone mode. The connector connects to my Snowflake account successfully, but when it goes to create the internal stage, it does not use the correct database as given in the config.
This is the content of the connector.properties file:
name=kafkaSnowNow
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=8
topics=kafkaSnow1,kafkaSnow2
snowflake.topic2table.map= kafkaSnow1:kafka_db.kafka_schema.kafkaSnow1,kafkaSnow2:kafka_db.kafka_schema.kafkaSnow2
buffer.count.records=100
buffer.flush.time=60
buffer.size.bytes=65536
snowflake.url.name=https://xyz.ap-southeast-2.snowflakecomputing.com:443
snowflake.user.name=xyz_user
snowflake.private.key=
snowflake.database.name=kafka_db
snowflake.schema.name=kafka_schema
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.snowflake.kafka.connector.records.SnowflakeJsonConverter
When I start the connector, it attempts to write records, but it looks for the stage below:
desc stage
identifier(SNOWFLAKE_KAFKA_CONNECTOR_kafkaSnowNow_STAGE_kafka_db.kafka_schema.kafkaSnow1)
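One thing worth noting (an observation, not from the original post): the stage name above suggests the connector is treating the entire mapped value as the table name. snowflake.topic2table.map is generally expected to hold bare table names, with the database and schema supplied separately by snowflake.database.name and snowflake.schema.name, so a sketch of the adjusted lines would be:
# table names unqualified; database and schema come from the dedicated properties
snowflake.topic2table.map=kafkaSnow1:kafkaSnow1,kafkaSnow2:kafkaSnow2
snowflake.database.name=kafka_db
snowflake.schema.name=kafka_schema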

Hive Transactions + Remote Metastore Error

I'm running Hive 2.1.1 on EMR 5.5.0 with a remote MySQL metastore DB. I need to enable transactions in Hive, but when I follow the configuration here and run any query, I get the following error:
FAILED: Error in acquiring locks: Error communicating with the metastore
Settings on the metastore:
hive.compactor.worker.threads = 0
hive.compactor.initiator.on = true
Settings in the hive client:
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
This only happens when I set hive.txn.manager, so my hive metastore is definitely online.
I've tried some of the old suggestions of turning Hive test features on, which didn't work, but I don't think this is a test feature anymore. I can't turn off concurrency, as a similar post on SO suggests, because I need concurrency. It seems like the problem is that either DbTxnManager isn't getting the remote metastore connection info properly from the Hive config, or the MySQL DB is missing some tables required by DbTxnManager. I have datanucleus.autoCreateTables=true.
It looks like Hive wasn't properly creating the tables needed for the transaction manager. I'm not sure where it was getting its schema, but it was definitely wrong.
So we just ran the hive-txn-schema script to set up the schema manually. We'll do this at the start of any of our clusters from now on.
https://github.com/apache/hive/blob/master/metastore/scripts/upgrade/mysql/hive-txn-schema-2.1.0.mysql.sql
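A sketch of applying that schema manually from a MySQL client session against the remote metastore database; the database name and the local path to the downloaded script are placeholders for whatever your metastore actually uses.
-- run against the remote MySQL metastore database (database name and path are placeholders)
USE hive;
SOURCE /path/to/hive-txn-schema-2.1.0.mysql.sql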
The error
FAILED: Error in acquiring locks: Error communicating with the metastore
is sometimes seen when your tables have no data; you need to initialize some data in your tables, for example:
create table t1(id int, name string)
clustered by (id) into 8 buckets
stored as orc TBLPROPERTIES ('transactional'='true');

Hive JDBC fails to connect to the configured schema

I am able to connect to Hive using the hive-jdbc client and also using beeline. A typical URL is:
jdbc:hive2://hive_thrift_ip:10000/custom_schema;principal=hive/hive_thrift_ip#COMPANY.COM
Unfortunately, the connection is always established to Hive's 'default' schema, and the schema name configured in the URL is not taken into account. I use the org.apache.hive.jdbc.HiveDriver class.
It always takes me to the tables of the default schema. I am still able to access tables from other schemas by prefixing the table name with the schema name, like custom_schema.test_table.
Kindly let me know if I missed any property or configuration in the connection creation that would give me a session scoped to the schema configured in the URL.
Many thanks.
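Not a real fix, but a common workaround (an assumption on my part, not from the question) is to switch the session right after connecting, since a USE statement takes effect regardless of what the URL resolved to:
-- issue immediately after the JDBC connection is established
USE custom_schema;
-- confirm which schema the session is now using
SELECT current_database();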
