When connecting to Snowflake internal stage I am seeing it connect to a different db - apache-kafka-connect

I am connecting through the newly released Snowflake Kafka connector in standalone mode. The connector connects to my Snowflake account successfully, but when it goes to create its internal stage it does not use the database specified in the config.
This is the content of the new connector.properties file:
name=kafkaSnowNow
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=8
topics=kafkaSnow1,kafkaSnow2
snowflake.topic2table.map= kafkaSnow1:kafka_db.kafka_schema.kafkaSnow1,kafkaSnow2:kafka_db.kafka_schema.kafkaSnow2
buffer.count.records=100
buffer.flush.time=60
buffer.size.bytes=65536
snowflake.url.name=https://xyz.ap-southeast-2.snowflakecomputing.com:443
snowflake.user.name=xyz_user
snowflake.private.key=
snowflake.database.name=kafka_db
snowflake.schema.name=kafka_schema
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.snowflake.kafka.connector.records.SnowflakeJsonConverter
When I start the connector, it attempts to write records, but it looks for the stage below:
desc stage
identifier(SNOWFLAKE_KAFKA_CONNECTOR_kafkaSnowNow_STAGE_kafka_db.kafka_schema.kafkaSnow1)
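For comparison, here is a hedged sketch of the same mapping with unqualified table names. The connector documentation describes snowflake.topic2table.map as plain topic:table pairs, with the database and schema taken from snowflake.database.name and snowflake.schema.name, so whether this removes the qualified name from the stage identifier above is an assumption:
snowflake.topic2table.map=kafkaSnow1:kafkaSnow1,kafkaSnow2:kafkaSnow2
snowflake.database.name=kafka_db
snowflake.schema.name=kafka_schema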

Related

Querying data from external hive metastore

I am trying to configure an external Hive metastore for my Azure Synapse Spark pool. The rationale behind using an external metastore is to share table definitions across Databricks and Synapse workspaces.
However, I am wondering if it's possible to access the backend data via the metastore. For example, can clients like Power BI or Tableau connect to the external metastore and retrieve not just the metadata, but also the business data in the underlying tables?
Also, what additional value does an external metastore provide?
You can configure the external Hive metastore in Synapse by creating a Linked Service for that external source and then query it in a Synapse serverless pool.
Follow the steps below to connect to the external Hive metastore.
In the Synapse portal, go to the Manage hub on the left side of the page, then click Linked services. To create a new linked service, click + New.
Search for Azure SQL Database or Azure Database for MySQL for the external Hive metastore; Synapse supports only these two as external Hive metastores. Select one and click Continue.
Fill in all the required details, such as Name, Subscription, Server name, Database name, Username, and Password, and test the connection.
You can test the connection to the Hive metastore using the code below.
%%spark
import java.sql.DriverManager
/** this JDBC url can be copied from Azure portal > Azure SQL database > Connection strings > JDBC **/
val url = s"jdbc:sqlserver://<servername>.database.windows.net:1433;database=<databasename>;user=utkarsh;password=<password>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
try {
  val connection = DriverManager.getConnection(url)
  // the VERSION table is part of the standard Hive metastore schema, so this query
  // checks both that the connection works and that the database holds Hive metadata
  val result = connection.createStatement().executeQuery("select t.SCHEMA_VERSION from VERSION t")
  result.next()
  println(s"Successfully tested connection. Hive Metastore version is ${result.getString(1)}")
} catch {
  case ex: Throwable => println(s"Failed to establish connection:\n $ex")
}
Can clients like Power BI or Tableau connect to the external metastore and retrieve not just the metadata, but also the business data in the underlying tables?
Yes, Power BI can connect to Azure SQL Database using its built-in connector.
In Power BI Desktop, go to Get Data, click Azure and select Azure SQL Database. Click Connect.
In the next step, enter the server name in the format <utkarsh.database.windows.net>, along with the database name, username, and password, and you can then access the data in Power BI.

Informatica Workflow Cannot create proper Relational connection object to connect to SQL Server

On my Informatica server PC, in Informatica Administration I created a repository service using Oracle database information. In Informatica I connect to this repository, and there I want to create a mapping that imports a table from a remote SQL Server PC (on my home network domain) and then create a workflow to load the data into an Oracle target table.
Using the ODBC admin console I created and tested the connection, and I am also able to telnet to the linked SQL Server host and port.
Within Informatica I created a relational connection for SQL Server, and when I run the workflow I get error reason (14007): failed to create and initiate OLE DB instance, database driver error, failed to connect to the database using the user I use in SSMS for Windows authentication, and an empty connection string. I would like to know, first of all, if I am doing something wrong in connecting to a repository backed by an Oracle database and then using a SQL Server table on a remote PC. Do I have to create another repository for SQL Server and use SQL Server tables there, or can I mix them? Secondly, I would like to know how to create a relational connection object in Informatica for my linked SQL Server so that it matches the relational connection created with the ODBC admin console. Last but not least, I would like to understand why it reports that I left the connection string empty, when I cannot see a place to put one while creating the relational connection object.
I might not be able to solve the problem completely, but here are a few remarks that might be helpful:
1. The PowerCenter Repository database is where PowerCenter stores all the metadata about the processes you create. It may be Oracle, and that's perfectly fine. Since it is not related to your data sources or targets, you do not need to create another repository for different sources and targets; one is enough for all of them.
2. Using PowerCenter Workflow Manager, create the appropriate connections to all the systems you need. These are the connections (ODBC or other) that the Integration Service will use to actually connect to your data sources and targets.
3. Make sure the ODBC / other data sources are defined on the Integration Service machine. It is the Integration Service that runs the process and connects to the systems referenced by the connections defined in the process.
4. When you build mappings, you create them in a client app (Mapping Designer), and you can connect to DB engines to create source and target definitions. Note that in this case you use the connection (e.g. an ODBC data source) defined on the client. Once you try to actually run the workflow with that mapping, it is executed on the Integration Service (mentioned above), where the appropriate connections need to be defined, and that is a completely separate setup.
5. When editing a session in a workflow, for each source and target you need to pick a connection defined in the Informatica Repository, created as described in point 2 above (or use a variable to indicate one, but that's another story).
So the error you mention seems to be related to the connection created in Workflow Manager: it probably does not specify the connection string that should refer to the data source defined on the Integration Service.

Kafka Connect JDBC dblink

I'm starting to study Apache Kafka and Kafka Connect.
I'm trying to get data from a remote Oracle database where my user only has read privileges and can't list tables (I don't have permission to change that). In every query I have to go through a dblink, but in the JDBC connector I didn't find an option to pass a dblink.
I can run the query if I pass a specific query in the connector configuration, but I want to fetch a lot of tables, and specifying the query on the connector would force me to create a lot of connectors.
Is there a way to pass the dblink in the connector configuration or in the JDBC URL?
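For reference, here is a hedged sketch of the query-mode workaround mentioned above, assuming the Confluent JDBC source connector; the host, schema, table, and dblink names are hypothetical:
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:oracle:thin:@//remote-host:1521/SERVICE_NAME
connection.user=read_only_user
connection.password=********
# one query per connector; the dblink is embedded in the SQL rather than in the JDBC URL
query=SELECT * FROM some_schema.some_table@my_dblink
mode=bulk
topic.prefix=oracle-some_table
This is exactly why query mode requires a separate connector per table, which is the limitation the question describes.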

using snowflake JDBC driver with Presto

I want to read data from the Snowflake data store into my app via Presto. I would like to make Snowflake one of the data sources for Presto. Can I use the Snowflake-provided JDBC driver with Presto? Thanks.
Based on the GitHub repository:
https://github.com/prestodb/presto
I see presto-mysql, presto-spark, presto-redshift, etc., but I don't see presto-snowflake.
And I tried adding snowflake.properties under /usr/local/Cellar/prestodb/0.263/libexec/etc/catalog on my Mac, but the Presto server failed to start with this error:
2021-10-16T11:08:25.505+1100 ERROR main com.facebook.presto.server.PrestoServer No factory for connector snowflake
java.lang.IllegalArgumentException: No factory for connector snowflake
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:208)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:123)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:98)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:80)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:68)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:150)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:85)
It does not look like Presto supports Snowflake yet.
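For context, here is a hedged sketch of the kind of catalog file the attempt above implies; the file name and location come from the question, and the contents are an assumption:
# hypothetical etc/catalog/snowflake.properties
connector.name=snowflake
# any further properties are moot: prestodb 0.263 bundles no connector factory named
# "snowflake", so startup fails with "No factory for connector snowflake" regardless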

Kafka Connect sink to Redshift table not in public schema

I am unable to make a Kafka Connect sink work for a table that is not in the public schema.
I am using Kafka Connect to send records to a Redshift database via a sink operation using JdbcSinkConnector.
I have created my destination table in Redshift, but it is not in the public schema; it is my_schema.test_table. (Note: auto.create and auto.evolve are off in the connector configuration.)
When I attempt to specify the table's location in the connector config, like so...
"table.name.format": "my_schema.test_table",
...the sink connector's task encounters this error when it attempts to get itself going:
"Table my_schema.test_table is missing and auto-creation is disabled"
from
Caused by: org.apache.kafka.connect.errors.ConnectException: Table my_schema.test_table is missing and auto-creation is disabled
at io.confluent.connect.jdbc.sink.DbStructure.create(DbStructure.java:86)
at io.confluent.connect.jdbc.sink.DbStructure.createOrAmendIfNecessary(DbStructure.java:63)
at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:78)
...
I have tried the following formats for supplying table name:
my_schema.test_table
dev.my_schema.test_table
test_table <-- in this case I get past the existence check that stops the others, but then run into this error every time Kafka Connect attempts to write a row:
"org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: java.sql.SQLException: Amazon Invalid operation: relation "test_table" does not exist;"
Likely because test_table is not in the public schema. : (
And it seems like the code is attempting to parse this table name correctly, but unfortunately it doesn't log its results.
This is my connection string: "connection.url": "jdbc:redshift://xxx.xxx.xxx.xxx:5439/dev"
I have mucked around with attempting to specify currentSchema=my_schema in the connection string, both for the Redshift JDBC driver and for PostgreSQL. No luck.
I'm using Kafka Connect version 1.1.0
Redshift JDBC JAR: RedshiftJDBC42-1.2.16.1027.jar
I am able to get data flowing by putting the table in the public schema and specifying table name with no schema: "table.name.format": "test_table".
Unfortunately, that's not where we need the data to be.
Any help much appreciated.
I noticed that the source code seemed to be trying to do the right thing… and then realized that the version of the JDBC sink connector we were using did not have those modifications, which are somewhat recent. I moved from version 4.1.0 of the JDBC sink connector jar to version 5.0.0 and, voilà, data is flowing into a table in the schema I specified. 🙃
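For reference, here is a hedged sketch of the sink configuration described above after the move to version 5.0.0 of the JDBC sink connector; the topic name is hypothetical, and the other values come from the question:
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=test_topic
connection.url=jdbc:redshift://xxx.xxx.xxx.xxx:5439/dev
# schema-qualified table name, resolved correctly from connector version 5.0.0 onward
table.name.format=my_schema.test_table
auto.create=false
auto.evolve=false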
