I want to read data from a Snowflake datastore into my app via Presto. I would like to make Snowflake one of the data sources for Presto. Can I use the Snowflake-provided JDBC driver with Presto? Thanks.
Based on the GitHub repository:
https://github.com/prestodb/presto
I see presto-mysql, presto-spark, presto-redshift, etc., but I don't see presto-snowflake.
I also tried adding snowflake.properties under /usr/local/Cellar/prestodb/0.263/libexec/etc/catalog on my Mac (the file I used is shown after the stack trace below), but the Presto server failed to start with this error:
2021-10-16T11:08:25.505+1100 ERROR main com.facebook.presto.server.PrestoServer No factory for connector snowflake
java.lang.IllegalArgumentException: No factory for connector snowflake
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:208)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:123)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:98)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:80)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:68)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:150)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:85)
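For reference, the catalog file I added looked roughly like this. The keys are modeled on the existing JDBC connectors such as presto-mysql, and the values are placeholders; since no snowflake connector ships with Presto, no connector.name value here will actually load:
connector.name=snowflake
connection-url=jdbc:snowflake://<account>.snowflakecomputing.com
connection-user=<user>
connection-password=<password>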
It does not look like Presto supports Snowflake yet.
Related
How do I write data into DBF files using Hitachi Pentaho (ETL tool)? I can't find the right way.
There is a JDBC driver that can read/write DBF files. After adding that driver to your Pentaho Data Integration installation, you can use the Table Output step with a Generic database Connection type/Dialect configured to use that driver.
Read the Pentaho Data Integration documentation to learn which directory the JDBC driver needs to go in, then read the driver's documentation to learn how to use it to manipulate DBF files. I haven't worked with DBF, but to use the Generic database connection type to connect to SQLite databases (instead of the SQLite-specific Connection type/Dialect), I configure the Custom connection URL with the value jdbc:sqlite:/path_to_my_db/MyDBfile.sqlite and the Custom driver class name with the value org.sqlite.JDBC. I don't need a user/password to connect to that database. A DBF JDBC driver will use something similar, I suppose.
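As a concrete sketch of the Generic connection settings described above, these are the SQLite values; a DBF driver would substitute its own URL prefix and driver class name, which you would take from that driver's documentation:
Connection type: Generic database
Custom connection URL: jdbc:sqlite:/path_to_my_db/MyDBfile.sqlite
Custom driver class name: org.sqlite.JDBC
Username / Password: (left empty)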
I had been using the Databricks JDBC driver version 2.6.22 and tried to upgrade to 2.6.27. However, after upgrading I get messages saying my JDBC URLs are invalid when trying to connect. These JDBC URLs work fine with the old version of the driver and I pull them directly from the Databricks SQL endpoint info, so I expect something else is going on.
Example JDBC URL:
jdbc:spark://[workspace domain]:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/[identifier]
I noticed between versions the name went from SimbaSparkJDBC42-2.6.22.1040 to DatabricksJDBC42-2.6.27.1048 and the JAR class name went from com.simba.spark.jdbc.Driver to com.databricks.client.jdbc.Driver. Does dropping Simba mean there was a more major change? Do I need to correct my JDBC URLs somehow?
I'm downloading my driver from here
I'm using DBeaver as my SQL client if that makes a difference.
JDBC URLs for the new Databricks driver start with jdbc:databricks: instead of jdbc:spark:. As of now, the JDBC URL details in the UI still use the old format; just replace spark with databricks and they should work. Mentioned here
Databricks has a different URL format, check the documentation here
Basically, in the URL, replace spark with databricks and add the PWD parameter.
jdbc:databricks://[workspace domain]:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/[identifier];PWD=[your_access_token]
PWD is the personal access token. Instructions to get access token.
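If you are connecting from your own Java code rather than DBeaver, a minimal sketch with the new driver class looks like the following; the workspace domain, endpoint identifier, and access token are placeholders, and the URL simply mirrors the format above:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DatabricksJdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // New driver class name (the old one was com.simba.spark.jdbc.Driver).
        Class.forName("com.databricks.client.jdbc.Driver");
        // jdbc:databricks: prefix plus the PWD parameter carrying the personal access token.
        String url = "jdbc:databricks://<workspace-domain>:443/default;"
                + "transportMode=http;ssl=1;AuthMech=3;"
                + "httpPath=/sql/1.0/endpoints/<identifier>;"
                + "PWD=<your_access_token>";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1)); // should print 1 if the connection works
            }
        }
    }
}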
I am unable to make a Kafka Connect sink work for a table that is not in the public schema.
I am using Kafka Connect to send records to a Redshift database via a sink operation using JdbcSinkConnector.
I have created my destination table in Redshift, but it is not in the public schema (my_schema.test_table; note that auto.create and auto.evolve are off in the connector configuration).
When I attempt to specify the table's location in the connector config, like so...
"table.name.format": "my_schema.test_table",
...the sink connector's task encounters this error when it starts up:
"Table my_schema.test_table is missing and auto-creation is disabled"
from
Caused by: org.apache.kafka.connect.errors.ConnectException: Table my_schema.test_table is missing and auto-creation is disabled
at io.confluent.connect.jdbc.sink.DbStructure.create(DbStructure.java:86)
at io.confluent.connect.jdbc.sink.DbStructure.createOrAmendIfNecessary(DbStructure.java:63)
at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:78)
...
I have tried the following formats for supplying the table name:
my_schema.test_table
dev.my_schema.test_table
test_table <-- in this case I get past the existence check that stops the others, but then run into this error every time Kafka Connect attempts to write a row:
"org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: java.sql.SQLException: Amazon Invalid operation: relation "test_table" does not exist;"
Likely because test_table is not in the public schema. : (
And it seems like the code is attempting to parse this table name correctly, but unfortunately it doesn't log its results.
This is my connection string: "connection.url": "jdbc:redshift://xxx.xxx.xxx.xxx:5439/dev"
I have mucked around with attempting to specify currentSchema=my_schema in the connection string, both for the Redshift JDBC driver and the PostgreSQL one. No luck.
I'm using Kafka Connect version 1.1.0
Redshift JDBC JAR: RedshiftJDBC42-1.2.16.1027.jar
I am able to get data flowing by putting the table in the public schema and specifying table name with no schema: "table.name.format": "test_table".
Unfortunately, that's not where we need the data to be.
Any help much appreciated.
I noticed that the source code seemed to be trying to do the right thing… and then realized that the version of the JDBC sink connector we were using did not have those modifications, which are somewhat recent. I moved from version 4.1.0 of the JDBC sink connector jar to version 5.0.0 and voilà, data is flowing into a table in the schema I specified. 🙃
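For anyone hitting the same thing, the connector configuration that works after the upgrade looks roughly like this; the connector name, connection details, and topic are placeholders, and the point is that table.name.format can now carry the schema prefix:
{
  "name": "redshift-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:redshift://xxx.xxx.xxx.xxx:5439/dev",
    "connection.user": "<user>",
    "connection.password": "<password>",
    "topics": "<topic>",
    "table.name.format": "my_schema.test_table",
    "auto.create": "false",
    "auto.evolve": "false"
  }
}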
Has anyone been able to use the new JDBC drivers for BigQuery in JetBrains DataGrip?
I've followed these steps:
Created a driver in DataGrip with all the JAR files
Created a data source with a connection string that uses a service account file
The connection test says successful, but once I try to query something I receive an error:
java.lang.ClassNotFoundException: com.google.api.client.json.JsonFactory
I've added the following files from the Simba ZIP into the DataGrip driver:
GoogleBigQueryJDBC42.jar
jackson-core-2.1.3.jar
google-api-client-1.22.0.jar
google-api-services-bigquery-v2-rev320-1.22.0.jar
google-http-client-1.22.0.jar
google-http-client-jackson2-1.22.0.jar
google-oauth-client-1.22.0.jar
So I'm not sure what to do next. I tried changing their order in DataGrip, but it didn't seem to make a difference.
My connection string also looks OK I think:
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=...;OAuthType=0;OAuthPvtKeyPath=...;OAuthServiceAcctEmail=...;
You may get this error when the driver JAR files are not referenced correctly in the tool. I have listed out the steps I used to connect to BigQuery via DataGrip.
Add a new driver by adding all the JAR files from the ZIP. The correct class name should be selected from the "Class" drop-down in this step.
Add a new data source by selecting the newly created BigQuery JDBC driver. Provide the correct connection URL in this step.
If the test connection succeeds, create a new query for the same datasource.
Make sure your query uses the correct format "dataset.tablename" and is running on the data source you just tested.
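For example, with a data source created from that driver, a query of the following form should run; the dataset and table names here are placeholders:
SELECT * FROM my_dataset.my_table LIMIT 10;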
For me, replacing the P12 key with a JSON key worked. But I cannot use DataGrip, or JDBC in general, to access BigQuery because of various query/incompatibility issues.
You can refer to this video to learn how to use the new Simba JDBC drivers for BigQuery in JetBrains DataGrip: https://www.youtube.com/watch?v=r9l2c_aQPoQ&ab_channel=JetBrainsTV
It covers all the steps, one by one, for a working setup.
Here is the blog post that references this video: https://blog.jetbrains.com/datagrip/2018/07/10/using-bigquery-from-intellij-based-ide/
Drivers can be downloaded at : https://cloud.google.com/bigquery/providers/simba-drivers
Note: make sure to go through the comments on the blog post to learn how to authenticate without creating a service account on GCP.
Hope this is helpful!
I have an Amazon Redshift database that supports connecting a PostgreSQL client with JDBC.
Google Apps Script supports connecting to a database with JDBC, but only with the MySQL, MS SQL, and Oracle protocols, not PostgreSQL. If I try, not surprisingly, I get the error:
'Connection URL uses an unsupported JDBC protocol.'
Looking at some Google forums, this has been an issue for several years with no response from Google.
Is there any workaround?
Thanks
I use Kloudio, a Google Sheets extension, to get this done. I can run and schedule my Redshift queries in Kloudio.
If you are using Amazon Redshift, then you can connect to it through an Amazon Redshift client.
Here are the steps:
Write your SQL query in the Redshift client and save it as a report.
Use the generated API key and report number in the embedded link.
Use the IMPORTDATA function in your Google spreadsheet to import the data automatically; by default it refreshes every hour.
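In the spreadsheet, that last step is just a formula of the following form, where the URL is the embedded report link built from the API key and report number:
=IMPORTDATA("https://<embedded-report-link>")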
Thanks