Setup Athena JDBC connector in Glue 4.0

I have a data source in Glue which is configured with partition projection. I can query the data in Athena; however, when I load this data source in a Glue 4.0 job, the Spark DataFrame comes back empty. It seems that partition projection is an Athena-only feature.
To work around the issue, I would like to set up a JDBC connector for Athena in my Glue job, so I can access the data via Athena instead of querying the Glue catalog directly. AWS provides instructions and a jar file here: https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html.
So I'm adding the latest jar file (at the time of writing, AthenaJDBC42-2.0.35.1000.jar) to the Spark classpath using the --extra-jars argument, but I'm getting this error:
java.lang.SecurityException: class "org.apache.logging.log4j.core.lookup.JndiLookup"'s signer information does not match signer information of other classes in the same package
Does anyone know how I can address this error?
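For reference, the read I'm attempting looks roughly like the sketch below; the region, S3 output location, and table name are placeholders, and the driver class name is the one documented for the Simba Athena JDBC 2.0.x driver:

# Sketch: read through Athena's JDBC driver so partition projection is
# applied server-side, instead of reading the Glue catalog directly.
# Assumes an existing SparkSession named spark (as in a Glue job script);
# all connection values below are placeholders.
df = (spark.read
    .format("jdbc")
    .option("driver", "com.simba.athena.jdbc.Driver")
    .option("url", "jdbc:awsathena://AwsRegion=eu-west-1;"
                   "S3OutputLocation=s3://my-athena-results/;"
                   "AwsCredentialsProviderClass="
                   "com.simba.athena.amazonaws.auth.InstanceProfileCredentialsProvider")
    .option("dbtable", "my_database.my_projected_table")
    .load())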

Related

ADF Copy Data remove index in Oracle Sink

I am trying to insert data from a SQL table into an Oracle table using the Copy Data activity in Data Factory. The first run succeeds, but the second run throws an error saying that an index on the target (Oracle) table has been corrupted.
Searching different forums, I found that the Copy Data activity apparently sends the insert statement in the following form: INSERT /*+ SYS_DL_CURSOR */ INTO
Any idea how to fix this?
Thank you very much for the help.
As per the error, the index is not corrupted; it was used twice. Perhaps the operation did not run according to its schedule and two copies ran in parallel.
The Copy activity is executed on an integration runtime. You can use different types of integration runtimes for different data copy scenarios:
When you're copying data between two data stores that are publicly accessible through the internet from any IP, you can use the Azure integration runtime for the copy activity. This integration runtime is secure, reliable, scalable, and globally available.
When you're copying data to and from data stores that are located on-premises or in a network with access control (for example, an Azure virtual network), you need to set up a self-hosted integration runtime.
Use whichever of the two integration runtimes above fits your scenario, and the error should be resolved.
See the support document: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview
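As an aside, the SYS_DL_CURSOR hint requests a direct-path load, and an aborted direct-path load can leave indexes in an UNUSABLE state; one way to recover before the next run is to rebuild them. A minimal sketch using the python-oracledb driver, where all connection details are placeholders:

# Rebuild any indexes left UNUSABLE by a failed direct-path insert.
# User, password, and DSN are placeholders.
import oracledb

conn = oracledb.connect(user="app_user", password="<password>", dsn="dbhost:1521/ORCLPDB1")
with conn.cursor() as cur:
    cur.execute("SELECT index_name FROM user_indexes WHERE status = 'UNUSABLE'")
    for (index_name,) in cur.fetchall():
        # REBUILD makes the index usable again without recreating it from scratch
        cur.execute(f'ALTER INDEX "{index_name}" REBUILD')
conn.close()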

Using Snowflake JDBC driver with Presto

I want to read data from a Snowflake datastore into my app via Presto, i.e. make Snowflake one of the data sources for Presto. Can I use the Snowflake-provided JDBC driver with Presto? Thanks.
Based on the GitHub repository:
https://github.com/prestodb/presto
I see presto-mysql, presto-spark, presto-redshift, etc., but I don't see presto-snowflake.
I also tried adding snowflake.properties under /usr/local/Cellar/prestodb/0.263/libexec/etc/catalog on my Mac, but the Presto server failed to start with this error:
2021-10-16T11:08:25.505+1100 ERROR main com.facebook.presto.server.PrestoServer No factory for connector snowflake
java.lang.IllegalArgumentException: No factory for connector snowflake
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:208)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:123)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:98)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:80)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:68)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:150)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:85)
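For what it's worth, a catalog file only needs the connector name to trigger this failure; the hypothetical minimal form would be something like the lines below. Presto resolves connector.name against the connector factories registered by its plugins, and prestodb ships no Snowflake plugin, which is exactly the lookup failing in the stack trace above:

# snowflake.properties (hypothetical minimal catalog file)
connector.name=snowflake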
It does not look like Presto supports Snowflake yet.

Write Data to SQL DW from Apache Spark in Azure Synapse

When I write data to SQL DW in Azure from Databricks I use the following code:
(example1.write
    .format("com.databricks.spark.sqldw")
    .option("url", sqlDwUrlSmall)
    .option("dbtable", "SampleTable12")
    .option("forward_spark_azure_storage_credentials", "True")
    .option("tempdir", tempDir)
    .mode("overwrite")
    .save())
This won't work in a Synapse notebook, though. I get the error:
Py4JJavaError: An error occurred while calling o174.save.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.sqldw. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:656) Caused by: java.lang.ClassNotFoundException: com.databricks.spark.sqldw.DefaultSource
Basically, I need to know the equivalent of com.databricks.spark.sqldw for Apache Spark in Azure Synapse.
Thanks
If you are writing to a dedicated SQL pool within the same Synapse workspace as your notebook, then it's as simple as calling the synapsesql method. Here is a simple parameterised example in Scala, using the parameter cell feature of Synapse notebooks:
// Read the table
val df = spark.read.synapsesql(s"${pDatabaseName}.${pSchemaName}.${pTableName}")
// do some processing ...
// Write it back with _processed suffixed to the table name
df.write.synapsesql(s"${pDatabaseName}.${pSchemaName}.${pTableName}_processed", Constants.INTERNAL)
If you are trying to write from your notebook to a different dedicated SQL pool, or to the old Azure SQL Data Warehouse, then it's a bit different, but there are some great examples here.
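Failing that, a plain Spark JDBC write is one fallback for a pool outside the workspace. This is only a sketch in PySpark, assuming the Microsoft SQL Server JDBC driver is available on the Spark pool; the server, database, table, and credentials are all placeholders:

# Generic Spark JDBC write to a dedicated SQL pool outside the workspace.
# df is the DataFrame to write; all connection values below are placeholders.
(df.write
    .format("jdbc")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("url", "jdbc:sqlserver://myserver.sql.azuresynapse.net:1433;database=mydb")
    .option("dbtable", "dbo.SampleTable12")
    .option("user", "sqladminuser")
    .option("password", "<password>")
    .mode("append")
    .save())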
UPDATE: The items in curly brackets with the dollar sign (e.g. ${pDatabaseName}) are parameters. You can designate a parameter cell in your notebook so parameters can be passed in externally, e.g. from Azure Data Factory (ADF) or Synapse Pipelines using the Execute Notebook activity, and reused in the notebook, as in my example above. Find out more about Synapse notebook parameters here.

Using BigQuery in DataGrip with JDBC

Has anyone been able to use the new JDBC drivers for BigQuery in JetBrains DataGrip?
I've followed these steps:
Created a driver in DataGrip with all the jar files
Created a database with a connection string with a service account file
The connection test says successful, but once I try to query something I receive an error:
java.lang.ClassNotFoundException: com.google.api.client.json.JsonFactory
I've added the following files from the Simba ZIP into the DataGrip driver:
GoogleBigQueryJDBC42.jar
jackson-core-2.1.3.jar
google-api-client-1.22.0.jar
google-api-services-bigquery-v2-rev320-1.22.0.jar
google-http-client-1.22.0.jar
google-http-client-jackson2-1.22.0.jar
google-oauth-client-1.22.0.jar
So I'm not sure what to do next. I tried changing their order in DataGrip, but it didn't seem to make a difference.
I think my connection string also looks OK:
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=...;OAuthType=0;OAuthPvtKeyPath=...;OAuthServiceAcctEmail=...;
You may get this error when the driver JAR files are not referenced correctly in the tool. I have listed the steps I used to connect to BigQuery via DataGrip.
Add a new driver by adding all the JAR files from the zip. The correct class name should be selected from the "Class" drop-down in this step.
Add a new data source by selecting the newly created BigQuery JDBC driver. Provide the correct connection URL in this step.
If the test connection succeeds, create a new query for the same datasource.
Make sure your query uses the correct format "dataset.tablename" and is running on the data source you just tested.
For me, replacing the P12 key with a JSON key worked. But I still cannot use DataGrip (or JDBC in general) to access BigQuery because of various query/incompatibility issues.
This video can be referred to for using the new Simba JDBC drivers for BigQuery in JetBrains DataGrip; it covers all the steps, one by one, for a working setup: https://www.youtube.com/watch?v=r9l2c_aQPoQ&ab_channel=JetBrainsTV
Here is the blog post that references this video: https://blog.jetbrains.com/datagrip/2018/07/10/using-bigquery-from-intellij-based-ide/
Drivers can be downloaded at: https://cloud.google.com/bigquery/providers/simba-drivers
Note: make sure to go through the comments on the blog to authenticate without creating a service account on GCP.
Hope this is helpful!

How can I read a .dbf file with WSO2 Data Services Server?

The documentation of WSO2 Data Services Server says you can read any database with a JDBC driver, and I found that there are some JDBC libraries for .DBF files.
DSS documentation: http://wso2.com/products/data-services-server/
JDBC for DBF files: http://www.csv-jdbc.com/stels_dbf_jdbc.htm
Has anyone already done something similar?
I would appreciate your help.
Yes, WSO2 DSS supports any RDBMS datasource type, provided a compatible driver is copied into the product. You can follow the steps below.
Add the JDBC driver to the $DSS_HOME/repository/components/lib folder and start the server.
Create a data service by following this doc.
Add an RDBMS-type datasource; since DBF is not listed among the pre-defined datasource types, select 'Generic' and give the driver class name, connection URL, username, and password as specified for the DBF driver.
Add a query and then an operation. Our official documentation has all the necessary steps.
Please note that we have not tested with this driver; therefore, to use it in a production environment, a comprehensive testing cycle will be needed (including load tests, long-running tests, etc.).
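Before wiring the driver into DSS, one way to smoke-test it outside the server is a quick JDBC call from a script. Below is a sketch using the jaydebeapi bridge, where the driver class name, connection URL format, jar path, and table name are hypothetical placeholders to be replaced with the values from the stels documentation:

# Smoke-test a DBF JDBC driver outside DSS via the jaydebeapi bridge.
# Driver class, URL, jar path, and table name are placeholders.
import jaydebeapi

conn = jaydebeapi.connect(
    "com.example.dbf.Driver",            # driver class from the stels docs (placeholder)
    "jdbc:example:dbf:/path/to/dbf/dir", # connection URL format from the docs (placeholder)
    jars="/path/to/dbf-jdbc-driver.jar",
)
curs = conn.cursor()
curs.execute("SELECT * FROM mytable")
print(curs.fetchall())
curs.close()
conn.close()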
