Querying data from external hive metastore - azure-databricks

I am trying to configure an external Hive metastore for my Azure Synapse Spark pool. The rationale behind using an external metastore is to share table definitions across Databricks and Synapse workspaces.
However, I am wondering if it is possible to access the backend data via the metastore. For example, can clients like Power BI or Tableau connect to the external metastore and retrieve not just the metadata, but also the business data in the underlying tables?
Also, what additional value does an external metastore provide?

You can configure the external Hive metastore in Synapse by creating a Linked Service for that external source and then using it from your Synapse Spark pool.
Follow the steps below to connect to the external Hive metastore.
In the Synapse portal, go to the Manage hub on the left side of the page, then click Linked services. To create a new linked service, click + New.
Search for Azure SQL Database or Azure Database for MySQL for the external Hive metastore; Synapse supports these two external metastore backends. Select one and click Continue.
Fill in all the required details (Name, Subscription, Server name, Database name, Username and Password) and test the connection.
You can test connectivity to the Hive metastore database with the code below.
%%spark
import java.sql.DriverManager

/** The JDBC URL can be copied from Azure portal > Azure SQL Database > Connection strings > JDBC **/
val url = "jdbc:sqlserver://<servername>.database.windows.net:1433;database=<databasename>;user=<username>;password=<password>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

try {
  val connection = DriverManager.getConnection(url)
  // The VERSION table is part of the Hive metastore schema and holds the schema version,
  // so this query only succeeds once the metastore schema has been initialized.
  val result = connection.createStatement().executeQuery("SELECT SCHEMA_VERSION FROM VERSION")
  result.next()
  println(s"Connection test succeeded. Hive metastore version is ${result.getString(1)}")
} catch {
  case ex: Throwable => println(s"Failed to establish connection:\n $ex")
}
If the connection succeeds, the snippet prints the Hive metastore schema version.
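The linked service by itself only stores the connection; the Spark pool also has to be configured to use it as its metastore. Below is a minimal sketch of that Spark configuration (applied at the pool or session level), assuming an Azure SQL-backed metastore with a Hive 2.3 schema; the linked service name is a placeholder and the exact values should be checked against the Synapse external-metastore documentation for your Hive version.
spark.sql.hive.metastore.version 2.3
spark.hadoop.hive.synapse.externalmetastore.linkedservice.name <HiveMetastoreLinkedServiceName>
spark.sql.hive.metastore.jars /opt/hive-metastore/lib-2.3/*:/usr/hdp/current/hadoop-client/lib/*
Once the pool picks up this configuration, tables registered in the shared metastore should become visible to Spark in Synapse as well.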
Can clients like Power BI or Tableau connect to the external metastore and retrieve not just the metadata, but also the business data in the underlying tables?
Yes, Power BI can connect to Azure SQL Database using the built-in connector.
In Power BI Desktop, go to Get Data, click Azure and select Azure SQL Database, then click Connect.
In the next step, enter the server name in the format <servername>.database.windows.net, the database name, the username and the password, and you can then access the data in Power BI.

Related

Databricks SQL workspace configuration with External MetaStore

I have Azure Databricks set up with an external Hive metastore (on Azure SQL), and the database connection URL is configured in the Databricks cluster's advanced settings. This way I am able to see and access the Delta Lake tables (stored on an Azure storage account, ADLS) in the Data section of Databricks.
Now I want my users to access these tables through the Databricks SQL workspace. I have configured data access using a service principal in the SQL warehouse section.
Per the Databricks SQL documentation, I am supposed to see the Delta Lake tables that I can see through the Data Science and Engineering section, but I cannot see any schemas or tables from the metastore.
Problem: I am not able to see the tables through SQL workspace > Data, and I am puzzled how it would know where my external metastore is and what the schema definitions are.
Presumably the SQL workspace has to be set up to point at the Hive metastore connection, but I am not sure, as the Databricks documentation is not very clear on this point.
Please suggest.
Below are the data access details for the service principal:
spark.hadoop.fs.azure.account.auth.type.<adlsContainer>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<adlsContainer>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<adlsContainer>.dfs.core.windows.net <CLIENT_ID>
spark.hadoop.fs.azure.account.oauth2.client.secret.<adlsContainer>.dfs.core.windows.net <CLIENT_SECRET>
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<adlsContainer>.dfs.core.windows.net https://login.microsoftonline.com/<TENANT_ID>/oauth2/token
As mentioned in the data access documentation, you need to add the same Spark configuration properties for the external Hive metastore to the SQL warehouse data access configuration as you would set on "normal" Spark clusters; see the Databricks documentation on external Hive metastores for details.
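As a rough sketch, assuming an Azure SQL-backed metastore (server, database, and credential values are placeholders and must match what your interactive clusters already use), the data access configuration would additionally contain properties along these lines:
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<metastoreServer>.database.windows.net:1433;database=<metastoreDatabase>
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
spark.hadoop.javax.jdo.option.ConnectionUserName <metastoreUser>
spark.hadoop.javax.jdo.option.ConnectionPassword <metastorePassword>
spark.sql.hive.metastore.version 2.3.9
spark.sql.hive.metastore.jars builtin
With these set alongside the storage credentials above, the SQL warehouse should be able to resolve the same schemas and tables that the Data Science and Engineering clusters see.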

Informatica Workflow Cannot create proper Relational connection object to connect to SQL Server

On my Informatica server PC, in the Informatica Administration console, I created a repository service, giving it Oracle database information. In Informatica I connect to this repository, and there I want to create a mapping importing a table from a remote SQL Server PC (on my home network domain) and then create a workflow to put the data into an Oracle target table.
Using the ODBC admin console I created and tested the connection, and I am also able to telnet to the linked SQL Server host and port.
Within Informatica I created a relational connection for SQL Server, and when I run the workflow I get the error: reason (14007) failed to create and initialize OLE DB instance, database driver error, failed to connect to database using the user I use in SSMS for Windows authentication and connection string (). I would like to know, first of all, whether I am doing something wrong by connecting to a repository with Oracle database information and then using a SQL Server table on a remote PC. Do I have to create another repository for SQL Server and use SQL Server tables there, or can I mix them? Secondly, I would like to know how to create a relational connection object in Informatica for my linked SQL Server so that it matches the relational connection created with the ODBC admin console. Last but not least, I would like to understand why it gives an error saying I left the connection string empty, when I cannot see a place to put it while creating the relational connection object.
I might not be able to solve the problem completely, but here are a few remarks that might be helpful:
1. The PowerCenter repository database is where PowerCenter stores all the metadata about the processes you create. It may be Oracle - that's perfectly fine. As it is not related to your data sources or targets, you do not need to create another repository for different sources/targets; one is enough for all of them.
2. Using PowerCenter Workflow Manager, create the appropriate connections to all the systems you need. These are the connections (ODBC or otherwise) that the Integration Service will use to actually connect to your data sources and targets, hence:
3. Make sure the ODBC / other data sources are defined on the Integration Service machine. It is the IS that runs the process and connects to the systems specified in the process using the defined connections.
4. When you build mappings, you create them in a client app (Mapping Designer) and you can connect to DB engines to create source and target definitions. Note that in this case you use the connection (e.g. an ODBC data source) defined on the client machine. Once you actually run the workflow with the given mapping, it is executed on the IS (mentioned above), where the appropriate connections need to be defined - and that is completely separate.
5. When editing a session in a workflow, for each source and target you need to pick a connection defined in the Informatica repository, created as described in point 2 above (or use a variable to indicate one - but that's another story).
So the error you mention seems to be related to the connection created in Workflow Manager - it probably does not specify the connection string that should refer to the data source defined on the IS.

How to connect to data source in HUE?

I have been given access to the Hue Hive platform by my client. I have also raised all the access requests for the database, and all of them have been approved. But I can't see any databases or tables in the Hive interface. Is there any procedure to connect to a database, or should it appear in the Hive interface automatically?

Create external data source in Azure Synapse Analytics (Azure SQL Data warehouse) to Oracle

I am trying to create an external data source in Azure Synapse Analytics (Azure SQL Data Warehouse) pointing to an external Oracle database. I am using the following code in SSMS to do that:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'myPassword';
CREATE DATABASE SCOPED CREDENTIAL MyCred WITH IDENTITY = 'myUserName', Secret = 'Mypassword';
CREATE EXTERNAL DATA SOURCE MyEXTSource
WITH (
    LOCATION = 'oracle://<myIPAddress>:1521',
    CREDENTIAL = MyCred
);
I am getting the following error:
CREATE EXTERNAL DATA SOURCE statement failed because the 'TYPE' option is not specified. Specify a value for the 'TYPE' option and try again.
I understand from the documentation below that TYPE is not a required option for Oracle databases.
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest
I am not sure what the problem is here. Is this feature still not supported in Azure Synapse Analytics (Azure DW) when it is already available in SQL Server 2019? Any ideas are welcome.
PolyBase has different versions across the different products, with different capabilities.
The ability to connect to Oracle is only present in the SQL Server versions, currently SQL Server 2019. The documentation is quite clear that it applies only to SQL Server and not to Azure Synapse Analytics (formerly Azure SQL Data Warehouse):
https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configure-oracle?view=sql-server-ver15
In summary, Azure Synapse Analytics and its version of PolyBase do not currently support access to external Oracle tables.

Write SQL Statement against Oracle Cloud Database Schema services using Oracle SQL Developer

I have a subscription to Oracle Cloud Database Schema Service (schema only, not a full database). I am trying to access the database instance using Oracle SQL Developer. I followed the steps below to make the connection in SQL Developer:
1 - From "Database schema Service connection"
2 - New Cloud Connection and enter my username/password/instance URL
3- The database connected and I can list all tables inside the schema
But when I try to open a SQL worksheet and write some SELECT statements, I can't do it.
I know that this task can be done through the APEX console, but is there any way to do it using SQL Developer?
This is not currently supported.
The Schema Service is only reachable via HTTPS. We have built several REST services that allow SQL Developer to do what you see today, which includes browsing the schema and uploading data via the Cart.
We have just built a SQL 'REST Service' feature into Oracle REST Data Services which allows us, or any authenticated user, to run an ad-hoc SQL or PL/SQL block via a POST. This would allow us to add what you're looking for - a SQL worksheet for your service.
However, instead of building this into the SQL Developer desktop, we're looking at releasing 'SQL Developer Web', which will be available in your Oracle Cloud Database Services consoles.
I can't tell you if or when that will be available for your Schema Service, but it's on the road map.
In the meantime, the APEX UI and its SQL Workshop is the way to go.
