Connect to Synapse from Databricks using a Service Principal - azure-databricks

I am trying to connect from Databricks to Synapse using a service principal.
I have configured the service principal in the cluster configuration:
fs.azure.account.auth.type.<datalake>.dfs.core.windows.net OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id <Service Principal ID/Application ID>
fs.azure.account.oauth2.client.secret <Client secret key/Service Principal Password>
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<tenant-id>/oauth2/token
fs.azure.createRemoteFileSystemDuringInitialization true
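For completeness, the same settings can also be applied per notebook session instead of in the cluster config; a rough sketch with placeholder values (the secret scope "my-scope" and key name are hypothetical):
storage_account = "<datalake>"
client_id = "<service-principal-application-id>"
client_secret = dbutils.secrets.get(scope="my-scope", key="sp-secret")  # hypothetical secret scope
tenant_id = "<tenant-id>"

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")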
While I can successfully connect to the Data Lake and work with it, I could not write to Synapse when I use the command below...
DummyDF.write.format("com.databricks.spark.sqldw") \
    .mode("append") \
    .option("url", jdbcUrl) \
    .option("useAzureMSI", "true") \
    .option("tempDir", tempdir) \
    .option("dbTable", "DummyTable") \
    .save()
I am getting the below error...
Py4JJavaError: An error occurred while calling o831.save.
: com.databricks.spark.sqldw.SqlDWSideException: SQL DW failed to execute the JDBC query produced by the connector.
Underlying SQLException(s):
com.microsoft.sqlserver.jdbc.SQLServerException: External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://datalakename.dfs.core.windows.net/temp/2020-06-24/14-21-57-819/88228292-9f00-4da0-b778-d3421ea4d2ec?upn=false&timeout=90' [ErrorCode = 105019] [SQLState = S0001]
However, I could write to Synapse using the command below...
DummyDF.write.mode("append").jdbc(jdbcUrl,"DummyTable")
I am not sure what is missing.

The second option does not use PolyBase; it goes through plain JDBC and is much slower.
I think your error is not related to Databricks or the SQL DW library, but rather to connectivity between Synapse and the storage account.
Could you check:
- Is "Allow access to Azure services" set to ON on the firewall pane of the Azure Synapse server in the Azure portal? (Also remember that if your Azure Blob Storage is restricted to selected virtual networks, Azure Synapse requires Managed Service Identity instead of access keys.)
- Have you correctly specified tempDir: "wasbs://" + blobContainer + "@" + blobStorage + "/tempDirs" for Blob Storage, or "abfss://..." for ADLS Gen 2? (See the sketch after this list.)
- Can you create external tables on that storage using a managed identity directly from Synapse?
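To illustrate the tempDir point, here is a rough sketch of the two shapes the connector expects; the Blob Storage names below are placeholders, and the ADLS Gen 2 names just mirror the host in your error message:
# Blob Storage form
blob_container = "tempcontainer"
blob_storage = "mystorageaccount.blob.core.windows.net"
tempdir_wasbs = "wasbs://" + blob_container + "@" + blob_storage + "/tempDirs"

# ADLS Gen 2 form (matches the storage account in the error)
adls_container = "temp"
adls_account = "datalakename.dfs.core.windows.net"
tempdir_abfss = "abfss://" + adls_container + "@" + adls_account + "/tempDirs"

DummyDF.write.format("com.databricks.spark.sqldw") \
    .mode("append") \
    .option("url", jdbcUrl) \
    .option("useAzureMSI", "true") \
    .option("tempDir", tempdir_abfss) \
    .option("dbTable", "DummyTable") \
    .save()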
Here is one article that covers solving the same error code as yours (105019):
https://techcommunity.microsoft.com/t5/azure-synapse-analytics/msg-10519-when-attempting-to-access-external-table-via-polybase/ba-p/690641
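For the third check, something along these lines can be run directly against the Synapse SQL pool to see whether its managed identity can reach the storage. This is only a sketch: the connection string and the credential, data source, file format and table names are placeholders, and the Synapse server's managed identity still needs the Storage Blob Data Contributor role on the storage account.
import pyodbc

# placeholders: point this at your Synapse (SQL DW) server and database
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net;DATABASE=yourdw;UID=youruser;PWD=yourpassword",
    autocommit=True,
)
cur = conn.cursor()

# a database master key may be needed first: CREATE MASTER KEY;
# credential backed by the server's managed identity (no key or SAS)
cur.execute("CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Service Identity';")

# external data source pointing at the same ADLS Gen 2 container the connector uses for tempDir
cur.execute(
    "CREATE EXTERNAL DATA SOURCE TempStorageCheck WITH ("
    " TYPE = HADOOP,"
    " LOCATION = 'abfss://temp@datalakename.dfs.core.windows.net',"
    " CREDENTIAL = msi_cred);"
)

# creating an external table triggers the same HdfsBridge_IsDirExist check as in your error,
# so a 403 here points at missing storage permissions for the managed identity
cur.execute("CREATE EXTERNAL FILE FORMAT ParquetCheckFmt WITH (FORMAT_TYPE = PARQUET);")
cur.execute(
    "CREATE EXTERNAL TABLE dbo.MsiAccessCheck (DummyCol INT)"
    " WITH (LOCATION = '/', DATA_SOURCE = TempStorageCheck, FILE_FORMAT = ParquetCheckFmt);"
)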

Related

ADF Oracle Service Cloud connector - correct endpoint

In Azure Data Factory, I'm trying to create a linked service by using the Oracle Service Cloud (Preview) connector to connect to my organisation's Oracle HCM instance. I'm generally following this guidance, using the copy data tool, which should be straightforward: https://learn.microsoft.com/en-us/azure/data-factory/connector-oracle-service-cloud?tabs=data-factory
I have tried the following host names...
https://xxxx.xx.xxx.oraclecloud.com/
https://xxxx.xx.xxx.oraclecloud.com/hcmRestApi
https://xxxx.xx.xxx.oraclecloud.com/hcmRestApi/resources/11.13.18.05/grades
https://xxxx.xx.xxx.oraclecloud.com:443/hcmRestApi/resources/11.13.18.05/grades
... but all of them generate the following error...
Error code 9603
ERROR [HY000] [Microsoft][OSvC] (20) Error while attempting to use REST API: Couldn't resolve host name
ERROR [HY000] [Microsoft][OSvC] (20) Error while attempting to use REST API: Couldn't resolve host name
Activity ID: 590c5007-ec6f-4729-9eb2-d05ef779dc0e.
I'm using a username and password that has been tested on Oracle, and have tried various combinations of using encrypted endpoints, host verification and peer verification as true or false.
I believe I'm using the correct endpoints, based on Oracle's guidance:
Oracle REST endpoints
https://docs.oracle.com/en/cloud/saas/human-resources/22c/farws/rest-endpoints.html
I'm not sure what else to try to get this connector to work. Has anybody else got it to work, or perhaps noticed something I'm doing wrong with the host name?
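Since the error is specifically "Couldn't resolve host name", one thing that can help isolate it is checking, from the machine your integration runtime uses, whether the host resolves and the REST endpoint answers at all; a rough sketch, with a placeholder host and credentials, assuming Python with the requests library is available there:
import socket
import requests

host = "xxxx.xx.xxx.oraclecloud.com"  # placeholder, same shape as the hosts above

# 1. does the name resolve at all from this machine?
print(socket.gethostbyname(host))

# 2. does the REST endpoint answer with anything other than a name-resolution failure?
resp = requests.get(
    "https://" + host + "/hcmRestApi/resources/11.13.18.05/grades",
    auth=("username", "password"),  # placeholder credentials
    timeout=30,
)
print(resp.status_code)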

Databricks and Azure Blob Storage

I am running this in a Databricks notebook
dbutils.fs.ls("/mount/valuable_folder")
I am getting this error
Caused by: StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
I tried using dbutils.fs.refreshMounts()
to pick up any updates in Azure Blob Storage, but I am still getting the above error.
Such errors most often arise when the credentials used for mounting have expired - for example, the SAS token has expired, the storage key was rotated, or the service principal secret has expired. You need to unmount the storage using dbutils.fs.unmount and mount it again with dbutils.fs.mount. dbutils.fs.refreshMounts() just refreshes the list of mounts in the backend; it doesn't re-check the credentials.
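A rough sketch of the unmount/remount cycle, with placeholder container, account and secret scope names (the wasbs source and account-key config mirror the standard Blob Storage mount pattern):
mount_point = "/mount/valuable_folder"

dbutils.fs.unmount(mount_point)

dbutils.fs.mount(
    source="wasbs://<container>@<storage-account>.blob.core.windows.net",
    mount_point=mount_point,
    extra_configs={
        "fs.azure.account.key.<storage-account>.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-key")  # hypothetical secret scope
    },
)

# verify the mount works again
display(dbutils.fs.ls(mount_point))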

Error in AWS DMS Endpoint using Oracle database as a source

I have tried to configure an AWS DMS Endpoint, but when I try to do the connection test it shows me the following error:
Test Endpoint failed: Application-Status: 1020912, Application-Message: Log Miner is not supported in Oracle PDB environment Endpoint initialization failed.
I have given all the grants that are required in the Oracle DB, following the documentation:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html
But the error still persists. What could be the solution?
Add this line to the extra connection attributes (under endpoint settings):
useLogMinerReader=N;useBfile=Y;
Also make sure to grant the necessary permissions to your container user, as described here: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html#CHAP_Source.Oracle.Self-Managed.BinaryReaderPrivileges
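If you prefer to do it programmatically instead of through the console, the same attributes can be set with the AWS SDK; a minimal boto3 sketch with a placeholder endpoint ARN:
import boto3

dms = boto3.client("dms")

# apply the extra connection attributes to the existing source endpoint
dms.modify_endpoint(
    EndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:EXAMPLE",  # placeholder
    ExtraConnectionAttributes="useLogMinerReader=N;useBfile=Y;",
)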

Data Factory Blob Storage Linked Service with Managed Identity : The remote server returned an error: (403)

I have created a linked service connection to a storage account using a managed identity, and it validates successfully, but when I try to use the linked service in a dataset I get an error:
A storage operation failed with the following error 'The remote server returned an error: (403)
The error is displayed when I attempt to browse the blob to set the file path.
The managed identity for the data factory has been assigned the Contributor role.
The blob container is set to private access.
Anyone know how I make this work?
Turns out I was using the wrong role. Just need to add an assignment for the Storage Blob Data Contributor role.
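For reference, a sketch of the role assignment; the object id, subscription, resource group and storage account name are placeholders, and it assumes the Azure CLI is installed and logged in:
import subprocess

# grant the Data Factory's managed identity "Storage Blob Data Contributor" on the storage account
subprocess.run(
    [
        "az", "role", "assignment", "create",
        "--assignee", "<data-factory-managed-identity-object-id>",
        "--role", "Storage Blob Data Contributor",
        "--scope",
        "/subscriptions/<sub-id>/resourceGroups/<rg>"
        "/providers/Microsoft.Storage/storageAccounts/<storage-account>",
    ],
    check=True,
)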

Need to enable proxy settings in SQL for creating External table using Azure Polybase for Production environment

The external table creation using PolyBase was successful in the local environment, but it was unsuccessful in production, where we use proxy servers for internet access.
When I tried to create an external table in the production environment, I got the following error:
"EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_IsDirExist: Error [com.microsoft.azure.storage.StorageException: An unknown failure occurred : Connection timed out: connect] occurred while accessing external file.'"
I tried enabling proxy settings via Internet Explorer. Is there any way to enable a proxy for SQL Server when creating an
external table, so that it can establish a connection to Azure Blob Storage?
Our requirements are as follows:
- We need to run PolyBase queries to create parquet files on Azure Blob Storage.
- We don't have a direct internet connection on the DB server.
- We need to use a proxy to connect to the internet externally.
We are able to create a container through the .NET Azure SDK after enabling the proxy inside the app.config file, but we are not able to run the external table creation query from SQL Server; it fails with the same error as above.
If this error is because of proxy issues, how can we configure a proxy for external table creation against Azure storage?
E.g., a sample external table creation script is below:
-- EXTERNAL TABLE
CREATE EXTERNAL TABLE dbo.SampleExternal (
DateId INT NULL,
CalendarQuarter TINYINT NULL,
FiscalQuarter TINYINT NULL)
WITH (LOCATION='/SampleExternal.parquet',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=ParquetFile);
----- DATABASE SCOPED CREDENTIAL
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH
IDENTITY = 'user',
SECRET = 'XXXXXXXXXXX=='
;
---DATA SOURCE
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
TYPE = HADOOP,
LOCATION = 'wasbs://XXContainer@XXStorage.blob.core.windows.net',
CREDENTIAL = AzureStorageCredential
);
