Databricks Azure Blob Storage access

I am trying to access files stored in Azure blob storage and have followed the documentation linked below:
https://docs.databricks.com/external-data/azure-storage.html
I was successful in mounting the Azure blob storage on DBFS, but it seems that method is no longer recommended. So I tried to set up direct access via URI using SAS authentication.
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", "<token>")
Now when I try to access any file using:
spark.read.load("abfss://<container-name>#<storage-account-name>.dfs.core.windows.net/<path-to-data>")
I get the following error:
Operation failed: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.", 403, HEAD,
I am able to mount the storage account using the same SAS token, but direct access is not working.
What needs to be changed for this to work?

If you are using blob storage, then you have to use wasbs and not abfss. I tried the same code as yours with my SAS token and got the same error with my blob storage.
spark.conf.set("fs.azure.account.auth.type.<storage_account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.dfs.core.windows.net", "<token>")
df = spark.read.load("abfss://<container>@<storage_account>.dfs.core.windows.net/input/sample1.csv")
When I used the following modified code, I was able to successfully read the data.
spark.conf.set("fs.azure.account.auth.type.<storage_account>.blob.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.blob.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.blob.core.windows.net", "<token>")
df = spark.read.format("csv").load("wasbs://<container>@<storage_account>.blob.core.windows.net/input/sample1.csv")
UPDATE:
To access files from Azure blob storage when the firewall allows access only from selected networks, you need to configure a VNet for the Databricks workspace.
Then add the same virtual network to your storage account as well.
I have also selected service endpoints and subnet delegation for that subnet.
Now when I run the same code again using the file path wasbs://<container>@<storage_account>.blob.core.windows.net/<path>, the file is read successfully.

Related

Databricks and Azure Blob Storage

I am running this in a Databricks notebook:
dbutils.fs.ls("/mount/valuable_folder")
I am getting this error
Caused by: StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
I tried using dbutils.fs.refreshMounts()
to pick up any updates in Azure blob storage, but I am still getting the above error.
Such errors most often arise when the credentials used for mounting have expired: for example, the SAS token has expired, the storage key has been rotated, or the service principal secret has expired. You need to unmount the storage using dbutils.fs.unmount and mount it again with dbutils.fs.mount. dbutils.fs.refreshMounts() only refreshes the list of mounts in the backend; it does not recheck credentials.
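As a minimal sketch of the remount, assuming a wasbs mount secured with a SAS token (the mount point is taken from the question; the container, account, and token are placeholders):
# Unmount the stale mount first (this raises an error if the path is not currently mounted).
dbutils.fs.unmount("/mount/valuable_folder")
# Remount with the refreshed credential; this assumes a wasbs mount authenticated with a SAS token.
dbutils.fs.mount(
    source="wasbs://<container>@<storage-account>.blob.core.windows.net",
    mount_point="/mount/valuable_folder",
    extra_configs={"fs.azure.sas.<container>.<storage-account>.blob.core.windows.net": "<sas-token>"}
)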

Azure service for hosting zip file securely

I am running a Spring Boot app which internally uses a secure bundle zip file of size 13 KB. I want to host the file on a remote server securely and encrypted. The infrastructure I am on is Azure. Which Azure service can I use to host my zip file securely? Can I use Azure Key Vault to access my zip file?
You can store the zip file in Azure Blob Storage. Azure Storage uses server-side encryption (SSE) to automatically encrypt your data when it is persisted to the cloud. You can configure your storage account to accept requests from secure connections only by setting the "Secure transfer required" property for the storage account.
You could then restrict access to the storage blob via Shared Access Signature.
If you wanted to be extra secret squirrel, you could even enable CMK (Customer Managed Keys) for the encryption to make doubly sure nobody is looking at your secret sauce.
https://learn.microsoft.com/en-us/azure/storage/common/storage-require-secure-transfer
https://learn.microsoft.com/en-us/azure/storage/common/customer-managed-keys-configure-key-vault?tabs=portal
https://learn.microsoft.com/en-us/azure/storage/blobs/security-recommendations
You could theoretically host your zip file in Key Vault if you serialize it to a text file. But I think that's a bad idea.
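As a rough sketch of the Blob-plus-SAS approach using the azure-storage-blob Python SDK (the account, container, blob name, and one-hour expiry are illustrative placeholders, not values from the question):
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobServiceClient, BlobSasPermissions, generate_blob_sas

account_name = "<storage-account>"
account_key = "<account-key>"
container = "<container>"
blob_name = "secure-bundle.zip"

# Upload the zip to a private container over HTTPS; it is encrypted at rest by SSE.
service = BlobServiceClient(f"https://{account_name}.blob.core.windows.net", credential=account_key)
with open("secure-bundle.zip", "rb") as data:
    service.get_blob_client(container, blob_name).upload_blob(data, overwrite=True)

# Hand the app a short-lived, read-only SAS URL instead of the account key.
sas = generate_blob_sas(
    account_name=account_name,
    container_name=container,
    blob_name=blob_name,
    account_key=account_key,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
download_url = f"https://{account_name}.blob.core.windows.net/{container}/{blob_name}?{sas}"
The same flow (upload plus SAS generation) is also available in the Azure Storage SDK for Java if you would rather keep it inside the Spring Boot app.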
The right service is Azure Storage File Share.

Go storage client not able to access GCP bucket

I have a Golang service which has an API exposed where we try to upload a CSV to a GCP bucket. On my local host, I set the environment variable GOOGLE_APPLICATION_CREDENTIALS
and point it to the file path of the service account JSON. But when deploying to an actual GCP instance, I'm getting the below error while trying to access this API. Ideally, the service should talk to the GCP metadata server, fetch the credentials, and then store them in a JSON file. So there are two problems here:
1. The service is not querying the metadata server to get the credentials.
2. If the file is present (I created it manually), it cannot be accessed due to permission issues.
Any help would be appreciated.
Error while initializing storage Client:dialing: google: error getting credentials using well-known file (/root/.config/gcloud/application_default_credentials.json): open /root/.config/gcloud/application_default_credentials.json: permission denied
Finally, after long debugging and searching the web, I found that there is already an open issue for the Go storage client: https://github.com/golang/oauth2/issues/337. I had to make a few changes in the code using https://pkg.go.dev/golang.org/x/oauth2/google#ComputeTokenSource, where basically we fetch the token explicitly from the metadata server and then call the subsequent cloud APIs.

access issue while connecting to azure data lake gen 2 from databricks

I am getting the below access issue while trying to connect from Databricks to a Gen2 data lake using a service principal and OAuth 2.0.
Steps performed (reference article):
1. Created a new service principal.
2. Provided the necessary access to this service principal from the Azure storage account IAM with Contributor role access.
3. Enabled firewall and private endpoint connections on Databricks and the storage account.
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=AuthorizationPermissionMismatch
ErrorMessage=This request is not authorized to perform this operation using this permission.
However, when I tried connecting via access keys it works well without any issue. Now I have started suspecting that #3 from my steps is the reason for this access issue. If so, do I need to give any additional access to make it succeed? Any thoughts?
When performing the steps in the Assign the application to a role, make sure to assign the Storage Blob Data Contributor role to the service principal.
Repro: I provided Owner permission to the service principal and tried to run dbutils.fs.ls("mnt/azure/"); it returned the same error message as above.
Solution: Now assigned the Storage Blob Data Contributor role to the service principal.
Finally, able to get the output without any error message after assigning Storage Blob Data Contributor role to the service principal.
For more details, refer to "Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark".
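For reference, a sketch of the corresponding direct-access configuration with a service principal, using the standard ABFS OAuth settings (the storage account, application ID, secret, tenant ID, container, and path are placeholders):
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", "<client-secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
# The service principal needs the Storage Blob Data Contributor role (Contributor alone covers the
# management plane, not data access) before reads like the one below will succeed.
df = spark.read.format("csv").load("abfss://<container>@<storage-account>.dfs.core.windows.net/<path>")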

How to access Azure blob storage via Flex Application?

I am trying to access Blob Storage on Azure via my Flex application. I am doing this via an HTTPService, using the URL given by Azure Blob Storage. However, my storage has private and restricted access, and I can only update the storage by using the key (provided by Azure).
Since my application needs to write to this storage, I somehow need to pass in the key via my HTTPService?
Does anyone have any idea how I can do this?
Regards
Aparna
