Access issue while connecting to Azure Data Lake Gen2 from Databricks

I am getting the access issue below while trying to connect from Databricks to a Gen2 data lake using a service principal and OAuth 2.0.
Steps performed (per the reference article):
1. Created a new service principal.
2. Granted the service principal the necessary access from the Azure storage account IAM with the Contributor role.
3. Enabled firewalls and private endpoint connections on Databricks and the storage account.
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=AuthorizationPermissionMismatch
ErrorMessage=This request is not authorized to perform this operation using this permission.
However, when I tried connecting via access keys it worked without any issue. Now I have started suspecting that #3 from my steps is the reason for this access issue. If so, do I need to grant any additional access to make it succeed? Any thoughts?

When performing the steps in the "Assign the application to a role" section, make sure to assign the Storage Blob Data Contributor role to the service principal.
Repro: I granted the Owner role to the service principal and ran dbutils.fs.ls("/mnt/azure/"), which returned the same error message as above.
Solution: Assign the Storage Blob Data Contributor role to the service principal.
After assigning the Storage Blob Data Contributor role to the service principal, I was finally able to get the output without any error message.
For more details, refer to “Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark”.
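For context, here is a minimal sketch of mounting ADLS Gen2 with a service principal over OAuth 2.0. The application (client) ID, client secret, directory (tenant) ID, container, and storage account names are placeholders, and the /mnt/azure mount point is just an assumption matching the repro above; the service principal is assumed to already hold Storage Blob Data Contributor on the account.
configs = {
    # OAuth 2.0 client-credentials flow for the ABFS driver
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": "<client-secret>",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token",
}
dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/azure",
    extra_configs=configs,
)
dbutils.fs.ls("/mnt/azure/")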

Related

Databricks Azure Blob Storage access

I am trying to access files stored in Azure blob storage and have followed the documentation linked below:
https://docs.databricks.com/external-data/azure-storage.html
I was successful in mounting the Azure Blob storage on DBFS, but it seems that method is not recommended anymore. So I tried to set up direct access using a URI with SAS authentication.
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", "<token>")
Now when I try to access any file using:
spark.read.load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")
I get the following error:
Operation failed: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.", 403, HEAD,
I am able to mount the storage account using the same SAS token but this is not working.
What needs to be changed for this to work?
If you are using Blob storage, then you have to use wasbs and not abfss. I tried the same code as yours with my SAS token and got the same error with my Blob storage.
spark.conf.set("fs.azure.account.auth.type.<storage_account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.dfs.core.windows.net", "<token>")
df = spark.read.load("abfss://<container>@<storage_account>.dfs.core.windows.net/input/sample1.csv")
When I used the following modified code, I was able to successfully read the data.
spark.conf.set("fs.azure.account.auth.type.<storage_account>.blob.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.blob.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.blob.core.windows.net", "<token>")
df = spark.read.format("csv").load("wasbs://<container>@<storage_account>.blob.core.windows.net/input/sample1.csv")
UPDATE:
To access files in Azure Blob storage when the firewall allows access only from selected networks, you need to configure a virtual network (VNet) for the Databricks workspace.
Then add that same virtual network to the networking settings of your storage account as well.
I also enabled service endpoints and subnet delegation on the subnet.
Now when I run the same code again using the file path wasbs://<container>@<storage_account>.blob.core.windows.net/<path>, the file is read successfully.

Databricks and Azure Blob Storage

I am running this in a Databricks notebook:
dbutils.fs.ls("/mount/valuable_folder")
I am getting this error
Caused by: StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
I tried using dbutils.fs.refreshMounts() to pick up any updates from Azure Blob storage, but I am still getting the above error.
Such errors most often arise when the credentials that you used for mounting have expired: for example, the SAS has expired, the storage key has been rotated, or the service principal secret has expired. You need to unmount the storage with dbutils.fs.unmount and mount it again with dbutils.fs.mount. dbutils.fs.refreshMounts() just refreshes the list of mounts in the backend; it doesn't recheck the credentials.
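As a rough sketch (the mount point, container, account, and key below are placeholders, and account-key auth is just one of the possible credential types):
dbutils.fs.unmount("/mount/valuable_folder")
dbutils.fs.mount(
    source="wasbs://<container>@<storage-account>.blob.core.windows.net",
    mount_point="/mount/valuable_folder",
    extra_configs={
        # Supply the rotated/current key (or an unexpired SAS) here.
        "fs.azure.account.key.<storage-account>.blob.core.windows.net": "<current-storage-key>"
    },
)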

Android Management API: Failed to patch policy - Caller is not authorized to manage enterprise

I have been working with the Android Management API to try and manage the policy of my company's existing enterprise. My company account has the Owner role within the organization and the roles Owner and Service Account Admin for the service account mentioned later.
I followed the Quickstart Guide to get familiar with the API and made some modifications along the way for a more permanent solution, such as creating a service account with the Android Management User role via the Google Cloud Platform and generating a JSON key to acquire credentials, rather than going through the OAuth2 flow as in the guide. This allowed me to authenticate properly, but when it comes time to patch the policy like so,
androidmanagement.enterprises().policies().patch(
    name=policy_name,
    body=policy_json
).execute()
I get the following error:
<HttpError 403 when requesting https://androidmanagement.googleapis.com/v1/enterprises/XXXXXXXXX/policies/<policy_name>?alt=json returned "Caller is not authorized to manage enterprise.". Details: "Caller is not authorized to manage enterprise.">
I have verified that the service account I am authenticating with has the Android Management User role, and thus has the androidmanagement.enterprises.manage permission.
I have also attempted to make this call with an elevated admin role in the organization.
Is there a chance that I need to have created the enterprise with my own account to manage the enterprise? The guide suggests that an organization can create multiple enterprises. In which case, would I need to create a new Google account not associated with my organization's enterprise and create a new enterprise that way?
It is advisable to use your own Google account to call the Android Management API, since your organization account may not be compatible with the quickstart.
To access the Android Management API, your service account requires the androidmanagement.enterprises.manage permission, which can be granted by the Android Management User role (roles/androidmanagement.user). See this link for details on creating a service account.
Please keep in mind that the enterprise you created as part of the Colab instructions can only be managed using the Colab itself. To allow your Cloud project to manage an enterprise, you will need to create one using the client configuration from your Cloud project.
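For reference, a minimal sketch of the service-account flow described in the question: the key file name, enterprise ID, and policy body are placeholders, and the service account is assumed to hold the Android Management User role on the Cloud project linked to the enterprise.
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    "key.json",  # hypothetical JSON key downloaded for the service account
    scopes=["https://www.googleapis.com/auth/androidmanagement"],
)
androidmanagement = build("androidmanagement", "v1", credentials=credentials)

androidmanagement.enterprises().policies().patch(
    name="enterprises/<enterprise-id>/policies/<policy-name>",
    body={"statusBarDisabled": True},  # example policy field
).execute()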

Azure Blob: How to grant a mobile app limited access to user-specific data

I have a mobile app (Xamarin Android & iOS) that connects to a website (ASP.NET MVC). Some of the content for the mobile app (files & images) comes from an Azure Blob store that currently has public read-access enabled.
I am building an authentication module for the app (OAuth, with username/password). Is it possible to somehow build authentication into my Azure Blob account as well, so that a user would only have access to their specific files? I know that I could use the website as an intermediary (i.e. the user authenticates and connects to the website, the website connects to Azure and retrieves the data and returns it to the app), but this adds extra lag compared to connecting to Azure Blob directly.
I see that Azure Blob storage supports shared access signature (SAS) tokens. Is it possible to generate a SAS token just for the subset of files relevant to that user? I imagine the workflow would be:
The mobile app authenticates to the website API.
The website generates and returns a SAS token for blob access.
The mobile app connects to Azure Blob storage directly using the SAS token.
Would that even be a good idea? Any other suggestions?
From what I understand of your scenario, you could use either Azure AD or SAS for authentication/authorization to Blob storage. The key will be to organize your users' data by container, so that you can restrict access to that container. This type of design will align best with how authorization is handled in Azure Storage today.
So for example, you would create a container for user1's data, another container for user2's data, and so on.
If you are already using Azure AD to authenticate and authorize your users for your application, then you may be able to simply assign an RBAC role that is scoped to the user's container for each user. For example, you can assign the Storage Blob Data Contributor role to user1 for container1, then do the same for user2 on container2. See Use the Azure portal to assign an Azure role for access to blob and queue data for information about how to do this in the Azure portal; you can also use PowerShell or Azure CLI.
Note that an RBAC role cannot be scoped to an individual blob, but only at the container level or above.
If you determine that you need to use SAS, you can create a SAS for each user that is restricted to their container. If your users are already authenticating/authorizing to your application with Azure AD, then you probably don't need to use SAS. The SAS would be useful in the case where you need to grant access to a user that is not otherwise authenticated.
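If you go the SAS route, a minimal sketch with the azure-storage-blob Python SDK might look like the following; the account name/key, container name, permissions, and one-hour expiry are all assumptions, and the website API would hand the resulting token back to the app.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import ContainerSasPermissions, generate_container_sas

# Issue a read/list SAS scoped to this user's container only.
sas_token = generate_container_sas(
    account_name="<storage-account>",
    container_name="<user1-container>",
    account_key="<account-key>",
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# The app can then request blobs directly, e.g.:
blob_url = (
    "https://<storage-account>.blob.core.windows.net/"
    f"<user1-container>/<blob-name>?{sas_token}"
)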

Is it possible to prevent the leakage of the original data of the database even if it is hacked?

We want to build a web application and deploy it on AWS.
EC2: Laravel
RDS: MySQL
I will use Laravel's encrypter to encrypt the data in the database. Even if RDS got hacked, the data would be encrypted and the hacker couldn't read the contents. But if EC2 got hacked, the hacker could get the database credentials and the encryption key from the source code and decrypt the encrypted data from the database.
My boss (maybe the client) thinks that this is not enough, because the database contains sensitive information about users. He wants to prevent the leakage of the original data of the database even if the web server (EC2) gets hacked. Is that possible?
If not, I think we should focus on making the web server more difficult to hack:
Set a security group to limit SSH access by IP address.
Or any other measures?
Here are a few safety measures you can take to reduce your blast radius.
Move your credentials for the RDS database off the instance and into a credential store (see the sketch at the end of this answer), such as:
AWS Secrets Manager
HashiCorp Vault
Rotate your database credentials frequently, and use IAM roles for your EC2 applications rather than IAM users.
Keep your EC2 and RDS instances within private subnets, and add an ELB in front of the EC2 instance so that public traffic can reach only that device.
Configure security groups so they are scoped to only what they need, and limit inbound access to your AWS VPC to a VPN or Direct Connect connection.
Restrict who can do what in your AWS account: if a user does not need to perform certain actions for their role, remove those permissions. This will prevent accidental actions on services the user should not be using.
AWS also lists a large number of actions you can take in the security pillar, so make sure to give that a read.
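As a minimal sketch of the Secrets Manager approach mentioned above (the secret name and region are placeholders, and the EC2 instance's IAM role is assumed to allow secretsmanager:GetSecretValue):
import json
import boto3

# Fetch the RDS credentials at runtime instead of storing them in source code.
client = boto3.client("secretsmanager", region_name="<region>")
secret = client.get_secret_value(SecretId="<rds-credentials-secret-name>")
db_credentials = json.loads(secret["SecretString"])
# db_credentials would typically contain keys such as "username" and "password".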
