Azure - Copy LARGE blobs from one container to another using Logic Apps

I successfully built a logic app where, whenever a blob is added to container-one, it gets copied to container-two. However, it fails whenever a blob larger than 50 MB (the default size) is uploaded.
Could you please guide me?
Blobs are added via the REST API.

Currently, the maximum file size the connector handles with chunking disabled is 50 MB. One workaround is to use an Azure Function to transfer the files from one container to another.
Below is the sample Python code that worked for me when transferring files from one container to another:
from azure.storage.blob import BlobClient, BlobServiceClient
from azure.storage.blob import ResourceTypes, AccountSasPermissions
from azure.storage.blob import generate_account_sas
from datetime import datetime, timedelta

connection_string = '<Your Connection String>'
account_key = '<Your Account Key>'
source_container_name = 'container1'
blob_name = 'samplepdf.pdf'
destination_container_name = 'container2'

# Create the service client
client = BlobServiceClient.from_connection_string(connection_string)

# Create an account SAS token with read permission for the source blob
sas_token = generate_account_sas(
    account_name=client.account_name,
    account_key=account_key,
    resource_types=ResourceTypes(object=True),
    permission=AccountSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=4)
)

# Create a blob client for the source blob
source_blob = BlobClient(
    client.url,
    container_name=source_container_name,
    blob_name=blob_name,
    credential=sas_token
)

# Create the destination blob and start the server-side copy operation
new_blob = client.get_blob_client(destination_container_name, blob_name)
new_blob.start_copy_from_url(source_blob.url)
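Note that start_copy_from_url kicks off an asynchronous server-side copy, so for large blobs you may want to poll until it finishes. A minimal sketch appended to the script above (the sleep interval is arbitrary):

import time

# Poll the destination blob until the server-side copy completes
props = new_blob.get_blob_properties()
while props.copy.status == 'pending':
    time.sleep(5)
    props = new_blob.get_blob_properties()
print(f"Copy finished with status: {props.copy.status}")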
REFERENCES:
General Limits
How to copy a blob from one container to another container using Azure Blob storage SDK

Related

How to generate SAS token using python legacy SDK (2.1) without account_key or connection_string

I am using Python 3.6 and azure-storage-blob (version 1.5.0) and trying to use a user-assigned managed identity to connect to my Azure Storage blob from an Azure VM. The problem I am facing is that I want to generate a SAS token to form a downloadable URL.
I am using blob_service = BlockBlobService(account_name, token_credential) to authenticate, but I am not able to find any method that lets me generate a SAS token without supplying the account key.
I am also not seeing any way of using a user delegation key, as is available in the new azure-storage-blob (versions >= 12.0.0). Is there any workaround, or will I need to upgrade the Azure Storage library in the end?
I tried to reproduce this in my environment and was able to generate a SAS token without an account key or connection string successfully.
Code:
import datetime as dt

from azure.identity import DefaultAzureCredential
from azure.storage.blob import (
    BlobClient,
    BlobSasPermissions,
    BlobServiceClient,
    generate_blob_sas,
)

credential = DefaultAzureCredential(exclude_shared_token_cache_credential=True)

storage_acct_name = "Accountname"
container_name = "containername"
blob_name = "Filename"
url = f"https://{storage_acct_name}.blob.core.windows.net"

blob_service_client = BlobServiceClient(url, credential=credential)

# Request a user delegation key backed by the Azure AD credential
udk = blob_service_client.get_user_delegation_key(
    key_start_time=dt.datetime.utcnow() - dt.timedelta(hours=1),
    key_expiry_time=dt.datetime.utcnow() + dt.timedelta(hours=1))

# Generate a user-delegation SAS for the blob (no account key required)
sas = generate_blob_sas(
    account_name=storage_acct_name,
    container_name=container_name,
    blob_name=blob_name,
    user_delegation_key=udk,
    permission=BlobSasPermissions(read=True),
    start=dt.datetime.utcnow() - dt.timedelta(minutes=15),
    expiry=dt.datetime.utcnow() + dt.timedelta(hours=2),
)

sas_url = (
    f'https://{storage_acct_name}.blob.core.windows.net/'
    f'{container_name}/{blob_name}?{sas}'
)
print(sas_url)
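As a quick sanity check, the printed SAS URL can be used directly with BlobClient (a minimal sketch; the account, container, and blob names are the placeholders from above):

from azure.storage.blob import BlobClient

# The SAS URL embeds both the blob location and the user-delegation SAS
blob_client = BlobClient.from_blob_url(sas_url)
data = blob_client.download_blob().readall()
print(f"Downloaded {len(data)} bytes")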
Make sure you assign the Storage Blob Data Contributor role to the identity running the code (role-assignment screenshot omitted).

Access Always Encrypted data from Databricks

I have a table in an Azure SQL managed instance with 'Always Encrypted' columns. I stored the column and master keys in Azure Key Vault.
My first question is: how do I access the decrypted data in Azure SQL from Databricks? For that I connected to Azure SQL via JDBC. For the username and password, I am passing my credentials manually:
val jdbcHostname = "XXXXXXXXXXX.database.windows.net"
val jdbcPort = 1433
val jdbcDatabase = "ABCD"
val jdbcUsername = "<username>"   // credentials passed manually
val jdbcPassword = "<password>"
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"
// Create a Properties() object to hold the parameters.
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)
import java.sql.DriverManager
val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
connection.isClosed()
val user = spark.read.jdbc(jdbcUrl, "dbo.bp_mp_user_test", connectionProperties)
display(user)
When I do this I am able to display the data, but it is the encrypted data. How do I see the decrypted data?
I am new to the Azure and Databricks combination, so I am still learning the Azure/Microsoft stack. Are there any other forms of JDBC connection syntax that allow you to decrypt?
I have the keys in Azure Key Vault. So how do I make use of those keys and the security associated with them, so that when someone accesses this table from Databricks, it shows the encrypted or decrypted data as appropriate?
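For what it's worth, a rough PySpark sketch of one direction to try (not verified here): the Microsoft JDBC driver can decrypt Always Encrypted columns transparently when columnEncryptionSetting is enabled and it is given credentials for the Key Vault-backed column master key. The exact provider property names depend on the mssql-jdbc version installed on the cluster, and the service-principal values below are placeholders.

# Rough sketch: JDBC read with Always Encrypted decryption enabled
jdbc_url = (
    "jdbc:sqlserver://XXXXXXXXXXX.database.windows.net:1433;"
    "database=ABCD;"
    "columnEncryptionSetting=Enabled;"              # ask the driver to decrypt AE columns
    "keyVaultProviderClientId=<sp-client-id>;"      # service principal with access to the CMK in Key Vault
    "keyVaultProviderClientKey=<sp-client-secret>"
)

user_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.bp_mp_user_test")
    .option("user", "<username>")
    .option("password", "<password>")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)
display(user_df)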

Reading Blob Into Pyspark

I'm trying to read a series of JSON files stored in an Azure blob container into Spark using a Databricks notebook. I set the Spark conf with my account and key, but it always returns the error
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.lang.IllegalArgumentException: The String is not a valid Base64-encoded string.
I've followed along with the information provided here:
https://docs.databricks.com/_static/notebooks/data-import/azure-blob-store.html
and here:
https://luminousmen.com/post/azure-blob-storage-with-pyspark
I can pull the data just fine using the Azure SDK for Python.
storage_account_name = "name"
storage_account_access_key = "key"

spark.conf.set(
    "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
    storage_account_access_key)

file_location = "wasbs://loc/locationpath"
file_type = "json"

df = spark.read.format(file_type).option("inferSchema", "true").load(file_location)
This should return a DataFrame of the JSON files.
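That Base64 error usually means the value handed to the fs.azure.account.key setting is not a valid Base64-encoded account key (for example a SAS token, or a key copied with stray quotes or whitespace), and the wasbs path also needs the container and account in it. A minimal sketch, with the container, account, and path as placeholders:

container_name = "<container>"
storage_account_name = "<account>"
storage_account_access_key = "<base64 account key, no quotes or whitespace>"

spark.conf.set(
    "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
    storage_account_access_key)

# wasbs URLs take the form wasbs://<container>@<account>.blob.core.windows.net/<path>
file_location = (
    "wasbs://" + container_name + "@" + storage_account_name
    + ".blob.core.windows.net/<path-to-json-files>")

df = spark.read.format("json").option("inferSchema", "true").load(file_location)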

Mock the result of accessing public GCS bucket

I have the following code:
bucket = get_bucket('bucket-name')
blob = bucket.blob(os.path.join(*pieces))
blob.upload_from_string('test')
blob.make_public()
result = blob.public_url
# result is <Mock name='mock().get_bucket().blob().public_url' ...>
And I would like to mock the result of public_url; my unit test code is something like this:
with ExitStack() as st:
    from google.cloud import storage

    blob_mock = mock.Mock(spec=storage.Blob)
    blob_mock.public_url.return_value = 'http://'

    bucket_mock = mock.Mock(spec=storage.Bucket)
    bucket_mock.blob.return_value = blob_mock

    storage_client_mock = mock.Mock(spec=storage.Client)
    storage_client_mock.get_bucket.return_value = bucket_mock

    st.enter_context(
        mock.patch('google.cloud.storage.Client', storage_client_mock))

    my_function()
Is there something like FakeRedis or moto for Google Storage, so I can mock google.cloud.storage.Blob.public_url?
I found fake-gcs-server, a fake GCS server written in Go, which can be run within a Docker container and consumed by the Python client library; see its Python examples.
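If a pure in-process mock is enough, a minimal sketch along the lines of the question's code could look like this (assuming my_function instantiates google.cloud.storage.Client() itself and returns blob.public_url). Note that public_url is read as a plain attribute, so assign the value directly rather than via return_value, and that the patched Client class gets called, so the configured client mock must be its return_value:

from unittest import mock

from google.cloud import storage

def test_public_url():
    blob_mock = mock.Mock(spec=storage.Blob)
    # public_url is read as an attribute, not called, so set the value directly
    blob_mock.public_url = 'http://storage.googleapis.com/bucket-name/test'

    bucket_mock = mock.Mock(spec=storage.Bucket)
    bucket_mock.blob.return_value = blob_mock

    client_mock = mock.Mock(spec=storage.Client)
    client_mock.get_bucket.return_value = bucket_mock

    # my_function() calls storage.Client(), so the patched class must *return* client_mock
    with mock.patch('google.cloud.storage.Client', return_value=client_mock):
        assert my_function() == 'http://storage.googleapis.com/bucket-name/test'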

Importing binary data to parse.com

I'm trying to import data to parse.com so I can test my application (I'm new to Parse and I've never used JSON before).
Can you please give me an example of a JSON file that I can use to import binary files (images)?
NB: I'm trying to upload my data in bulk directly from the Data Browser. Here is a screencap: i.stack.imgur.com/bw9b4.png
In the Parse docs, I think two sections could help you out, depending on whether you want to use the REST API or the Android SDK.
REST API - see the section on POST for uploading files to Parse with a REST POST.
SDK - see the section on "Files".
The code for the REST approach includes the following:
Use some HttpClient implementation that provides a "ByteArrayEntity" class or something similar.
Map your image to a ByteArrayEntity and POST it with the correct MIME-type headers in the HttpClient:
case POST:
    HttpPost httpPost = new HttpPost(url); // url ends with "audio" OR "pic"
    httpPost.setProtocolVersion(new ProtocolVersion("HTTP", 1, 1));
    httpPost.setConfig(this.config);
    if (mfile.canRead()) {
        FileInputStream fis = new FileInputStream(mfile);
        // Get the file's size and then map it into memory
        FileChannel fc = fis.getChannel();
        int sz = (int) fc.size();
        MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, sz);
        data2 = new byte[bb.remaining()];
        bb.get(data2);
        ByteArrayEntity reqEntity = new ByteArrayEntity(data2);
        httpPost.setEntity(reqEntity);
        fis.close();
    }
    ...
    request.addHeader("Content-Type", "image/*");
Pseudocode: then post the runnable that executes the HTTP request.
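For completeness, a rough Python sketch of the same upload against the classic parse.com REST files endpoint (the application ID, REST API key, and file name below are placeholders):

import requests

APP_ID = "<YOUR_PARSE_APP_ID>"
REST_KEY = "<YOUR_PARSE_REST_API_KEY>"

with open("pic.jpg", "rb") as f:
    image_bytes = f.read()

# POST the raw bytes to the files endpoint with the proper MIME type
response = requests.post(
    "https://api.parse.com/1/files/pic.jpg",
    headers={
        "X-Parse-Application-Id": APP_ID,
        "X-Parse-REST-API-Key": REST_KEY,
        "Content-Type": "image/jpeg",
    },
    data=image_bytes,
)
print(response.json())  # expected to contain the stored file's name and url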
The only binary data allowed to be loaded into parse.com is images. For other cases, such as files or streams, the most suitable solution is to store a link to the binary data kept in a separate storage service dedicated to that type of information.
