Auto generated block blobs - azure-blob-storage

I am observing that whenever I create a new folder inside Azure blob storage, a block blob file with the same name as the folder is automatically created. I don't know why, or what setting makes it behave this way. Any pointers on why this happens and how to disable it? Thank you.

In Azure blob storage (not Azure Data Lake Storage Gen2), you should know one important thing: you cannot create an empty folder. The reason is that blob storage has a two-level hierarchy - blob container and blob. Any folder/directory only exists as part of a blob's name, which is why a placeholder blob with the folder's name gets created.
If you want to create a real empty folder, you can use Azure Data Lake Storage Gen2 instead. It's built on blob storage and supports real directories along with some familiar file-system operations.
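To make the point concrete, here is a minimal sketch with the Python azure-storage-blob SDK; the connection string, container, and blob names are placeholders, not anything from the question. There is no "create folder" call at all: the folder only appears because some blob's name contains it as a prefix.

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection string and container name for illustration only.
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("mycontainer")

# There is no "create folder" operation: the "reports" folder exists only
# because this blob's name starts with the "reports/" prefix. Delete the
# blob and the folder disappears from the listing as well.
container.upload_blob(name="reports/2023-01-sales.csv", data=b"id,amount\n1,100\n")

# Prefix listing is how the portal and tools simulate folder navigation.
for blob in container.list_blobs(name_starts_with="reports/"):
    print(blob.name)
```

That is why tools that offer a "create folder" button (Storage Explorer, the portal, various SDK wrappers) quietly upload a placeholder block blob with the folder's name, which is most likely the auto-generated blob you are seeing.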

Related

Copying JSON data from CosmosDB to Snowflake through ADF

Hello,
I am trying to copy data from Cosmos DB to Snowflake through Azure Data Factory. But I get the error: "Direct copying data to Snowflake is only supported when source dataset is DelimitedText, Parquet, JSON with Azure Blob Storage or Amazon S3 linked service, for other dataset or linked service, please enable staging". Would that imply that I need to create a linked service with blob storage? What URL and SAS token should I give? Do I need to move everything to Blob and then move forward with staging?
Any help is appreciated. Thank you very much.
Try it with a data flow activity instead of a copy activity

Incremental loads from blob storage to Azure Table Storage

I have the following scenario (a rather common one, but I am not entirely sure where to start).
I have data coming into a blob storage container (our raw zone). The files get dropped in the raw zone every day (by someone sitting somewhere). Each day, as the new files come in, the old files are overwritten, but the number of records increases.
For example, yesterday's customer file may have had 100 records, while today's file might have 150 records (the 100 from yesterday plus 50 new ones).
Now, what is the best way to do an incremental load (other solutions welcome) to move only the latest records into Azure Table Storage?
I have worked with watermarks etc. when loading data from or into SQL, but I don't have much experience with Azure Table Storage. Would appreciate a lead.
Thanks in advance.
You can use ADF to do an incremental load into Azure Table Storage using watermarks. Refer to the links below; you might need to tweak the implementation a little based on your requirements.
Incrementally load data from Azure SQL Database to Azure Blob storage using the Azure portal
Copy data to and from Azure Table storage using Azure Data Factory or Synapse Analytics
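If you end up scripting the load instead of (or alongside) ADF, the watermark idea itself is simple. Here is a rough sketch with the Python azure-data-tables package; the table name, file name, and Id column are hypothetical, and the watermark would really be persisted somewhere durable rather than held in a variable:

```python
import csv
from azure.data.tables import TableServiceClient

# Hypothetical connection string and table name.
service = TableServiceClient.from_connection_string("<connection-string>")
table = service.get_table_client("Customers")

# Watermark = highest Id already loaded; persist it between runs
# (e.g. in a small control table or a config blob). Hard-coded here.
watermark = 100

# In practice this file would be read from the raw-zone blob container.
with open("customer.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Only rows beyond the watermark are new; upsert just those.
new_rows = [r for r in rows if int(r["Id"]) > watermark]
for r in new_rows:
    table.upsert_entity({"PartitionKey": "customer", "RowKey": r["Id"], **r})

# Advance the watermark after a successful load.
watermark = max((int(r["Id"]) for r in new_rows), default=watermark)
```

The ADF tutorial in the first link implements the same pattern declaratively: store the last watermark, query the source for rows newer than it, copy them, then update the watermark.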

Looking for the best approach to replicate an Oracle table in an S3 bucket

My problem:
I need a data pipeline created from my organization's Oracle DB (Oracle Cloud Infrastructure) to an AWS S3 bucket. Ideally, I would love some mechanism for Oracle to push new data to the S3 bucket as it enters the database (in whatever format).
Question:
Is this possible natively with Oracle, specifically Oracle Cloud Infrastructure?
Or is there a better solution you have seen?
Note:
I have seen that AWS has the DataSync product, which seems like it could help with this problem; however, I am not sure whether it is suitable for this specific case.
An S3 bucket is object storage; it can only hold complete files. You cannot open and update an existing file like you would in a normal file system, even just to add new rows. You will need to construct your whole file outside of Oracle and then push it to S3 with some other mechanism.
You may want to consider the following steps:
Export your data from Oracle Cloud into Oracle Object Storage (similar to S3) using Oracle Cloud's integration with its object storage. (https://blogs.oracle.com/datawarehousing/the-simplest-guide-to-exporting-data-from-autonomous-database-directly-to-object-storage)
THEN:
Let the customer access the Oracle Object Store as they normally would access S3, using Oracle's Amazon S3 Compatibility API. (https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/s3compatibleapi.htm)
OR:
Use an externally driven script to download the data - either from Oracle Object Store or directly from the database - to a server, then push the file up to Amazon S3 (see the boto3 sketch after these options). The server could be local, or hosted in either Oracle OCI or in AWS, as long as it has access to both object stores. (https://blogs.oracle.com/linux/using-rclone-to-copy-data-in-and-out-of-oracle-cloud-object-storage)
OR:
You may be able to use AWS Data Sync to move data directly from Oracle Object Storage to S3, depending on networking configuration requirements. (https://aws.amazon.com/blogs/aws/aws-datasync-adds-support-for-on-premises-object-storage/)
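For the externally driven script option, the final push into S3 can be a couple of lines with boto3. This is a sketch only: the bucket, key, and file names are made up, and the file is assumed to have already been exported from Oracle by one of the steps above.

```python
import boto3

# Hypothetical file and bucket names; the CSV is assumed to be a fresh export
# pulled down from Oracle (or from Oracle Object Storage) by an earlier step.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="exports/customers_2023-01-31.csv",
    Bucket="my-replication-bucket",
    Key="oracle/customers/customers_2023-01-31.csv",
)
```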

Get the latest data from ADLS Gen 2 blob storage to table mounted in Azure DataBricks

I have created an unmanaged table in Azure Databricks using a mount path as below:
CREATE TABLE <Table-Name> using org.apache.spark.sql.parquet OPTIONS (path "/mnt/<folder>/<subfolder>/")
Source of mount path is parquet files stored in ADLS Gen2.
I see that if the underlying data is changed in the ADLS Gen2 blob storage path, the change is not reflected in the unmanaged table created in ADB. The ADB table still holds the data that was available in blob storage at the time the table was created.
Is there any way to get the latest data from blob storage into the table in ADB?
Many suggest using
REFRESH TABLE <table-name>
https://docs.databricks.com/data/tables.html#update-a-table
but it never worked for me.
What did work for me was the following:
yourdataframe.write.mode("overwrite").saveAsTable("test_table")
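For context, here is a rough PySpark sketch of both approaches, using the placeholder table and path names from the question; the overwrite variant is the one reported to work above:

```python
# Option 1: ask Spark to drop its cached metadata/file listing for the table
# so the next query re-reads the underlying parquet files.
spark.sql("REFRESH TABLE <Table-Name>")

# Option 2 (the approach reported to work above): re-read the parquet files
# from the mount and overwrite the table with the latest data.
latest_df = spark.read.parquet("/mnt/<folder>/<subfolder>/")
latest_df.write.mode("overwrite").saveAsTable("test_table")
```

Note that `saveAsTable` with overwrite materializes the latest files into a managed table, so it replaces the table's contents rather than refreshing the original unmanaged definition in place.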

Backup Azure blob storage in line with SQL Azure DB

It is recommended that we store document information in blob storage. In our case the blob storage is related to the SQL Azure data; is there a facility to back up the blob storage in sync with the SQL Azure data? What I don't want to see is a point-in-time restore of the SQL Azure data only to find we don't have the same snapshot of the blob data at that time :(
Does anyone know what is available?
Interesting issue you have to solve, but there is no automated way to keep blob and Azure SQL Database data in sync. You have to manage this yourself, and it is not just about blob snapshots. What if an updated DB record refers to a new blob: what happens to the old one? These are all business rules to apply at the application level. And you have to ask yourself to what degree you want that backup of blobs.
There is an interesting blog post on Azure SQL and Storage backup, but again: there is no service that will keep data in sync between SQL DB and Azure Storage for you.
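If you do roll your own, one building block is blob snapshots taken at roughly the same moment you trigger the database backup, so each SQL restore point has a matching set of blob versions. Here is a sketch with the Python azure-storage-blob SDK; the connection string and container name are placeholders, and you would still need to store the snapshot timestamps alongside the SQL backup metadata:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection string and container name.
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("documents")

# Snapshot every blob in the container at "backup time". The returned snapshot
# timestamps identify the point-in-time versions to pair with the SQL backup.
snapshot_ids = {}
for blob in container.list_blobs():
    blob_client = container.get_blob_client(blob.name)
    snapshot = blob_client.create_snapshot()
    snapshot_ids[blob.name] = snapshot["snapshot"]

# Persist snapshot_ids (e.g. in a table or a manifest blob) keyed by the
# SQL backup/restore point so the two can be restored together later.
print(snapshot_ids)
```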
