I have the following scenario (a rather common one, but I am not entirely sure where to start).
I have data arriving in a blob storage container (our raw zone). The files get dropped in the raw zone every day (by someone sitting somewhere). Each day, as the new files come in, the old files are overwritten, but the number of records increases.
For example, yesterday's customer file may have had 100 records, while today's file might have 150 records (100 from yesterday and 50 new ones).
Now, what is the best way to do an incremental load (other solutions welcome) so that only the latest records are moved into Azure Table Storage?
I have worked with watermarks when loading data from or into SQL, but I don't have much experience with Azure Table storage. I would appreciate a lead.
Thanks in advance.
You can use ADF to do an incremental load into Azure Table Storage using watermarks. Refer to the links below; you may need to tweak the implementation a little based on your requirements. A sketch of the watermark bookkeeping follows the links.
Incrementally load data from Azure SQL Database to Azure Blob storage using the Azure portal
Copy data to and from Azure Table storage using Azure Data Factory or Synapse Analytics
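The first tutorial keeps a high-water mark in a small control table and wraps every run in three steps: look up the old watermark, copy only the rows newer than it, then store the new watermark. A minimal sketch of that bookkeeping, assuming a hypothetical WatermarkTable and a ModifiedDate column on the source; the same idea applies if your raw files carry a record timestamp or an incrementing ID instead, and the placeholder values below would be injected by ADF pipeline expressions rather than being real T-SQL variables:

-- one row per source entity, holding the last value already loaded (hypothetical names)
CREATE TABLE dbo.WatermarkTable (
    TableName      VARCHAR(128) NOT NULL,
    WatermarkValue DATETIME2    NOT NULL
);

-- Lookup activity: fetch the old watermark before the copy starts
SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'Customer';

-- Copy activity source query: only records added since the last successful run
-- ('<old-watermark>' is supplied by the pipeline from the Lookup output)
SELECT * FROM dbo.Customer WHERE ModifiedDate > '<old-watermark>';

-- after a successful copy, record the new watermark for the next run
UPDATE dbo.WatermarkTable
SET WatermarkValue = '<new-watermark>'
WHERE TableName = 'Customer';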
I have data in an Oracle Siebel database and I want to move the data to the Azure cloud over a VPN.
Is it possible to copy the database and then apply daily snapshots until I am ready to take the new Azure application live?
This reduces the risk of relying on a single big migration on the cutover weekend.
The destination DB is not Siebel; I will have to put the data into an Azure DB.
Just gathering ideas at the moment.
I am working on an IoT solution with multiple sensors sending data. I have one job that listens to Event Hub, gets the IoT sensor data, and stores it in a Delta Lake table (underlying Azure ADLS Gen2 storage, in Parquet file format).
I have to display the sensor data in a UI (a custom UI developed in React). For that, I have an API layer developed in .NET Core / Node.js.
So finally I have to query the Delta table created in Databricks to retrieve the sensor data from the Node.js / .NET Core API and display it in the UI. How can I query the Delta Lake table from a C# / Node.js API?
You may be better off using query compute that is already capable of reading the Delta-formatted data. A low-cost option would be to create an Azure Synapse Analytics workspace and use a serverless SQL pool to query the Delta content. A serverless pool exposes itself as an Azure SQL DB, so any tool that can query an Azure SQL DB (not a problem for either C# or Node.js) can then query those Delta tables.
The SQL syntax looks a little different, as it uses OPENROWSET, like the following:
SELECT
    *
FROM
    OPENROWSET(
        BULK 'https://<my-storage-account>.dfs.core.windows.net/<my-container>/<path-to-delta-folder>/',
        FORMAT = 'DELTA'
    ) AS [recordset];
Alternatively, you can create a logical database in the serverless pool and create external tables for each of your Delta folders. Doing this makes it feel a little closer to a traditional relational database.
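A hedged sketch of that external-table setup, reusing the placeholders from the query above and assuming a hypothetical sensor schema (DeviceId, EventTime, Value); adjust the names and types to your actual Delta table:

CREATE DATABASE SensorDb;
GO
USE SensorDb;
GO
-- external data source pointing at the container that holds the Delta folders
CREATE EXTERNAL DATA SOURCE DeltaLake
    WITH (LOCATION = 'https://<my-storage-account>.dfs.core.windows.net/<my-container>');
GO
CREATE EXTERNAL FILE FORMAT DeltaFormat
    WITH (FORMAT_TYPE = DELTA);
GO
-- hypothetical schema; one external table per Delta folder
CREATE EXTERNAL TABLE dbo.SensorReadings (
    DeviceId  VARCHAR(50),
    EventTime DATETIME2,
    Value     FLOAT
)
WITH (
    LOCATION    = '<path-to-delta-folder>/',
    DATA_SOURCE = DeltaLake,
    FILE_FORMAT = DeltaFormat
);
GO
-- the API layer can now issue ordinary T-SQL against the serverless endpoint
SELECT TOP (100) * FROM dbo.SensorReadings ORDER BY EventTime DESC;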
Since this is a serverless instance, there are no provisioning costs; you only pay for consumption (i.e., the actual queries). You can find the current pay-as-you-go pricing here: Azure Synapse Analytics pricing
I am trying to insert data from a SQL table into an Oracle table using the Copy Data activity in Data Factory. On the first try it runs fine, but on the second try it throws an error that an index on the target (Oracle) table has been corrupted.
Searching different forums, I found that the Copy Data activity apparently sends the insert statement in the following way: INSERT /*+ SYS_DL_CURSOR */ INTO
Any idea how to fix this?
Thank you very much for the help.
As per the error, the index is not corrupted; it was used twice. Perhaps the operation did not run according to schedule and two runs executed in parallel.
The Copy activity is executed on an integration runtime. You can use different types of integration runtimes for different data copy scenarios:
When you're copying data between two data stores that are publicly accessible through the internet from any IP, you can use the Azure integration runtime for the copy activity. This integration runtime is secure, reliable, scalable, and globally available.
When you're copying data to and from data stores that are located on-premises or in a network with access control (for example, an Azure virtual network), you need to set up a self-hosted integration runtime.
Use whichever of the two options above fits your scenario, and the error should be resolved.
See the supporting documentation: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview
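As a side note, the SYS_DL_CURSOR hint quoted in the question requests a direct-path style load, and an interrupted direct-path load can leave indexes on the Oracle target table in an UNUSABLE state, which then surfaces as an index error on the next run. If that is what you are seeing, rebuilding the affected index on the Oracle side usually clears it; a hedged sketch with hypothetical table and index names:

-- find indexes left unusable on the target table
SELECT index_name, status
FROM user_indexes
WHERE table_name = 'MY_TARGET_TABLE'
  AND status = 'UNUSABLE';

-- rebuild the affected index
ALTER INDEX my_target_table_ix REBUILD;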
I'm looking for the best way to upload files to an Azure SQL Database.
We have to use Azure Data Factory, as at the moment we are not allowed to use Azure VMs with SSIS.
Each day we upload 1.5 GB of XML files.
Currently we upload them to Blob storage and then load them into the DB with a Copy activity.
But this takes up to 2.5 hours.
What would be a better/faster approach?
Any suggestions?
You can use the bcp utility to import your data into an instance of SQL Server, as explained in this document. From there, you can use Azure SQL Data Sync to synchronize your Azure SQL Database with your SQL Server database. This may provide a faster execution time. If you would rather stay inside the database, a sketch of pulling the XML straight from blob storage with T-SQL follows.
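A minimal sketch of that T-SQL route, assuming a hypothetical container, SAS credential, and staging table for the raw XML (Azure SQL Database can read blobs through an external data source of TYPE = BLOB_STORAGE):

-- requires a database master key; all names and the SAS token below are placeholders
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
         SECRET   = '<sas-token>';

CREATE EXTERNAL DATA SOURCE XmlBlobStore
    WITH (TYPE = BLOB_STORAGE,
          LOCATION = 'https://<storage-account>.blob.core.windows.net/<container>',
          CREDENTIAL = BlobCredential);

-- load one XML file as a single document into a staging table
INSERT INTO dbo.XmlStaging (FileName, Payload)
SELECT '<file-name>.xml', CAST(BulkColumn AS XML)
FROM OPENROWSET(BULK '<file-name>.xml',
                DATA_SOURCE = 'XmlBlobStore',
                SINGLE_BLOB) AS blob;

With many small files you would still need something (an ADF ForEach, or a generated script) to iterate over the file names, so this is a sketch of the per-file load rather than a complete pipeline.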
In the end it dropped to 35 minutes.
Just by using several storage accounts (5) and splitting the data over the 5 accounts.
5 ADF pipelines uploading everything into the same staging table.
Some were huge files, but we had over 100,000 small files from 2 up to 100 KB.
This worked out fine for us.
We noticed that the DTUs never went up to their limit, so we concluded the DB was not the bottleneck. By first splitting into 2 we saw DTU usage rise a bit more, and we continued on that path.
It is recommended that we store document information in blob storage. In our case the blob storage is related to the SQL Azure data. Is there a facility to back up the blob storage in sync with the SQL Azure data? What I don't want is a point-in-time restore of the SQL Azure data, only to find we don't have a matching snapshot of the blob data at that time :(
Does anyone know what is available?
Interesting issue to solve, but there is no automated way to keep blob and Azure SQL Database data in sync. You have to manage this yourself, and it is not just about blob snapshots. What if an updated DB record refers to a new blob? What happens to the old one? These are all business rules to apply at the application level, and you have to ask yourself to what degree you need that backup of blobs.
Here is an interesting blog post on Azure SQL and Storage backup. But again, there is no service that will keep data in sync between SQL DB and Azure Storage for you.