IoT - Databricks Delta Lake - access in C# API or Node.js API - azure-blob-storage

I am working on an IoT solution where multiple sensors send data. I have one job that listens to Event Hub, gets the IoT sensor data, and stores it in a Delta Lake table (backed by Azure ADLS Gen2 storage in Parquet file format).
I have to display the sensor data on a UI (a custom UI developed in React). For that, I have an API layer developed in .NET Core / Node.js.
So finally I have to query the Delta table created in Databricks to retrieve the sensor data using Node.js / .NET Core and display it on the UI. How can I query the data in the Delta Lake table from a C# / Node.js API?

You may be better off using query compute that is already capable of reading the Delta-formatted data. A low-cost option would be to create an Azure Synapse Analytics workspace and use a serverless SQL pool to query the Delta content. A serverless pool exposes itself as an Azure SQL DB, so any tool that can query an Azure SQL DB (not a problem for either C# or Node.js) can then query those Delta tables.
The SQL syntax looks a little different because it uses OPENROWSET, like so:
SELECT
    *
FROM
    OPENROWSET(
        BULK 'https://<my-storage-account>.dfs.core.windows.net/<my-container>/<path-to-delta-folder>/',
        FORMAT = 'DELTA'
    ) AS [recordset];
Alternatively, you can create a logical database in the serverless pool and create external tables for each of your Delta folders. Doing this would make it seem a little closer to a traditional relational database.
Since this is a serverless instance there are no provisioning costs; you pay only for consumption (i.e. the actual queries). You can find the current pay-as-you-go pricing here: Azure Synapse Analytics pricing
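Once the serverless SQL pool is in place, the API layer can treat it like any other SQL Server endpoint. Below is a rough C# sketch using Microsoft.Data.SqlClient; the workspace name, storage path, authentication mode, and column names are placeholders for your own environment, and a Node.js client such as mssql would follow the same pattern.

using System;
using Microsoft.Data.SqlClient;

// Serverless SQL pools expose an endpoint of the form <workspace>-ondemand.sql.azuresynapse.net.
var connectionString =
    "Server=<my-workspace>-ondemand.sql.azuresynapse.net;Database=master;" +
    "Authentication=Active Directory Default;Encrypt=True;";

// The same OPENROWSET query shown above; TOP just limits the result set for the UI.
const string sql = @"
    SELECT TOP 100 *
    FROM OPENROWSET(
        BULK 'https://<my-storage-account>.dfs.core.windows.net/<my-container>/<path-to-delta-folder>/',
        FORMAT = 'DELTA'
    ) AS [recordset];";

using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();

using var command = new SqlCommand(sql, connection);
using var reader = await command.ExecuteReaderAsync();

while (await reader.ReadAsync())
{
    // Column names depend on your sensor schema; these are placeholders.
    Console.WriteLine($"{reader["DeviceId"]}: {reader["Temperature"]}");
}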

Related

Siebel incremental backups to a non-Siebel DB

I have data in an Oracle Siebel database and I want to move the data to the Azure cloud over a VPN.
Is it possible to copy the database and then apply daily snapshots until I am ready to take the new Azure application live?
This reduces the risk of relying on a single big migration on the cutover weekend.
The destination DB is not Siebel; I will have to put the data into an Azure DB.
I am just gathering ideas at the moment.

Azure Dataverse: syncing data via Azure Synapse Link, but option set text values cannot be exported to storage

We are using Dataverse to export data to Azure Storage via Azure Synapse Link.
We can see all the data loading into Azure Data Lake without any issues.
Now, as per the requirement, we need a transformation to load the option set values, which land as a separate CSV in the storage account, but we could not do the transformation with model.json, because the model.json contains all the schema details.
https://learn.microsoft.com/en-us/power-apps/maker/data-platform/export-to-data-lake-data-adf

Incremental loads from blob storage to Azure Table storage

I have the following scenario (a rather common one, but I am not entirely sure where to start).
I have data coming into a blob storage container (our raw zone). The files get dropped in the raw zone every day (by someone sitting somewhere). Each day, as the new files come in, the old files are overwritten, but the number of records increases.
For example, a customer file from yesterday may have 100 records, while today's file might have 150 records (100 from yesterday and 50 from today).
Now, what is the best way to do an incremental load (other solutions welcome) to move the latest records into Azure Table storage?
I have worked with watermarks etc. when loading data from or into SQL, but I don't have much experience with Azure Table storage. I would appreciate a lead.
Thanks in advance.
You can use ADF to do an incremental load into Azure Table storage using watermarks. Refer to the links below; you might need to tweak the implementation a little based on your requirements. A code-based alternative is sketched after the links.
Incrementally load data from Azure SQL Database to Azure Blob storage using the Azure portal
Copy data to and from Azure Table storage using Azure Data Factory or Synapse Analytics
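If ADF is not a fit, the same watermark idea can be implemented directly against the Table storage SDK. A rough C# sketch using Azure.Data.Tables follows; the record shape, the key choices, and the helpers ReadWatermarkAsync, ReadTodaysFileAsync and SaveWatermarkAsync are hypothetical stand-ins for your own file parsing and state storage.

using System;
using System.Linq;
using Azure.Data.Tables;

// Target table in Azure Table storage (created on the first run).
var tableClient = new TableClient("<storage-connection-string>", "CustomerRecords");
await tableClient.CreateIfNotExistsAsync();

// Hypothetical helpers: the watermark is the number of rows already loaded,
// persisted somewhere durable (a control table, a blob, etc.).
int lastLoadedCount = await ReadWatermarkAsync();
var records = await ReadTodaysFileAsync();   // parsed rows from today's file in the raw zone

// Everything beyond the watermark is new; upsert only those rows.
foreach (var record in records.Skip(lastLoadedCount))
{
    var entity = new TableEntity(record.CustomerId, record.RecordId)
    {
        ["Amount"] = record.Amount,
        ["LoadedOn"] = DateTimeOffset.UtcNow
    };
    await tableClient.UpsertEntityAsync(entity);
}

// Move the watermark forward for the next run.
await SaveWatermarkAsync(records.Count);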

Manipulating Data Within AWS Redshift to a Schedule

Current Setup:
SQL Server OLTP database
AWS Redshift OLAP database, updated from the OLTP database via SSIS every 20 minutes
Our customers only have access to the OLAP DB
Requirement:
One customer requires some additional tables to be created and populated on a schedule, which can be done by aggregating the data already in AWS Redshift.
Challenge:
This is only for one customer, so I cannot leverage the core process for populating AWS; the process must be independent and is to be handed over to the customer, who does not use SSIS and doesn't wish to start. I was considering using Data Pipeline, but it is not yet available in the market in which the customer resides.
Question:
What is my alternative? I am aware of numerous partners who offer ETL-like solutions, but this seems over the top; ultimately all I want to do is execute a series of SQL statements on a schedule with some form of error handling/alerting. The preference of both the customer and management is not to use a bespoke app for this, hence the intended use of Data Pipeline.
For exporting data from AWS Redshift to another data source using Data Pipeline, you can follow a template similar to https://github.com/awslabs/data-pipeline-samples/tree/master/samples/RedshiftToRDS, which transfers data from Redshift to RDS. But instead of using RDSDatabase as the sink, you could add a JdbcDatabase (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html). The template https://github.com/awslabs/data-pipeline-samples/blob/master/samples/oracle-backup/definition.json provides more details on how to use JdbcDatabase.
There are many such templates available at https://github.com/awslabs/data-pipeline-samples/tree/master/samples to use as a reference.
I do exactly the same thing as you, but I use the Lambda service to perform my ETL. One drawback of Lambda is that it can run for a maximum of 5 minutes (initially 1 minute) only.
So for ETL jobs longer than 5 minutes, I am planning to set up a PHP server in AWS from which I can run my SQL queries, scheduled at any time with the help of cron.
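If the goal is simply to run a handful of SQL statements against Redshift on a schedule, a small scheduled job (cron, Lambda, or similar) that connects over the PostgreSQL wire protocol may be enough. Below is a rough C# sketch using Npgsql; the cluster endpoint, credentials, and the aggregation statements are placeholders for your own setup.

using System;
using Npgsql;

// Redshift speaks the PostgreSQL wire protocol, so Npgsql can connect to it on port 5439.
var connectionString =
    "Host=<cluster>.<region>.redshift.amazonaws.com;Port=5439;" +
    "Database=<db>;Username=<user>;Password=<password>";

// Placeholder statements that rebuild the customer-specific aggregate tables.
var statements = new[]
{
    "TRUNCATE TABLE customer_agg;",
    "INSERT INTO customer_agg SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id;"
};

using var connection = new NpgsqlConnection(connectionString);
await connection.OpenAsync();

foreach (var sql in statements)
{
    try
    {
        using var command = new NpgsqlCommand(sql, connection);
        await command.ExecuteNonQueryAsync();
    }
    catch (Exception ex)
    {
        // Minimal error handling/alerting: log and stop; swap in email/SNS as needed.
        Console.Error.WriteLine($"Statement failed: {sql}\n{ex.Message}");
        throw;
    }
}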

Back up Azure blob storage in line with SQL Azure DB

It is recommended that we store document information in blob storage. In our case the blob storage is related to the SQL Azure data; is there a facility to back up the blob storage in sync with the SQL Azure data? What I don't want is a point-in-time restore of the SQL Azure data only to find we don't have the same snapshot of the blob data at that time :(
Does anyone know what is available?
Interesting issue you have to solve, but there is no automated way to keep blob and Azure SQL Database data in sync; you have to manage this yourself. And it is not just about blob snapshots: what if your updated DB record refers to a new blob, what happens to the old one?! These are all business rules to apply at the application level, and you have to ask yourself to what degree you want that backup of blobs.
Here is an interesting blog post on Azure SQL and Storage backup. But again, there is no service that will keep data in sync between SQL DB and Azure Storage for you.
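If you do end up snapshotting the blobs yourself each time you take a database backup, the storage SDK makes that part straightforward. Here is a small C# sketch using Azure.Storage.Blobs; the container name and the idea of stamping each snapshot with the database backup timestamp are assumptions, not an established pattern from the answer above.

using System;
using System.Collections.Generic;
using Azure.Storage.Blobs;

// Take a point-in-time snapshot of every blob in the documents container,
// tagged with the same timestamp you record for the SQL database backup.
var backupTimestamp = DateTimeOffset.UtcNow.ToString("yyyyMMddHHmmss");
var containerClient = new BlobContainerClient("<storage-connection-string>", "documents");

await foreach (var blobItem in containerClient.GetBlobsAsync())
{
    var blobClient = containerClient.GetBlobClient(blobItem.Name);

    // Metadata on the snapshot lets you later find the blob versions that
    // correspond to a given database restore point.
    await blobClient.CreateSnapshotAsync(
        new Dictionary<string, string> { ["backupTimestamp"] = backupTimestamp });
}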
