Is there any way I can back up Azure Table Storage into Azure Blob storage incrementally? AzCopy has a solution for a full backup of a table, but not an incremental one.
Is there any way I can recover an Azure storage table if I delete it from Azure Storage Explorer?
We wrote a .NET library that backs up tables and blobs. You can easily run it from an Azure Function with a timer trigger.
In this blog post I explain how to implement it using an Azure Function.
[FunctionName("Function1")]
public static async Task Run([TimerTrigger("0 */5 * * * *")]TimerInfo myTimer, ILogger log, ExecutionContext context)
{
    var sourceAccountName = Environment.GetEnvironmentVariable("BackupSourceAccountName");
    var sourceKey = Environment.GetEnvironmentVariable("BackupSourceAccountKey");
    var backupAzureStorage = new Luminis.AzureStorageBackup.BackupAzureStorage(sourceAccountName, sourceKey, log, context.FunctionAppDirectory);

    var destinationAccountName = Environment.GetEnvironmentVariable("BackupDestinationAccountName");
    var destinationKey = Environment.GetEnvironmentVariable("BackupDestinationAccountKey");
    var destinationContainerName = Environment.GetEnvironmentVariable("BackupDestinationContainer");

    // Backup Tables
    await backupAzureStorage.BackupAzureTablesToBlobStorage("table1,table2", destinationAccountName, destinationKey, destinationContainerName, "tables");

    // Backup Blobs
    await backupAzureStorage.BackupBlobStorage("container1,container2", destinationAccountName, destinationKey, destinationContainerName, "blobs");
}
As far as I know, Azure currently doesn't support automatically backing up table data into blob storage.
We need to write code to achieve this requirement.
I suggest using Azure WebJobs/Functions or AzCopy (as you said) to achieve it.
If you want the backup to run automatically, I suggest using a timer-triggered function that runs the backup code every day or every minute.
For more details about how to use timer triggers, you can refer to this article (Azure Functions) or this one (WebJobs).
I have created an ADF pipeline that should trigger when a blob is added to a storage container (say container1) and copy the blob to another storage container (say container2). All my blob names are alphanumeric with '-' (basically a GUID). I see that the ADF pipeline is triggered only a few times compared to the number of blobs in container1 (i.e. if I have n files in container1, the pipeline is triggered only x times, where x < n).
I also observed that the more blobs are created per second in container1, the more triggers are missed. I am not using any event batching in Event Grid. My storage account is a v2 BlockBlobStorage account.
Is there a way I can resolve this?
I think it is difficult to get a definitive answer in the community. We'd better move this issue to here, so that Microsoft can run a stress test and find out whether this is a bug.
Is there any AWS managed solution which would allow be to perform what is essentially a data migration using DynamoDB as the source and a Lambda function as the sink?
I’m setting up a Lambda to process DynamoDB streams, and I’d like to be able to use that same Lambda to process all the existing items as well rather than having to rewrite the same logic in a Spark or Hive job for AWS Glue, Data Pipeline, or Batch. (I’m okay with the input to the Lambda being different than a DynamoDB stream record—I can handle that in my Lambda—I’m just trying to avoid re-implementing my business logic elsewhere.)
I know that I could build my own setup to run a full table scan, but I’m also trying to avoid any undifferentiated heavy lifting.
Edit: One possibility is to update all of the items in DynamoDB so that it triggers a DynamoDB Stream event. However, my question still remains—is there an AWS managed service that can do this for me?
You can create a new Kinesis data stream and add it as a trigger to your existing Lambda function. Then create a new, simple Lambda function that scans the entire table and puts the records into that stream. That's it.
Your business logic stays in your original function; you are just sending the existing DynamoDB data to it via Kinesis.
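For illustration, here is a rough sketch of what that one-off backfill function could look like with the AWS SDK for .NET; the table name, stream name, and JSON serialization are assumptions rather than part of the original answer.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;
using Newtonsoft.Json;

public static class DynamoBackfill
{
    // Scans the whole table and forwards every item to the Kinesis stream
    // that triggers the existing Lambda function.
    public static async Task BackfillAsync()
    {
        var dynamo = new AmazonDynamoDBClient();
        var kinesis = new AmazonKinesisClient();

        Dictionary<string, AttributeValue> lastKey = null;
        do
        {
            var scan = await dynamo.ScanAsync(new ScanRequest
            {
                TableName = "MyTable",                    // assumption: your table name
                ExclusiveStartKey = lastKey
            });

            foreach (var item in scan.Items)
            {
                // Serialize each item however your existing Lambda expects its input.
                var json = JsonConvert.SerializeObject(item);
                await kinesis.PutRecordAsync(new PutRecordRequest
                {
                    StreamName = "existing-items-stream", // assumption: your stream name
                    PartitionKey = Guid.NewGuid().ToString(),
                    Data = new MemoryStream(Encoding.UTF8.GetBytes(json))
                });
            }

            lastKey = scan.LastEvaluatedKey;
        } while (lastKey != null && lastKey.Count > 0);
    }
}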
Ref: https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/
Is there a way to write data and meta data atomically in azure storage for Page Blobs?
Consider a page blob which has multiple writers.
I see recommendations to use the meta data for things like record count, sequence number, general structure of the blob's data. However, if two writers write data and then have to update the meta data, isn't there a race where each writes and tries to update the record count by reading the current count and then updating. Both read 0 and write 1, but there are actually 2.
Same applies to any scenario where the meta data write is not keyed by something particular to that write (eg, each write then writes a new name-value pair into meta data).
The below suggestion does not seem to work for me.
// 512 byte aligned stream with my data
Stream toWrite = PageAlignedStreamManager.Write(data);
long whereToWrite = this.MetaData.TotalWrittenSizeInBytes;
this.MetaData.TotalWrittenSizeInBytes += toWrite.Length;

await this.Blob.FetchAttributesAsync();
if (this.MetaData.TotalWrittenSizeInBytes > this.Blob.Properties.Length)
{
    await this.Blob.ResizeAsync(PageAlignedMemoryStreamManager.PagesRequired(this.MetaData.TotalWrittenSizeInBytes) * PageAlignedMemoryStreamManager.PageSizeBytes * 2);
}

this.MetaData.RevisionNumber++;
this.Blob.Metadata[STREAM_METADATA_KEY] = JsonConvert.SerializeObject(this.MetaData);

// TODO: the below two lines should happen atomically
await this.Blob.WritePagesAsync(toWrite, whereToWrite, null, AccessCondition.GenerateLeaseCondition(this.BlobLeaseId), null, null);
await this.Blob.SetMetadataAsync(AccessCondition.GenerateLeaseCondition(this.BlobLeaseId), null, null);

toWrite.Dispose();
If I do not explicitly call SetMetadataAsync as the next action, the metadata does not get set :(
Is there a way to write data and meta data atomically in azure storage?
Yes, you could update the data and metadata atomically in this way: when we set/update blob metadata using the following code snippet, it is only stored in the local blob object; no network call is made at that point.
blockBlob.Metadata["docType"] = "textDocuments";
When we then use the following code to upload the blob, it makes a single call that sets both the blob content and the metadata. If the upload fails, neither the blob content nor the metadata is updated.
blockBlob.UploadText("new content");
However, if two writers write data and then have to update the meta data, isn't there a race where each writes and tries to update the record count by reading the current count and then updating. Both read 0 and write 1, but there are actually 2.
Azure Storage supports three data concurrency strategies (optimistic concurrency, pessimistic concurrency, and last-writer-wins). We could use optimistic concurrency control through the ETag property, or pessimistic concurrency control through a lease; either one helps guarantee data consistency.
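As a sketch of the optimistic approach with the classic Microsoft.WindowsAzure.Storage client (the same one used in the question's code), the metadata update can be retried on an ETag mismatch; the "recordCount" key and the retry loop are assumptions for illustration.

using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class MetadataCounter
{
    // Optimistic concurrency: re-read, modify and write back the metadata,
    // retrying when another writer changed the blob in the meantime (HTTP 412).
    public static async Task IncrementRecordCountAsync(CloudPageBlob blob)
    {
        while (true)
        {
            await blob.FetchAttributesAsync();                     // refresh metadata and ETag
            var etag = blob.Properties.ETag;

            blob.Metadata.TryGetValue("recordCount", out var raw); // assumption: counter lives in this key
            blob.Metadata["recordCount"] = (int.TryParse(raw, out var n) ? n + 1 : 1).ToString();

            try
            {
                // Succeeds only if nobody modified the blob since FetchAttributesAsync.
                await blob.SetMetadataAsync(AccessCondition.GenerateIfMatchCondition(etag), null, null);
                return;
            }
            catch (StorageException ex) when (ex.RequestInformation?.HttpStatusCode == 412)
            {
                // Precondition failed: another writer won the race; loop and retry.
            }
        }
    }
}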
We're planning to use Azure blob storage to save processing log data for later analysis. Our systems are generating roughly 2000 events per minute, and each "event" is a json document. Looking at the pricing for blob storage, the sheer number of write operations would cost us tons of money if we take each event and simply write it to a blob.
My question is: Is it possible to create multiple blobs in a single write operation, or should I instead plan to create blobs containing multiple event data items (for example, one blob for each minute's worth of data)?
It is possible, but it isn't good practice: merging multipart files takes a long time. That's why we separate the upload action from the entity-persist operation, passing the entity id and updating the document/image name in another controller.
It also keeps the upload functionality clean. Best wishes.
It's not possible to create multiple blobs in a single write operation.
One feasible solution is to create blobs containing multiple event data items, as you planned (which, in my opinion, is hard to implement and query); another is to store the event data in Azure Table Storage rather than Blob storage and leverage EntityGroupTransaction to write the table entities in one batch (which is billed as one transaction).
Please note that all table entities in one batch must have the same partition key, which should be considered when you're designing your table (see the Azure Storage Table Design Guide for further information). If some of your events exceed the size limits of Azure Table Storage (1 MB per entity, 4 MB per batch), you can save those events' data to Blob storage and store the blob links in the table.
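For reference, a minimal sketch of such a batch insert with the classic Microsoft.WindowsAzure.Storage table client; the entity shape, the table name, and the per-minute partition key are assumptions, not requirements.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Assumed entity: one row per event, partitioned by minute so that a batch
// (same PartitionKey, at most 100 operations) covers one minute's worth of events.
public class EventEntity : TableEntity
{
    public EventEntity() { }

    public EventEntity(DateTime timestampUtc, string eventId, string json)
    {
        PartitionKey = timestampUtc.ToString("yyyyMMddHHmm"); // one partition per minute
        RowKey = eventId;
        Payload = json;
    }

    public string Payload { get; set; }
}

public static class EventWriter
{
    public static async Task WriteBatchAsync(string connectionString, IEnumerable<EventEntity> sameMinuteEvents)
    {
        var table = CloudStorageAccount.Parse(connectionString)
                                       .CreateCloudTableClient()
                                       .GetTableReference("events");   // assumption: table name

        await table.CreateIfNotExistsAsync();

        var batch = new TableBatchOperation();
        foreach (var e in sameMinuteEvents)   // caller must pass entities sharing one PartitionKey, max 100
        {
            batch.Insert(e);
        }

        await table.ExecuteBatchAsync(batch); // billed as a single transaction
    }
}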
I am aware of how performance counters and diagnostics are generated in Azure web roles and worker roles.
My question is: can I get the performance counter data from a remote place or a remote app, given the subscription ID and the required certificates (i.e. a third-party app that reports the performance counters)?
In other words, can I get the performance counter data the same way I use the Service Management API for any hosted service?
What pre-configuration is required on the server to get CPU data?
Following is the description of the attributes for Performance counters table:
EventTickCount: The tick count (in UTC) when the log entry was recorded.
DeploymentId: The ID of your deployment.
Role: The role name.
RoleInstance: The role instance name.
CounterName: The name of the counter.
CounterValue: The value of the performance counter.
One of the key things here is understanding how to query this table (and the other diagnostics tables) effectively. Typically we want to fetch the data for a certain period of time, and our natural instinct is to query the table on the Timestamp attribute. However, that's a bad design choice, because in an Azure table the data is indexed only on PartitionKey and RowKey; querying on any other attribute results in a full table scan, which becomes a problem once the table contains a lot of data.
The good thing about these log tables is that the PartitionKey value effectively represents the date/time when the data point was collected: the PartitionKey is created from the higher-order bits of DateTime.Ticks (in UTC). So to fetch the data for a certain date/time range, you first calculate the Ticks for your range (in UTC), prepend a "0" to each value, and use those values in your query.
If you're querying using the REST API, you would use syntax like:
PartitionKey ge '0<from date/time ticks in UTC>' and PartitionKey le '0<to date/time ticks in UTC>'
You could use this syntax if you're querying table storage in our tool Cloud Storage Studio, Visual Studio or Azure Storage Explorer.
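For example, here is a small sketch (plain .NET, no storage library required; the six-hour range is an arbitrary example) of building those PartitionKey boundary values and the filter expression:

using System;

class WadPartitionKeyRange
{
    static void Main()
    {
        // Example range: the last 6 hours, in UTC.
        DateTime fromUtc = DateTime.UtcNow.AddHours(-6);
        DateTime toUtc = DateTime.UtcNow;

        // The WAD* diagnostics tables use "0" + DateTime.Ticks (UTC) as the PartitionKey,
        // so prepend "0" to the tick values of the range boundaries.
        string fromKey = "0" + fromUtc.Ticks;
        string toKey = "0" + toUtc.Ticks;

        // $filter expression for the Table service REST API.
        Console.WriteLine($"PartitionKey ge '{fromKey}' and PartitionKey le '{toKey}'");
    }
}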
Unfortunately I don't have much experience with the Storage Client library, but let me work something out. Maybe I will write a blog post about it; once I do, I will post the link here.
Gaurav
Since the performance counter data gets persisted in Windows Azure Table Storage (in the WADPerformanceCountersTable), you can query that table from a remote app, either by using Microsoft's Storage Client library or by writing your own custom wrapper around the Azure Table Service REST API. All you will need is the storage account name and key.