Write Blob MetaData and Pages Atomically - azure-blob-storage

Is there a way to write data and metadata atomically in Azure Storage for page blobs?
Consider a page blob which has multiple writers.
I see recommendations to use the metadata for things like record count, sequence number, and the general structure of the blob's data. However, if two writers write data and then have to update the metadata, isn't there a race where each writes and then tries to update the record count by reading the current count and incrementing it? Both read 0 and write 1, but there are actually 2.
The same applies to any scenario where the metadata write is not keyed by something particular to that write (e.g., each writer adding a new name-value pair to the metadata).
The suggestion below does not seem to work for me:
// 512 byte aligned stream with my data
Stream toWrite = PageAlignedStreamManager.Write(data);
long whereToWrite = this.MetaData.TotalWrittenSizeInBytes;
this.MetaData.TotalWrittenSizeInBytes += toWrite.Length;
await this.Blob.FetchAttributesAsync();
if (this.MetaData.TotalWrittenSizeInBytes > this.Blob.Properties.Length)
{
await this.Blob.ResizeAsync(PageAlignedMemoryStreamManager.PagesRequired(this.MetaData.TotalWrittenSizeInBytes) * PageAlignedMemoryStreamManager.PageSizeBytes * 2);
}
this.MetaData.RevisionNumber++;
this.Blob.Metadata[STREAM_METADATA_KEY] = JsonConvert.SerializeObject(this.MetaData);
// TODO: the below two lines should happen atomically
await this.Blob.WritePagesAsync(toWrite, whereToWrite, null, AccessCondition.GenerateLeaseCondition(this.BlobLeaseId), null, null);
await this.Blob.SetMetadataAsync(AccessCondition.GenerateLeaseCondition(this.BlobLeaseId), null, null);
toWrite.Dispose();
If I do not explicitly call SetMetadataAsync as the next action, the metadata does not get set :(

Is there a way to write data and meta data atomically in azure storage?
Yes, you could update the data and metadata atomically in this way. When we set/update blob metadata using the following code snippet, it is only stored in the local blob object; at that point no network call is made.
blockBlob.Metadata["docType"] = "textDocuments";
When we then use the following code to upload the blob, it actually makes the call that sets both the blob content and the metadata. If the upload fails, neither the blob content nor the metadata is updated.
blockBlob.UploadText("new content");
However, if two writers write data and then have to update the metadata, isn't there a race where each writes and then tries to update the record count by reading the current count and incrementing it? Both read 0 and write 1, but there are actually 2.
Azure Storage supports three data concurrency strategies (optimistic concurrency, pessimistic concurrency, and last writer wins). We could use optimistic concurrency control through the ETag property, or pessimistic concurrency control through a lease, either of which helps guarantee data consistency.
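As a hedged sketch of the optimistic route for the record-count race in the question, using the same classic WindowsAzure.Storage SDK as the posted code (the recordCount key and the bare retry comment are illustrative only, not part of the original):

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System.Threading.Tasks;

static async Task IncrementRecordCountAsync(CloudPageBlob blob)
{
    await blob.FetchAttributesAsync();                         // refresh metadata and the ETag
    var count = blob.Metadata.TryGetValue("recordCount", out var s) ? int.Parse(s) : 0;
    blob.Metadata["recordCount"] = (count + 1).ToString();
    try
    {
        // Fails with 412 (Precondition Failed) if another writer changed the blob
        // after FetchAttributesAsync, so the lost-update race is detected.
        await blob.SetMetadataAsync(
            AccessCondition.GenerateIfMatchCondition(blob.Properties.ETag), null, null);
    }
    catch (StorageException ex) when (ex.RequestInformation.HttpStatusCode == 412)
    {
        // Lost the race: re-fetch, re-apply the increment, and retry.
    }
}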

Related

Data health check tool

I want to perform data health checks on a huge volume of data, which can be either in an RDBMS or in cloud file storage like Amazon S3. Which tool would be appropriate for performing data health checks that can give me the number of rows, rows not matching a given schema for data-type validation, average volume for a given time period, etc.?
I do not want to use any big-data platform like Qubole or Databricks because of the extra cost involved. I found Drools, which can perform similar operations, but it would need to read the full data into memory and associate it with a POJO before validation. Any alternatives would be appreciated where I do not have to load the full data into memory.
You can avoid loading the full data into memory by using the StatelessKieSession object of Drools. A StatelessKieSession works only on the current event; it does not maintain state across events, nor does it keep objects in memory. Read more about StatelessKieSession here.
Also, you can use a stateful KieSession and give an expiry to an event using the @expires declaration, which expires the event after the specified time. Read more about @expires here.

What are the typical ways to cache the result of a relational database query using Redis?

What do developers commonly use as the key and value to cache the result from a SQL query into Redis? For example, if I have a Users table, and I want to cache the results from the query:
SELECT name, age FROM Users
1) Which Redis data structure should I use? Should I just have a single Key for the query and store the entire object returned by the database as the Value as such:
{ key: { object returned by database } }
Or should I use Redis' List data structure and loop through the rows individually and push them into the List as such:
{ key: [ ... ]}
Wouldn't this add computation time of O(N)? How is this more effective than just simply storing the object returned by the database?
Or should I use Redis' Hash Map data structure and loop through the rows individually and set a unique Key for each row with its corresponding attributes as such:
{ key1: {name: 'Bob', age: 25} }, { key2: {name: 'Sally', age: 15} }, ...
2) What would be a good rule of thumb with regards to the Key? From my understanding, some people just use the SQL query as the Key? But if you do so, does that mean you would have to store the entire object returned by the database as the Value (as per question 1)? Is this the best way to do it? If you are using an ORM, do you still use the SQL query as the key?
This is nicely analyzed in the Database Caching Strategies Using Redis whitepaper, by AWS.
Here are the options discussed in the document. Which is best is really a design decision based on the trade-offs you have to make for your specific use case.
Cache the Database SQL ResultSet
Cache a serialized ResultSet object that contains the fetched database row.
Pro: When data retrieval logic is abstracted (e.g., as in a Data Access Object or DAO layer), the consuming code expects only a ResultSet object and does not need to be made aware of its origination. A ResultSet object can be iterated over, regardless of whether it originated from the database or was deserialized from the cache, which greatly reduces integration logic. This pattern can be applied to any relational database.
Con: Data retrieval still requires extracting values from the ResultSet object cursor and does not further simplify data access; it only reduces data retrieval latency.
Cache Select Fields and Values in a Custom Format
Cache a subset of a fetched database row into a custom structure that can be consumed by your applications.
Pro: This approach is easy to implement. You essentially store specific retrieved fields and values into a structure such as JSON or XML and then SET that structure into a Redis string. The format you choose should be something that conforms to your application's data access pattern.
Con: Your application is using different types of objects when querying for particular data (e.g., Redis string and database results). In addition, you are required to parse through the entire structure to retrieve the individual attributes associated with it.
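For instance, a minimal sketch of this option using StackExchange.Redis and System.Text.Json (the client library, key name, and TTL are my own assumptions, not something the question specifies):

using System;
using StackExchange.Redis;
using System.Text.Json;

var redis = ConnectionMultiplexer.Connect("localhost");
var db = redis.GetDatabase();

// Cache the whole query result as one JSON string under a single key, with a TTL.
var rows = new[] { new { name = "Bob", age = 25 }, new { name = "Sally", age = 15 } };
db.StringSet("users:all", JsonSerializer.Serialize(rows), TimeSpan.FromMinutes(5));

// Later: one GET, then deserialize (or hand the raw JSON straight back to the client).
string cached = db.StringGet("users:all");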
Cache Select Fields and Values into an Aggregate Redis Data Structure
Cache the fetched database row into a specific data structure that can simplify the application's data access.
Pro: When converting the ResultSet object into a format that simplifies access, such as a Redis Hash, your application is able to use that data more effectively. This technique simplifies your data access pattern by reducing the need to iterate over a ResultSet object or to parse a structure like a JSON object stored in a string. In addition, working with aggregate data structures, such as Redis Lists, Sets, and Hashes, provides various attribute-level commands for setting and getting data, eliminating the overhead associated with processing the data before being able to leverage it.
Con: Your application is using different types of objects when querying for particular data (e.g., Redis Hash and database results).
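A similar hedged sketch of the hash-per-row variant, again assuming StackExchange.Redis; the users:<id> key scheme is just an illustration:

using StackExchange.Redis;

var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();

// One Redis Hash per row: each attribute can be read or updated individually.
db.HashSet("users:1", new HashEntry[] { new("name", "Bob"), new("age", 25) });
db.HashSet("users:2", new HashEntry[] { new("name", "Sally"), new("age", 15) });

// Fetch a single attribute without deserializing the whole row.
int age = (int)db.HashGet("users:1", "age");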
Cache Serialized Application Object Entities
Cache a subset of a fetched database row into a custom structure that can be consumed by your applications.
Pro: Use application objects in their native application state with simple serializing and deserializing techniques. This can rapidly accelerate application performance by minimizing data transformation logic.
Con: Advanced application development use case.
Regarding 2)
What would be a good rule of thumb with regards to the Key?
Using the SQL query as the Key is OK as long as you are sure it is unique. Add prefixes if there is a risk of non-uniqueness: you may have other databases with the same table names, leading to the same queries. Also make the keys case-invariant (all lower case or all upper case), since Redis keys are case-sensitive.
But if you do so, does that mean you would have to store the entire object returned by the database as the Value (as per question 1)?
Not necessarily; it comes down to what processing you are doing with the query results. Chances are some are best stored as the raw entire object for processing, some as a JSON-stringified object to return quickly to the client, some as rows, etc. The best approach is to adapt accordingly.
Is this the best way to do it?
Not necessarily.
If you are using an ORM, do you still use the SQL query as the key?
You may if your ORM easily exposes the SQL Query programmatically, and it is consistent.
I wouldn't get fixated on the idea of using the SQL query as the key. Use something you can be sure is consistent, that optimizes your processing, and that gives you clear rules for invalidation. It could be the method call with its parameters, the web API call, etc.
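As a small, purely illustrative example of that advice (all names here are made up), a key built from the logical operation and its parameters rather than from the raw SQL text:

// Stable, case-invariant cache keys: prefix + entity + operation + parameters.
static string CacheKey(string entity, string operation, params object[] args) =>
    $"myapp:{entity}:{operation}:{string.Join(":", args)}".ToLowerInvariant();

var key = CacheKey("Users", "by-age", 25);   // "myapp:users:by-age:25"

Invalidation then becomes a matter of deleting (or expiring) the keys under a known prefix when the underlying table changes.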

Solutions for Recording Google Protocol Buffer Messages

My project is using protobufs for our data types. We need to be able to record our data so it can be played back later. Our use case is to recreate the event or to reprocess the same data but with new algorithms and check for improvements.
As data flows through our system it is all protobufs. These are easily serialized to a byte array, which could be recorded to files or maybe as blobs in a database. Playback would simply mean reading the byte array, converting it back to a protobuf, and sending it off into our software again.
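For the simple record/playback flow, a minimal sketch using the Google.Protobuf C# runtime might look like the following (the Event message type is a hypothetical stand-in for our data types):

using System.Collections.Generic;
using System.IO;
using Google.Protobuf;

static class ProtoRecorder
{
    // Append each message length-prefixed so it can be read back one at a time.
    public static void Record(IEnumerable<Event> events, string path)
    {
        using var file = File.Open(path, FileMode.Append);
        foreach (var e in events)
            e.WriteDelimitedTo(file);
    }

    // Replay the messages in order until the end of the file.
    public static IEnumerable<Event> Playback(string path)
    {
        using var file = File.OpenRead(path);
        while (file.Position < file.Length)
            yield return Event.Parser.ParseDelimitedFrom(file);
    }
}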
Are there any existing technologies used for recording protobufs?
Even though the initial use case is very simple, eventually the solution will get more complex. It will probably need to:
Farm out recording to multiple hosts to keep up with the input data rate
Allow querying to find out how much data exists during a specific time period
Play back only those data records where some field has a specific value
Save the data for long term storage, e.g. never delete a record but instead move it to a tape backup
I think the above is best accomplished using a database which stores some subset of meta data along with the protobuf byte array itself. Before I go reinventing the wheel, I would like opinions on anything that exists already that might do this job.

Multiple small inserts in clickhouse

I have an event table (MergeTree) in ClickHouse and want to run a lot of small inserts at the same time. However, the server becomes overloaded and unresponsive. Moreover, some of the inserts are lost. There are a lot of records like this in the ClickHouse error log:
01:43:01.668 [ 16 ] <Error> events (Merger): Part 20161109_20161109_240760_266738_51 intersects previous part
Is there a way to optimize such queries? I know I can use bulk insert for some types of events: basically, running one insert with many records, which ClickHouse handles pretty well. However, some of the events, such as clicks or opens, cannot be handled in this way.
The other question: why does ClickHouse decide that similar records exist when they don't? There are similar records at the time of insert which have the same fields as in the index, but the other fields are different.
From time to time I also receive the following error:
Caused by: ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, message: Connect to localhost:8123 [ip6-localhost/0:0:0:0:0:0:0:1] timed out, host: localhost, port: 8123; Connect to ip6-localhost:8123 [ip6-localhost/0:0:0:0:0:0:0:1] timed out
... 36 more
Mostly during the project build, when tests against the ClickHouse database are run.
ClickHouse has a special type of table for this: Buffer. It's stored in memory and allows many small inserts without a problem. We have nearly 200 different inserts per second and it works fine.
Buffer table:
CREATE TABLE logs.log_buffer (rid String, created DateTime, some String, d Date MATERIALIZED toDate(created))
ENGINE = Buffer('logs', 'log_main', 16, 5, 30, 1000, 10000, 1000000, 10000000);
Main table:
CREATE TABLE logs.log_main (rid String, created DateTime, some String, d Date)
ENGINE = MergeTree(d, sipHash128(rid), (created, sipHash128(rid)), 8192);
Details in manual: https://clickhouse.yandex/docs/en/operations/table_engines/buffer/
This is a known issue when processing a large number of small inserts into a (non-replicated) MergeTree table.
This is a bug; we need to investigate and fix it.
As a workaround, you should send inserts in larger batches, as recommended: about one batch per second: https://clickhouse.tech/docs/en/introduction/performance/#performance-when-inserting-data.
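As a rough illustration of that workaround (not an official recipe), a client-side batcher that accumulates rows and flushes them about once per second through ClickHouse's HTTP interface could look like this in C#; the table name, row format, and flush policy are all placeholders:

using System;
using System.Collections.Concurrent;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class EventBatcher
{
    private readonly ConcurrentQueue<string> _rows = new();
    private static readonly HttpClient Http = new();

    public void Add(string tsvRow) => _rows.Enqueue(tsvRow);

    // Call roughly once per second (e.g. from a timer) so each INSERT carries many rows.
    public async Task FlushAsync()
    {
        var body = new StringBuilder();
        while (_rows.TryDequeue(out var row)) body.AppendLine(row);
        if (body.Length == 0) return;

        var url = "http://localhost:8123/?query=" +
                  Uri.EscapeDataString("INSERT INTO logs.events FORMAT TabSeparated");
        var response = await Http.PostAsync(url, new StringContent(body.ToString()));
        response.EnsureSuccessStatusCode();
    }
}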
I've had a similar problem, although not as bad - making ~20 inserts per second caused the server to reach a high load average, memory consumption and CPU usage. I created a Buffer table which buffers the inserts in memory, and they are then flushed periodically to the "real" on-disk table. And just like magic, everything went quiet: load average, memory and CPU usage came down to normal levels. The nice thing is that you can run queries against the Buffer table and get back matching rows from both memory and disk - so clients are unaffected by the buffering. See https://clickhouse.tech/docs/en/engines/table-engines/special/buffer/
Alternatively, you can use something like https://github.com/nikepan/clickhouse-bulk: it will buffer multiple inserts and flush them all together according to user policy.
The design of ClickHouse merge engines is not meant to take small writes concurrently. As far as I understand, MergeTree merges the parts of data written to a table based on partitions and then re-organizes the parts for better aggregated reads. If we do small writes often, you will encounter another exception:
Error: 500: Code: 252, e.displayText() = DB::Exception: Too many parts (300). Merges are processing significantly slow
When you try to understand why the above exception is thrown, the idea becomes a lot clearer: CH needs to merge data, and there is an upper limit on how many parts can exist. Every write in a batch is added as a new part and then eventually merged with the partitioned table.
SELECT table, count() AS cnt
FROM system.parts
WHERE database = 'dbname'
GROUP BY table
ORDER BY cnt DESC
The above query can help you monitor parts: observe, while writing, how the parts increase and eventually merge down.
My best bet for the above would be buffering the data set and periodically flushing it to the DB, but then that means no real-time analytics.
Using a Buffer table is good; however, please consider these points:
If the server is restarted abnormally, the data in the buffer is lost.
FINAL and SAMPLE do not work correctly for Buffer tables. These conditions are passed to the destination table, but are not used for processing data in the buffer.
When adding data to a Buffer, one of the buffers is locked (so no reads).
If the destination table is replicated, some expected characteristics of replicated tables are lost when writing to a Buffer table (no deduplication).
Please read thoroughly; it's a special-case engine: https://clickhouse.tech/docs/en/engines/table-engines/special/buffer/

Is it possible to write multiple blobs in a single request?

We're planning to use Azure blob storage to save processing log data for later analysis. Our systems are generating roughly 2000 events per minute, and each "event" is a json document. Looking at the pricing for blob storage, the sheer number of write operations would cost us tons of money if we take each event and simply write it to a blob.
My question is: Is it possible to create multiple blobs in a single write operation, or should I instead plan to create blobs containing multiple event data items (for example, one blob for each minute's worth of data)?
It is possible, but it isn't good practice; it takes a long time for multipart files to be merged. Hence we try to separate the upload action from the entity-persist operation, by passing the entity id and updating the doc/image name in another controller.
It also keeps your upload functionality clean. Best wishes.
It's impossible to create multiple blobs in a single write operation.
One feasible solution is to create blobs containing multiple event data items as you planned (which is hard to implement and query, in my opinion); another solution is to store the event data in Azure Table storage rather than Blob storage, and leverage Entity Group Transactions to write the table entities in one batch (which is billed as one transaction).
Please note that all table entities in one batch must have the same partition key, which should be considered when you're designing your table (see the Azure Storage Table Design Guide for further information). If some of your events have a data size that exceeds the limits of Azure Table storage (1 MB per entity, 4 MB per batch), you can save the data of those events to blobs and store the blob links in the table.
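For the table-based route, a hedged sketch with the classic Microsoft.WindowsAzure.Storage SDK; the per-minute PartitionKey scheme and the Body property are illustrative choices, not requirements:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

static async Task WriteMinuteBatchAsync(
    CloudTable table, string minuteKey, IEnumerable<(string id, string json)> events)
{
    var batch = new TableBatchOperation();
    foreach (var (id, json) in events)                        // up to 100 entities / 4 MB per batch
    {
        var entity = new DynamicTableEntity(minuteKey, id);   // same PartitionKey for all
        entity.Properties["Body"] = EntityProperty.GeneratePropertyForString(json);
        batch.Insert(entity);
    }
    await table.ExecuteBatchAsync(batch);                     // billed as a single transaction
}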
