Azure Data Explorer slow ingestion / failures - parquet

I am trying to ingest some data into ADX but I don't see any data appearing:
40 parquet files (ranging from 1 MB to 550 MB, 8 GB in total)
From blob storage using Event Grid
Running on D11 V2 cluster tier with auto-scale
Ingestion utilization stays at 100% for 2 days, then drops to 0%
Ingestion latency rises to a maximum of 24h, then drops
Count of rows is always 0, database size does not increase
Operations log shows a lot of failures: "Operation": DataIngestPull, "The admin command execution timed out at '2020-09-08T06:39:18.1115065Z'" etc.
Diagnostic logging also shows failures: FailedIngestion, Blob has exceeded the '2.00:00:00' retry period or '10' retry attempts, BadRequest_MessageExhausted
When I ingest one small file it works and the data shows up
The worst part is that I am not able to cancel the ingestion and have to wait 2 days. Is there a way to cancel?
How can I successfully ingest this data? Is it supposed to take this long?

It looks like the ingestion is timing out because each ingestion batch is too big. The best way to resolve this would be to add the raw data size of the blob (an approximate size is fine) to the blob metadata, as explained here. Alternatively, you can try reducing the database/table batching policy, as explained here (you can start by reducing it from 1GB to 500MB and reduce further if this is not sufficient).

Related

Is there a way to find out if load on Elastic stack is growing?

I have just started learning the Elastic stack and I already have to diagnose a production issue. From time to time, our setup has problems pulling messages from ActiveMQ into Elasticsearch using Logstash. There is a lag which can be 1-3 hours.
One suspicion is that the load went up after the latest release of our application.
Is there a way to find out the total size of messages stored, grouped by month? Not only their number but their total size. Maybe the size of the documents went up rather than the number of documents.
Start with setting up a production monitoring instance to provide detailed statistics on your cluster: https://www.elastic.co/guide/en/elastic-stack-overview/7.1/monitoring-production.html
This will allow you to get at those metrics like messages/month, average document size, index performance, buffer load, etc. A bit more detail on internal performance is available with https://visualvm.github.io/
While putting that piece together, you can also tweak Logstash performance e.g.
Tune Logstash worker settings:
Begin by scaling up the number of pipeline workers by using the -w flag. This will increase the number of threads available for filters and outputs. It is safe to scale this up to a multiple of CPU cores, if need be, as the threads can become idle on I/O.
You may also tune the output batch size. For many outputs, such as the Elasticsearch output, this setting will correspond to the size of I/O operations. In the case of the Elasticsearch output, this setting corresponds to the batch size.
From https://www.elastic.co/guide/en/logstash/current/performance-troubleshooting.html

Performance Issue: rejected execution of org.elasticsearch.ingest.PipelineExecutionService

I've been struggling to transfer 500 million documents, shipped from Windows IIS logs, from Kafka to Elasticsearch. At the beginning of the shipping process, everything was fine.
On the Kafka Manager dashboard, I could see that the out/bytes rate was about 1 million per minute.
After one week, the out/bytes rate had dropped to 200K per minute. I suspected a problem, and when I opened the Elasticsearch log file I saw numerous errors.
The error is the statement below.
[ERROR][o.e.a.b.TransportBulkAction] [***-node-2] failed to execute pipeline for a bulk request org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.ingest.PipelineExecutionService$..... on EsThreadPoolExecutor
At first, I thought the thread pool was too small, but tuning the write thread pool is not recommended on the Elasticsearch forum.
Next, I suspected ingest-geoip, because the error mentions "ingest.PipelineExecution....", so I simplified my Logstash configuration by turning off the geoip filter.
I also tried reducing the number of pipeline workers and the batch size in the Logstash config.
Everything failed, and I am out of ideas for overcoming this error.
Help, genius!
From the log you pasted, it looks like the queue capacity is 200 but there are 203 queued tasks. I guess that either indexing is slow because the ingest pipelines take too long, or a burst of indexing data is putting pressure on the queue. Another possibility is that you are not rolling over the index: when an index gets too big, merges become bigger and longer and indexing performance decreases.
I would start by increasing the queue capacity to 2000, monitoring the queue size, and checking whether you get momentary or long bursts of incoming data.
Another thing to do is to monitor the indexing latency and check whether the ingest pipelines are the bottleneck by looking at their timing. You can try disabling them for a short time (if that is acceptable) and see whether that relieves the queue and the errors in the log.

Azure Table Increased Latency

I'm trying to create an app which can efficiently write data into Azure Table. In order to test storage performance, I created a simple console app, which sends hardcoded entities in a loop. Each entry is 0.1 kByte. Data is sent in batches (100 items in each batch, 10 kBytes each batch). For every batch, I prepare entries with the same partition key, which is generated by incrementing a global counter - so I never send more than one request to the same partition. Also, I control a degree of parallelism by increasing/decreasing the number of threads. Each thread sends batches synchronously (no request overlapping).
If I use 1 thread, I see 5 requests per second (5 batches, 500 entities). At that time Azure portal metrics shows table latency below 100ms - which is quite good.
If I increase the number of threads to 12, I see a 12x increase in outgoing requests. This rate stays stable for a few minutes. But then, for some reason, I start being throttled: latency increases and the request rate drops.
Below you can see the account metrics. The highlighted point shows 2.31K transactions (batches) per minute, which is 3850 entries per second. If the number of threads is increased to 50, latency increases to 4 seconds and the transaction rate drops to 700 requests per second.
According to the documentation, I should be able to send up to 20K transactions per second within one account (my test account is used only for my performance test). 20K batches mean 200K entries. So the question is: why am I being throttled after 3K entries?
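The arithmetic behind these figures can be checked with a short sketch. It assumes the highlighted point reads as roughly 2,310 batches per minute (which matches the quoted 3850 entries per second) and that one batch counts as one transaction, as the question does. All names are illustrative only.

```python
# Figures taken from the question; a batch is counted as one transaction,
# which is the question's own assumption.
ENTITIES_PER_BATCH = 100
ACCOUNT_TARGET_TPS = 20_000  # account-level target cited in the question


def entities_per_second(batches_per_minute, entities_per_batch=ENTITIES_PER_BATCH):
    """Convert a per-minute batch rate into a per-second entity rate."""
    return batches_per_minute / 60 * entities_per_batch


def fraction_of_target(batches_per_second, target_tps=ACCOUNT_TARGET_TPS):
    """Share of the account target that a given batch rate represents."""
    return batches_per_second / target_tps


single_thread_bps = 5                             # batches/s measured with 1 thread
expected_12_threads_bps = 12 * single_thread_bps  # linear scaling, if nothing throttles
observed_bpm = 2310                               # highlighted point, read as 2.31K batches/min
observed_bps = observed_bpm / 60
observed_eps = entities_per_second(observed_bpm)

print(f"expected with 12 threads: {expected_12_threads_bps} batches/s")
print(f"observed: {observed_bps:.1f} batches/s, {observed_eps:.0f} entities/s")
print(f"share of account target: {fraction_of_target(observed_bps):.2%}")
```

Under these assumptions the observed batch rate is below what linear scaling from one thread would give, and only a small fraction of the cited account target.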
Test details:
Azure Datacenter: West US 2.
My location: Los Angeles.
The app is written in C# and uses the CosmosDB.Table NuGet package with the following configuration: ServicePointManager.DefaultConnectionLimit = 250, Nagle's algorithm is disabled.
The host machine is quite powerful, with a 1 Gb internet link (i7, 8 cores; no high CPU or memory usage was observed during the test).
PS: I've read docs
The system's ability to handle a sudden burst of traffic to a partition is limited by the scalability of a single partition server until the load balancing operation kicks-in and rebalances the partition key range.
and waited for 30 mins, but the situation didn't change.
EDIT
I got a comment that E2E latency doesn't reflect a server problem.
So below is a new graph which shows not only E2E latency but also server latency. As you can see, they are almost identical, which makes me think that the source of the problem is not on the client side.

How to calculate my application's IOPS utilization

I'm trying to figure out how to determine the IOPS my application is driving so I can properly size our cloud infrastructure components. I understand what IOPS are between a database and the storage layer, but I'd like to understand how to calculate what my application drives. Here are some of my application's characteristics:
1) 90% write and 10% read
2) We have a Java-based application that ultimately inserts into an HBase database
3) Process about 50 msg/sec where each message results in probably 2 HBase inserts
Here is what I'm not sure about:
1) Is running iostat or something similar on the actual server under load the only way to calculate IOPS?
2) Is there a general way to calculate what is needed from the incoming data volume/size rather than from the actual storage unit?
3) Is there any relationship between the number of transactions and the number of bytes in each transaction? (I read somewhere that an IO is usually 3K; most inserts don't contain that much data, so it doesn't matter.)
Any help would be greatly appreciated.
I'm not very familiar with HBase, but according to the documentation it uses a log structure, which means writes are sequential. It also runs compactions, which cause both sequential reads and writes of multiple MB. Read queries cause random reads on the storage layer.
Here are the answers to your questions:
1) As far as I know, yes. The only way to get IOPS is to run iostat. You can probably get some compaction stats at the application level, but it is hard to extract IOPS-level details.
2) Compaction needs more storage than the data itself occupies. If your application is write-heavy (so that compaction cannot keep up with the rate of inserts), the actual data volume on disk will be much larger. Given the 50 msg/sec in your question, this should not be the case. I would provision disks at double the expected data volume per instance.
3) As mentioned above, HBase is log-structured. Writes are accumulated in memory and flushed to disk together, so the size of each transaction doesn't matter.
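As a rough illustration of the sizing advice above, the sketch below turns the message rate from the question into an insert rate and a disk estimate. The average insert size and the retention period are hypothetical values chosen only for the example, and the function names are illustrative. It estimates capacity, not IOPS, and ignores any replication done by the storage layer.

```python
SECONDS_PER_DAY = 86_400


def inserts_per_second(msgs_per_sec, inserts_per_msg):
    """Insert rate driven by the application."""
    return msgs_per_sec * inserts_per_msg


def data_volume_gb(insert_rate, avg_insert_bytes, retention_days):
    """Raw data written over the retention period, in GB (10^9 bytes)."""
    return insert_rate * avg_insert_bytes * SECONDS_PER_DAY * retention_days / 1e9


def provisioned_disk_gb(expected_data_gb, headroom=2.0):
    """Double the expected volume to leave room for compaction."""
    return expected_data_gb * headroom


rate = inserts_per_second(msgs_per_sec=50, inserts_per_msg=2)  # from the question
volume = data_volume_gb(rate,
                        avg_insert_bytes=1_000,  # hypothetical average insert size
                        retention_days=30)       # hypothetical retention period
disk = provisioned_disk_gb(volume)

print(f"{rate} inserts/s, {volume:.1f} GB of data, {disk:.1f} GB of disk to provision")
```

Replacing the two hypothetical inputs with measured values gives a first capacity estimate; actual IOPS still have to be measured with iostat under load.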

How much load can Cassandra handle on an m1.xlarge instance?

I set up a 3-node Cassandra (1.2.10) cluster on 3 EC2 m1.xlarge instances.
Based on default configuration with several guidelines included, like:
datastax_clustering_ami_2.4
not using EBS; RAID 0 XFS on ephemeral disks instead,
commit logs on separate disk,
RF=3,
6GB heap, 200MB new size (also tested with greater new size/heap values),
enhanced limits.conf.
At 500 writes per second, the cluster works for only a couple of hours. After that, it seems unable to respond because of CPU overload (mainly GC and compactions).
Nodes remain Up, but their load is huge and the logs are full of GC info and messages like:
ERROR [Native-Transport-Requests:186] 2013-12-10 18:38:12,412 ErrorMessage.java (line 210) Unexpected exception during request java.io.IOException: Broken pipe
nodetool shows many dropped mutations on each node:
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 7
BINARY 0
READ 2
MUTATION 4072827
_TRACE 0
REQUEST_RESPONSE 1769
Is 500 wps too much for a 3-node cluster of m1.xlarge instances, so that I should add nodes? Or is it possible to tune GC further somehow? What load are you able to serve with 3 m1.xlarge nodes? What are your GC configs?
Cassandra is perfectly able to handle tens of thousands of small writes per second on a single node. I just checked on my laptop and got about 29000 writes/second from cassandra-stress on Cassandra 1.2. So 500 writes per second is not really an impressive number, even for a single node.
However, beware that there is also a limit on how fast data can be flushed to disk, and you definitely don't want your incoming data rate to be close to the physical capabilities of your HDDs. Therefore 500 writes per second can be too much if those writes are big enough.
So first: what is the average size of a write? What is your replication factor? Multiply the number of writes by the replication factor and by the average write size, and you'll know approximately what write throughput the cluster requires. But you should leave some safety margin for other I/O-related tasks like compaction. There are various benchmarks on the Internet saying that a single m1.xlarge instance should be able to write anywhere between 20 MB/s and 100 MB/s...
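The multiplication described above can be written as a small sketch. The write rate and replication factor come from the question; the average write size and the 3x safety margin are assumptions made only for illustration, the per-node figure is the lower end of the benchmark range mentioned above, and the function names are hypothetical.

```python
def required_write_throughput_mb_s(writes_per_sec, replication_factor, avg_write_bytes):
    """Cluster-wide write throughput needed, in MB/s (10^6 bytes)."""
    return writes_per_sec * replication_factor * avg_write_bytes / 1_000_000


def has_headroom(required_mb_s, available_mb_s, safety_factor=3.0):
    """True if available throughput covers the requirement with a safety margin."""
    return available_mb_s >= required_mb_s * safety_factor


required = required_write_throughput_mb_s(
    writes_per_sec=500,      # from the question
    replication_factor=3,    # RF=3, from the question
    avg_write_bytes=10_000,  # assumed average write size
)
available = 3 * 20.0         # 3 nodes at the low end of the quoted 20-100 MB/s range

print(f"required: {required:.1f} MB/s, available: {available:.1f} MB/s")
print("enough headroom" if has_headroom(required, available) else "not enough headroom")
```

With a much larger average write size the same check fails, which is the case where 500 writes per second could be too much for this cluster.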
If your cluster has sufficient I/O throughput (e.g. 3x more than needed), yet you observe OOM problems, you should try to:
reduce memtable_total_space_mb (this will cause C* to flush smaller memtables, more often, freeing heap earlier)
lower write_request_timeout to e.g. 2 seconds instead of 10 (if you have big writes, you don't want to keep too many of them in the incoming queues, which reside on the heap)
turn off row_cache (if you ever enabled it)
lower size of the key_cache
consider upgrading to Cassandra 2.0, which moved quite a lot of things off-heap (e.g. bloom filters and index-summaries); this is especially important if you just store lots of data per node
add more HDDs and set multiple data directories, to improve flush performance
set larger new generation size; I usually set it to about 800M for a 6 GB heap, to avoid pressure on the tenured gen.
if you're sure memtable flushing lags behind, make sure sstable compression is enabled - this will reduce amount of data physically saved to disk, at the cost of additional CPU cycles
