How to avoid Kafka latency spikes caused by log segment flush

We're experiencing big latency spikes (two orders of magnitude) at the 99th percentile in our Kafka deployment. We googled a bit and found that this is a pretty well-documented phenomenon: https://issues.apache.org/jira/browse/KAFKA-9693
The "solution" suggested in the ticket is disabling the log flush, but that's hardly acceptable if you care about data consistency.
We've tried tuning segment sizes, flush intervals, etc., but that only delays the flush and does nothing about the magnitude of the spike.
Question
Is there any real solution or workaround to this problem? To be clear, I'm asking how to reduce the spike to a minimum.
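For context, the flush-related knobs being tuned here are broker-level settings in server.properties (they can also be overridden per topic); the values below are purely illustrative, and as the question notes they only move or shrink the flush rather than remove it:

# Flush a partition after this many messages, or after this much time
log.flush.interval.messages=50000
log.flush.interval.ms=2000
# How often the background flusher checks whether a flush is due
log.flush.scheduler.interval.ms=1000
# Smaller segments make each roll cheaper but more frequent (default is 1 GiB)
log.segment.bytes=268435456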

Related

Performance Issue: rejected execution of org.elasticsearch.ingest.PipelineExecutionService

I've been struggling to transfer 500 million documents, shipped from Windows IIS logs, from Kafka to Elasticsearch. At the beginning of the shipping process, everything was fine.
From the Kafka Manager dashboard, I could see that the outgoing document/byte rate was about 1 million per minute.
After one week, the outgoing rate had dropped to 200K per minute, so I suspected a problem. When I opened the Elasticsearch log file, I saw numerous ERRORs.
The error is the statement below.
[ERROR][o.e.a.b.TransportBulkAction] [***-node-2] failed to execute
pipeline for a bulk request org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.ingest.PipelineExecutionService$..... on EsThreadPoolExecutor
At first, I thought it was a thread pool deficiency.
But tuning the write thread pool is not recommended on the Elasticsearch forum.
Next, I suspected ingest-geoip, because the error statement mentioned "ingest.PipelineExecution....", so I simplified the geoip filter in my Logstash configuration; that is, I turned geoip off.
I also tried reducing the number of pipeline workers and the batch size in the Logstash config.
Everything failed... There seems to be no hope of overcoming this error.
Help Genius!
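For reference, the Logstash knobs mentioned in the question live in logstash.yml (or per pipeline in pipelines.yml); a sketch with illustrative, reduced values:

# logstash.yml -- values shown only as an example of "fewer workers, smaller batches"
pipeline.workers: 2
pipeline.batch.size: 75
pipeline.batch.delay: 50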
From the log you pasted it looks like the queue capacity is 200, but there are 203 queued tasks. I guess that either indexing is slow because the ingest pipelines take too long, or there is a burst of indexing data that puts pressure on the queue. Another option is that you are not rolling over the index; when an index gets too big, the merges become bigger and longer and indexing performance decreases.
I would start by increasing the queue capacity to 2000, monitoring the queue size, and checking whether you get momentary or long bursts of incoming data.
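For example (setting and pool names assume a recent Elasticsearch release; the indexing pool was called bulk before it was renamed to write), the queue capacity is a static node setting, and the queue depth can be watched per node with the _cat API:

# elasticsearch.yml
thread_pool.write.queue_size: 2000

# watch the queue and rejections per node
GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected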
Another thing to do is to monitor the indexing latency and check whether the ingest pipelines are the bottleneck by looking at their timing. You can try disabling them for a short time (if that is acceptable) and see whether that relieves the queue and the errors in the log.
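One way to check that timing (assuming the ingest section of the node stats API is available, i.e. Elasticsearch 5.x or later) is to compare count and time_in_millis per pipeline:

GET _nodes/stats/ingest?filter_path=nodes.*.ingest.pipelines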

Debug failed shuffles in Hadoop MapReduce

I am seeing that as the size of the input file increases, failed shuffles increase and job completion time increases non-linearly.
e.g.
75 GB took 1 hour
86 GB took 5 hours
I also see the average shuffle time increase tenfold,
e.g.
75 GB: 4 min
85 GB: 41 min
Can someone point me in a direction to debug this?
Once you are sure your algorithms are correct, automatic hard-disk volume partitioning or fragmentation problems may be occurring somewhere beyond that 75 GB threshold, as you are probably using the same filesystem for caching the intermediate results.
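As a more general starting point (not from the answer above; parameter names assume Hadoop 2.x / MRv2, with mapred.* equivalents on older releases), the Failed Shuffles and Merged Map outputs job counters and the reducer task logs usually show whether fetches are timing out or being retried, and the copy phase can be tuned with settings such as:

mapreduce.reduce.shuffle.parallelcopies          (default 5: parallel fetch threads per reducer)
mapreduce.reduce.shuffle.input.buffer.percent    (default 0.70: reducer heap fraction for buffering map output)
mapreduce.reduce.shuffle.merge.percent           (default 0.66: buffer usage that triggers an in-memory merge)

These go in mapred-site.xml, or can be passed per job (if the driver uses ToolRunner) with -D, e.g. -Dmapreduce.reduce.shuffle.parallelcopies=10.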

Load test error rate failure

I am currently running a load test to simulate concurrency against Apache on an internal network. Below are the response times I am getting for 10/50/100/200/500/1000 users. My first question is: how do I deduce whether this load is too much or too little? And secondly:
Attached below is the error rate
a) It seems to me that when the error rate hits 100%, the response time fluctuates between 30 and 40 ms, even across the other tests.
b) And when the error rate is higher for Apache, the response time seems to be faster.
Could someone shed some light on why this is so: for a), why the response time fluctuates around 30-40 ms when the error rate hits 100%, and for b), why the response time decreases when the error rate increases?
Thanks for taking the time to look into this.
I can't help with (a). However, (b) is fairly common in load testing, particularly for apps where the errors happen early in the request service cycle and generating an error message requires much less work than generating a correct response. BUT, I don't see evidence of that in your results: the response time increases as the user load increases, as does the error rate.
Have you increased the connection limits in Apache? It's pretty well known that, by default, Apache is not configured to handle a large number of simultaneous connections; IIRC, Nginx is. At the load levels your results indicate, this could be affecting your numbers. Also, is your testing tool using persistent connections?
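For illustration (directive names assume Apache 2.4 with the event or worker MPM; the values are examples rather than recommendations), those limits live in the MPM configuration:

<IfModule mpm_event_module>
    # MaxRequestWorkers must not exceed ServerLimit * ThreadsPerChild
    ServerLimit            16
    ThreadsPerChild        64
    MaxRequestWorkers    1024
</IfModule>

Whether KeepAlive is on, and whether the load tool reuses connections, also changes how many workers a given user count actually occupies.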
One rule of thumb to tell whether the load is too much is to check whether there are errors or the response time is too long. In your case you have a non-trivial number of errors at 50 users (VUsers), which means something is wrong. You may want to investigate that before increasing the number of users/VUsers.
CMerrill's answer for b) sounds right: when there is an error, the processing load on the server is usually lower. For a), a response-time fluctuation between 30 ms and 40 ms sounds normal. The key issue is to investigate the errors.

How could I tell if my Hadoop config parameter io.sort.factor is too small or too big?

After reading http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html we came to the conclusion that our 6-node Hadoop cluster could use some tuning, and io.sort.factor seems to be a good candidate, as it controls an important tradeoff. We're planning on tweaking and testing, but planning ahead and knowing what to expect and what to watch for seems reasonable.
It's currently set to 10. How would we know that it's causing too many merges? When we raise it, how would we know it's causing too many files to be opened?
Note that we can't follow the blog's log extracts directly, as it was updated for CDH3b2 while we're working on CDH3u2, and things have changed...
There are a few tradeoffs to consider.
The number of seeks being done when merging files: if you increase the merge factor too high, then the seek cost on disk will exceed the savings from doing a parallel merge (note that the OS cache might mitigate this somewhat).
Increasing the sort factor decreases the amount of data in each partition; I believe each partition of sorted data ends up around io.sort.mb / io.sort.factor. I believe the general rule of thumb is to have io.sort.mb = 10 * io.sort.factor (this is based on the seek latency of the disk relative to the transfer speed, I believe; I'm sure this could be tuned better if it were your bottleneck). If you keep these in line with each other, then the seek overhead from merging should be minimized.
If you increase io.sort.mb, then you increase memory pressure on the cluster, leaving less memory available for job tasks. Memory usage for sorting is mapper tasks * io.sort.mb -- so you could find yourself causing extra GCs if this is too high (a concrete example follows this list).
Essentially,
If you find yourself swapping heavily, then there's a good chance you have set the sort factor too high.
If the ratio between io.sort.mb and io.sort.factor isn't correct, then you may need to change io.sort.mb (if you have the memory) or lower the sort factor.
If you find that you are spending more time in your mappers than in your reducers, then you may want to increase the number of map tasks and decrease the sort factor (assuming there is memory pressure).
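As a concrete sketch (property names as used by MRv1 / CDH3; the numbers only illustrate the 10:1 rule of thumb above and are not a recommendation), the pair would be set together in mapred-site.xml:

<property>
  <name>io.sort.factor</name>
  <value>25</value>
</property>
<property>
  <name>io.sort.mb</name>
  <value>250</value>
</property>

To make the memory arithmetic concrete: with, say, 8 map task slots per node, the sort buffers alone account for 8 * 250 MB = 2 GB, and io.sort.mb must also fit inside the per-task heap set via mapred.child.java.opts.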

Spreading out data from bursts

I am trying to spread out data that is received in bursts. That is, some other application receives data in large bursts, and for each data entry I need to make some additional requests to a server on which I should limit the traffic. Hence I try to spread the requests out over the time I have until the next data burst arrives.
Currently I am using a token bucket to spread out the data. However, because the data I receive is already badly shaped, I am still either filling up the queue of pending requests or getting spikes whenever a burst comes in. So this algorithm does not seem to do the kind of shaping I need.
What other algorithms are there available to limit the requests? I know I have times of high load and times of low load, so both should be handled well by the application.
I am not sure if I was really able to explain the problem I am currently having. If you need any clarifications, just let me know.
EDIT:
I'll try to clarify the problem some more and explain why a simple rate limiter does not work.
The problem lies in the bursty nature of the traffic and the fact that bursts have different sizes at different times. What is mostly constant is the delay between bursts. So we get a batch of data records for processing and need to spread them out as evenly as possible before the next batch comes in. However, we are not 100% sure when the next batch will arrive, only approximately, so simply dividing the time by the number of records does not work as it should.
A rate limiter does not work because the data does not get spread out enough that way. If we are close to saturating the rate, everything is fine and we spread out evenly (although this should not happen too frequently). If we are below that threshold, though, the spreading gets much worse.
I'll make an example to make this problem more clear:
Let's say we limit our traffic to 10 requests per second and new data comes in about every 10 seconds.
When we get 100 records at the beginning of a time frame, we will query 10 records each second and have a perfectly even spread. However, if we get only 15 records, we'll have one second where we query 10 records, one second where we query 5, and 8 seconds where we query none, so the level of traffic is very unequal over time. It would be better to query just 1.5 records each second. However, setting that rate would also cause problems, since new data might arrive earlier, so we would not have the full 10 seconds and 1.5 queries per second would not be enough. If we use a token bucket, the problem actually gets worse, because token buckets let bursts through at the beginning of the time frame.
However, this example oversimplifies: in reality we cannot know the exact number of pending requests at any given moment, only an upper limit, so we would have to adjust the throttle each time based on that number.
This sounds like a problem within the domain of control theory. Specifically, I'm thinking a PID controller might work.
A first crack at the problem might be dividing the number of records by the estimated time until next batch. This would be like a P controller - proportional only. But then you run the risk of overestimating the time, and building up some unsent records. So try adding in an I term - integral - to account for built up error.
I'm not sure you even need a derivative term, if the variation in batch size is random. So try using a PI loop - you might build up some backlog between bursts, but it will be handled by the I term.
If it's unacceptable to have a backlog, then the solution might be more complicated...
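A minimal sketch of that PI loop (every class, constant and method name here is made up for illustration; the burst interval and the rate ceiling are assumptions about the surrounding system, and the real queue and sender would be wired in by the caller):

import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of PI-based pacing: each tick recomputes a send rate from the current
// backlog (P term, spread over the expected time to the next burst) plus an
// integral term that pushes harder if a backlog keeps persisting across ticks.
public class PiPacer {
    private static final double KP = 1.0;               // proportional gain
    private static final double KI = 0.05;              // integral gain (per second of backlog)
    private static final double BURST_INTERVAL_S = 10;  // assumed rough gap between bursts
    private static final double MAX_RATE = 10.0;        // hard ceiling, requests per second
    private static final double TICK_S = 1.0;           // control-loop period in seconds

    private final Queue<Runnable> pending = new ArrayDeque<>();
    private double integral = 0;   // accumulated backlog-seconds, clamped for anti-windup
    private double credit = 0;     // fractional sends carried over between ticks

    // Called by the receiving side whenever a burst of records arrives.
    public synchronized void submit(Runnable request) {
        pending.add(request);
    }

    // Called once per TICK_S seconds, e.g. from a ScheduledExecutorService.
    public synchronized void tick() {
        int backlog = pending.size();

        // P term: spread the current backlog over the expected time until the next burst.
        double proportional = KP * backlog / BURST_INTERVAL_S;

        // I term: grows while a backlog persists, so leftover records raise the rate over time.
        integral = Math.min(integral + backlog * TICK_S, MAX_RATE / KI);
        double rate = Math.min(proportional + KI * integral, MAX_RATE);

        // Turn the rate into whole sends, carrying fractions forward so that
        // e.g. 1.5 requests/second becomes 1, 2, 1, 2, ...
        credit += rate * TICK_S;
        while (credit >= 1.0 && !pending.isEmpty()) {
            pending.poll().run();   // in real code: hand off to the rate-limited server call
            credit -= 1.0;
        }
        if (pending.isEmpty()) {    // nothing left to drain: reset controller state
            integral = 0;
            credit = 0;
        }
    }
}

Lowering KI spreads the drain more evenly but reacts more slowly to a standing backlog; raising it clears a backlog faster at the cost of burstier output, which is the P-versus-I tradeoff described above.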
If there are no other constraints, what you should do is figure out the maximum rate at which you are comfortable sending the additional requests, and limit your processing speed to that. Then monitor what happens. If that gets through all of your requests quickly, there is no harm. If the sustained level of processing is not fast enough, then you need more capacity.
