We are facing an issue in our cluster where we use Phoenix to write the data. Our jobs work fine initially, but after a few days (4-5 days) we see a drastic increase in job time (from 4 minutes to 30 minutes) even though the input data size is almost the same. Restarting HBase solves the issue for the next 4-5 days.
We have 70 region servers of 128 GB each, and roughly 50k puts per region server * 70 region servers per job.
From the RS logs I can see the frequency of responseTooSlow warnings increase from 40k/day to 280k/day, but the response time is under 1000 ms in those logs:
2018-04-18 00:00:07,831 WARN [RW.default.writeRpcServer.handler=10,queue=4,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","starttimems":1524009607697,"responsesize":106,"method":"Multi","processingtimems":134,"client":"192.168.25.70:54718","queuetimems":0,"class":"HRegionServer"}
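A minimal sketch of one way to tally those per-day warning counts from the RS logs; the log path is a placeholder and the date prefix follows the sample line above:

    from collections import Counter

    LOG_PATH = "/var/log/hbase/regionserver.log"   # hypothetical RS log location

    counts = Counter()
    with open(LOG_PATH) as f:
        for line in f:
            if "responseTooSlow" in line:
                day = line.split()[0]              # e.g. "2018-04-18"
                counts[day] += 1

    for day, n in sorted(counts.items()):
        print(day, n)                              # responseTooSlow warnings per day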
Related
I am running a Spark Streaming (1.6.1) job on YARN, using the DirectAPI to read events from a Kafka topic with 50 partitions and write them to HDFS. I have a batch interval of 60 seconds. I was receiving around 500K messages per batch, which were processed in under 60 seconds.
Suddenly Spark started receiving 15-20 million messages, which took around 5-6 minutes to process with a batch interval of 60 seconds. I have configured "spark.streaming.concurrentJobs=4".
So when a batch takes a long time to process, Spark starts 4 concurrent jobs to handle the backlog batches, but over time the backlog still grows because the batch interval is too short for such a volume of data.
I have a few doubts about this.
When I receive 15-20 million messages, the time to process them is around 5-6 minutes with a batch interval of 60 seconds. Yet when I check my HDFS directory, I see files created every 60 seconds, each with 50 part files. I am a little confused here: my batch takes 5-6 minutes to process, so how is it writing files to HDFS every 1 minute, when 'saveAsTextFile' is called only once per batch? The total record count across all 50 part files comes to around 3.3 million.
In order to handle the processing of 15-20 million messages, I set my batch interval to 8-10 minutes; now Spark started consuming around 35-40 million messages from Kafka, and again its processing time started exceeding the batch interval.
I have configured 'spark.streaming.kafka.maxRatePerPartition=50' and 'spark.streaming.backpressure.enabled=true'.
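For context, here is a minimal sketch of the kind of setup described above (direct Kafka stream plus those rate/backpressure settings); the topic name, broker list, and output path are placeholders:

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    conf = (SparkConf()
            .setAppName("kafka-to-hdfs")
            .set("spark.streaming.concurrentJobs", "4")
            .set("spark.streaming.kafka.maxRatePerPartition", "50")
            .set("spark.streaming.backpressure.enabled", "true"))
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 60)   # 60-second batch interval

    # Receiver-less (direct) stream: one RDD partition per Kafka partition,
    # which is why each batch directory ends up with 50 part files.
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "broker1:9092"})

    # One output action per batch interval.
    stream.map(lambda kv: kv[1]).saveAsTextFiles("hdfs:///data/events/batch")

    ssc.start()
    ssc.awaitTermination()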
I think one thing that may have confused you is the relationship between how long a job takes and how often one is started.
From what you describe, with the resources available it seems that in the end each job took about 5 minutes to complete. However, your batch frequency is 1 minute.
So as a result, every 1 minute you kick off some batch that takes 5 minutes to complete.
As a result, you should expect HDFS to receive nothing for the first few minutes, and then to keep receiving something every minute (but with a roughly 5-minute 'delay' from when the data went in).
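To make that concrete, a toy illustration assuming a steady 5-minute processing time per batch and enough concurrent jobs for the batches to overlap:

    BATCH_INTERVAL_MIN = 1   # a new batch is scheduled every minute
    PROCESSING_MIN = 5       # but each batch takes about 5 minutes to finish

    for batch in range(8):
        start = batch * BATCH_INTERVAL_MIN
        done = start + PROCESSING_MIN
        print("batch %d: scheduled at t=%d min, lands on HDFS at t=%d min"
              % (batch, start, done))
    # Nothing appears for the first ~5 minutes; after that, one batch's output
    # lands every minute, each delayed ~5 minutes from its scheduling time.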
There are several partitions on the cluster I work on. With sinfo I can see the time limit for each partition. I submitted my job to the mid1 partition, which has a time limit of 8-00:00:00, which I understand to mean 8 days. I had to wait 1-15:23:41 in the queue, i.e. nearly 1 day and 15 hours. However, my code ran for only 00:02:24, roughly 2.5 minutes (and the solution was converging). Also, I did not set a time limit in the file submitted with sbatch. The reason my job was stopped was given as:
JOB 3216125 CANCELLED AT 2015-12-19T04:22:04 DUE TO TIME LIMIT
So why was my job stopped if I did not exceed the time limit? I asked the people responsible for the cluster, but they have not replied.
Look at the value of DefaultTime in the output of scontrol show partitions. This is the maximum time allocated to your job in case you do not specify it yourself with --time (for example by adding a line such as #SBATCH --time=01:00:00 to your submission script).
Most probably this value is set to 2 minutes to force you to specify a sensible time limit (within the limits of the partition).
I have been reading about Apache Storm and have tried a few examples from storm-starter. I have also learnt how to tune a topology and how to scale it to perform fast enough to meet the required throughput.
I have created an example topology with acking enabled, and I am able to achieve 3K-5K messages processed per second. It performs really fast for the first 10 to 15 minutes, or around 1-2 million messages, and then it starts slowing down. On the Storm UI, I can see the overall latency going up gradually and it does not come back down; after a while the throughput drops to only a few hundred messages a second. I am getting exactly the same behavior for all the topologies I have tried. The simplest one just reads from Kafka using KafkaSpout, sends the messages to a transform bolt that parses them, and writes them back to Kafka using KafkaBolt. The parser is very fast, taking less than a millisecond per message. I have tried a few options such as adjusting the parallelism and changing the buffer sizes, but I see the same behavior. Please help me find the reason for the gradual slowdown in the topology. Here is the config I am using:
1 Nimbus machine (4 CPU) 24GB RAM
2 supervisor machines (8 CPUs each), using 1 thread per core, with 24GB RAM
4-node Kafka cluster running on the above 2 supervisor machines (each topic has 4 partitions)
KafkaSpout(2 parallelism)-->TransformerBolt(8)-->KafkaBolt(2)
topology.executor.receive.buffer.size: 65536
topology.executor.send.buffer.size: 65536
topology.spout.max.batch.size: 65536
topology.transfer.buffer.size: 32
topology.receiver.buffer.size: 8
topology.max.spout.pending: 250
[Storm UI screenshots: at the start, after a few minutes, after 45 minutes (latency starts going up), and after 80 minutes (latency keeps climbing and reaches about 100 seconds by the time 8-10 million messages have been processed); plus VisualVM and thread screenshots.]
Pay attention to the capacity metric on RT_LEFT_BOLT: it is very close to 1, which explains why your topology is slowing down.
From the Storm documentation:
The Storm UI has also been made significantly more useful. There are new stats "#executed", "execute latency", and "capacity" tracked for all bolts. The "capacity" metric is very useful and tells you what % of the time in the last 10 minutes the bolt spent executing tuples. If this value is close to 1, then the bolt is "at capacity" and is a bottleneck in your topology. The solution to at-capacity bolts is to increase the parallelism of that bolt.
Therefore, your solution is to add more executors (and tasks) to that bolt (RT_LEFT_BOLT). Another thing you can do is reduce the number of executors on RT_RIGHT_BOLT; its capacity indicates you don't need that many executors, and probably 1 or 2 can do the job.
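As a rough illustration of what that capacity number means, here is a small sketch assuming the usual "executed tuples times execute latency over the measurement window" calculation behind the UI's capacity column:

    WINDOW_MS = 10 * 60 * 1000   # the UI's 10-minute measurement window

    def capacity(executed, execute_latency_ms):
        # Fraction of the window the bolt spent inside execute().
        return (executed * execute_latency_ms) / WINDOW_MS

    # e.g. a bolt that executed 550,000 tuples at ~1.05 ms each is essentially saturated:
    print(capacity(550000, 1.05))   # ~0.96 -> "at capacity", i.e. a bottleneck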
The issue was due to the GC settings for the new generation: the workers were not using the allocated heap completely, so the internal Storm queues were filling up and running out of memory. The strange thing was that Storm did not throw an out-of-memory error; it just stalled. With the help of VisualVM I was able to trace it down.
I have a driver program that runs a set of 5 experiments; basically the driver program just tells the application which dataset to use (there are 5 of them, and they are very similar).
The first iteration takes 3.5 minutes, the second 6 minutes, the third 30 minutes and the fourth has been running for over 30 minutes.
After each run the SparkContext object is stopped and then re-started for the next run. I thought this approach would prevent the slowdown, as I was under the impression that when sc.stop is called the instances are cleared of all their RDD data; that is at least how it works in local mode. The dataset is quite small, and according to the Spark UI only 20 MB of data on 2 nodes is used.
Does sc.stop not remove all data from a node? What would cause such a slowdown?
Call sc.stop after all iterations are complete. Whenever we stop a SparkContext and create a new one, it takes time to load the Spark configuration and jars and to free the driver port before the next job can run.
and
Using --executor-memory you can speed up the process, depending on how much memory you have on each node.
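A minimal sketch of the first suggestion (reuse one SparkContext for all runs and stop it only at the end); the dataset paths and the run_experiment body are placeholders:

    from pyspark import SparkConf, SparkContext

    # hypothetical dataset locations for the 5 experiments
    DATASETS = ["hdfs:///data/exp1", "hdfs:///data/exp2", "hdfs:///data/exp3",
                "hdfs:///data/exp4", "hdfs:///data/exp5"]

    def run_experiment(sc, path):
        rdd = sc.textFile(path)
        return rdd.count()            # stand-in for the real experiment logic

    conf = SparkConf().setAppName("experiments")
    sc = SparkContext(conf=conf)      # create the context once
    try:
        for path in DATASETS:
            print(path, run_experiment(sc, path))
    finally:
        sc.stop()                     # stop it only after all iterations are complete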
Stupidly, I had used T2 instances. Their burstable performance means they only run at full power for a short amount of time. Read the documentation thoroughly - lesson learnt!
I am currently trying to understand why some requests in my Python Heroku app take more than 30 seconds, even simple requests which do absolutely nothing.
One of the things I've done is look into the load average on my dynos. I did three things:
1) Looked at the Heroku logs. Once in a while, they print the load. Here are examples:
Mar 16 11:44:50 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 11.900
Mar 16 11:45:11 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 8.386
Mar 16 11:45:32 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 6.798
Mar 16 11:45:53 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 8.031
2) Run "heroku run uptime" several times, each time hitting a different machine (verified by running "hostname"). Here is sample output from just now:
13:22:09 up 3 days, 13:57, 0 users, load average: 15.33, 20.55, 22.51
3) Measured the load average on the machines my dynos live on, using psutil to send metrics to Graphite. The graphs confirm numbers anywhere between 5 and 20.
I am not sure whether this explains simple requests taking very long or not, but can anyone say why the load average numbers on Heroku are so high?
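A minimal sketch of the kind of collection described in step 3; the Graphite endpoint and metric name are placeholders, and os.getloadavg stands in for the psutil call:

    import os
    import socket
    import time

    GRAPHITE = ("graphite.example.com", 2003)   # hypothetical Graphite plaintext endpoint
    METRIC = "heroku.web.loadavg_1m"            # hypothetical metric name

    while True:
        load_1m, _, _ = os.getloadavg()         # 1-minute load average of the host
        line = "%s %f %d\n" % (METRIC, load_1m, int(time.time()))
        with socket.create_connection(GRAPHITE) as sock:
            sock.sendall(line.encode("ascii"))  # Graphite plaintext protocol: "name value timestamp"
        time.sleep(60)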
Heroku sub-virtualizes hosts into the guest 'dynos' you are using via LXC. When you run 'uptime' you are seeing the whole host's uptime, NOT your container's, and as pointed out by @jon-mountjoy, you are getting a new LXC container rather than one of your running dynos when you do this.
https://devcenter.heroku.com/articles/dynos#isolation-and-security
Heroku’s dyno load calculation also differs from the traditional UNIX/LINUX load calculation.
The Heroku load average reflects the number of CPU tasks that are in the ready queue (i.e. waiting to be processed). The dyno manager samples the count of runnable tasks for each dyno roughly every 20 seconds, and an exponentially damped moving average is computed over the counts from the previous 30 minutes, where period is the 1-, 5-, or 15-minute window (in seconds), count_of_runnable_tasks is the number of tasks in the queue at a given point in time, and avg is the exponential load average calculated in the previous iteration.
https://devcenter.heroku.com/articles/log-runtime-metrics#understanding-load-averages
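A sketch of the shape of that calculation; the exact constants are Heroku's, this only illustrates an exponentially damped moving average over run-queue samples taken roughly every 20 seconds:

    import math

    SAMPLE_INTERVAL_S = 20   # approximate seconds between run-queue samples

    def update_load_avg(prev_avg, count_of_runnable_tasks, period_s):
        # One step of an exponentially damped moving average over the run-queue length.
        decay = math.exp(-SAMPLE_INTERVAL_S / period_s)
        return prev_avg * decay + count_of_runnable_tasks * (1 - decay)

    # e.g. the 1-minute (period = 60 s) average after a burst of runnable tasks:
    avg = 0.0
    for count in [12, 12, 12, 8, 6, 8]:
        avg = update_load_avg(avg, count, 60.0)
        print(round(avg, 3))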
The difference between Heroku's load average and Linux is that Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system.
On CPU-bound dynos I would presume this doesn't make much difference. On an I/O-bound dyno, the load averages reported by Heroku would be much lower than what you would get if you could run a true uptime inside the LXC container.
You can also have your running dynos emit periodic load messages by enabling log-runtime-metrics (heroku labs:enable log-runtime-metrics).
Perhaps it's expected dyno idling?
PS. I suspect there's no point running heroku run uptime - that will run it in a new one-off dyno every time.