I have written a program that loops through 1700 rows and 2 columns of data. It takes forever to run. Does anyone have a solution to this problem?
Related
I read 8 billion lines from GCS, do processing on each line, then output. My processing step can take a while, and to avoid worker leases expiring and hitting the error below, I do a GroupByKey on the 8 billion lines, grouping by id, to prevent fusion.
A work item was attempted 4 times without success. Each time the
worker eventually lost contact with the service. The work item was
attempted on:
The problem is that the GroupByKey step takes forever to complete for 8 billion lines, even on 1000 high-mem-2 nodes.
I looked into one possible cause of slow processing: the values generated per key by GroupByKey being too large. I don't think that can be the case here, because out of 8 billion inputs, a single id cannot appear more than 30 times. So the problem is clearly not hot keys; something else is going on.
Any ideas on how to optimize this are appreciated. Thanks.
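For context, a minimal sketch of the pattern I'm describing, using the Beam Python SDK (the bucket paths, the extract_id key function, and the processing step are placeholders, not my actual code):

    import apache_beam as beam

    def extract_id(line):
        # Placeholder: derive the grouping id from the line.
        return line.split(',')[0], line

    with beam.Pipeline() as p:
        (p
         | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input/*')   # placeholder path
         | 'KeyById' >> beam.Map(extract_id)
         # The GroupByKey forces a shuffle, which breaks fusion between
         # the read and the expensive per-line processing.
         | 'BreakFusion' >> beam.GroupByKey()
         | 'Ungroup' >> beam.FlatMap(lambda kv: kv[1])
         | 'Process' >> beam.Map(lambda line: line)                    # placeholder processing
         | 'Write' >> beam.io.WriteToText('gs://my-bucket/output'))    # placeholder path

(beam.Reshuffle() also exists for exactly this purpose, if grouping by id by hand turns out not to matter.)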
I did manage to solve this problem. There were a number of incorrect assumptions on my part about Dataflow wall times. I was looking at my pipeline and assumed that the step with the highest wall time, which was measured in days, was the bottleneck. But in Apache Beam a step is usually fused together with the steps downstream of it in the pipeline, and it will only run as fast as those downstream steps run. So a large wall time alone is not enough to conclude that a step is the bottleneck of the pipeline. The real solution to the problem stated above came from this thread: I reduced the number of nodes my pipeline runs on and changed the node type from high-mem-2 to high-mem-4. I wish there were an easy way to get memory-usage metrics for a Dataflow pipeline; I had to ssh into the VMs and run jmap.
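For anyone hitting the same issue, the node count and machine type are just standard pipeline options; a sketch with the Python SDK (project, region, bucket, and the worker count are made-up example values, not the ones I actually used):

    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=DataflowRunner',
        '--project=my-project',                 # example value
        '--region=us-central1',                 # example value
        '--temp_location=gs://my-bucket/tmp',   # example value
        '--num_workers=250',                    # fewer nodes (example value)
        '--worker_machine_type=n1-highmem-4',   # bigger nodes: high-mem-4 instead of high-mem-2
    ])
    # The options are then passed to the pipeline: beam.Pipeline(options=options)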
There are several partitions on the cluster I work on. With sinfo I can see the time limit for each partition. I submitted my job to the mid1 partition, which has a time limit of 8-00:00:00, which I understand to mean 8 days. The job waited in the queue for 1-15:23:41, i.e. nearly 1 day and 15 hours. However, my code ran for only 00:02:24, i.e. roughly 2.5 minutes (and the solution was converging). Also, I did not set a time limit in the file submitted with sbatch. The reason my job was stopped was given as:
JOB 3216125 CANCELLED AT 2015-12-19T04:22:04 DUE TO TIME LIMIT
So why was my job stopped if I did not exceed the time limit? I asked the people responsible for the cluster, but they never got back to me.
Look at the value of DefaultTime in the output of scontrol show partitions. This is the maximum time allocated to your job in case you do not specify one yourself with --time.
Most probably this value is set to 2 minutes to force you to specify a sensible time limit (within the limits of the partition).
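So the practical fix is to always request a time explicitly in the submission script; a minimal example (the time, job name, and executable are placeholders):

    #!/bin/bash
    #SBATCH --partition=mid1
    #SBATCH --time=02:00:00        # explicit limit; otherwise DefaultTime applies
    #SBATCH --job-name=my_solver   # placeholder name

    srun ./my_solver               # placeholder executable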
I have a problem with my load test in JMeter. I set:
Number of Threads = 10
Ramp-Up-Period = 1
Loop Count = 1
Normally the test should go up to 10 threads (10/10) (you can see this in the upper right corner) and then go back to zero. My problem is that my JMeter test only goes up to 4/10 and then back to zero. I have no idea why it doesn't behave normally.
I tried this with other numbers of threads, e.g. 20, and the problem occurs there as well: the test goes up to 7/20 and then back to zero, but it should go up to 20/20.
Your test is set up with Loop Count = 1 and your threads start up gradually. Do you think your first thread will still be active by the time the last thread starts up?
For your particular test, there are apparently at most 4 threads active at any point during the test execution. If each iteration runs for a very short time and the first thread exits before the second thread starts, you won't even get that; the maximum you'll see is 1/10.
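To put rough numbers on it (assuming the thread starts are spread evenly over the ramp-up): with 10 threads and a 1-second ramp-up, a new thread starts every 0.1 s. If a single iteration takes around 0.3-0.4 s, each thread finishes shortly after the next few have started, so no more than about 4 are ever alive at once, which would match the 4/10 you observed.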
If you need all threads to be active at the same time, do one of the following:
Increase the length of your transaction beyond the ramp-up time
Have a larger loop count so the threads stay active longer
Reduce or eliminate your ramp-up time so all threads become active simultaneously (though this is usually a bad idea)
I've noticed that when load testing with JMeter, if I do a single loop I get a fairly long average time for my test. If I have, say, a Loop Count of 10, my average time peaks early on and then drops way down. For example, if I set up a test with a simple GET request for a page with the following settings:
Number of Threads (users): 500
Ramp-up Period(in seconds): 5
Loop Count: 1
My average time is about 4 seconds. If I change it to 10 loops:
Number of Threads (users): 500
Ramp-up Period(in seconds): 5
Loop Count: 10
I get an average time of 1.4 seconds.
Apache's documentation states that the Loop Count is:
The number of times the subelements of this controller will be
iterated each time through a test run.
Is it possible that this means the first request will actually do something on the server and the subsequent 9 requests will be pulling from cache?
How exactly is the Loop Count being used that would cause the results I'm seeing?
Yes, the remaining 9 requests must be served from cache.
The Loop Controller is a simple loop executor; there is nothing magic inside it.
The improved performance comes from the server serving cached results.
One thing you can try: keep the Loop Controller but substitute different parameters so that a different request is sent to the server each time (I know the Loop Controller is meant for repeating the same request, but this is just to confirm the effect of caching), then compare the results.
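One possible way to set that up (the file name, variable name, and path here are made up for illustration): add a CSV Data Set Config to the thread group that reads a variable such as itemId from a file, then reference it in the HTTP Request path so each loop iteration asks for a different resource:

    ids.csv (example contents):
        1001
        1002
        1003

    HTTP Request path:
        /page?item=${itemId}

If the looped average now stays close to the single-loop average, the earlier speed-up was coming from server-side caching.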
I hope this clears up the doubt :)
I need a little help on how to debug this. My current JMeter scenario seems to run fine as long as I keep the loop count at 1; when I add more loops, the performance starts to degrade a lot.
I have a thread group with 225 threads, a 110 s ramp-up, and loop count 1; my total response time is ca. 8-9 s. I ran this several times to confirm, and each run shows similar response times.
Now I ran the same test, just changed the loop count to 3 with all other parameters unchanged, and the performance went south: total response time is ca. 30-40 s.
I was under the impression that three runs with 1 loop would be, more or less, equivalent to one run with 3 loops. It seems that is not the case. Could anyone explain to me why that is?
Or, if these should be equivalent, any idea where to look for the culprit of the degrading performance?
What you're saying is that the response times degrade if you increase the throughput (as in requests per second).
Based on 225 threads making a single request each with a ramp-up of 110 seconds, your throughput is going to be in the region of 2 requests per second. Increasing the loop count to 3 is going to raise that by roughly a factor of 3, to about 6 requests per second (assuming no timers). Except, of course, that if the response times are increasing, you will not reach this level of throughput, which is your problem.
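As a rough back-of-the-envelope calculation (assuming no timers and that each iteration finishes before many more threads have started):

    offered load ≈ (threads × loop count) / ramp-up
    loop count 1: 225 × 1 / 110 ≈ 2 requests/second
    loop count 3: 225 × 3 / 110 ≈ 6 requests/second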
Given that this request is already taking 8-9 seconds, which is not especially fast, it can be assumed that there is some heavy processing going on behind the scenes and that you have simply hit a bottleneck somewhere...
Try using fewer threads and a longer ramp-up, and then monitor the response times and the throughput rate. At some point, as the load increases, you will see response times start to degrade, and at that point you need to roll up your sleeves and have a look at what is happening in your AUT.
Note: 3 x 1 loop is not the same as 1 x 3 loops. The delay between iterations will cause one thread with multiple iterations to have a different throughput than more threads with one iteration each, where the throughput is decided by the ramp-up, not the delay. That said, this is not what you describe in your question; you mention that the number of threads is consistent.
In addition to the answer from Oliver: try using a custom listener such as the Active Threads Over Time listener to monitor your load scenario.
You can also rerun both the scenarios described above with this listener; you will certainly see the difference in the graphs.