I am using Resque to get cheap parallelism in my academic research: I split huge tasks into relatively small, independent portions and submit them to Resque. These tasks do heavy work, making extensive use of both the database (MongoDB, if that matters) and the CPU.
All of this runs extremely slowly: a relatively small portion of my dataset produces 1000 jobs, and 14 hours of two workers running constantly is only enough to finish ~800 of them. As you might already suspect, this speed is more than frustrating.
I have a quad-core processor (a Core i5 of some sort, not high-end), and apart from the Mongo instance and the Resque workers, nothing else occupies the CPU for any considerable stretch of time.
Now that you know my story, all I'm asking is: how do I squeeze the maximum out of this setup? I believe 3 workers + 1 Mongo instance would quickly fill all the cores, but then again Mongo doesn't have to work all the time...
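One way to reason about the worker count: if each job spends part of its time waiting on Mongo, the CPU sits idle during that wait, so more workers than cores can pay off. A back-of-the-envelope sketch, where the per-job timings are made-up placeholders you would replace with measurements from a few real jobs:

    # Rough sizing estimate: how many workers keep the cores busy if each job
    # alternates between CPU work and waiting on the database?
    cores = 4
    cpu_seconds_per_job = 30.0    # CPU time a single job actually burns (assumed)
    wall_seconds_per_job = 50.0   # elapsed time per job, including Mongo waits (assumed)

    cpu_fraction = cpu_seconds_per_job / wall_seconds_per_job
    workers_to_saturate = cores / cpu_fraction

    print(f"Each job is ~{cpu_fraction:.0%} CPU-bound")
    print(f"~{workers_to_saturate:.1f} workers should keep {cores} cores busy")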
Related
I have a simple AWS Glue job that takes more than 25 minutes. I changed the number of DPUs from 10 to 100 (the maximum allowed), and the job still takes 13 minutes.
Any other suggestions on improving the performance?
I've noticed the same behavior.
My understanding is that the job time includes spinning up an EMR cluster, which takes several minutes. So if that takes, say, 8 minutes (just a guess), then your actual job time went from 17 minutes down to 5.
Unless CPU or memory was a bottleneck for your existing job, adding more DPUs (i.e. more CPU and memory) won't benefit the job significantly. At the very least, the benefit won't be linear: 10 times as many DPUs doesn't mean the job will run 10 times faster.
I suggest gradually increasing the number of DPUs and watching the performance gains; you will notice that beyond a certain point adding more DPUs no longer has a major impact, and that point is probably the right number of DPUs for your job.
Can we take a look at your job? Sometimes what looks simple isn't performant. We've found that seemingly simple things like the DynamicFrame.map transformation are really slow, and you might be better off registering a temp table and mapping your data through the SQLContext.
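For what it's worth, a minimal sketch of that approach, assuming a Glue script that already has a glueContext and a DynamicFrame dyf; the temp table name, the columns, and the SQL transformation itself are made-up placeholders:

    from awsglue.dynamicframe import DynamicFrame

    # Drop down from the DynamicFrame to a plain Spark DataFrame and register it
    # as a temp table, so the mapping can be written in SQL instead of DynamicFrame.map.
    spark = glueContext.spark_session
    df = dyf.toDF()
    df.createOrReplaceTempView("tmp_events")  # placeholder table name

    mapped = spark.sql("""
        SELECT id,
               upper(name)  AS name,          -- stands in for whatever the map function did
               amount * 100 AS amount_cents
        FROM tmp_events
    """)

    # Wrap the result back into a DynamicFrame for the rest of the Glue job.
    mapped_dyf = DynamicFrame.fromDF(mapped, glueContext, "mapped_dyf")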
Due to trouble with reliability and scheduling in Celery, we decided to evaluate alternatives. I have been struggling to set up a benchmark between the two message queue solutions with regard to baseline performance.
My current approach is to place 1000 tasks (each fetching nvie.com and counting the words on the page) on the two queues and measuring how fast they are drained by 4 Celery workers (20 s) versus 4 RQ workers (70 s). My code is at https://github.com/swartchris8/celery-vs-rq-benchmark. I am running Celery from the command line and RQ through supervisor on a Mac; the Ubuntu run instructions for RQ are clear from the Vagrant file.
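For reference, a minimal sketch of what the task and the enqueueing could look like in both frameworks; the broker URL, function names, and module layout are illustrative assumptions, not taken from the linked repo:

    import requests
    from celery import Celery
    from redis import Redis
    from rq import Queue

    # Celery app; the task must live in a module the Celery workers can import.
    app = Celery("bench", broker="redis://localhost:6379/0")  # placeholder broker URL

    def count_words(url="http://nvie.com"):
        """Fetch the page and count whitespace-separated words."""
        return len(requests.get(url).text.split())

    @app.task
    def count_words_celery(url="http://nvie.com"):
        return count_words(url)

    # RQ queue; plain functions can be enqueued as long as the workers can import them.
    rq_queue = Queue(connection=Redis())

    def enqueue_benchmark(n=1000):
        """Put n word-count tasks on each queue, then time how long each takes to drain."""
        for _ in range(n):
            count_words_celery.delay()     # Celery
            rq_queue.enqueue(count_words)  # RQ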
Celery is performing so much better, and I am not sure whether my test setup of measuring how fast the queues clear is a flawed way of measuring task throughput. I am also using the default RQ workers, which I suspect could be much slower.
Is my approach the correct way to benchmark the two message queue systems with regard to throughput? What approaches have you taken? Is Celery really that much faster than RQ?
Suppose I've got a pool of workers (computers), around 1000 of them, but they're highly unreliable. I expect each to go down multiple times a day, sometimes for extended periods of time. FYI, these are volunteer computers running BOINC (not my botnet, I swear!).
Are there any tools that facilitate using them for parallel computations (mostly trivially parallelizable)? I'm thinking of something like IPython parallel, where, when a node dies, its calculation is restarted elsewhere, and, when a new node joins, it is brought up to speed with the current working environment.
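As a rough illustration of the resubmission part, ipyparallel's load-balanced view can retry tasks whose engine died; this assumes a controller is already running with the unreliable machines attached as engines, and the crunch function and retry count are placeholders:

    import ipyparallel as ipp

    # Connect to a running controller; the engines are the (unreliable) worker nodes.
    rc = ipp.Client()
    view = rc.load_balanced_view()

    # If an engine dies mid-task, the scheduler may resubmit the task to another
    # engine, up to this many times.
    view.retries = 5

    def crunch(chunk):
        # placeholder for the real, trivially parallel unit of work
        return sum(x * x for x in chunk)

    chunks = [range(i, i + 1000) for i in range(0, 100_000, 1000)]
    async_result = view.map_async(crunch, chunks)
    results = async_result.get()  # blocks until every chunk has finished somewhere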
Could you please give a short explanation of what happens when kitchen.bat calls a job?
I can only guess that it instantiates the job, and that this is probably why my Task Manager spikes whenever I call 5 jobs at the same time. After a couple of seconds, the spikes wind down.
Or maybe not? Could there be other reasons why calling jobs through Kitchen uses a lot of resources?
Are there ways to save CPU resources while still taking advantage of parallelism (calling the jobs all at the same time)? Are there optimizations that can be done?
How exactly are you calling 5 jobs at the same time? In a shell script? In that case the spike is because you're starting 5 JVMs at the same time, and starting a JVM is relatively expensive. There should be no need to do this: you can do it all in one JVM and do the parallelisation within the job.
Kitchen itself doesn't specifically use a lot of resources. If your transformation has a large number of steps, then getting that going can take some time, but not ages.
Is this really a problem? Why does it matter if your CPU spikes for a couple of seconds? The point of parallelism is generally to max out the CPU/box/resource!
It seems that for the cost of running 2 large instances, I can run about 40 micro instances. In a distributed system (MongoDB in my case), 40 micro instances sound a lot faster than 2 large instances, assuming the database files are on EBS in both cases.
Is this true?
Micro instances may have 97% CPU "steal" time, and they can be unresponsive for several seconds.
In many use cases it's not acceptable to have to wait 15 seconds for a reply. I think small instances are the best deal. I run several of them and divide both the load and the risk of problems among them.
source: personal experience and this article