Celery vs RQ benchmarking - performance

Because of trouble with reliability and scheduling in Celery, we decided to evaluate alternatives. I have been struggling to set up a benchmark that compares the base performance of the two message queue solutions.
My current approach is to place 1000 tasks (each fetching nvie.com and counting the words on the page) on each of the two queues and measure how quickly 4 Celery workers (20 sec) vs 4 RQ workers (70 sec) clear them. My code is at https://github.com/swartchris8/celery-vs-rq-benchmark. I am running Celery from the command line and RQ through supervisor on a Mac; the instructions for running RQ on Ubuntu are clear from the Vagrantfile.
Celery is performing so much better, and I am not sure whether my test setup, which measures how fast the queues clear, is a flawed way of measuring task throughput. I am also using default RQ workers, which I suspect could be much slower.
Is my approach the correct way to benchmark the two message queue systems with regard to throughput? What kind of approaches have you taken? Is Celery really that much faster than RQ?
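One way to separate queue overhead from network noise is to time a task that does no I/O at all and report tasks completed per wall-clock second. The sketch below only illustrates that measurement. It uses the standard library's ThreadPoolExecutor as a stand-in for the queue, so `noop_task` and `measure_throughput` are hypothetical names and nothing here is the Celery or RQ API. In a real run the `submit` callable would enqueue onto the broker, and the wait would poll for job completion.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def noop_task(n):
    """Stand-in task with no network I/O, so only queueing overhead is timed."""
    return n * 2


def measure_throughput(submit, num_tasks):
    """Enqueue num_tasks tasks via submit() and return (elapsed_seconds, tasks_per_second)."""
    start = time.perf_counter()
    futures = [submit(noop_task, i) for i in range(num_tasks)]
    wait(futures)  # block until every task has finished
    elapsed = time.perf_counter() - start
    return elapsed, num_tasks / elapsed


if __name__ == "__main__":
    # Assumption: a 4-thread pool plays the role of the 4 workers in the question.
    with ThreadPoolExecutor(max_workers=4) as pool:
        elapsed, rate = measure_throughput(pool.submit, 1000)
    print(f"{elapsed:.3f} s, {rate:.0f} tasks/s")
```

Running the same measurement once with the no-op task and once with the real page fetch would show how much of the 20 sec vs 70 sec gap comes from the queue itself rather than from the network.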


Puma clustering benefits for site which handles lots of uploads/downloads

I'm trying to understand the benefits of using Puma clustering. The GitHub page says that the number of Puma workers should be set to the number of CPU cores, and the default number of threads for each is 0-16. The worker processes can run in parallel while the threads run concurrently. It was my understanding that the MRI GIL only allows one thread across all cores to run Ruby code, so how does Puma enable things to run in parallel, or provide benefits over running one worker process with double the number of threads? The site I'm working on is heavily IO bound, handling several uploads and downloads at the same time. Any config suggestions for this setup are also welcome.
The workers in clustered mode will actually spawn new child processes, each of which has its own "GIL". Only one thread in a single process can be running Ruby code at one time, so having a process per CPU core works well because each core can only be doing one thing at a time. It also makes sense to run multiple threads per process, because if a thread is waiting for IO it will allow another thread to execute.
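The point about threads overlapping while they wait on IO can be illustrated with a small sketch. It is written in Python rather than Ruby, on the assumption that CPython's global interpreter lock behaves like MRI's for this purpose: a sleeping or IO-blocked thread releases the lock. The function names are hypothetical, and the sleep merely simulates an upload or download.

```python
import threading
import time


def fake_io(seconds):
    """Simulate a blocking upload/download; sleeping releases the interpreter lock."""
    time.sleep(seconds)


def run_in_threads(num_threads, seconds):
    """Run num_threads simulated I/O waits concurrently and return the wall-clock time."""
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_in_threads(4, 0.2)
    print(f"4 threads x 0.2 s of waiting took {elapsed:.2f} s")
```

The four waits finish in roughly the time of one, not four, which is why extra threads help an IO-bound site even inside a single process. CPU-bound work would not overlap this way within one process, and that is where the additional worker processes come in.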

Parallelization with highly unreliable workers?

Suppose I've got a pool of workers (computers), around 1000 of them, but they're highly unreliable. I expect each to go down multiple times a day, sometimes for extended periods of time. Fyi these are volunteer computers running BOINC (not my botnet, I swear!)
Are there any tools that facilitate using them for parallel computations (mostly trivially parallelizable)? I'm thinking of something like IPython parallel, where maybe when a node dies the calculation is restarted elsewhere, and maybe when a new node joins it's brought up to speed with the current working environment.
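To make the "restart elsewhere when a node dies" behaviour concrete, here is a minimal simulation of it. It is not BOINC or IPython parallel code. The worker failures are simulated with a seeded random number generator, squaring stands in for the real computation, and `run_with_retries` is a hypothetical name.

```python
import random
from collections import deque


def run_with_retries(tasks, num_workers, failure_rate, seed=0):
    """Process every task, re-queueing any whose worker dies mid-computation.

    Returns (results, attempts), where attempts counts every assignment made.
    failure_rate must be below 1.0, otherwise no task could ever finish.
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    pending = deque(tasks)
    results = {}
    attempts = 0
    while pending:
        task = pending.popleft()
        worker = rng.randrange(num_workers)  # any worker; all are equally flaky here
        attempts += 1
        if rng.random() < failure_rate:
            pending.append(task)  # worker went down: the task goes back on the queue
            continue
        results[task] = (worker, task * task)  # stand-in computation: squaring
    return results, attempts


if __name__ == "__main__":
    results, attempts = run_with_retries(range(100), num_workers=1000, failure_rate=0.5)
    print(f"{len(results)} tasks finished after {attempts} assignments")
```

This only works cleanly because the tasks are independent and safe to run more than once, which matches the trivially parallelizable case described above.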

Finish sidekiq queues much quicker

I have reached a point where a queue takes too long to finish because new jobs keep being added to it.
What are the best options to overcome this problem?
I already use 50 processors, but I noticed that if I open more, it will take longer for jobs to finish.
My setup:
ruby-on-rails 4,
You need to measure where you are constrained by resources.
If you're seeing things slow down as you add more workers, you're likely blocked by your database server. Have you upgraded your Redis server to handle this amount of load? Where are you storing the scraped data? Can that system handle the increased write load?
If you were blocked on CPU or I/O, you should see the amount of work through the system scale linearly as you add more workers. Since you're seeing things slow down when you scale out, you should measure where your problem is. I'd recommend instrumenting your worker processes with New Relic and measuring where the time is being spent.
My guess would be that your Redis instance can't handle the load to manage the work queue with 50 worker processes.
Based on your comment, it sounds like you're entirely I/O bound doing web scraping. In that case, you should increase the concurrency of each Sidekiq process using the -c option to spawn more threads. Having more threads will allow you to continue processing scraping jobs even when other scrapers are blocked on network I/O.
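The "scale linearly" check above can be done with a few throughput measurements taken at different worker counts. The sketch below is plain arithmetic, not Sidekiq code. The sample numbers are hypothetical and not taken from the question, and the function names are made up for illustration.

```python
def scaling_report(measurements):
    """Given {worker_count: jobs_per_second}, return rows of
    (workers, jobs_per_second, efficiency), where efficiency is throughput
    per worker relative to the smallest measured worker count."""
    counts = sorted(measurements)
    base = counts[0]
    base_per_worker = measurements[base] / base
    rows = []
    for n in counts:
        per_worker = measurements[n] / n
        rows.append((n, measurements[n], per_worker / base_per_worker))
    return rows


def first_regression(measurements):
    """Return the first worker count at which total throughput drops, or None."""
    counts = sorted(measurements)
    for prev, cur in zip(counts, counts[1:]):
        if measurements[cur] < measurements[prev]:
            return cur
    return None


if __name__ == "__main__":
    # Hypothetical measurements for illustration only.
    sample = {10: 100.0, 25: 240.0, 50: 300.0, 75: 270.0}
    for workers, rate, eff in scaling_report(sample):
        print(f"{workers:3d} workers: {rate:6.1f} jobs/s, efficiency {eff:.2f}")
    print("throughput first drops at:", first_regression(sample))
```

An efficiency that stays near 1.0 suggests the workers themselves are the limit. An efficiency that falls, or a total that drops outright, points at a shared resource such as Redis or the database.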

Why do Pentaho jobs run through Kitchen take a lot of CPU resources?

Could you please give a short explanation of what happens when kitchen.bat calls a job?
I can only guess that it instantiates the job, and that this is probably why CPU usage in Task Manager spikes whenever I call 5 jobs at the same time. After a couple of seconds, the spikes wind down.
Or maybe not? Are there other reasons why calling jobs through Kitchen would use a lot of resources?
Are there ways to save CPU resources while still taking advantage of parallelism (calling the jobs all at the same time)? Are there optimizations that can be done?
How exactly are you calling 5 jobs at the same time? From a shell script? In that case the spike is because you're starting 5 JVMs at the same time, and starting a JVM is relatively expensive. There should be no need to do this: you can run it all in one JVM and do the parallelisation inside the job.
Kitchen itself doesn't specifically use a lot of resources. If your transformation has a large number of steps, then getting that going can take some time, but not ages.
Is this really a problem? Why does it matter if your cpu spikes for a couple of seconds? The point of parallelism is generally to max out the CPU/box/resource!

Optimal number of Resque workers for maximum performance

I am using Resque to get cheap parallelism in my academic research: I split huge tasks into relatively small independent portions and submit them to Resque. These tasks do some heavy work, making extensive use of both the database (MongoDB, if that's important) and the CPU.
All this works extremely slowly: for my relatively small portion of the dataset, 1000 jobs get created, and 14 hours of constant work by 2 workers is enough to finish only ~800 of them. As you might have already suspected, this speed is more than frustrating.
I have a quad-core processor (Core i5 something, not high-end), and apart from the Mongo instance and the Resque workers nothing gets scheduled on the CPU for any considerable period of time.
Now that you know my story, all I am asking is: how do I squeeze the maximum out of this setup? I believe that 3 workers + 1 Mongo instance will quickly fill up all the cores, but at the same time Mongo doesn't have to work all the time.
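For reference, the figures above (2 workers, 14 hours, ~800 jobs done) work out as follows. This is a back-of-the-envelope sketch that assumes throughput scales linearly with the number of workers, which is an optimistic upper bound since MongoDB and the CPU are shared. The function names are hypothetical.

```python
def seconds_per_job(workers, hours, jobs_done):
    """Average worker-seconds spent per finished job."""
    return workers * hours * 3600 / jobs_done


def hours_for(jobs, workers, sec_per_job):
    """Wall-clock hours for `jobs` jobs, assuming ideal linear scaling."""
    return jobs * sec_per_job / workers / 3600


if __name__ == "__main__":
    cost = seconds_per_job(workers=2, hours=14, jobs_done=800)
    print(f"{cost:.0f} worker-seconds per job")  # 126 worker-seconds per job
    for n in (2, 3, 4):
        print(f"{n} workers: {hours_for(1000, n, cost):.1f} h for 1000 jobs")
```

At about 126 worker-seconds per job, even perfect scaling to 4 workers would still need close to 9 hours for 1000 jobs, so the per-job cost itself is worth profiling alongside the worker count.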
