I am trying out Sidekiq alongside my Resque system in production. Now I know this isn't quite an apples-to-apples comparison, but my Resque jobs running on a Heroku worker take around 4s to complete, while the same jobs on an Amazon large instance running Sidekiq with only 50 threads take around 18s on average. The jobs make very heavy use of third-party APIs, so I am assuming my bottleneck is just my network connection, but I wanted to see if anyone has suggestions as to how I can better configure Sidekiq.
Sidekiq workers will run in true parallel only if you use JRuby or Rubinius, because MRI Ruby has a global interpreter lock (GIL). On MRI the GIL is released while a thread waits on I/O, so threads can still overlap network calls, but CPU-bound work is serialized.
Sidekiq workers will be faster only if you use JRuby or Rubinius with thread-safe libraries that do not block the resources they use. So the main reason to use Sidekiq instead of Resque is memory savings.
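A quick way to see that distinction on MRI (a minimal sketch; sleep stands in for a blocking network call):

require 'benchmark'

# Four threads blocked on "I/O" finish together in about 1 second on MRI,
# because the GIL is released while a thread sleeps or waits on a socket.
puts Benchmark.realtime {
  4.times.map { Thread.new { sleep 1 } }.each(&:join)
}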
I'm trying to interact with Google's Calendar API. My tests so far show response times of 5-10 seconds to insert a single event, and I may need to export thousands of events at once [don't ask]. This seems likely to spam the heck out of my queues for unreasonable amounts of time. (95% of current jobs in this app finish in <300ms, so this will make it harder to allocate resources appropriately.)
I'm currently using Faraday in this app to call other, faster Google APIs. The Faraday wiki suggests using Typhoeus for parallel HTTP requests; however, using Typhoeus with Sidekiq was deemed "a bad idea" as of 2014.
Is Typhoeus still a bad idea? If so, is it reasonable to spawn N threads in a Sidekiq worker, make an HTTP request within each thread, and then wait for all threads to rejoin? Is there some other way to accomplish this extremely I/O-bound task without throwing more workers at the problem? Should I ask my manager to increase our Sidekiq Enterprise spend? ;) Or should I just throw these jobs in a low-priority queue and tell our users with ridiculous habits that they'll just have to wait?
It's reasonable to use threads within Sidekiq job threads. It's not reasonable to build your own threading infrastructure: you can use a reusable thread pool from the concurrent-ruby or parallel gems, or an HTTP client that is thread-safe and allows concurrent requests, etc. HTTP.rb from Tony Arcieri is a good one, but plain old net/http will work too:
https://github.com/httprb/http/wiki/Thread-Safety
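A minimal sketch of that approach, assuming a hypothetical SyncEventsJob and an illustrative Calendar endpoint; it pairs a concurrent-ruby thread pool with plain net/http:

require 'concurrent'
require 'net/http'
require 'json'

class SyncEventsJob
  include Sidekiq::Worker

  def perform(event_payloads)
    pool = Concurrent::FixedThreadPool.new(10)
    futures = event_payloads.map do |payload|
      Concurrent::Promises.future_on(pool) do
        uri = URI('https://www.googleapis.com/calendar/v3/calendars/primary/events')
        Net::HTTP.post(uri, payload.to_json, 'Content-Type' => 'application/json')
      end
    end
    # Wait for every request; value! re-raises any failure so Sidekiq's
    # normal retry machinery still applies.
    Concurrent::Promises.zip(*futures).value!
  ensure
    pool&.shutdown
  end
end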
Just remember that there are a few complexities: the job might be retried, and you need to decide how to handle errors that the HTTP client raises. If you don't split these requests 1-to-1 with jobs, you might need to track the outcome of each request, or idempotency becomes an issue.
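One way to handle that, sketched under assumptions (the Redis key scheme and the push_to_google_calendar helper are hypothetical), is to record completed work so a retried job skips requests that already succeeded:

def sync_event(event_id)
  # SET with nx: true returns false if the key already exists, i.e. this
  # event was already synced by an earlier attempt of the job.
  fresh = Sidekiq.redis { |r| r.set("synced:#{event_id}", 1, nx: true, ex: 86_400) }
  return unless fresh

  push_to_google_calendar(event_id) # hypothetical helper making the HTTP call
rescue
  Sidekiq.redis { |r| r.del("synced:#{event_id}") } # let the retry redo it
  raise
end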
And you are always welcome to increase your Sidekiq Enterprise thread count. :-D
I'm currently running ActiveJob with DelayedJob as the backend for my background jobs on Heroku, with 10 worker dynos. Daily, I need to run ~2000+ jobs that involve heavy interaction with the Google Sheets API and can take ~60+ seconds each after the worksheet has run its calculations.
As each job could potentially take more than a minute to run, I'm wondering how I can increase the efficiency of these workers. It seems to me that each of these 10 workers can only take on one job at a time. Is it possible for one worker to take on several of my jobs at once? Would switching my background service to Sidekiq or another library allow these workers to take on more jobs?
Any insight would be appreciated, thanks!
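For what it's worth, Sidekiq runs jobs on multiple threads within a single process, so one worker dyno can work many I/O-bound jobs at once, whereas each DelayedJob worker processes one job at a time. A minimal sketch of the relevant config (the queue names are illustrative):

config/sidekiq.yml:

:concurrency: 10
:queues:
  - default
  - sheets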
When using MRI Ruby 2.1.2 with Puma (say 1 worker with 8 threads), when is the GC run? Is it run by the parent worker process when all those threads become idle, or would it be run by the parent process as needed, even when those threads are busy processing requests?
And how would this behaviour differ in Ruby 2.0 (without deferred GC)?
It's been answered on the GitHub issue:
The GC runs whenever the VM decides to run it. Puma does nothing to control that, nor can it, really.
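If you want to observe this empirically, GC.stat exposes counters you can sample from any thread. A small sketch as a Rack middleware (the class name is illustrative):

class GcCounter
  def initialize(app)
    @app = app
  end

  def call(env)
    before = GC.stat[:count]
    response = @app.call(env)
    ran = GC.stat[:count] - before
    # Logs only when a GC cycle ran somewhere in the process while this
    # request was being served; with 8 Puma threads it may have been
    # triggered by any of them.
    warn "GC ran #{ran} time(s) during this request" if ran > 0
    response
  end
end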
Given that Unicorn usually manages more than one Rails server process, and given that a Resque job runner probably consumes fewer resources than a web request, it should be possible to run more than one Resque worker on a single Heroku dyno.
Is anyone doing this successfully so far? My thought is that an easy way to do so would be to have the Procfile run foreman, which then runs two (or more) instances of the actual worker (i.e. rake resque:work).
Or is rake resque:workers up to that task? Resque itself does not recommend using that method, as it starts workers in parallel threads instead of in parallel processes.
Obviously, this makes sense only for I/O-bound jobs.
One can use foreman to start multiple processes. Add foreman to your Gemfile, and then create two files:
Procfile:
worker: bundle exec foreman start -f Procfile.workers
Procfile.workers:
worker_1: QUEUE=* bundle exec rake resque:work
worker_2: QUEUE=* bundle exec rake resque:work
The same technique can be used to run a web server alongside some workers.
NOTE: while many report success with this approach, I would not suggest using it outside of experiments, mostly because of the risk of running into RAM limits on small Heroku instances; and once you are paying for Heroku anyway, it is probably easier to just spin up a dedicated worker machine.
Based on this article, it sounds like it's possible, but the biggest gotcha is that if one of the child processes dies, Heroku won't be able to restart it.
I'm not familiar with the internals of Sidekiq and am wondering if it's okay to launch several Sidekiq instances with the same configuration (processing the same queues).
Is it possible that 2 or more Sidekiq instances will process the same message from a queue?
UPDATE:
I need to know whether there is a possible conflict when running Sidekiq on more than one machine.
Yes, Sidekiq can absolutely run many processes against the same queue. Fetching a job is an atomic Redis pop, so each message is handed to exactly one process.
Nope, I've run Sidekiq on different machines with no issues.
Each Sidekiq process reads from the same Redis server, and Redis is very robust in multi-threaded and distributed scenarios.
In addition, if you look at the Sidekiq web interface, it will show the workers across all machines, because they all register themselves in the same Redis server.
So no, no issues.
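Concretely, running many processes against the same queues just means starting the same command on each machine, pointed at the same Redis (the URL and flag values here are illustrative):

REDIS_URL=redis://redis.internal:6379/0 bundle exec sidekiq -c 25 -q default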