So I have:

multitask :do_something => [
  :task1,
  :task2,
  :task3,
  :task4,
  :task5,
  :task6
]

And each task runs a script. I want task1 through task6 to run concurrently, 1000 times, without stopping. Is there a way to achieve this?
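One approach is to wrap the multitask in a loop and re-enable it between invocations, since Rake runs an already-invoked task only once per session. Here is a runnable sketch of that idea; the task bodies below are stand-ins for your actual scripts:

```ruby
require 'rake'
include Rake::DSL

RUNS = Hash.new(0)

# Stand-ins for the six script-running tasks; replace with your real work.
%i[task1 task2 task3 task4 task5 task6].each do |name|
  task(name) { RUNS[name] += 1 }
end

multitask :do_something => %i[task1 task2 task3 task4 task5 task6]

# Rake caches task invocations, so each pass must re-enable the
# multitask and all of its prerequisites before invoking again.
1000.times do
  Rake::Task[:do_something].invoke
  Rake::Task[:do_something].reenable
  Rake::Task[:do_something].prerequisite_tasks.each(&:reenable)
end
```

Note that each `invoke` blocks until all six prerequisites have finished, so the six tasks run concurrently within a pass, but the 1000 passes run one after another.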
I'm working on a production app that has multiple Rails servers behind an nginx load balancer. We are monitoring Sidekiq processes with monit, and it works just fine: when a Sidekiq process dies, monit starts it right back up.
However, we recently encountered a situation where one of these processes was running and visible to monit, but for some reason not visible to Sidekiq. That resulted in many failed jobs, and it took us some time to notice that we were missing one process in the Sidekiq Web UI, since monit was telling us everything was fine and all processes were running. A simple restart fixed the problem.
And that brings me to my question: how do you monitor your Sidekiq processes? I know I can use something like Rollbar to notify me when jobs fail, but I'd like to know if there is a way to monitor the process count and preferably send mail when one dies. Any suggestions?
Something that would ping sidekiq/stats and verify response.
My super simple solution to a similar problem looks like this:
# sidekiq_check.rb
namespace :sidekiq_check do
  task rerun: :environment do
    if Sidekiq::ProcessSet.new.size == 0
      exec 'bundle exec sidekiq -d -L log/sidekiq.log -C config/sidekiq.yml -e production'
    end
  end
end
and then using cron/whenever
# schedule.rb
every 5.minutes do
  rake 'sidekiq_check:rerun'
end
We ran into this problem when our Sidekiq processes had stopped working off jobs overnight and we had no idea. It took us about 30 minutes to integrate http://deadmanssnitch.com by following these instructions.
It's not the prettiest or cheapest option, but it gets the job done (it integrates nicely with PagerDuty) and has saved our butt twice in the last few months.
One of our complaints with the service is that the shortest grace interval we can set is 15 minutes, which is too long for us. So we're evaluating similar services like Healthchecks, etc.
My approach is the following:
create a background job that does something
call the job regularly
check that the thing is being done!
so, using a cron script (or something like whenever), every 5 minutes I run:
CheckinJob.perform_later
It's now up to sidekiq (or delayed_job, or whatever active job you're using) to actually run the job.
The job just has to do something which you can check.
I used to get the job to update a record in my Status table (essentially a list of key/value records). Then I'd have a /status page which returns a 500 status code if the record hasn't been updated in the last 6 minutes.
(obviously your timing may vary)
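The staleness check behind such a /status page can be sketched in a few lines; the method name here is illustrative, and the 6-minute window comes from the answer above:

```ruby
require 'time'

STALE_AFTER = 6 * 60 # seconds; the 6-minute window mentioned above

# Returns the HTTP status the /status page would serve, given the time
# the background job last checked in (nil if it never has).
def status_code(last_checkin_at, now: Time.now)
  return 500 if last_checkin_at.nil?
  (now - last_checkin_at) < STALE_AFTER ? 200 : 500
end
```

In a Rails app this would read the timestamp from the Status record and send `head :ok` or `head :internal_server_error` accordingly.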
Then I use a monitoring service to monitor the status page! (something like StatusCake)
Nowadays I have a simpler approach; I just get the background job to check in with a cron monitoring service like:
IsItWorking
Dead Man's Snitch
Healthchecks
The monitoring service expects your task to check in every X minutes; if your task doesn't check in, the monitoring service will let you know.
Integration is dead simple for all of these services. For IsItWorking it would be:
IsItWorkingInfo::Checkin.ping(key:"CHECKIN_IDENTIFIER")
full disclosure: I wrote IsItWorking !
I use the god gem to monitor my Sidekiq processes. God makes sure that your process is always running and can also notify you of the process status on various channels.
ROOT = File.dirname(File.dirname(__FILE__))

God.pid_file_directory = File.join(ROOT, "tmp/pids")

God.watch do |w|
  w.env = {'RAILS_ENV' => ENV['RAILS_ENV'] || 'development'}
  w.name = 'sidekiq'
  w.start = "bundle exec sidekiq -d -L log/sidekiq.log -C config/sidekiq.yml -e #{ENV['RAILS_ENV']}"
  w.log = "#{ROOT}/log/sidekiq_god.log"
  w.behavior(:clean_pid_file)
  w.dir = ROOT
  w.keepalive

  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.interval = 120.seconds
      c.above = 100.megabytes
      c.times = [3, 5] # 3 out of 5 intervals
    end

    restart.condition(:cpu_usage) do |c|
      c.interval = 120.seconds
      c.above = 80.percent
      c.times = 5
    end
  end

  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minutes
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 1.hour
    end
  end
end
The make command allows a -j (or --jobs) option, documented as follows:
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If there is more than one -j option,
the last one is effective. If the -j option is given without an argument, make will not limit the
number of jobs that can run simultaneously.
In a day and age where even cell phones have multiple cores and/or processors, I want my build systems to handle multithreaded processing.
What is the best way to set up rake so I can ensure up to 3 tasks are running at all times?
Yes, rake allows jobs to run in parallel. To set the level of parallelism, use the -j switch. From rake --help:
-j, --jobs [NUMBER] Specifies the maximum number of tasks to execute in parallel. (default is number of CPU cores + 4)
But the job itself must be written as a multitask, not a task. So instead of defining the task like:
namespace :mynamespace do
  desc "description"
  task task_name: :environment do
    your_code
  end
end
use multitask:
namespace :mynamespace do
  desc "description"
  multitask task_name: :environment do
    your_code
  end
end
There is a blog post about Rake's MultiTask. Note that rake also supports an -m switch, which treats all tasks as multitasks for parallel execution.
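To see the difference in behavior, here is a minimal runnable sketch (task names are illustrative) showing that multitask prerequisites run in separate threads:

```ruby
require 'rake'
include Rake::DSL

finished = Queue.new

# :slow is listed first but sleeps; with multitask the prerequisites run
# in separate threads, so :quick finishes first anyway.
task(:slow)  { sleep 0.3; finished << :slow }
task(:quick) { finished << :quick }

multitask :both => [:slow, :quick]
Rake::Task[:both].invoke

FIRST_DONE = finished.pop
```

With a plain `task :both => [:slow, :quick]` the prerequisites would run sequentially in listed order, and `:slow` would finish first.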
I have a question regarding Sidekiq. I come from the Resque paradigm, and in the current application I launch one worker per queue, so in the terminal I would do:
rake resque:work QUEUE='first'
rake resque:work QUEUE='second'
rake resque:work QUEUE='third'
Then, if I want more workers, for example for the third queue, I just create more workers with:
rake resque:work QUEUE='third'
My question is...
With Sidekiq, how would you start with multiple workers? I know you can do this:
sidekiq -q first -q second -q third
But that would just start one worker that fetches all those queues. So, how would I go to start three workers, and tell each worker to just focus on a particular queue? Also, how would I do that in Heroku?
You could use a config file in config/sidekiq.yml:
# Sample configuration file for Sidekiq.
# Options here can still be overridden by cmd line args.
# sidekiq -C config.yml
---
:verbose: true
:pidfile: ./tmp/pids/sidekiq.pid
:concurrency: 15
:timeout: 5
:queues:
  - [first, 20]
  - [second, 20]
  - [third, 1]
staging:
  :verbose: false
  :concurrency: 25
production:
  :verbose: false
  :concurrency: 50
  :timeout: 60
That way you can configure exactly what you want. To answer your question precisely: the concurrency value is what you are looking for; it defines the number of jobs executed concurrently.
More info here: https://github.com/mperham/sidekiq/wiki/Advanced-Options
So, how would I go to start three workers, and tell each worker to
just focus on a particular queue?
You can define at the worker level which queue its jobs are placed in via sidekiq_options.
For example, to place your worker's jobs in a queue called "first", define it with:
class MyWorker
  include Sidekiq::Worker

  sidekiq_options :queue => :first

  ...
end
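To mirror the Resque setup with one OS process per queue, you can also start several Sidekiq processes, each restricted to a single queue. On Heroku that would mean one Procfile entry per process type (the entry names here are illustrative):

```
worker_first: bundle exec sidekiq -q first
worker_second: bundle exec sidekiq -q second
worker_third: bundle exec sidekiq -q third
```

You could then scale a single queue's workers independently, e.g. with `heroku ps:scale worker_third=2`.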
I'm trying to create tasks with different roles:
namespace :foo do
  task :mytasks, :roles => [:a, :b] do
    task_a
    task_b
  end

  task :task_a, :roles => :a do
    run 'echo A'
  end

  task :task_b, :roles => :b do
    run 'echo B'
  end
end
When I execute mytasks, here is the result:
$ cap -n ROLES=b foo:mytasks
* 2013-03-01 16:59:14 executing `foo:mytasks'
* executing "echo A"
* executing "echo B"
All tasks get executed. Why?
Capistrano Roles are intended to associate a given server (or multiple servers) with a particular function, such as saying "machine-a" is a web server while "machine-b" is a database server, which is useful because certain tasks only need to be performed on certain machines.
So roles are not intended to be a way to conditionally select which machine(s) to run tasks on at the time when you are running Capistrano, they simply select which tasks should be run on which machines.
There is, however, another Capistrano feature called Multistage that may be what you're looking for. It allows you to specify different sets of servers (and even associate them with different roles) based on the "stage" you're deploying to. So you could have a and b stages, each with separate sets of servers, which you could deploy using:
cap a foo:mytasks
cap b foo:mytasks
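With Capistrano 2 (which the question's task syntax suggests), the multistage extension from capistrano-ext sets this up roughly as follows; the file names and hostnames are illustrative:

```ruby
# config/deploy.rb
require 'capistrano/ext/multistage'
set :stages, %w[a b]
set :default_stage, 'a'

# config/deploy/a.rb
role :a, 'machine-a.example.com'

# config/deploy/b.rb
role :b, 'machine-b.example.com'
```

Each stage file defines its own servers and roles, so `cap a foo:mytasks` and `cap b foo:mytasks` run against entirely separate sets of machines.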
How do I create a delayed job from a rake file? How should I move it into a controller and create a delayed_job that runs the task every 15 minutes?
Here is my rake file:
namespace :reklamer do
  task :runall => [:iqmedier, :euroads, :mikkelsen] do
    # This will run after all those tasks have run
  end

  task :iqmedier => :environment do
    require 'mechanize'
    agent = Mechanize.new
    agent.get("http://www.iqmedier.dk")
  end

  task :euroads => :environment do
    require 'mechanize'
    require 'pp'
    agent = Mechanize.new
  end

  task :mikkelsen => :environment do
    require 'mechanize'
    agent = Mechanize.new
    agent.get("http://affilate.mikkelsenmedia.dk/partnersystem/mylogins.php")
  end
end
What should I change to be a delayed job (https://github.com/collectiveidea/delayed_job)?
Suggest you take a look at SimpleWorker, a cloud-based background processing / worker queue for Ruby apps. It's designed for offloading tasks, running scheduled jobs, and scaling out to handle many parallel jobs at once. It's simple, affordable, and scalable.
(Disclosure, I work for the company.)
You create your workers (in app/worker) and then in your controllers and elsewhere queue them up to run right away or schedule them for later or on a recurring basis with just a few lines of code. Here's a basic example.
worker = ReportWorker.new
worker.user_id = current_user.id
worker.schedule(:start_at => 1.hour.since, :run_every => 900)

# Or to run once right away:
# worker.queue
The ReportWorker class would contain the logic to create the report for the current user and send it or post it as needed.
DelayedJob alone will not help you since it is based around one-time jobs. You will still need something that runs on a regular basis that creates these jobs.
Assuming:
you're on Heroku and can only get a 1-hour cron
you need to run a job every 15 minutes
You can do something like this...
Make a class for your jobs:
class MechanizeJob < Struct.new(:url)
  def perform
    agent = Mechanize.new
    agent.get(url)
  end
end
Schedule the jobs from your Rakefile:
task :schedulejobs => :environment do
  urls = ["http://...", "http://...", "http://..."]

  urls.each do |url|
    # 1 is the job priority
    Delayed::Job.enqueue MechanizeJob.new(url), 1, Time.now
    Delayed::Job.enqueue MechanizeJob.new(url), 1, 15.minutes.from_now
    Delayed::Job.enqueue MechanizeJob.new(url), 1, 30.minutes.from_now
    Delayed::Job.enqueue MechanizeJob.new(url), 1, 45.minutes.from_now
  end
end
This will run a job per URL every 15 minutes, as long as the hourly cron keeps seeding the next hour's jobs.
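The four enqueue calls can also be generated in a loop. Here is a small pure-Ruby sketch (no delayed_job required; the method name is illustrative) of the run times one hourly cron tick produces:

```ruby
# Run times produced by one hourly cron tick:
# 0, 15, 30 and 45 minutes after the tick.
def enqueue_times(tick, interval: 15 * 60, count: 4)
  Array.new(count) { |i| tick + i * interval }
end
```

In the Rakefile above, the four `Delayed::Job.enqueue` lines would then become `enqueue_times(Time.now).each { |t| Delayed::Job.enqueue MechanizeJob.new(url), 1, t }`.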