I'm writing a Rails 3 application which requires performing small tasks on a custom schedule for each user. The scheduled tasks will be defined dynamically. Right now my plan is to use resque-scheduler with Redis.
Once I set the schedule for a specific task (e.g. run task A every 48 hours), I would like that task to run indefinitely. So I would like to store those schedules in a database or similar, so that if the app crashes, it loads and queues those tasks again when it restarts.
Is this something Resque supports by default by storing it in Redis, or do I need to write something custom? I was also looking at ruby-taskr (http://code.google.com/p/ruby-taskr/), but I am not sure whether taskr supports storing schedules in a database and registering them on start.
It would also be helpful if there are applications or demos that I can look at.
Thanks
I have a similar setup for batch jobs. The user adds them on a web dashboard and they get run however often is specified.
I use ActiveRecord to store the scheduling definitions, Resque for execution, and a single cron entry that enqueues the jobs through a rake task.
So then, in the rake task:
to_run = Report.daily
to_run += Report.weekly if Time.now.monday?
to_run += Report.monthly if Time.now.day == 1
to_run.each{|r| r.enqueue!}
where daily, weekly, monthly are named scopes on the model:
class Report < ActiveRecord::Base
  scope :daily,   where(:when_to_run => 'daily')
  scope :weekly,  where(:when_to_run => 'weekly')
  scope :monthly, where(:when_to_run => 'monthly')
end
This is a little hacky, but it works well and I stay within the stack nicely. Hope that is useful
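A minimal sketch of the selection logic from that rake task, pulled into a plain Ruby method so it can be checked without Rails. The method name frequencies_due is an assumption for illustration; the strings match the when_to_run values used by the scopes.

```ruby
# Returns the when_to_run values whose reports are due at the given time.
# Hypothetical helper; the rake task in the answer inlines this logic.
def frequencies_due(now = Time.now)
  due = ['daily']
  due << 'weekly'  if now.monday?   # weekly reports go out on Mondays
  due << 'monthly' if now.day == 1  # monthly reports go out on the 1st
  due
end
```

In the rake task this could probably replace the three scope lookups with a single query such as Report.where(:when_to_run => frequencies_due), followed by the same enqueue! loop.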
Banging my head against the wall with this one. Trying to dynamically add jobs with resque-scheduler. What's the syntax for creating a monthly job? For example, the code below will set up a job to run every minute.
config[:class] = "job_name"
config[:args] = "arg"
config[:every] = "1m"
config[:persist] = true
What would the syntax here be for every month? Would it be config[:every] = "1 month"? I can't seem to find any answers on the resque-scheduler docs for this.
Thanks.
For dynamic schedules, resque-scheduler uses rufus-scheduler, as explained in its documentation. rufus-scheduler handles not only the actual scheduling but also the parsing of the :every option.
You can see that when resque-scheduler runs, it basically loads all the schedule information from Redis and then passes it on to rufus.
The supported letters and durations are defined in rufus-scheduler as a map between letters and durations in seconds, and its duration-parsing specs show more complex rules.
For one month, you can use 1M (capital M; lowercase m means minutes), or you can also use 4w or 30d.
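To make the letter-to-seconds idea concrete, here is a small standalone parser. It is only a sketch: it does not call rufus-scheduler, and the unit values in UNIT_SECONDS are an assumption modelled on rufus-scheduler's map, so check them against the version you have installed.

```ruby
# Seconds per unit letter. Assumed values, modelled on rufus-scheduler's map.
UNIT_SECONDS = {
  'y' => 365 * 24 * 3600,
  'M' => 30 * 24 * 3600,   # capital M: month, taken as 30 days
  'w' => 7 * 24 * 3600,
  'd' => 24 * 3600,
  'h' => 3600,
  'm' => 60,               # lowercase m: minute
  's' => 1
}.freeze

# Converts a string such as "1M" or "1h30m" into a number of seconds.
def duration_to_seconds(text)
  pairs = text.scan(/(\d+)([yMwdhms])/)
  raise ArgumentError, "cannot parse #{text.inspect}" if pairs.empty?
  pairs.inject(0) { |sum, (count, unit)| sum + count.to_i * UNIT_SECONDS.fetch(unit) }
end
```

Under these assumed values, "1M" and "30d" give the same interval, while "4w" is two days shorter. None of them tracks calendar months.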
I built a small web crawler implemented as two Sidekiq workers: Crawler and Parser. The Crawler worker looks for links, while the Parser worker reads the page body.
I want to trigger an alert when the crawling/parsing of all pages is complete. Monitoring only the Crawler job is not the best solution, since it may have finished while several Parser jobs are still running.
Looking at the sidekiq-status gem, it seems that I cannot dynamically add new jobs to the container for monitoring. For example, it would be nice to have an "add" method in the following context:
@container = SidekiqStatus::Container.new
# ... for each page url found:
jid = ParserWorker.perform_async(page_url)
@container.add(jid)
The closest to this is to use "SidekiqStatus::Container.load" or "SidekiqStatus::Container.load_multi" however, it is not possible to add new jobs in the container a posteriori.
One solution would be to create as many SidekiqStatus::Container instances as the number of ParserJobs and check if all of them have status == "finished", but I wonder if a more elegant solution exists using these tools.
Any help is appreciated.
You are describing Sidekiq Pro's Batches feature exactly. You can spend a lot of time or some money to solve your problem.
https://github.com/mperham/sidekiq/wiki/Batches
OK, here's a simple solution. Using the sidekiq-status gem, the Crawler worker keeps track of the job IDs of the Parser jobs and waits while any Parser job is still busy (using SidekiqStatus::Container instances to check job status).
def perform
  @jids = []
  # for each page....
    @jids << ParserWorker.perform_async(page_url)
  # end

  # crawler finished, parsers may still be running
  while parsers_busy?
    sleep 5 # wait 5 secs between each check
  end
  # all parsers complete, trigger notification...
end

# true while at least one Parser job is still queued or running
def parsers_busy?
  SidekiqStatus::Container.load_multi(@jids).any? do |container|
    %w[waiting working].include?(container.status)
  end
end
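One risk with the polling loop above is that it never exits if a Parser job is lost or stuck. Below is a minimal sketch of the same wait with a timeout. The name wait_until_idle and its parameters are hypothetical, and the busy check is passed in as a lambda so the snippet runs without Sidekiq.

```ruby
# Polls until busy_check returns false or the timeout (in seconds) expires.
# Returns true if everything finished, false if we gave up waiting.
def wait_until_idle(busy_check, interval = 5, timeout = 3600)
  deadline = Time.now + timeout
  while busy_check.call
    return false if Time.now >= deadline
    sleep interval
  end
  true
end
```

Inside the worker this might be called as wait_until_idle(lambda { parsers_busy? }), sending the notification only when it returns true.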
To scale our Sidekiq workers to the size of our database pool, we came up with a little formula in our configuration:
sidekiq.rb
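# The number of workers configured for our project.
# Defined first so it exists when the configure_server block runs.
def workers
  (ENV['HEROKU_WORKERS'] || 1).to_i
end

Sidekiq.configure_server do |config|
  # leave one connection spare, then split the rest across the workers
  config.options[:concurrency] = ((ENV['DB_POOL'] || 5).to_i - 1) / workers
end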
We're setting HEROKU_WORKERS by hand, but it would be sweet if there were a way to interrogate the Heroku API from within the application.
Modulo all the things that can happen (workers going up or down, changing the number of workers, etc.), this seems to get us past the initial problem, where our workers would consume all of the database pool connections and then start crashing on startup.
The heroku-api gem should provide this.
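As a worked example of the formula, here it is as a standalone method with a few sample values. The method name sidekiq_concurrency is made up for illustration; the arithmetic is the same as in the configuration above.

```ruby
# Threads each worker process may run so that all processes together
# stay one connection under the database pool size.
def sidekiq_concurrency(db_pool, workers)
  (db_pool - 1) / workers   # integer division, rounds down
end

sidekiq_concurrency(20, 3)  # => 6, so 3 workers * 6 threads = 18 connections
sidekiq_concurrency(5, 1)   # => 4
```

Because this is integer division, the result drops to 0 once the number of workers exceeds DB_POOL - 1, so it may be worth guarding against that case.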
https://github.com/heroku/heroku.rb
You should find your API key here: https://dashboard.heroku.com/account
require 'heroku-api'
heroku = Heroku::API.new(api_key: API_KEY)
Total number of current processes:
heroku.get_ps('heroku-app-name').body.count
(You should be able to parse this to get total number of workers... or a count of a specific kind of worker, if you have different kinds defined in your Procfile/Heroku app)
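A rough sketch of that parsing step, run here against sample data rather than the live API. It assumes each entry in the response body is a hash with a "process" key such as "worker.1"; that shape, the helper name count_by_type, and the sample list are all assumptions to verify against a real response.

```ruby
# Counts processes per type from a list shaped like the get_ps response body.
# Assumes each entry has a "process" key of the form "<type>.<number>".
def count_by_type(processes)
  counts = Hash.new(0)
  processes.each { |ps| counts[ps['process'].split('.').first] += 1 }
  counts
end

# Made-up sample standing in for heroku.get_ps('heroku-app-name').body
sample = [
  { 'process' => 'web.1' },
  { 'process' => 'worker.1' },
  { 'process' => 'worker.2' }
]

worker_count = count_by_type(sample)['worker']
```

The resulting worker count could then stand in for the hand-set HEROKU_WORKERS value.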
We are running a Spring 3.0.x web application (.war) with a nightly @Scheduled job in a clustered WebLogic 10.3.4 environment. However, as the application is deployed to each node (using the deployment wizard in the AdminServer's web console), the job is started on each node every night, thus running multiple times concurrently.
How can we prevent this from happening?
I know that libraries like Quartz allow coordinating jobs inside a clustered environment by means of a database lock table, or I could even implement something like this myself. But since this seems to be a fairly common scenario, I wonder whether Spring already comes with an option to easily circumvent this problem without adding new libraries to my project or putting in manual workarounds.
We are not able to upgrade to Spring 3.1, so its configuration profiles are not an option for us.
Please let me know if there are any open questions. I also asked this question on the Spring Community forums. Thanks a lot for your help.
We only have one task, which sends a daily summary email. To avoid extra dependencies, we simply check whether the hostname of each node corresponds with a configured system property.
import java.net.InetAddress;
import java.net.UnknownHostException;

private boolean isTriggerNode() {
    String triggerHostname = System.getProperty("trigger.hostname");
    try {
        String hostName = InetAddress.getLocalHost().getHostName();
        return hostName.equals(triggerHostname);
    } catch (UnknownHostException e) {
        // hostname cannot be resolved, so do not act as the trigger node
        return false;
    }
}

public void execute() {
    if (isTriggerNode()) {
        // send email
    }
}
We are implementing our own synchronization logic using a shared lock table inside the application database. This allows all cluster nodes to check if a job is already running before actually starting it itself.
Be careful: if you implement your own synchronization logic using a shared lock table, you always have a concurrency issue when two cluster nodes read from and write to the table at the same time.
It is best to perform the following steps in one DB transaction:
- read the value in the shared lock table
- check that no other node holds the lock
- if the lock is free, update the table to indicate that you have taken it
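The steps above can be sketched as follows. This is an in-memory stand-in, not a database implementation: the LockTable class and its method names are hypothetical, and a Mutex plays the role of the transaction so that the read, the check, and the write happen as one atomic step.

```ruby
require 'thread'

# In-memory stand-in for the shared lock table (hypothetical class).
class LockTable
  def initialize
    @rows  = {}          # lock name => node that holds it
    @guard = Mutex.new   # plays the role of the DB transaction
  end

  # Returns true if `node` obtained the lock, false if another node holds it.
  def try_acquire(name, node)
    @guard.synchronize do
      holder = @rows[name]                     # 1. read the current value
      next false if holder && holder != node   # 2. another node has the lock
      @rows[name] = node                       # 3. record that we took it
      true
    end
  end

  # Releases the lock only if `node` is the current holder.
  def release(name, node)
    @guard.synchronize { @rows.delete(name) if @rows[name] == node }
  end
end
```

With a real database, the atomicity would have to come from the database itself, for example a row lock taken inside the transaction or a unique constraint on the lock name.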
I solved this problem by making one of the boxes the master.
Basically, set an environment variable on one of the boxes, like master=true,
and read it in your Java code through System.getenv("master").
If it is present and true, then run your code.
Basic snippet:
@Scheduled(cron = "0 0 0 * * *") // example nightly trigger; use your own schedule
void process() {
    // parseBoolean returns false when the variable is not set
    boolean master = Boolean.parseBoolean(System.getenv("master"));
    if (master) {
        // your logic
    }
}
You can try using WebLogic's TimerManager (Job Scheduler in a clustered environment) as the TaskScheduler implementation (TimerManagerTaskScheduler). It should work in a clustered environment.
Andrea
I've recently implemented a simple annotation library, dlock, to execute a scheduled task only once over multiple nodes. You can simply do something like below.
@Scheduled(cron = "59 59 8 * * *" /* Every day at 8:59:59am */)
@TryLock(name = "emailLock", owner = NODE_NAME, lockFor = TEN_MINUTE)
public void sendEmails() {
    List<Email> emails = emailDAO.getEmails();
    emails.forEach(email -> sendEmail(email));
}
See my blog post about using it.
You don't need to synchronize your job start using a DB.
In a WebLogic application you can get the name of the instance where the application is running:
String serverName = System.getProperty("weblogic.Name");
Simply add a condition to execute the job:
if (serverName.equals(".....")) {
    // execute my job
}
If you want to alternate the job between machines, you can take the current day of the year: if it is odd, run the job on one machine, and if it is even, run it on the other.
This way you load a different machine every day.
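A minimal sketch of that odd/even rotation, written in Ruby for brevity. The method name node_for and the node names are made up; with two nodes, the modulo is exactly the odd/even test described above.

```ruby
require 'date'

# Picks the node that should run the job on the given date by rotating
# through the list using the day of the year (1..366).
def node_for(date, nodes)
  nodes[date.yday % nodes.size]
end

nodes = ['node-a', 'node-b']          # hypothetical server names
node_for(Date.new(2024, 1, 1), nodes) # day 1 is odd  => "node-b"
node_for(Date.new(2024, 1, 2), nodes) # day 2 is even => "node-a"
```

One caveat: in a non-leap year, 31 December is day 365 and 1 January is day 1, both odd, so the same machine runs the job two days in a row at the year boundary.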
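We can keep the other machines in the cluster from running the batch job by giving them the following cron string, which will not fire until 2099. Note that the trailing year field is Quartz cron syntax; Spring's own @Scheduled cron expressions take six fields.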
0 0 0 1 1 ? 2099
How can I pass along attributes to my tasks in capistrano?
My goal is to deploy to multiple servers in a load balancer. I'd like to take each one out, deploy, and add it back in sequentially so no more than one server is down at any time.
I'm thinking it would be something along the lines of... (and the hosts array would be generated dynamically after querying my load balancer)...
role :app,
[["server_one", {:instanceId => "alice"}],
["server_two", {:instanceId => "bob"}],
["server_three", {:instanceId => "charles"}]]
And then for my tasks...
before :deploy, :deregister_instance_from_lb
after :deploy, :register_instance_with_lb

task :deregister_instance_from_lb do
  # TODO - Deregister #{instanceId} from load balancer
end

task :register_instance_with_lb do
  # TODO - Register #{instanceId} with load balancer
end
Any ideas?
I use this to restart my servers in series, instead of in parallel.
task :my_task, :roles => :web do
  find_servers_for_task(current_task).each do |server|
    run "[task command here]", :hosts => server.host
  end
end
Justin, I'm sorry, that's not possible. Once the stream pool is opened (on the first run against a server set), there's no way to access server properties, because the run code isn't run per server but against all matching servers in the pool. Some people have had some success with doing something like this, but really it's a symptom that your scripts need too much information, which you should be able to extract from your production environment.
As in this case it seems you are doing something like using the host's name to pass to a script, use what Unix gives you:
run "./my_script.rb `hostname`"
Will that work?
References:
• http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_04.html (Section 3.4.5)
• http://unixhelp.ed.ac.uk/CGI/man-cgi?hostname (or $ man (1) hostname)
No one knows? I found something about the sequential block below, but that's as far as I got...
find_servers.each do |server|
  # TODO - remove from load balancer
  # TODO - deploy
  # TODO - add back to load balancer
end
I find it hard to believe that no one has ever needed to do sequential tasks with cap.
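The sequential block above can be sketched in plain Ruby as follows. This is not Capistrano code: rolling_deploy is a hypothetical helper, and the three steps are passed in as callables so the ordering can be checked on its own.

```ruby
# Runs the three steps for one server at a time, so at most one server is
# out of the load balancer at any moment. An exception in any step stops
# the loop before the next server is touched.
def rolling_deploy(servers, deregister, deploy, register)
  servers.each do |server|
    deregister.call(server)  # take this server out of the load balancer
    deploy.call(server)      # deploy to this server only
    register.call(server)    # add it back before moving on
  end
end
```

In a Capistrano 2 recipe, the server list would presumably come from find_servers, and each callable would wrap a run call restricted with :hosts => server.host, as in the restart task shown earlier.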