Scheduling a task run - ruby

I have a script that must run at a certain hour for the amount of time I specify.
I'm looking at the clockwork gem (https://github.com/tomykaira/clockwork), which seems to be the closest fit for this. Unfortunately, it doesn't seem to support setting a duration (start at 3 PM, stop at 5 PM), so I would have to split the feature in two: starting the script would be clockwork's job, and stopping it would be handled inside the script with a custom solution.
Very suboptimal and messy.
How do people do this in Ruby? TIA

There is a great gem called whenever for this kind of job. With it you can set an exact time for your task, like:
every 1.day, :at => '4:30 am' do
  runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
end
But you'll need two stages, one for starting your job and one for stopping it, which in my opinion is more natural than a job that kills itself at some point.

Somewhat janky, but there is another solution. I'm not sure what you are using to host your app, but on Heroku you can set up a scheduler to run every 10 minutes, on the hour, or daily. Then inside the method that the scheduler calls, you can determine the current time. Say you only want to run it between 3pm and 5pm, you would just wrap your code inside an if statement that verifies the current time is between 3pm and 5pm (watch out for time conversions with UTC).
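A minimal sketch of that time check, assuming the window is expressed in UTC hours. The names within_window? and scheduled_entry_point are hypothetical, not part of Heroku or any gem:

```ruby
# Hypothetical helper: true when `now` falls inside [start_hour, end_hour).
# The hours are compared against `now` as given, so pass a UTC time if the
# window is defined in UTC.
def within_window?(now, start_hour: 15, end_hour: 17)
  now.hour >= start_hour && now.hour < end_hour
end

# Hypothetical entry point that the scheduler would call every 10 minutes.
def scheduled_entry_point(now = Time.now.utc)
  return :skipped unless within_window?(now)

  # ... the real work would go here ...
  :ran
end
```

With a 10-minute scheduler this means the work runs on every tick between 15:00 and 16:50 UTC and is skipped otherwise.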
Hope this helps.

Related

Use gocron scheduler to schedule job on specific day at specific time

I want to schedule a job on a specific day at a specific time, repeating at some interval. I am using the gocron scheduler for this, but I can't find a way to start a job on a specific day. For example, I want to execute a job on 7 Sept 2019 at 3:30 PM, and from 7 Sept onward I want that job to be executed daily or weekly. How can I do that using gocron? Or are there any other packages available?
I tried passing a UTC time to gocron.At(), but it panics because it expects only time formats like "03:30" and doesn't accept a date.
Judging by its documentation, gocron does not seem to be designed to support scheduling things for specific days. It seems to be designed as a way to schedule things to run at various intervals, very similar to what the original cron utility was designed to do. So you would specify "I want this function to get called every 2 hours" or "I want this function to get called every Sunday at 3PM". There does not seem to be any documentation about starting jobs from a specific day.
The mentioned At(string) method is documented as allowing you to specify a time of day to run something. So you would use that to set that your job runs at 3:30PM.
If you wish to specify a start time, you would likely need to find another scheduling library or implement it yourself by creating a goroutine that sleeps until a specific time. The StackOverflow post mentioned by domcyrus looks like an excellent resource for implementing it yourself as well as listing some other scheduling libraries.
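The "sleep until a specific time" approach comes down to a small piece of date arithmetic. The question is about Go, where a goroutine would do the sleeping; the sketch below only shows the arithmetic, written in Ruby, and next_run is a hypothetical helper, not a gocron function:

```ruby
# Returns the first run time at or after `now` for a job that starts at
# `start_at` and then repeats every `interval` seconds.
def next_run(start_at, interval, now)
  return start_at if now <= start_at

  elapsed = now - start_at              # Float seconds since the first run
  periods = (elapsed / interval).ceil   # whole intervals needed to reach `now`
  start_at + periods * interval
end

# Usage: sleep until the next run, do the work, repeat.
#   start = Time.utc(2019, 9, 7, 15, 30)
#   delay = next_run(start, 24 * 60 * 60, Time.now.utc) - Time.now.utc
#   sleep(delay) if delay > 0
```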

How does Laravels task scheduling work without persisting the last completed date?

Laravel is (correctly) running scheduled tasks via the App\Console\Kernel#schedule method. It does this without the need for a persistence layer. Previously run scheduled tasks aren't saved to the database or stored in any way.
How is this "magic" achieved? I want to have a deeper understanding.
I have looked through the source, and I can see it is achieved, roughly, by rounding down the current date and comparing that against the schedule frequency. Combined with the requirement that the scheduler runs every minute, it can say with a certain level of confidence whether it should run a task. That is my interpretation, but I still can't fully grasp how it guarantees running on schedule, or how it handles failure or things being off by a few seconds.
EDIT: Edited for clarity, following an issue pointed out in a comment.
By "a few seconds" I mean: how does the "round down" method work when the scheduler is run every minute, but not at the same second? Example: first run 00:01.00, 00:01:02, 00:02:04
Maybe to clarify further, and to assist in understanding how it works: are there any boundary guarantees on how it functions? If it is run multiple times per minute, will it execute per-minute tasks multiple times in that minute?
Cron cannot guarantee second-level precision. That is why cron job intervals are generally no shorter than a minute. So, in reality, it doesn't handle "things being off by a few seconds."
What happens in Laravel is this: once the scheduling command has been set up, the server asks "Is there a queued job?" every minute. If there is none, it doesn't do anything.
For example, take the "daily" cron job. The scheduler doesn't need to know when it last ran the task or anything like that. When it encounters the daily cron job, it simply checks whether it is midnight. If it is midnight, it runs the job.
Also, take the "every thirty minutes" cron job. Maybe you registered the cron job at 10:25. The first run will still be at 10:30, not at 10:55. It doesn't care what time you registered the job or when it last ran. It only checks whether the current minute is "00" or divisible by thirty. So it will run at 10:30, again at 11:00, and so on.
Similarly, a ten-minute cron job will by default only check whether the current minute is divisible by ten. So, regardless of the time you registered the command, it will run only at XX:00, XX:10, XX:20 and so on.
That is why, by default, it doesn't need to store previously run scheduled tasks. However, you can store them in a file if you want, for monitoring purposes.
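The divisibility check described above can be sketched in a few lines. This is not Laravel's code (Laravel is PHP and evaluates cron expressions); due? and daily_due? are hypothetical names that only mirror the idea, written in Ruby:

```ruby
# True when the current minute is a multiple of `every_minutes`.
# Seconds are ignored, so 10:30:00 and 10:30:04 give the same answer.
def due?(now, every_minutes:)
  (now.min % every_minutes).zero?
end

# True during the minute that starts at midnight.
def daily_due?(now)
  now.hour.zero? && now.min.zero?
end
```

Because the check depends only on the current minute, no record of earlier runs is needed. It also means that, in this sketch, calling the check twice within the same minute reports the task as due both times.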

Why are all of my Airflow dags one run behind?

I'm setting up Airflow right now and loving it, except for the fact that my dags are perpetually running behind. See the picture below - this was taken on 2/19 at 15:50 UTC, and you can see that for each of the dags, they should have run exactly one more time between the last time they ran and the present time (there are a couple for which this is not true - those ones are currently turned off). Is there some piece of configuration I missed?
False alarm! Airflow just labels execution times differently than I expected. It turns out an hourly job that runs at 15:00 is labeled "14:00" and includes data up to 14:00+1:00.
From https://airflow.apache.org/scheduler.html:
Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be trigger soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
Let’s Repeat That The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
Execution time is the lower bound of the batch.
Ex:
Say your execution schedule is hourly and it's the run corresponding to the 13:00 schedule.
Your execution_time will be 12:00.
This is because we usually run the batch for 12:00 - 13:00 at 13:00 (after the data is available for the batch).
But in my experience, we sometimes base the job on the time it's scheduled for (because we want the schedule to start, and there are checks inside of the DAG/job that verify data readiness). In those cases, I just end up using next_execution_time (13:00) instead of execution_time (12:00).
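The labeling rule is plain arithmetic, shown here as a small worked example in Ruby. This is not Airflow code, and execution_time_for is a hypothetical helper:

```ruby
# The label Airflow gives a run is the start of the period it covers,
# i.e. the moment the run is triggered minus one schedule interval.
def execution_time_for(trigger_time, interval_seconds)
  trigger_time - interval_seconds
end

HOUR = 60 * 60
triggered_at = Time.utc(2019, 2, 19, 15, 0)           # the run starts at 15:00
label = execution_time_for(triggered_at, HOUR)         # labeled 14:00
covers = [label, label + HOUR]                         # data from 14:00 to 15:00
```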

How to run a per second cron job every two minutes

I have to set up a cron job on my hosting provider.
This cron job needs to run every second. It's not intensive, just doing a check.
The hosting provider however only allows cron jobs to be run every two minutes. (can't change hosting btw)
So, I'm clueless on how to go about this?
My thoughts so far:
If it can only run every two minutes, I need to make it run every second for two minutes. 1) How do I make my script run for two minutes executing a function every second?
But it's important that there are no interruptions. 2) I have to ensure that it runs smoothly and that it remains constantly active.
Maybe I can also try making it run forever, and run the cron job every two minutes checking whether it is running? 3) Is this possible?
My friend mentioned using multithreading to ensure it's running every second. 4) any comments on this?
Thanks for any advice. I'm using ZF.
Approach #3 is the standard solution. For instance you can have the cron job touch a file every time it runs. Then on startup you can check whether that file has been touched recently, and if it has then exit immediately. Else start running. (Other approaches include using file locking, or else writing the pid to a file and on startup check whether that pid exists and is the expected program.)
As for the one second timeout, I would suggest calling usleep at the end of your query, supplying the number of microseconds from now to when you next want to run (usleep takes microseconds, not milliseconds). If you do a regular sleep then you'll actually run less than once a second, because sleeps sometimes last longer than expected and your check itself takes time. As long as your check takes under a second to run, this should work fine.
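The "sleep only for the time remaining" idea can be sketched as follows. The question is about PHP, so this Ruby version is only an illustration of the technique; run_every and its clock and sleeper parameters are hypothetical, and the injectable clock is there so the loop can be exercised without real waiting:

```ruby
# Runs the given block `ticks` times, one tick per `period` seconds.
# After each run it sleeps only until the next scheduled tick, so the time
# the work itself takes does not accumulate as drift.
def run_every(period, ticks,
              clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
              sleeper: ->(seconds) { sleep(seconds) })
  next_tick = clock.call
  ticks.times do
    yield
    next_tick += period
    remaining = next_tick - clock.call
    # If the work overran the period, skip the sleep and catch up.
    sleeper.call(remaining) if remaining > 0
  end
end

# Usage: run a check once per second for two minutes.
#   run_every(1.0, 120) { perform_check }   # perform_check is a placeholder
```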
I don't think cron allows second level resolution. http://unixhelp.ed.ac.uk/CGI/man-cgi?crontab+5
field          allowed values
-----          --------------
minute         0-59
hour           0-23
day of month   1-31
month          1-12 (or names, see below)
day of week    0-7 (0 or 7 is Sun, or use names)
So, even if your hosting provider allowed it, cron itself can't run a process that repeats every second. However, you can use a command such as watch for repeated execution of your script. See here.

How to make Ruby run some task every 10 minutes?

I would like to do a cron job every 10 minutes, but my system only does 1 hour. So I'm looking for a method to do this. I've seen Timer and sleep but I'm not sure how to do this or even better yet a resource for achieving this.
Take a look at http://rufus.rubyforge.org/rufus-scheduler/
rufus-scheduler is a Ruby gem for scheduling pieces of code (jobs). It understands running a job AT a certain time, IN a certain time, EVERY x time or simply via a CRON statement.
rufus-scheduler is no replacement for cron/at since it runs inside of Ruby.
To do this reliably, invest in a VPS and create the 10-minute cron job as desired. Trying to emulate cron all on your own is very likely to fail in unforeseen ways.
Creating a sleeping process is not the way to go about this; if your server doesn't give you the freedom to make your own cron as you like it, you probably can't create your own background process for this sort of thing, either. You might be able to, on each request, take a look and see how many of the jobs need doing (if it has been 25 minutes since the last request, you might have to do two), and go back and do them retroactively.
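A minimal sketch of that per-request catch-up count, assuming the time of the last run is stored somewhere the request handler can read it. missed_runs is a hypothetical helper:

```ruby
# Number of whole intervals that have elapsed since `last_run`, i.e. how many
# runs of the job are overdue at `now`.
def missed_runs(last_run, now, interval_seconds)
  elapsed = now - last_run
  return 0 if elapsed <= 0

  (elapsed / interval_seconds).floor
end

# Usage on each request (storage of last_run is left to the application):
#   missed_runs(last_run, Time.now, 10 * 60).times { do_job }
```

For the example above, 25 minutes since the last run with a 10-minute interval gives two overdue runs.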
But, seriously. You need your own server to do this dependably.

Resources