Heroku Scheduler "Next Due" time keeps increasing

I'm trying to use Heroku's Scheduler add-on (node-cron doesn't seem to be working for me). I installed the add-on and added a new job, setting its frequency to every 10 minutes. However, the job never seems to execute. Upon further inspection, I found that the "next due" time keeps increasing.
In other words, let's say the current time is 1:00. The next due time should be (and is) 1:10. Five minutes later, at 1:05, the next due time should still be 1:10, but it has changed to 1:15 (it is basically always 10 minutes ahead of the current time). How do I fix this?

Related

How to handle DST correctly in Airflow 2.0+ for different regions?

I am trying to get the execution date using
execution_dt = f"{{{{execution_date.in_timezone('{timezone}').strftime('{date_partition_format}')}}}}"
but the issue I am facing is that, right when DST takes effect, the polling happens for the previous hour, although it corrects itself on subsequent runs.
So, for example, if DST takes effect and the clock jumps from the 7th to the 8th hour, it will still try to poll for the 7th hour; but on the subsequent run, done two hours later, it will poll for the 10th hour (consistent with the earlier 8th hour).
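For anyone trying to reproduce the behaviour, here is a small sketch (not taken from the question; the timezone and format string are assumptions) of what execution_date.in_timezone(...) evaluates to around a spring-forward transition, using pendulum as Airflow does:

    # Sketch: how hourly execution_date values map to local time across the
    # US spring-forward on 2021-03-14 (timezone and format are assumptions).
    import pendulum

    date_partition_format = "%Y-%m-%d %H"
    tz = "America/New_York"

    for utc_hour in (6, 7, 8):  # nominal times 06:00Z, 07:00Z, 08:00Z
        execution_date = pendulum.datetime(2021, 3, 14, utc_hour, tz="UTC")
        print(execution_date.in_timezone(tz).strftime(date_partition_format))
    # 2021-03-14 01   (06:00Z -> 01:00 EST)
    # 2021-03-14 03   (07:00Z -> 03:00 EDT; the 02:xx hour never exists locally)
    # 2021-03-14 04   (08:00Z -> 04:00 EDT)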

How does Laravel's task scheduling work without persisting the last completed date?

Laravel is (correctly) running scheduled tasks via the App\Console\Kernel#schedule method. It does this without the need for a persistence layer. Previously run scheduled tasks aren't saved to the database or stored in any way.
How is this "magic" achieved? I want to have a deeper understanding.
I have looked through the source, and I can see it is somewhat achieved by rounding down the current date and diffing that against the schedule frequency; combined with the fact that the scheduler is required to run every minute, it can say with a certain level of confidence that it should run a task. That is my interpretation, but I still can't fully grasp how it guarantees running on schedule and how it handles failures or things being off by a few seconds.
EDIT: clarifying an issue pointed out in a comment.
By "a few seconds" I mean: how does the "round down" method work when the scheduler is run every minute, but not at the same second each time? Example: runs at 00:01:00, 00:01:02, 00:02:04.
To clarify further, and to help in understanding how it works: are there any boundary guarantees on how it functions? If run multiple times per minute, will it execute per-minute tasks multiple times within that minute?
Cron cannot guarantee second-level precision. That is why, in general, no cron interval is less than a minute. So, in reality, it doesn't handle "things being off by a few seconds."
What happens in Laravel is this: after you register the scheduling command, the server asks "is there a job due?" every minute. If there is none, it doesn't do anything.
For example, take the "daily" cronjob. The scheduler doesn't need to know when it last ran the task or anything like that. When it encounters the daily cronjob, it simply checks whether it is midnight. If it is midnight, it runs the job.
Likewise, take the "every thirty minutes" cronjob. Maybe you registered the cronjob at 10:25, but the first time it will still run at 10:30, not at 10:55. It doesn't care what time you registered it or when it last ran. It only checks whether the current minute is "00" or divisible by thirty. So it will run at 10:30, again at 11:00, and so on.
Similarly, a ten-minute cronjob by default only checks whether the current minute is divisible by ten. So, regardless of the time you registered the command, it will run only at XX:00, XX:10, XX:20 and so on.
That is why, by default, it doesn't need to store previously run scheduled tasks. However, you can log them to a file if you want, for monitoring purposes.
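A minimal sketch of that "round down to the minute and check divisibility" idea, written in Python rather than Laravel's PHP (the function name is made up for illustration):

    # The scheduler only needs the current wall-clock minute to decide whether a
    # task is due; no record of the previous run is required.
    from datetime import datetime

    def is_due_every_n_minutes(n: int, now: datetime) -> bool:
        # Seconds are ignored, so 10:30:02 and 10:30:59 both count as "10:30".
        return now.minute % n == 0

    now = datetime(2024, 1, 1, 10, 30, 47)
    print(is_due_every_n_minutes(30, now))  # True  -> the "every thirty minutes" task runs
    print(is_due_every_n_minutes(10, now))  # True  -> 30 is divisible by 10
    print(is_due_every_n_minutes(7, now))   # False -> waits for a minute divisible by 7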

SOS-Berlin JobScheduler process queue logic

We're running into an issue with the SOS-Berlin JobScheduler running on Windows that is difficult to diagnose*, and I would appreciate any guidance.
*Difficult because I don't know Scala (though I do know C++ and Java), and it's hard to navigate this code base (some of it is in German).
We have a process-class called Foo that will sometimes burst beyond the limit of how many processes can run. So, for example, we limit the process-class to 30 processes and 60 want to run. This leaves 30 running and 30 "waiting for process."
The problem is that JobScheduler doesn't seem to prioritize the 30 that are waiting for a process. Instead, any new job that gets fired after the burst receives processes, leaving some jobs waiting indefinitely. Once the number of jobs "waiting for process" hits zero, the jobs clear out immediately.
Further, it seems that when there are a large number of jobs "waiting for process," the run time for tasks doubles or triples. A job that normally takes 20 seconds to run will spike to 1-2 minutes, further amplifying the issue because processes are not released back to the pool.
Admittedly, we're running an older version of JobScheduler, which we're planning to upgrade this week or next. However, I'm wondering if there is something fundamental we're missing. We've turned down the logging, looked for DB locks, added memory to the heap, and shut down some other processes on the server. We've also increased the process pool, but we don't want to push it too far, lest we crush the server. Nothing seems to alleviate the issue.
Any tuning help would be appreciated!
As a follow-up, we determined the cause of the issue.
Another user had been using the temp directory to store intermediate generated files and was not clearing them out, leaving hundreds of thousands of files in the directory. They were not very large, so we didn't notice. For some reason JobScheduler started to choke on this; I'm not clear on the exact reasons.
Clearing the temp directory, scolding the user, and fixing his script fixed the issue.
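For reference, a rough sketch of such a cleanup (the path and the 7-day cutoff are assumptions, not anything that ships with JobScheduler):

    # Remove files older than a week from the temp directory so it cannot
    # silently grow to hundreds of thousands of entries again.
    import os
    import time

    TEMP_DIR = r"C:\temp\jobscheduler"   # hypothetical location
    MAX_AGE_SECONDS = 7 * 24 * 3600

    removed = 0
    for name in os.listdir(TEMP_DIR):
        path = os.path.join(TEMP_DIR, name)
        if os.path.isfile(path) and time.time() - os.path.getmtime(path) > MAX_AGE_SECONDS:
            os.remove(path)
            removed += 1
    print(f"removed {removed} stale files")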

Oozie Behavior with misaligned start

I noticed that if I start an Oozie coordinator with a start time many "iterations" (in terms of the frequency) before the current time, the coordinator runs workflows back to back several times to catch up, ignoring the assigned frequency. However, for me it is more important that the workflow/action runs at the assigned frequency than that it has run the correct number of times by a given point.
Is there any way to avoid this behavior? One way would obviously be to ensure the start time is within one iteration of the current time (is there a way to have it take the submission time as the start time automatically?). Another would be to configure it to avoid the catch-up behavior altogether and simply run at the next time it would be due, given the start time and the frequency.
The obvious way to avoid side effects from "past" start dates is... to set the actual start date at submission time as "now".
That's the way we do it in my team:
- on the local filesystem, write a "coord-template.xml" with a placeholder such as start="%Now%"
- just before submitting, generate the actual "coordinator.xml" with
  sed "s/%Now%/$(date --utc '+%FT%TZ')/" coord-template.xml > coordinator.xml
- upload the coordinator definition to HDFS, then submit it via the Oozie CLI
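For illustration only, the same placeholder substitution can be done with a short Python script instead of sed (file names as above; this is a sketch, not part of Oozie):

    # Generate coordinator.xml from coord-template.xml, replacing the %Now%
    # placeholder with the current UTC time (same effect as the sed one-liner).
    from datetime import datetime, timezone
    from pathlib import Path

    now_utc = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    template = Path("coord-template.xml").read_text()
    Path("coordinator.xml").write_text(template.replace("%Now%", now_utc))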
~~~~~~~~~~~~
Alternative: if you are using a "basic" frequency (not CRON-like scheduling), you may want to try these <controls> to have Oozie create executions for all "past" time slots but discard them immediately:
<throttle>1</throttle>
and/or
<execution>LAST_ONLY</execution>
cf. Oozie 4.x reference
These rules also apply if the Coordinator is suspended then resumed, if the Oozie service gets stopped then restarted, or if YARN has to queue new jobs for a really long time (because the cluster is 100% busy).
Oozie has improved of late, so there's an easier solution available than the currently accepted answer. As of Oozie 4.1, there is a "NONE" execution available. This skips iterations which occur in the past, more or less. Here's the doc snippet:
NONE: Similar to LAST_ONLY except all older materializations are skipped. When NONE is set, an action that is WAITING or READY will be SKIPPED when the current time is more than a certain configured number of minutes (tolerance) past the action's nominal time. By default, the threshold is 1 minute. For example, suppose action 1 and 2 are both WAITING, the current time is 5:20pm, and both actions' nominal times are before 5:19pm. Both actions will become SKIPPED, assuming they don't transition to SUBMITTED (or a terminal state) before then. Another way of thinking about this is to view it as similar to setting the timeout equal to 1 minute, which is the smallest time unit, except that the SKIPPED status doesn't cause the coordinator job to eventually become DONEWITHERROR and can actually become SUCCEEDED (i.e. it's a "good" version of TIMEDOUT).
Oozie 4.1 doc
I have tested this, and it does work with CRON frequencies. It is superior to the LAST_ONLY execution in your case because LAST_ONLY will still run the most recent iteration in the past (with the misaligned time), in addition to current/future iterations.
<execution>NONE</execution>

How to run a per-second cron job every two minutes

I have to set up a cron job on my hosting provider.
This cron job needs to run every second. It's not intensive, just doing a check.
The hosting provider however only allows cron jobs to be run every two minutes. (can't change hosting btw)
So I'm clueless about how to go about this.
My thoughts so far:
If it can only run every two minutes, I need to make it run every second for those two minutes. 1) How do I make my script run for two minutes, executing a function every second?
But it's important that there are no interruptions. 2) I have to ensure that it runs smoothly and remains constantly active.
Maybe I can also try making it run forever, and have the cron job check every two minutes whether it is still running. 3) Is this possible?
My friend mentioned using multithreading to ensure it runs every second. 4) Any comments on this?
Thanks for any advice. I'm using ZF.
Approach #3 is the standard solution. For instance, you can have the job touch a file every time it runs. Then on startup you check whether that file has been touched recently; if it has, exit immediately, otherwise start running. (Other approaches include using file locking, or writing the pid to a file and, on startup, checking whether that pid exists and belongs to the expected program.)
As for the one-second timing, I would suggest calling usleep at the end of each check, supplying the number of microseconds from now until you next want to run. If you just do a regular sleep, you'll actually run less than once a second, because sleeps sometimes last longer than expected and your check itself takes time. As long as your check takes under a second to run, this should work fine.
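To make approach #3 concrete, here is a rough sketch of the pattern in Python (the question uses PHP/ZF, so treat this purely as an illustration; the pidfile path and do_check() are made up):

    # Started by cron every two minutes; exits at once if a previous copy is
    # still looping, otherwise records its pid and runs the check once a second.
    import os
    import sys
    import time

    PIDFILE = "/tmp/every-second-check.pid"  # hypothetical location

    def already_running() -> bool:
        """True if the pidfile names a process that is still alive."""
        try:
            pid = int(open(PIDFILE).read())
            os.kill(pid, 0)   # signal 0 only checks that the process exists
            return True
        except (OSError, ValueError):
            return False

    def do_check() -> None:
        pass  # placeholder for the real once-a-second check

    if already_running():
        sys.exit(0)

    with open(PIDFILE, "w") as f:
        f.write(str(os.getpid()))

    try:
        while True:
            started = time.monotonic()
            do_check()
            # sleep only the remainder of the second so the loop does not drift
            time.sleep(max(0.0, 1.0 - (time.monotonic() - started)))
    finally:
        os.remove(PIDFILE)

With this layout, the two-minute cron entry acts only as a watchdog that restarts the loop if it has died.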
I don't think cron allows second level resolution. http://unixhelp.ed.ac.uk/CGI/man-cgi?crontab+5
field allowed values
----- --------------
minute 0-59
hour 0-23
day of month 1-31
month 1-12 (or names, see below)
day of week 0-7 (0 or 7 is Sun, or use names)
So even if your hosting provider allowed it, cron itself can't run a process that repeats every second. However, you can use a command like watch for repeated execution of your script; see here.
