I am using Quartz JDBC Job store (org.quartz.impl.jdbcjobstore.JobStoreTX) and MySQL for scheduling jobs.
I have the following setup:
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource=foo
org.quartz.dataSource.foo.driver=com.mysql.jdbc.Driver
org.quartz.dataSource.foo.URL=jdbc:mysql://localhost:3306/myDB
org.quartz.dataSource.foo.user=user
org.quartz.dataSource.foo.password=*****
org.quartz.dataSource.foo.maxConnections=5
org.quartz.dataSource.foo.validateOnCheckout=true
org.quartz.dataSource.foo.validationQuery=SELECT 1
I am able to schedule a job, and Quartz picks it up from the database when it is due.
There are some jobs that can be scheduled up to 3 or 4 weeks in the future. How do I test this?
Right now I manually change the system time. For example, if I schedule a job to run on 2/5/2013 12:45 PM, I set the system clock to 2/5/2013 12:43 PM and wait a couple of minutes to see whether Quartz picks up the job from the DB. This works fine for me.
I don't want to change the system clock every time I need to test. Is there a better way to do this?
I have noticed that changing the system time frequently sometimes confuses Quartz, so that some jobs are not picked up.
You could use the Quartz TriggerUtils methods to check whether the future executions are as expected.
More specifically, the computeFireTimes(org.quartz.spi.OperableTrigger trigg, Calendar cal, int numTimes) method returns a list of Dates that are the next fire times of a Trigger.
I hope this helps.
I want to schedule a job to start on a specific day at a specific time and then repeat at some interval. I am using the gocron scheduler for this, but I can't find a way to start a job on a specific day. For example, I want to execute a job on 7 Sept 2019 at 3:30 PM, and from 7 Sept onward I want that job to be executed daily or weekly. How can I do that using gocron? Or are there any other packages available?
I tried passing a UTC time to gocron.At(), but it panics because it expects only time formats like "03:30" and does not accept a date.
When looking at the documentation for gocron, it does not seem to be designed to support scheduling things for specific days. It seems to be designed as a way to schedule things to run at various intervals, very similar to what the original cron utility was designed to do. So you would specify "I want this function to get called every 2 hours" or "I want this function to get called every Sunday at 3PM". There does not seem to be any documentation about starting jobs from a specific day.
The At(string) method mentioned in the question is documented as letting you specify a time of day to run something. So you would use it to make your job run at 3:30 PM.
If you wish to specify a start time, you would likely need to find another scheduling library or implement it yourself by creating a goroutine that sleeps until a specific time. The StackOverflow post mentioned by domcyrus looks like an excellent resource for implementing it yourself as well as listing some other scheduling libraries.
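The "sleep until a specific time, then repeat" idea can be sketched as follows. This is a minimal illustration of the approach, written in Java with the JDK's ScheduledExecutorService rather than in Go, and it is not gocron's API; StartAtScheduler, delayUntil and scheduleFrom are hypothetical names.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of gocron or any other library.
class StartAtScheduler {

    // Time to wait from `now` until `start`; zero if `start` has already passed.
    static Duration delayUntil(ZonedDateTime now, ZonedDateTime start) {
        Duration d = Duration.between(now, start);
        return d.isNegative() ? Duration.ZERO : d;
    }

    // Runs `task` for the first time at `start`, then once every `interval`.
    // If `start` is already in the past, the first run happens immediately.
    static ScheduledFuture<?> scheduleFrom(ScheduledExecutorService executor,
                                           ZonedDateTime start,
                                           Duration interval,
                                           Runnable task) {
        ZonedDateTime now = ZonedDateTime.now(start.getZone());
        long initialDelayMs = delayUntil(now, start).toMillis();
        return executor.scheduleAtFixedRate(
                task, initialDelayMs, interval.toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

For the example in the question, start would be 7 Sept 2019 15:30 in the desired time zone and interval would be Duration.ofDays(1) or Duration.ofDays(7). A fixed interval does not follow daylight-saving shifts, so a wall-clock "3:30 PM every day" may need the delay recomputed after each run.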
Laravel is (correctly) running scheduled tasks via the App\Console\Kernel#schedule method. It does this without the need for a persistence layer. Previously run scheduled tasks aren't saved to the database or stored in any way.
How is this "magic" achieved? I want to have a deeper understanding.
I have looked through the source, and I can see it is somewhat achieved by rounding down the current date and diffing that against the schedule frequency. Combined with the fact that the scheduler is required to run every minute, it can say with a certain level of confidence that it should run a task. That is my interpretation, but I still can't fully grasp how it guarantees running on schedule, or how it handles failure or things being off by a few seconds.
EDIT Edit due to clarity issue pointed out in comment.
By "a few seconds" I mean: how does the "round down" method work when the scheduler is run every minute, but not at the same second? Example: first run 00:01.00, 00:01:02, 00:02:04
Maybe to clarify further, and to assist in understanding how it works: are there any boundary guarantees on how it functions? If it is run multiple times per minute, will it execute per-minute tasks multiple times in that minute?
Cron cannot guarantee precision to the second. That is why cron job intervals are generally not shorter than a minute. So, in reality, it doesn't handle "things being off by a few seconds."
What happens in Laravel is this: after the scheduling command is set up, the server asks "Is there a job due?" every minute. If there is none, it does nothing.
For example, take the "daily" cron job. The scheduler doesn't need to know when it last ran the task or anything like that. When it encounters the daily cron job, it simply checks whether it is midnight. If it is midnight, it runs the job.
Also, take the "every thirty minutes" cron job. Maybe you registered the cron job at 10:25. It will still run for the first time at 10:30, not at 10:55. It doesn't care when you registered it or when it last ran. It only checks if the current minute is "00" or divisible by thirty. So it will run at 10:30, again at 11:00, and so on.
Similarly, a ten-minute cron job by default only checks whether the current minute is divisible by ten. So, regardless of when you registered the command, it will run only at XX:00, XX:10, XX:20 and so on.
That is why by default it doesn't need to store previously run scheduled tasks. However, you can log them to a file if you want, for monitoring purposes.
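The stateless check described above can be sketched in a few lines. This is only an illustration of the idea, written in Java rather than PHP; it is not Laravel's actual code, and StatelessDueCheck and its methods are hypothetical names.

```java
import java.time.LocalDateTime;

// Hypothetical names; this mirrors the idea in the answer, not Laravel's source.
class StatelessDueCheck {

    // "Every N minutes": due whenever the current minute is divisible by N.
    // Seconds are ignored, so 10:30:00 and 10:30:04 give the same answer.
    static boolean everyMinutesDue(LocalDateTime now, int n) {
        return now.getMinute() % n == 0;
    }

    // "Daily": due only during the minute 00:00.
    static boolean dailyDue(LocalDateTime now) {
        return now.getHour() == 0 && now.getMinute() == 0;
    }
}
```

Because the decision depends only on the current hour and minute, nothing about earlier runs has to be stored. In this sketch, two invocations within the same minute would both report the task as due.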
According to the business logic of my Spring Boot application, which uses Quartz scheduling with MongoDB as the persistent job store, every user of the system can create a postponed job that must be executed at some point in time. The user chooses the time when it must be executed.
Right now I'm thinking about the approach where every user will create a dedicated JobDetail for every postponed job, something like this:
schedulerFactoryBean.getScheduler().addJob(jobDetail(), true, true);
The potential issue I see here is that with this approach I can quickly create thousands of jobs in the Quartz scheduler. I have never scheduled that many jobs with Spring and Quartz before and don't know how the system will handle it. Is it a good idea to implement the system this way, and will Quartz under Spring handle that many jobs without problems?
Yes, Quartz itself can handle thousands of jobs and triggers without any issues.
If you are going to have many jobs executing concurrently, just make sure that you configure Quartz with a sufficient number of worker threads. The number of worker threads should be typically equal to the maximum number of jobs that can be running concurrently + some small buffer (10% or so) just in case.
From what you write I assume that your jobs will be one-off jobs, i.e. each job will be executed only once. If that is the case, Quartz can automatically discard your jobs as soon as they finish executing unless your jobs are marked as durable. Quartz automatically removes non-durable jobs if they are not scheduled to run in the future. This feature may help you reduce the total number of registered jobs.
I hope this helps. If not, please ask.
I'm setting up Airflow right now and loving it, except for the fact that my dags are perpetually running behind. See the picture below - this was taken on 2/19 at 15:50 UTC, and you can see that for each of the dags, they should have run exactly one more time between the last time they ran and the present time (there are a couple for which this is not true - those ones are currently turned off). Is there some piece of configuration I missed?
False alarm! Airflow just labels execution times differently from how I expected. It turns out that an hourly job that runs at 15:00 is labeled "14:00" and includes data up to 14:00+1:00.
From https://airflow.apache.org/scheduler.html:
Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be trigger soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
Let’s Repeat That: The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
Execution time is the lower bound of the batch.
Ex:
Say your execution schedule is hourly and it's the run corresponding to the 13:00 schedule.
Your execution_time will be 12:00.
This is because we usually run the batch for 12:00 - 13:00 at 13:00 (after the data is available for the batch).
But in my experience, we sometimes want to work with the time the run is actually scheduled for (because we want the run to start and there are checks inside the DAG/job that verify data readiness). In those cases, I just end up using next_execution_time (13:00) instead of execution_time (12:00).
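The arithmetic behind that labeling can be written out as a small sketch. It is plain Java, not Airflow code, and only restates the relationship described above; ExecutionWindow and its methods are hypothetical names that borrow the answer's execution_time and next_execution_time wording.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical helper illustrating the labeling convention; not Airflow's API.
class ExecutionWindow {

    // Label of the run that starts at `runStart`:
    // the beginning of the period that has just ended.
    static LocalDateTime executionTime(LocalDateTime runStart, Duration interval) {
        return runStart.minus(interval);
    }

    // End of the covered period, which is also when the run actually starts.
    static LocalDateTime nextExecutionTime(LocalDateTime executionTime, Duration interval) {
        return executionTime.plus(interval);
    }
}
```

With an hourly interval, the run that starts at 13:00 is labeled 12:00, and adding the interval back to that label gives 13:00 again.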
In my Spring Boot application, I need to process 2000 products in my database from a cron job that runs every 5 minutes.
Right now, processing these 2000 products takes more than 5 minutes. I ran into the issue where the second cron job starts while the first one has not completed yet.
Is there out-of-the-box functionality in Spring/cron that would let me synchronize these jobs and wait for the previous job to complete before starting the next one?
Please advise how to properly implement this kind of system. The following technologies are also available: Neo4j, MongoDB, Kafka. Please advise how to properly design and implement this functionality using Spring/cron alone or together with the mentioned technologies.
1) You may try to use @Scheduled(fixedDelay = 5*60*1000). It guarantees that the next invocation happens 5 minutes after the previous one has finished. But this may break your scheduling requirements.
2) You can limit the underlying executor's thread pool size to 1 thread, so the next invocation has to wait until the previous one has finished. But this, again, can break the logic, since it affects all periodic tasks invoked by @Scheduled.
3) You can use Quartz instead of Spring's native @Scheduled. It is more complicated to configure, but lets you achieve the desired behaviour via the @DisallowConcurrentExecution annotation or via JobDetail::isConcurrentExectionDisallowed in your job details.
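The fixed-delay behaviour from option 1 can be tried out without Spring. The sketch below uses the JDK's ScheduledExecutorService.scheduleWithFixedDelay as a plain-Java analogue of @Scheduled(fixedDelay = ...); it is not Spring code, and FixedDelayDemo and maxConcurrentRuns are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a JDK-only stand-in for a fixed-delay scheduled task.
class FixedDelayDemo {

    // Runs a slow task at least `runs` times, waiting `delayMillis` between the
    // end of one run and the start of the next. Returns the highest number of
    // runs that were ever in progress at the same time.
    static int maxConcurrentRuns(int runs, long workMillis, long delayMillis) {
        // Two threads on purpose: the fixed delay, not the pool size,
        // is what keeps the runs from overlapping here.
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger maxActive = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);

        executor.scheduleWithFixedDelay(() -> {
            int current = active.incrementAndGet();
            maxActive.accumulateAndGet(current, Math::max);
            try {
                Thread.sleep(workMillis); // stand-in for processing the products
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                active.decrementAndGet();
                done.countDown();
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);

        try {
            done.await(); // wait until the task has completed `runs` times
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            executor.shutdownNow();
        }
        return maxActive.get();
    }
}
```

Calling FixedDelayDemo.maxConcurrentRuns(3, 50, 10) makes each run take longer than the delay, yet the runs still never overlap. The trade-off is the one noted in option 1: start times drift, because each run starts a fixed time after the previous one ends rather than on a fixed clock schedule.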