On a project we have to run a job periodically (every 5 minutes on the QA environment at the moment) that processes assignments for 40k users.
We decided to use Spring Batch because it fits the use case well, and implemented it with pretty much the default configuration (e.g. it uses SyncTaskExecutor under the hood).
Okay, so, there is a job that consists of a single step with:
an out-of-the-box HibernatePagingItemReader
a custom ItemProcessor that performs lightweight calculations in memory
a custom ItemWriter that persists data to the same PostgreSQL database via several JPQL and native queries.
The job itself is scheduled with @EnableScheduling and is triggered every 5 minutes by a cron expression:
@Scheduled(cron = "${job.assignment-rules}")
void processAssignments() {
    try {
        log.debug("Running assignment processing job");
        jobLauncher.run(assignmentProcessingJob, populateJobParameters());
    } catch (JobExecutionException e) {
        log.error("Job processing has failed", e);
    }
}
Here is a cron expression from application.yml:
job:
  assignment-rules: "0 0/5 * * * *"
The problem is that the job stops being scheduled after several runs (a different number of runs each time). Let's take a look at the Spring Batch schema:
select ex.job_instance_id, ex.create_time, ex.start_time, ex.end_time, ex.status, ex.exit_code, ex.exit_message
from batch_job_execution ex inner join batch_job_instance bji on ex.job_instance_id = bji.job_instance_id
order by start_time desc, job_instance_id desc;
And then silence. Nothing special in logs.
The only thing I can think of that might be relevant is that there are two more jobs running on that instance, one of which is time-consuming because it sends emails via SMTP.
The entire jobs schedule is:
job:
  invitation-email: "0 0/10 * * * *"
  assignment-rules: "0 0/5 * * * *"
  rm-subordinates-count: "0 0/30 * * * *"
Colleagues, could anybody point me toward a way to troubleshoot this problem?
Thanks a lot in advance
Using the default SyncTaskExecutor to launch jobs is not safe in your use case, as all jobs will be executed by a single thread. If one of the jobs takes more than 5 minutes to run, subsequent jobs will pile up and eventually fail to start.
I would configure a JobLauncher with an asynchronous TaskExecutor implementation (like org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor) in your use case. You can find an example in the Configuring a JobLauncher section of the reference documentation (see "Figure 3. Asynchronous Job Launcher Sequence").
Apart from the task executor used by Spring Batch's JobLauncher, you also need to make sure the right task executor is used by Spring Boot to schedule tasks (since you are using @EnableScheduling). Please refer to the Task Execution and Scheduling section for more details.
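For illustration, here is a minimal sketch of such a configuration; the bean names and pool sizes are my own assumptions, not taken from your post, and recent Spring Batch versions rename SimpleJobLauncher to TaskExecutorJobLauncher:

import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class AsyncJobLauncherConfig {

    @Bean
    public ThreadPoolTaskExecutor batchTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(2);  // hypothetical sizing, tune to your jobs
        executor.setMaxPoolSize(4);
        executor.setThreadNamePrefix("batch-job-");
        return executor;
    }

    @Bean
    public SimpleJobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
        SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository(jobRepository);
        // Launch jobs on the pool above instead of the calling (scheduler) thread
        launcher.setTaskExecutor(batchTaskExecutor());
        launcher.afterPropertiesSet();
        return launcher;
    }
}

Note that Spring Boot's scheduler for @Scheduled methods also defaults to a single thread; if you are on Spring Boot 2.1+, the spring.task.scheduling.pool.size property raises that pool size so your three cron jobs cannot block each other's triggers.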
Related
In Laravel, in my Kernel, I have:
protected $commands = [
    Commands\SendRenewEmails::class,
];

/**
 * Define the application's command schedule.
 *
 * @param  \Illuminate\Console\Scheduling\Schedule  $schedule
 * @return void
 */
protected function schedule(Schedule $schedule)
{
    // $schedule->command('inspire')
    //          ->hourly();

    $schedule->command('renew:emails')
             ->daily();
}
The renew:emails command works as intended if I run it manually through Artisan.
And in my crontab I have:
* */8 * * * cd /path-to-my-project && php artisan schedule:run >> /dev/null 2>&1
I have set this to run every 8th hour instead of every minute (* * * * *), since this is live for testing and I wanted to ensure that the task wasn't run every minute.
So how does Laravel know when to run the daily job in the kernel, and when does this happen?
From this setup (which seems to be the basic setup for cron jobs in Laravel, except running every 8th hour instead of every minute), there are no logs (that I can see) and no table in the DB to keep track of this.
So if I were to set my cron to * * * * *, how does Laravel know not to run the scheduled job every minute, just because I have put ->daily(); at the end of the job?
And when I use daily();, at what specific time is that? And at what specific time is hourly();?
TL;DR:
How does Laravel know not to run the same jobs again when it is not supposed to, for example with the daily(); rule? Where is this information stored? How can I be certain that a job with the daily(); rule won't run every minute if my cron job runs php artisan schedule:run every minute?
Under the hood, the Laravel Scheduler uses https://github.com/dragonmantank/cron-expression to determine whether a command or job is scheduled to run at the given minute schedule:run is called.
Each task you schedule translates to a cron expression, which is then passed into the package. A method called isDue is then run against that expression to determine whether or not the task should run. So, if you set a task to run hourly, isDue will yield true at the top of the hour, and Laravel will execute the task within that cron cycle.
As such, the information does not need to be stored anywhere, as determination is done on the fly.
This might also lead you to wonder what happens if you have a long-running task that takes longer than the interval. This is where withoutOverlapping comes into the picture. When called, it creates what is known as a mutex, which is similar to a 'lock' of sorts (see What is a mutex? for more information), when the task first runs. If a mutex already exists for a particular task on subsequent cycles, it means that task is currently running in another cycle and should not be triggered again in this one.
Where are mutexes stored? Simple: Laravel stores them in a cache, and when a mutexed task is finished, the mutex is removed from the cache. And so the cycle continues.
I could go into much further detail here, but I think this answers your question for the most part.
In our project we need to retrieve prices from a remote FTP server. During office hours this works fine: prices are retrieved and processed successfully. After office hours no new prices are published on the FTP server, so, as expected, we don't find anything new.
Our problem is that after a few hours of not finding new prices, the poller just stops polling. There are no errors in the log files (even when running org.springframework.integration at debug level) and no exceptions. We are now using a separate TaskExecutor to isolate the issue, but the poller still just stops. In the meantime we adjusted the cron expression to match those hours in order to limit resource use, but the poller still stops when it is supposed to run.
Any help to troubleshoot this issue is very much appreciated!
We use an @InboundChannelAdapter on an FtpStreamingMessageSource, which is configured like this:
@Bean
@InboundChannelAdapter(
    value = FTP_PRICES_INBOUND,
    poller = [Poller(
        maxMessagesPerPoll = "\${ftp.fetch.size}",
        cron = "\${ftp.poll.cron}",
        taskExecutor = "ftpTaskExecutor"
    )],
    autoStartup = "\${ftp.fetch.enabled:false}"
)
fun ftpInboundFlow(
    @Value("\${ftp.remote.prices.dir}") pricesDir: String,
    @Value("\${ftp.remote.prices.file.pattern}") remoteFilePattern: String,
    @Value("\${ftp.fetch.size}") fetchSize: Int,
    @Value("\${ftp.fetch.enabled:false}") fetchEnabled: Boolean,
    clock: Clock,
    remoteFileTemplate: RemoteFileTemplate<FTPFile>,
    priceParseService: PriceParseService,
    ftpFilterOnlyFilesFromMaxDurationAgo: FtpFilterOnlyFilesFromMaxDurationAgo
): FtpStreamingMessageSource {
    val messageSource = FtpStreamingMessageSource(remoteFileTemplate, null)
    messageSource.setRemoteDirectory(pricesDir)
    messageSource.maxFetchSize = fetchSize
    messageSource.setFilter(
        inboundFilters(
            remoteFilePattern,
            ftpFilterOnlyFilesFromMaxDurationAgo
        )
    )
    return messageSource
}
The property values are:
poll.cron: "*/30 * 4-20 * * MON-FRI"
fetch.size: 10
fetch.enabled: true
We limited the poll.cron expression; we used to retrieve every minute.
In the related DefaultFtpSessionFactory, the timeouts are set to 60 seconds to override the default value of -1 (which means no timeout at all):
sessionFactory.setDataTimeout(timeOut)
sessionFactory.setConnectTimeout(timeOut)
sessionFactory.setDefaultTimeout(timeOut)
Maybe my answer seems a bit too easy, but it is because your cron expression states that the job should only be scheduled between 4:00 and 20:00 (the 4-20 in the hours field). After 8:00 PM it will not schedule the job anymore, and it will start polling again at 4:00 AM.
It turned out that the processing took longer than the scheduled interval, so while one poll was still processing, the next one was already being executed. Eventually multiple tasks were trying to accomplish the same thing.
We solved this by using a fixedDelay on the poller instead of a fixedRate.
The difference is that fixedRate schedules at a regular interval regardless of whether the previous task has finished, while fixedDelay schedules the next run only after a fixed delay following the completion of the previous task.
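For illustration, a minimal annotation-based sketch of that difference in Java (the channel name, directory, and values are placeholders, not taken from the original configuration):

import org.apache.commons.net.ftp.FTPFile;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.annotation.InboundChannelAdapter;
import org.springframework.integration.annotation.Poller;
import org.springframework.integration.file.remote.RemoteFileTemplate;
import org.springframework.integration.ftp.inbound.FtpStreamingMessageSource;

// fixedRate would fire every 30s regardless of whether the previous poll is still busy;
// fixedDelay waits 30s after the previous poll completes, so long polls can never overlap.
@Bean
@InboundChannelAdapter(
        value = "ftpPricesInbound",
        poller = @Poller(fixedDelay = "30000", maxMessagesPerPoll = "10"))
public FtpStreamingMessageSource ftpSource(RemoteFileTemplate<FTPFile> template) {
    FtpStreamingMessageSource source = new FtpStreamingMessageSource(template, null);
    source.setRemoteDirectory("/prices"); // placeholder directory
    return source;
}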
I want to design a scheduler as a service using Spring Boot. My scheduler should be generic so that other microservices can use it as they want.
I tried the normal Spring Boot examples.
/**
 * This scheduler runs every 20 seconds.
 */
@Scheduled(fixedRate = 20 * 1000, initialDelay = 5000)
public void scheduleTaskWithInitialDelay() {
    logger.info("Fixed Rate Task With Initial Delay 20 Seconds:: Execution Time - "
            + dateTimeFormatter.format(LocalDateTime.now()));
}

/**
 * This scheduler runs every 10 seconds.
 */
@Scheduled(fixedRate = 10 * 1000, initialDelay = 5000)
public void scheduleTaskWithInitialDelay1() {
    logger.info("Fixed Rate Task With Initial Delay 10 Seconds:: Execution Time - "
            + dateTimeFormatter.format(LocalDateTime.now()));
}
You need to store the other microservices' requests to schedule something in your persistent store. That way you have an inventory of which microservice requested the scheduling service, and with what delay, cron expression, or other trigger.
Now you can read all the requested configurations from the database and start a scheduler for each of them (see the sketch after the list below).
This is a common use case in enterprise applications when people choose to write custom code.
Your database table should contain all the details plus what to do when the scheduler reaches the given time (push data or an event to some URL, or something else).
Some technical details: your schedule service should allow you to:
- Add a schedule
- Start/stop/update an existing schedule
- Run a callback or some other operation when the scheduled time is reached
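A minimal sketch of that idea, assuming a hypothetical ScheduleRequest entity and ScheduleRequestRepository (these names and getters are made up for illustration, not from any library):

import java.util.List;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.scheduling.support.CronTrigger;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class SchedulerService {

    private final ThreadPoolTaskScheduler taskScheduler;
    private final ScheduleRequestRepository repository; // hypothetical JPA repository
    private final RestTemplate restTemplate = new RestTemplate();

    public SchedulerService(ThreadPoolTaskScheduler taskScheduler,
                            ScheduleRequestRepository repository) {
        this.taskScheduler = taskScheduler;
        this.repository = repository;
    }

    /** Read every stored request and register a trigger for it. */
    public void startAll() {
        List<ScheduleRequest> requests = repository.findAll();
        for (ScheduleRequest request : requests) {
            taskScheduler.schedule(
                    // On each firing, call back the URL the owning microservice registered
                    () -> restTemplate.postForLocation(request.getCallbackUrl(), request.getPayload()),
                    new CronTrigger(request.getCronExpression()));
        }
    }
}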
Hope this helps.
I have a method annotated with @Scheduled with a cron of */15 * * * * ? (run every 15 seconds).
Sometimes this process takes more than 15 seconds to run.
Is there any way to avoid the call to the @Scheduled method if it is already running?
My current workaround is a flag field in the class that signals whether the process is running; if it is set, the code exits before executing the main logic.
I think that's already the case: if the first job hasn't finished, the second one will not start.
See:
How to prevent overlapping schedules in Spring?
If it isn't working, you can also use an AtomicBoolean to check whether the process should be started or not.
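A minimal sketch of that AtomicBoolean guard (the class name and cron value are placeholders):

import java.util.concurrent.atomic.AtomicBoolean;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class AssignmentTask {

    private final AtomicBoolean running = new AtomicBoolean(false);

    @Scheduled(cron = "*/15 * * * * *")
    public void process() {
        // Only enter if no other invocation is currently running
        if (!running.compareAndSet(false, true)) {
            return; // previous run still in progress, skip this trigger
        }
        try {
            doWork();
        } finally {
            running.set(false); // always release the flag
        }
    }

    private void doWork() {
        // long-running processing goes here
    }
}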
We are running a Spring 3.0.x web application (.war) with a nightly @Scheduled job in a clustered WebLogic 10.3.4 environment. However, as the application is deployed to each node (using the deployment wizard in the AdminServer's web console), the job is started on each node every night, thus running multiple times concurrently.
How can we prevent this from happening?
I know that libraries like Quartz allow coordinating jobs in a clustered environment by means of a database lock table, or I could even implement something like this myself. But since this seems to be a fairly common scenario, I wonder whether Spring already comes with an option to easily circumvent this problem without having to add new libraries to my project or put in manual workarounds.
We are not able to upgrade to Spring 3.1 with configuration profiles, as mentioned here.
Please let me know if there are any open questions. I also asked this question on the Spring Community forums. Thanks a lot for your help.
We only have one task that sends a daily summary email. To avoid extra dependencies, we simply check whether the hostname of each node corresponds to a configured system property.
private boolean isTriggerNode() {
    try {
        String triggerHostname = System.getProperty("trigger.hostname");
        String hostName = InetAddress.getLocalHost().getHostName();
        return hostName.equals(triggerHostname);
    } catch (UnknownHostException e) {
        return false; // cannot resolve the local hostname, treat as non-trigger node
    }
}

public void execute() {
    if (isTriggerNode()) {
        // send email
    }
}
We are implementing our own synchronization logic using a shared lock table in the application database. This allows all cluster nodes to check whether a job is already running before actually starting it themselves.
Be careful: with the solution of implementing your own synchronization logic using a shared lock table, you still have the concurrency issue of two cluster nodes reading/writing the table at the same time.
It is best to perform the following steps in one DB transaction:
- read the value in the shared lock table
- if no other node holds the lock, take the lock
- update the table to indicate that you have taken the lock
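A minimal sketch of those three steps using JdbcTemplate, assuming a hypothetical job_lock table with job_name and locked_by columns and one pre-inserted row per job; the SELECT ... FOR UPDATE keeps the read-check-write sequence atomic within the transaction:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class JobLockDao {

    private final JdbcTemplate jdbcTemplate;

    public JobLockDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    /** Returns true if this node acquired the lock, false if another node already holds it. */
    @Transactional
    public boolean tryAcquire(String jobName, String nodeName) {
        // 1. Read the current lock holder, locking the row so no other node can read it concurrently
        String lockedBy = jdbcTemplate.queryForObject(
                "select locked_by from job_lock where job_name = ? for update",
                String.class, jobName);
        // 2. If another node holds the lock, back off
        if (lockedBy != null) {
            return false;
        }
        // 3. Record that this node now holds the lock
        jdbcTemplate.update(
                "update job_lock set locked_by = ? where job_name = ?",
                nodeName, jobName);
        return true;
    }
}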
I solved this problem by making one of the boxes the master.
Basically, set an environment variable on one of the boxes, like master=true,
and read it in your Java code through System.getenv("master").
If it is present and true, then run your code.
A basic snippet:
@Scheduled(cron = "...") // keep your existing trigger here
void process() {
    boolean master = Boolean.parseBoolean(System.getenv("master"));
    if (master) {
        // your logic
    }
}
You can try using the TimerManager (Job Scheduler in a clustered environment) from WebLogic as the TaskScheduler implementation (TimerManagerTaskScheduler). It should work in a clustered environment.
Andrea
I've recently implemented a simple annotation library, dlock, to execute a scheduled task only once over multiple nodes. You can simply do something like below.
@Scheduled(cron = "59 59 8 * * *" /* Every day at 8:59:59am */)
@TryLock(name = "emailLock", owner = NODE_NAME, lockFor = TEN_MINUTE)
public void sendEmails() {
    List<Email> emails = emailDAO.getEmails();
    emails.forEach(email -> sendEmail(email));
}
See my blog post about using it.
You don't need to synchronize your job start using a DB.
On a WebLogic application you can get the instance name the application is running on:
String serverName = System.getProperty("weblogic.Name");
Simply put in a condition to execute the job:
if (serverName.equals(".....")) {
    // execute my job
}
If you want to bounce the job from one machine to the other, you can take the current day of the year: if it is odd, execute the job on one machine; if it is even, execute it on the other.
This way you load a different machine every day.
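A minimal sketch of that day-parity idea (the server names are placeholders):

import java.time.LocalDate;

void processIfMyTurn() {
    boolean oddDay = LocalDate.now().getDayOfYear() % 2 == 1;
    String serverName = System.getProperty("weblogic.Name");

    // "server-1" runs the job on odd days of the year, "server-2" on even days
    if ((oddDay && "server-1".equals(serverName)) || (!oddDay && "server-2".equals(serverName))) {
        // execute my job
    }
}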
We can make the other machines in the cluster not run the batch job by using the following cron string. It will not run until 2099.
0 0 0 1 1 ? 2099