How to execute same job cocurrently in Spring xd - spring-xd

We have the below requirement,
In spring xd, we have a job lets assume the job name as MyJob
which will be invoked by another process using the rest service of spring xd, lets assume process name as OutsideProcess (non-spring xd process).
OutsideJob invokes MyJob when ever a file added to a location (lets assume FILES_LOC) to which OutsideJob is listening.
In this scenario, lets assume that MyJob takes 5minutes to complete the job.
At 10:00 AM, there is a file copied to FILES_LOC, then OutsideProcess will trigger MyJob immediately. (approximately it will be completed at 10:05AM)
At 10:01 AM, another file copied to FILES_LOC, then OutsideProcess will trigger one more instance of MyJob at 10:01AM. But the second instance is getting queued and starts the execution once the first instance completes its execution (approximately at 10:05AM).
If we invoke the different jobs at the same time they are getting executed concurrenctly, but the same job multiple instances are not getting executed concurrenctly.
Please let me know how can I execute the same job with multiple instances concurrently.
Thanks in advance.

The only thing I can think of is dynamic deployment of the job and triggering it right away. You can use SpringXD Rest template to create the job definition on the fly and launch them after sleeping a few seconds. And make sure you undeploy/destroy the job when the job completes successfully.
Another solution could be to create a few module instances of your job with different names and use them as your slave processes. You can query status of these job module instances and launch the one that is finished or queue the one that is least recently launched.
Remember you can run jobs with partition support if applicable. This way you will finish your job faster and be able to run more jobs.

Related

Identify a spring-batch job instance with incrementer

let me discribe shortly what I want and what I - maybe - know.
I want spring-batch to run a async job; in future more jobs.
The job gets two parameters: an external id and a year.
The job should be able to be restarted after completion because the user wants to run a job with the same parameters again and again.
Only one job should be executed with the same parameters at the same time.
From outside (web interface) it should be possible to query if a job is running by job name and parameters.
The querier could be different from the job starter so an instance or execution id is not present.
I know that a job instance is the representation of the job(name) and the parameters and - like you commented - I cannot rerun a job with the same parameters if the instance/execution is marked completed - except I use a incrementer.
But this changes the parameters by adding a run.id. Now a job is restartable but I and sping-batch itself are not able to identify a running job instance (by name and original parameters) anymore because every job run results in a new instance.
And the question "why would one would restart a successfully completed job instance?" is easy to answer: The user outside don't know about job/instance/execution. The user will start some data processing for a year again and again. And it's my task to make it possible :).
So it would be nice if spring-batch can let the user know "the job with your original parameters is still running".
Question:
What would be a good solution for my needs?
I didn't tried something but thought about it. Maybe I can write an own JobDao for my query? But this will not solve the run-instance-at-same-time problem. Or I can customize the JdbcJobInstanceDao or SimpleJobRepository? Maybe I must add a own job_key which contains only the original parameters?
To correctly understand the answer I am going to give to your question, it is important to know the difference and understand the relation between a job, a job instance and a job execution in Spring Batch. The The Domain Language of Batch section of the reference documentation explains that in details with examples.
The job should be able to be restarted after completion.
This is not possible by design, or more precisely, a job instance cannot be restarted after completion by design (Think of it like "why would one would restart a successfully completed job instance?").
From outside (web interface) it should be possible to query if an instance is running by job name and parameters. There querier could be different from the job starter so an instance or execution id is not present.
The JobExplorer is the API you are looking for. You can ask for job instances and job executions as needed.
Question: What would be a good solution for my needs?
In your case, you receive an external ID and a year as a job execution request. Those two parameters can be used as identifying parameters to define job instances. With this in place, if a job instance is failed, you can restart it by using the same parameters.
I see no need for an incrementer in your case. The incrementer is useful for jobs for which the instances can be defined as a "sequence" that can be "incremented". I see no need to create a custom DAO or JobRepository neither, you should be able to implement your requirement with the built-in components by correctly defining what a job instance is.
For my use-case I have to check if a execution for a job/parameters-combination is running. The parameters here are without run.id of an incrementor. This check must be done before a job run and by explicit rest call. Normally spring-batch checks for running executions but because of the used incrementor every job instance is unique and it will never find any.
So I created a bean with a check method and made use of jobExplorer.findRunningJobExecutions(jobName);. The result can then compared with the used paramters by iterating over JobExecution.getJobParameters().getParameters().
The bean can be used in the rest-method and in an own implemention of JobLauncher.run().
Another solution would be to store the increment separately for a job/parameters-combination. But I don't want to do this not least because I think a framework like spring-batch should do this for me or supports me by reusing/restarting a completed job instance.

How can I ensure a spring quartz job executed sequentially?

I have below trigger config:
SimpleTrigger trigger = TriggerBuilder.newTrigger().startNow()
.withSchedule(SimpleScheduleBuilder.repeatSecondlyForever(5)).build();
And my job could possibly run more than 5 seconds.
Which means there could be chances that 2 job instances run at the same time.
Is there any policy I can enforce on quartz scheduler:
if job is running, do not fire next one even if reach the repeat interval.
Is it doable?
Thanks
Try using this annotation #DisallowConcurrentExecution.This should not allow executing of same instance of a Job
https://www.quartz-scheduler.org/api/2.1.7/org/quartz/DisallowConcurrentExecution.html

Spring Scheduling Quartz and thousands of jobs

According to the business logic of my Spring Boot application with Quartz Scheduling and MongoDB as Job persistent storage, every user of the system can create the postponed job that must be executed at some point in time. The user chooses the time when it must be executed.
Right now I'm thinking about the approach where every user will create a dedicated JobDetail for every postponed job, something like this:
schedulerFactoryBean.getScheduler().addJob(jobDetail(), true, true);
The issue I can potentially see here, that with this approach I can quickly create thousands of jobs in Quartz scheduler. Previously I never scheduled such amount of jobs in Spring Scheduling with Quartz and don't know how the system will handle it. Is it a good idea to implement the system in such way and will Spring Scheduling Quartz handle such amount of jobs without problems?
Yes, Quartz itself can handle thousands of jobs and triggers without any issues.
If you are going to have many jobs executing concurrently, just make sure that you configure Quartz with a sufficient number of worker threads. The number of worker threads should be typically equal to the maximum number of jobs that can be running concurrently + some small buffer (10% or so) just in case.
From what you write I assume that your jobs will be one-off jobs, i.e. each job will be executed only once. If that is the case, Quartz can automatically discard your jobs as soon as they finish executing unless your jobs are marked as durable. Quartz automatically removes non-durable jobs if they are not scheduled to run in the future. This feature may help you reduce the total number of registered jobs.
I hope this helps. If not, please ask.

Spring Boot, Cron job synchronization

In my Spring Boot application, based on the Cron job(runs every 5 minutes) I need to process 2000 products in my database.
Right now the process time of these 2000 products takes more than 5 minutes. I ran into the issue where the second Cron job runs when the first one is not completed yet.
Is there in Spring/Cron out of the box functionality that will allow to synchronize these jobs and wait for the previous job completion before starting the next one?
Please advise how to properly implement such kind of system. Anyway, the following technologies are also available Neo4j, MongoDB, Kafka. Please advise how to properly design/implement this functionality using the Spring/Cron separately or even together with the mentioned technologies.
1) You may try to use #Scheduled(fixedDelay = 5*60*1000). It will guarantee that next invocation will happen strictly in 5 minutes after previous one is finished. But this may break your scheduling requirements
2) You can limit the underlying ThreadExecutor's pool size to 1 thread, so next invocation will have to wait until previous is finished, but this, again, can break the logic, since it would affect all periodic tasks invoked by #Scheduled
3) You can use Quartz instead of spring's native #Scheduled. It's more complicated to configure, but allows to achieve the desired behaviour via #DisallowConcurrentExecution annotation or via setting JobDetail::isConcurrentExectionDisallowed in your job details

Quartz one time job on application startup

I am trying to integrate a Quartz job in my spring application. I got this example from here. The example shows jobs executing at repeated intervals using a simpletrigger and at a specific time using a crontrigger.
My requirement is to run the job only once on application startup. I removed the property repeatInterval, but the application throws an exception :
org.quartz.SchedulerException: Repeat Interval cannot be zero
Is there any way to schedule a job just once ?
Thanks..
Found the answer here
Ignoring the repeatInterval and setting repeatCount = 0 does what I wanted.
Spring SimpleTriggerFactoryBean does the job: if you don't specify the start time, it will set it to 'now'.
Yet I think that long-running one-time job should be considered an anti-pattern, since it will not work even in 2-node cluster: if the node that runs the job goes down, there will be no one that would restart the job.
I prefer to have a job that repeats e.g. every hour, but annotated with #DisallowConcurrentExecution. This way you guarantee that precisely one job will be running, both when the node that originally hosted the job is up, and after it goes down.

Resources