Spring Batch: A job instance already exists

I know this has been asked before, but I still can't find a definite answer to my question. I am using Spring Batch to export data to a Solr search server. The job needs to run every minute so that I can export all the updates. The first execution passes, but the second one fails with:
2014-10-02 20:37:00,022 [defaultTaskScheduler-1] ERROR: catching
org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={catalogVersionPK=3378876823725152,
type=UPDATE}. If you want to run this job again, change the parameters.
at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:126)
at
Of course I can add a date-time parameter to the job like this:
.addLong("time", System.currentTimeMillis())
and then the job can be run more than once. However, I also want to query the last execution of the job, so I have code like this:
DateTime endTime = new DateTime(0);
JobExecution je = jobRepository.getLastJobExecution("searchExportJob", new JobParametersBuilder().addLong("catalogVersionPK", catalogVersionPK).addString("type", type).toJobParameters());
if (je != null && je.getEndTime() != null) {
endTime = new DateTime(je.getEndTime());
}
and this returns nothing, because I didn't provide the time parameter. So it seems I can either run the job once and get the last execution time, or run it multiple times and not get the last execution time. I am really stuck :(

Assumption
Spring Batch uses a set of tables to store each executed job together with its parameters.
If you run the job twice with the same parameters, the second run fails, because a job instance is identified by its job name and its parameters.
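To make the identity rule concrete, here is a toy in-memory model of it. It is only a sketch: JobIdentityModel, instanceKey, run and params are hypothetical names invented for illustration, and nothing here calls Spring Batch itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class JobIdentityModel {
    // Keys of instances that already completed (toy stand-in for the BATCH_* tables).
    private final Set<String> completed = new HashSet<>();

    // An instance is identified by the job name plus its parameters.
    // TreeMap gives the parameters a stable order inside the key.
    static String instanceKey(String jobName, Map<String, Object> params) {
        return jobName + new TreeMap<>(params);
    }

    // Returns false in the situation where Spring Batch would throw
    // JobInstanceAlreadyCompleteException.
    boolean run(String jobName, Map<String, Object> params) {
        return completed.add(instanceKey(jobName, params));
    }

    // Hypothetical helper that builds the parameter map; time may be null.
    static Map<String, Object> params(String type, Long time) {
        Map<String, Object> p = new HashMap<>();
        p.put("type", type);
        if (time != null) {
            p.put("time", time);
        }
        return p;
    }

    public static void main(String[] args) {
        JobIdentityModel repo = new JobIdentityModel();
        System.out.println(repo.run("searchExportJob", params("UPDATE", null))); // first run
        System.out.println(repo.run("searchExportJob", params("UPDATE", null))); // same key again
        System.out.println(repo.run("searchExportJob", params("UPDATE", 1L)));   // extra parameter
    }
}
```

The third call is accepted only because the extra parameter produces a different key, which is also why a lookup that omits that parameter no longer finds the instance.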
Solution 1
You can keep the JobExecution that is returned when you launch the job.
JobExecution execution = jobLauncher.run(job, new JobParameters());
.....
// Use a JobExecutionDao to retrieve the JobExecution by ID
JobExecution ex = jobExecutionDao.getJobExecution(execution.getId());
Solution 2
You could implement a custom JobExecutionDao that runs its own query against the BATCH_JOB_EXECUTION table to find your JobExecution.
See the Spring reference documentation.
I hope my answer is helpful to you.

Use the Job Explorer as suggested by Luca Basso Ricci.
Because you do not know the job parameters, you need to look up by instance:
- Look for the last instance of the job named searchExportJob.
- Look for the last execution of that instance.
This way you use only the Spring Batch API.
// A count of 1 is enough: job instances are ordered by instance id, descending,
// so the first one returned is the last instance.
List<JobInstance> instances = jobExplorer.getJobInstances("searchExportJob", 0, 1);
if (!instances.isEmpty()) {
    JobInstance lastInstance = instances.get(0);
    // Job executions are ordered by execution id, descending,
    // so the first result is the last execution.
    List<JobExecution> jobExecutions = jobExplorer.getJobExecutions(lastInstance);
    if (!jobExecutions.isEmpty()) {
        JobExecution je = jobExecutions.get(0);
        if (je.getEndTime() != null) {
            endTime = new DateTime(je.getEndTime());
        }
    }
}
Note that this code only works with Spring Batch 2.2.x and above; in 2.1.x the API was somewhat different.

There is another interface you can use: JobExplorer
From its javadoc:
Entry point for browsing executions of running or historical jobs and
steps. Since the data may be re-hydrated from persistent storage, it
may not contain volatile fields that would have been present when the
execution was active

If you are debugging your batch job and terminate it before it completes, you will get this error when you try to start it again.
To start it again, either change the name of your job so that a new execution id is created,
or update the tables below:
BATCH_JOB_EXECUTION
BATCH_STEP_EXECUTION
You need to set the STATUS and END_TIME columns to non-null values.

Create a new job run id every time.
If your code creates the Job object using JobBuilderFactory, the snippet below may be useful:
return jobBuilderFactory
        .get("someJobName")
        .incrementer(new RunIdIncrementer()) // the key part: a new run id for every launch
        .flow( // and here
                stepBuilderFactory
                        .get("someTaskletStepName")
                        .tasklet(tasklet) // you can replace this with your own step
                        .allowStartIfComplete(true) // lets the step run again even if it completed in the last run
                        .build())
        .end()
        .build();

Related

Updating a QuartzJob from the running job itself

Updating a Quartz job within a Spring Boot application works while the job is not running (here or here). The Spring property spring.quartz.overwrite-existing-jobs: true is set.
However, when I do the same from within a running job, the job keeps firing itself in an endless loop and ignores the interval (it fires again every few milliseconds). I even tried doing the same from within a TriggerListener, but that doesn't change anything.
As a code example, I have nothing more than what is given in the second link above:
// retrieve the trigger
Trigger oldTrigger = sched.getTrigger(triggerKey("oldTrigger", "group1"));
// obtain a builder that would produce the trigger
TriggerBuilder tb = oldTrigger.getTriggerBuilder();
// update the schedule associated with the builder, and build the new trigger
// (other builder methods could be called, to change the trigger in any desired way)
Trigger newTrigger = tb.withSchedule(simpleSchedule()
        .withIntervalInSeconds(10)
        .withRepeatCount(10))
        .build();
sched.rescheduleJob(oldTrigger.getKey(), newTrigger);
Did anyone try that from within a running job?
It works with the following trigger. The startAt call is what makes the difference; without it, the trigger fires again immediately.
Trigger trigger = newTrigger()
.withIdentity(triggerName, groupname)
.startAt(Date.from(LocalDateTime.now().plusSeconds(intervalInSeconds).atZone(ZoneId.systemDefault()).toInstant()))
.withSchedule(SimpleScheduleBuilder.simpleSchedule()
.withIntervalInSeconds(intervalInSeconds)
.repeatForever()
.withMisfireHandlingInstructionIgnoreMisfires())
.build();
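The start time passed to startAt above can be computed more directly. The sketch below uses only java.time; FirstFireTime and its two methods are hypothetical names for illustration. Adding the seconds to an Instant avoids the round trip through local wall-clock time, which, as far as I can tell, can shift the result around a daylight-saving change.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Date;

public class FirstFireTime {
    // Same conversion as in the trigger above:
    // local wall-clock time -> system zone -> Instant -> Date.
    static Date viaLocalDateTime(LocalDateTime now, int intervalInSeconds) {
        return Date.from(now.plusSeconds(intervalInSeconds)
                .atZone(ZoneId.systemDefault())
                .toInstant());
    }

    // Shorter variant: add the seconds directly on the Instant timeline.
    static Date viaInstant(Instant now, int intervalInSeconds) {
        return Date.from(now.plusSeconds(intervalInSeconds));
    }

    public static void main(String[] args) {
        int intervalInSeconds = 10;
        Date startAt = viaInstant(Instant.now(), intervalInSeconds);
        System.out.println("first fire time: " + startAt);
    }
}
```

The resulting Date can be passed to startAt(...) in place of the longer expression. Quartz also has a DateBuilder.futureDate helper that serves the same purpose.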

Batch or Chain for jobs inside jobs

I have job A, which downloads XML and then calls another job B, which creates data in the database. Job B is called in a loop, possibly for more than 10,000 items. I first tried the chain method, but the problem is that if someone calls the queue in the wrong sequence, it will not work. Then I tried batches from the new Laravel 8. Collecting all jobs (more than 10,000) into one batch can cause an out-of-memory exception. Another problem is calling job C at the end. This job updates some credentials, which is why jobs A and B must have run successfully first. Is there a good approach for this situation?
Laravel's job batching feature allows you to easily execute a batch of jobs and then perform some action when the batch of jobs has completed executing.
If you have an out-of-memory problem with job batching, you are doing something wrong. Since queued jobs are executed one by one (if you have it configured that way), there should be no problem even with more than 100k records. So make sure you create one job per item and execute the action; you won't have problems with this.
Then, you could do something like this.
$chain = [
new ProcessPodcast(Podcast::find(1)),
new ProcessPodcast(Podcast::find(2)),
new ProcessPodcast(Podcast::find(3)),
new ProcessPodcast(Podcast::find(4)),
new ProcessPodcast(Podcast::find(5)),
// ...
// And so on for all your items.
// This should be generated by a foreach over all the items.
];
Bus::batch($chain)->then(function (Batch $batch) {
// All jobs completed successfully...
// Update some credentials...
})->catch(function (Batch $batch, Throwable $e) {
// First batch job failure detected...
})->finally(function (Batch $batch) {
// The batch has finished executing...
})->dispatch();

Is it possible to lock some entries in MongoDB and run a query that does not take the locked records into account?

I have a MongoDB database that contains a list of tasks, and two executor instances. The two executors have to read a task from the DB, save it in the state "IN_EXECUTION" and execute it. Of course I do not want both executors to execute the same task, and this is my problem.
I use a transactional query. This way, when an executor tries to change the state of the task, it gets a write exception and has to start again and read a new task. The problem with this approach is that sometimes an executor gets a lot of errors before it can save the state change correctly and execute a new task. So it is as if I had only one executor.
Note:
- I do not want to block my entire DB on read/write, because that would slow down the entire process.
- I think it is necessary to save the state of the task, because it could be a long task.
I asked whether it is possible to lock only certain records and run a query on the "not-locked" records, but any advice that solves my problem will be really appreciated.
Thanks in advance.
EDIT1:
Sorry, I simplified the concept in the question above. Actually I extract n messages that I have to send. I have to send these messages in blocks of 100, so my executors basically split the extracted messages into blocks of 100 and pass them on to other executors.
Each executor extracts the messages and then updates them with the new state. I hope this is clearer now.
@Transactional(readOnly = false, propagation = Propagation.REQUIRED)
public List<PushMessageDB> assignPendingMessages(int limitQuery, boolean sortByClientPriority,
LocalDateTime now, String senderId) {
final List<PushMessageDB> messages = repositoryMessage.findByNotSendendAndSpecificError(limitQuery, sortByClientPriority, now);
long count = repositoryMessage.updateStateAndSenderId(messages, senderId, MessageState.IN_EXECUTION);
return messages;
}
DB update:
public long updateStateAndSenderId(List<String> ids, String senderId, MessageState messageState) {
Query query = new Query(Criteria.where(INTERNAL_ID).in(ids));
Update update = new Update().set(MESSAGE_STATE, messageState).set(SENDER_ID, senderId);
return mongoTemplate.updateMulti(query, update, PushMessageDB.class).getModifiedCount();
}
You will have to do the locking one-by-one.
Trying to lock 100 records at once and at the same time have a second process also lock 100 records (without any coordination between the two) will almost certainly result in an overlapping set unless you have a huge selection of available records.
Depending on your application, having all work done by one thread (and the other being just a "hot standby") may also be acceptable as long as that single worker does not get overloaded.
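Claiming records one by one can be sketched with an in-memory analogy. This is not MongoDB code: ClaimOneByOne, claimNext and claimBatch are hypothetical names, and the ConcurrentHashMap stands in for the collection. In MongoDB the comparable primitive would be an atomic single-document findAndModify whose filter requires the state to still be pending.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ClaimOneByOne {
    enum State { PENDING, IN_EXECUTION }

    // Stand-in for the task collection: task id -> state.
    private final ConcurrentMap<String, State> tasks = new ConcurrentHashMap<>();

    ClaimOneByOne(List<String> ids) {
        for (String id : ids) {
            tasks.put(id, State.PENDING);
        }
    }

    // Claim a single pending task. replace(key, expected, updated) succeeds only
    // if the state is still PENDING, so two executors cannot claim the same id.
    Optional<String> claimNext() {
        for (String id : tasks.keySet()) {
            if (tasks.replace(id, State.PENDING, State.IN_EXECUTION)) {
                return Optional.of(id);
            }
        }
        return Optional.empty();
    }

    // Build a block of up to `limit` tasks by claiming them one at a time.
    List<String> claimBatch(int limit) {
        List<String> claimed = new ArrayList<>();
        while (claimed.size() < limit) {
            Optional<String> id = claimNext();
            if (!id.isPresent()) {
                break; // nothing left to claim
            }
            claimed.add(id.get());
        }
        return claimed;
    }

    public static void main(String[] args) {
        ClaimOneByOne store = new ClaimOneByOne(List.of("m1", "m2", "m3", "m4", "m5"));
        System.out.println("executor A: " + store.claimBatch(3));
        System.out.println("executor B: " + store.claimBatch(3));
    }
}
```

A failed claim just moves on to the next candidate instead of aborting a whole transaction, so an executor that loses a race does not have to start over.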

Quartz .NET - Prevent parallel Job Execution

I am using Quartz .NET for job scheduling.
So I created one job class (implementing IJob).
public class TransferData : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        string tableName = context.JobDetail.JobDataMap.GetString("table");
        // Transfer the table here.
        return Task.CompletedTask;
    }
}
So I want to transfer different and multiple tables. For this purpose I am doing something like this:
foreach (Table table in tables)
{
IJobDetail job = JobBuilder.Create<TransferData>()
.WithIdentity(new JobKey(table.Name, "table_transfer"))
.UsingJobData("table", table.Name)
.Build();
ITrigger trigger = TriggerBuilder.Create()
.WithIdentity(new TriggerKey("trigger_" + table.Name, "table_trigger"))
.WithCronSchedule("0 0/5 * * * ?") // Quartz cron expressions start with a seconds field
.ForJob(job)
.Build();
await this.scheduler.ScheduleJob(job, trigger);
}
So every table should be transferred every 5 minutes. To achieve this I create several jobs with different job names.
The question is: how can I prevent parallel execution of jobs with the same job name? (E.g. the previous run takes longer for one table, so I do not want to start the next transfer for the same table.)
I know about the DisallowConcurrentExecution attribute, but I understood it to prevent parallel execution for the same job class. I do not want to write an extra job class per table, because the "main" code for the transfer is always the same; the one and only difference is the table name. So I want to use the same job class for this purpose.
The Quartz .NET documentation is a little bit confusing.
DisallowConcurrentExecution is an attribute that can be added to the
Job class that tells Quartz not to execute multiple instances of a
given job definition (that refers to the given job class)
concurrently. Notice the wording there, as it was chosen very
carefully. In the example from the previous section, if
“SalesReportJob” has this attribute, than only one instance of
“SalesReportForJoe” can execute at a given time, but it can execute
concurrently with an instance of “SalesReportForMike”. The constraint
is based upon an instance definition (JobDetail), not on instances of
the job class. However, it was decided (during the design of Quartz)
to have the attribute carried on the class itself, because it does
often make a difference to how the class is coded.
Source: https://www.quartz-scheduler.net/documentation/quartz-3.x/tutorial/more-about-jobs.html
But the API documentation says (the bold text is important!):
An attribute that marks a IJob class as one that must not have
multiple instances executed concurrently (where instance is based-upon
a IJobDetail definition - or in other words based upon a JobKey).
Source: https://quartznet.sourceforge.io/apidoc/3.0/html/
In other words: the DisallowConcurrentExecution attribute works for my purposes.

Measure Hadoop job time using JobControl

I used to launch my Hadoop job with the following
long start = new Date().getTime();
boolean status = job.waitForCompletion(true);
long end = new Date().getTime();
This way I could measure the time taken by the job once it ends directly in my code.
Now I have to use the JobControl in order to express dependencies between my jobs:
JobControl jobControl = new JobControl("MyJob");
jobControl.addJob(job1);
jobControl.addJob(job2);
job3.addDependingJob(job2);
jobControl.addJob(job3);
jobControl.run();
However, once jobControl.run() has been called, the code never gets any further, so I cannot include code to poll jobControl.getState() for the completion of the job.
How can I measure the time taken by a job using JobControl?
JobControl has no nice functionality that lets you hook in and get this information. You have some (potentially painful) options to try:
- Start JobControl.run() in a separate thread and, in your main thread, poll the JobControl.getXXXJobs() methods to track when jobs change state.
- Look into using the Job End Notification URL hook, but this will require you to start a 'server' in your client to receive the notification events, and then try to work backwards from when a job ends.
- Extend the JobControl and jobcontrol.Job objects to track when a job changes state, and add methods to query the start / end times.
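The first option could look roughly like the sketch below. To keep it runnable without Hadoop on the classpath it uses FakeJobControl, a hypothetical stand-in that mimics only the parts relied on here: it is a Runnable whose run() loops until stop() is called, and it exposes allFinished(). With the real JobControl you would pass that object to the Thread instead; treat the exact calls as an assumption to check against your Hadoop version.

```java
public class TimedJobControl {
    // Hypothetical stand-in for Hadoop's JobControl, for illustration only.
    static class FakeJobControl implements Runnable {
        private final long workMillis;
        private volatile boolean finished = false;
        private volatile boolean stopped = false;

        FakeJobControl(long workMillis) {
            this.workMillis = workMillis;
        }

        @Override
        public void run() {
            try {
                Thread.sleep(workMillis); // pretend the jobs are running
                finished = true;
                while (!stopped) {        // keep looping until stop() is called
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        boolean allFinished() { return finished; }

        void stop() { stopped = true; }
    }

    // Runs the control in its own thread, polls for completion from the caller's
    // thread, and returns the elapsed wall-clock time in milliseconds.
    static long runAndTime(FakeJobControl control, long pollMillis) {
        long start = System.nanoTime();
        Thread runner = new Thread(control, "job-control");
        runner.setDaemon(true);
        runner.start();
        try {
            while (!control.allFinished()) {
                Thread.sleep(pollMillis);
            }
            control.stop();
            runner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = runAndTime(new FakeJobControl(200), 20);
        System.out.println("jobs took about " + elapsed + " ms");
    }
}
```

This measures the total time for all jobs in the group. Per-job timings would need the polling loop to record when each job moves between the state lists.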

Resources