Spring Batch: restart uncompleted jobs from the same execution and step

I use the following logic to restart uncompleted Spring Batch jobs (for example, after abnormal application termination):
public void restartUncompletedJobs() {
    LOGGER.info("Restarting uncompleted jobs");
    try {
        jobRegistry.register(new ReferenceJobFactory(documetPipelineJob));
        List<String> jobs = jobExplorer.getJobNames();
        for (String job : jobs) {
            Set<JobExecution> runningJobs = jobExplorer.findRunningJobExecutions(job);
            for (JobExecution runningJob : runningJobs) {
                runningJob.setStatus(BatchStatus.FAILED);
                runningJob.setEndTime(new Date());
                jobRepository.update(runningJob);
                jobOperator.restart(runningJob.getId());
                LOGGER.info("Job restarted: " + runningJob);
            }
        }
    } catch (Exception e) {
        LOGGER.error(e.getMessage(), e);
    }
}
This works fine, but with one side effect: it doesn't restart the failed job execution; it creates a new execution instance. How can I change this logic so that it restarts the failed execution from the failed step instead of creating a new execution?
UPDATED
When I try the following code:
public void restartUncompletedJobs() {
    try {
        jobRegistry.register(new ReferenceJobFactory(documetPipelineJob));
        List<String> jobs = jobExplorer.getJobNames();
        for (String job : jobs) {
            Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions(job);
            for (JobExecution jobExecution : jobExecutions) {
                jobOperator.restart(jobExecution.getId());
            }
        }
    } catch (Exception e) {
        LOGGER.error(e.getMessage(), e);
    }
}
it fails with the following exception:
2018-07-30 06:50:47.090 ERROR 1588 --- [ main] c.v.p.d.service.batch.BatchServiceImpl : Illegal state (only happens on a race condition): job execution already running with name=documetPipelineJob and parameters={ID=826407fa-d3bc-481a-8acb-b9643b849035, inputDir=/home/public/images, STORAGE_TYPE=LOCAL}
org.springframework.batch.core.UnexpectedJobExecutionException: Illegal state (only happens on a race condition): job execution already running with name=documetPipelineJob and parameters={ID=826407fa-d3bc-481a-8acb-b9643b849035, inputDir=/home/public/images, STORAGE_TYPE=LOCAL}
at org.springframework.batch.core.launch.support.SimpleJobOperator.restart(SimpleJobOperator.java:283) ~[spring-batch-core-4.0.1.RELEASE.jar!/:4.0.1.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobOperator$$FastClassBySpringCGLIB$$44ee6049.invoke(<generated>) ~[spring-batch-core-4.0.1.RELEASE.jar!/:4.0.1.RELEASE]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) [spring-core-5.0.6.RELEASE.jar!/:5.0.6.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:684) [spring-aop-5.0.6.RELEASE.jar!/:5.0.6.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobOperator$$EnhancerBySpringCGLIB$$7659d4c.restart(<generated>) ~[spring-batch-core-4.0.1.RELEASE.jar!/:4.0.1.RELEASE]
at com.example.pipeline.domain.service.batch.BatchServiceImpl.restartUncompletedJobs(BatchServiceImpl.java:143) ~[domain-0.0.1.jar!/:0.0.1]
The following code creates new executions in the job store database:
public void restartUncompletedJobs() {
    try {
        jobRegistry.register(new ReferenceJobFactory(documetPipelineJob));
        List<String> jobs = jobExplorer.getJobNames();
        for (String job : jobs) {
            Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions(job);
            for (JobExecution jobExecution : jobExecutions) {
                jobExecution.setStatus(BatchStatus.STOPPED);
                jobExecution.setEndTime(new Date());
                jobRepository.update(jobExecution);
                Long jobExecutionId = jobExecution.getId();
                jobOperator.restart(jobExecutionId);
            }
        }
    } catch (Exception e) {
        LOGGER.error(e.getMessage(), e);
    }
}
The question is: how can I continue running the old uncompleted executions without creating new ones after an application restart?

TL;DR: Spring Batch will always create a new Job Execution and will not reuse a previously failed job execution to continue its execution.
Longer answer: First you need to understand three similar but different concepts in Spring Batch: Job, Job Instance, and Job Execution.
I always use this example (a short code sketch follows the list):
Job : End-Of-Day Batch
Job Instance : End-Of-Day Batch for 2018-01-01
Job Execution: End-Of-Day Batch for 2018-01-01, execution #1
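To make the distinction concrete, here is a minimal sketch of how that example could map onto Spring Batch's API. The endOfDayJob bean and the businessDate parameter are illustrative names, not taken from the question:
// Minimal sketch: jobLauncher and endOfDayJob would be injected Spring beans
// (org.springframework.batch.core.*); the checked exceptions declared by
// JobLauncher.run(..) are left to the caller.
public JobExecution runEndOfDay(JobLauncher jobLauncher, Job endOfDayJob) throws Exception {
    // Job + identifying parameters = one Job Instance ("End-Of-Day Batch for 2018-01-01").
    JobParameters params = new JobParametersBuilder()
            .addString("businessDate", "2018-01-01")
            .toJobParameters();
    // The first call creates the Job Instance and Job Execution #1.
    // Calling it again with the SAME parameters targets the SAME Job Instance:
    // if execution #1 failed, a new Job Execution (#2) is created; if the instance
    // already completed, Spring Batch throws JobInstanceAlreadyCompleteException.
    return jobLauncher.run(endOfDayJob, params);
}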
At a high level, this is how Spring Batch's recovery works:
Assume your first execution failed at step 3. You can submit the same Job (End-of-Day Batch) with the same parameters (2018-01-01). Spring Batch will look up the last Job Execution (End-Of-Day Batch for 2018-01-01, execution #1) of the submitted Job Instance (End-of-Day Batch for 2018-01-01), find that it previously failed at step 3, create a NEW execution (End-Of-Day Batch for 2018-01-01, execution #2), and start that execution from step 3.
So by design, what Spring Batch tries to recover is a previously failed Job Instance (not a Job Execution). Spring Batch will not reuse an execution when you re-run a previously failed execution.
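In other words, the restart in the question is working as designed: the failed execution is never revived; a new execution of the same Job Instance is created and it resumes at the failed step. Here is a minimal sketch to illustrate that, assuming the same injected jobExplorer and jobOperator as in the question; failedExecutionId is an assumed variable holding the id of the execution that was marked FAILED:
// Minimal sketch; checked exceptions from JobOperator.restart(..) are left to the caller.
public void restartAndVerify(JobExplorer jobExplorer, JobOperator jobOperator,
                             long failedExecutionId) throws Exception {
    // restart() never reuses the failed JobExecution; it appends a new execution
    // to the SAME JobInstance and, by default, skips steps that already COMPLETED,
    // so processing resumes at the step that failed.
    Long newExecutionId = jobOperator.restart(failedExecutionId);

    JobExecution oldExecution = jobExplorer.getJobExecution(failedExecutionId);
    JobExecution newExecution = jobExplorer.getJobExecution(newExecutionId);

    // Same logical run (same Job Instance) ...
    assert oldExecution.getJobInstance().getId()
            .equals(newExecution.getJobInstance().getId());
    // ... but a different Job Execution row in the job repository.
    assert !oldExecution.getId().equals(newExecution.getId());
}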

Related

How to use RemoteFileTemplate<SmbFile> in Spring Integration?

I've got a Spring @Component where a SmbSessionFactory is injected to create a RemoteFileTemplate<SmbFile>. When my application runs, this piece of code is called multiple times:
public void process(Message myMessage, String filename) {
    StopWatch stopWatch = StopWatch.createStarted();
    byte[] bytes = marshallMessage(myMessage);
    String destination = smbConfig.getDir() + filename + ".xml";
    if (log.isDebugEnabled()) {
        log.debug("Result: {}", new String(bytes));
    }
    Optional<IOException> optionalEx =
        remoteFileTemplate.execute(
            session -> {
                try (InputStream inputStream = new ByteArrayInputStream(bytes)) {
                    session.write(inputStream, destination);
                } catch (IOException e1) {
                    return Optional.of(e1);
                }
                return Optional.empty();
            });
    log.info("processed Message in {}", stopWatch.formatTime());
    optionalEx.ifPresent(
        ioe -> {
            throw new UncheckedIOException(ioe);
        });
}
This works (i.e. the file is written) and all is fine, except that I see warnings appearing in my log:
DEBUG my.package.MyClass Result: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>....
INFO org.springframework.integration.smb.session.SmbSessionFactory SMB share init: XXX
WARN jcifs.smb.SmbResourceLocatorImpl Path consumed out of range 15
WARN jcifs.smb.SmbTreeImpl Disconnected tree while still in use SmbTree[share=XXX,service=null,tid=1,inDfs=true,inDomainDfs=true,connectionState=3,usage=2]
INFO org.springframework.integration.smb.session.SmbSession Successfully wrote remote file [path\to\myfile.xml].
WARN jcifs.smb.SmbSessionImpl Logging off session while still in use SmbSession[credentials=XXX,targetHost=XXX,targetDomain=XXX,uid=0,connectionState=3,usage=1]:[SmbTree[share=XXX,service=null,tid=1,inDfs=false,inDomainDfs=false,connectionState=0,usage=1], SmbTree[share=XXX,service=null,tid=5,inDfs=false,inDomainDfs=false,connectionState=2,usage=0]]
jcifs.smb.SmbTransportImpl Disconnecting transport while still in use Transport746[XXX/999.999.999.999:445,state=5,signingEnforced=false,usage=1]: [SmbSession[credentials=XXX,targetHost=XXX,targetDomain=XXX,uid=0,connectionState=2,usage=1], SmbSession[credentials=XXX,targetHost=XXX,targetDomain=null,uid=0,connectionState=2,usage=0]]
INFO my.package.MyClass processed Message in 00:00:00.268
The process method is called from a REST method, which does little else.
What am I doing wrong here?

Detect shutting-down-controller in post action in declarative pipeline

I have success, failure, and aborted blocks under post in a declarative pipeline. Is there a way to detect a shutting-down controller and perform different actions?
For example:
post {
    success {
        // do some actions for successful completion
    }
    failure {
        // do some actions for failure completion
    }
    aborted {
        // do some actions when job gets aborted
    }
    controller-shutting-down {
        echo "Jenkins is shutting down."
    }
}
OR:
...
aborted {
    if (reason == controller-shutting-down) {
        echo "Jenkins is shutting down."
    } else {
        // do some actions when job gets aborted
    }
}
...
Is there a way to achieve this?

How do I run multiple jobs with a given IJobConsumer within a single service instance?

I want to be able to execute multiple jobs concurrently on a Job Consumer. At the moment, if I run one service instance and try to execute two jobs concurrently, one job waits for the other to complete (i.e. waits for the single job slot to become available).
However, if I run two instances by using dotnet run twice to create two separate processes, I am able to get the desired behavior where both jobs run at the same time.
Is it possible to run 2 (or more) jobs at the same time for a given consumer inside a single process? My application requires the ability to run several jobs concurrently but I don't have the ability to deploy many instances of my application.
Checking the application log, I see this line, which I feel may have something to do with it:
[04:13:43 DBG] Concurrent Job Limit: 1
I tried changing the SagaPartitionCount to something other than 1 on instance.ConfigureJobServiceEndpoints to no avail. I can't seem to get the Concurrent Job Limit to change.
My configuration looks like this:
services.AddMassTransit(x =>
{
    x.AddDelayedMessageScheduler();
    x.SetKebabCaseEndpointNameFormatter();
    // registering the job consumer
    x.AddConsumer<DeploymentConsumer>(typeof(DeploymentConsumerDefinition));
    x.AddSagaRepository<JobSaga>()
        .EntityFrameworkRepository(r =>
        {
            r.ExistingDbContext<JobServiceSagaDbContext>();
            r.LockStatementProvider = new SqlServerLockStatementProvider();
        });
    // add other saga repositories for JobTypeSaga and JobAttemptSaga here as well
    x.UsingRabbitMq((context, cfg) =>
    {
        var rmq = configuration.GetSection("RabbitMq").Get<RabbitMq>();
        cfg.Host(rmq.Host, rmq.Port, rmq.VirtualHost, h =>
        {
            h.Username(rmq.Username);
            h.Password(rmq.Password);
        });
        cfg.UseDelayedMessageScheduler();
        var options = new ServiceInstanceOptions()
            .SetEndpointNameFormatter(context.GetService<IEndpointNameFormatter>() ?? KebabCaseEndpointNameFormatter.Instance);
        cfg.ServiceInstance(options, instance =>
        {
            instance.ConfigureJobServiceEndpoints(js =>
            {
                js.SagaPartitionCount = 1;
                js.FinalizeCompleted = true;
                js.ConfigureSagaRepositories(context);
            });
            instance.ConfigureEndpoints(context);
        });
    });
});
Where DeploymentConsumerDefinition looks like:
public class DeploymentConsumerDefinition : ConsumerDefinition<DeploymentConsumer>
{
    protected override void ConfigureConsumer(IReceiveEndpointConfigurator endpointConfigurator,
        IConsumerConfigurator<DeploymentConsumer> consumerConfigurator)
    {
        consumerConfigurator.Options<JobOptions<DeploymentConsumer>>(options =>
        {
            options.SetJobTimeout(TimeSpan.FromMinutes(20));
            options.SetConcurrentJobLimit(10);
            options.SetRetry(r =>
            {
                r.Ignore<InvalidOperationException>();
                r.Interval(5, TimeSpan.FromSeconds(10));
            });
        });
    }
}
Your definition should specify the job consumer message type, not the job consumer type:
public class DeploymentConsumerDefinition : ConsumerDefinition<DeploymentConsumer>
{
    protected override void ConfigureConsumer(IReceiveEndpointConfigurator endpointConfigurator,
        IConsumerConfigurator<DeploymentConsumer> consumerConfigurator)
    {
        // MESSAGE TYPE NOT CONSUMER TYPE
        consumerConfigurator.Options<JobOptions<DeploymentCommand>>(options =>
        {
            options.SetJobTimeout(TimeSpan.FromMinutes(20));
            options.SetConcurrentJobLimit(10);
            options.SetRetry(r =>
            {
                r.Ignore<InvalidOperationException>();
                r.Interval(5, TimeSpan.FromSeconds(10));
            });
        });
    }
}

parameterized-remote-trigger is throwing a 405 exception

I am trying to trigger job A (configured for remote triggering) remotely from another job B, and job B needs to hold until the results come back to show success or failure. I initially tried the REST API with a curl command, and it works perfectly. Here's the curl command:
curl -v -X POST 'https://xxx.xxx/xxx-xxx/job/xxx/job/master/buildWithParameters?config_files=./jenkins/unit-tests.json' --user xxxx:110f4dfa33ba8f8ef5d8d299beb6aa1543
I chose the parameterized remote trigger plugin, which is installed on the Jenkins server, because it handles the polling mechanism internally and also has handler-friendly methods. Please see the code for the remote job below. It fails with a 405 error, which means "method not allowed" in HTTP terms, so it looks like the plugin is using GET instead of POST. I added an option for logging, but it does not seem to show any more log output.
def handle = triggerRemoteJob(
    remoteJenkinsName: 'remote-master',
    job: 'https://xxx.xxx.com/xxx-xxx/job/xxx/job/master/buildWithParameters',
    remoteJenkinsUrl: 'https://xxx.xxx.xxx/xxx-xxx/job/xxx/job/master/buildWithParameters',
    auth: TokenAuth(apiToken: hudson.util.Secret.fromString('110f4dfa33ba8f8ef5d8d299beb6aa1543'), userName: 'xxxx'),
    parameters: 'config_files=./jenkins/unit-tests')
I am getting the following error:
[Pipeline] triggerRemoteJob
##########################################################################
Parameterized Remote Trigger Configuration:
- job: https://xxx.xxx.xxx/xxx-xxx/job/xxx/job/master/buildWithParameters
- remoteJenkinsUrl: https://xxx.xxx.xxx/xxx-xxx/job/ius/job/master/buildWithParameters
- auth: 'Token Authentication' as user 'sseri'
- parameters: [config_files=./jenkins/unit-tests]
- blockBuildUntilComplete: true
- connectionRetryLimit: 5
- trustAllCertificates: false
##########################################################################
Connection to remote server failed [405], waiting to retry - 10 seconds until next attempt. URL: https://xxx.xxx.xxx/xxx-xxx/job/xxx/job/master/buildWithParameters/api/json, parameters:
Retry attempt #1 out of 5
Please help me in this regard!
I am not sure about the plugins you are using, but it's quite simple to implement this scenario ("call a downstream job from upstream and fail the upstream if the downstream fails") without any plugins.
Take a look at my example below.
Let's say you have two jobs called jobA and jobB, and your goal is to call jobB from jobA and fail jobA if jobB fails.
Scripted Pipeline for jobA
node() {
    try {
        // propagate: false lets us inspect the downstream result ourselves;
        // with the default (true), build() would fail this job automatically
        def jobB = build(job: jobName, propagate: false, parameters: [string(name: "parameterName", value: "parameterValue")])
        def jobBStatus = jobB.getResult()
        if (jobBStatus == "FAILURE") {
            throw new RuntimeException("Downstream job-b failed with reason ...")
        }
        ...
    } catch (Exception e) {
        throw e
    }
}
Declarative Pipeline for jobA
pipeline {
    agent any
    stages {
        stage('call jobB') {
            steps {
                script {
                    // propagate: false so we can check the downstream result ourselves
                    def jobB = build(job: jobName, propagate: false, parameters: [
                        string(name: "parameterName", value: "parameterValue")
                    ])
                    def jobBStatus = jobB.getResult()
                    if (jobBStatus == "FAILURE") {
                        error("Downstream job-b failed with reason ...")
                    }
                }
            }
        }
    }
}
Try using this Parameterized-Remote-Trigger-Plugin. It should give you what you want. I'm having some problems configuring it with authentication tokens and users from a Jenkinsfile, but if you are using the GUI I'm sure you will get the job done.

What happens with the QueueWorker when the TTR runs out?

This relates to Laravel 5.3, Beanstalkd, TTR, and timeouts when working with queues and queue workers. TTR: https://github.com/kr/beanstalkd/wiki/faq
If I understand correctly, a job from the queue gets the state reserved when a QueueWorker picks it up. The job state is changed back to ready when the TTR runs out. But what happens with the QueueWorker?
Let's say the QueueWorker has a timeout set to 600 by the following command:
php artisan queue:work --tries=1 --timeout=600 --sleep=0
TTR is set to 60 seconds by default.
During the job, a request is made to another site and it takes 120 seconds until the response arrives. After 60 seconds the job is set back to the ready state because of the TTR. Will the QueueWorker keep working on the job until the response has been received, up to a maximum of 600 seconds? Or will the QueueWorker stop working on the job when the TTR has been reached?
Actually, the QueueWorker will run until the job is completed. When you run the queue worker without the daemon flag, it will run the code below:
return $this->worker->pop(
    $connection, $queue, $delay,
    $this->option('sleep'), $this->option('tries')
);
Reference:
https://github.com/laravel/framework/blob/5.2/src/Illuminate/Queue/Console/WorkCommand.php#L123
What this code does is pop the job from the queue and fire that job as a command:
public function process($connection, Job $job, $maxTries = 0, $delay = 0)
{
    if ($maxTries > 0 && $job->attempts() > $maxTries) {
        return $this->logFailedJob($connection, $job);
    }
    try {
        $job->fire();
        $this->raiseAfterJobEvent($connection, $job);
        return ['job' => $job, 'failed' => false];
    } catch (Exception $e) {
        if (! $job->isDeleted()) {
            $job->release($delay);
        }
        throw $e;
    } catch (Throwable $e) {
        if (! $job->isDeleted()) {
            $job->release($delay);
        }
        throw $e;
    }
}
Reference:
https://github.com/laravel/framework/blob/5.2/src/Illuminate/Queue/Worker.php#L213
Dig into the source for more information:
https://github.com/laravel/framework/tree/5.2/src/Illuminate/Queue
