SCDF: Restart and resume a composed task

SCDF Composed Task Runner gives us the option to turn on --increment-instance-enabled. This option creates an artificial run.id parameter, which increments for every run, so the execution is unique for Spring Batch and the task will restart.
The problem arises when I mix executions with and without the IdIncrementer. When a task does not finish, I want to resume it. What I encountered is that once the task finishes an execution without the IdIncrementer, I can no longer start it again with the IdIncrementer.
I was wondering what would be the best way to restart while keeping the option to resume?
My idea would be to create a new IdResumer, which uses the same run.id as the last execution.
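For illustration, a minimal sketch of what such an IdResumer could look like, assuming it is wired in as the job's JobParametersIncrementer (the class name and the default value are my assumptions; it mirrors RunIdIncrementer but reuses the previous run.id instead of incrementing it):

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersIncrementer;

// Hypothetical incrementer that keeps the previous run.id so a failed
// execution can be resumed as the same Spring Batch job instance.
public class IdResumer implements JobParametersIncrementer {

    private static final String RUN_ID = "run.id";

    @Override
    public JobParameters getNext(JobParameters parameters) {
        JobParameters params = (parameters == null) ? new JobParameters() : parameters;
        // Reuse the last run.id; fall back to 1 if the job has never run before.
        long runId = params.getLong(RUN_ID, 1L);
        return new JobParametersBuilder(params).addLong(RUN_ID, runId).toJobParameters();
    }
}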
We are running SCDF 2.2.1 on OpenShift v3.11.98 and we use CTR 2.1.1.
The steps to reproduce this:
1. Create a new SCDF task definition with the following definition: dummy1: dummy && dummy2: dummy && dummy3: dummy. The dummy app is a Docker container that fails randomly with a 50% chance.
2. Execute the SCDF task with --increment-instance-enabled=true and wait for one of the dummy tasks to fail (restart if needed).
3. To resume the same failed execution, execute the SCDF task now with --increment-instance-enabled=false and let it finish successfully (redo if needed).
4. Start the SCDF task again with --increment-instance-enabled=true.
At step 4 the composed task throws a JobInstanceAlreadyCompleteException, even though --increment-instance-enabled is enabled again.
Caused by:
org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException:
A job instance already exists and is complete for
parameters={-spring.cloud.data.flow.taskappname=composed-task-runner,
-spring.cloud.task.executionid=3190, -spring.datasource.username=testuser, -graph=aaa-stackoverflow-dummy2 && aaa-stackoverflow-dummy3, -spring.cloud.data.flow.platformname=default, -spring.datasource.url=jdbc:postgresql://10.10.10.10:5432/tms_efa?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory&currentSchema=dev,
-spring.datasource.driverClassName=org.postgresql.Driver, -spring.datasource.password=pass1234, -spring.cloud.task.name=aaa-stackoverflow, -dataflowServerUri=https://scdf-dev.company.com:443/ , -increment-instance-enabled=true}. If you want to run this job again, change the parameters.
Is there a better way to resume and restart the task?

Related

Spring FlatFileItemWriter org.springframework.batch.item.ItemStreamException: Unable to create file:

I have a Spring Batch job that uses FlatFileItemWriter in a step to generate an output file. This batch application runs periodically (say every minute). However, while the application runs successfully on the first invocation, it always fails on later invocations due to:
org.springframework.batch.item.ItemStreamException: Unable to create file: [C:\temp\20220413.txt]
at org.springframework.batch.item.util.FileUtils.setUpOutputFile(FileUtils.java:84)
at org.springframework.batch.item.support.AbstractFileItemWriter$OutputState.initializeBufferedWriter(AbstractFileItemWriter.java:551)
at org.springframework.batch.item.support.AbstractFileItemWriter$OutputState.access$000(AbstractFileItemWriter.java:385)
at org.springframework.batch.item.support.AbstractFileItemWriter.doOpen(AbstractFileItemWriter.java:319)
at org.springframework.batch.item.support.AbstractFileItemWriter.open(AbstractFileItemWriter.java:309)
at org.springframework.batch.item.support.AbstractFileItemWriter$$FastClassBySpringCGLIB$$f2d35c3.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
Permissions of the dir and file are set properly.
I do have
FlatFileItemWriter<T> writer = new FlatFileItemWriter<T>();
writer.setShouldDeleteIfExists(true);
Restarting the application does not hit this issue.
I read that I don't need to explicitly call close.
So what could be the root cause of "Unable to create file:"?
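For context, here is a minimal sketch of the kind of writer setup described above, built with FlatFileItemWriterBuilder and step-scoped so each run gets a fresh writer instance (the bean name, item type, and the job-parameter binding are illustrative assumptions, not the actual code):

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

// Declared inside a @Configuration class
@Bean
@StepScope
public FlatFileItemWriter<String> outputFileWriter(
        @Value("#{jobParameters['outputFile']}") String outputFile) {   // path passed as a job parameter (assumption)
    return new FlatFileItemWriterBuilder<String>()
            .name("outputFileWriter")                        // a name is required by the builder
            .resource(new FileSystemResource(outputFile))    // e.g. C:\temp\20220413.txt
            .shouldDeleteIfExists(true)                      // overwrite a leftover file from a previous run
            .lineAggregator(new PassThroughLineAggregator<>())
            .build();
}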

Spring Batch doesn't update Job repository after force termination

I'm using a Spring Boot 2.4.x app with Spring Batch 4.3.x. I've created a simple job.
I have a FlatFileItemReader which reads from a CSV file and an ImportKafkaItemWriter which writes to a Kafka topic, with one step that combines them. I'm using SimpleJobLauncher and I've set a ThreadPoolTaskExecutor as the TaskExecutor of the JobLauncher. It works as expected. But there is one resilience use case: if I kill the app, then restart it and trigger the job, it should carry on and finish the remaining work. Unfortunately that is not happening. I investigated further and found that when I forcibly close the app, the Spring Batch job repository key tables look like this:
job_execution_id: 1
version: 1
job_instance_id: 1
create_time: 2021-06-16 09:32:43
start_time: 2021-06-16 09:32:43
end_time: (empty)
status: STARTED
exit_code: UNKNOWN
exit_message: (empty)
last_updated: 2021-06-16 09:32:43
job_configuration_location: (empty)
and
step_execution_id: 1
version: 4
step_name: productImportStep
job_execution_id: 1
start_time: 2021-06-16 09:32:43
end_time: (empty)
status: STARTED
commit_count: 3
read_count: 6
filter_count: 0
write_count: 6
read_skip_count: 0
write_skip_count: 0
process_skip_count: 0
rollback_count: 0
exit_code: EXECUTING
exit_message: (empty)
last_updated: 2021-06-16 09:32:50
If I manually update these tables and set a valid end_time and the status to FAILED, then I can restart the job and it works absolutely fine. May I know what I need to do so that Spring Batch can update those repository tables appropriately and I can avoid these manual steps? I can provide more information about the code if needed.
When a job is killed abruptly, Spring Batch won't have a chance to update its status in the Job repository, so the status is stuck at STARTED. Now when the job is restarted, the only information that Spring Batch has is the status in the job repository. By just looking at the status in the database, Spring Batch cannot distinguish between a job that is effectively running and a job that has been killed abruptly (in both cases, the status is STARTED).
The way to go is indeed to manually update the tables, either marking the status as FAILED to be able to restart the job, or ABANDONED to abandon it. This is a business decision that you have to make, and there is no way to automate it on the framework side. For more details, please refer to the reference documentation here: Aborting a Job.
You can add a fake parameter, for example a Version counter that you increment for every new job execution, so you don't have to touch the job tables in the database.
What I mean is: after mvn clean package, you launch the program like this:
java -jar my-jarfile.jar dest=/tmp/foo Version="0"
java -jar my-jarfile.jar dest=/tmp/foo Version="1"
java -jar my-jarfile.jar dest=/tmp/foo Version="2"
and so on. Or you can use JobParameters to launch the job programmatically via JobLauncher and use a date parameter, e.g. date = new Date().toString(), which gives a new timestamp on every new job execution.
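A minimal sketch of that programmatic approach, assuming a JobLauncher and the Job are available as Spring beans (the method name and the "dest" parameter are illustrative):

import java.util.Date;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public void launchWithFreshParameters(JobLauncher jobLauncher, Job job) throws Exception {
    JobParameters params = new JobParametersBuilder()
            .addString("dest", "/tmp/foo")
            .addDate("date", new Date())   // a new timestamp means a new job instance on every launch
            .toJobParameters();
    jobLauncher.run(job, params);
}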
You can use "JVM Shutdown Hook":
Something like this:
// Assumes jobExecution and jobRepository are available in the surrounding scope
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    if (jobExecution.isRunning()) {
        // Mark the execution as failed so it can be restarted later
        jobExecution.setEndTime(new Date());
        jobExecution.setStatus(BatchStatus.FAILED);
        jobExecution.setExitStatus(ExitStatus.FAILED);
        jobRepository.update(jobExecution);
    }
}));
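Note that a JVM shutdown hook only runs on an orderly shutdown (for example on SIGTERM or System.exit); it will not fire if the process is killed with kill -9 or crashes, so in those cases the manual repository cleanup described above is still needed.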

passing properties to child task of composed-task-runner app of spring cloud dataflow

I am trying to run the composed-task-runner of Spring Cloud Data Flow, which has 2 child tasks (A and B) within it. I am passing properties to my-composed-task that also have to be passed to the child tasks, but they are not being passed into the child applications.
Below are the commands which I am using:
for the creation of task in dataflow shell:
task create my-composed-task --definition "A && B"
for launching the task in dataflow shell:
task launch my-composed-task --arguments "--spring.cloud.task.closecontextEnabled=true --increment-instance-enabled=true --composed-task-arguments=measurementyear=2020,--logging.level.org=ERROR,--spring.datasource.url=url,--spring.datasource.username=username,--spring.datasource.password=password" --properties "deployer.composed-task-runner.local.javaOpts=-Xmx8g"
The properties argument "deployer.composed-task-runner.local.javaOpts=-Xmx8g" is not being passed into the child applications of my-composed-task.
After launching through the Data Flow shell, the server shows the following commands for the tasks created:
my-composed-task
o.s.c.d.spi.local.LocalTaskLauncher : Command to be executed: C:\Program Files\jdk1.8.0_152\jre\bin\java.exe -Xmx8g -jar C:\Users\shivani.chittauri\Desktop\jar\composedtaskrunner-task-2.1.0.RELEASE.jar --spring.cloud.task.closecontextEnabled=true --composed-task-arguments=measurementyear=2020,--spring.messages.encoding=UTF-8,--logging.level.org=ERROR,--spring.datasource.url=url,--spring.datasource.username=username,--spring.datasource.password=password --increment-instance-enabled=true --spring.cloud.dataflow.task.platform.local.accounts.default.javaOpts=-Xmx8g --spring.cloud.data.flow.platformname=default --spring.cloud.task.executionid=49 --spring.cloud.data.flow.taskappname=composed-task-runner
It has javaOpts as -Xmx8g, which is correct
Task A
o.s.c.d.spi.local.LocalTaskLauncher : Command to be executed: C:\Program Files\jdk1.8.0_152\jre\bin\java.exe -jar C:\Users\shivani.chittauri\Desktop\jar\A-1.0-SNAPSHOT.jar --spring.datasource.username=username --spring.datasource.url=url --spring.messages.encoding=UTF-8 --logging.level.org=ERROR --spring.datasource.password=password measurementyear=2020 --spring.cloud.data.flow.platformname=default --spring.cloud.task.executionid=50
It does not have javaOpts, but it should.
Task B
o.s.c.d.spi.local.LocalTaskLauncher : Command to be executed: C:\Program Files\jdk1.8.0_152\jre\bin\java.exe -jar C:\Users\shivani.chittauri\Desktop\jar\B-1.0-SNAPSHOT.jar --spring.datasource.username=username --spring.datasource.url=url --spring.messages.encoding=UTF-8 --logging.level.org=ERROR --spring.datasource.password=password measurementyear=2020 --spring.cloud.data.flow.platformname=default --spring.cloud.task.executionid=51
It does not have javaOpts, but it should.
I want javaOpts to be present in the child tasks too. What could be done to resolve this?
Thanks in advance!
If you want to pass the app or deployer properties into the child tasks, then you need to reference the app or label name of the child task in order to qualify them for the child tasks.
For example, in your case, you need to use something like this:
deployer.my-composed-task.A.local.javaOpts=-Xmx8g
You see the name of the composed task my-composed-task with the name of the child task A in the above property reference.
For more information on this, you can check the documentation here.
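Applied to the launch command above, that would look something like this (shown here for both child tasks; the values are illustrative):
task launch my-composed-task --properties "deployer.my-composed-task.A.local.javaOpts=-Xmx8g,deployer.my-composed-task.B.local.javaOpts=-Xmx8g"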

Spring Batch Integration job instance already exists on start up

I am using Spring Batch Integration to poll for a file and process it, and was looking for some guidance on the job parameters aspect of it. I am using the following to create a job launch request and turn a file into the request:
@Transformer
public JobLaunchRequest toRequest(Message<File> message) {
    JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
    jobParametersBuilder.addString(fileParameterName,
            message.getPayload().getAbsolutePath());
    jobParametersBuilder.addLong("time", new Date().getTime());
    return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
}
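For context, such a transformer is typically wired into an integration flow that polls a directory and hands the resulting JobLaunchRequest to a JobLaunchingGateway. A rough sketch is below; the directory, filter, poller settings and the FileMessageToJobRequest bean name are assumptions, not the poster's actual configuration:

import java.io.File;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.integration.launch.JobLaunchingGateway;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.dsl.Files;
import org.springframework.integration.file.filters.SimplePatternFileListFilter;

@Bean
public JobLaunchingGateway jobLaunchingGateway(JobLauncher jobLauncher) {
    return new JobLaunchingGateway(jobLauncher);
}

@Bean
public IntegrationFlow filePollingFlow(FileMessageToJobRequest toRequestBean,   // hypothetical bean holding the toRequest(...) transformer
                                       JobLaunchingGateway jobLaunchingGateway) {
    return IntegrationFlows
            .from(Files.inboundAdapter(new File("C:/Temp"))                      // polled directory (assumption)
                            .filter(new SimplePatternFileListFilter("*.csv")),
                    e -> e.poller(Pollers.fixedDelay(1000).maxMessagesPerPoll(1)))
            .transform(toRequestBean)       // invokes the @Transformer toRequest(Message<File>) shown above
            .handle(jobLaunchingGateway)    // launches the Spring Batch job
            .log()                          // log the resulting JobExecution
            .get();
}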
On starting up the application for the first time there is only one parameter, run.id. If I add a file to the repository that the file poller is watching, it creates two parameters in the db: fileParameterName and time. If I start the application again, it will use the previous values for the parameters fileParameterName and time and add a new run.id. The message on the initial start-up is:
Job: ... launched with the following parameters: [{run.id=1}]
If I add a file my application handles the file correctly:
Job: ... launched with the following parameters:[{input.file.name=C:\Temp\test.csv, time=1472051531556}]
but if I stop and start the application again I get the following message:
Job: ... launched with the following parameters: [{time=1472051531556, run.id=1, input.file.name=C:\Temp\test.csv}]
My question is: why on this start-up is it looking at the previous parameters? Is there a way to add the current time as a parameter on start-up instead of the previous time, so I don't get "A job instance already exists and is complete for parameters={}"? Or to stop the jobs from running on start-up?
Also, if the application is running and I add a file, it will enter the toRequest method, but it does not on start-up.
Any help would be great.
Thanks
We pass a 'run.id' parameter set to the current timestamp whenever we kick off a Spring Batch job. This is how we kick off a Spring Batch job from a shell script:
RUN_ID=$(date +"%Y-%m-%d %H:%M:%S")
JOB_PARAMS="filename=XXX"
$JAVA_HOME org.springframework.batch.core.launch.support.CommandLineJobRunner springbatch_XXX.xml SpringBatchJob run.id="$RUN_ID" ${JOB_PARAMS}

Informatica error 1417 :: Task not yet registered with this service process

I am getting the following error while running a workflow in Informatica.
Session task instance [worklet.session] : [TM_6775 The master DTM process was unable to connect to the master service process to update the session status with the following message: error message [ERROR: The session run for [Session task instance [worklet.session]] and [ folder id = 206, workflow id = 16042, workflow run id = 65095209, worklet run id = 65095337, task instance id = 13272 ] is not yet registered with this service process.] and error code [1417].]
This error comes randomly for many other sessions when they are run through the workflow as a whole. However, if I "start task" the failed task afterwards, it runs successfully.
Any help is much appreciated.
Just an idea to try if you use versioning: check that everything is checked in correctly. If the mapping, workflow or worklet is checked out, then you and Informatica will run different versions, which may cause the behaviour to differ when you start it manually.
Informatica will always use the checked-in version and you will always use the checked-out version.
