Executing a tasklet after an ItemWriter - Spring

My purpose in this batch job is to fetch a few documents from the DB, encrypt them, and SFTP them to a server. For this I am using item readers and writers. For encryption, I have to use a tasklet that is in a JAR (I don't own the source code). There are millions of records to be processed, so I am using a commit interval for reading and writing. My problem is with the encryption (calling the tasklet after every chunk of writing is complete).
Is there a way to call the tasklet after the writer (inside batch:chunk)?
As of now, I am doing the following:
<batch:job id="batchJob">
    <batch:step id="prepareStep" next="encryptStep">
        <batch:tasklet task-executor="executor">
            <batch:chunk reader="reader" processor="processor"
                         writer="writer" commit-interval="100">
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
    <batch:step id="encryptStep" next="uploadStep">
        <batch:tasklet ref="encryptTasklet" />
    </batch:step>
</batch:job>
The problem with the above approach is that encryptStep is called only after all the million records have been read, processed and written. Instead, I want it to work in chunks: execute the encryptTasklet after every chunk write completes. Is there a way to achieve this?
Please help.
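One possible direction (a sketch, not from the original thread): Spring Batch lets you register a ChunkListener on the chunk-oriented step, and its afterChunk() callback fires once per committed chunk. If the encryption logic in the JAR can be wrapped in a bean (the encryptChunkListener name below is made up for illustration), the configuration could look like:

```xml
<batch:step id="prepareStep" next="uploadStep">
    <batch:tasklet task-executor="executor">
        <batch:chunk reader="reader" processor="processor"
                     writer="writer" commit-interval="100"/>
        <batch:listeners>
            <!-- hypothetical bean implementing ChunkListener;
                 afterChunk() runs after each chunk commits -->
            <batch:listener ref="encryptChunkListener"/>
        </batch:listeners>
    </batch:tasklet>
</batch:step>
```

Note that with a task-executor the chunks run concurrently, so any such listener would have to be thread-safe. An alternative with the same per-chunk effect is a composite ItemWriter whose second delegate encrypts the items the first delegate just wrote.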

Related

Spring batch : Exception on writer and rollback is done but no retry executed

I encountered a problem when an exception occurs in the writer phase.
One item caused a rollback due to an integrity problem in the database, and no retry is executed, so the processor is never replayed.
When an item causes a rollback, it should be skipped, and the other items should be retried with a commit interval of one.
But in my case, no retry is done for the other items with a commit interval of one.
Do you know why no retry is executed?
Thanks in advance.
I hope you added the retry limit; the exception class also needs to be listed in the retryable list. Check out the sample syntax below:
<job id="flowJob">
    <step id="retryStep">
        <tasklet>
            <chunk reader="itemReader" writer="itemWriter" processor="itemProcessor"
                   commit-interval="1" retry-limit="3">
                <retryable-exception-classes>
                    <include class="org.springframework.remoting.RemoteAccessException"/>
                </retryable-exception-classes>
            </chunk>
        </tasklet>
    </step>
</job>

Records being processed twice

We have a Spring Batch job which reads a bunch of data in the reader, processes it and writes it. It all happens as a batch.
I noticed that the processor and writer are going over the same data twice, once as a batch and once as individual records.
For example: the reader reads 1000 records and sends 1000 records to the processor, which sends 1000 records to the writer.
After this the records get processed again, individually, but only the processor and writer are being called.
We have log statements in the reader, processor and writer, and I can see the logs.
Is there any condition which can make the records be processed individually after they have been processed as a list?
<batch:job id="feeder-job">
    <batch:step id="regular">
        <tasklet>
            <chunk reader="feeder-reader" processor="feeder-processor"
                   writer="feeder-composite-writer" commit-interval="#{stepExecutionContext['query-fetch-size']}"
                   skip-limit="1000000">
                <skippable-exception-classes>
                    <include class="java.lang.Exception" />
                    <exclude class="org.apache.solr.client.solrj.SolrServerException"/>
                    <exclude class="org.apache.solr.common.SolrException"/>
                    <exclude class="com.batch.feeder.record.RecordFinderException"/>
                </skippable-exception-classes>
            </chunk>
            <listeners>
                <listener ref="feeder-reader" />
            </listeners>
        </tasklet>
    </batch:step>
</batch:job>
You should read up on a feature before using it. You are correct that processing happens twice, but only after an error occurs.
Basically, you have defined a chunk/step which is fault tolerant to certain specified exceptions (see Configuring Skip Logic).
Your step will not fail as long as the total exception count stays below skip-limit, but when an error occurs the chunk's items are processed a second time, one by one, and the bad records are skipped on that second pass.
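That scan behavior can be sketched in plain Java, without Spring (the class and method names are made up for illustration): the chunk is first written as a whole, and only if that fails is every item replayed individually, which is why the processor/writer logs show each record twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkScanSketch {

    // Try writing the whole chunk in one go; if that fails, fall back to
    // writing the items one by one and skipping the ones that throw.
    static List<String> writeChunk(List<String> chunk, Consumer<String> writer) {
        List<String> skipped = new ArrayList<>();
        try {
            for (String item : chunk) {
                writer.accept(item); // first pass: whole chunk in one transaction
            }
        } catch (RuntimeException firstFailure) {
            // second pass ("scan"): every item is processed again, individually
            for (String item : chunk) {
                try {
                    writer.accept(item);
                } catch (RuntimeException e) {
                    skipped.add(item); // bad record skipped, counted against skip-limit
                }
            }
        }
        return skipped;
    }
}
```

With a chunk of ["a", "bad", "c"] where "bad" throws, the writer sees "a" in the first pass, then "a" and "c" again in the scan pass, and "bad" ends up skipped.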

Spring Batch Step does not execute

I'm trying to fix a problem in Spring Batch that has been plaguing our system recently. We have a job that, for the most part works just fine. It's a multi-step job that downloads and processes data.
The problem is that sometimes the job will bomb out. Maybe the server we're trying to connect to throws an error or we shut down the server in the middle of the job. At this point the next time our quartz scheduler tries to run the job it doesn't seem to do anything. Below is an abridged version of this job definition:
<batch:job id="job.download-stuff" restartable="true">
    <batch:validator ref="downloadValidator"/>
    <batch:step id="job.download-stuff.download">
        <batch:tasklet ref="salesChannelOrderDownloader" transaction-manager="transactionManager">
            <batch:transaction-attributes isolation="READ_UNCOMMITTED" propagation="NOT_SUPPORTED"/>
            <batch:listeners>
                <batch:listener ref="downloadListener"/>
                <batch:listener ref="loggingContextStepListener" />
            </batch:listeners>
        </batch:tasklet>
        <batch:next on="CONTINUE" to="job.download-stuff.process-stuff.step" />
        <batch:end on="*" />
    </batch:step>
    <batch:step id="job.download-stuff.process-stuff.step">
        ...
    </batch:step>
    <batch:listeners>
        <batch:listener ref="loggingContextJobListener"/>
    </batch:listeners>
</batch:job>
Once it gets into this state, the downloadValidator runs, but it never makes it into the first step download-stuff.download. I set a breakpoint in the tasklet and it never makes it inside.
If I clear out all of the spring batch tables, which are stored in our mysql database, and restart the server it will begin working again, but I'd rather understand what prevents it from operating correctly at this point rather than employ scorched earth tactics to get the job running.
I'm a novice at Spring Batch, to put it mildly, so forgive me if I am omitting important details. I've set breakpoints and turned on logging to learn what I can.
What I have observed so far from going through the database is that entries no longer appear to be written to the BATCH_STEP_EXECUTION and BATCH_JOB_EXECUTION tables.
There are no BATCH_JOB_EXECUTION entries for the job that are not in COMPLETED status, and no BATCH_STEP_EXECUTION entries that are not in COMPLETED status.
You'll see that there is a batch:validator defined, I've confirmed that spring batch calls that validator and that it makes it through successfully (set a breakpoint and stepped through). The first step does not get executed.
Neither the loggingContextJobListener nor the loggingContextStepListener seem to fire either. What could be causing this?
UPDATE
I took a closer look at the downloadListener added as a batch:listener. Here's the source code of afterStep:
@Override
@Transactional(propagation = Propagation.REQUIRES_NEW)
public ExitStatus afterStep(StepExecution stepExecution) {
    long runSeconds = TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - nanoStart);
    // If success - we're good
    if (stepExecution.getStatus() == BatchStatus.COMPLETED) {
        Long endTs = stepExecution.getExecutionContext().getLong("toTime");
        Date toTime = new Date(endTs);
        handleSuccess(toTime, stepExecution.getWriteCount());
        return null;
    }
    // Otherwise - record errors
    List<Throwable> failures = stepExecution.getFailureExceptions();
    handleError(failures);
    return ExitStatus.FAILED;
}
I confirmed that the return ExitStatus.FAILED line executes and that the exception that was thrown is logged in the failureExceptions. It seems like once that happens the BATCH_JOB_EXECUTION entry is in COMPLETED status (and exit_code) and the STEP_EXECUTION is failed.
At this point, the entries in the BATCH_JOB_EXECUTION_PARAMS table remain. I actually tried modifying the values of their KEY_NAME and value columns, but this still didn't allow the job to run. As long as there are parameters tied to a JOB_EXECUTION_ID, another job belonging to the same BATCH_JOB_INSTANCE cannot run.
Once I remove the entries in BATCH_JOB_EXECUTION_PARAMS for that specific JOB_EXECUTION_ID, another BATCH_JOB_EXECUTION can run, even though all the BATCH_JOB_EXECUTION entries are in a completed state.
So I guess I have two questions- is that the correct behavior? And if so, what is preventing the BATCH_JOB_EXECUTION_PARAMS from being removed and how do I remove them?
Had the same issue; during the test/debug process I kept the job name and parameters the same. Make sure you are changing the job name or job parameters to get a different JobExecution.
The JobParametersValidator, in your case the downloadValidator bean runs before the job kicks off.
What's happening in your case is that the parameters you're passing to the job are the same as those of the "blown up" JobInstance. However, because that job failed in dramatic fashion, it probably wasn't put into a FAILED status.
You can either run the job with different parameters (to get a new job instance) or try updating the status of the former step/job to FAILED in BATCH_STEP_EXECUTION or BATCH_JOB_EXECUTION before restarting.
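As a sketch of the second option (assuming the default Spring Batch metadata schema and that you have looked up the IDs of the stuck execution; the :jobExecutionId / :stepExecutionId placeholders are illustrative, and you should verify against your own tables before running anything):

```sql
-- Mark the stuck executions as FAILED so the job instance becomes restartable.
UPDATE BATCH_JOB_EXECUTION
   SET STATUS = 'FAILED', EXIT_CODE = 'FAILED'
 WHERE JOB_EXECUTION_ID = :jobExecutionId;

UPDATE BATCH_STEP_EXECUTION
   SET STATUS = 'FAILED', EXIT_CODE = 'FAILED'
 WHERE STEP_EXECUTION_ID = :stepExecutionId;
```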
UPDATE (new info added to question)
You must be careful with your job flow here. Yes, your step failed, but your context file indicates that the job should END (complete) on anything other than CONTINUE.
<batch:next on="CONTINUE" to="job.download-stuff.process-stuff.step" />
<batch:end on="*" />
First, be very careful of ending on *. In your scenario, it is causing your job to finish (with a "success") on an ExitCode of FAILED. Also, the default ExitCode for a successful step is COMPLETED, not CONTINUE, so be careful there.
<!-- nothing to me indicates you'd get CONTINUE here, so I changed it -->
<batch:next on="COMPLETED" to="job.download-stuff.process-stuff.step" />
<!-- if you ever have reason to stop here -->
<batch:end on="END" />
<!-- always fail on anything unexpected -->
<batch:fail on="*" />

How to have a Scheduler at the parent job for all child jobs?

The situation is as follows. I want to have a parent job with some common properties, an ExecutionListener and a Scheduler. There could be many child jobs that extend my parent job. Now the Scheduler at the parent needs to read all the child jobIds, pick up the corresponding cron expressions from a DB and execute/schedule the jobs. Something of the sort:
<job id="job1">
    <step id="step1">
        <tasklet><bean id="some bean"/></tasklet>
    </step>
</job>

<bean id="myjob1" parent="parentJob">
    <property name="job" value="job1"/>
    <property name="jobId" value="123"/>
</bean>
Similarly, there could be more jobs extending "parentJob". Now at the "parentJob" I am trying to do something as follows:
scheduler = new ThreadPoolTaskScheduler();
scheduler.setPoolSize(5);
scheduler.initialize();
scheduler.schedule(new TriggerTask(), new CronTrigger(cronExpr));
The challenge at hand is that the child jobIds are getting lost: at most the last child's jobId gets picked up, but not the others. NOTE: TriggerTask is an inner class that implements Runnable.
Somehow I think I am messing something up badly with threads.
Could someone please assist or provide some direction on how this could be achieved?
Thanks
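No answer was posted for this question, but the symptom (only the last child's jobId surviving) is the classic sign of every scheduled task reading one shared, mutable field on the outer class instead of each task capturing its own id. A minimal, Spring-free sketch of the fix (the JobTask and collectJobIds names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class JobIdCapture {

    // Each task captures its own jobId at construction time, instead of
    // all tasks reading a field that the scheduling loop keeps overwriting.
    static final class JobTask implements Runnable {
        private final String jobId;
        private final List<String> executed;

        JobTask(String jobId, List<String> executed) {
            this.jobId = jobId;
            this.executed = executed;
        }

        @Override
        public void run() {
            executed.add(jobId); // stands in for launching the child job
        }
    }

    static List<String> collectJobIds(List<String> childJobIds) {
        List<String> executed = new ArrayList<>();
        List<Runnable> tasks = new ArrayList<>();
        for (String id : childJobIds) {
            tasks.add(new JobTask(id, executed)); // one task per child, each with its own id
        }
        for (Runnable t : tasks) {
            t.run(); // in the real code: scheduler.schedule(task, new CronTrigger(expr))
        }
        return executed;
    }
}
```

Applied to the question, that would mean constructing one TriggerTask per child job and passing the jobId (and cron expression) into its constructor, rather than having the inner class read them from the enclosing bean.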

Spring Batch Partitioning - reuse of JMS Channels?

I am writing a Spring Batch job that is composed of 4 independent steps and would like to distribute the work over the nodes of the cluster. I was thinking about using a flow to break the job into 4 jobs that execute in parallel. Each of the 4 jobs would be configured to run as a single partition. It appears to work (not fully tested in a cluster) but requires a definition of separate PartitionHandlers, Request and Response Channels, and Outbound Gateway.
Can any of these entities be reused across partitioned steps?
Any other suggested approaches ?
For parallel processing, I can recommend this doc.
Ex:
<job id="parallelJobExample">
    <split id="parallelProcessingExample" task-executor="taskExecutor">
        <flow>
            <step id="step1" parent="independetJob1"/>
        </flow>
        <flow>
            <step id="step2" parent="independetJob2"/>
        </flow>
    </split>
</job>

<step id="independetJob1">
    <tasklet>
        <chunk reader="parallelReader1" processor="parallelProcessor1" writer="parallelWriter1" commit-interval="1000"/>
    </tasklet>
</step>

<step id="independetJob2">
    <tasklet>
        <chunk reader="parallelReader2" processor="parallelProcessor2" writer="parallelWriter2" commit-interval="1000"/>
    </tasklet>
</step>
If you need a JMS example, I could also provide one.
