Spring Batch Step does not execute

I'm trying to fix a problem in Spring Batch that has been plaguing our system recently. We have a job that, for the most part, works just fine. It's a multi-step job that downloads and processes data.
The problem is that sometimes the job bombs out: maybe the server we're trying to connect to throws an error, or we shut down our server in the middle of the job. At that point, the next time our Quartz scheduler tries to run the job, nothing seems to happen. Below is an abridged version of the job definition:
<batch:job id="job.download-stuff" restartable="true">
    <batch:validator ref="downloadValidator"/>
    <batch:step id="job.download-stuff.download">
        <batch:tasklet ref="salesChannelOrderDownloader" transaction-manager="transactionManager">
            <batch:transaction-attributes isolation="READ_UNCOMMITTED" propagation="NOT_SUPPORTED"/>
            <batch:listeners>
                <batch:listener ref="downloadListener"/>
                <batch:listener ref="loggingContextStepListener" />
            </batch:listeners>
        </batch:tasklet>
        <batch:next on="CONTINUE" to="job.download-stuff.process-stuff.step" />
        <batch:end on="*" />
    </batch:step>
    <batch:step id="job.download-stuff.process-stuff.step">
        ...
    </batch:step>
    <batch:listeners>
        <batch:listener ref="loggingContextJobListener"/>
    </batch:listeners>
</batch:job>
Once it gets into this state, the downloadValidator runs, but it never makes it into the first step download-stuff.download. I set a breakpoint in the tasklet and it never makes it inside.
If I clear out all of the Spring Batch tables, which are stored in our MySQL database, and restart the server, it begins working again, but I'd rather understand what prevents it from operating correctly at that point than employ scorched-earth tactics to get the job running.
I'm a novice at Spring Batch, to put it mildly, so forgive me if I am omitting important details. I've set breakpoints and turned on logging to learn what I can.
What I have observed so far from going through the database is that entries no longer appear to be written to the BATCH_STEP_EXECUTION and BATCH_JOB_EXECUTION tables.
There are no BATCH_JOB_EXECUTION entries for the job that are not in COMPLETED status, and no BATCH_STEP_EXECUTION entries that are not in COMPLETED status either.
You'll see that there is a batch:validator defined; I've confirmed that Spring Batch calls that validator and that it completes successfully (I set a breakpoint and stepped through). The first step does not get executed.
Neither the loggingContextJobListener nor the loggingContextStepListener seems to fire either. What could be causing this?
UPDATE
I took a closer look at the downloadListener added as a batch:listener. Here's the source code of afterStep:
@Override
@Transactional(propagation = Propagation.REQUIRES_NEW)
public ExitStatus afterStep(StepExecution stepExecution) {
    long runSeconds = TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - nanoStart);
    // If success - we're good
    if (stepExecution.getStatus() == BatchStatus.COMPLETED) {
        Long endTs = stepExecution.getExecutionContext().getLong("toTime");
        Date toTime = new Date(endTs);
        handleSuccess(toTime, stepExecution.getWriteCount());
        return null;
    }
    // Otherwise - record errors
    List<Throwable> failures = stepExecution.getFailureExceptions();
    handleError(failures);
    return ExitStatus.FAILED;
}
I confirmed that the return ExitStatus.FAILED line executes and that the exception that was thrown is logged in the failureExceptions. It seems that once that happens, the BATCH_JOB_EXECUTION entry ends up in COMPLETED status (and exit code), while the BATCH_STEP_EXECUTION entry is FAILED.
At this point, the entries in the BATCH_JOB_EXECUTION_PARAMS table remain. I actually tried modifying the values of their KEY_NAME and value columns, but this still didn't allow the job to run. As long as there are parameters tied to a JOB_EXECUTION_ID, another job belonging to the same BATCH_JOB_INSTANCE cannot run.
Once I remove the entries in BATCH_JOB_EXECUTION_PARAMS for that specific JOB_EXECUTION_ID, another BATCH_JOB_EXECUTION can run, even though all the BATCH_JOB_EXECUTION entries are in a completed state.
So I guess I have two questions: is that the correct behavior? And if so, what is preventing the BATCH_JOB_EXECUTION_PARAMS entries from being removed, and how do I remove them?

I had the same issue; during the test/debug process I kept the job name and parameters the same. Make sure you are changing the job name or the job parameters so that you get a different JobInstance/JobExecution.
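For example, a minimal sketch of launching with a unique parameter on every run (the jobLauncher and downloadJob references here are assumed wiring, not taken from the original post):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class UniqueParamsLauncher {

    private final JobLauncher jobLauncher;
    private final Job downloadJob; // e.g. the "job.download-stuff" bean

    public UniqueParamsLauncher(JobLauncher jobLauncher, Job downloadJob) {
        this.jobLauncher = jobLauncher;
        this.downloadJob = downloadJob;
    }

    public void launch() throws Exception {
        // A changing parameter (here the current time) yields a new JobInstance each run
        JobParameters params = new JobParametersBuilder()
                .addLong("run.timestamp", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(downloadJob, params);
    }
}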

The JobParametersValidator, in your case the downloadValidator bean, runs before the job kicks off.
What's happening in your case is that the parameters you're passing to the job are the same as those of that "blown up" JobInstance. However, because that job failed in dramatic fashion, it probably wasn't put into a FAILED status.
You can either run the job with different parameters (to get a new JobInstance) or try updating the status of the former step/job to FAILED in BATCH_STEP_EXECUTION or BATCH_JOB_EXECUTION before restarting.
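If you'd rather not touch the tables by hand, a rough sketch of doing the same thing through the Spring Batch API (assuming you know the blocking execution's id and have JobExplorer and JobRepository beans available; the wiring is an assumption):

import java.util.Date;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.repository.JobRepository;

public class StuckExecutionFixer {

    private final JobExplorer jobExplorer;
    private final JobRepository jobRepository;

    public StuckExecutionFixer(JobExplorer jobExplorer, JobRepository jobRepository) {
        this.jobExplorer = jobExplorer;
        this.jobRepository = jobRepository;
    }

    public void markFailed(long jobExecutionId) {
        // Load the execution that is blocking the JobInstance
        JobExecution execution = jobExplorer.getJobExecution(jobExecutionId);
        execution.setStatus(BatchStatus.FAILED);
        execution.setExitStatus(ExitStatus.FAILED);
        execution.setEndTime(new Date());
        // Persist the corrected status so the instance becomes restartable
        jobRepository.update(execution);
    }
}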
UPDATE (new info added to question)
You must be careful with your job flow here. Yes, your step failed, but your context file indicates that the job should END (complete) on anything other than CONTINUE.
<batch:next on="CONTINUE" to="job.download-stuff.process-stuff.step" />
<batch:end on="*" />
First, be very careful about ending on *. In your scenario, it causes your job to finish (as a "success") for an exit code of FAILED. Also, the default exit code for a successful step is COMPLETED, not CONTINUE, so be careful there.
<!-- nothing to me indicates you'd get CONTINUE here, so I changed it -->
<batch:next on="COMPLETED" to="job.download-stuff.process-stuff.step" />
<!-- if you ever have reason to stop here -->
<batch:end on="END" />
<!-- always fail on anything unexpected -->
<batch:fail on="*" />

Related

Determine the end of a cyclic workflow in spring integration (inbound-channel => service-activator)

We have the following simple int-jpa based workflow:
[inbound-channel-adapter] -> [service-activator]
The config is like this:
<int:channel id="inChannel"> <int:queue/> </int:channel>
<int:channel id="outChannel"> <int:queue/> </int:channel>

<int-jpa:inbound-channel-adapter id="inChannelAdapter" channel="inChannel"
        jpa-query="SOME_COMPLEX_POLLING_QUERY"
        max-results="2">
    <int:poller max-messages-per-poll="2" fixed-rate="20">
        <int:advice-chain synchronization-factory="txSyncFactory">
            <tx:advice transaction-manager="transactionManager">
                <tx:attributes>
                    <tx:method name="*" timeout="30000" />
                </tx:attributes>
            </tx:advice>
            <int:ref bean="pollerAdvice"/>
        </int:advice-chain>
    </int:poller>
</int-jpa:inbound-channel-adapter>

<int:service-activator input-channel="inChannel" ref="myActivator"
        method="pollEntry" output-channel="outChannel" />

<bean id="myActivator" class="com.company.myActivator" />
<bean id="pollerAdvice" class="com.company.myPollerAdvice" />
The entry point for processing is a constantly growing table against which the SOME_COMPLEX_POLLING_QUERY is run. The current flow is:
[Thread-1] The SOME_COMPLEX_POLLING_QUERY will only return entries that have busy set to false (we set busy to true as soon as polling is done, using txSyncFactory)
[Thread-2] These entries then pass through myActivator, where processing might take anywhere from 1 min to 30 mins.
[Thread-2] Once the processing is done, we set busy back from true to false
Problem: We need to trigger a notification event when the processing of all the entries that were present in the table is done.
Approach tried: We used the afterReturning of pollerAdvice to find out whether the SOME_COMPLEX_POLLING_QUERY returned any results. However, this method starts returning "No Entries" well before Thread-2 is done processing all the entries.
Note:
The same entries will be processed again after 24 hrs, but by then the table will have more entries.
We are not using an outbound-channel-adapter, since we don't have any requirement for it. However, we are open to using it if it is part of the proposed solution.
Not sure if this will work for you, but since you still need to wait with the notification until Thread-2 finishes, I would suggest having some AtomicBoolean bean. In the mentioned afterReturning(), when there is no data polled from the DB, you just change the state of the AtomicBoolean to true. When Thread-2 finishes its work, it can call a <filter> to check the state of the AtomicBoolean and then really perform an <int-event:outbound-channel-adapter> to emit a notification event.
So the final decision to emit the event or not is definitely made from Thread-2, not from the polling channel adapter.
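As a rough sketch of the advice side of that idea (the class mirrors the pollerAdvice bean from the question, but the AtomicBoolean wiring and the empty-result check are assumptions; what afterReturning actually receives for a poll depends on your Spring Integration version):

import java.lang.reflect.Method;
import java.util.Collection;
import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.aop.AfterReturningAdvice;

// Flips a shared flag when a poll returns nothing; Thread-2 checks the flag after
// it finishes processing and decides whether to emit the notification event.
public class MyPollerAdvice implements AfterReturningAdvice {

    private final AtomicBoolean pollingExhausted;

    public MyPollerAdvice(AtomicBoolean pollingExhausted) {
        this.pollingExhausted = pollingExhausted;
    }

    @Override
    public void afterReturning(Object returnValue, Method method, Object[] args, Object target) {
        // Placeholder check: depending on the version, the advised poll may return a
        // Boolean "message handled" flag or a (possibly empty) result collection.
        boolean polledNothing = returnValue == null
                || Boolean.FALSE.equals(returnValue)
                || (returnValue instanceof Collection && ((Collection<?>) returnValue).isEmpty());
        if (polledNothing) {
            pollingExhausted.set(true);
        }
    }
}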

Is there a way to set a timeout for the commit-interval on a spring batch job?

We have data streaming in on an irregular basis and in quantities that I cannot predict. I currently have the commit-interval set to 1 because we want data to be written as soon as we receive it. We sometimes get large numbers of items at a time (~1000-50000 items in a second) which I would like to commit in larger chunks, as it takes a while to write them individually. Is there a way to set a timeout on the commit-interval?
Goal: we set the commit-interval to 10000; we get 9900 items, and after 1 second it commits the 9900 items rather than waiting until it receives 100 more.
Currently, when we set the commit-interval greater than 1, we just see data waiting to be written until it reaches the amount specified by the commit-interval.
How is your data streaming in? Is it being loaded into a work table? Added to a queue? Typically you'd just drain the work table or queue with whatever commit interval performs best, then re-run the job periodically to check whether a new batch of inbound records has been received.
Either way, I would typically leverage flow control to have your job loop and just process as many records as are ready to be processed for a given time interval:
<job id="job">
<decision id="decision" decider="decider">
<next on="PROCESS" to="processStep" />
<next on="DECIDE" to="decision" />
<end on="COMPLETED" />
<fail on="*" />
</decision>
<step id="processStep">
<!-- your step here -->
</step>
</job>
<beans:bean id="decider" class="com.package.MyDecider"/>
Then your decider would do something like this:
if (maxTimeReached) {
return END;
}
if (hasRecords) {
return PROCESS;
} else {
wait X seconds;
return DECIDE;
}
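A minimal Java sketch of such a decider (the hasRecords() check and the time limit are placeholders you would implement against your own work table or queue; the custom PROCESS/DECIDE statuses match the flow above):

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class MyDecider implements JobExecutionDecider {

    private static final FlowExecutionStatus PROCESS = new FlowExecutionStatus("PROCESS");
    private static final FlowExecutionStatus DECIDE = new FlowExecutionStatus("DECIDE");

    private final long maxRunTimeMillis;
    private final long startTime = System.currentTimeMillis();

    public MyDecider(long maxRunTimeMillis) {
        this.maxRunTimeMillis = maxRunTimeMillis;
    }

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        if (System.currentTimeMillis() - startTime > maxRunTimeMillis) {
            return FlowExecutionStatus.COMPLETED; // maps to <end on="COMPLETED"/>
        }
        if (hasRecords()) {
            return PROCESS; // maps to <next on="PROCESS" to="processStep"/>
        }
        sleepQuietly(5000); // wait a bit before checking again
        return DECIDE;      // loops back to the decision
    }

    private boolean hasRecords() {
        return false; // placeholder: query your work table or queue here
    }

    private void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}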

Drools loading session seems to fire rules

I am at a loss with this and can't seem to find an answer in the docs. I am observing the following behaviour. I have this rule:
import function util.CSVParser.parse;

declare Passenger
    @role( event )
    @expires( 24h )
end

rule "Parse and Insert CSV"
when
    CSVReadyEvent( $csv_location : reader ) from entry-point "CSVReadyEntryPoint";
    $p : Passenger() from parse($csv_location);
then
    insert( $p );
end
I can then enter my CSVReadyEvent into my session and call fireAllRules and it executes correctly. It hits the safe point at the end, and all is cool.
I then restart my app and load the session like this:
KieSession loadedKieSession = kieServices.getKieService().getStoreServices().loadKieSession(session.getId(), kieBase, ksConf, kieServices.getEnvironment());
The base and config I take from my kmodule.xml.
What happens now is that, WITHOUT calling fireAllRules(), loading the session somehow triggers firing all rules.
I do not understand how unmarshalling triggers rule execution, but this is obviously wrong. I have already executed that rule, and it should not be executed twice.
In a test case (my tests do NOT create persistent sessions because I only want the rules to be tested) I can call fireAllRules() twice, and the second time does not trigger any matched rules. I am not exactly sure what goes wrong, but the persistent session seems to be loaded in an odd way. Or the persisting of the session is wonky and forgets that it had executed the rule already.
Does anyone have insight into this? I am more than happy to share any code.
Here's my persistence.xml:
<persistence-unit name="org.jbpm.persistence.jpa" transaction-type="JTA">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <class>org.drools.persistence.info.SessionInfo</class>
    <class>org.drools.persistence.info.WorkItemInfo</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
    <properties>
        <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect" />
        <property name="hibernate.max_fetch_depth" value="30" />
        <property name="hibernate.hbm2ddl.auto" value="update" />
        <property name="hibernate.show_sql" value="true" />
        <property name="hibernate.transaction.jta.platform" value="org.hibernate.service.jta.platform.internal.JBossStandAloneJtaPlatform" />
    </properties>
</persistence-unit>
Thanks!
An update/answer from a painful painful painful day of debugging and testing and running stuff:
I suspected my hibernate setup was wrong, so the wrong thing got persisted. I ended up throwing that approach away and writing a manual marshalling/de-marshalling thing.
After creating/loading/recreating/loading I can confirm the session NEVER changes on file.
This was interesting to me because I could swear that the rules are executed, and I was half right:
The WHEN part is executed when the session is loaded. Why? I have not the slightest idea...
I was chasing a red herring because I am calling a function in my when part (as you can see in the rule) to iterate and insert all facts based on that event I am receiving.
My parse function obviously has logging, so each time I reload the session, I get a storm of log lines flying through my terminal, hinting that my rules are being executed.
I then changed my rules to be very, very specific (as in: output everywhere I possibly can). I debugged as deep as I could and I still can't pinpoint why on earth recreating the session executes the when part of a rule. I settled on this: magic. And with a little more detail:
The Drools persistence documentation (https://docs.jboss.org/jbpm/v6.2/userguide/jBPMPersistence.html) states that the developers implemented their own serialize/deserialize strategy in order to speed up the process. I resolve to blame this custom strategy for what I am seeing.
Lesson learned:
Do NOT create objects in the when part (because this will slow you down when loading a session since all when parts are executed)
Chasing red herrings is a pain in my butt.
So to sum up: I believe (up to say 99%) that loading a session is NOT executing the rules.
Using events in real-time mode in a STREAM session running under fireUntilHalt on the one hand, and saving and restarting sessions with fireAllRules on the other, are somewhat contradictory paradigms.
If you have events, I suggest that you use the API to set up and start a (stateful) session in a thread, and insert facts (events) as they arrive.
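A minimal sketch of that pattern (assuming a classpath KieContainer and a session defined in kmodule.xml with eventProcessingMode="stream"; the session name and the commented-out event are placeholders):

import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class StreamingRulesRunner {

    public static void main(String[] args) {
        KieServices ks = KieServices.Factory.get();
        KieContainer container = ks.getKieClasspathContainer();
        // "ksession-rules" is a placeholder for a stateful session declared in kmodule.xml
        KieSession session = container.newKieSession("ksession-rules");

        // Run the engine in its own thread; it blocks and fires rules as events arrive
        Thread engine = new Thread(session::fireUntilHalt, "rules-engine");
        engine.start();

        // Insert events from the producing thread(s) as they arrive, e.g.:
        // session.getEntryPoint("CSVReadyEntryPoint").insert(new CSVReadyEvent(...));

        // When shutting down:
        // session.halt();
        // session.dispose();
    }
}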

How to have a Scheduler at the parent job for all child jobs?

The situation is as follows. I want to have a parent job with some common properties, an ExecutionListener, and a Scheduler. There could be many child jobs that extend my parent job. Now the Scheduler at the parent needs to read all the child jobIds, pick up the corresponding cron expressions from a DB, and execute/schedule the jobs. Something of the sort:
<job id="job1">
<step id="step1">
<tasklet><bean id="some bean"/></tasklet>
</step>
</job>
<bean id="myjob1" parent="parentJob">
<property name="job" value="job1"/>
<property name="jobId" value=123/>
</bean>
Similarly, there could be more jobs extending "parentJob". Now at the "parentJob" level I am trying to do something like the following:
scheduler = new ThreadPoolTaskScheduler();
scheduler.setPoolSize(5);
scheduler.initialize();
scheduler.schedule(new TriggerTask(), new CronTrigger(cronExpression));
The challenge at hand is that the child jobIds are getting lost; at most the last child's jobId gets picked up, but not the others. NOTE: new TriggerTask() is an inner class that implements Runnable.
Somehow I think I am messing something up badly with threads.
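For illustration, a sketch of one way to schedule each child so that its task captures its own jobId (the cron lookup map and the launchChildJob method are hypothetical stand-ins for the setup described above):

import java.util.Map;

import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.scheduling.support.CronTrigger;

public class ParentJobScheduler {

    private final ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();

    // Hypothetical input: child jobIds mapped to the cron expressions read from the DB
    public void scheduleChildren(Map<Long, String> cronByJobId) {
        scheduler.setPoolSize(5);
        scheduler.initialize();

        for (Map.Entry<Long, String> entry : cronByJobId.entrySet()) {
            final long jobId = entry.getKey(); // each task keeps its own copy of the jobId
            scheduler.schedule(() -> launchChildJob(jobId), new CronTrigger(entry.getValue()));
        }
    }

    private void launchChildJob(long jobId) {
        // Hypothetical: look up the child job by id and launch it via a JobLauncher
    }
}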
Could someone please assist or provide some directions on how this could be achieved?
Thanks

Spring batch retry-limit

When defining a Spring Batch job and using the retry-limit parameter in the XML description, is it the total number of attempts or the number of retries?
I.e., with retry-limit=1, will my job run once or twice (in case of an error on the first run)?
This seems like a silly question, but I didn't find a clear answer in any documentation I've seen...
The retry-limit attribute is really "item-based" and not "job-based". By "item-based" I mean that for every item (record/line) that is read/processed/written, if that item fails, it will be retried up to the retry-limit. If that limit is reached, the step will fail.
For example
<step id="someStep">
<tasklet>
<chunk reader="itemReader" writer="itemWriter"
processor="itemProcessor" commit-interval="20"
retry-limit="3">
<retryable-exception-classes>
<include class="org.springframework.exception.SomeException"/>
</retryable-exception-classes>
</chunk>
</tasklet>
</step>
In the above basic step configuration, when a SomeException is thrown by any of the components in the step (itemReader, itemWriter, or itemProcessor), the item is retried up to three times before the step fails.
Here's Spring doc's explanation
In most cases you want an exception to cause either a skip or Step failure. However, not all exceptions are deterministic. If a FlatFileParseException is encountered while reading, it will always be thrown for that record; resetting the ItemReader will not help. However, for other exceptions, such as a DeadlockLoserDataAccessException, which indicates that the current process has attempted to update a record that another process holds a lock on, waiting and trying again might result in success. In this case, retry should be configured:
<step id="step1">
<tasklet>
<chunk reader="itemReader" writer="itemWriter"
commit-interval="2" retry-limit="3">
<retryable-exception-classes>
<include class="org.springframework.dao.DeadlockLoserDataAccessException"/>
</retryable-exception-classes>
</chunk>
</tasklet>
</step>
The Step allows a limit for the number of times an individual item can be retried, and a list of exceptions that are 'retryable'. More details on how retry works can be found in Chapter 9, Retry.
A Spring Batch job is marked as failed if any of its steps fails to complete its execution, even without an error or exception being thrown.
If any error or exception occurs in a step, that step is marked as failed, and with that the job is also marked as a failed job.
First of all, if you want to restart a job, you need to make sure that the job is defined as restartable; otherwise you cannot run the same job again. Moreover, a job is restartable only if it failed in the previous attempt. Once it has finished successfully you cannot restart it even if it is declared restartable. Well, you can, but the job parameters have to be different.
The retry-limit attribute defines how many times a failed item within a step can be retried.
To use retry-limit you also need to define on which exceptions or errors it should retry.
The retry-limit attribute is really "item-based" and not "job-based". By "item-based" I mean that for every item (record/line) that is read/processed/written, if that item fails, it will be retried up to the retry-limit. If that limit is reached, the step will fail.
For example, if retry-limit is set to 2, a failing item will be attempted up to twice before the step fails.
