Spring Batch - stop a job running in a different JRE

Background:
I created a Stop Job which finds running jobs with the specified name like this:
jobExplorer.findRunningJobExecutions("job_A")
and then, for each execution of job_A it calls:
jobOperator.stop(execution.getId());
Issue
When I call the above stop() method, it eventually accomplishes what I want, but it still throws an exception:
WARN o.s.b.c.l.s.SimpleJobOperator [main] Cannot find Job object
org.springframework.batch.core.launch.NoSuchJobException: No job configuration with the name [job_A] was registered
at org.springframework.batch.core.configuration.support.MapJobRegistry.getJob(MapJobRegistry.java:66) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobOperator.stop(SimpleJobOperator.java:403) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
The Cause
This happens when the stop() method tries to locate job_A in the JobRegistry.
So even though job_A was found in the JobRepository (which looks in the database), it was not found in the JobRegistry, which is a local cache of the job beans created within the current runtime environment. Since job_A is running in a different runtime instance, it was never registered locally, and the lookup threw an error.
Concern
Even though job_A stops, I am still concerned about what I may have missed because of the exception.
I have searched this issue and found only general answers on how to stop a job, but I did not find anyone explaining how to stop a job that is running in another runtime.
Any answers would be greatly appreciated.

The JobOperator isn't intended to orchestrate distributed batch environments like you're attempting to do. You really have two options:
Use the JobRepository directly - The reason the job eventually stops in the remote JVM is that the JobRepository is updated and the running job in the other JVM knows to check that periodically. Instead of using the JobOperator to accomplish this, just use the JobRepository directly to make the update (a sketch is shown below).
Use a more robust orchestration tool like Spring Cloud Data Flow - This kind of orchestration (deploying, starting, stopping, etc.) of jobs (via Spring Cloud Task) is what Spring Cloud Data Flow is for.
You can read more about Spring Cloud Data Flow at the website: https://cloud.spring.io/spring-cloud-dataflow/
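A minimal sketch of the first option, assuming the stopping application points at the same batch metadata database as the JVM running the job (the class and method names below are made up for illustration):

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.stereotype.Service;

@Service
public class RemoteJobStopper {

    private final JobExplorer jobExplorer;
    private final JobRepository jobRepository;

    public RemoteJobStopper(JobExplorer jobExplorer, JobRepository jobRepository) {
        this.jobExplorer = jobExplorer;
        this.jobRepository = jobRepository;
    }

    public void stop(String jobName) {
        for (JobExecution execution : jobExplorer.findRunningJobExecutions(jobName)) {
            // Flipping the status to STOPPING in the shared database is the signal
            // the running job in the other JVM picks up; no JobRegistry lookup needed.
            execution.setStatus(BatchStatus.STOPPING);
            jobRepository.update(execution);
        }
    }
}

As described above, the job in the other JVM notices the STOPPING status the next time it checks the repository, so the stop takes effect on its next check rather than immediately.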

In addition to what Michael mentioned, you can solve this by adding an interface to the application that lets you pass commands to start or stop your job - something like a web service exposing an endpoint to stop it (rough sketch below). The catch is that handling this in a clustered system can be a bit tricky, since the request has to reach the instance that actually owns the running job.
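One possible shape of that interface (the endpoint path and class names are invented for illustration); each instance exposes the endpoint and stops only the executions running in its own JVM, where the JobRegistry lookup succeeds:

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/batch")
public class JobControlController {

    private final JobOperator jobOperator;
    private final JobExplorer jobExplorer;

    public JobControlController(JobOperator jobOperator, JobExplorer jobExplorer) {
        this.jobOperator = jobOperator;
        this.jobExplorer = jobExplorer;
    }

    @PostMapping("/jobs/{jobName}/stop")
    public ResponseEntity<Void> stop(@PathVariable String jobName) throws Exception {
        for (JobExecution execution : jobExplorer.findRunningJobExecutions(jobName)) {
            // Succeeds here because the job is registered in this JVM's JobRegistry.
            jobOperator.stop(execution.getId());
        }
        return ResponseEntity.accepted().build();
    }
}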

Related

How can Spring Batch ItemReader be made more resilient via Retries

I'm developing a migration app to migrate data from one database to another database with different schemas - so we have some processors in between. Currently, we're using the JdbcCursorItemReader in our steps. I'm trying to avoid a temporary network issue resulting in a failed job hours into a migration.
I tried extending the JdbcCursorItemReader, overriding the open(ExecutionContext ec) and read() methods, and annotating them with @Retryable. However, when an exception is thrown in either the open or read method, the job fails - the exception is not caught and the retry is not triggered.
I'm now starting to wonder if the JdbcCursorItemReader can encounter transient errors, which would need to be retried. As I understand it, a long running connection is opened and the results are streamed. Am I wasting my time trying to make the JdbcCursorItemReader retryable? Is it even possible?
If I used the JdbcPagingItemReader, could I make its read() method retryable?
I'm not too experienced with Spring Batch, any guidance on making the reader more resilient would be greatly appreciated!
Regards,
David
The retry policy of a fault-tolerant chunk-oriented step is not applied to the item reader, see Retry not working with Spring Batch with Java Config and Spring Batch: Using SkipPolicy and RetryPolicy with SpringBoot. The reason is that the contract of an item reader is "forward only": there is no way (in the current interface) to go back to a previous position in case of a retry.
So the path you are exploring is the way to go. You can add retry capabilities to your custom reader either declaratively with an annotation or programmatically with a RetryTemplate.
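For the programmatic route, a minimal sketch of a delegating reader could look like the following (the class name and wiring are assumptions, not an official Spring Batch API):

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ItemStreamReader;
import org.springframework.retry.support.RetryTemplate;

public class RetryingItemReader<T> implements ItemStreamReader<T> {

    private final ItemStreamReader<T> delegate;
    private final RetryTemplate retryTemplate;

    public RetryingItemReader(ItemStreamReader<T> delegate, RetryTemplate retryTemplate) {
        this.delegate = delegate;
        this.retryTemplate = retryTemplate;
    }

    @Override
    public T read() throws Exception {
        // Retries transient failures of read(); with a cursor-based reader a broken
        // connection may also require reopening the delegate before a retry can succeed.
        return retryTemplate.execute(context -> delegate.read());
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // open() can be retried as well, e.g. when the initial connection attempt fails.
        retryTemplate.execute(context -> { delegate.open(executionContext); return null; });
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        delegate.update(executionContext);
    }

    @Override
    public void close() throws ItemStreamException {
        delegate.close();
    }
}

The declarative alternative would be to annotate such a delegating method with @Retryable and enable retry with @EnableRetry on a configuration class, rather than annotating the JdbcCursorItemReader subclass itself.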

How to debug a Spring Boot application that is not starting

Spring lists SO as the only place to ask questions on their community page, which is why I ask this rather generic question here. It may not be the best fit for SO, but, according to Spring's community overview page, there's no other adequate place to ask such questions.
I have a spring boot application built on spring cloud gateway (version 2) which also uses an embedded hazelcast cluster. It runs in multiple instances, which communicate via hazelcast. Everything works fine, except under heavy load. If one instance fails, restarting it is no longer possible.
When the instance is restarted while the cluster of instances is under heavy load, it will start creating and wiring beans up to some point, after which it does nothing Spring-related anymore. Hazelcast-generated messages are still visible in the log past that point (with root log level DEBUG), but nothing generated by Spring or the application itself.
In order to restart that one instance that failed, I need to stop the load generation, wait some 10-15 minutes, then restart the failed instance. Then the new/restarted instance starts up rather quickly, with no problems at all.
The load consists of http requests which get proxied to another application, and is of such nature that it generates a lot of read accesses to hazelcast's distributed storage, but very few writes.
My problem: I have no idea how to debug this. Since the http endpoint never becomes available, there's no way I can query metrics or other actuator information.
So my question is: what tools or mechanisms can I employ to debug this problem? That is, how can I find out exactly how the boot sequence differs when the other instances of the Hazelcast cluster are under heavy load versus when there is no load at all in the cluster? Once I have this information, the problem is narrowed down enough for me to investigate it further on my own.
I didn't find a way to debug the problem, but I had an idea of what might be causing it, tried it, and it turned out to be the fix.
My application was running as a Kubernetes deployment. A few beans inside the application were relying on a usable CP subsystem during their initialization. Spring's bean initialization process is by necessity sequential and blocking, to account for inter-bean dependencies.
I hypothesized that under heavy load, for whatever reason, the initialization of those beans was blocking forever. As a first experiment, to see whether that was indeed the problem, I made that initialization code asynchronous, so that Spring could finish bean wiring, even though the instance would be unable to do useful work until the async part had also finished.
To my surprise, that fully fixed the problem. Spring finished bean wiring, the HZ-dependent initialization also completed rather quickly when executed asynchronously, even under high load, and the instance became usable soon after being started.
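Roughly, the change amounted to something like this (the bean, lock name, and the body of the initialization are placeholders, since the original code isn't shown):

import java.util.concurrent.CompletableFuture;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.lock.FencedLock;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class CpDependentInitializer {

    private final HazelcastInstance hazelcast;

    public CpDependentInitializer(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
    }

    @EventListener(ApplicationReadyEvent.class)
    public void initializeAsync() {
        // Moving the CP-subsystem-dependent work off the startup thread lets the
        // application context finish wiring even if the CP subsystem is slow or stuck.
        CompletableFuture.runAsync(() -> {
            FencedLock lock = hazelcast.getCPSubsystem().getLock("startup-lock");
            lock.lock();
            try {
                // ... whatever CP-dependent setup used to run during bean initialization
            } finally {
                lock.unlock();
            }
        });
    }
}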
I didn't have the time to dig deeper to find out what the precise failure mechanism was. What I believe might have been the problem is the interaction between HZ and K8s. K8s-based discovery works through a K8s service, and a pod/instance isn't added to the service until it becomes healthy. If a bean inside the application blocks initialization, the instance is never added to the service, so discovery never finds the new/restarted instance. I don't know what effect this might have on the HZ cluster's inner workings.

Spring task:scheduled or @Scheduled to restrict a job from running on multiple instances

I have one @Scheduled job which runs on multiple servers in a clustered environment. However, I want to restrict the job to run on only one server; the other servers should not run the same job once any server has started it.
I have explored Spring Batch's lock mechanism using database tables, but I am looking for a solution based only on Spring's task:scheduler.
I had the same problem. The solution I implemented was a lock mechanism with Hazelcast, and to make it easy to use I also added a dedicated annotation and a bit of Spring AOP. With this trick I was able to enforce a single schedule over the cluster with a single annotation; a simplified sketch follows.
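Without the annotation/AOP part, the core of the idea looks roughly like this (lock name, cron expression, and class names are illustrative):

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.lock.FencedLock;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ClusterSingletonScheduler {

    private final HazelcastInstance hazelcast;

    public ClusterSingletonScheduler(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
    }

    @Scheduled(cron = "0 0 * * * *")
    public void runHourlyJob() {
        FencedLock lock = hazelcast.getCPSubsystem().getLock("hourly-job-lock");
        // Only the instance that wins the distributed lock executes; the others skip this run.
        if (lock.tryLock()) {
            try {
                // ... the actual job logic
            } finally {
                lock.unlock();
            }
        }
    }
}

The annotation plus Spring AOP mentioned above is essentially sugar around this pattern, so the lock/unlock boilerplate doesn't have to be repeated in every scheduled method.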
Spring Batch has a nice built-in behavior: it will not run a job twice with the same job parameters.
You can use this so that when the same Spring Batch job kicks off on another server, it does not run again.
Usually people pass a timestamp as a parameter, which bypasses this logic; you can change that.
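For illustration, the launching side might look like this on every server (bean and parameter names are hypothetical); because the JobParameters are identical across servers, only the first launch creates and runs the JobInstance:

import java.time.LocalDate;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DailyJobTrigger {

    private final JobLauncher jobLauncher;
    private final Job nightlyJob; // assumed job bean

    public DailyJobTrigger(JobLauncher jobLauncher, Job nightlyJob) {
        this.jobLauncher = jobLauncher;
        this.nightlyJob = nightlyJob;
    }

    @Scheduled(cron = "0 0 1 * * *")
    public void launch() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("run.date", LocalDate.now().toString()) // identical on every node
                .toJobParameters();
        try {
            jobLauncher.run(nightlyJob, params);
        } catch (JobInstanceAlreadyCompleteException | JobExecutionAlreadyRunningException e) {
            // Another server already ran (or is currently running) today's instance - skip.
        }
    }
}

Note that adding a timestamp or a run-id incrementer to the parameters is exactly what defeats this check, so leave those out if you want the run-once behavior.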

Spring Integration Invoking Spring Batch

Just looking for some information on whether others have solved this pattern. I want to use Spring Integration and Spring Batch together. Both are Spring Boot applications, and ideally I'd like to keep them and their respective configurations separated, so each is its own executable jar. I'm having problems executing them in their own process space; unless someone can convince me otherwise, I believe I want each to run as its own Spring Boot app and initialize itself with its own profiles and properties. What I'm having trouble with, though, is the invocation of the job in my Spring Batch project from my Spring Integration project. At first I couldn't get the properties loaded from the batch project, so I realized I needed to pass spring.profiles.active as a job parameter, and that seemed to solve that. But there are other things in the Spring Boot batch application that aren't loading correctly, like the schema-platform.sql file, and the database isn't getting initialized, etc.
On the initial launch of the job I might want the response to go back to Spring Integration for some messaging on job status. There might be times when I want to run a job without Spring Integration kicking it off, but still take advantage of sending statuses back to the Spring Integration project, provided it's listening on a channel or something.
I've reviewed quite a few Spring samples and have yet to find my exact scenario, most are with the two dependencies in the same project, so maybe I'm doing something that's not possible, but I'm sure I'm just missing a little something in the Spring configuration.
My questions/issues are:
I don't want the Spring Integration project to know anything about the Spring Batch configuration other than the job it's kicking off. I haven't found a good way to reference the Job bean without my entire batch configuration being loaded.
Should I keep these two projects separated, or would it be better to combine them, since I have two-way communication between both?
How should the Job be launched from the integration project? We're using the spring-batch-integration project with JobLaunchRequest and JobLauncher. This seems to run it in the same process as the Spring Integration project, and I'm missing a lot of my Spring Boot batch project's initialization.
Should I be using a CommandLineRunner instead to force it into another process?
Is SpringApplication.run(BatchConfiguration.class) the answer?
Looking for some general project configuration setup to meet these requirements.
Spring Cloud Data Flow in combination with Spring Cloud Task does exactly what you're asking. It launches Spring Cloud Task applications (which can contain batch jobs) as new processes on the platform you choose. I'd encourage you to check out that project here: http://cloud.spring.io/spring-cloud-dataflow/

What's the point of Spring Batch's JobLocator?

I've created a Spring Batch application (together with Spring Boot). Configuring a Job works well, and each Job is executed at startup. The job configurations also show up in the database as expected.
To launch a Job with parameters, there are two options:
Inject the Job as it is a Java Bean - works great
Use JobLocator to get an instance of the Job - seen in several tutorials, but does not work ("No job configuration with the name [doSomethingJob] was registered")
So my question is: What's the point of a JobLocator if one can easily inject jobs directly?
The JobLocator isn't for injection of jobs. It's to locate an instance to execute. If you have something that will be executing jobs dynamically (not knowing what job it will need to execute), you'll want to use a JobLocator. An example of this is in Spring Batch Admin. There, a JobLocator is used from within the JobService to get the requested Job to launch.
Wiring a Job instance into your class works well when it's predetermined what job you'll be running. However, if you don't know that in advance, the JobLocator is the way to go.
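A small sketch of that dynamic case, where the job name only arrives at runtime (class and method names here are illustrative):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.configuration.JobLocator;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.stereotype.Service;

@Service
public class DynamicJobLaunchService {

    private final JobLocator jobLocator;
    private final JobLauncher jobLauncher;

    public DynamicJobLaunchService(JobLocator jobLocator, JobLauncher jobLauncher) {
        this.jobLocator = jobLocator;
        this.jobLauncher = jobLauncher;
    }

    public JobExecution launch(String jobName, JobParameters params) throws Exception {
        // Throws NoSuchJobException if the job was never registered in the JobRegistry.
        Job job = jobLocator.getJob(jobName);
        return jobLauncher.run(job, params);
    }
}

For the lookup to work, the job has to be registered in the JobRegistry (for example via a JobRegistryBeanPostProcessor); the "No job configuration with the name [...] was registered" error from the question is what you see when that registration is missing.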
