Recommended approach for parallel Spring Batch jobs

The Spring Batch Integration documentation explains how to use remote chunking and partitioning for steps; see
http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#externalizing-batch-process-execution
Our jobs do not consist of straightforward reader/processor/writer steps, so we would like to simply run whole jobs in parallel, with each job farmed out to a different partition.
Is there already a pattern for this in Spring Batch? Or would I need to implement my own JobLauncher to maintain a pool of slaves to launch jobs on?
Cheers,
Menno

Spring Batch specifically takes the position of not handling job orchestration (which is fundamentally what your question is about). There are a few approaches for something like this:
1. Distributed scheduler - Most distributed schedulers can execute tasks on multiple nodes; Quartz, for example, has a clustered mode.
2. Remote partitioning for orchestration - Remote partitioning executes full Spring Batch steps on slaves. There is no reason those steps could not be job steps that each execute an entire job.
3. Message-driven job launching - Spring Batch Integration (a child module of Spring Batch) provides the facilities to launch jobs via messages: a collection of slaves listens to a queue, waiting for a message to launch a job. You would have to handle things like load balancing between the slaves in some way, but this is another common approach to job orchestration. A minimal sketch of this approach follows.
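For the third option, here is a sketch using the JobLaunchingGateway from Spring Batch Integration, mirroring the configuration style in the Spring Batch Integration docs. The channel names are illustrative, and the broker-facing inbound adapter (e.g. AMQP) that feeds jobRequests is left out:

```java
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.integration.launch.JobLaunchingGateway;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.config.EnableIntegration;

@Configuration
@EnableIntegration
public class MessageDrivenJobLaunchingConfig {

    // Messages arriving on "jobRequests" must carry a JobLaunchRequest
    // payload (the Job to run plus its JobParameters); the reply sent to
    // "jobExecutions" is the resulting JobExecution.
    @Bean
    @ServiceActivator(inputChannel = "jobRequests")
    public JobLaunchingGateway jobLaunchingGateway(JobLauncher jobLauncher) {
        JobLaunchingGateway gateway = new JobLaunchingGateway(jobLauncher);
        gateway.setOutputChannelName("jobExecutions");
        return gateway;
    }
}
```

With each slave running this configuration and jobRequests fed from a shared queue, the slaves become competing consumers, which gives you a crude form of load balancing for free.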

Related

Asynchronous Kafka consumer in Spring Batch Application

In our Spring Batch application's workers, the item processors also interact with another service asynchronously through Kafka. The requirement is that we need an acknowledgement in order to retry failed batches, but we must not block while waiting for that acknowledgement.
Is there any mechanism in Spring Batch by which we can asynchronously consume Kafka?
Is it possible to rerun a specific local worker step when rerunning the job?
We implement the producer and the consumer over the same step using a Spring Batch decider: the first run only produces to Kafka, and the second run consumes from Kafka.
We are looking for a solution that lets us asynchronously consume Kafka in a Spring Batch application so that we can rerun a specific worker step.
Is there any mechanism in Spring Batch by which we can asynchronously consume Kafka? Is it possible to rerun a specific local worker step when rerunning the job?
According to your diagram, you are doing that call from an item processor. The closest "feature" you can get from Spring Batch is the AsyncItemProcessor: a special processor that processes items asynchronously on a separate thread, handing the step a Future for each item. The Future is unwrapped by a matching AsyncItemWriter once the result of the call is available.
Other than that, I do not see any other obvious way to do that with a built-in feature from Spring Batch. So you would have to manage that in a custom ItemProcessor.
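For reference, a minimal sketch of that pairing, assuming your Kafka-calling processor and your real writer already exist as beans (the String item types are placeholders for your own):

```java
import org.springframework.batch.integration.async.AsyncItemProcessor;
import org.springframework.batch.integration.async.AsyncItemWriter;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class AsyncStepConfig {

    // Wraps the delegate so each item is processed on a separate thread;
    // the step hands the writer a Future per item instead of blocking.
    @Bean
    public AsyncItemProcessor<String, String> asyncItemProcessor(ItemProcessor<String, String> kafkaCallingProcessor) {
        AsyncItemProcessor<String, String> asyncProcessor = new AsyncItemProcessor<>();
        asyncProcessor.setDelegate(kafkaCallingProcessor);
        asyncProcessor.setTaskExecutor(new SimpleAsyncTaskExecutor());
        return asyncProcessor;
    }

    // Unwraps the Futures (waiting where necessary) and passes the
    // completed results to the real writer.
    @Bean
    public AsyncItemWriter<String> asyncItemWriter(ItemWriter<String> delegateWriter) {
        AsyncItemWriter<String> asyncWriter = new AsyncItemWriter<>();
        asyncWriter.setDelegate(delegateWriter);
        return asyncWriter;
    }
}
```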

Get status of running worker nodes at regular intervals in Spring Batch deployer partition handler

I am using the deployer partition handler for remote partitioning in Spring Batch. I want to get the status of each worker node at regular intervals and display it to the user (like heartbeats). Is there any approach to achieve this?
This depends on what your workers are doing (a simple tasklet or a chunk-oriented step) and how they report their progress. Typically, workers share the same job repository as the manager step that launched them, so you should be able to track their StepExecution updates (readCount, writeCount, etc.) in that repository using the JobExplorer API.
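A minimal polling sketch of that idea, assuming you hold the manager job's execution id; how you schedule the calls (the "regular intervals") and surface the output to the user is up to you:

```java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;

public class WorkerProgressReporter {

    private final JobExplorer jobExplorer;

    public WorkerProgressReporter(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    // Call this at regular intervals, e.g. from a @Scheduled method.
    public void report(long jobExecutionId) {
        JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
        if (jobExecution == null) {
            return; // unknown or not yet persisted execution id
        }
        for (StepExecution stepExecution : jobExecution.getStepExecutions()) {
            // Each worker partition shows up as its own step execution,
            // typically named like "workerStep:partition0".
            System.out.printf("%s status=%s read=%d written=%d%n",
                    stepExecution.getStepName(),
                    stepExecution.getStatus(),
                    stepExecution.getReadCount(),
                    stepExecution.getWriteCount());
        }
    }
}
```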
If you deploy your job on Spring Cloud Data Flow, you can use the step execution progress endpoint to track the progress of workers.

How to stop jobs from Spring Cloud Data Flow immediately

I have used Spring Cloud Data Flow to control some batch jobs. In SCDF, after I defined some tasks, they were launched as jobs with running status. When I tried to stop a particular job, it did not stop immediately. I found that the job kept running until it finished its current step.
For example, my job 'ABC' has two steps, A and B. In SCDF, if I stop job 'ABC' while step A is executing, the job keeps running until step A completes; it then does not execute step B.
So, is there any way to stop a job immediately from Spring Cloud Data Flow?
In Spring Cloud Data Flow, the batch job stop operation is delegated to the Spring Batch API. This means there is nothing Spring Cloud Data Flow itself offers to stop a batch job immediately; that has to be handled by Spring Batch or by the job implementation itself.
When a stop request is sent for a running batch job execution, the terminateOnly flag on the current step execution is set to true, which marks the step execution as ready to be stopped; when it actually stops depends on the underlying step implementation.
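For chunk-oriented steps the flag is honored at chunk boundaries, so the job stops after the current chunk. A long-running custom tasklet only stops promptly if it cooperates, for example by doing its work in small slices; a minimal sketch, where the work-related methods are hypothetical placeholders:

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class CooperativeTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        processOneUnitOfWork(); // hypothetical: one small slice of the step's work
        // Returning CONTINUABLE hands control back to the framework between
        // slices, which is where the terminateOnly flag is checked, so a
        // stop request interrupts the step instead of waiting for all slices.
        return hasMoreWork() ? RepeatStatus.CONTINUABLE : RepeatStatus.FINISHED;
    }

    private void processOneUnitOfWork() { /* ... */ }

    private boolean hasMoreWork() { /* ... */ return false; }
}
```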

Scheduling jobs while consuming Kafka messages

I want to build a single Spring Boot application which performs multiple different tasks concurrently. I did some research on the internet but could not find a way to do it. Let me get into detail.
I would like to start jobs at certain intervals, for example once a day; I can do that with Quartz and Spring. I would also like to listen for messages on a dedicated address. The messages will come from the Apache Kafka platform, so I would like to use the Kafka integration for the Spring framework.
Is this practical (listening for messages all the time while executing scheduled jobs on time)?
Functionally speaking, this design is fine: a single Spring Boot app can consume Kafka messages while also executing Quartz jobs.
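As a rough illustration, here are both concerns in one app; Spring's @Scheduled stands in for a full Quartz setup, and the topic and group names are made up:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@SpringBootApplication
@EnableScheduling
public class CombinedApp {

    public static void main(String[] args) {
        SpringApplication.run(CombinedApp.class, args);
    }

    @Component
    static class Worker {

        // Runs continuously: spring-kafka manages the listener threads.
        @KafkaListener(topics = "example-topic", groupId = "example-group")
        public void onMessage(String message) {
            System.out.println("Received: " + message);
        }

        // Fires once a day at 03:00, independent of the listener threads.
        @Scheduled(cron = "0 0 3 * * *")
        public void dailyJob() {
            System.out.println("Running daily job");
        }
    }
}
```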
But at a higher level, you should ask why these two functions belong in a single app. Is there some inherent relationship between the Quartz jobs and the Kafka messages being consumed? Or are you combining them solely to limit yourself to one app and save on compute/memory resources?
You should also consider the impact on scalability. What if you need to increase the rate at which you consume Kafka messages? If you scale out the app to get more Kafka consumers, you now have to worry about multiple instances firing your Quartz jobs.
So yes, it can be done, but without more detail it sounds like you should split this design into two separate applications: one for Quartz and one for Kafka consumption.

How does Spring XD load balance between instances of the same module in different containers

I have read this post, but it does not cover my case and is not clear enough:
How does load balancing in Spring XD get done?
I have a composed job, with different instances of the same sub-jobs deployed in different containers. The composed job is scheduled to run periodically. I need to know how Spring XD chooses which sub-job instance to invoke for each new request to the composed job.
The same question applies to a stream triggered every X minutes.
It's handled by the transport (RabbitMQ or Redis).
Each downstream module competes for messages: with RabbitMQ it will generally be round-robin; with Redis it will be more random.
