Spring Batch running in Kubernetes - spring

I have a Spring Batch that partitions into "Slave Steps" and run in a thread pool, here is the configuration: Spring Batch - FlatFileItemWriter Error 14416: Stream is already closed
I'd like to run this Spring Batch Job in Kubernetes. I checked this post: https://spring.io/blog/2021/01/27/spring-batch-on-kubernetes-efficient-batch-processing-at-scale by #MAHMOUD BEN HASSINE.
From the post, on Paragraph:
Choosing the Right Kubernetes Job Concurrency Policy
As I pointed out earlier, Spring Batch prevents concurrent job executions of the
same job instance. So, if you follow the “Kubernetes job per Spring
Batch job instance” deployment pattern, setting the job’s
spec.parallelism to a value higher than 1 does not make sense, as this
starts two pods in parallel and one of them will certainly fail with a
JobExecutionAlreadyRunningException. However, setting a
spec.parallelism to a value higher than 1 makes perfect sense for a
partitioned job. In this case, partitions can be executed in parallel
pods. Correctly choosing the concurrency policy is tightly related to
which job pattern is chosen (As explained in point 3).
Looking into my Batch Job, if I start 2 or more pods, it sounds like one/more pods will fail because it will try to start the same job. But on the other hand, it sounds like more pods will run in parallel because I am using partitioned job.
My Spring Batch seems to be a similar to https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/
This said, what is the right approach to it? How many pods should I set on my deployment?
Do the partition/threads will run on separate/different pods, or the threads will run in just one pod?
Where do I define that, in the parallelism? And the parallelism, should it be the same as the number of threads?
Thank you! Markus.

A thread runs in a JVM which runs inside container that in turn is run in a Pod. So it does not make sense to talk about having different threads running on different Pods.
The partitioning technique in Spring Batch can be either local (multiple threads within the same JVM where each thread processes a different partition) or remote (multiple JVMs processing different partitions). Local partitioning requires a single JVM, hence you only need one Pod for that. Remote partitioning requires multiple JVMs, so you need multiple Pods.
I have a Spring Batch that partitions into "Slave Steps" and run in a thread pool
Since you implemented local partitioning with a pool of worker threads, you only need one Pod to run your partitioned Job.

Related

Multiple instances of a partitioned spring batch job

I have a Spring batch partitioned job. The job is always started with a unique set of parameters so always a new job.
My remoting fabric is JMS with request/response queues configured for communication between the masters and slaves.
One instance of this partitioned job processes files in a given folder. Master step gets the file names from the folder and submits the file names to the slaves; each slave instance processes one of the files.
Job works fine.
Recently, I started to execute multiple instances (completely separate JVMs) of this job to process files from multiple folders. So I essentially have multiple master steps running but the same set of slaves.
Randomly; I notice the following behavior sometimes - the slaves will finish their work but the master keeps spinning thinking the slaves are still doing something. The step status will show successful in the job repo but at the job level the status is STARTING with an exit code of UNKNOWN.
All masters share the set of request/response queues; one queue for requests and one for responses.
Is this a supported configuration? Can you have multiple master steps sharing the same set of queues running concurrently? Because of the behavior above I'm thinking the responses back from the workers are going to the incorrect master.

Schedule application Spring on Docker cluster

I need to schedule in batch my application Spring in a Docker cluster with different nodes.
I found the solution of set replicas=1 on docker-compose, but in my opinion this isn't the best solution because minimizes the potential of Docker.
Some help or advice? Thank you.
If I understand you correctly you want to run several replicas of a spring application (it does not matter if this is managed by docker, k8s, it is standalone, etc). Then you want a background job to be started only on one single instance. Right? In this case I may advise you to have a look at ShedLock.
ShedLock does one and only one thing. It makes sure your scheduled
tasks are executed at most once at the same time. If a task is being
executed on one node, it acquires a lock which prevents execution of
the same task from another node (or thread). Please note, that if one
task is already being executed on one node, execution on other nodes
does not wait, it is simply skipped.
It integrates smoothly in Spring. For example a scheduled batch job may look like this:
#Scheduled(cron = ...)
#SchedulerLock(name = "scheduledTaskName")
public void scheduledTask() {
// do something
}
Various options may be used under the hood to implement a distributed lock, e.g. MySQL, Redis, Zookeeper and others.

Spring Batch in clustered environment, high-availability

Right now I use H2 in-memory database as JobRepostiry for my single node Spring Batch/Boot application.
Now I would like to run Spring Batch application on two nodes in order to increase performance (distribute jobs between these 2 instances) and made the application more failover.
Instead of H2 I'm going to use PostgreSQL and configure both of the applications to use this shared database. Is that enough for Sring Batch in order to start working properly in the cluster and start distributing jobs between cluster nodes or do I need to perform some additional actions?
Depending on how you will distribute your jobs across the nodes, you might need to setup a communication middleware (such a JMS or AMQP provider) in addition to a shared job repository.
For example, if you use remote partitioning, your job will be partitioned and each worker can be run on one node. In this case, the job repository must be shared in order for:
the workers to report their progress to the job repository
the master to poll the job repository for workers statuses.
If your jobs are completely independent and you don't need feature like restart, you can continue using an in-memory database for each job and launch multiple instances of the same job on different nodes. But even in this case, I would recommend using a production grade job repository instead of an in-memory database. Things can go wrong very quickly in a clustered environment and having a job repository to store the execution status, synchronize executions, restart failed executions, etc is crucial in such an environment.

Spring Scheduler code within an App with multiple instances with multiple JVMs

I have a spring scheduler task configured with either of fixedDelay or cron, and have multiple instances of this app running on multiple JVMs.
The default behavior is all the instances are executing the scheduler task.
Is there a way by which we can control this behavior so that only one instance will execute the scheduler task and others don't.
Please let me know if you know any approaches.
Thank you
We had similar problem. We fixed it like this:
Removed all #Scheduled beans from our Spring Boot services.
Created AWS Lambda function scheduled with desired schedule.
Lambda function hits our top level domain with scheduling request.
Load balancer forwards this request to one of the service instances.
This way we are sure that scheduled task is executed only once across the cluster of our services.
I have faced similar problem where same scheduled batch job was running on two server where it was intended to be running on one node at a time. But later on I found a solution to not to execute the job if it is already running on other server.
Job someJob = ...
Set<JobExecution> jobs = jobExplorer.findRunningJobExecutions("someJobName");
if (jobs == null || jobs.isEmpty()) {
jobLauncher.run(someJob, jobParametersBuilder.toJobParameters());
}
}
So before launching the job, a check is needed if the job is already in execution on other node.
Please note that this approach will work only with DB based job repository.
We had the same problem our three instance were running same job and doing the tasks three times every day. We solved it by making use of Spring batch. Spring batch can have only unique job id so if you start the job with a job id like date it will restricts duplicate jobs to start with same id. In our case we used date like '2020-1-1' (since it runs only once a day) . All three instance tries to start the job with id '2020-1-1' but spring rejects two duplicate job stating already job '2020-1-1' is running.
If my understanding is correct on your question, that you want to run this scheduled job on a single instance, then i think you should look at ShedLock
ShedLock makes sure that your scheduled tasks are executed at most once at the same time. If a task is being executed on one node, it acquires a lock which prevents execution of the same task from another node (or thread). Please note, that if one task is already being executed on one node, execution on other nodes does not wait, it is simply skipped.

How to configure/run Java Batch partitions in multi JVM in WebSphere LP

In WebSphere LP Java Batch, I have divided my job in 4 partitions through job.xml configuration, So when the job executes on server 4 threads runs on single jvm to complete the job. Now I want to run the partitions on 2 jvm.
Lets say 2 partitions will run on server-1 and 2 partitions will run on server-2.
Does someone tried something on this to run partitions in multiple jvm through configuration or any thoughts would be welcome.
You need some extra configuration in the server running the job to allow it to send messages for the job partitions rather than just spin off threads to run them. And you need other servers configured to get those messages and execute the partitions.
If you are using MQ then information about this configuration can be found here: http://www.ibm.com/support/knowledgecenter/SS7K4U_liberty/com.ibm.websphere.wlp.zseries.doc/ae/twlp_batch_multipartitionsmq.html
If you are using the Liberty embedded messaging provider then look here: http://www.ibm.com/support/knowledgecenter/SS7K4U_liberty/com.ibm.websphere.wlp.zseries.doc/ae/twlp_batch_multipartitionsembed.html
The parent document for those two has links to other information about configuring and running Liberty Batch jobs.

Resources