Is there a way to auto-resume batch jobs in spring batch? - heroku

I am trying to run a Spring Batch job on a Heroku dyno. Let's say the job contains 2 steps.
If the Heroku dyno recycles (that is, the JVM shuts down) while step 2 is in progress, is there a way to resume from step 2 the next time the JVM comes up?

Related

Spring Batch running in Kubernetes

I have a Spring Batch that partitions into "Slave Steps" and run in a thread pool. The configuration is in this question: "Spring Batch - FlatFileItemWriter Error 14416: Stream is already closed".
I'd like to run this Spring Batch job in Kubernetes. I checked this post by Mahmoud Ben Hassine: https://spring.io/blog/2021/01/27/spring-batch-on-kubernetes-efficient-batch-processing-at-scale
From the post, in the section:
Choosing the Right Kubernetes Job Concurrency Policy
As I pointed out earlier, Spring Batch prevents concurrent job executions of the same job instance. So, if you follow the “Kubernetes job per Spring Batch job instance” deployment pattern, setting the job’s spec.parallelism to a value higher than 1 does not make sense, as this starts two pods in parallel and one of them will certainly fail with a JobExecutionAlreadyRunningException. However, setting a spec.parallelism to a value higher than 1 makes perfect sense for a partitioned job. In this case, partitions can be executed in parallel pods. Correctly choosing the concurrency policy is tightly related to which job pattern is chosen (As explained in point 3).
Looking at my batch job, if I start 2 or more pods, it sounds like one or more pods will fail because they will try to start the same job. On the other hand, it sounds like more pods will run in parallel because I am using a partitioned job.
My Spring Batch job seems to be similar to https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/
That said, what is the right approach? How many pods should I set in my deployment?
Will the partitions/threads run on separate pods, or will the threads run in just one pod?
Where do I define that, in the parallelism setting? And should the parallelism be the same as the number of threads?
Thank you! Markus.
A thread runs in a JVM, which runs inside a container, which in turn runs in a Pod. So it does not make sense to talk about having different threads running on different Pods.
The partitioning technique in Spring Batch can be either local (multiple threads within the same JVM where each thread processes a different partition) or remote (multiple JVMs processing different partitions). Local partitioning requires a single JVM, hence you only need one Pod for that. Remote partitioning requires multiple JVMs, so you need multiple Pods.
I have a Spring Batch that partitions into "Slave Steps" and run in a thread pool
Since you implemented local partitioning with a pool of worker threads, you only need one Pod to run your partitioned Job.
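To make the local case concrete, here is a minimal plain-JDK sketch of the idea. It is not Spring Batch API: the class `LocalPartitioningSketch`, the method names, and the "sum the ids" workload are all hypothetical stand-ins. It only illustrates that every partition is handled by a thread of the same JVM, which is why a single Pod is enough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LocalPartitioningSketch {

    // Hypothetical stand-in for a worker step: sums the ids in [from, to).
    static long processPartition(int from, int to) {
        long sum = 0;
        for (int id = from; id < to; id++) {
            sum += id;
        }
        return sum;
    }

    // Splits [0, totalItems) into gridSize ranges and processes each range on a
    // thread of a pool that lives inside this single JVM.
    static long runPartitioned(int totalItems, int gridSize) {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            int chunk = (totalItems + gridSize - 1) / gridSize; // ceiling division
            for (int p = 0; p < gridSize; p++) {
                final int from = Math.min(p * chunk, totalItems);
                final int to = Math.min(from + chunk, totalItems);
                Callable<Long> task = () -> processPartition(from, to);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // wait for this partition to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runPartitioned(1000, 4));
    }
}
```

With remote partitioning, each `processPartition` call would instead run in a separate JVM, and only then would multiple Pods come into play.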

How to run a Spring Boot cron job at different times when the same code is deployed on 2 different servers that are started and running at the same time?

We are going to deploy the same code on two different servers, which will be started and running at the same time. The problem is that both servers will then run the cron job at the same time, so the work will be done twice. I want the cron job on each server to start at a different initial time, so that one server can read some rows from the DB, finish its task, and change the status of the rows it processed, and the other server then reads only new entries.

How to know whether a Spring Batch job is running for the 1st time

Requirement:
A Spring Batch job (built with Spring Boot) takes data from a database every 5 hours and is scheduled to run accordingly. An added requirement is that when the job runs for the very 1st time, it should fetch 4 months of data from the database.
Problem:
How can we know that the Spring Batch job is running for the 1st time? Is there any parameter in Spring Batch that can tell us whether it's running for the 1st time?
As far as I'm aware, the Spring scheduler has no way to identify a first-time run (please comment or edit if I'm wrong). You can schedule two jobs: a one-time job that fetches the 4 months of data, and a separate recurring job for the normal runs.
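The two-job idea can be sketched with the plain JDK scheduler. This is only an illustration, not Spring's scheduling API: the class `TwoJobScheduleSketch` and the two placeholder job methods are hypothetical. Note the assumption that "one-time" means once per process start; making the backfill survive restarts would need a persisted marker, which this sketch does not cover.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TwoJobScheduleSketch {

    // Hypothetical placeholders for launching the two jobs.
    static void runBackfillJob() {
        System.out.println("one-time job: fetch the last 4 months");
    }

    static void runIncrementalJob() {
        System.out.println("recurring job: fetch the latest data");
    }

    // Schedules the backfill exactly once (immediately) and the incremental
    // job repeatedly, with the first incremental run one period later.
    static ScheduledExecutorService start(Runnable backfill, Runnable incremental, Duration period) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(backfill, 0, TimeUnit.MILLISECONDS);
        scheduler.scheduleAtFixedRate(
                incremental, period.toMillis(), period.toMillis(), TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        start(TwoJobScheduleSketch::runBackfillJob,
              TwoJobScheduleSketch::runIncrementalJob,
              Duration.ofHours(5));
    }
}
```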

How to deploy laravel into a docker container while there are jobs running

We are trying to migrate our Laravel setup to Docker. Dockerizing the Laravel app was straightforward. However, we ran into an issue: if we do a deployment while scheduled jobs are running, they are killed because the container is destroyed. What's the best practice here? Having a separate container to run the Laravel scheduler doesn't seem like it would solve the problem.
Run the scheduled job in a different container so you can scale it independently of the laravel app.
Run multiple containers of the scheduled job so you can stop some to upgrade them while the old ones will continue processing jobs.
Docker will send a SIGTERM signal to the container and wait for the container to exit cleanly before issuing SIGKILL (the time between the two signals is configurable, 10 seconds by default). This gives your current job a chance to finish cleanly (or to save a checkpoint and continue later).
The plan is to stop old containers and start new containers gradually so there aren't lost jobs or downtime. If you use an orchestrator like Docker Swarm or Kubernetes, they will handle most of these logistics for you.
Note: the Laravel scheduler is based on cron and will fire processes that will be killed by Docker. To prevent this, have the scheduler add a job to a Laravel queue. The queue worker is a foreground process, so it receives the SIGTERM and gets the chance to stop or save cleanly before being killed.
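The SIGTERM handling described above is not specific to Laravel. As a rough sketch of the same pattern in Java (chosen because the other questions on this page are JVM-based; the class `GracefulWorkerSketch` and its methods are hypothetical), a queue worker can check a stop flag only between jobs, so a job that has already started is allowed to finish. The JVM runs shutdown hooks when it receives SIGTERM, so the hook is where the stop is requested.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class GracefulWorkerSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final AtomicInteger completed = new AtomicInteger();

    public void submit(Runnable job) { queue.add(job); }
    public void requestStop() { stopRequested.set(true); }
    public int completedJobs() { return completed.get(); }

    // Runs jobs one at a time. The stop flag is only checked between jobs,
    // so the job in progress always runs to completion.
    public void runLoop() {
        while (!stopRequested.get()) {
            Runnable job;
            try {
                job = queue.poll(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (job != null) {
                job.run();
                completed.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        GracefulWorkerSketch worker = new GracefulWorkerSketch();
        Thread loop = new Thread(worker::runLoop, "worker");
        // Shutdown hooks run on SIGTERM (for example, from `docker stop`).
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            worker.requestStop();
            try {
                loop.join(); // must complete within the container's stop timeout
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        loop.start();
    }
}
```

If a single job can take longer than the stop timeout, the job itself would also need to checkpoint its progress, as the answer notes.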

Spark Launcher Jobs not starting because of token cant be found in cache after 24 hours

I have a Java application which runs continuously and checks a database table for new records. When a new record is added to the table, the Java application unzips a file and puts it into an HDFS location, and then a Spark job gets triggered (I am programmatically triggering the Spark job using the "SparkLauncher" class inside the Java application), which processes the newly added file in the HDFS location.
I have scheduled the Java application in the cluster using an Oozie Java Action.
The cluster is a kerberized HDP cluster.
The job works perfectly fine for 24 hours. All the unzipping happens and the Spark job runs.
But after 24 hours the unzip still happens in the Java application, while the Spark job no longer gets triggered in the Resource Manager.
Exception : Exception encountered while connecting to the server :INFO: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (owner=****, renewer=oozie mr token, realUser=oozie, issueDate=1498798762481, maxDate=1499403562481, sequenceNumber=36550, masterKeyId=619) can't be found in cache
As per my understanding, after 24 hours oozie is renewing the token, and that token is not getting updated for the Spark launcher Job. The spark Launcher is still looking for the older Token which is not available in cache.
How can I make the Spark launcher look for the new token?
As per my understanding, after 24 hours oozie is renewing the token
Why? Can you point to any documentation, source code, blog?
Remember that Oozie is a scheduler for batch jobs, and its canonical use case (at Yahoo!) is for triggering hourly jobs.
Only a pathological batch job would run for more than 24h, therefore renewal of the Hadoop delegation token is not really useful in Oozie.
But your Java thing acts as a service, running continuously, and needing automatic restart if it ever crashes. So you should consider...
either Slider, if you really want to run it inside YARN (although there are many, many drawbacks -- how do you inspect the logs of a running YARN job? how can you make sure that the app starts on time and is not delayed by a lack of resources? how can you make sure that your app will not be killed because YARN needs resources for a high-priority job?) but it is probably overkill for simply running your toy app
or a plain Linux service running on some Edge Node -- it's a Do-It-Yourself task, but not extremely complicated, and there are tutorials on the web
If you insist on using Oozie, in spite of all the limitations of both YARN and Oozie, then you have to change the way your app runs -- for instance, schedule the Coordinator to launch a job every 12h and pass the "nominal time" as Workflow property, edit the Workflow to pass that time to the Java app, edit the Java code so that the app exits at (arg + 11:58) and clears the way for the next exec.
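The "exit at (arg + 11:58)" part could look roughly like the sketch below. The class `TimeBoxedRunSketch` and its methods are hypothetical, and it assumes the workflow passes the nominal time as a full ISO-8601 instant such as 2017-06-30T00:00:00Z; adapt the parsing to whatever format your workflow actually passes.

```java
import java.time.Duration;
import java.time.Instant;

public class TimeBoxedRunSketch {

    // The app runs for 11h58 after the nominal time, then exits so that the
    // next Coordinator-launched execution (every 12h) can take over.
    static final Duration RUN_WINDOW = Duration.ofHours(11).plusMinutes(58);

    static Instant deadlineFor(Instant nominalTime) {
        return nominalTime.plus(RUN_WINDOW);
    }

    static boolean shouldKeepRunning(Instant nominalTime, Instant now) {
        return now.isBefore(deadlineFor(nominalTime));
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: args[0] is the nominal time in ISO-8601 instant format.
        Instant nominalTime = Instant.parse(args[0]);
        while (shouldKeepRunning(nominalTime, Instant.now())) {
            // Hypothetical placeholder: check the table, unzip, launch the Spark job.
            Thread.sleep(60_000);
        }
        // Returning from main lets the JVM exit once the deadline has passed.
    }
}
```

Each execution then starts with the credentials issued for that launch instead of relying on a token that is more than 24 hours old.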
