I am using HangFire hosted by IIS with an app pool set to "AlwaysRunning". I am using the Autofac extension for DI. Currently, when running background jobs with HangFire they are executing sequentially. Both jobs are similar in nature and involve File I/O. The first job executes and starts generating the requisite file. The second job executes and it starts executing. It will then stop executing until the first job is complete at which point the second job is resumed. I am not sure if this is an issue related to DI and the lifetime scope. I tend to think not as I create everything with instance per dependency scope. I am using owin to bootstrap hangfire and I am not passing any BackgroundServer options, nor am I applying any hints via attributes. What would be causing the jobs to execute sequentially? I am using the default configuration for workers. I am sending a post request to web api and add jobs to the queue with the following BackgroundJob.Enqueue<ExecutionWrapperContext>(c => c.ExecuteJob(job.SearchId, $"{request.User} : {request.SearchName}"));
Thanks In Advance
I was looking for that exact behavior recently and I manage to have that by using this attribute..
[DisableConcurrentExecution(<timeout>)]
Might be that you had this attribute applied, either in the job or globally?
Is this what you were looking for?
var hangfireJobId = BackgroundJob.Enqueue<ExecutionWrapperContext>(x => x.ExecuteJob1(arguments));
hangfireJobId = BackgroundJob.ContinueWith<ExecutionWrapperContext>(hangfireJobId, x => x.ExecuteJob2(arguments));
This will basically execute the first part and when that is finished it will start the second part
Related
I have a job that is somehow getting kicked off multiple times. I want the job to kick off once and only once. If any other attempts to run the job while it's already on the queue, I want those runs to ABORT.
I've read the Laravel 8 documentation and can't figure out if I should use:
Queue\ShouldBeUnique (documented here: https://laravel.com/docs/8.x/queues#unique-jobs)
OR
Queue\Middleware\WithoutOverlapping
mentioned here: https://laravel.com/docs/8.x/queues#preventing-job-overlaps
I believe the first one aborts subsequent attempts to run the job whereas the second keeps it queued, just makes sure it doesn't run until the first job is finished. Can anyone confirm?
Confirmed locally by attempting to run multiple instances of the same job in a console window.
Implementing the Queue\ShouldBeUnique interface in the class of my job means that subsequent attempts are ABORTED.
Whereas adding ->withoutOverlapping() to the end of my job reference in the app\console\kernel.php file simply prevents it from running simultaneously. It does NOT abort the job if one is already running.
My spring batch application is running on PCF platform which is connected to MySQL database (single instance), it's running fine when only an instance is up & running but when it comes to more than one application instance, I'm getting exception org.springframework.dao.DuplicateKeyException. This might be happening because similar batch job is firing at the same time & trying to update batch instance table with same job ID. Is there any way to restrict this kind of failure or in another way, I wanted a solution where only one batch job will run at a time even there are multiple instances running.
For me , it is a good sign that DuplicateKeyException is thrown. Because it exactly achieves what you want to do is that spring-batch already makes sure that the same job execution will not executed in parallel. (i.e. Only one server instance execute the job successfully while other fail to execute)
So I see no harms in your case. If you don't like this exception , you can catch it and re-throw it as your application level exception saying something like "The job is executing by other sever instances , so skip to execute it."
If you really want that only one server instance will try to trigger to execute a job and other servers will not try to trigger in the meantime , it is not the problem of spring-batch but is a problem about how you ensure that only one server node will fires the request in the distributed environment. If the batch job is fired as a scheduled task using #Scheduled , you can consider to use a distributed lock such as ShedLock to make sure that it is executed at most once at the same time on one node only.
I believe I've got a scoping issue here.
Project explanation:
The goal is to process any incoming file (on disk), including meta data (which is stored in an SQL database). For this I have two tasklets (FileReservation and FileProcessorTask) which are the steps in the overarching "worker" jobs. They wait for an event to start their work. There are several threads dealing with jobs for concurrency. The FileReservation tasklet sends the fileId to FileProcessorTask using the job context.
A separate job (which runs indefinitely) checks for new file meta data records in the database and upon discovering new records "wakes up" the FileReservationTask tasklets using a published event.
With the current configuration the second step in a job can receive a null message when the FileReservation tasklets are awoken.
If you uncomment the code in BatchConfiguration you'll see that it works when we have separate instances of the beans.
Any pointers are greatly appreciated.
Thanks!
Polling a folder for new files is not suitable for a batch job. So using a Spring Batch job (filePollingJob) is not a good idea IMO.
Any pointers are greatly appreciated.
Polling a folder for new files and running a job for each incoming file is a common use case, which can be implemented using a java.nio.file.WatchService or a FileInboundChannelAdapter from Spring integration. See How do I kickoff a batch job when input file arrives? for more details.
I read a lot about how to enable parallel processing and chunking of an individual job, using Master/Slave paradigm. Consider an already implemented Spring Batch solution that was intended to run on a standalone server. With minimal refactoring I would like to enable this to horizontally scale and be more resilient in production operation. Speed and efficiency is not a goal.
http://www.mkyong.com/spring-batch/spring-batch-hello-world-example/
In the following example a Job Repository is used that connects to an initializes a database schema for the Job Repository. Job initiation requests are fed to a message queue, that a single server, with a single Java process is listening on via Spring JMS. When encountering this it executes a new Java process that is the Spring Batch job. If the job has not been started according to the Job Repository it will begin. If the job had failed it will pick up where the job left off. If the job is in process it will ignore.
The single point of failure is the single server and single listening process for job initiation. I would like to increase resiliency by horizontally scaling identical server instances all competing for who can first grab the job initiation message when it first appears in the queue. That server instance will now attempt to run the job.
I was conceiving that all instances of the JobRepository would share the same schema, so they can all query for when the status is currently in process and decide what they will do. I am unsure though if this schema or JobRepository implementation is meant to be utilized by multiple instances.
Is there a risk in pursuing this that this approach could result in deadlocking the database? There are other constraints to where the Partition features of Spring Batch will not work for my application.
I decided to build a prototype to test if the condition that the Spring Batch Job Repository schema and SimpleJobRepository can be used in a load balanced way with multiple Spring Batch Java processes running concurrently. I was afraid that deadlock scenarios might have occurred at the database to where all running job processes get stuck.
My Test
I started with the mkyong Spring Batch HelloWorld example and made some changes to it where it could be packaged into a Jar that can be executed from the command line. I also removed the initialize database step defined in the database.config file and manually established a local MySQL server with the proper schema elements. I added a Job parameter for time to be the current time in millis so that each job instance would be unique.
Next, I wrote a separate Java main class that used Apache Commons Exec framework to create 50 sub processes with no wait between them. Each of these processes have a Thread.sleep for 1 second within their Processor objects as well so that a number of processes will all kick off at the same time and all attempt to access the database at the same time.
Results
After running this test a number of times in a row I see that all 50 Spring batch processes consistently complete successfully and update the same database schema correctly. I don't see any indication that if there were multiple Spring Batch job processes running on multiple servers connecting to the same database that they would interfere with each other on the schema nor do I see any indication that a deadlock could happen at this time.
So it sounds as if load balancing of Spring Batch jobs without the use of advanced Master/Slave and Step Partitioning approaches is a valid use case.
If anybody would like to comment on my test or suggest ways to improve it I would appreciate it.
Here is excerpt from
Spring Batch docs on how Spring Batch handles database updates for its repository:
Spring Batch employs an optimistic locking strategy when dealing with updates to the database. This means that each time a record is 'touched' (updated) the value in the version column is incremented by one. When the repository goes back to save the value, if the version number has changed it throws an OptimisticLockingFailureException, indicating there has been an error with concurrent access. This check is necessary, since, even though different batch jobs may be running in different machines, they all use the same database tables.
A newbie question on Sprint Batch Admin.
My requirement is that the user should be able to schedule new jobs (passing some parameters for the job functionality) through a web UI. These jobs should be persistent, will be repetitive and could be cancelled or deleted. Also, a report could be generated for last run jobs and to list all the existing jobs with their next run dates.
Perhaps my most important requirement is that this should be possible "on the fly", not requiring redeploying the web-application or a server re-start.
Can this be done using Spring Batch Admin (I see that the guide talks about uploading an XML for adding a job but that seems tedious, if there is an API why shouldn't we be able to create a job on the fly through the Batch Admin Web UI)? Or does JDK Timer or Quartz support it?
Once a job has been created, it can't be deleted, but it can be stopped. Allowing deletion from DB is a risky operation, as Spring Batch might have already been started the job execution, but the DB has not been updated yet. If one removes the job at this moment, you have inconsistency.
Scheduling a new job is described in Launch Job. It is not possible to create new types of jobs, as jobs can generally have complicated configuration which is parsed only once when Spring Context is loaded.
Dynamic deployment (on the fly) of jobs and configurations, without requiring server restart, is a feature we implemented in Trooper Batch Profile - it is not exactly Spring Batch admin but builds on it. You continue to write your jobs using Spring batch, just the container changes for in Trooper you would use its Batch profile runtime. Screen shots and features are here : https://github.com/regunathb/Trooper/wiki/Writing-Batch-jobs-in-Trooper
I think we can deploy the each spring batch job by a SBA. I mean each batch job will be compiled as a war file. We deploy them together in server. In this way, we have the following visiting urls to monitor each jobs:
h t t p://bactchjobserver/job1
h t t p://bactchjobserver/job2
h t t p://bactchjobserver/job3
h t t p://bactchjobserver/job4
But the downside is that each war fill surely contains lib files, which make each war file like 10MB size.
At the same time, I tried to manually add new-job.xml to war-file\WEB-INF\classes\META-INF\spring\batch\jobs, and new-job.jar to war-file\WEB-INF\lib without stopping JBoss. It works. The new job can be showed in SBA UI and runnable.
But obviously this would lead much maintenance and trouble shooting. It is not implementable.