Quartz .NET - Prevent parallel Job Execution

I am using Quartz .NET for job scheduling.
So I created one job class (implementing IJob).
public class TransferData : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        string tableName = context.JobDetail.JobDataMap.GetString("table");
        // Transfer the table here.
        return Task.CompletedTask;
    }
}
I want to transfer multiple different tables. For this purpose I am doing something like this:
foreach (Table table in tables)
{
    IJobDetail job = JobBuilder.Create<TransferData>()
        .WithIdentity(new JobKey(table.Name, "table_transfer"))
        .UsingJobData("table", table.Name)
        .Build();

    ITrigger trigger = TriggerBuilder.Create()
        .WithIdentity(new TriggerKey("trigger_" + table.Name, "table_trigger"))
        .WithCronSchedule("0 0/5 * * * ?") // Quartz cron expressions include a seconds field
        .ForJob(job)
        .Build();

    await this.scheduler.ScheduleJob(job, trigger);
}
So every table should be transferred every 5 minutes. To achieve this I create several jobs with different job names.
The question is: how do I prevent parallel job execution for the same job name? (E.g. the previous run takes longer for one table, so I do not want to start the next transfer for the same table.)
I know about the [DisallowConcurrentExecution] attribute, but as I understand it, it prevents parallel execution for the same job class. I do not want to write an extra job class per table, because the "main" code for the transfer is always the same; the one and only difference is the table name. So I want to use the same job class for this purpose.

The Quartz .NET documentation is a little bit confusing.
DisallowConcurrentExecution is an attribute that can be added to the Job class that tells Quartz not to execute multiple instances of a given job definition (that refers to the given job class) concurrently. Notice the wording there, as it was chosen very carefully. In the example from the previous section, if “SalesReportJob” has this attribute, then only one instance of “SalesReportForJoe” can execute at a given time, but it can execute concurrently with an instance of “SalesReportForMike”. The constraint is based upon an instance definition (JobDetail), not on instances of the job class. However, it was decided (during the design of Quartz) to have the attribute carried on the class itself, because it does often make a difference to how the class is coded.
Source: https://www.quartz-scheduler.net/documentation/quartz-3.x/tutorial/more-about-jobs.html
But if you read the API documentation, it says (the parenthesized part is the important bit):
An attribute that marks a IJob class as one that must not have multiple instances executed concurrently (where instance is based-upon a IJobDetail definition - or in other words based upon a JobKey).
Source: https://quartznet.sourceforge.io/apidoc/3.0/html/
In other words: the DisallowConcurrentExecution attribute works for my purposes.
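A minimal sketch of the job class from the question with the attribute applied (only the attribute is added):
[DisallowConcurrentExecution]
public class TransferData : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        string tableName = context.JobDetail.JobDataMap.GetString("table");
        // Transfer the table here. Quartz will not start a second execution
        // for the same JobKey (i.e. the same table) while one is still
        // running, but jobs for different tables can still run in parallel.
        return Task.CompletedTask;
    }
}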

Related

How to lock the job during execution in Laravel?

I see withoutOverlapping() mutex for commands, but I don't see it for jobs. How can I protect jobs of the same type from overlapping each other?
Thanks!
I think it's possible using the following:
https://laravel.com/docs/8.x/queues#unique-jobs
You can specify a key that you pass to the job to mark its uniqueness. In my case, I need to limit requests to a third-party API that happen in the job, so if I have more than one worker handling the queue, it's possible to get a 429 from the API. Since I have many API keys (one per user of the app), I can use them to have the same type of job executed independently across the app users, but lock the job execution if the current job with a specific key is not completed.
Like this:
// The job class must implement the ShouldBeUnique interface.
class UpdateSpreadsheet implements ShouldQueue, ShouldBeUnique
{
    // If you don't need to wait until the job is processed, you may also
    // specify how long the lock is held (so you'll be able to queue another
    // job with this key after 10 seconds, even if the current job is
    // still in process).
    public $uniqueFor = 10;

    public $keyValue;

    public function __construct($keyValue)
    {
        // some other constructor code if needed
        $this->keyValue = $keyValue;
    }

    // This function sets the unique key.
    public function uniqueId()
    {
        return $this->keyValue;
    }
}
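Dispatching then just passes the key (a sketch, assuming the job uses the default Dispatchable trait and $apiKey holds the per-user API key):
UpdateSpreadsheet::dispatch($apiKey);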

Can we run Spring Batch multiple times a day?

I have implemented Spring Batch a couple of times before, but it was designed to run only once a day.
Now I have a new requirement where I need to start the batch whenever a record gets inserted into a table. Whenever a new record is inserted, it should launch the job, and the batch will generate a PDF, save it in a repository and send a mail to the user.
I am not sure how to design a Spring Batch job which runs multiple times a day, or whether it is even correct to use Spring Batch for this scenario. Can someone please throw some light on this? Thanks!
You can implement a listener to catch when data are stored in the DB (easily with Hibernate, for instance) and then use CommandLineJobRunner to start your job manually.
See spring_source
You can run it several times; just be careful with the identifier pattern used for your batch job instances (each run needs distinct job parameters).
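To illustrate the identifier part, here is a minimal sketch using a JobLauncher injected into application code (an alternative to CommandLineJobRunner when launching from inside the app); the job bean, class and parameter names are made up for the example:
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class PdfJobStarter {

    private final JobLauncher jobLauncher;
    private final Job generatePdfJob; // the Job bean defined in your batch config

    public PdfJobStarter(JobLauncher jobLauncher, Job generatePdfJob) {
        this.jobLauncher = jobLauncher;
        this.generatePdfJob = generatePdfJob;
    }

    public void startForRecord(Integer orderId) throws Exception {
        // A unique parameter (here the current time) gives every launch its own
        // JobInstance, so the same job can run many times a day.
        JobParameters params = new JobParametersBuilder()
                .addLong("orderId", orderId.longValue())
                .addLong("run.time", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(generatePdfJob, params);
    }
}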
As per your requirement, you can achieve this with the help of @EntityListeners (if you are working with Hibernate/JPA).
Let me give you a dummy scenario:
@Entity
@Table(name = "Order")
@EntityListeners(OrderListener.class)
public class Order {
    @Id
    public Integer id;
    // other properties
}
The listener:
public class OrderListener {
    @PostPersist
    public void doStartSchedulerCode(Order order) {
        // You can call the code responsible for generating the PDF and sending the mail from here.
    }
}
Each time you insert a row into the Order table, doStartSchedulerCode() will be called.
Try this.

Laravel Single Job Class dispatched multiple times with different parameters getting overwritten

I'm using Laravel Jobs to pull data from the Stripe API in a paginated way. Basically, each job gets a "brand id" (a user can have multiple brands per account) and a "start after" parameter. It uses that to know which Stripe token to use and where to start in the paginated calls (the job dispatches itself again if more Stripe responses are available in the pagination). This runs fine when the job is started once.
But there is a use case where a user could add Stripe keys to multiple brands in a short time, and that job class could get called multiple times concurrently with different parameters. When this happens, whichever process started last overwrites the others, because the parameters get overwritten with those of the last call. So if I start the Stripe job with brand_id = 1, then with brand_id = 2, then brand_id = 3, then 3 overwrites the other two after one cycle and only 3 gets passed for all future calls.
How do I keep this from happening?
I've tried static vars; I've tried protected, private and public vars. I thought I might be able to solve it with dynamically created queues for each brand, but this seems like a huge headache.
// job properties (one set per dispatched job instance)
protected $brand_id;
protected $start_after;

public function __construct($brand_id, $start_after = null)
{
    $this->brand_id = $brand_id;
    $this->start_after = $start_after;
}

public function handle()
{
    // Do Stripe calls with $this->brand_id & $this->start_after
    if ($response->has_more) {
        // Call the next job with the new "start_after".
        dispatch(new ThisJob($this->brand_id, $new_start_after));
    }
}
According to the Laravel documentation:
If you dispatch a job without explicitly defining which queue it should be dispatched to, the job will be placed on the queue that is defined in the queue attribute of the connection configuration.
// This job is sent to the default queue...
dispatch(new Job);
// This job is sent to the "emails" queue...
dispatch((new Job)->onQueue('emails'));
However, pushing jobs to multiple queues with unique names can be especially useful for your use case.
The queue name may be any string that uniquely identifies the queue itself. For example, you may wish to construct the queue name from uniqid() and $brand_id.
E.g:
dispatch((new ThisJob($this->brand_id, $new_start_after))->onQueue(uniqid() . '_' . $this->brand_id));

Is it possible to lock some entries in MongoDB and run a query that does not take the locked records into account?

I have a MongoDB that contains a list of "tasks" and two instances of executors. These 2 executors have to read a task from the DB, save it in the state "IN_EXECUTION" and execute the task. Of course I do not want my 2 executors to execute the same task, and this is my problem.
I use a transactional query. This way, when an executor tries to change the state of a task it gets a "write exception" and has to start again and read a new task. The problem with this approach is that sometimes an executor gets a lot of errors before it can save the task state change correctly and execute a new task. So it is as if I had only one executor.
Note:
- I do not want to block my entire DB on read/write, because that would slow down the entire process.
- I think it is necessary to save the state of the task, because it could be a long task.
I asked whether it is possible to lock only certain records and execute a query on the "not-locked" records, but any advice that solves my problem will be really appreciated.
Thanks in advance.
EDIT1:
Sorry, I simplified the concept in the question above. Actually I extract n messages that I have to send. I have to send these messages in blocks of 100, so my executors split the extracted messages into blocks of 100 and basically pass them to other executors.
Each executor extracts the messages and then updates them with the new state. I hope this is clearer now.
#Transactional(readOnly = false, propagation = Propagation.REQUIRED)
public List<PushMessageDB> assignPendingMessages(int limitQuery, boolean sortByClientPriority,
LocalDateTime now, String senderId) {
final List<PushMessageDB> messages = repositoryMessage.findByNotSendendAndSpecificError(limitQuery, sortByClientPriority, now);
long count = repositoryMessage.updateStateAndSenderId(messages, senderId, MessageState.IN_EXECUTION);
return messages;
}
DB update:
public long updateStateAndSenderId(List<String> ids, String senderId, MessageState messageState) {
Query query = new Query(Criteria.where(INTERNAL_ID).in(ids));
Update update = new Update().set(MESSAGE_STATE, messageState).set(SENDER_ID, senderId);
return mongoTemplate.updateMulti(query, update, PushMessageDB.class).getModifiedCount();
}
You will have to do the locking one record at a time.
Trying to lock 100 records at once while a second process also locks 100 records (without any coordination between the two) will almost certainly result in an overlapping set, unless you have a huge pool of available records.
Depending on your application, having all the work done by one thread (with the other being just a "hot standby") may also be acceptable, as long as that single worker does not get overloaded.
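One way to do that one-by-one locking atomically (a sketch only, reusing the field constants and MongoTemplate from the question; the PENDING state and the method name are assumptions) is findAndModify, which claims a single document and returns it in one atomic step:
public PushMessageDB claimNextMessage(String senderId) {
    // Match one message that nobody has claimed yet.
    Query query = new Query(Criteria.where(MESSAGE_STATE).is(MessageState.PENDING));

    // Mark it as IN_EXECUTION and record who claimed it.
    Update update = new Update()
            .set(MESSAGE_STATE, MessageState.IN_EXECUTION)
            .set(SENDER_ID, senderId);

    // findAndModify is atomic per document, so two executors can never claim
    // the same message. Returns null when no pending message is left.
    return mongoTemplate.findAndModify(
            query,
            update,
            FindAndModifyOptions.options().returnNew(true),
            PushMessageDB.class);
}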

Laravel Scheduling in clustered environment

I am working with scheduling in Laravel 5.3. Previously, I was using one server to host the laravel application. Now that I am using two servers to run the Laravel App, how do I ensure that both servers are not running the same jobs at the same time?
Recently, I saw an Event method called "withoutOverlapping()". See https://laravel.com/docs/5.3/scheduling#preventing-task-overlaps
In my case, withoutOverlapping() cannot help me as I am working in a clustered environment.
Are there any workarounds or suggestions regarding this?
First of all, define whether it is critical or not to avoid running a task multiple times.
For example, if your app is using a task to do some sort of cleanup, there is almost no drawback to running it on every server (who cares if you try to delete messages older than 10 minutes twice?).
If it is absolutely critical to run every task only once, you'll need to define a "main server" that will execute tasks, and a slave server that will just answer requests but not perform any task. This is quite trivial, as you just have to give every env a different name in your .env and test against that when you define the scheduler tasks.
This is the easiest way; seriously, don't bother building a database locking mechanism or whatever so you can synchronise tasks across servers. Even OSes struggle to manage synchronisation between threads on the same machine properly, so why would you want to implement the same across different machines?
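A minimal sketch of that env-based guard in app/Console/Kernel.php (APP_ROLE and the command name are made-up examples; only the server whose .env sets APP_ROLE=main registers the tasks):
protected function schedule(Schedule $schedule)
{
    // Only the designated "main" server registers the scheduled tasks.
    if (env('APP_ROLE') === 'main') {
        $schedule->command('emails:send')->everyFiveMinutes()->withoutOverlapping();
    }
}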
Here's what I've done when I ran into the same problems with load balancing:
abstract class MutexCommand extends Command {
private $hash = null;
public function cleanup() {
if (is_string($this->hash)) {
Redis::del($this->hash);
$this->hash = null;
}
}
protected abstract function generateHash();
protected abstract function handleInternal();
public final function handle() {
register_shutdown_function([$this,"cleanup"]);
try {
$this->hash = $this->generateHash();
//Set a value if it does not exist atomically. Will fail if it does exist.
//Essentially setnx is the mechanism to acquire the lock
if (!Redis::setnx($this->hash,true)) {
$this->hash = null; //Prevent it from being cleaned up
throw new Exception("Already running");
}
$this->handleInternal();
} finally {
$this->cleanup();
}
}
}
Then you can write your commands:
class ThisShouldNotOverlap extends MutexCommand {
public function generateHash() {
return "Unique key for mutex, you can just use the class name if you want by doing return static::class";
}
public function handleInternal() { /* do stuff */ }
}
Then whenever you try to run the same command on multiple instances, one will successfully acquire the "lock" and the others will fail.
Of course this assumes that you are using a non-clustered Redis cache.
If you are not using Redis then there are probably similar locking mechanisms you can implement with other caches; if you are using a clustered Redis then you may need to use the Redlock locking mechanism.
Essentially no, there's no natural way in Laravel to know whether another Laravel app has the same job on the job dispatcher.
We have some options here to find a solution:
Create an intermediate app that manages the jobs from the other apps.
Allow only one app to dispatch jobs.
Use worker queues; there are some packages for this. I would recommend using Laravel 5 with WebSockets and Queue Asynchronously.
First of all, the Laravel scheduler isn't designed to work in a clustered environment; it was never intended to be that way.
I would suggest having a dedicated cron instance which manages your Laravel scheduler jobs.
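That instance would carry the usual single cron entry from the Laravel docs (adjust the artisan path to your project):
* * * * * php /path-to-your-project/artisan schedule:run >> /dev/null 2>&1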
