How can I efficiently interrogate jobs on a Laravel queue?
For example, my RosterEmployee job has an Employee property. I have a lot of these jobs queued. Later, I need to terminate a batch of Employees, and in doing so I want to identify any queued jobs that should be deleted. (There is no point in Rostering an employee if they are about to be terminated).
To do this, I need to identify queued jobs whose EmployeeId matches an employee in the termination batch, and delete those jobs.
(I realize that in the real world, there would be better ways to accomplish this Employee termination logic, however, I have contrived the example to demonstrate my need - efficiently interrogating jobs on a Laravel queue.)
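Absent a first-class API for this, one workable approach is to scan the queue's backing list in Redis directly. A sketch, assuming the Redis queue driver (pending jobs live in the queues:default list as JSON payloads) and that RosterEmployee exposes a public $employee; note that the scan is O(n) and races with workers that may have already reserved a job:

```php
use Illuminate\Support\Facades\Redis;

// Scan pending payloads on the default queue and delete RosterEmployee
// jobs for terminated employees. LREM matches the exact payload string.
function deleteRosterJobsForEmployees(array $terminatedIds): int
{
    $deleted = 0;

    foreach (Redis::connection()->lrange('queues:default', 0, -1) as $payload) {
        $decoded = json_decode($payload, true);
        $serialized = $decoded['data']['command'] ?? null;

        if (! is_string($serialized)) {
            continue;
        }

        // The command is a PHP-serialized job object; unserializing a job
        // that uses SerializesModels re-fetches the Employee from the DB.
        $command = unserialize($serialized);

        if ($command instanceof \App\Jobs\RosterEmployee
            && in_array($command->employee->id, $terminatedIds, true)) {
            Redis::connection()->lrem('queues:default', 1, $payload);
            $deleted++;
        }
    }

    return $deleted;
}
```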
I'm new to jobs and queues.
At the moment, I'm only really using the ->later() method on a Mail. This places each mail on the default queue.
There are instances where I need to cancel jobs on the queue related to a specific model ID. I don't really see any reference to deleting pending jobs in the queue, only to deleting/clearing failed ones.
In Telescope, there are tags showing the Model IDs associated with each pending job.
There are a few things I was hoping to do:
Delete all jobs associated with a specific model ID
Listen for the execution of a job tied to a specific model ID, so I can update a database table with the date/timestamp of when the job actually executed (see the sketch below). Users can queue emails to send hours in advance, and I'd like to log when their customer actually receives the email.
Remove the record associated with a job, since it should not exist if the email never actually got sent.
I'm hoping for some advice on how to manage jobs in this fashion.
I'm using Redis if that makes any difference.
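There's no per-model listener built in, but you can hook every processed job and filter on the payload. A minimal sketch, assuming a hypothetical mailable App\Mail\CustomerEmail and registration in a service provider's boot() method:

```php
use Illuminate\Queue\Events\JobProcessed;
use Illuminate\Support\Facades\Queue;

// Fires after any queued job finishes, including mailables queued with
// ->later(). For queued mailables, displayName holds the mailable's
// class name.
Queue::after(function (JobProcessed $event) {
    $payload = $event->job->payload();

    if (($payload['displayName'] ?? null) === \App\Mail\CustomerEmail::class) {
        // Correlate back to your own table here; the serialized command in
        // $payload['data']['command'] carries the mailable and its models.
        logger()->info('Queued mail processed', [
            'uuid' => $payload['uuid'] ?? null,
        ]);
    }
});
```

Alternatively, Illuminate\Mail\Events\MessageSent fires when the message is actually handed to the transport, which maps more closely to "when the customer receives the email". For the deletion side, note that mails queued with ->later() wait in the queues:default:delayed sorted set rather than the queues:default list, so a scan like the one sketched under the first question would use ZRANGE/ZREM instead of LRANGE/LREM.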
I need to create several tens of millions of jobs.
I have tried it with for-loops and Bus::batch([]), and unfortunately creating the jobs takes longer than the 10 servers/workers need to process them. That means the workers have to wait until jobs show up in the backend (Redis, etc.). With redis-benchmark I could confirm that Redis itself is not the problem.
Anyway... is there a way to create jobs in BULK (not batch)? I'm just thinking of something like:
INSERT INTO ... () VALUES (), (), (), (), ...
Anyway, creating several million jobs in a for-loop or in a batch seems to be very slow for some reason, probably because it's always just one query at a time rather than a single bulk insert.
I would be very grateful for any help!
Writing millions of records will be somewhat slow under any conditions. I'd recommend maximizing your queue performance using several methods:
Create one job that creates all the other jobs, if possible.
Use QUEUE_CONNECTION=redis for your queues, as Redis stores data in RAM, which is the fastest option.
Dispatch your jobs after the response has already been sent.
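One more option: the stock drivers implement bulk dispatch as a loop of single pushes, so for a genuine bulk write on Redis you can build the payloads yourself and RPUSH them in chunks. A rough sketch, assuming a hypothetical ProcessThing job and the payload shape produced by recent Laravel Redis queue drivers (verify against your version):

```php
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Str;

$ids = range(1, 1_000_000); // example workload

$payloads = [];
foreach ($ids as $id) {
    $job = new \App\Jobs\ProcessThing($id);
    $payloads[] = json_encode([
        'uuid'        => (string) Str::uuid(),
        'id'          => Str::random(32),
        'displayName' => get_class($job),
        'job'         => 'Illuminate\Queue\CallQueuedHandler@call',
        'maxTries'    => null,
        'timeout'     => null,
        'attempts'    => 0,
        'data'        => [
            'commandName' => get_class($job),
            'command'     => serialize($job),
        ],
    ]);
}

// RPUSH accepts many values at once, so each chunk costs one round trip
// instead of one dispatch per job.
foreach (array_chunk($payloads, 5000) as $chunk) {
    Redis::connection()->rpush('queues:default', ...$chunk);
}
```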
I have a large set of users in my project, around 50 million.
I need to create a playlist for each user every day. To do this, I currently use the following method:
I have a column in my users table, which I named last_playlist_created_at, that holds the latest time a playlist was created for that user.
I query the users table for the top 1,000 users whose last_playlist_created_at is more than one day old, sorted in ascending order by last_playlist_created_at.
After that, I loop over the result and publish a message to my message broker for each user.
Behind the message broker, I run around 64 workers that process the messages (create a playlist for the user) and update last_playlist_created_at in the users table.
Once the broker's message list is empty, I repeat these steps (in a while/do-while loop).
I think the processing side is good enough and can scale, but the way we create the message for each user is not scalable!
How should I dispatch such a large set of messages, one per user?
OK, so my answer is based entirely on your comment that you use while(true) to check whether a playlist needs to be updated, which does not seem ideal.
Although this is a design question and there are multiple solutions, here's how I would solve it.
First up, think of updating the playlist for a user as a job.
Now, in your case this is a scheduled job, i.e. once a day.
So, use a scheduler to schedule the next job time.
Write a scheduled-job handler that pushes this to a message queue. This part just lets you handle multiple jobs at the same time and control the flow.
Generate the playlist for the user based on the job, then create a schedule event for the next day.
You could persist the scheduled-job data to avoid race conditions.
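To make that concrete, here is a minimal sketch of the self-rescheduling job. QueueClient and PlaylistService are hypothetical stand-ins for your broker client and playlist logic:

```php
interface QueueClient
{
    public function publishAt(object $job, \DateTimeImmutable $at): void;
}

interface PlaylistService
{
    public function generateFor(int $userId): void;
}

final class GeneratePlaylistJob
{
    public function __construct(public readonly int $userId) {}

    public function handle(QueueClient $queue, PlaylistService $playlists, \PDO $db): void
    {
        // 1. The actual work for one user.
        $playlists->generateFor($this->userId);

        // 2. Persist the completion time; this doubles as the data you can
        //    check to detect reruns and race conditions.
        $db->prepare('UPDATE users SET last_playlist_created_at = NOW() WHERE id = ?')
           ->execute([$this->userId]);

        // 3. Schedule tomorrow's run for this user instead of polling
        //    with while(true).
        $queue->publishAt(new self($this->userId), new \DateTimeImmutable('+1 day'));
    }
}
```

The important property is that each user carries their own schedule, so no central loop has to rescan all 50 million rows to find the users that are due.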
I would like to parallelize several, but not all, steps of my Spring Batch application.
My flow looks like this:
MainStep1: read customers table and create a list of customer config
MainStep2 per customer (If a flow for a single customer fails, do not abort the job):
innerStep1: retrieve all transactions of this customer from transactions table
innerStep2: generate a customer bill from these transactions
innerStep3: email the bill to the customer
MainStep3: aggregate results (which customers succeeded and which ones failed)
MainStep4: email results to the manager
What would be the best way to approach this? I am looking for general advice. I see several relevant concepts, such as multi-threaded steps, parallel steps, split flows, etc.
For clarification, if there are 400 customers in the customers table, I do not want to spin up hundreds of threads in MainStep2.
Another approach would be to drop everything into one step:
Reader: read customers table
Composite processor:
processor1: retrieve all transactions of this customer
processor2: generate a customer bill from these transactions
Writer: email the bill to the customer
Step2:
Tasklet1: aggregate results (count success and failure)
Tasklet2: email results to the manager
The problem with the latter approach is that a lot of logic goes into each processor, and it might get overly complex. The goal is to keep parts of the flow reusable across many future jobs (e.g. how a bill is created differs from vendor to vendor, but sending a bill is the same).
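For reference, the single-step variant maps directly onto Spring Batch's CompositeItemProcessor, which chains small, reusable delegates. A sketch, with Customer, CustomerTransactions, and Bill as placeholder types and the delegate processor beans assumed to exist elsewhere:

```java
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.CompositeItemProcessor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class BillingProcessorConfig {

    // Chains processor1 (transaction lookup) and processor2 (bill
    // generation) while keeping each delegate reusable across jobs.
    @Bean
    CompositeItemProcessor<Customer, Bill> billingProcessor(
            ItemProcessor<Customer, CustomerTransactions> transactionLookupProcessor,
            ItemProcessor<CustomerTransactions, Bill> billGenerationProcessor) {

        List<ItemProcessor<?, ?>> delegates =
                List.of(transactionLookupProcessor, billGenerationProcessor);

        CompositeItemProcessor<Customer, Bill> composite = new CompositeItemProcessor<>();
        composite.setDelegates(delegates);
        return composite;
    }
}
```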
Here is how I would approach this problem: I would use partitioning to achieve the desired goal, provided you partition over groups of customers rather than per customer. Secondly, I would design it as a two-step job to get better behavior in case of failures and reruns.
1. First, I would group customers by some other attributes in addition to CUSTOMER_ID, aiming for at most 10, 50, or 100 groups.
So <CUSTOMER_ID, CUSTOMER_ATTR1, CUSTOMER_ATTR2, ...> will be your partitioning criteria.
That is, you achieve parallelism at the step level for a group of customers, not for each customer (that would be very time consuming, since you would be setting up one partitioned step per customer).
For good performance, the grouping needs to be chosen wisely, keeping an even distribution of work across steps in mind.
2. Your concern about not wanting to spin up hundreds of threads is valid, and point 1 already limits it by fixing the maximum number of partitions regardless of how many customers you have.
Secondly, setting up partitioned steps and actually starting them are distinct things in Spring Batch; the latter is controlled by an async task executor and its concurrency limit:
SimpleAsyncTaskExecutor.setConcurrencyLimit
So at any point in time, you will have at most that many steps/threads running in parallel.
You need to set your custom async task executor on the partitioned step definition/configuration.
3. Within the step 1 transaction (points 1 & 2), keep marking customers that have been successfully processed as PROCESSED; in step 2, read the DB again for these processed records to prepare the reports you need to send.
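A minimal sketch of point 2, assuming Spring Batch 5's StepBuilder API; billingPartitioner, workerStep, and the concurrency numbers are placeholders for your own beans and tuning:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
class PartitionedBillingConfig {

    @Bean
    Step partitionedBillingStep(JobRepository jobRepository,
                                Step workerStep,
                                Partitioner billingPartitioner) {
        SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("billing-");
        // Hard cap: no matter how many partitions exist, at most 10
        // worker steps / threads run at any point in time.
        executor.setConcurrencyLimit(10);

        return new StepBuilder("partitionedBillingStep", jobRepository)
                .partitioner("workerStep", billingPartitioner)
                .step(workerStep)
                .gridSize(10)
                .taskExecutor(executor)
                .build();
    }
}
```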
I am fairly new to SLURM: the grid I use has many different users, and when they are submitting or canceling jobs, it seems that other users are unable to query partition status, etc. This is extremely frustrating, especially when creating jobs that spawn other jobs, since those end up failing because the controller is busy. Does anyone know a workaround?
With the default settings, Slurm can get slow/hang when many users submit/modify/cancel many jobs at the same time, especially with backfill and accounting enabled.
See tips to improve on that in these slides from the Slurm User Group Meeting of 2012.
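As one concrete example in that direction: Slurm's high-throughput tuning documentation describes SchedulerParameters options that reduce how often the controller tries to schedule while it is flooded with RPCs. A sketch of the relevant slurm.conf line (an assumption to verify against your Slurm version's docs):

```
# slurm.conf: scheduler tuning for controller responsiveness under load.
# defer        - do not attempt to schedule at every job submission
# max_rpc_cnt  - back off scheduling while this many RPCs are pending
SchedulerParameters=defer,max_rpc_cnt=150
```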