Is the attempts argument passed to retry_on respected with multiple workers? - ruby

When retry_on causes a delayed job to be retried, I see the run_at change on the record in the database, but the attempts column in the database does not get incremented until the job has been retried as many times as the attempts argument I passed to retry_on. (That may be confusing - there are two different things here called attempts: the column in the db and the argument passed to retry_on.)
How does this work when I have multiple workers? I see in the code for ActiveJob that the number of attempts in retry_on is being tracked through exception_executions, which appears to be just a hash stored in RAM.
Does anything prevent different workers from picking up the job on the retry?
If not, it seems like if I pass attempts: 3 to retry_on and I have 10 workers, then the job could end up getting retried as many as 30 times before it is reported as an error in DelayedJob.
Is that right? If so, is it a bug?

In ActiveJob, it looks like both executions and exception_executions are persisted on the job instance. If you're using something like delayed_job, this is stored in the handler column on the delayed_jobs table.
When a job is created, various job data (arguments, provider information, etc.) is stored deserialized on the job instance. When the job is enqueued or re-enqueued for a retry, this data is serialized for use by ActiveJob as well as by your queueing adapter.
This should not be an issue with multiple worker processes: each process reads from and writes to this job data in ActiveJob's retry_on logic, meaning each process is aware of how many times the job has executed and/or raised an exception across all processes.
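For example (a minimal sketch on Rails 6+; the job class and error are illustrative):

    class ReportJob < ApplicationJob
      queue_as :default

      # Give up after 3 attempts, waiting 5 seconds between tries.
      retry_on Timeout::Error, wait: 5.seconds, attempts: 3

      def perform(user_id)
        # ... work that may raise Timeout::Error ...
      end
    end

    ReportJob.new(42).serialize
    # => { "job_class" => "ReportJob", "arguments" => [42],
    #      "executions" => 0, "exception_executions" => {}, ... }

When retry_on re-enqueues the job, the incremented "executions" and "exception_executions" values are part of that serialized payload (stored in the handler column for delayed_job), so whichever worker picks up the retry sees the same counts.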

Related

Laravel Vapor queue sqs doesn't run jobs synchronously

We have an external service which processes a specific task on given data. Because this takes a while per task, and we have ten thousand tasks, we decided to put the processing into jobs and a queue, otherwise we would get a timeout.
Processing all the tasks can take about 15 hours.
So we decided to split them into chunks and put the processing of each chunk into a job. That way a job only takes about 1 minute.
Because the receiving service has limited resources, it is important to process the jobs one after another rather than concurrently.
We put these jobs into a specifically named queue to separate them from other jobs such as email sending.
In the local test environment, it works properly with sync, database and sqs.
Now I will explain the issue with the live environment:
When I run the jobs in my local test environment with SQS, invoked by php artisan queue:listen --queue=<name of the queue>, all jobs appear in the "messages available" column and are removed from it one by one and added to the "messages in flight" column.
The "messages in flight" column never has more than one message.
After deploying everything to production the following happens:
The command that adds the jobs to the queue is invoked by a scheduler, instead of being run manually in the console as on my local environment.
Then all jobs are added to the "messages available" column, and immediately dozens of jobs are moved to "messages in flight" - eventually everything from "messages available" ends up in "messages in flight". So it seems the jobs are not processed step by step but all at once.
The other thing is that only 5 jobs are executed. After that nothing happens: the receiving service gets no requests, the failed_jobs table is empty, and the jobs still remain in "messages in flight".
I have no idea what I am doing wrong!
Is there another way to process thousands of jobs?
I've set "queue-concurrency" to 1, and all queues are listed under the "queues" section in vapor.yml.
Furthermore, I've set the timeouts for cli, general and queue to 900 seconds.
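For reference, a vapor.yml with those settings would look roughly like this (the app id/name and queue name are placeholders):

    id: 12345
    name: my-app
    environments:
      production:
        timeout: 900
        cli-timeout: 900
        queue-timeout: 900
        queue-concurrency: 1
        queues:
          - task-processing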
After checking the SQS log file in CloudWatch, I see that the job has been executed about 4 times.
The time between the first job and the last job in the log file is about 6 minutes at most.
Does anybody have any ideas?
Thank you in advance.
Best
Michael

How can I associate a queued job with a particular user in Laravel?

I've built a system based on Laravel where users are able to begin a "task" which repeats a number of times, with a delay between each repetition. I've accomplished this by queueing a job with an amount argument, which then recursively queues an additional job until the count is up.
For example, I start my task with 3 repetitions:
A job is queued with an amount argument of 3. It is run, and the amount is decremented to 2. The same job is queued again with a delay of 5 seconds.
When the job runs again, the process repeats with an amount of 1.
The last job executes, and now that the amount has reached 0, it is not queued again and the tasks have been completed.
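In code, that pattern looks roughly like this (the class, queue name, and property names are illustrative, assuming a reasonably recent Laravel):

    <?php

    namespace App\Jobs;

    use Illuminate\Bus\Queueable;
    use Illuminate\Contracts\Queue\ShouldQueue;
    use Illuminate\Queue\InteractsWithQueue;

    class RepeatingTask implements ShouldQueue
    {
        use InteractsWithQueue, Queueable;

        public $userId;
        public $amount;

        public function __construct($userId, $amount)
        {
            $this->userId = $userId;
            $this->amount = $amount;
        }

        public function handle()
        {
            // ... perform one repetition of the task ...

            if ($this->amount > 1) {
                // Queue the same job again with the counter decremented
                // and a 5-second delay before the next repetition.
                dispatch(
                    (new self($this->userId, $this->amount - 1))
                        ->onQueue('tasks')
                        ->delay(5)
                );
            }
        }
    }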
This is working as expected, but I need to know whether a user currently has any tasks being processed. I need to be able to do the following:
Check if a particular queue has any jobs started by a particular user.
Check the value that was set for amount on that job.
I'm using the database driver for a queue named tasks. Is there any existing method to accomplish my goals here?
Thanks!
You shouldn't be using delay to queue multiple repetitions of the same job over and over. That functionality is meant for something like retrying a failed network request. Keeping jobs in the queue for hours at a time can lead to memory issues with your queues if the count gets too high.
I would suggest you use the php artisan schedule:run functionality to run a command every 1-5 minutes that checks the database to see whether it is time to run a user's job. If so, kick off that job and add a status flag to the user table (or whatever table you want to use to keep track of these things). When the job finishes, mark that same row as completed and wait for the next time the cron runs to do it again.
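A rough sketch of that approach (the command, table, column, and job names are assumptions, not a definitive implementation):

    <?php

    // In app/Console/Kernel.php: run the check every minute.
    protected function schedule(Schedule $schedule)
    {
        $schedule->command('tasks:dispatch-due')->everyMinute();
    }

    // In the handle() method of a DispatchDueTasks console command:
    public function handle()
    {
        $due = DB::table('user_tasks')
            ->where('status', 'pending')
            ->where('run_at', '<=', date('Y-m-d H:i:s'))
            ->get();

        foreach ($due as $task) {
            // Flag the row so we know this user has a task in progress;
            // the job itself marks it 'completed' when it finishes.
            DB::table('user_tasks')
                ->where('id', $task->id)
                ->update(['status' => 'running']);

            dispatch(new ProcessUserTask($task->user_id, $task->id));
        }
    }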

Laravel Queue start a second job after first job

In my Laravel 5.1 project I want to start my second job when the first one has finished.
Here is my logic.
\Queue::push(new MyJob())
and when this job finish I want to start this job
\Queue::push(new ClearJob())
How can I realize this?
If you want this, you should just define one queue.
A queue is just a list/line of things waiting to be handled in order, starting from the beginning. When I say things, I mean jobs. - https://toniperic.com/2015/12/01/laravel-queues-demystified
To get the opposite of what you want (asynchronously executed jobs), you would define a new queue for every job.
Multiple Queues and Workers
You can have different queues/lists for storing the jobs. You can name them however you want, such as "images" for pushing image processing tasks, or "emails" for a queue that holds jobs specific to sending emails. You can also have multiple workers, each working on a different queue if you want. You can even have multiple workers per queue, thus having more than one job being worked on simultaneously. Bear in mind having multiple workers comes with a CPU and memory cost. Look it up in the official docs, it's pretty straightforward.
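Concretely, the suggestion is just (a minimal sketch):

    \Queue::push(new MyJob());
    \Queue::push(new ClearJob());

    // Process that one queue with a single worker, e.g.:
    //   php artisan queue:listen --queue=default

With a single worker on a single queue, ClearJob is only picked up once MyJob has finished.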

Laravel Queues for multi user environment

I am using Laravel 5.1, and I have a task that takes around 2 minutes to process; this task, in particular, is generating a report.
Now, it is obvious that I can't make the user wait for 2 minutes on the same page where I took the user's input. Instead, I should process this task in the background and notify the user later about its completion.
So, to achieve this, Laravel provides Queues, which run tasks in the background (if I've understood correctly). Now, for a multi-user environment, i.e. if more than one user requests report generation (say there are 4 users), does the name Queues mean that the tasks will be performed one after the other (i.e. when 4 users request report generation one after another, the 4th user's report will only be generated once the 3rd user's report is finished)?
If queues complete their tasks one after another, is there any way to have a task processed in the background immediately on the user's request, with the user notified later when the task is completed?
Queue-based architecture is a little more complicated than that. The Queue facade provides you an interface to different messaging implementations such as RabbitMQ or Beanstalkd.
At any point in your code you can send a message to the queue, which in this context is called a job. Your queue will then hold multiple jobs, ready to be pulled off in FIFO order.
As for your questions: there are workers that listen to a queue, pick up a job, and execute it. It's up to you how many workers you want. If you have one worker, your tasks will be executed one after another; the more workers you run, the more jobs are processed in parallel.
Worker processes are started with Laravel's command-line interface, Artisan. Each process is one worker. You can start multiple workers with Supervisor.
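For example, a rough Supervisor program definition for four parallel workers might look like this (the path, queue name, and user are placeholders):

    [program:laravel-worker]
    process_name=%(program_name)s_%(process_num)02d
    command=php /path/to/your/app/artisan queue:work --daemon --queue=reports --sleep=3 --tries=3
    numprocs=4
    autostart=true
    autorestart=true
    user=www-data

numprocs=4 starts four worker processes, so up to four report jobs can run at the same time.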
Since you know for sure that you are going to send the notification to the user after around 2 minutes, I suggest using a cron job to check every 2 minutes whether there is any report to generate and, if there is, to send the notification to the user. That check is just one simple query, so you don't need to worry about performance too much.

Ruby on Rails, Resque

I have a Resque job class that is responsible for producing a report on user activity. The class queries the database and then performs numerous calculations/data parsing to send out an email to certain people. My question is: should Resque jobs like this, which have numerous methods (200 lines or so of code), be made up entirely of class methods behind the single ResqueClass.perform entry point? Or should I be instantiating a new instance of this Resque class to represent the single report being produced? If both approaches properly calculate the data and email it, is there a convention or best practice for how this should be handled in background jobs?
Thank You
Both strategies are valid. I generally approach this from the perspective of concurrency: while your job is running, the Resque worker servicing it is busy, so if you have N workers and N of these jobs running, you're going to have to wait until one is done before anything else in the queue gets processed.
Maybe that's OK - if you only have one report at a time, then in effect you dedicate one worker to running the report and your others can do other things. But if you have a pile of these and they take a while, you might impact other jobs in your queue.
The other downside of one big job is that if your report dies partway through, you may need logic to pick up where you left off. If you instantiate the report once per user, you'd simply retry the failed jobs - no "where was I" logic is required.
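A rough sketch of the per-instance approach (class, model, and method names are illustrative): keep self.perform as a thin wrapper that builds an instance, move the calculation methods onto the instance, and enqueue one job per user.

    class UserActivityReportJob
      @queue = :reports

      # Resque calls this class method; it just delegates to an instance.
      def self.perform(user_id)
        new(user_id).run
      end

      def initialize(user_id)
        @user_id = user_id
      end

      def run
        activity = fetch_activity
        stats    = calculate(activity)
        deliver_email(stats)
      end

      private

      def fetch_activity
        # query the database for @user_id's activity
      end

      def calculate(activity)
        # the ~200 lines of parsing/calculation become ordinary
        # instance methods here
      end

      def deliver_email(stats)
        # send the report email covering this user's slice of the data
      end
    end

    # Enqueue one job per user; if one fails, only that job needs retrying.
    User.pluck(:id).each { |id| Resque.enqueue(UserActivityReportJob, id) }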
