How to send an exception to Sentry from Laravel Job only on final fail?

Configuration
I'm using Laravel 8 with the sentry/sentry-laravel plugin.
There is a Job that works just fine 99% of the time. In case of any problems it retries N times, due to:
public $backoff = 120;

public function retryUntil()
{
    return now()->addHours(6);
}
And it simply calls some service:
public function handle()
{
    // Service calls some external API
    $service->doSomeWork(...);
}
The doSomeWork method sometimes throws an exception due to network problems, like Curl error: Operation timed out after 15001 milliseconds with 0 bytes received. This is fine thanks to the automatic retries; in most cases the next retry will succeed.
Problem
Every curl error is sent to Sentry. As an administrator I must check every alert, because this job is pretty important and I can't afford to miss an actually failed job. For example:
There is some network problem that is not resolved for an hour.
The application queues a Job.
Every 2 minutes the application sends a similar message to Sentry.
After the network problem is resolved the job succeeds, so no attention is required.
But we are seeing dozens of errors that could theoretically be ignored. But what if there is an actual problem in that pile and I miss it?
Question
How can I make only the "final" job failure send a message to Sentry? I mean after 6 hours of failed retries: only then would I like to receive a single alert.
What I tried
There is one workaround that kind of "works". We can replace Exception with SomeCustomException and add it to the \App\Exceptions\Handler::$dontReport array. In that case no "intermediate" messages are sent to Sentry.
But when the job finally fails, Laravel sends the standard ... job has been attempted too many times or run too long message without details of the actual error.
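One direction that builds on that workaround (a hedged sketch under assumptions, not a verified recipe): keep SomeCustomException in $dontReport so the intermediate attempts stay silent, and report to Sentry yourself from the job's failed() method, which Laravel calls only once the job has permanently failed. With sentry/sentry-laravel the global \Sentry\captureException() helper should be available for this. Note that, depending on which attempt trips the limit, the exception passed to failed() may be the last real exception or Laravel's MaxAttemptsExceededException. The job class name and the SomeService type hint below are placeholders:

namespace App\Jobs;

use App\Exceptions\SomeCustomException;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Throwable;

class CallExternalApiJob implements ShouldQueue // hypothetical name
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public $backoff = 120;

    public function retryUntil()
    {
        return now()->addHours(6);
    }

    public function handle(SomeService $service) // SomeService is a placeholder
    {
        try {
            // Service calls some external API
            $service->doSomeWork();
        } catch (Throwable $e) {
            // Wrap the real error in the class listed in
            // \App\Exceptions\Handler::$dontReport so intermediate
            // attempts are not reported to Sentry.
            throw new SomeCustomException($e->getMessage(), 0, $e);
        }
    }

    public function failed(Throwable $exception)
    {
        // Runs once, only after the job has exhausted its retries.
        \Sentry\captureException($exception);
    }
}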

Related

Crossing of messages in worker replies during concurrent spring batch jobs with remote partitioning

This has been asked here, but I don't think it was answered. The only answer talks about how the aggregator uses the correlationId, but the real issue is how the job status is updated without checking the jobExecutionId in replies.
I don't have enough reputation to comment on the existing question, so I am asking here again.
According to the javadoc on MessageChannelPartitionHandler, it is supposed to be step or job scoped. In the remote partitioning scenario we are using RemotePartitioningManagerStepBuilder to build the manager step, and it does not allow setting a PartitionHandler. Given that every job uses the same queue on RabbitMQ, replies from worker nodes get crossed between jobs. There is no simple way to reproduce this, but I can see the behavior using the manual steps below:
Launch first job
Kill the manager node before worker can reply
Let worker node finish handling all partitions and send a reply on rabbitmq
Start manager node again and launch a new job
Have some mechanism to fail the second job, i.e. explicitly fail in the reader/writer
Check the status of 2 jobs
Expected Result: Job-1 marked completed and job-2 as failed
Actual Result: Job-1 remains in started state and job-2 is marked completed even though its worker steps are marked as failed
Below is sample code that shows how manager and worker steps are configured
@Bean
public Step importDataStep(RemotePartitioningManagerStepBuilderFactory managerStepBuilderFactory) {
    return managerStepBuilderFactory.get()
            .<String, String>partitioner("worker", partitioner())
            .gridSize(2)
            .outputChannel(outgoingRequestsToWorkers)
            .inputChannel(incomingRepliesFromWorkers)
            .listener(stepExecutionListener)
            .build();
}
@Bean
public Step worker(RemotePartitioningWorkerStepBuilderFactory workerStepBuilderFactory) {
    return workerStepBuilderFactory.get("worker")
            .listener(stepExecutionListener)
            .inputChannel(incomingRequestsFromManager())
            .outputChannel(outgoingRepliesToManager())
            .<String, String>chunk(10)
            .reader(itemReader())
            .processor(itemProcessor())
            .writer(itemWriter())
            .build();
}
Alternatively, I can think of using polling instead of replies, where crossing of messages does not occur. But polling cannot be restarted if the manager node crashed while worker nodes were processing. If I follow the same steps above using polling:
Actual Result: Job-1 remains in started state and job-2 is marked failed, as expected.
This issue does not occur in the polling case because each poller uses the exact jobExecutionId to poll and update the corresponding manager step/job.
What am I doing wrong? Is there a better way to handle this scenario?

Job always fails at laravel jobs Redis rate limiting

This is a follow-up on
Laravel - Running Jobs in Sequence
I decided to go with the Redis rate limit. The code is below:
class SendMailJob implements ShouldQueue
{
    protected $subscription;

    public function __construct(Subscription $subscription)
    {
        $this->subscription = $subscription;
    }

    public function handle()
    {
        Redis::funnel('mailingJob')->limit(1)->then(function () {
            // Job logic...
            (new Mailer($this->subscription))->send();
        }, function () {
            // Could not obtain lock...
            return $this->release(10);
        });
    }
}
And the controller code looks like this:
<?php

namespace App\Http\Controllers;

use App\Http\Controllers\Controller;
use App\Jobs\SendMailJob; // assuming the job lives in App\Jobs
use App\Models\Subscriptions;

class MailController extends Controller
{
    public function sendEmail()
    {
        Subscriptions::all()
            ->each(function ($subscription) {
                SendMailJob::dispatch($subscription);
            });
    }
}
Now, when I run the queue, some of the jobs work but the rest (around 90%) fail with the error below in Horizon:
SendMailJob has been attempted too many times or run too long. The job may have
previously timed out.
What am I missing? Please point me in the right direction.
My goal is to run only one job of a type at one time.
[...] has been attempted too many times or run too long is an error that doesn't tell you why the job failed. It means some other exception has caused your job to fail every time the worker attempted it, and the worker has tried it the maximum number of times your configuration allows. To understand why it's failing, check your laravel.log file for the exception that caused the job to fail.
In your case, since Mailer is contacting an external system it could be that the system you're connecting to is rate limiting you, or they could be having temporary connection problems or other service downtime. Again, there should be more detail in your log files.
The Laravel documentation has a hint about this:
When using rate limiting, the number of attempts your job will need to run successfully can be hard to determine. Therefore, it is useful to combine rate limiting with time based attempts.
The core of the issue is that the job keeps failing until it can acquire a lock and run.
So I imagine that where you are running your queue worker, you are not setting the --tries flag high enough.
Although you could just set a very high --tries, that is not really scalable.
The best solution, as suggested in the documentation, would be to increase the number of tries as well as use time-based attempts.
You can also increase the release time in return $this->release(10);. That makes the job wait longer before trying to reacquire the lock, so it uses up fewer tries!
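As a rough sketch of combining those suggestions on the job from the question (durations are placeholders; Mailer and Subscription are the classes the question already uses, with their namespaces assumed):

use App\Models\Subscription;                 // namespace assumed
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Redis;

class SendMailJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    protected $subscription;

    public function __construct(Subscription $subscription)
    {
        $this->subscription = $subscription;
    }

    // Time-based attempts: keep retrying for a fixed window instead of
    // guessing how high --tries needs to be.
    public function retryUntil()
    {
        return now()->addMinutes(30); // placeholder window
    }

    public function handle()
    {
        Redis::funnel('mailingJob')->limit(1)->then(function () {
            (new Mailer($this->subscription))->send();
        }, function () {
            // Could not obtain the lock: wait longer before retrying,
            // so fewer attempts are burned while another job runs.
            return $this->release(60);
        });
    }
}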

Laravel Jobs fail on Redis when attempting to use throttle

End Goal
The aim is for my application to fire off potentially a lot of emails to the Redis queue (this bit is working) and then have Redis throttle the processing of these to only a set number of emails every selected number of minutes.
For this example, I have a test job that appends the time to a file, and I am attempting to throttle it to once every 60 seconds.
The story so far....
So far, I have the application successfully pushing a test batch of 50 jobs to the Redis queue. I can log in to Horizon and see these 50 jobs in the "processjob" queue. I can also log in to redis-cli and see 50 entries under the list key "queues:processjob".
My issue is that as soon as I attempt to put the throttle on, only 1 job runs and the rest fail with the following error:
Predis\Response\ServerException: ERR Error running script (call to f_29cc07bd431ccbf64637e5dcb60484560fdfa2da): #user_script:10: WRONGTYPE Operation against a key holding the wrong kind of value in /var/www/html/smhub/vendor/predis/predis/src/Client.php:370
If I remove the throttle, all works fine and 5 jobs are instantly run.
I thought maybe it was an incorrect key name, but if I change the following:
public function handle()
{
    Redis::throttle('queues:processjob')->allow(1)->every(60)->then(function () {
        Storage::disk('local')->append('testFile.txt', date("Y-m-d H:i:s"));
    }, function () {
        return $this->release(10);
    });
}
to this:
public function handle()
{
    Redis::funnel('queues:processjob')->limit(1)->then(function () {
        Storage::disk('local')->append('testFile.txt', date("Y-m-d H:i:s"));
    }, function () {
        return $this->release(10);
    });
}
then it all works fine.
My thoughts...
Something tells me that the issue is that the Redis key is of type "list" and that the jobs are all under a single list. That being said, if it didn't work this way, how would we throttle a queue, since the throttle requires a unique key?
For anybody else who is having trouble getting this to work and is hitting the same issue I was, this is what resolved it for me:
The Fault
I assumed that Redis::throttle('queues:processjob') was meant to refer to the queue that you wanted to throttle. However, after some re-reading of the documentation and testing of the code, I realized that this was not the case.
The Fix
Redis::throttle('queues:processjob') is meant to point to its own 'holding' queue and so must be a unique Redis key name. Therefore, changing it to Redis::throttle('throttle:queues:processjob') worked fine for me.
The workings
When I first looked into this, I assumed that Redis::throttle('this') throttled the queue that you specified. To some degree this is correct, but it will not work if the job was created via another means.
Redis::throttle('this') actually creates a new 'holding' queue where the jobs go until the condition(s) you specify are met. So jobs will go to the queue 'this' in this example, and when the throttle trigger is released, they will be passed to the queue specified in their execution code. In this case, 'queues:processjob'.
I hope this helps!
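Putting that together, the only change needed to the handle() method from the question is the throttle key (a sketch using the key name that worked above):

public function handle()
{
    // The throttle key is its own Redis key, separate from the
    // queues:processjob list that holds the queued jobs themselves.
    Redis::throttle('throttle:queues:processjob')->allow(1)->every(60)->then(function () {
        Storage::disk('local')->append('testFile.txt', date("Y-m-d H:i:s"));
    }, function () {
        // Could not obtain the lock; put the job back on the queue.
        return $this->release(10);
    });
}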

RocketMQ: MQBrokerException: CODE: 2 DESC: [TIMEOUT_CLEAN_QUEUE]

When I send a message to the broker, this exception occasionally occurs:
MQBrokerException: CODE: 2 DESC: [TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while
This means the broker is too busy (when tps>1,5000) to handle that many send requests.
What would be the most likely reason for this? Disk, CPU, or something else? How can I fix it?
There are many possible causes.
The root cause is that some messages have been waiting for a long time with no worker thread processing them, so RocketMQ triggers its fast-failure mechanism.
So the causes come down to:
1. Too many threads are working and they are processing message storage very slowly, which makes the queued requests time out.
2. The message storing itself takes a long time to process.
This may be because:
2.1 Storing messages is busy, especially when SYNC_FLUSH is used.
2.2 Syncing messages to the slave takes a long time when SYNC_MASTER is used.
In /broker/src/main/java/org/apache/rocketmq/broker/latency/BrokerFastFailure.java you can see:
final long behind = System.currentTimeMillis() - rt.getCreateTimestamp();
if (behind >= this.brokerController.getBrokerConfig().getWaitTimeMillsInSendQueue()) {
    if (this.brokerController.getSendThreadPoolQueue().remove(runnable)) {
        rt.setStopRun(true);
        rt.returnResponse(RemotingSysResponseCode.SYSTEM_BUSY, String.format("[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while, period in queue: %sms, size of queue: %d", behind, this.brokerController.getSendThreadPoolQueue().size()));
    }
}
In common/src/main/java/org/apache/rocketmq/common/BrokerConfig.java, the getWaitTimeMillsInSendQueue() method returns:
public long getWaitTimeMillsInSendQueue() {
    return waitTimeMillsInSendQueue;
}
The default value of waitTimeMillsInSendQueue is 200, so you can set it higher to let requests wait in the queue for longer. But if you want to solve the problem completely, you should follow Jaskey's advice and check your code.

How can I troubleshoot silently failing queued jobs?

I have a job that is dispatched with two arguments - path and filename to a file. The job parses the file using simplexml, then makes a notice of it in the database and moves the file to the appropriate folder for safekeeping. If anything goes wrong, it moves the file to another folder for failed files, as well as creates an event to give me a notification.
My problem is that sometimes the job will fail silently. The job is removed from the queue, but the file has not been parsed and it remains in the same directory. The failed_jobs table is empty (I'm using the database queue driver for development) and the failed() method has not been triggered. The Queue::failing() method I put in the app service provider has not been triggered either - I know, since both of those contain only a single log call to check whether they were hit. The Laravel log is empty (it's readable and Laravel does write to it for other errors - I double-checked) and so are relevant system log files such as e.g. php's.
At first I thought it was a timeout issue, but the queue listener has not failed or stopped, nor been restarted. I increased the timeout to 300 seconds anyway, and verified that all of the "[datetime] Processed: [job]" lines the listener generates were well within that timespan. PHP execution time limits etc. are also far longer than required for this job.
So how on earth can I troubleshoot this when the logs are empty, nothing appears to fail, and I get no notification of what's wrong? If I queue up 200 files then maybe 180 will be processed and the remaining 20 fail silently. If I refresh the database + migrations and queue up the same 200 again, then maybe 182 will be processed and 18 will fail silently - but they won't necessarily be the same.
My handle method, simplified to show relevant bits, looks as follows:
public function handle()
{
    try {
        $xml = simplexml_load_file($this->path.$this->filename);
        $this->parse($xml);
        $parsedFilename = config('feeds.parsed path').$this->filename;
        File::move($this->path.$this->filename, $parsedFilename);
    } catch (Exception $e) {
        // if I put deliberate errors in the files, this works fine
        $errorFilename = config('feeds.error path').$this->filename;
        File::move($this->path.$this->filename, $errorFilename);
        event(new ParserThrewAnError($this->filename));
    }
}
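One hedged troubleshooting idea, not part of the original post: log from inside the handler and widen the catch to \Throwable, so that PHP errors (which catch (Exception $e) does not catch) and any exception slipping past a namespaced catch without a use Exception; import still leave a trace in laravel.log:

// At the top of the job class file:
use Illuminate\Support\Facades\Log;
use Throwable;

public function handle()
{
    Log::info("Feed job started for {$this->filename}");

    try {
        $xml = simplexml_load_file($this->path.$this->filename);
        $this->parse($xml);
        $parsedFilename = config('feeds.parsed path').$this->filename;
        File::move($this->path.$this->filename, $parsedFilename);
        Log::info("Feed job finished for {$this->filename}");
    } catch (Throwable $e) {
        // \Throwable also covers \Error and \TypeError, which a plain
        // catch (Exception $e) would let escape.
        Log::error("Feed job failed for {$this->filename}: ".$e->getMessage());
        $errorFilename = config('feeds.error path').$this->filename;
        File::move($this->path.$this->filename, $errorFilename);
        event(new ParserThrewAnError($this->filename));
    }
}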
Okay, so I still have absolutely no idea why, but... after restarting the VM I have tested eight times with various different files and options and had zero problems. If anyone can guess the reason, feel free to reply and I'll accept your answer if it sounds reasonable. For now, I'll mark my own answer as correct once I can, in case somebody else stumbles across this later.
