Laravel 8 - Queue jobs time out, fixed by clearing cache & restarting Horizon

My queue jobs all run fairly seamlessly on our production server, but about every 2-3 months I start getting a lot of "timeout exceeded" / "too many attempts" exceptions.
Our app uses event sourcing and many events are queued, so needless to say we have a lot of jobs passing through the system (generally 100-200k per day).
I have not found the root cause yet, but a simple re-deploy through Laravel Envoyer fixes the issue. This is most likely because the cache:clear command is run as part of the deploy.
Currently the cache is handled by Redis, which lives on the same server as the app. I was considering moving the cache to its own server/instance, but that still would not help me find the root cause.
Does anyone have any ideas about what might be going on here and how I can diagnose/fix it? I am guessing the cache is getting overloaded, running out of space, or leaking over time, but I am not really sure where to go from here.
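For what it's worth, one way to start narrowing this down is to log Redis memory usage and queue depth on a schedule, so the growth pattern is visible before the next incident. A minimal sketch for app/Console/Kernel.php, assuming the default Redis connection and the default queues:default list key (the shape of the INFO reply differs between phpredis and predis, so adjust as needed):

use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Redis;

// Inside App\Console\Kernel::schedule() - a rough probe that logs Redis memory
// usage and the default queue's depth so slow growth over weeks shows up in the logs.
$schedule->call(function () {
    $redis = Redis::connection();

    Log::info('redis probe', [
        'memory'        => $redis->command('INFO', ['memory']),
        'default_queue' => $redis->command('LLEN', ['queues:default']),
    ]);
})->everyTenMinutes();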

Check:
the version of your Redis server (and update the predis package)
the version of your Laravel installation
your server itself
I hope this gives you some leads.

Related

Redis intermittent "crash" on Laravel Horizon. Redis stops working every few weeks/months

I have an issue with Redis that affects the running of the Laravel Horizon queue, and I am unsure how to debug it at this stage, so I am looking for some advice.
Issue
Approx. every 3 - 6 weeks my queues stop running. Every time this happens, the first set of exceptions I see are:
Redis Exception: socket error on read socket
Redis Exception: read error on connection to 127.0.0.1:6379
Both of these are caused by Horizon running the command:
artisan horizon:work redis
Theory
We push around 50k - 100k jobs through the queue each day and I am guessing that Redis is running out of resources over the 3-6 week period. Maybe general memory, maybe something else?
I am unsure if this is due to a leak within my system or something else.
Current Fix
At the moment, I simply run the command redis-cli FLUSHALL to completely clear the database and we are back working again for another 3 - 6 weeks. This is obviously not a great fix!
Other Details
Currently Redis runs on the web server (not a dedicated Redis server). I am open to changing that, but it would not fix the root cause of the issue.
Help!
At this stage, I am really unsure where to start in terms of debugging and identifying the issue. Working that out is probably a good first step!
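For context, Horizon itself keeps per-job metadata and metrics in Redis, and how long it keeps them is controlled by the trim section of config/horizon.php, so that is one place worth checking when Redis grows over weeks. A minimal sketch of that section (exact keys and defaults depend on the Horizon version; values are minutes):

// config/horizon.php - how long Horizon keeps job metadata in Redis.
'trim' => [
    'recent'    => 60,     // recent jobs
    'completed' => 60,     // completed jobs
    'failed'    => 10080,  // failed jobs (one week)
],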

Laravel multiple workers running job twice

I am using Laravel 5.6, dispatching jobs to a queue and then using Supervisor to run 8 workers on that queue. I was expecting that Laravel would know NOT to run the same job twice, but I was surprised to discover that it does.
The same job was picked up by more than one worker, and therefore weird stuff started to happen.
The thing is that a year ago I wrote the same mechanism for another Laravel project (on Laravel 5.1) and the whole thing worked out of the box. I didn't have to configure anything.
Anyone can help?
Thank you.
I was having the exact same problem and it drove me crazy until I managed to solve it!
For some reason Laravel 5.6 creates the "jobs" table with ENGINE=MyISAM, which does not support the transactions needed by the locking mechanism that prevents a job from being run twice. I believe it was different in Laravel 5.1; I also once wrote an app with Laravel 5.4 and it worked perfectly with 8 workers, but when I did the same thing with Laravel 5.6 it didn't work, same as you describe.
Once I changed the engine to InnoDB, which supports transactions, everything worked as expected and the locking mechanism started to work.
So basically all you need to do is:
ALTER TABLE jobs ENGINE = InnoDB;
Hope that will solve your misery...
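If you prefer to keep that change in version control, a minimal migration wrapping the same statement might look like the following (class name and the MyISAM rollback are illustrative). Alternatively, setting 'engine' => 'InnoDB' on the mysql connection in config/database.php makes newly created tables default to InnoDB.

use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

class ConvertJobsTableToInnodb extends Migration
{
    public function up()
    {
        // The schema builder has no "change engine" helper, so use raw SQL.
        DB::statement('ALTER TABLE jobs ENGINE = InnoDB');
    }

    public function down()
    {
        // Revert to the engine the table was originally created with.
        DB::statement('ALTER TABLE jobs ENGINE = MyISAM');
    }
}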
If the duplication involves a scheduled command, you can also prevent overlapping runs:
$schedule->command('emails:send')->withoutOverlapping();

Laravel Flysystem & Rackspace CRON uploads fail with 401 while on-demand uploads work

Our system generates various invoices every hour and uploads them to the cloud. It is also possible to create an invoice on demand by clicking a button on our frontend.
When manually requesting the said invoice, the upload never fails.
As for the cron-generated invoices, after some time all uploads fail with:
Client error response
[status code] 401
[reason phrase] Unauthorized
[url] https://storage101.dfw1.clouddrive.com/v1/MossoCloudFS_930575/ <...> .pdf" # Guzzle\Http\Exception\BadResponseException->factory
This should mean that the token has expired, which it probably has. Rackspace tokens last for 24 hours, but Laravel's Storage should automatically refresh the token.
Now for some fun facts:
1) Every time our code is deployed with Capistrano, the token seems to get refreshed and cron uploads work again for some time. An important thing to notice here is that every deploy creates a new folder (similar to /releases/201605190925), pulls the code, installs dependencies from scratch and, if all goes well, symlinks that folder to the one Apache is serving.
2) Laravel jobs get handled by a different process than www-data.
3) It's hard to track down whether the uploads keep working for more than 24 hours after a deploy. I suspect that sometimes they do, but it's hard to verify since invoices don't need to be generated every hour, there are more developers who deploy than just me, etc.
4) When cron fails and I get the failure notification, I can immediately go and successfully generate the invoice on demand. After that, the cron uploads still keep failing. So it seems like these two instances have different tokens stored somewhere.
5) Cache clearing with php artisan cache:clear doesn't seem to have any impact.
6) I have probably tried restarting the Apache service as well, but without results.
Since this has been going on for some time now, I have tested various things and even contacted Rackspace at one point, but they couldn't find anything odd on their end. They simply reminded me to catch the 401 error, update the token and try again, but Laravel and Flysystem should handle that by themselves somewhere.
Since nobody else seems to be having similar problems with Flysystem, Laravel or Rackspace, I suspect it's some kind of unique problem with the cron process. I was hoping that we would soon have a refactored version of our system ready and the problem would simply go away, but as of now it's still in development and might take another month.
I don't think it's related to the code, but anyway, here's the upload line:
Storage::put($folder . '/' . $filename, file_get_contents($filePath));
Here's our config:
'default' => 'rackspace',
'disks' => [
    'rackspace' => [
        'driver'    => 'rackspace',
        'username'  => ' ... ',
        'key'       => ' ... ',
        'container' => ' ... ',
        'endpoint'  => 'https://identity.api.rackspacecloud.com/v2.0/',
        'region'    => 'DFW',
    ],
],
Any thoughts on the matter are appreciated.
Here I was once again trying to find a solution to this problem almost 2.5 months later, and I think it really clicked this time when I reread @Tommmm's update.
So you restarted the queue workers, and that got me thinking about the way I was running my supervisor workers:
php artisan queue:work database --daemon --sleep=10 --tries=3
--daemon (as it was really nicely explained in this Stack Overflow post by @the-shift-exchange):
In Laravel >=4.2 a --daemon option was added. The way it works is that it simply keeps running jobs directly, rather than rebooting the entire framework after every job is processed. This is an optional command that significantly reduces the memory and CPU requirements of your queue.
The important point with the --daemon command is that when you upgrade your application, you need to specifically restart your queue with queue:restart, otherwise you could potentially get all sorts of strange errors, as your queue would still have the old code in memory.
Since the application never gets booted up again, I believe the token is also never renewed, while the rest of the application keeps renewing the token and uses the new one. That explains why all on-click actions were working and only the background tasks were failing. And the reason deploying temporarily fixed our problem is that we run queue:restart after every deploy.
I am quite certain that this is the case, and there are now two possibilities to start with: either you restart your workers periodically (more often than every 24 hours for Rackspace), or you don't run the workers with --daemon.
Edit:
Confirmed that adding the following to Kernel.php restarts the daemon workers every 6 hours:
$schedule->command('queue:restart')->cron('0 */6 * * *');
I've been having a similar issue, and documentation and reports of this error seem rather scarce. Reading here (https://github.com/rackspace/php-opencloud/issues/427), it looks like an expired token is the root cause.
The recommendation from the link above is to reauthenticate the client via
$client->authenticate();
But the client is buried under a handful of accessors. You should be able to reach the underlying client via:
Storage::disk('my-disk')->getDriver()->getAdapter()->getContainer()->getClient()->authenticate();
Unfortunately, I end up with cURL error 35 (SSL_CONNECT_ERROR) when trying that.
There is a hasExpired() method available as well. If you encounter the same error, you should be able to combine this check with some sort of mechanism to restart the cron job or worker. While this is certainly far from an ideal fix, it should get you up and running. You would think this behavior would be caught and handled automatically.
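For reference, a rough sketch of such a pre-flight check inside the job that performs the upload might look like the following. Note that getTokenObject() is my assumption about where the token (and its hasExpired() method) lives, so check what your installed php-opencloud version actually exposes:

use Illuminate\Support\Facades\Storage;

// Reach the underlying php-opencloud client through the Flysystem adapter
// (same accessor chain as above; method names may differ between versions).
$client = Storage::disk('rackspace')
    ->getDriver()
    ->getAdapter()
    ->getContainer()
    ->getClient();

// Assumption: the client holds a token object exposing hasExpired().
$token = $client->getTokenObject();

if ($token === null || $token->hasExpired()) {
    // Force a fresh token before the long-running worker tries to upload.
    $client->authenticate();
}

Storage::disk('rackspace')->put($folder . '/' . $filename, file_get_contents($filePath));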
5/25 Update: After spending another couple of hours trying to find a solution here, I threw in the towel and just created a cron job to restart the relevant workers every 12 hours. Supposedly the token is valid for 24 hours (according to HTTP 401 Fog::Storage::Rackspace::ServiceError), so the 12 hour window should safely keep the error from happening. While super hacky, it does keep that piece of the site up and running.

Laravel 4.1 - queue:listen performance

I am trying to use the Queue system in Laravel (4.1). Everything works as expected, with both Redis (with the native driver) and RabbitMQ.
The only "issue" I am experiencing is the poor performance. It looks like only 4 jobs per seconds can be processed (I push 1000 jobs in the queue to test it). Have you got any tip to improve the performance?
This is an old question but I thought I would post anyway. The problem is that Laravel's default listener is not really a true queue consumer; it polls the queue at regular intervals unless it is already busy with a job. Using true AMQP requires an additional extension to be installed from PECL. You can find that extension here. I would also suggest using this composer package for your PHP AMQP library. You would then need to write your own Laravel command.
Currently I'm writing a RabbitMQ handler for Laravel that will resolve this issue.
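As a rough illustration of what such a consumer command would do internally, here is a minimal sketch using php-amqplib with placeholder connection details and queue name (the packages linked above may expose a different API):

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

// Placeholder connection details and queue name.
$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Make sure the queue exists and only prefetch one message at a time.
$channel->queue_declare('jobs', false, true, false, false);
$channel->basic_qos(null, 1, null);

$channel->basic_consume('jobs', '', false, false, false, false, function (AMQPMessage $message) {
    // Hand the payload off to your own job handler here (hypothetical helper).
    handleJob($message->body);

    // Acknowledge so RabbitMQ can deliver the next message.
    $message->delivery_info['channel']->basic_ack($message->delivery_info['delivery_tag']);
});

// Block and wait for messages instead of polling the queue.
while (count($channel->callbacks)) {
    $channel->wait();
}

$channel->close();
$connection->close();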
Old question, but for anyone coming here, the queue:work command has a daemon mode that runs just like queue:listen except that it doesn't have to restart/reload Laravel each time, making it much more performant. See the docs:
http://laravel.com/docs/4.2/queues

How do you troubleshoot all Apache threads becoming occupied and idle?

I have a Drupal 6 site that is frequently (about once a day) going down. The hosting provider is reporting that something in our site code is occupying all Apache threads but keeping them idle, making the server run out of threads to respond to new requests. A simple restart of Apache frees the threads and fixes the issue, though it reoccurs within a few hours or a day.
I have no idea how to troubleshoot this issue and have never come across PHP code doing this. Is there some kind of Apache settings change I can make to capture more information about what might be keeping a thread occupied but idle? What typical PHP routines can cause this behavior? I looked for code that connects to external resources, but didn't see any issues there.
Any hints on what to look at, how to capture more information, or what kinds of PHP code can cause this would be most useful.
With Drupal 6 you could have the poormanscron module running sometimes, or even classic cron (from crontab via wget or whatever).
Then one heavy cron operation could put your database under heavy load. If your database response time becomes very slow, every HTTP request becomes very slow (for example, sessions are stored in the database, and several hundred queries are required for a Drupal page). Having all requests slow down can put all the available PHP processes into an 'occupied' state.
Restarting Apache stops all current processes. If you run cron via wget and not via drush, cron tasks are a nice thing to check (running cron via drush would run it via php-cli, so restarting Apache would not kill it). You can try a module like Elysia Cron to get more details on cron tasks and maybe isolate the long ones (it gives you a report on task durations).
This effect (one request hurting the database badly, all requests slowing down, no more processes available) could also be caused by one bad piece of code coming from any of your installed modules. That would be harder to detect.
So I would ensure slow queries are tracked in MySQL (see the my.cnf options), then analyse those queries with tools like mysqlsla. The problem is that sometimes one query is so heavy that all queries become slow, so use the time of the crash to identify the first ones. Also enable the MySQL option that logs queries not using indexes.
Another way to get all Apache processes stalled on PHP operations with Drupal is a locking problem. Drupal uses its own lock implementation on top of MySQL. You could add some watchdog() calls (Drupal's internal debug messages) in those files to try to detect lock problems, as in the sketch below.
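As a rough sketch of that kind of instrumentation (the timing threshold, log category and message are placeholders), something like this around the suspected locking or semaphore code would surface slow acquisitions in the dblog:

// Illustrative only: time the suspected lock/semaphore acquisition and log it
// with Drupal's watchdog() when it is unusually slow.
$start = microtime(TRUE);

// ... existing lock/semaphore acquisition happens here ...

$elapsed = microtime(TRUE) - $start;
if ($elapsed > 1.0) {
  watchdog('locks', 'Lock acquisition took @sec seconds.', array('@sec' => round($elapsed, 2)), WATCHDOG_WARNING);
}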
You could also have some external HTTP requests made by Drupal: calls to external websites like Facebook, Google, tiny-URL tools, or drupal.org's module update checks (which always try to check all modules, even the ones you wrote yourself). If the remote website is down or is filtering your traffic you'll have problems (but then the Apache restart would not help you, so it may not be that).
