I have a Laravel app and have several Daemon Queue Workers running on a server consuming messages from SQS. I know memory is a concern here but I'm having trouble figuring out how to free up memory during / between each job. if I leave the workers running long enough, the memory usage continues to build up.
Any suggestions on how to handle this?
Related
We are running Laravel 7 and Horizon 4.3.5. Horizon runs with Supervisor.
We have 10 different queues configured, but workers responsible for one particular queue constantly dies without any output. After restarting Horizon, I can see these workers up and running for several seconds via top and ps commands. Then they are gone.
I checked supervisor's stdout_logfile: nothing suspicious there. I can see Jobs related to this queue are being processed successfully. Each worker processes exactly 2 jobs before crash.
I checked supervisor's stderr_logfile, but it's empty.
Laravel logs and failed_jobs table both are empty.
I even checked syslog, but nothing related there.
There are no problems with other queues at all. Only this particular queue keeps piling up: jobs are being pushed to queue by application, but never processed until I restart Horizon.
There are lot of free space on disk, free RAM, CPU usage is low.
Worker command: /usr/bin/php7.4 artisan horizon:work redis --delay=0 --memory=128 --queue=main --sleep=3 --timeout=1800 --tries=1 --supervisor=php01-Mexm:business
Turned out it was Out Of Memory problem. We had one job in this queue which caused crash.
Still not sure why logs were empty. Probably there wasn't enough memory to log anything.
I am new to Go and I am using go routines in my app in Heroku, which are long (up to 7 minutes), and cannot be interrupted.
I saw that the auto scaler sometimes kills the Heroku dyno which is running the routine. I need a way of running this routine independently from the dynos so I know that it will not get shutdown. I read articles and still don't understand how to perform a go routine in a background worker. It is hard for me to believe I am the only one experiencing this.
My go routines use my redis database.
Could someone please point me to an example of how to setup a background worker in heroku for go and how to send my go routine to that worker?
Thank you very much
I need a way of running this routine independently from the dynos so I
know that it will not get shutdown.
If you don't want to run your worker code on a dyno then you'll need to use a different provider from Heroku, like Amazon AWS, Digital Ocean, Linode etc.
Having said that, you should design your workers, especially those that are mission critical, to be able to recover from a shutdown. Either to be able to continue where they left off or to start over. Heroku's dyno manager restarts the dynos at least once a day but I wouldn't be surprised if the other cloud providers also restart their virtual instances once in a while, probably not once a day but still... And even if you decide to deploy your workers on a physical machine that you control and never turn off, you cannot prevent things like hardware failure or power outage from happening.
If your workers need to perform some task till it's done you need to make them be aware of possible shutdowns and have them handle such scenarios gracefully. Do not ever rely on a machine, physical or virtual, to keep running while your worker is doing it's job.
For example if you're on Heroku, use a worker dyno and make your worker listen for the SIGTERM signal, after your worker receives such a signal...
The application processes have 30 seconds to shut down cleanly
(ideally, they will do so more quickly than that). During this time
they should stop accepting new requests or jobs and attempt to finish
their current requests, or put jobs back on the queue for other worker
processes to handle. If any processes remain after that time period,
the dyno manager will terminate them forcefully with SIGKILL.
... continue reading here.
But keep in mind, as I mentioned earlier, if there is an outage and Heroku goes down, which is something that happens from time to time, your worker won't even have those 30 seconds to clean up.
This is in relation to my previous post (here) regarding the OOM I'm experiencing on a driver after running some Spark steps.
I have a cluster with 2 nodes in addition to the master, running the job as client. It's a small job that is not very memory intensive.
I've paid particular attention to the hadoop processes via htop, they are the user generated ones and also the highest memory consumers. The main culprit is the amazon.emr.metric.server process, followed by the state pusher process.
As a test I killed the process, the memory as shown by Ganglia dropped quite drastically whereby I was then able to run 3-4 consecutive jobs before the OOM happened again. This behaviour repeats if I manually kill the process.
My question really is regarding the default behaviour of these processes and whether what I'm witnessing is the norm or whether something crazy is happening.
I have a web app on heroku which all the time is using around 300% of the allowed RAM (512 MB). I see my logs full of Error R14 (Memory quota exceeded) [an entry every second]. Although in bad condition, my app still works.
Apart from degraded performance, are there any other consequences also which I should be aware of ( like heroku be charging extra for anything related to this issue, scheduled jobs might fail etc) ?
To the best of my knowledge Heroku will not take action even if you continue to exceed the memory requirements. However, I don't think the availability of the full 1 GB of overage (out of the 1.5 GB that you are consuming) is guaranteed, or is guaranteed to be physical memory at all times. Also, if you are running close to 1.5 GB, then you risk going over the hard 1.5 GB limit at which point your dyno will be terminated.
I also get the following every time I run a specific task on my Heroku app and check heroku logs --tail:
Process running mem=626M(121.6%)
Error R14 (Memory quota exceeded)
My solution would be to check out Celery and Heroku's documentation on this.
Celery is an open source asynchronous task queue, or job queue, which makes it very easy to offload work out of the synchronous request lifecycle of a web app onto a pool of task workers to perform jobs asynchronously.
I'm running an eventmachine process on heroku, and it seems to be hitting their memory limit of 512MB after an hour or so. I start seeing messages like this:
Error R14 (Memory quota exceeded)
Process running mem=531M(103.8%)
I'm running a lot of events through the reactor, so I'm thinking maybe the reactor is getting backed up (I'm imagining it as a big queue)? But there could be some other reason, I'm still fairly new to eventmachine.
Are there any good ways to profile eventmachine and and get some stats on it. As a simple example, I was hoping to see how many events were scheduled in the queue to see if it was getting backed up and keeping those all in memory. But if anyone has other suggestions I'd really appreciate it.
Thanks!
I use eventmachine extensively and never ran into any memory leak inside the reactor so your bet is that the ruby code is but without knowing more about your application it is hard to give you a real answer.
The only queue I can think of right now is the thread pool, each time you use the defer method the block is either given to a free thread or queued waiting for a free thread, I suppose if all your threads are blocking waiting for something the queue could grow and use all the memory available.
The leak turned out to be in Mongoid's identity_map (nothing to do with EventMachine). Setting Mongoid.identity_map_enabled = false at the beginning of the event machine process resolved it.