I have Celery running under Supervisor. I am running two EC2 instances behind a load balancer on AWS. One instance works, while the second one throws an error. When I ran supervisorctl restart all I got:
celerybeat: stopped
celeryd: stopped
celeryd_high: stopped
flowerd: ERROR (spawn error)
celeryd: started
celerybeat: started
celeryd_high: started
Checking the status, I get:
flowerd FATAL Exited too quickly (process log may have details)
My flowerd config looks like:
[program:flowerd]
command=/opt/django/www/bin/python /opt/django/www/src/manage.py celery flower -A site_aggrigator -l info --address=0.0.0.0
stdout_logfile=/var/log/django/flowerd.log
stderr_logfile=/var/log/django/flowerd.log
autostart=true
autorestart=true
startsecs=10
stopwaitsecs=600
If I make changes to one instance, will it affect the other? Also, I get this same log on both instances, so maybe it's not the issue.
UPDATE EDIT #1: I set the startsecs of flowerd to 0 and it runs fine. The only problem is that the EC2 instance spikes.
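For reference, a minimal sketch of the stanza with that workaround applied (paths copied from the config above); startsecs=0 tells Supervisor to consider flower started immediately, so a slow startup no longer trips the "Exited too quickly" FATAL state:
[program:flowerd]
command=/opt/django/www/bin/python /opt/django/www/src/manage.py celery flower -A site_aggrigator -l info --address=0.0.0.0
stdout_logfile=/var/log/django/flowerd.log
stderr_logfile=/var/log/django/flowerd.log
autostart=true
autorestart=true
; consider the process started immediately (was startsecs=10)
startsecs=0
stopwaitsecs=600
Note that this only suppresses the uptime check; if flower is genuinely crashing, the cause should still show up in /var/log/django/flowerd.log.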
Related
Our application provides a separate database for each of our users. I have set up an emailing job which users may dispatch to run in the background via a Laravel 5.3 queue. Some users succeed in invoking this facility (so it does work - same codebase) but some users fail.
The users who fail to generate the email are all characterised by the following error when trying to restart all user queues using sudo supervisorctl start all, e.g.:
shenstone-worker:shenstone-worker_02: ERROR (spawn error)
shenstone-worker:shenstone-worker_00: ERROR (spawn error)
shenstone-worker:shenstone-worker_01: ERROR (spawn error)
An example of a user whose email facility works:
meadowwood-worker:meadowwood-worker_02: started
meadowwood-worker:meadowwood-worker_00: started
meadowwood-worker:meadowwood-worker_01: started
The log of all attempted restarts has a load of these spawn errors at the beginning, then all the successful queue restarts at the end.
The worker config files for these two users are:
[program:shenstone-worker]
process_name=%(program_name)s_%(process_num)02d
directory=/var/www/solar3/current
environment=APP_ENV="shenstone"
command=php artisan queue:work --tries=1 --timeout=300
autostart=true
autorestart=true
user=root
numprocs=3
redirect_stderr=true
stdout_logfile=/var/www/solar3/storage/logs/shenstone-worker.log
and
[program:meadowwood-worker]
process_name=%(program_name)s_%(process_num)02d
directory=/var/www/solar3/current
environment=APP_ENV="meadowwood"
command=php artisan queue:work --tries=1 --timeout=300
autostart=true
autorestart=true
user=root
numprocs=3
redirect_stderr=true
stdout_logfile=/var/www/solar3/storage/logs/meadowwood-worker.log
As you can see, they are generically identical. Yet shenstone does not restart its queues to capture requests from its jobs table, while meadowwood does. No logfiles appear in storage.
So why do some of these queues restart successfully while a load of others don't?
Looking at the Stack Overflow question Running multiple Laravel queue workers using Supervisor inspired me to run sudo supervisorctl status, and now I can see a more elaborate explanation of my problem:
shenstone-worker:shenstone-worker_00 FATAL too many open files to spawn 'shenstone-worker_00'
shenstone-worker:shenstone-worker_01 FATAL too many open files to spawn 'shenstone-worker_01'
shenstone-worker:shenstone-worker_02 FATAL too many open files to spawn 'shenstone-worker_02'
As opposed to:
meadowwood-worker:meadowwood-worker_00 RUNNING pid 32459, uptime 0:51:52
meadowwood-worker:meadowwood-worker_01 RUNNING pid 32460, uptime 0:51:52
meadowwood-worker:meadowwood-worker_02 RUNNING pid 32457, uptime 0:51:52
But I still cannot see what I can do to resolve the issue.
If you haven't come up with any other ideas, increasing the open files limit on your server might help. See, for instance:
https://unix.stackexchange.com/questions/8945/how-can-i-increase-open-files-limit-for-all-processes
However, having many files open can affect the performance of your system, so you should also check why it is happening and prevent it if you can.
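If it helps, a minimal sketch of what raising the limits might look like; the 65535 values are illustrative assumptions, and the supervisord minfds setting matters because child processes inherit the daemon's limit:
# /etc/security/limits.conf - raise the per-user nofile limit (example values)
*    soft    nofile    65535
*    hard    nofile    65535

; /etc/supervisor/supervisord.conf - ask supervisord to ensure this many fds
[supervisord]
minfds=65535
After changing these, restart supervisord itself so the new limits are picked up by the daemon and everything it spawns.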
Thanks Armando.
Yes, you’re right: open files is the issue. I did a blitz on the server to increase these, but ran into problems with MySQL, so I backed out my config changes.
We host over 200 customers, each with their own .env, and each of these has its own workers.
I’ll revisit the problem sometime.
I am having a hard time figuring out why it's not running under Supervisor but works fine when I run it directly on the project.
When I try to run
php artisan queue:work redis
on my project, it works as expected,
but if I run it via Supervisor, I get this log:
This is my laravel-worker program config inside /etc/supervisor/conf.d:
Thank you!
I see that Supervisor will launch 8 worker processes, which is different from running a single process the first way.
You can try specifying numprocs=1 and see if that works.
Please also check the code logic.
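A minimal sketch of what that stanza could look like; the program name and paths here are assumptions, since the original config is only shown as a screenshot:
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/myproject/artisan queue:work redis
; run a single worker, matching the manual php artisan invocation
numprocs=1
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/www/myproject/storage/logs/worker.log
If it still fails only under Supervisor with numprocs=1, the difference is usually environmental (user, working directory, or .env visibility) rather than the process count.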
Horizon runs fine, but just recently, after a deploy, Supervisor and the queue workers do not start back up again, and the Horizon GUI shows "Inactive".
To get them running again I can:
restart the daemon worker from within Forge
restart Supervisor: /etc/init.d/supervisor restart
My deploy script has php artisan horizon:terminate in it. I have also tried reset/purge and combinations thereof.
When I run terminate on the command line with Horizon inactive, it seems to do nothing. When I run the same command with Horizon active, it shuts Horizon down, but the daemon is not brought back up.
The daemon runs without any errors throughout all of this.
Should terminate take the service down and bring it back up, or is that the daemon's job?
Running horizon:terminate will kill the daemon; when the daemon is killed, Supervisor will notice and boot up a new daemon. You can see this clearly if you monitor your server with htop while running the terminate command.
If a long-running job is in progress, the current job will run until it finishes. Terminate is generally there to reboot the process so you can be certain the new code is loaded into Horizon; it should be run as the last step in Envoyer or a similar deployment tool.
It sounds like something is wrong in your setup. Does the Horizon process run before you call terminate (again, check htop)? And what happens when the command is called manually?
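For comparison, a typical Supervisor stanza for Horizon looks something like this (paths and user are assumptions; the long stopwaitsecs gives in-flight jobs time to finish before Supervisor force-kills the daemon):
[program:horizon]
process_name=%(program_name)s
command=php /home/forge/example.com/artisan horizon
autostart=true
autorestart=true
user=forge
redirect_stderr=true
stdout_logfile=/home/forge/example.com/horizon.log
stopwaitsecs=3600
If Horizon stays "Inactive" after horizon:terminate, check that autorestart is on and that supervisord itself survived the deploy.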
I have run into a problem.
I used Jenkins to install HAProxy and start the service, but after the job completes and the executor is freed, the haproxy daemon also disappears.
If I add sleep 30 after starting the service, haproxy stays alive for those 30 seconds, but after that the daemon goes down.
This behaviour is by design, as explained in ProcessTreeKiller. To avoid daemons spawned by the Jenkins build being terminated, add
export BUILD_ID=dontKillMe
to the beginning of your shell step.
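A sketch of how the shell build step might look with this applied; the service command assumes a SysV-style init and may differ on your system:
#!/bin/bash
# stop Jenkins' ProcessTreeKiller from reaping the daemon after the build
export BUILD_ID=dontKillMe
service haproxy start
(For Pipeline jobs the equivalent variable is JENKINS_NODE_COOKIE.)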
I know how to run sentry start.
But when I change sentry.conf.py, how can I make the changes take effect?
I ran sentry help and cannot find a sentry stop or restart command.
Is there a way to restart the Sentry server?
I just ran into this problem myself. I was using Supervisor to start my Sentry server, and for some reason it was not killing Sentry when I stopped Supervisor. To fix this, I ran sudo netstat -tulpn | grep 9000 to find the process that was still running. For me, it was gunicorn. Kill that process, then start the server again, and your new settings should take effect.
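In command form, that is roughly (the PID is whatever the netstat output shows; 9000 assumes Sentry's default web port):
sudo netstat -tulpn | grep 9000    # find the stale process (e.g. gunicorn) and its PID
sudo kill <PID>                    # replace <PID> with the number from the output
sentry start                       # relaunch with the new sentry.conf.py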
I'm using systemctl to manage Sentry.
First, create an executable file, run_worker:
#!/bin/bash
source ~/.sentry/bin/activate
SENTRY_CONF=~/sentry sentry run worker > /var/log/sentry_worker.log 2>&1
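Remember to make the script executable so systemd can launch it, e.g.:
chmod +x run_worker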
Then, create the service files (since the units are managed with systemctl --user, they live under ~/.config/systemd/user/), like so:
[Service]
ExecStart={YourPath}/sentry/run_worker
Restart=always
StartLimitInterval=0
[Install]
WantedBy=default.target
Create sentry_web.service and sentry_cron.service likewise, then use
systemctl --user restart sentry_*
to restart them.
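After adding or editing the unit files, reload systemd and enable the units so they start on login (names taken from above):
systemctl --user daemon-reload
systemctl --user enable sentry_worker sentry_web sentry_cron
systemctl --user restart sentry_worker sentry_web sentry_cron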
If you are running your workers using Supervisor, just run the following to restart all the workers:
supervisorctl
restart all
Or, if you want to restart a single worker, enter:
supervisorctl
status
to get the list of workers, and then use:
restart worker_name
This will restart the Sentry process and apply your new configuration.
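The same thing works non-interactively by passing the action straight to supervisorctl:
supervisorctl restart all           # all programs
supervisorctl restart worker_name   # a single program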