supervisord stopping child processes - bash

One of the problems I face with supervisord is that when I have a command which in turn spawns another process, supervisord is not able to kill it.
For example, I have a Java process which, when run normally, looks like this:
$ zkServer.sh start-foreground
$ ps -eaf | grep zk
user 30404 28280 0 09:21 pts/2 00:00:00 bash zkServer.sh start-foreground
user 30413 30404 76 09:21 pts/2 00:00:10 java -Dzookeeper.something..something
The supervisord config file looks like:
[program:zookeeper]
command=zkServer.sh start-foreground
autorestart=true
stopsignal=KILL
These kinds of processes, which have multiple children, are not handled well by supervisord when it comes to stopping them from supervisorctl. When I run this under supervisord and try to stop it from supervisorctl, only the top-level process gets killed, not the actual java process.
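You can reproduce the problem without ZooKeeper. Below is a small bash sketch in which a wrapper shell stands in for zkServer.sh and sleep stands in for the java child. It is Linux-only because it uses setsid and /proc, and the function names are made up for the illustration. The sketch sends SIGKILL to the wrapper alone, which is roughly what stopsignal=KILL does when only the top-level process is signalled. The child keeps running.

```shell
#!/usr/bin/env bash
# Reproduce the problem with stand-ins: a wrapper shell (like zkServer.sh)
# starts a long-running child (sleep instead of java). Linux-only.

is_alive() {   # true if the PID exists and is not a zombie (reads /proc)
  local st
  st=$(awk '{print $3}' "/proc/$1/stat" 2>/dev/null) || return 1
  [ -n "$st" ] && [ "$st" != Z ]
}

kill_parent_only() {
  local tmp job wrapper child i
  tmp=$(mktemp -d)
  # setsid puts the wrapper in its own process group (PGID = wrapper PID).
  setsid bash -c 'echo $$ > "$1/wrapper.pid"; sleep 300 & echo $! > "$1/child.pid"; wait' \
    _ "$tmp" >/dev/null 2>&1 &
  job=$!
  for i in {1..50}; do [ -s "$tmp/child.pid" ] && break; sleep 0.1; done
  wrapper=$(<"$tmp/wrapper.pid")
  child=$(<"$tmp/child.pid")

  kill -KILL "$wrapper"                 # signal the top-level process only
  wait "$job" 2>/dev/null || true
  sleep 0.5
  if is_alive "$child"; then echo "child still running"; else echo "child gone"; fi

  kill -KILL -- "-$wrapper" 2>/dev/null || true   # clean up the orphaned group
  rm -rf "$tmp"
}

kill_parent_only
```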

The same problem was encountered by Rick Hanlon II here: https://coderwall.com/p/4tcw7w
The option stopasgroup=true should be set in the program section so that supervisord stops not only the parent process but also its child processes.
The example is given as:
[program:some_django]
command=python manage.py runserver
directory=/dir/to/app
stopasgroup=true
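As far as I understand it, stopasgroup=true makes supervisord send the stop signal to the program's whole process group instead of a single PID. In kill terms, that means a negative PID. It also implies killasgroup=true. The bash sketch below does the same thing by hand with stand-in processes. stop_as_group and demo_group_stop are hypothetical helpers, and the demo assumes Linux (setsid, /proc).

```shell
#!/usr/bin/env bash
# Roughly what stopasgroup=true does: send the stop signal to the whole
# process group (negative PID) rather than to the top-level process only.
# Linux-only stand-ins: a wrapper shell and a sleep child.

is_alive() {   # true if the PID exists and is not a zombie (reads /proc)
  local st
  st=$(awk '{print $3}' "/proc/$1/stat" 2>/dev/null) || return 1
  [ -n "$st" ] && [ "$st" != Z ]
}

stop_as_group() {   # stop_as_group PGID [SIGNAL]; signal defaults to TERM
  kill -"${2:-TERM}" -- "-$1"
}

demo_group_stop() {
  local tmp job wrapper child i
  tmp=$(mktemp -d)
  setsid bash -c 'echo $$ > "$1/wrapper.pid"; sleep 300 & echo $! > "$1/child.pid"; wait' \
    _ "$tmp" >/dev/null 2>&1 &
  job=$!
  for i in {1..50}; do [ -s "$tmp/child.pid" ] && break; sleep 0.1; done
  wrapper=$(<"$tmp/wrapper.pid")
  child=$(<"$tmp/child.pid")

  stop_as_group "$wrapper" TERM    # the wrapper leads its group, so its PID is the PGID
  wait "$job" 2>/dev/null || true
  for i in {1..50}; do is_alive "$child" || break; sleep 0.1; done
  if is_alive "$child"; then echo "child still running"; else echo "child stopped"; fi
  rm -rf "$tmp"
}

demo_group_stop
```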
Also, keep in mind that you may have an older supervisord package that does not support the stopasgroup option.
I tried these Debian packages on a Raspberry Pi:
supervisor_3.0a8 does not work.
supervisor_3.0b2-1 works as expected.

Doing the following early in the main bash script called by supervisord fixed the problem for me:
trap "kill -- -$$" EXIT
This kills the entire process group when the main script exits, such as when it is killed by supervisord.
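Here is a minimal sketch of the trap approach, with sleep standing in for the real program. It rests on two assumptions. First, the script must be a process group leader so that $$ is also its PGID; the demo uses setsid to ensure that. Second, the stop signal must be catchable. SIGKILL can never be trapped, so with stopsignal=KILL as in the question the EXIT trap would not run. With supervisord's default stop signal, TERM, it does run. The extra TERM/INT trap turns the signal into a normal exit, so bash reliably runs the EXIT trap. make_wrapper and demo_trap_wrapper are made-up names.

```shell
#!/usr/bin/env bash
# Trap approach: the wrapper kills its own process group when it exits.
# Linux-only demo; sleep stands in for the real program.

is_alive() {   # true if the PID exists and is not a zombie (reads /proc)
  local st
  st=$(awk '{print $3}' "/proc/$1/stat" 2>/dev/null) || return 1
  [ -n "$st" ] && [ "$st" != Z ]
}

make_wrapper() {   # write a hypothetical wrapper script to $1
  cat > "$1" <<'EOF'
#!/usr/bin/env bash
trap 'exit 143' TERM INT               # make catchable stop signals a normal exit
trap 'trap - TERM; kill -- -$$' EXIT   # on exit, signal the whole process group
sleep 300 &                            # stand-in for the real program
echo "$$ $!" > "$1"                    # record wrapper and child PIDs for the demo
wait
EOF
}

demo_trap_wrapper() {
  local tmp job wrapper child i
  tmp=$(mktemp -d)
  make_wrapper "$tmp/wrapper.sh"
  setsid bash "$tmp/wrapper.sh" "$tmp/pids" >/dev/null 2>&1 &
  job=$!
  for i in {1..50}; do [ -s "$tmp/pids" ] && break; sleep 0.1; done
  read -r wrapper child < "$tmp/pids"

  kill -TERM "$wrapper"           # signal the top-level process only
  wait "$job" 2>/dev/null || true
  for i in {1..50}; do is_alive "$child" || break; sleep 0.1; done
  if is_alive "$child"; then echo "child still running"; else echo "child stopped"; fi
  rm -rf "$tmp"
}

demo_trap_wrapper
```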

A feature was recently added to supervisord to send SIGKILL to the whole process group. It's on GitHub but not officially released yet.
If the process ID is available in a file, you can use the pidproxy program.

Try this supervisor program config:
stopasgroup=true
killasgroup=true
stopsignal=INT

The following article has an in-depth discussion of the problem:
http://veithen.github.io/2014/11/16/sigterm-propagation.html

You can also use priorities in the /conf.d/your-configuration.conf file. For example, if you want to run ZooKeeper first and then Kafka, you can define two programs with different priorities.
Lower priority means that the program starts first and stops last.
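For example, here is a sketch that writes such a file. The Kafka command, the config paths and the priority values 10 and 20 are placeholders. supervisord's default priority is 999. The include directory in the usage comment is the one used by the Debian package, which is an assumption about your setup.

```shell
#!/usr/bin/env bash
# Write a supervisord include file in which ZooKeeper has a lower priority
# than Kafka, so it starts first and stops last. Paths and values are
# placeholders, not taken from a real installation.
write_zk_kafka_conf() {   # write_zk_kafka_conf OUTPUT_FILE
  cat > "$1" <<'EOF'
[program:zookeeper]
command=zkServer.sh start-foreground
priority=10
stopasgroup=true

[program:kafka]
command=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
priority=20
stopasgroup=true
EOF
}

# Usage (assumed Debian include directory):
#   write_zk_kafka_conf /etc/supervisor/conf.d/zk-kafka.conf
```

After writing the file, supervisorctl reread followed by supervisorctl update should pick up the new programs.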

Related

Can't terminate node(js) process without terminating ssh server in docker container

I'm using a Dockerfile that ends with a CMD ["/start.sh"]:
#!/bin/bash
service ssh start
/usr/bin/node /myApp/app.js
If for some reason I need to kill the node process, the ssh server is shut down as well (which forces me to restart the container to reconnect).
Is there a simple way to avoid this behavior?
The container exits as soon as the main process of the container exits. In your case, the main process inside the container is the start.sh shell script. The script starts the ssh service and then runs the nodejs process as a child process. Once the nodejs process dies, the shell script exits as well, and so does the container. What you can do is put the nodejs process in the background:
#!/bin/bash
service ssh start
/usr/bin/node /myApp/app.js &
# Keep the script running even if node exits, so the container stays up
while true; do
  sleep 2
done
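If you do go this way, one detail matters. As PID 1 in the container, bash ignores the SIGTERM sent by docker stop unless a trap is installed. The loop above therefore only goes away after Docker's stop timeout, when it sends SIGKILL. Below is a sketch of a variant that traps TERM. run_keepalive is a hypothetical helper, and the service ssh start line is left as a comment so the sketch runs anywhere.

```shell
#!/usr/bin/env bash
# Variant of the start.sh idea: keep the script (and so the container) alive
# after the app exits, but exit promptly when `docker stop` sends SIGTERM.
run_keepalive() {   # run_keepalive CMD...  (hypothetical helper)
  "$@" &                                   # e.g. /usr/bin/node /myApp/app.js
  local app_pid=$!
  # Expanded now: on TERM/INT, stop the app and leave the loop.
  trap "kill $app_pid 2>/dev/null; exit 0" TERM INT
  while true; do
    sleep 2 &
    wait $!      # unlike a foreground sleep, returns as soon as a trapped signal arrives
  done
}

# In start.sh this would be used as (assumption):
#   service ssh start
#   run_keepalive /usr/bin/node /myApp/app.js
```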
I DO NOT recommend this approach, though. You should have only a single process per container. Read the following answers to understand why:
Running multiple applications in one docker container
If you still want to run multiple processes inside a container, there are better ways to do it, such as using supervisord: https://docs.docker.com/config/containers/multi-service_container/

Bash to start and kill process on Ubuntu in a given period

I have this situation: I have a PHP script running in an Ubuntu terminal (xfce4-terminal) as a console process (the PHP code runs a loop that does some processing).
The problem is: every two days this process is killed due to memory overuse.
What I need is a bash script that starts the process and, every 48 hours, kills it and starts it again.
The optimal solution is to fix the memory leak: trace the leaking function, and post a new question with the relevant code if you need help.
For this specific case, though, you can use something like this:
while true
do
timeout 12h php myfile.php
done
This is an infinite loop that starts your command and kills it after 12 hours (or any other duration you want: 30m, 1d, etc.).
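Here is the same idea as a reusable function, in case you want to log each restart. It assumes GNU coreutils timeout, which exits with status 124 when it had to kill the command. restart_every and its MAX_RUNS argument are made up for this sketch. A MAX_RUNS of 0 means "forever"; the argument exists only so the loop can be tested.

```shell
#!/usr/bin/env bash
# Restart a command on a fixed period using coreutils `timeout`.
restart_every() {   # restart_every PERIOD MAX_RUNS CMD...
  local period=$1 max_runs=$2 run=0 status
  shift 2
  while [ "$max_runs" -eq 0 ] || [ "$run" -lt "$max_runs" ]; do
    run=$((run + 1))
    status=0
    timeout "$period" "$@" || status=$?
    if [ "$status" -eq 124 ]; then     # 124: timeout killed the command
      echo "run $run: killed after $period"
    else                               # the command exited (or crashed) on its own
      echo "run $run: exited with status $status"
    fi
  done
}
```

For the 48-hour case from the question, that would be restart_every 48h 0 php myfile.php. If the script might ignore SIGTERM, timeout's -k option (e.g. timeout -k 30s 48h ...) follows up with SIGKILL.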
A more stable solution is to create a systemd service or to deploy your script with a process manager such as Supervisor or Monit.
Supervisor has a config parameter, autorestart; if you set it to true, Supervisor restarts your script every time it crashes, and this is a stable, production-ready solution.
A sample Supervisor config from this post:
[program:are_we_there_yet]
command=php /var/www/areWeThereYet.php
numprocs=1
directory=/tmp
autostart=true
autorestart=true
startsecs=5
startretries=10
redirect_stderr=false

How do I handle stopping my service?

I've turned a program I wrote into a service, and I have a bash script that starts up the actual program, since there are things that have to be started in a particular order. The contents of the startup script (called with start-stop-daemon from the init.d script) look like this:
./rfid_reader &
sleep 2
java ReaderClass &
This works fine, but how do I go about killing them when it comes time to stop the process?
I've seen pidfiles used, do I just get the PIDs of the two programs, write them to a file, and then kill them when it comes time to shut them down, or do I have to ps -ef | grep program to get their PIDs?
I don't think killall is a good idea here. You'd better record the PID of the program started in the background in a file (e.g. PID_FILE) and then run kill $(<$PID_FILE) to stop it.
Please refer to this thread for how to get the PID of the most recently started background program (in bash it is available as $!).
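Here is a sketch of the pidfile approach for the startup script in the question. It keeps one pidfile per program, so a later stop call, which runs as a separate process, can still find the PIDs. PID_DIR, start_bg, stop_bg, start_all and stop_all are hypothetical names, and $(<file) requires bash.

```shell
#!/usr/bin/env bash
# Pidfile approach for the start/stop script (names are hypothetical).
PID_DIR=${PID_DIR:-/tmp/myservice}

start_bg() {   # start_bg NAME CMD...: run CMD in the background, save its PID
  local name=$1
  shift
  mkdir -p "$PID_DIR"
  "$@" &
  echo $! > "$PID_DIR/$name.pid"   # $! is the PID of the last background job
}

stop_bg() {    # stop_bg NAME: kill the process recorded in NAME.pid, if any
  local pidfile="$PID_DIR/$1.pid"
  [ -f "$pidfile" ] || return 0
  kill "$(<"$pidfile")" 2>/dev/null || true
  rm -f "$pidfile"
}

start_all() {  # same order as the original startup script
  start_bg rfid_reader ./rfid_reader
  sleep 2
  start_bg reader_class java ReaderClass
}

stop_all() {   # stop in reverse order
  stop_bg reader_class
  stop_bg rfid_reader
}
```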
Assuming you know the name of your program, you can kill them as below:
killall -KILL program_name

Kafka in supervisor mode

I'm trying to run Kafka under supervision so that it restarts automatically in case of a shutdown. But all the examples of running Kafka use shell scripts, and supervisord is not able to tell which PID to monitor. Can anyone suggest how to accomplish auto-restart of Kafka?
If you are on a Unix or Linux machine, then this is when /etc/inittab comes in handy. Or you might want to use daemontools. I don't know about Windows though.
We are running Kafka under Supervisord (http://supervisord.org/), and it works like a charm. The run command looks like this (as specified in the supervisord.conf file):
command=/usr/local/bin/pidproxy /var/run/kafka.pid /usr/lib/kafka/bin/kafka-server.sh -f -p /var/run/kafka.pid
The -f flag tells Kafka to start in the foreground. If the -p flag is set, the Kafka process PID is written to the specified file.
The pidproxy command is part of the Supervisord distribution. When it receives a signal such as TERM, it reads the PID from the specified file and forwards the signal to the corresponding process. (SIGKILL itself cannot be caught, so it cannot be forwarded.)
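To illustrate the idea, here is a bash sketch of such a proxy. It is not supervisor's own pidproxy code. It starts the launcher, waits for the pidfile, and stays in the foreground so supervisord has a process to watch. It forwards TERM and INT to the PID from the file. pid_proxy is a hypothetical name, and unlike a more careful tool it reads the pidfile only once.

```shell
#!/usr/bin/env bash
# Sketch of the pidproxy idea (not supervisor's implementation).
pid_proxy() {   # pid_proxy PIDFILE CMD...
  local pidfile=$1 target i
  shift
  "$@" &                                   # launcher that writes the real PID to PIDFILE
  for i in {1..100}; do                    # wait up to ~10 s for the pidfile
    [ -s "$pidfile" ] && break
    sleep 0.1
  done
  [ -s "$pidfile" ] || { echo "pid_proxy: $pidfile not written" >&2; return 1; }
  target=$(<"$pidfile")
  # Forward the signal we receive to the real process, then exit.
  trap "kill -TERM $target 2>/dev/null; exit 0" TERM
  trap "kill -INT $target 2>/dev/null; exit 0" INT
  while kill -0 "$target" 2>/dev/null; do  # stay up while the real process runs
    sleep 1 &
    wait $!                                # interrupted at once by a trapped signal
  done
}
```

With the Kafka example above, that would be pid_proxy /var/run/kafka.pid /usr/lib/kafka/bin/kafka-server.sh -f -p /var/run/kafka.pid.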

Script which launches another application will bring it down on exit

I have a script which launches another application using nohup my_app &, but when the initial script dies, the launched process also goes down. As I understand it, since it was started with nohup, that should not happen. The original script is also called with nohup.
What went wrong there?
A very reliable script, which has been used successfully for years and has always exited after invoking nohup, uses this construct:
nohup ${BinDir}/${Watcher} >${DataDir}/${Watcher}.nohup.out 2>&1 &
Perhaps the problem is that output is not being managed?
nohup does not mean that a (child) process keeps running when its (parent) process is killed. nohup is used, e.g., when you connect to a server over ssh and start a process there. When you log out, the process receives SIGHUP, which by default terminates it; nohup avoids this, so your process keeps running after you log out.
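Here is a small bash sketch to check this, with sleep standing in for my_app. Under nohup the process survives SIGHUP, but a plain TERM still stops it. So nohup alone does not keep a child alive if something signals it directly. survives is a made-up helper, and the /proc check is Linux-specific.

```shell
#!/usr/bin/env bash
# What nohup changes: the command ignores SIGHUP, but other signals still
# stop it. sleep stands in for my_app; Linux-only (/proc).

is_alive() {   # true if the PID exists and is not a zombie (reads /proc)
  local st
  st=$(awk '{print $3}' "/proc/$1/stat" 2>/dev/null) || return 1
  [ -n "$st" ] && [ "$st" != Z ]
}

survives() {   # survives SIGNAL CMD...: print yes if CMD still runs after SIGNAL
  local sig=$1 pid
  shift
  "$@" >/dev/null 2>&1 &
  pid=$!
  sleep 0.5                      # let nohup set SIGHUP to ignored and exec the command
  kill -"$sig" "$pid"
  sleep 0.5
  if is_alive "$pid"; then echo yes; else echo no; fi
  kill -KILL "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
}

echo "plain sleep, SIGHUP:  $(survives HUP sleep 30)"
echo "nohup sleep, SIGHUP:  $(survives HUP nohup sleep 30)"
echo "nohup sleep, SIGTERM: $(survives TERM nohup sleep 30)"
```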
If you need a program that keeps running in the background even after its parent process has terminated, try turning it into a daemon.
It depends on what my_app does; it might install its own signal handlers. You probably know that nohup makes the command ignore the hang-up signal SIGHUP, and this is inherited by the target program. If that target program does its own signal handling, it might reset SIGHUP to, for example, SIG_DFL, the default action (which is to terminate).
To check, run strace -f -o out or truss -f -o out on the command. This writes all the system calls to the file called 'out'. You should be able to spot the SIGHUP handling being changed, if it is.
