shell: clean up leaked background processes which hang due to shared stdout/stderr

I need to run essentially arbitrary commands on a (remote) shell in ephemeral containers/VMs for a test execution engine. Sometimes these leak background processes which then cause the entire command to hang. This can be boiled down to this simple command:
$ sh -c 'sleep 30 & echo payload'
payload
$
Here the backgrounded sleep 30 plays the role of a leaked process (which in reality will be something like dbus-daemon) and the echo is the actual thing I want to run. The sleep 30 & echo payload should be considered as an atomic opaque example command here.
The above command is fine and returns immediately as the shell's and also sleep's stdout/stderr are a PTY. However, when capturing the output of the command to a pipe/file (a test runner wants to save everything into a log, after all), the whole command hangs:
$ sh -c 'sleep 30 & echo payload' | cat
payload
# ... does not return to the shell (until the sleep finishes)
Now, this could be fixed with some rather ridiculously complicated shell magic which determines the FDs of stdout/err from /proc/$$/fd/{1,2}, iterating over ls /proc/[0-9]*/fd/* and killing every process which also has the same stdout/stderr. But this involves a lot of brittle shell code and expensive shell string comparisons.
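For reference, the /proc scan described above might look roughly like the following sketch. It is Linux-only and rests on assumptions that are not part of the question: the helper name find_output_sharers, the result variable SHARING_PIDS, and a temporary file standing in for the captured log are all hypothetical. It can also only see processes whose /proc/PID/fd entries the current user is allowed to read.

```shell
#!/bin/sh
# Hypothetical helper: collect in SHARING_PIDS every other process whose
# stdout or stderr resolves to the same target (a path or "pipe:[inode]").
find_output_sharers() {
    target=$1
    SHARING_PIDS=""
    for fd in /proc/[0-9]*/fd/1 /proc/[0-9]*/fd/2; do
        [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] || continue
        pid=${fd#/proc/}        # strip the leading "/proc/"
        pid=${pid%%/*}          # keep only the PID component
        [ "$pid" = "$$" ] && continue
        case " $SHARING_PIDS " in
            *" $pid "*) ;;                           # already recorded
            *) SHARING_PIDS="$SHARING_PIDS $pid" ;;
        esac
    done
}

# Demo: a stand-in "leaked" process that keeps a log file open.
log=$(readlink -f "$(mktemp)")
sleep 5 >"$log" 2>&1 &
leak=$!
sleep 1                         # give the child time to set up its redirection
find_output_sharers "$log"
echo "processes sharing $log:$SHARING_PIDS"
kill $SHARING_PIDS 2>/dev/null  # unquoted on purpose: one argument per PID
rm -f "$log"
```

In the real scenario the target would be the shell's own output, for example find_output_sharers "$(readlink /proc/$$/fd/1)". The result goes into a variable rather than to stdout because a command substitution or pipeline would add helper processes that share stderr and would then show up in the scan themselves, which illustrates how brittle this approach is.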
Is there a way to clean up these leaked background processes in a more elegant and simpler way? setsid does not help:
$ sh -c 'setsid -w sh -c "sleep 30 & echo payload"' | cat
payload
# hangs...
Note that process groups/sessions and killing them wholesale isn't sufficient as leaked processes (like dbus-daemon) often setsid themselves.
P.S. I can only assume POSIX shell or bash in these environments; no Python, Perl, etc.
Thank you in advance!

We had this problem with parallel tests in Launchpad. The simplest solution we had then - which worked well - was just to make sure that no processes share stdout/stdin/stderr (except ones where you actually want to hang if they haven't finished - e.g. the test workers themselves).

Hmm, having re-read this I cannot give you the solution you are after (use systemd to kill them). What we came up with is to simply ignore the processes but reliably not hang when the single process we were waiting for is done. Note that this is distinctly different from the pipes getting closed.
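A minimal sketch of the "ignore the leaked processes but do not hang" idea, which is not necessarily how Launchpad did it: send the output to a regular file instead of a pipe, so that nothing waits for end-of-file and the call returns as soon as the command itself exits. The helper name run_logged and the temporary log file are assumptions for illustration, and sleep 5 stands in for the leaked process.

```shell
#!/bin/sh
# Hypothetical helper: run a command with all output appended to a log file.
# Nothing reads from a pipe here, so the call returns when the command
# itself exits, even if a leaked child still holds the log file open.
run_logged() {
    log_file=$1
    shift
    "$@" >>"$log_file" 2>&1 </dev/null
}

log=$(mktemp)
start=$(date +%s)
run_logged "$log" sh -c 'sleep 5 & echo payload'
status=$?
elapsed=$(( $(date +%s) - start ))
echo "exit status $status after ${elapsed}s"
cat "$log"
```

The leaked process keeps running and may still append to the log later. For a remote shell, this assumes the redirection happens on the side where the command runs, since a pipe back to the caller would bring the original hang back.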
Another option, not perfect but useful, is to become a local reaper with prctl(2) and PR_SET_CHILD_SUBREAPER. This makes you the parent of all descendant processes that would otherwise reparent to init. With this arrangement you could try to kill all the processes that have you as ppid. This is terrible but it's the next best thing to using cgroups.
But note that unless you are running this helper as root, practical testing might spawn some setuid thing that will lurk and won't be killable. It's an annoying problem really.

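Use script -qfc instead of sh -c. script runs the command on its own PTY, so a leaked background process inherits that PTY as stdout/stderr instead of the pipe that is being read.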

Related

Bash script is waiting to open second file in gedit until I close the first one [duplicate]

When running commands from a bash script, does bash always wait for the previous command to complete, or does it just start the command then go on to the next one?
ie: If you run the following two commands from a bash script is it possible for things to fail?
cp /tmp/a /tmp/b
cp /tmp/b /tmp/c
Yes, if you do nothing else then commands in a bash script are serialized. You can tell bash to run a bunch of commands in parallel, and then wait for them all to finish, by doing something like this:
command1 &
command2 &
command3 &
wait
The ampersands at the end of each of the first three lines tell bash to run the command in the background. The fourth command, wait, tells bash to wait until all the child processes have exited.
Note that a bare wait like this does not report the exit status of the child commands (and set -e won't catch their failures), so you won't be able to tell whether they succeeded or failed in the usual way. To check, record each PID from $! and wait on it individually.
The bash manual has more information (search for wait, about two-thirds of the way down).
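If the exit statuses do matter, one option is to save each PID and wait for it by PID. A small sketch, with true and false as placeholder commands:

```shell
#!/bin/sh
# Placeholder commands: `true` succeeds, `false` fails.
true &
pid1=$!
false &
pid2=$!

# `wait PID` returns the exit status of that particular child.
wait "$pid1"; status1=$?
wait "$pid2"; status2=$?
echo "first: $status1, second: $status2"
```

This prints "first: 0, second: 1". Under set -e the second wait would abort the script, so the statuses would need to be collected with something like wait "$pid2" || status2=$?.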
Add '&' at the end of a command to run it in parallel.
However, it is strange because in your case the second command depends on the final result of the first one. Either use sequential commands or copy to b and c from a like this:
cp /tmp/a /tmp/b &
cp /tmp/a /tmp/c &
Unless you explicitly tell bash to start a process in the background, it will wait until the process exits. So if you write this:
foo args &
bash will continue without waiting for foo to exit. But if you don't explicitly put the process in the background, bash will wait for it to exit.
Technically, a process can effectively put itself in the background by forking a child and then exiting. But since that technique is used primarily by long-lived processes, this shouldn't affect you.
In general, unless explicitly sent to the background or forking themselves off as a daemon, commands in a shell script are serialized.
They wait until the previous one is finished.
However, you can write two scripts and run them in separate processes so that they execute simultaneously. Be aware that on Unix-like systems a process can normally write to a file while another process is reading it without any access error, so the reader may simply see incomplete data.
I think what you want is the concept of a subshell. Here's one reference I just googled: http://www.linuxtopia.org/online_books/advanced_bash_scripting_guide/subshells.html

Terminating multiple background processes in bash?

I'm trying to dump trade-data off binance for multiple symbol-pairs, e.g. doge/btc, ada/btc, etc.
I can background, thus:
wscat -c wss://stream.binance.com:9443/ws/dogebtc#trade > doge.txt &
wscat -c wss://stream.binance.com:9443/ws/adabtc#trade > ada.txt &
But how to terminate them all?
Is there some smart way, like terminating the parent process?
I think the right answer depends a lot on the way your current system is implemented / used.
At the most basic scripting level, you could simply run kill against all wscat processes; but that may be too generic depending on the details.
Slightly better, in a BASH script, directly after creating these processes you'd have access to their PID as $!. You could stash those PIDs in a variable or file and later use them to kill each individual process.
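A sketch of the PID-stashing approach, with sleep 60 standing in for the long-running wscat commands (the variable names are arbitrary):

```shell
#!/bin/sh
# `sleep 60` stands in for the long-running wscat commands.
pids=""
sleep 60 &
pids="$pids $!"
sleep 60 &
pids="$pids $!"

# ... later: stop every recorded process, then reap it.
kill $pids                  # unquoted on purpose: one argument per PID
wait $pids 2>/dev/null

remaining=0
for pid in $pids; do
    kill -0 "$pid" 2>/dev/null && remaining=$((remaining + 1))
done
echo "still running: $remaining"
```

Putting the kill line into a trap on EXIT, INT and TERM would stop the background processes whenever the script itself is stopped.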
If you're aiming for something slicker than that, you'd likely want to look into things like:
the SIGCHLD signal, becoming a subreaper (prctl PR_SET_CHILD_SUBREAPER), running as PID 1 in a PID-namespace (unshare --pid ...), things like that.

How to create an anonymous pipe between 2 child processes and know their pids (while not using files/named pipes)?

Please note that this question was edited after a couple of comments I received. Initially I wanted to split my goal into smaller pieces to make it simpler (and perhaps expand my knowledge on various fronts), but it seems I went too far with the simplicity :). So, here I am asking the big question.
Using bash, is there a way one can actually create an anonymous pipe between two child processes and know their pids?
The reason I'm asking is when you use the classic pipeline, e.g.
cmd1 | cmd2 &
you lose the ability to send signals to cmd1. In my case the actual commands I am running are these
./my_web_server | ./my_log_parser &
my_web_server is a basic web server that dumps a lot of logging information to its stdout
my_log_parser is a log parser that I wrote that reads through all the logging information it receives from my_web_server and it basically selects only certain values from the log (in reality it actually stores the whole log as it received it, but additionally it creates an extra csv file with the values it finds).
The issue I am having is that my_web_server actually never stops by itself (it is a web server, you don't want that from a web server :)). So after I am done, I need to stop it myself. I would like for the bash script to do this when I stop it (the bash script), either via SIGINT or SIGTERM.
For something like this, traps are the way to go. In essence I would create a trap for INT and TERM and the function it would call would kill my_web_server, but... I don't have the pid and even though I know I could look for it via ps, I am looking for a pretty solution :).
Some of you might say: "Well, why don't you just kill my_log_parser and let my_web_server die on its own with SIGPIPE?". The reason why I don't want to kill it is that when you kill a process at the end of a pipeline, the output buffer of the process before it is not flushed. Ergo, you lose stuff.
I've seen several solutions here and in other places that suggested storing the pid of my_web_server in a file. This is a solution that works. It is possible to write the pipeline by fiddling with the file descriptors a bit. I, however, don't like this solution, because I have to generate files. I don't like the idea of creating arbitrary files just to store a 5-character PID :).
What I ended up doing for now is this:
#!/bin/bash
trap " " HUP
fifo="$( mktemp -u "$( basename "${0}" ).XXXXXX" )"
mkfifo "${fifo}"
<"${fifo}" ./my_log_parser &
parser_pid="$!"
>"${fifo}" ./my_web_server &
server_pid="$!"
rm "${fifo}"
trap '2>/dev/null kill -TERM '"${server_pid}"'' INT TERM
while true; do
wait "${parser_pid}" && break
done
This solves the issue with me not being able to terminate my_web_server when the script receives SIGINT or SIGTERM. It seems more readable than any hackery fiddling with file descriptors in order to eventually use a file to store my_web_server's pid, which I think is good, because it improves the readability.
But it still uses a file (named pipe). Even though I know it uses the file (named pipe) for my_web_server and my_log_parser to talk (which is a pretty good reason) and the file gets wiped from the disk very shortly after it's created, it's still a file :).
Would any of you guys know of a way to do this task without using any files (named pipes)?
From the Bash man pages:
! Expands to the process ID of the most recently executed background (asynchronous) command.
You are not running a background command, you are running process substitution to read to file descriptor 3.
The following works, but I'm not sure if it is what you are trying to achieve:
sleep 120 &
child_pid="$!"
wait "${child_pid}"
sleep 120
Edit:
Comment was: I know I can pretty much do this the silly 'while read i; do blah blah; done < <( ./my_proxy_server )'-way, but I don't particularly like the fact that when a script using this approach receives INT or TERM, it simply dies without telling ./my_proxy_server to bugger off too :)
So, it seems like your problem stems from the fact that it is not so easy to get the PID of the proxy server. So, how about using your own named pipe, with the trap command:
pipe='/tmp/mypipe'
mkfifo "$pipe"
./my_proxy_server > "$pipe" &
child_pid="$!"
echo "child pid is $child_pid"
# Tell the proxy server to bugger-off
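trap 'kill "$child_pid"' INT TERM
# -r keeps backslashes literal; with no variable name, bash stores the line in REPLY
while IFS= read -r
do
printf '%s\n' "$REPLY"
# blah blah blah
done < "$pipe"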
rm "$pipe"
You could probably also use kill %1 instead of using $child_pid.
YAE (Yet Another Edit):
You ask how to get the PIDS from:
./my_web_server | ./my_log_parser &
Simples, sort of. To test I used sleep, just like your original.
sleep 400 | sleep 500 &
jobs -l
Gives:
[1]+ 8419 Running sleep 400
8420 Running | sleep 500 &
So it's just a question of extracting those PIDs:
pid1=$(jobs -l|awk 'NR==1{print $2}')
pid2=$(jobs -l|awk 'NR==2{print $1}')
I hate calling awk twice here, but anything else is just jumping through hoops.

How can I create a process in Bash that has zero overhead but which gives me a process ID

For those of you who know what you're talking about I apologise for butchering the way that I'm going to phrase this question. I know nothing about bash whatsoever. With that caveat out of the way, let me get out my cleaver...
I am building a Rails app which has what's called a procfile which sets up any processes that need to be run in different environments
e.g.
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
redis: redis-server
worker: bundle exec sidekiq
proxylocal: bin/proxylocal_local
Each one of these lines specs a process to be run. It also expects a pid to be returned after the process spins up. The syntax is
process_name: process_invokation_script
However the last process, proxylocal, only actually starts a process in development. In production it doesn't do anything.
Unfortunately that causes the Procfile to choke as it needs a process ID returned. So is there some super-simple, zero-overhead process that I can spawn in that case just to keep the procfile happy?
The sleep command does nothing for a specified period of time, with very low overhead. Give it an argument longer than your code will run.
For example
sleep 2147483647
does nothing for 2^31-1 seconds, just over 68 years. I picked that number because any reasonable implementation of sleep should be able to handle it.
In the unlikely event that that doesn't work (say if you're on an old 16-bit system that can't sleep for more than 2^16-1 seconds), you can do a sleep in an infinite loop:
sh -c 'while : ; do sleep 30000 ; done'
This assumes that you need the process to run for a very long time; that depends on what your application needs to do with the process ID. If it's required to be unique as long as the application is running, you need something that will continue to run for a long time; if the process terminates, its PID can be re-used by another process.
If that's not a requirement, you can use sleep 0 or true, which will terminate immediately.
If you need to give the application a little time to get the process ID before the process terminates, something like sleep 10 or even sleep 1 might work, though determining just how long it needs to run can be tricky and error-prone.
If Heroku isn't doing anything with proxylocal I'm not sure why you'd even want this in your Procfile. I'm also a bit confused about whether you want to change the Procfile or what bin/proxylocal_local does and how you would even do that.
That being said, if you are able to do anything you like for production, your script can just call cat; it will create a pid and then just sit waiting for input (which never comes).
For truly minimal overhead, you don't want to run any external commands. When the shell starts a command, it first forks itself, then the child shell execs the external command. If the forked child can run a builtin, you can skip the exec.
Start by creating a read-only fifo somewhere.
mkfifo foo
chmod 400 foo
Then, whenever you need a do-nothing process, just fork a shell which tries to read from the fifo. It's read-only, so no one can write to it, so all reads will block.
read < foo &
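Put together, the blocked-read placeholder could look like the sketch below. The temporary directory, the FIFO name and the variable names are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical paths: a private temporary directory keeps the FIFO out of the way.
dir=$(mktemp -d)
mkfifo "$dir/block"
chmod 400 "$dir/block"

# The forked shell blocks while opening the FIFO for reading, because no
# writer ever opens the other end; no external command is executed.
read -r _unused <"$dir/block" &
placeholder_pid=$!

if kill -0 "$placeholder_pid" 2>/dev/null; then
    state=running
else
    state=gone
fi
echo "placeholder $placeholder_pid is $state"

# Clean up once the placeholder is no longer needed.
kill "$placeholder_pid"
wait "$placeholder_pid" 2>/dev/null
rm -rf "$dir"
```

The placeholder stays alive until it is killed or until some process opens the FIFO for writing.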

