why is this simple bash trap failing - bash

I'm still pretty new to bash scripting, and I'm having a hard time figuring out why this simple trap is not working as expected.
Goal - create an optional waiting period that can be skipped by pressing CTRL+C.
Expected result of pressing CTRL+C - immediately echo "No time for napping!" and exit.
Actual result of pressing CTRL+C - immediately echo "Naptime over." and exit.
#!/bin/bash
nonap() {
echo "No time for napping!"
exit
}
trap nonap INT
echo "Sleeping for 5 seconds, hit ctrl-c to proceed now."
sleep 5
echo "Naptime over."
Why is my trap function not invoked?

I just tried it (on an ancient RHEL Linux with bash 3.2.25) and saved your code in trap.sh, ran bash trap.sh, and got:
Sleeping for 5 seconds, hit ctrl-c to proceed now.
followed by:
No time for napping!
when I interrupted, as you expected. When I let it run without interrupting, I got the expected message:
Naptime over.
You then commented:
At least I know it should work as expected. I'm using the latest version of Tinycore Linux with GNU bash, version 4.0.33(1)-release (i686-pc-linux-gnu). Upon opening a new terminal, declare -f nonap and trap both return no output. After running this script and getting the "Naptime over." output, trap returns trap -- 'nonap' SIGINT and declare -f nonap returns the function as defined in my script.
To which I responded:
How are you running this script, then? Using source or . to read it? Ah, yes; you must be. I just tried that, and the interrupt while sourcing gave me Naptime over.; typing another interrupt though gave me No time for napping! and the shell exited. The second time it behaved as expected; I'm not sure what's up with the interrupt while dotting the script. That is unexpected behaviour.
Why did you want to source or dot this? Why not just use it as a plain old script?
No reason to source...I was just using that to run it while testing. I guess I've never run into any anomalies like this when using . before, but I'm still a newb too. I am seeing the same results you are, and it works as expected on the first interrupt if I run it with bash.
Well, there's a "Doctor, Doctor, it hurts when I hit my head against the wall" component to the following advice, but there's also basic pragmatism in there too.
You use source (in C shell or bash) or . (in Bourne, Korn, POSIX shells or bash) to have the script affect the environment of the invoking shell, rather than running as a sub-shell. The giveaway to solving the problem (albeit largely by fluke) was when you reported that after running the script, you had the function defined; that can't happen unless you were using source. In this case, it is fairly clear that you do not want the trap set in the calling shell. When I ran it (from a ksh with prompt Toru JL:), I got:
Toru JL: bash
bash-3.2$ trap
bash-3.2$ source trap.sh
Sleeping for 5 seconds, hit ctrl-c to proceed now.
Naptime over.
bash-3.2$ trap
trap -- 'nonap' INT
bash-3.2$ No time for napping!
Toru JL:
The 'No time for napping!' message appeared when I hit the interrupt key again, and it terminated the bash I'd run. If you continue to use it with source, you would want to add trap INT to the end of the script, and you might also want to undefine the function.
However, you are much better off isolating it all in a shell and running it as a sub-process, I think.
But...your finding that this sort of thing plays funny games when the script is sourced is interesting. It's a minor anomaly in the behaviour of bash. I'm not sure it rises to the level of 'bug'; I'd have to read a lot of manual rather carefully (probably several times) and consult with other knowledgeable people before claiming 'bug'.
I'm not sure it will be any consolation, but I tried ksh on your script with . and it worked as we'd both expect:
Toru JL: ksh
$ . trap.sh
Sleeping for 5 seconds, hit ctrl-c to proceed now.
No time for napping!
Toru JL:
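If you do keep sourcing the file, the cleanup suggested above (reset the trap, undefine the function) could look something like the sketch below. The nap wrapper and the skipped flag are my own illustrative names, not part of the original script, and the handler records the skip instead of calling exit, because exit in a sourced file would close the invoking shell. This only tidies up afterwards; it does not change how bash treats an interrupt that arrives while a file is being sourced.

```bash
#!/bin/bash
# nap SECONDS: skippable wait that leaves no trap or helper function behind.
# "nap" and "skipped" are hypothetical names used only for this sketch.
nap() {
    local skipped=0
    nonap() {
        echo "No time for napping!"
        skipped=1                  # remember the skip; no exit, so a sourcing shell survives
    }
    trap nonap INT
    echo "Sleeping for $1 seconds, hit ctrl-c to proceed now."
    sleep "$1"
    if [ "$skipped" -eq 0 ]; then
        echo "Naptime over."
    fi
    trap - INT                     # restore the default SIGINT disposition
    unset -f nonap                 # undefine the helper function
}
```

Calling nap 5 at the end of the file (or from the prompt after sourcing it) gives the five-second wait; afterwards trap and declare -f nonap should show nothing left over from the script.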

Related

Why does bash "forget" about my background processes?

I have this code:
#!/bin/bash
pids=()
for i in $(seq 1 999); do
sleep 1 &
pids+=( "$!" )
done
for pid in "${pids[@]}"; do
wait "$pid"
done
I expect the following behavior:
spin through the first loop
wait about a second on the first pid
spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with shorter loop (50 instead of 999), then I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
POSIX says:
The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
13195
Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and says not a child of this shell when you call wait on the PID of an old one that's been overwritten. You can see how it's implemented here.
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
sleep is executed in the background via a fork+exec.
At some point, sleep exits leaving behind a zombie.
That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then when you wait the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit without you calling wait before you've filled up all the memory you have available for this in the shell. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to make it into the several thousands in theirs. Regardless, the outcome is the same - eventually there's nowhere to store information about your children and so that information is lost.
I can reproduce on ArchLinux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and any earlier. I can't reproduce with bash:5.1.0 .
What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements in jobs.c and wait.def in Bash 5.1, and the changelog mentions "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler". From the look of it, the issue is with handling a SIGCHLD signal while already handling another SIGCHLD signal.
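Whatever the exact cause on a given version, one workaround is to never leave many un-waited children behind: start the jobs in batches and wait for each batch before launching the next. A minimal sketch, assuming a batch size small enough for your environment; run_batched and its parameters are hypothetical names, not anything bash provides.

```bash
#!/bin/bash
# run_batched TOTAL BATCH CMD...: run CMD TOTAL times in the background,
# never leaving more than BATCH children un-waited. Prints the failure count.
run_batched() {
    local total=$1 batch=$2
    shift 2
    local -a pids=()
    local i pid failures=0
    for ((i = 1; i <= total; i++)); do
        "$@" &
        pids+=("$!")
        # Drain the batch once it is full, or after the last job was started.
        if (( ${#pids[@]} == batch || i == total )); then
            for pid in "${pids[@]}"; do
                wait "$pid" || failures=$(( failures + 1 ))
            done
            pids=()
        fi
    done
    echo "$failures"
}
```

For the script in the question this would be something like run_batched 999 50 sleep 1, trading a little parallelism for a bounded number of statuses the shell has to remember.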

How to immediately trap a signal to an interactive Bash shell?

I try to send a signal from one terminal A to another terminal B. Both run an interactive shell.
In terminal B, I trap signal SIGUSR1 like so :
$ trap 'source ~/mycommand' SIGUSR1
Now in terminal A I send a signal like so :
$ kill -SIGUSR1 pidOfB
Unfortunately, nothing happens in B. If I want to have my command executed, I need to switch to B and either input a new command or press enter.
How can I avoid this drawback and immediately execute my command instead ?
EDIT :
It's important to note that I want to interact directly with the interactive shell in terminal B from terminal A.
For this reason, every solution where the trap command would be executed in a subshell would not work for me...
Also, terminal B must stay interactive.
The shell may simply be stuck in a blocking read, waiting for command-line input. Hitting enter causes the handler to execute before the entered command. Running a command that a trapped signal can interrupt, such as wait:
$ sleep 60 & wait
then sending the signal causes wait to terminate immediately, followed by the output of the handler.
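The same effect can be reproduced in a script rather than at the prompt. In this sketch a background subshell stands in for terminal A and sends the signal after one second; the variable names are only illustrative. The wait builtin returns with a status above 128 as soon as the trapped signal arrives, and the handler runs right away.

```bash
#!/bin/bash
handled=0
trap 'handled=1; echo "handler ran"' USR1

self=${BASHPID:-$$}                 # PID of the shell that owns the trap
sleep 30 &                          # any long-running background job
sleeper=$!
( sleep 1; kill -USR1 "$self" ) &   # stand-in for "terminal A"

status=0
wait "$sleeper" || status=$?        # cut short when the trapped signal arrives

kill "$sleeper" 2>/dev/null         # tidy up the job we no longer need
wait "$sleeper" 2>/dev/null || true
trap - USR1
echo "wait returned $status, handled=$handled"
```

The script finishes after about one second instead of thirty, which is the "wait terminates immediately" behaviour described above.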
Based on the answers and my numerous attempts to solve this, I don't think it's possible to handle a trapped signal immediately in an interactive bash terminal.
For it to trigger, there must be an interaction from the user.
This is because readline blocks until a newline is entered, and there is no way to interrupt this read.
My solution is to use dtach, a small program that emulates the detach feature of screen.
This program can run a fully interactive shell and, in its latest version, offers a way to communicate with this shell (or whatever program you launch) via a custom socket.
To start a new dtach session running an interactive bash, in terminal B :
$ dtach -a /tmp/MySocket bash -i
Now from terminal A, we can send a message to the bash session in terminal B like so :
$ echo 'echo hello' | dtach -p /tmp/MySocket
In terminal B, we now see :
$ echo hello
hello
To expand on that, if I now run this in terminal A:
$ trap 'echo "cd $(pwd)" | dtach -p /tmp/MySocket' DEBUG
the working directories of the two terminals stay in sync.
PS: I'd still like to know if there is a way to do this in pure bash.
I use a similar trap so that periodically I can (from a separate cron job) force all idle bash processes to do a 'history -a'. I found that if I trap SIGALRM instead of SIGUSR1, then the bash blocking read seems not to be a problem: the trap runs now, rather than next time one hits return. I tried SIGINT, but that caused an annoying "^C", followed by a new prompt line, to be displayed. I haven't yet found any drawbacks of using SIGALRM, but perhaps they will arise.
It may be buffering.
As a test, try installing a loop trigger. In window A:
{ trap 'ls' USR1; while sleep 1; do echo>/dev/null;done } &
[1] 7316
in window B:
kill -usr1 7316
back in window A the ls is firing when the loop does an echo.
Don't know if that will help, but it's something.

Killing Subshell with SIGTERM

I'm sure this is really simple, but it's biting me in the face anyway, and I'm a little frustrated and stumped.
So, I have a script which I've managed to boil down to:
#!/bin/sh
sleep 50 | echo
If I run that at the command line, and hit Ctrl-C it stops, like I would expect.
If I send it sigint, using kill, it does nothing.
I thought that was strange, since I thought those should have been the same.
Then, if I send it sigterm, then it also dies, but if I look in ps, the sleep is still running.
What am I missing, here?
This is obviously not the real script, which runs python, and it's more of a problem when it keeps running after start-stop-daemon tries to kill the daemon.
Help me people. I'm dumb.
The reason this happens is that Ctrl-C is delivered to every process in the terminal's foreground process group, including the sleep process, whereas the sigint you are sending with kill is delivered only to the script itself. See Child process receives parent's SIGINT for details on this.
You can verify this yourself by using strace -p when hitting ctrl-c or sending sigint; strace will tell you what signals are delivered.
EDIT: I don't think you are dumb. Processes and how they work are seemingly simple, but the details are often complicated, and even experts get confused by this sort of thing.
I did the same thing: I wrote a script named test.sh with the contents below.
#!/bin/sh
sleep 50 | echo
After executing it, I pressed Ctrl-C and it worked fine, meaning the script stopped.
I ran it again and, in another terminal, found the PID with ps -ef|grep test.sh. I then ran kill <pid>, which killed the process; to verify, I ran ps -ef|grep test.sh again and got no PID.
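Since a signal sent with kill only reaches the script, one common workaround (not taken from the answers above) is to have the script pass the signal on to its child itself. A rough sketch, assuming bash rather than plain sh and a single child command; run_with_forwarding is a made-up name, and the child's exit status is deliberately discarded to keep it short.

```bash
#!/bin/bash
# run_with_forwarding CMD...: run CMD as a child and pass SIGTERM/SIGINT on to it.
run_with_forwarding() {
    "$@" &                                   # start the real command in the background
    local child=$!
    echo "started child $child"
    trap 'kill -TERM "$child" 2>/dev/null' TERM INT
    wait "$child" || true                    # returns early if a trapped signal arrives
    wait "$child" 2>/dev/null || true        # then reap the child that was just signalled
    trap - TERM INT
    return 0
}
```

With run_with_forwarding sleep 50 as the body of the script, sending SIGTERM to the script should also end the sleep, so nothing is left behind in ps.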

How to make bash interpreter stop until a command is finished?

I have a bash script with a loop that calls a hard calculation routine every iteration. I use the results from every calculation as input to the next. I need to make bash pause the script until each calculation is finished.
for i in $(cat calculation-list.txt)
do
./calculation
(other commands)
done
I know the sleep program, and I used to use it, but now the time of the calculations varies greatly.
Thanks for any help you can give.
P.S.
"./calculation" is another program, and it opens a subprocess. The script then passes instantly to the next step, but I get an error in the calculation because the previous one is not finished yet.
If your calculation daemon will work with a precreated empty logfile, then the inotify-tools package might serve:
touch $logfile
inotifywait -qqe close $logfile & ipid=$!
./calculation
wait $ipid
This works if it closes the file just once. (edit: stripped a stray semicolon)
If it's doing an open/write/close loop, perhaps you can mod the daemon process to wrap some other filesystem event around the execution?
#!/bin/sh
# Uglier, but handles logfile being closed multiple times before exit:
# Have the ./calculation start this shell script, perhaps by substituting
# this for the program it's starting
trap 'echo >closed-on-calculation-exit' 0 1 2 3 15
./real-calculation-daemon-program
Well, guys, I've solved my problem with a different approach. When the calculation is finished, a logfile is created. I then wrote a simple until loop with a sleep command. Although this is very ugly, it works for me and it's enough.
for i in $(cat calculation-list.txt)
do
(calculations routine)
until [[ -f $logfile ]]; do
sleep 60
done
(other commands)
done
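A slightly more defensive take on the same polling loop adds a timeout, so the script does not hang forever if the logfile never appears. This is only a sketch: wait_for_file, its default values, and the whole-second arithmetic are my assumptions, not part of the original solution.

```bash
#!/bin/bash
# wait_for_file FILE [TIMEOUT] [INTERVAL]: poll until FILE exists.
# Returns 0 once the file is there, 1 if TIMEOUT seconds pass first.
wait_for_file() {
    local file=$1 timeout=${2:-3600} interval=${3:-60} waited=0
    until [[ -f $file ]]; do
        if (( waited >= timeout )); then
            return 1
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 0
}
```

Inside the loop this would replace the until block with something like wait_for_file "$logfile" 7200 60 || exit 1, where the numbers are placeholders.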
Easy. Get the process ID (PID) via some awk magic and then use wait to wait for that PID to end. Here are the details on wait from the advanced Bash scripting guide:
Suspend script execution until all jobs running in background have
terminated, or until the job number or process ID specified as an
option terminates. Returns the exit status of waited-for command.
You may use the wait command to prevent a script from exiting before a
background job finishes executing (this would create a dreaded orphan
process).
And using it within your code should work like this:
for i in $(cat calculation-list.txt)
do
./calculation >/dev/null 2>&1 &
# PID of the job just started, taken from the second column of jobs -l
CALCULATION_PID=$(jobs -l | awk '{print $2}')
wait "${CALCULATION_PID}"
(other commands)
done
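For what it's worth, bash already keeps the PID of the most recent background job in $!, so the awk step may not be needed. A minimal sketch of the same idea with a hypothetical run_and_wait helper follows. Note that starting a command with & and waiting for it right away is equivalent to running it in the foreground, and neither helps if the program hands its work to a detached process of its own.

```bash
#!/bin/bash
# run_and_wait CMD...: start CMD in the background, then block until it exits.
# The function returns CMD's exit status.
run_and_wait() {
    "$@" &
    local pid=$!        # PID of the job that was just started
    wait "$pid"
}
```

In the loop this would read run_and_wait ./calculation followed by the other commands.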

bash restart sub-process using trap SIGCHLD?

I've seen monitoring programs written either as scripts that periodically check process status using 'ps' or 'service status' (on Linux), or in C/C++ that fork and wait on the process...
I wonder if it is possible to use bash with trap and restart the sub-process when SIGCHLD is received?
I have tested a basic script on RedHat Linux with the following idea (and certainly it didn't work...)
#!/bin/bash
set -o monitor # can someone explain this? discussion on Internet say this is needed
trap startProcess SIGCHLD
startProcess() {
/path/to/another/bash/script.sh & # the one to restart
while [ 1 ]
do
sleep 60
done
}
startProcess
The bash script being started just sleeps for a few seconds and exits for now.
Several issues observed:
When the shell starts in the foreground, SIGCHLD is handled only once. Does trap reset signal handling like signal()?
The script and its child seem to be immune to SIGINT, which means they cannot be stopped by ^C.
Since it could not be stopped, I closed the terminal. The script seems to have received HUP, and many zombie children were left.
When run in the background, the script caused the terminal to die.
... anyway, this does not work at all. I have to say I know too little about this topic.
Can someone suggest or give some working examples?
Are there scripts for such use?
How about using wait in bash, then?
Thanks
I can try to answer some of your questions but not all based on what I
know.
The line set -o monitor (or equivalently, set -m) turns on job
control, which is only on by default for interactive shells. This seems
to be required for SIGCHLD to be sent. However, job control is more of
an interactive feature and not really meant to be used in shell scripts
(see also this question).
Also keep in mind this is probably not what you intended to do
because once you enable job control, SIGCHLD will be sent for every
external command that exits (e.g. every time you run ls or grep or
anything, a SIGCHLD will fire when that command completes and your trap
will run).
I suspect the reason the SIGCHLD trap only appears to run once is
because your trap handler contains a foreground infinite loop, so your
script gets stuck in the trap handler. There doesn't seem to be a point
to that loop anyways, so you could simply remove it.
The script's "immunity" to SIGINT seems to be an effect of enabling
job control (the monitor part). My hunch is with job control turned on,
the sub-instance of bash that runs your script no longer terminates
itself in response to a SIGINT but instead passes the SIGINT through to
its foreground child process. In your script, the ^C i.e. SIGINT
simply acts like a continue statement in other programming languages,
since SIGINT will just kill the currently running sleep 60,
whereupon the while loop will immediately run a new sleep 60.
When I tried running your script and then killing it (from another
terminal), all I ended up with were two stray sleep processes.
Backgrounding that script also kills my shell for me, although
the behavior is not terribly consistent (sometimes it happens
immediately, other times not at all). It seems typing any keys other
than enter causes an EOF to get sent somehow. Even after the terminal
exits the script continues to run in the background. I have no idea
what is going on here.
Being more specific about what you want to accomplish would help. If
you just want a command to run continuously for the lifetime of your
script, you could run an infinite loop in the background, like
while true; do
some-command
echo some-command finished
echo restarting some-command ...
done &
Note the & after the done.
For other tasks, wait is probably a better idea than using job control
in a shell script. Again, it would depend on what exactly you are trying
to do.
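As a rough illustration of the wait-based approach, here is a sketch of a restart loop that needs neither job control nor a SIGCHLD trap. The supervise name and the restart cap are my own additions for the example; drop the cap if the command should be restarted indefinitely.

```bash
#!/bin/bash
# supervise MAX_RESTARTS CMD...: run CMD and start it again each time it exits,
# giving up after MAX_RESTARTS restarts.
supervise() {
    local max_restarts=$1
    shift
    local restarts=0 status
    while :; do
        "$@" &                         # start (or restart) the command
        status=0
        wait "$!" || status=$?         # block until this instance exits
        echo "command exited with status $status"
        if (( restarts >= max_restarts )); then
            break
        fi
        restarts=$(( restarts + 1 ))
        echo "restarting ..."
    done
}
```

For the script in the question this might be called as supervise 100 /path/to/another/bash/script.sh, with 100 being an arbitrary cap.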

Resources