Any reason not to exec in shell script? - bash

I have a bunch of wrapper shell scripts which manipulate command line arguments and do some stuff before invoking another binary at the end. Is there any reason to not always exec the binary at the end? It seems like this would be simpler and more efficient, but I never see it done.

If you check /usr/bin, you will likely find many shell scripts that end with an exec command. Just as an example, here is /usr/bin/ps2pdf (Debian):
#!/bin/sh
# Convert PostScript to PDF.
# Currently, we produce PDF 1.4 by default, but this is not guaranteed
# not to change in the future.
version=14
ps2pdf="`dirname \"$0\"`/ps2pdf$version"
if test ! -x "$ps2pdf"; then
    ps2pdf="ps2pdf$version"
fi
exec "$ps2pdf" "$#"
exec is used because it eliminates the need for keeping the shell process active after it is no longer needed.
My /usr/bin directory has over 150 shell scripts that use exec. So, the use of exec is common.
A reason not to use exec would be if there was some processing to be done after the binary finished executing.
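For example, a wrapper along these lines (the binary path and messages are made up for illustration) cannot end with exec, because it still has work to do after the binary returns:
#!/bin/sh
# Hypothetical wrapper: run the real binary, then act on its exit status.
/usr/lib/myapp/myapp-bin "$@"
status=$?
if [ "$status" -ne 0 ]; then
    echo "myapp-bin exited with status $status" >&2
fi
exit "$status"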

I disagree with your assessment that this is not a common practice. That said, it's not always the right thing.
The most common scenario where I end a script with the execution of another command, but can't reasonably use exec, is if I need a cleanup hook to be run after the command at the end finishes. For instance:
#!/bin/sh
# create a temporary directory
tempdir=$(mktemp -t -d myprog.XXXXXX)
cleanup() { rm -rf "$tempdir"; }
trap cleanup 0
# use that temporary directory for our program
exec myprog --workdir="$tempdir" "$@"
...won't actually clean up tempdir after execution! Changing that exec myprog to merely myprog has some disadvantages -- continued memory usage from the shell, an extra process-table entry, signals being potentially delivered to the shell rather than to the program that it's executing -- but it also ensures that the shell is still around on myprog's exit to run any traps required.
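For comparison, here is a minimal sketch of the non-exec variant (myprog is still the hypothetical program from above); the shell stays resident, so the EXIT trap does run:
#!/bin/sh
tempdir=$(mktemp -t -d myprog.XXXXXX)
cleanup() { rm -rf "$tempdir"; }
trap cleanup 0

# No exec: the shell waits for myprog, then the trap removes $tempdir on exit.
myprog --workdir="$tempdir" "$@"
exit $?    # preserve myprog's exit status; cleanup runs via the trap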

Related

Writing a bash script, how do I stop my session from exiting when my script exits?

bash scripting noob here. I've found this article: https://www.shellhacks.com/print-usage-exit-if-arguments-not-provided/ that suggests putting
[ $# -eq 0 ] && { echo "Usage: $0 argument"; exit 1; }
at the top of a script to ensure arguments are passed. Seems sensible.
However, when I do that and test that the line does indeed work (by running the script without supplying any arguments: . myscript.sh), the script does exit, but so does the bash session that I was calling the script from. This is very irritating.
Clearly I'm doing something wrong but I don't know what. Can anyone put me straight?
. myscript.sh is a synonym for source myscript.sh, which runs the script in the current shell (rather than as a separate process). So exit terminates your current shell. (return, on the other hand, wouldn't; it has special behaviour for sourced scripts.)
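If the script is really meant to be sourced, one option (a sketch, not from the linked article) is a usage guard that uses return when sourced and falls back to exit when executed:
# Usage guard that works whether the script is sourced or executed.
if [ $# -eq 0 ]; then
    echo "Usage: $0 argument" >&2
    return 1 2>/dev/null   # only succeeds when the script is sourced
    exit 1                 # reached only when the script is executed
fi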
Use ./myscript.sh to run it "the normal way" instead. If that gives you a permission error, make it executable first, using chmod a+x myscript.sh. To inform the kernel that your script should be run with bash (rather than /bin/sh), add the following as the very first line in the script:
#!/usr/bin/env bash
You can also use bash myscript.sh if you can't make it executable, but this is slightly more error-prone (somebody might do sh myscript.sh instead).
The behaviour depends on how the script is run. If you source it (source script_name or . script_name), it is interpreted in the current bash process, and the same is true for running a function, so exit terminates the current shell. If you call it as a script instead, the calling bash forks a new bash process and waits for it to terminate, so an exit inside the script does not exit the caller; exit only ends the process it runs in.

whether a shell script can be executed if another instance of the same script is already running

I have a shell script that usually takes nearly 10 minutes for a single run, but I need to know what happens if a new request to run the script arrives while an instance is already running: does the new request have to wait for the existing instance to complete, or is a new instance started?
I need a new instance to be started whenever a request arrives for the same script.
How can I do that?
The shell script is a polling script which looks for a file in a directory and executes it. The execution of the file takes nearly 10 minutes or more, but if a new file arrives during execution, it also has to be executed simultaneously.
The shell script is below; how do I modify it to handle multiple requests?
#!/bin/bash
while [ 1 ]; do
    newfiles=`find /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/ -newer /afs/rch/usr$
    touch /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/.my_marker
    if [ -n "$newfiles" ]; then
        echo "found files $newfiles"
        name2=`ls /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/ -Art |tail -n 2 |head $
        echo " $name2 "
        mkdir -p -m 0755 /afs/rch/usr8/fsptools/WWW/dumpspace/$name2
        name1="/afs/rch/usr8/fsptools/WWW/dumpspace/fipsdumputils/fipsdumputil -e -$
        $name1
        touch /afs/rch/usr8/fsptools/WWW/dumpspace/tempfiles/$name2
    fi
    sleep 5
done
When writing scripts like the one you describe, I take one of two approaches.
First, you can use a pid file to indicate that a second copy should not run. For example:
#!/bin/sh
pidfile=/var/run/${0##*/}.pid
# remove pid if we exit normally or are terminated
trap "rm -f $pidfile" 0 1 3 15
# Write the pid as a symlink
if ! ln -s "pid=$$" "$pidfile"; then
echo "Already running. Exiting." >&2
exit 0
fi
# Do your stuff
I like using symlinks to store pid because writing a symlink is an atomic operation; two processes can't conflict with each other. You don't even need to check for the existence of the pid symlink, because a failure of ln clearly indicates that a pid cannot be set. That's either a permission or path problem, or it's due to the symlink already being there.
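As a quick illustration (the lock path here is made up), the second ln -s to the same path fails, which is exactly the atomic check-and-set behaviour being relied on:
#!/bin/sh
lock=/tmp/demo.lock
ln -s "pid=$$" "$lock" && echo "first ln -s succeeded: lock acquired"
ln -s "pid=$$" "$lock" 2>/dev/null || echo "second ln -s failed: already locked"
rm -f "$lock"    # tidy up the demo lock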
The second option is to make it possible, nay, preferable, not to block additional instances, and instead configure whatever it is that this script does so that multiple workers can run at the same time on different queue entries. "Single-queue-single-server" is never as good as "single-queue-multi-server". Whether this approach is useful depends on what your handler actually does, but here's some explanatory meta bash:
#!/usr/bin/env bash
workdir=/var/tmp # Set a better $workdir than this.
a=( $(get_list_of_queue_ids) ) # A command? A function? Up to you.
for qid in "${a[@]}"; do
# Set a "lock" for this item .. or don't, and move on.
if ! ln -s "pid=$$" $workdir/$qid.working; then
continue
fi
# Do your stuff with just this $qid.
...
# And finally, clean up after ourselves
remove_qid_from_queue $qid
rm $workdir/$qid.working
done
The effect of this is to transfer the idea of "one at a time" from the handler to the data. If you have a multi-CPU system, you probably have enough capacity to handle multiple queue entries at the same time.
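Under that model you can even start several workers on purpose. A rough sketch, where worker.sh stands in for the queue-processing script above:
#!/bin/sh
# Hypothetical launcher: the per-item .working symlinks above keep the
# workers from grabbing the same queue entry.
for i in 1 2 3; do
    ./worker.sh &
done
wait    # block until every background worker has finished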
ghoti's answer shows some helpful techniques, if modifying the script is an option.
Generally speaking, for an existing script:
Unless you know with certainty that:
the script has no side effects other than to output to the terminal or to write to files with shell-instance specific names (such as incorporating $$, the current shell's PID, into filenames) or some other instance-specific location,
OR that the script was explicitly designed for parallel execution,
I would assume that you cannot safely run multiple copies of the script simultaneously.
It is not reasonable to expect the average shell script to be designed for concurrent use.
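To make the first condition concrete, a script along these lines (names purely illustrative) is safe to run in parallel, because every instance writes only to its own mktemp-generated file:
#!/bin/bash
# Each instance gets its own output file, so parallel runs don't collide.
out=$(mktemp "/tmp/report.$$.XXXXXX")
date     >  "$out"
hostname >> "$out"
echo "wrote $out"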
From the viewpoint of the operating system, several processes may of course execute the same program in parallel. No need to worry about this.
However, it is conceivable that a (careless) programmer wrote the program in such a way that it produces incorrect results when two copies are executed in parallel.

Bash /usr/bin/time command runs command in subshell

In a bash script (running on RHEL 5) I use the /usr/bin/time command (not the builtin time command) to record the run times of other commands I run in that same script. The problem I am facing is that some of the commands I want to record times for are not builtin commands or external scripts but functions that are either declared in the script or sourced from a different script. I have found that the time command fails with an error like the one below:
/usr/bin/time: cannot run shared-func: No such file or directory
Which means that the function shared-func() that I declare elsewhere is not visible in the scope and therefore time cannot run that command. I have run some tests and have verified that the reason behind this error is in fact because the time command tries to execute its command in a new subshell and therefore loses every declared function or variable in its scope. Is there a way to get around this? The ideal solution would be to force the time command to change its behavior and use the current shell for executing its command but I am also interested in any other solutions if that is not possible.
For the record, below is the test I ran. I created two small scripts:
shared.sh:
function shared-func() {
echo "The shared function is visible."
}
test.sh:
#!/bin/bash
function record-timestamp() {
/usr/bin/time -f % -a -o timestamps.csv "$@"
}
source "shared.sh"
record-timestamp shared-func
And this is the test:
$ ./test.sh
/usr/bin/time: cannot run shared-func: No such file or directory
$
A different process, yes. A subshell, no.
A subshell is what you get when your parent shell forks but doesn't exec() -- it's a new process made by copying your current shell instance. Functions are accessible within subshells, though they can't have direct effect on the parent shell (changes to process state die with the subshell when it exits).
When you launch an external program without using exec, the shell first forks, but then calls execve() to invoke the new program. execve() replaces the process image in memory with the program being run -- so it is not the fork(), i.e. the creation of the subshell, that causes this to fail; it is the exec(), the invocation of a separate program.
Even if your new process is also a shell, if it's gone through any exec()-family call it's not a subshell -- it's a whole new process.
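A tiny illustration of the difference (f is just a throwaway function name):
f() { echo "hi from f"; }
( f )          # subshell: fork only, so the function is still visible
bash -c 'f'    # fork + exec of a new bash: f is not defined there (unless exported)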
tl;dr: You cannot use an external program to wrap a shell function invocation inside the current shell, because an external program's invocation always uses execve(), and execve() always clears process state -- including non-exported shell functions.
What you can do is export the function with export -f, so that a new bash started under /usr/bin/time picks it up from the environment:
function shared-func() {
echo "The shared function is visible."
}
export -f shared-func
echo shared-func | /usr/bin/time -lp /bin/bash
Output:
The shared function is visible.
real 0.00
user 0.00
sys 0.00
1040384 maximum resident set size
...
On a 3GHz machine, the overhead of running bash this way is approximately 5ms (u+s time).
/bin/time is a separate program so it will be run in a separate process, and then it will try to run yet another program in yet another separate process.
There's a trick, though: a shell script can call itself, or in this case tell /usr/bin/time to invoke the script again.
function record-timestamp() {
    /usr/bin/time -f % -a -o timestamps.csv bash "$0" "$@"
}

if [ -n "$1" ]; then
    source "shared.sh"
    "$@"
else
    record-timestamp shared-func
fi
If you call test.sh with arguments, it will try to run that function (which comes from shared.sh); if you call it without arguments, it will call the record-timestamp function, which in turn calls /usr/bin/time, which in turn calls test.sh with arguments.

Can I combine flock and source?

I'd like to source a script (it sets up variables in my shell):
source foo.sh args
But under flock so that only one instance operates at a time (it does a lot of disk access which I'd like to ensure is serialized).
$ source flock lockfile foo.sh args
-bash: source: /usr/bin/flock: cannot execute binary file
and
$ flock lockfile source foo.sh args
flock: source: Success
don't work.
Is there some simple syntax for this I'm missing? Let's assume I can't edit foo.sh to put the locking commands inside it.
You can't run source under flock because flock is an external command and source is a shell builtin. You actually have two problems because of this:
flock doesn't know any command called source, because it's built into bash;
even if flock could run it, the changes would not affect the state of the calling shell, as you'd want with source, because they would happen in a child process.
And passing flock to source won't work either, because source expects a shell script, not a binary. To do what you want, you need to lock by file descriptor. Here's an example:
#!/bin/bash
exec 9>lockfile
flock 9
echo whoopie
sleep 5
flock -u 9
Run two instances of this script in the same directory at the same time and you will see one wait for the other.
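Applied to the original question, this means you can take the lock on a file descriptor in your current shell and then source normally while holding it; a sketch, using foo.sh and lockfile from the question:
# Lock, source in the *current* shell, then unlock.
exec 9>lockfile
flock 9
source foo.sh args
flock -u 9
Because the lock lives on a file descriptor of your interactive shell rather than on a child process, the variables set by foo.sh still land in your shell, and only one such source runs at a time.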

Handling temporary files in Bash

I need to copy and execute a bash script from within a parent bash script, when the job is done (and if it fails) I need the parent script to remove the child script file that it copied.
Here's the code snippet that I'm working on:
if [ -e $repo_path/install ]; then
    cp $repo_path/install $install_path
    exec $install_path/install
    rm $install_path/install
fi
This fails for some reason; the parent script seems to exit altogether when the child process ends.
Is it correct to use exec in this example?
exec replaces your current process, so the statements after that will never be reached.
You may replace exec with sh or bash, or just remove it if the child script is executable.
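A sketch of the corrected snippet along those lines (same paths as in the question, with the child run as an ordinary command so the rm is reached):
if [ -e "$repo_path/install" ]; then
    cp "$repo_path/install" "$install_path"
    bash "$install_path/install"   # no exec, so the parent script continues
    rm "$install_path/install"     # remove the copy whether it succeeded or not
fi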
See also: The Bash Reference Manual for exec
