Avoid command line arguments propagation when sourcing bash script - bash

I have a bash script a.sh that looks like this:
#!/bin/bash
echo $#
echo $1
and a script b.sh that looks like this:
#!/bin/bash
source ./a.sh
If I call ./a.sh I'm correctly getting 0 and an empty line as output. When calling ./a.sh blabla I'm getting 1 and blabla as output.
However when I call ./b.sh blabla I'm also getting 1 and blabla as output, even though no argument was passed to a.sh from within b.sh.
This seems to be related to the use of source (which I have to use since in my real use case, a.sh exports some variables). How can I avoid arguments from b.sh being propagated to a.sh? I thought about using eval $(a.sh) but this makes my echo statements in a.sh fail. I thought of using shift to consume the arguments from b.sh before calling a.sh but I don't necessarily know how many arguments there are.

The root of the problem is an anomaly in how the source command works. From the bash man page, in the "Shell Builtin Commands" section:
. filename [arguments]
source filename [arguments]
[...] If any arguments are supplied, they become the positional parameters when filename is executed. Otherwise the positional parameters are unchanged.
...which means you can override the main script's arguments by supplying different arguments to the sourced script, but you can't just not pass arguments to it.
Fortunately, there's a workaround; just source the script in a context where there are no arguments:
#!/bin/bash
wrapperfunction() {
source ./a.sh
}
wrapperfunction
Since no arguments are supplied to wrapperfunction, inside it the arg list is empty. Since a.sh's commands are run in that context, the arg list is empty there as well. And variables assigned inside a.sh are available outside the function (unless they're declared as local or something similar).
(Note: I tested this in bash, zsh, dash, and ksh93, and it works in all of them -- well, except that dash doesn't have the source command, so you have to use . instead.)
Update: I realized you can write a generic wrapper function that allows you to specify a filename as an argument:
sourceWithoutArgs() {
local fileToSource="$1"
shift
source "$fileToSource"
}
sourceWithoutArgs ./a.sh
The shift command removes the filename from the function's arg list, so it's empty when the file actually gets sourced. Well, unless you passed additional arguments to the function, in which case those will be in the arg list and will get passed on... so you can actually use this function to replace both the without-args and the with-args usage of source.
(This works in bash and zsh. If you want to use it in ksh, you have to remove local; and to use it in dash, replace source with .)
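To see the difference between a plain source and the wrapper, here is a minimal sketch for bash. The file show_args.sh and the variable names are hypothetical, created only for this demonstration; the sourced file simply records the positional parameters it sees.

```shell
# Throwaway file (hypothetical name) that records what it sees when sourced.
tmpdir=$(mktemp -d)
cat > "$tmpdir/show_args.sh" <<'EOF'
sourced_argc=$#
sourced_first=${1-}
EOF

sourceWithoutArgs() {
    local fileToSource="$1"
    shift                      # drop the filename; anything left is passed on
    source "$fileToSource"
}

set -- outer1 outer2           # simulate the calling script's own arguments

source "$tmpdir/show_args.sh"  # plain source: the caller's arguments leak in
leaked_argc=$sourced_argc

sourceWithoutArgs "$tmpdir/show_args.sh"
clean_argc=$sourced_argc

sourceWithoutArgs "$tmpdir/show_args.sh" explicit
explicit_first=$sourced_first

rm -rf "$tmpdir"
echo "plain: $leaked_argc, wrapper: $clean_argc, explicit: $explicit_first"
# prints: plain: 2, wrapper: 0, explicit: explicit
```

Note that the variables assigned by the sourced file remain visible after the function returns, because the file does not declare them as local.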

You can even keep passing normal arguments using
source() {
local f="${1}"; shift;
builtin source "${f}" "${@}"
}
It is also possible to check from the sourced file what arguments have actually been given
# in x.bash, a file meant to be sourced
# fix `source` arguments
__ARGV=( "${@}" )
__shopts=$( shopt -p ) # save shopt
shopt -u extdebug
shopt -s extdebug # create BASH_ARGV
# no args have been given to `source x.bash`
if [[ ${BASH_ARGV[0]} == "${BASH_SOURCE[0]}" ]]; then
__ARGV=() # clear `${__ARGV[@]}`
fi
eval "${__shopts}" # restore shopt
unset __shopts
# Actual args are in ${__ARGV[@]}

Related

How do I specify a default command-line argument for a different (Python) script via a shell script?

My understanding of shell is very minimal, I'm working on a small task and need some help.
I've been given a python script that can parse command line arguments. One of these arguments is called "-targetdir". When -targetdir is unspecified, it defaults to a /tmp/{USER} folder on the user's machine. I need to direct -targetdir to a specific filepath.
I effectively want to do something like this in my script:
set ${-targetdir, "filepath"}
So that the python script doesn't set a default value. Would anyone know how to do this? I also am not sure if I'm giving sufficient information, so please let me know if I'm being ambiguous.
I strongly suggest modifying the Python script to explicitly specify the desired default rather than engaging in this kind of hackery.
That said, some approaches:
Option A: Function Wrapper
Assuming that your Python script is called foobar, you can write a wrapper function like the following:
foobar() {
local arg found=0
for arg; do
[[ $arg = -targetdir ]] && { found=1; break; }
done
if (( found )); then
# call the real foobar command without any changes to its argument list
command foobar "$@"
else
# call the real foobar, with "-targetdir filepath" added to its argument list
command foobar -targetdir "filepath" "$@"
fi
}
If put in a user's .bashrc, any invocation of foobar from the user's interactive shell (assuming they're using bash) will be replaced with the above wrapper. Note that this doesn't impact other shells; export -f foobar will cause other instances of bash to honor the wrapper, but that isn't guaranteed to extend to instances of sh, as used by system() invocations, Python's Popen(..., shell=True), and other places in the system.
Option B: Shell Wrapper
Assume you rename the original foobar script to foobar.real. Then you can make foobar a wrapper, like the following:
#!/usr/bin/env bash
found=0
for arg; do
[[ $arg = -targetdir ]] && { found=1; break; }
done
if (( found )); then
exec foobar.real "$@"
else
exec foobar.real -targetdir "filepath" "$@"
fi
Using exec terminates the wrapper's execution, replacing it with foobar.real without remaining in memory.
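The argument-scanning logic shared by both options can be tried in isolation. The following is only a sketch: add_default_targetdir and /hypothetical/path are made-up names, and instead of calling the real program the function prints the argument list it would pass on, one argument per line.

```shell
# Hypothetical helper: prints the final argument list instead of running foobar.
add_default_targetdir() {
    local arg
    for arg; do
        if [ "$arg" = -targetdir ]; then
            # The caller already chose a target directory: pass everything through.
            printf '%s\n' "$@"
            return
        fi
    done
    # No -targetdir given: prepend the default (placeholder path).
    printf '%s\n' -targetdir "/hypothetical/path" "$@"
}

without_flag=$(add_default_targetdir -v input.txt)
with_flag=$(add_default_targetdir -targetdir /custom input.txt)
echo "$without_flag"
echo "---"
echo "$with_flag"
```

In the first call the default is inserted in front of the caller's arguments; in the second call the list is left untouched.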

What could be preventing my alias from expanding in a shell script

I am trying to set an alias in a script and then execute the alias later on in the script. I've verified that the file path that the alias contains is valid, and I've also set the shell script to expand aliases as well, yet the script still refuses to use the alias. What could I be doing incorrectly here?
Script:
#set location of parallel script and replace ~ with $HOME if necessary
parallellocation="~/newnnm/parallel"
parallellocation="${parallellocation/#\~/$HOME}"
#If the parallellocation variable is set and a parallel command is not currently available,
#proceed with setting an alias that points to the parallellocation variable
if [ -r "$parallellocation" ] && ! command -v parallel &>/dev/null; then
shopt -s expand_aliases
alias parallel="$parallellocation"
parallel
fi
Sample output:
./processlocations_new2.sh
./processlocations_new2.sh: line 98: parallel: command not found
As reflected in the comment record on the question, bash seems not to honor alias definitions or setting of the expand_aliases option within the scope of an if block or other compound command. The Bash Manual explains this:
The rules concerning the definition and use of aliases are somewhat
confusing. Bash always reads at least one complete line of input
before executing any of the commands on that line. Aliases are
expanded when a command is read, not when it is executed. Therefore,
an alias definition appearing on the same line as another command does
not take effect until the next line of input is read. The commands
following the alias definition on that line are not affected by the
new alias. This behavior is also an issue when functions are executed.
Aliases are expanded when a function definition is read, not when the
function is executed, because a function definition is itself a
command. As a consequence, aliases defined in a function are not
available until after that function is executed. To be safe, always
put alias definitions on a separate line, and do not use alias in
compound commands.
(Emphasis added.) The comments do not refer directly to shell options, but the same logic that says alias definitions within a compound command do not apply within the same compound command also implies that it is the value of the expand_aliases option in effect when the compound command is read that applies.
The question arose as to how to use a shell function instead of an alias for this purpose. Here's one way:
altparallel="$HOME/newnnm/parallel"
parallellocation=
if command -v parallel >/dev/null 2>&1; then
parallellocation="command parallel"
elif [[ -x "$altparallel" ]]; then
parallellocation="$altparallel"
fi
# Runs parallel with the given arguments. Uses the 'parallel' command
# found in the path if there is one, or a local one designated by
# $altparallel if that exists and is executable. Exit status is that of
# 'parallel', or 1 if no 'parallel' command is available.
parallel() {
[[ -z "$parallellocation" ]] && return 1
# expansion of $parallellocation is intentionally unquoted
$parallellocation "$@"
}
You source that from your environment setup scripts to get a parallel function defined that does what you want.
On the third hand, if all you want is a script that runs one version of parallel or the other, directly, then you don't need either a function or an alias. Just figure out which you want to run, and run it, something like:
altparallel="$HOME/newnnm/parallel"
if command -v parallel >/dev/null 2>&1; then
parallel "$@"
elif [[ -x "$altparallel" ]]; then
"$altparallel" "$@"
else
exit 1
fi
Aliases are a parse time feature. They work by substituting one string for another during parsing.
Commands are entirely parsed before they're executed, and this includes compound commands like if.
The effect of this is that any changes to the parser, like setting aliases, will not take effect in any of the possibly nested compound commands where the setting takes place.
For example, if you wrap your entire script in {...}, no aliases will work because it's now a giant compound command.
This is yet another reason why you should never use aliases outside .bashrc, and even then just sparingly. Use functions.
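The parse-time behaviour can be observed with a small experiment. This is a sketch that assumes bash is on the PATH; the alias name greet_demo_alias is invented for the demonstration and is assumed not to exist as a real command.

```shell
demo=$(mktemp)
cat > "$demo" <<'EOF'
shopt -s expand_aliases
if true; then
    alias greet_demo_alias='echo hello'
    greet_demo_alias   # parsed together with the if block, before the alias existed
fi
greet_demo_alias       # parsed after the alias definition has been executed
EOF

# Discard stderr so only successful output is captured.
alias_output=$(bash "$demo" 2>/dev/null)
rm -f "$demo"
echo "$alias_output"
# prints: hello
```

Only the second use produces output; the first one, inside the if block, fails with "command not found" on stderr.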
You can enable/disable alias expansion with shopt within a bash script/command line.
$ shopt -u expand_aliases
-s Set
-u Unset
You can also use unalias to remove specific aliases (if you know them)
$ unalias <aliasname>
Be aware that when you use aliases in Bash scripts, an alias defined inside a compound command (such as an if block) is not expanded within that same compound command.
So while this one would work inside of a script:
#!/bin/bash
shopt -s expand_aliases
alias list1='ls -lhS'
list1
this one won't:
#!/bin/bash
shopt -s expand_aliases
if [ 1 ]; then
alias list2='ls -lah'
list2
fi
So as the others pointed out in the comment section, use functions, or use eval to execute the command string stored in $parallellocation:
eval $parallellocation

Importing function definitions from a bash script w/o running it

I have two scripts foo.sh and bla.sh
foo.sh
#bin/bash
test(){
"hello world"
}
test
exit 1
bla.sh
#bin/bash
source ./a.sh
echo a.test
The problem is that source seems to run the whole a.sh script, and of course after exit 1 the rest of b is never executed.
Is there any way to use just the function test from bla without running the whole script?
If you want your script to be capable of being sourced without running its contents, you should design it that way.
#!/bin/bash
# put your function definitions here
mytest() { echo "hello world"; }
# ...and choose one of the following, depending on your requirements:
# more reliable approach, *except* doesn't detect being sourced from an interactive or
# piped-in shell.
(( ${#BASH_SOURCE[@]} > 1 )) && return
# less reliable approach in general, but *does* detect being sourced from an interactive
# shell.
[[ "$BASH_SOURCE" != "$0" ]] && return
# put your actions to take when executed here
mytest
exit 1
Why it works: (( ${#BASH_SOURCE[@]} > 1 ))
If the array of source files (per stack frame) is of length more than one in the root of a script, the only way to have any additional stack frame is for the script to have been sourced from elsewhere.
The caveat, here, is that an interactive shell (or a noninteractive shell with its input coming from a pipeline or other non-file source) doesn't have an entry in the BASH_SOURCE array, so if we're sourced from a human-driven shell -- or a shell reading its input from a pipeline or other non-file source -- there will still be only one entry.
Why it works: [[ $BASH_SOURCE != "$0" ]]
BASH_SOURCE is an array of source files, one element per stack frame; like all bash arrays, when expanded without explicitly indexing into a specific element, it defaults to the first one (that being the file currently being executed or sourced). $0 is the name of the command being executed, which is not updated by the source command.
Thus, if these don't match, we know that we were sourced.
Important caveat:
Note that there are circumstances where depending on $0 will necessarily be broken: cat input-script | bash can't accurately know the location on disk where input-script came from, so it will always detect this as being sourced. See the Why $0 is NOT an option section of BashFAQ #28 to understand these limitations in detail.
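As a quick check of the $0 comparison, the sketch below writes a throwaway library file (the temporary file is created only for this demonstration) and then both executes and sources it. It assumes bash is available on the PATH.

```shell
lib=$(mktemp)
cat > "$lib" <<'EOF'
mytest() { echo "hello world"; }
# Stop here when sourced; continue when executed directly.
[[ "${BASH_SOURCE[0]}" != "$0" ]] && return
echo "running as main"
EOF

# Executed: $0 and BASH_SOURCE[0] are the same path, so the guard does not fire.
when_executed=$(bash "$lib")

# Sourced: $0 is "_" here, so the guard returns early and only mytest is defined.
when_sourced=$(bash -c 'source "$1"; mytest' _ "$lib")

rm -f "$lib"
echo "executed: $when_executed"
echo "sourced:  $when_sourced"
```

When executed, the file reaches its main section; when sourced, it only contributes the function definition, which the caller can then invoke.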

choosing between $0 and BASH_SOURCE

How does one choose between "$0" and "${BASH_SOURCE[0]}"?
This description from GNU didn't help me much.
BASH_SOURCE
An array variable whose members are the source filenames where the
corresponding shell function names in the FUNCNAME array variable are
defined. The shell function ${FUNCNAME[$i]} is defined in the file
${BASH_SOURCE[$i]} and called from ${BASH_SOURCE[$i+1]}
Note: For a POSIX-compliant solution, see this answer.
${BASH_SOURCE[0]} (or, more simply, $BASH_SOURCE; see note [1] below) contains the (potentially relative) path of the containing script in all invocation scenarios, notably also when the script is sourced, which is not true for $0.
Furthermore, as Charles Duffy points out, $0 can be set to an arbitrary value by the caller.
On the flip side, $BASH_SOURCE can be empty, if no named file is involved; e.g.:
echo 'echo "[$BASH_SOURCE]"' | bash
The following example illustrates this:
Script foo:
#!/bin/bash
echo "[$0] vs. [${BASH_SOURCE[0]}]"
$ bash ./foo
[./foo] vs. [./foo]
$ ./foo
[./foo] vs. [./foo]
$ . ./foo
[bash] vs. [./foo]
$0 is part of the POSIX shell specification, whereas BASH_SOURCE, as the name suggests, is Bash-specific.
[1] Optional reading: ${BASH_SOURCE[0]} vs. $BASH_SOURCE:
Bash allows you to reference element 0 of an array variable using scalar notation: instead of writing ${arr[0]}, you can write $arr; in other words: if you reference the variable as if it were a scalar, you get the element at index 0.
Using this feature obscures the fact that $arr is an array, which is why popular shell-code linter shellcheck.net issues the following warning (as of this writing):
SC2128: Expanding an array without an index only gives the first element.
On a side note: While this warning is helpful, it could be more precise, because you won't necessarily get the first element: It is specifically the element at index 0 that is returned, so if the first element has a higher index - which is possible in Bash - you'll get the empty string; try a[1]='hi'; echo "$a".
(By contrast, zsh, ever the renegade, returns all elements as a single string, separated with the first char. stored in $IFS, which is a space by default).
You may choose to eschew this feature due to its obscurity, but it works predictably and, pragmatically speaking, you'll rarely, if ever, need to access indices other than 0 of array variable ${BASH_SOURCE[@]}.
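The index-0 behaviour described in this note is easy to verify in bash. The array names below are arbitrary and used only for illustration.

```shell
# An array whose only element lives at index 1: scalar access finds nothing at index 0.
unset sparse
sparse[1]='hi'
scalar_view="$sparse"
index_one="${sparse[1]}"

# A regular array: scalar access is the same as asking for index 0.
regular=(first second)
regular_scalar="$regular"

echo "sparse as scalar: [$scalar_view], sparse[1]: [$index_one], regular as scalar: [$regular_scalar]"
# prints: sparse as scalar: [], sparse[1]: [hi], regular as scalar: [first]
```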
Optional reading, part 2: Under what conditions does the BASH_SOURCE array variable actually contain multiple elements?:
BASH_SOURCE only has multiple entries if function calls are involved, in which case its elements parallel the FUNCNAME array that contains all function names currently on the call stack.
That is, inside a function, ${FUNCNAME[0]} contains the name of the executing function, and ${BASH_SOURCE[0]} contains the path of the script file in which that function is defined, ${FUNCNAME[1]} contains the name of the function from which the currently executing function was called, if applicable, and so on.
If a given function was invoked directly from the top-level scope in the script file that defined the function at level $i of the call stack, ${FUNCNAME[$i+1]} contains:
main (a pseudo function name), if the script file was invoked directly (e.g., ./script)
source (a pseudo function name), if the script file was sourced (e.g. source ./script or . ./script).
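The two pseudo function names can be observed directly. In this sketch (assuming bash on the PATH; the temporary script and the function name caller_name are made up for the demonstration), a function prints the name of its caller, once when the file is executed and once when it is sourced.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
caller_name() { echo "${FUNCNAME[1]}"; }
caller_name   # called from the top-level scope of this file
EOF

when_executed=$(bash "$script")
when_sourced=$(bash -c 'source "$1"' _ "$script")

rm -f "$script"
echo "executed: $when_executed, sourced: $when_sourced"
# prints: executed: main, sourced: source
```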
These scripts may help illustrate. The outer script calls the middle script, which calls the inner script:
$ cat outer.sh
#!/usr/bin/env bash
./middle.sh
$ cat middle.sh
#!/usr/bin/env bash
./inner.sh
$ cat inner.sh
#!/usr/bin/env bash
echo "\$0 = '$0'"
echo "\${BASH_SOURCE[0]} = '${BASH_SOURCE[0]}'"
echo "\${BASH_SOURCE[1]} = '${BASH_SOURCE[1]}'"
echo "\${BASH_SOURCE[2]} = '${BASH_SOURCE[2]}'"
$ ./outer.sh
$0 = './inner.sh'
${BASH_SOURCE[0]} = './inner.sh'
${BASH_SOURCE[1]} = ''
${BASH_SOURCE[2]} = ''
However, if we change the script calls to source statements:
$ cat outer.sh
#!/usr/bin/env bash
source ./middle.sh
$ cat middle.sh
#!/usr/bin/env bash
source ./inner.sh
$ cat inner.sh
#!/usr/bin/env bash
echo "\$0 = '$0'"
echo "\${BASH_SOURCE[0]} = '${BASH_SOURCE[0]}'"
echo "\${BASH_SOURCE[1]} = '${BASH_SOURCE[1]}'"
echo "\${BASH_SOURCE[2]} = '${BASH_SOURCE[2]}'"
$ ./outer.sh
$0 = './outer.sh'
${BASH_SOURCE[0]} = './inner.sh'
${BASH_SOURCE[1]} = './middle.sh'
${BASH_SOURCE[2]} = './outer.sh'
For portability, use ${BASH_SOURCE[0]} when it is defined, and $0 otherwise. That gives
${BASH_SOURCE[0]:-$0}
Notably, in zsh for example, $0 does contain the correct file path even if the script is sourced.
TL;DR I'd recommend using ${BASH_SOURCE:-$0} as the most universal variant.
Previous answers are good but they do not mention one caveat of using ${BASH_SOURCE[0]} directly: if you invoke the script as sh's argument and your sh is not aliased to bash (in my case, on Ubuntu 16.04.5 LTS, it was linked to dash), it may fail with BASH_SOURCE variable being empty/undefined. Here's an example:
t.sh:
#!/usr/bin/env bash
echo "\$0: [$0]"
echo "\$BASH_SOURCE: [$BASH_SOURCE]"
echo "\$BASH_SOURCE or \$0: [${BASH_SOURCE:-$0}]"
echo "\$BASH_SOURCE[0] or \$0: [${BASH_SOURCE[0]:-$0}]"
(Successfully) runs:
$ ./t.sh
$0: [./t.sh]
$BASH_SOURCE: [./t.sh]
$BASH_SOURCE or $0: [./t.sh]
$BASH_SOURCE[0] or $0: [./t.sh]
$ source ./t.sh
$0: [/bin/bash]
$BASH_SOURCE: [./t.sh]
$BASH_SOURCE or $0: [./t.sh]
$BASH_SOURCE[0] or $0: [./t.sh]
$ bash t.sh
$0: [t.sh]
$BASH_SOURCE: [t.sh]
$BASH_SOURCE or $0: [t.sh]
$BASH_SOURCE[0] or $0: [t.sh]
And finally:
$ sh t.sh
$0: [t.sh]
$BASH_SOURCE: []
$BASH_SOURCE or $0: [t.sh]
t.sh: 6: t.sh: Bad substitution
Summary
As you see, only the third variant, ${BASH_SOURCE:-$0}, works and gives a consistent result under all invocation scenarios. Note that this takes advantage of bash's feature of making a reference to an unsubscripted array variable equal to the first array element.

bash: function + source + declare = boom

Here is a problem:
In my bash scripts I want to source several files with some checks, so I have:
if [ -r foo ] ; then
source foo
else
logger -t $0 -p crit "unable to source foo"
exit 1
fi
if [ -r bar ] ; then
source bar
else
logger -t $0 -p crit "unable to source bar"
exit 1
fi
# ... etc ...
Naively I tried to create a function that does this:
function safe_source() {
if [ -r $1 ] ; then
source $1
else
logger -t $0 -p crit "unable to source $1"
exit 1
fi
}
safe_source foo
safe_source bar
# ... etc ...
But there is a snag there.
If one of the files foo, bar, etc. has a global such as --
declare GLOBAL_VAR=42
-- it will effectively become:
function safe_source() {
# ...
declare GLOBAL_VAR=42
# ...
}
thus a global variable becomes local.
The question:
An alias in bash seems too weak for this, so must I unroll the above function, and repeat myself, or is there a more elegant approach?
... and yes, I agree that Python, Perl, Ruby would make my life easier, but when working with legacy system, one doesn't always have the privilege of choosing the best tool.
This is a somewhat late answer, but declare now supports a -g option, which makes a variable global when used inside a function. The same works in a sourced file.
If you need a global variable, use the following (declare -g does not export the variable; add -x if it must also reach child processes):
declare -g DATA="Hello World, meow!"
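Applied to the safe_source function from the question, the effect looks like this. The sketch assumes bash 4.2 or later (where declare -g is available); the settings file and variable names are hypothetical, and logger is replaced by a message on stderr to keep the example self-contained.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
declare LOCAL_BY_ACCIDENT=1     # becomes local to the function doing the sourcing
declare -g STAYS_GLOBAL=2       # -g forces global scope even inside a function
EOF

safe_source() {
    if [ -r "$1" ]; then
        source "$1"
    else
        echo "unable to source $1" >&2
        return 1
    fi
}

safe_source "$conf"
rm -f "$conf"

echo "LOCAL_BY_ACCIDENT=${LOCAL_BY_ACCIDENT-unset} STAYS_GLOBAL=${STAYS_GLOBAL-unset}"
# prints: LOCAL_BY_ACCIDENT=unset STAYS_GLOBAL=2
```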
Yes, Bash's 'eval' command can make this work. 'eval' isn't very elegant, and it sometimes can be difficult to understand and debug code that uses it. I usually try to avoid it, but Bash often leaves you with no other choice (like the situation that prompted your question). You'll have to weigh the pros and cons of using 'eval' for yourself.
Some background on 'eval'
If you're not familiar with 'eval', it's a Bash built-in command that expects you to pass it a string as its parameter. 'eval' dynamically interprets and executes your string as a command in its own right, in the current shell context and scope. Here's a basic example of a common use (dynamic variable assignment):
$> a_var_name="color"
$> eval ${a_var_name}="blue"
$> echo -e "The color is ${color}."
The color is blue.
See the Advanced Bash Scripting Guide for more info and examples: http://tldp.org/LDP/abs/html/internal.html#EVALREF
Solving your 'source' problem
To make 'eval' handle your sourcing issue, you'd start by rewriting your function, 'safe_source()'. Instead of actually executing the command, 'safe_source()' should just PRINT the command as a string on STDOUT:
function safe_source() { echo eval " \
if [ -r $1 ] ; then \
source $1 ; \
else \
logger -t $0 -p crit \"unable to source $1\" ; \
exit 1 ; \
fi \
"; }
Also, you'll need to change your function invocations, slightly, to actually execute the 'eval' command:
`safe_source foo`
`safe_source bar`
(Those are backticks/backquotes, BTW.)
How it works
In short:
We converted the function into a command-string emitter.
Our new function emits an 'eval' command invocation string.
Our new backticks call the new function in a subshell context, returning the 'eval' command string output by the function back up to the main script.
The main script executes the 'eval' command string, captured by the backticks, in the main script context.
The 'eval' command string re-parses and executes the 'eval' command string in the main script context, running the whole if-then-else block, including (if the file exists) executing the 'source' command.
It's kind of complex. Like I said, 'eval' is not exactly elegant. In particular, there are a couple of special things you should notice about the changes we made:
The entire IF-THEN-ELSE block has become one whole double-quoted string, with backslashes at the end of each line "hiding" the newlines.
Some of the shell special characters (like '"') have been backslash-escaped, while others ('$') have been left un-escaped.
'echo eval' has been prepended to the whole command string.
Extra semicolons have been appended to all of the lines where a command gets executed to terminate them, a role that the (now-hidden) newlines originally performed.
The function invocation has been wrapped in backticks.
Most of these changes are motivated by the fact that the newlines do not survive on the way to 'eval': the unquoted backtick substitution splits the emitted string into words, so 'eval' can only deal with multiple commands if we combine them into a single line delimited by semicolons. The new function's line breaks are purely a formatting convenience for the human eye.
If any of this is unclear, run your script with Bash's '-x' (debug execution) flag turned on, and that should give you a better picture of exactly what's happening. For instance, in the function context, the function actually produces the 'eval' command string by executing this command:
echo eval ' if [ -r <INCL_FILE> ] ; then source <INCL_FILE> ; else logger -t <SCRIPT_NAME> -p crit "unable to source <INCL_FILE>" ; exit 1 ; fi '
Then, in the main context, the main script executes this:
eval if '[' -r <INCL_FILE> ']' ';' then source <INCL_FILE> ';' else logger -t <SCRIPT_NAME> -p crit '"unable' to source '<INCL_FILE>"' ';' exit 1 ';' fi
Finally, again in the main context, the eval command executes these two commands if the file exists:
'[' -r <INCL_FILE> ']'
source <INCL_FILE>
Good luck.
declare inside a function makes the variable local to that function. export affects the environment of child processes not the current or parent environments.
You can set the values of your variables inside the functions and do the declare -r, declare -i or declare -ri after the fact.
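A minimal sketch of that last suggestion, for bash: the function uses a plain assignment (so the variable is global), and the attributes are applied afterwards at top level. The names load_settings and MAX_RETRIES are hypothetical.

```shell
load_settings() {
    MAX_RETRIES=5          # plain assignment, no declare: the variable is global
}

load_settings
declare -ri MAX_RETRIES    # applied at top level, after the function has returned

# Try to overwrite it in a subshell so a failure cannot affect this shell.
if ! ( MAX_RETRIES=6 ) 2>/dev/null; then
    readonly_check="read-only"
else
    readonly_check="writable"
fi
echo "MAX_RETRIES=$MAX_RETRIES ($readonly_check)"
# prints: MAX_RETRIES=5 (read-only)
```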
