Bash: Export functions for use in xargs

When my bash scripts start getting complex, I usually break them up into functions. This applies especially to complex pipes, as a sequence of complicated pipe commands (e.g. containing while-loops) can quickly become hard to read. It is even more true when parallelization is wanted, for which xargs is very helpful.
I know that I can export functions to a subshell with export -f, thus in a simple case I can do
export -f myfunction
some-command | xargs -Iline bash -c "myfunction 'line'"
but if myfunction depends on other functions, this becomes hard to maintain: every time myfunction changes in a way that alters which functions the subshell needs, the export statement has to be updated as well, which seems pretty error prone.
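For illustration (function names hypothetical), the kind of dependency I mean:
helper() { tr '[:lower:]' '[:upper:]'; }
myfunction() { echo "$1" | helper; }
export -f myfunction helper    # forget to add 'helper' here and the subshell breaks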
Is there some general way to export functions for use by subshells? I was thinking about something along the lines of an "export all defined functions" command, which would then allow a code structure like
main() { ... }
func1 () { ... }
func2 () { ... }
<export all functions>
main "$#"

Your question asks only about exporting functions. This is easy in bash; see below.
Your question title/subject implies using functions in xargs, as though they were a script;
I don't know that xargs can "call" a bash function directly, but you can of course wrap
your use of the exported function(s) in a script called by xargs; see below.
First, a function to list functions: user functions by default, or everything verbosely with -v:
lsfns () {
    case "$1" in
    -v | v*)
        # verbose:
        set | grep '()' --color=always
        ;;
    *)
        declare -F | cut -d" " -f3 | egrep -v "^_"
        ;;
    esac
}
Next a function to export all user functions:
exportfns () { export -f $(lsfns); }
or just put export -f $(lsfns) in your .bashrc.
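For example, a minimal sketch (function names hypothetical) tying lsfns/exportfns to the xargs case from the question:
greet() { echo "hello $1"; }
shout() { greet "$1" | tr '[:lower:]' '[:upper:]'; }
exportfns                                  # exports greet and shout (among others)
printf '%s\n' alice bob | xargs -I{} bash -c 'shout "$1"' _ {}
# HELLO ALICE
# HELLO BOB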
Example script doit.sh:
#!/bin/bash
lsfns "$#" # make use of function exported by parent shell :)
Example command line (after chmod a+rx doit.sh):
echo -v | xargs doit.sh
Compare with
echo "" | xargs doit.sh
EDIT 1: responding further to kdb's comment/answer below ("running into situations where exporting functions does not work at all"):
Export of shell functions is not POSIX compatible; i.e. it only works with Bash (and presumably other shells such as Zsh, Ksh etc. have their own mechanisms).
That is, Dash and other "standard" POSIX shells that do not provide export -f cannot export functions at all. AND, if we export a function in, say, Bash, then run a script whose shebang is e.g. #!/bin/dash, that script will NOT be able to use the "exported" functions from the parent shell, since functions exported into the process environment by Bash are not recognised by Dash.
EDIT 2: responding further to OP comment "but if the myfunction depends on other functions this becomes hard to maintain":
This is probably a situation where one could make good use of the shell source command (alias "."), e.g.:
. ~/etc/my-functions.sh; myMain ...
And similarly, if you "live" in functions rather than script files, e.g. by calling myMain when you need to, then the first line of that function can be to source your function library.
Since that would be excess overhead in the "running a script regularly" case, myMain becomes the command-line stub function, which (re)loads your function library and then calls the actuallyDoit function (which would also be called from inside your script, if you have a script file).
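A sketch of that stub pattern (the library path and the actuallyDoit name come from the description above, and are otherwise hypothetical):
myMain() {
    . ~/etc/my-functions.sh    # (re)load the function library every time
    actuallyDoit "$@"          # the real work; a script file would call this directly
}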
Enjoy
Zenaan

This seems to work to print all the function names. It feels fragile, so test it out:
declare -f | grep -oP '^\S+(?=\s*\(\))'

export -f $(compgen -A function)
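A hedged end-to-end sketch of this one-liner in use (the work function is hypothetical):
work() { echo "got: $1"; }
export -f $(compgen -A function)    # exports every function defined so far
printf '%s\n' a b c | xargs -I{} bash -c 'work "$1"' _ {}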

Since receiving the answer, I found many cases where a different technique proved preferable: making the script invoke itself. A simple example would be
#!/bin/bash
# Print hash and disk usage for each argument
if [[ $1 == run ]]; then
    shift 1
    printf "%s\0" "$@" | xargs -0 -n 1 -P "$NUMBER_OF_PROCESSORS" -- "$0" printpar
elif [[ $1 == printpar ]]; then
    echo ":: $(sha1sum < "$2") $(du -sh "$2")"
else
    echo "Invalid first parameter '$1'"
    exit 1
fi
In real-world examples I'd either make assumptions about the arguments (e.g. using __SUCH__ a shape for self-call keywords) or hide recursive invocations behind an undocumented --command-line-switch.
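For instance, a sketch of the hidden-switch variant (the --worker name and the -P 4 value are hypothetical):
#!/bin/bash
if [[ $1 == --worker ]]; then    # undocumented switch, only used by self-invocations
    shift
    echo ":: $(sha1sum < "$1") $(du -sh "$1")"
else
    printf '%s\0' "$@" | xargs -0 -n 1 -P 4 -- "$0" --worker
fi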
Exporting functions is generally more elegant, but for large numbers of functions it can get prohibitively slow, and I remember running into issues where it failed entirely.

Related

Store a command in a variable; implement without `eval`

This is almost the exact same question as in this post, except that I do not want to use eval.
Long question short: I want to execute the command echo aaa | grep a by first storing it in a string variable Command='echo aaa | grep a', and then running it without using eval.
In the post above, the selected answer used eval. That works for me too. What concerns me is that there are plenty of warnings about eval below it, followed by some attempts to circumvent it. However, none of them are able to solve my problem (essentially the OP's). I have commented below their attempts, but since that post has been there for a long time, I suppose it is better to post the question again with the restriction of not using eval.
Concrete Example
What I want is a shell script that runs my command when I am happy:
#!/bin/bash
# This script run-this-if.sh runs the commands when I am happy
# Warning: the following script does not work (intentionally shown as the failing attempt)
if [ "$1" == "I-am-happy" ]; then
    "$2"
fi
$ run-if.sh I-am-happy [insert-any-command]
Your sample usage can't ever work with an assignment, because assignments are scoped to the current process and its children. Because there's no reason to try to support assignments, things get suddenly far easier:
#!/bin/sh
if [ "$1" = "I-am-happy" ]; then
    shift; "$@"
fi
This then can later use all the usual techniques to run shell pipelines, such as:
run-if-happy "$happiness" \
sh -c 'echo "$1" | grep "$2"' _ "$untrustedStringOne" "$untrustedStringTwo"
Note that we're passing the execve() syscall an argv with six elements:
sh (the shell to run; change to bash etc if preferred)
-c (telling the shell that the following argument is the code for it to run)
echo "$1" | grep "$2" (the code for sh to parse)
_ (a constant which becomes $0)
...whatever the shell variable untrustedStringOne contains... (which becomes $1)
...whatever the shell variable untrustedStringTwo contains... (which becomes $2)
Note here that echo "$1" | grep "$2" is a constant string -- in single-quotes, with no parameter expansions or command substitutions -- and that untrusted values are passed into the slots that fill in $1 and $2, out-of-band from the code being evaluated; this is essential to have any kind of increase in security over what eval would give you.
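A quick sketch to demonstrate the point: even a hostile-looking value stays inert data:
untrustedStringOne='aaa; $(touch /tmp/pwned)'
untrustedStringTwo='aaa'
sh -c 'echo "$1" | grep "$2"' _ "$untrustedStringOne" "$untrustedStringTwo"
# prints: aaa; $(touch /tmp/pwned)  -- the substitution is never executed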

Filtering GNU split with a custom shell function

Is there a way to use GNU split --filter with a custom shell function, like
my_func () {
    echo "$1"
}
split -d -l 10 INPUT_FILE chunk_ --filter='my_func $FILE'
which I would expect to output
chunk_00
chunk_01
...
Of course the echo in the custom function is just for expressing my question here; in my concrete case the custom function creates a script that uses the chunks from split as input.
It seems that GNU split only accepts standard shell commands within --filter.
Any smart way around this?
You can do this by exporting the function to the environment, which is available to the sub-shell run by split. For example with bash:
ex.sh
#!/bin/bash
my_func() {
    echo "$1"
}
export -f my_func
seq inf | split -d --filter='my_func $FILE' /dev/stdin chunk_
If you run it like this:
bash ex.sh | head
The output is:
chunk_00
chunk_01
chunk_02
chunk_03
chunk_04
chunk_05
chunk_06
chunk_07
chunk_08
chunk_09
More details in this answer on Unix & Linux Stack Exchange.
Note that split uses whatever the SHELL variable is set to as the sub-shell to run the --filter command. If you are running a different shell, you may need to add export SHELL=/bin/bash before running split.
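For example, a guard at the top of the script (a sketch; /bin/bash is an assumption about where bash lives on your system):
#!/bin/bash
export SHELL=/bin/bash    # make sure split runs --filter under bash
my_func() { echo "$1"; }
export -f my_func
seq 100 | split -d --filter='my_func $FILE' /dev/stdin chunk_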

How to evaluate bash function arguments as command with possible environment overrides?

How do you write a function in bash (I can rely on it being v4+) that, given words constituting a command with possible environment overrides, executes this command in the current shell?
For example, given
f cd src
f CXX="ccache gcc" make -k XOPTIONS="--test1 --test2"
the function f would do approximately the same thing as simply having these lines in the shell script, without the f up front?
A few unsuccessful attempts.
This tries to evaluate the environment override CXX="ccache gcc" as a command:
f() { "$@" ; }
This loses word-quoting on all arguments, breaking single-argument words on spaces:
f() { eval "$@" ; }
This handles the environment overrides, but runs the command in a subshell, as env(1) is not a bash builtin:
f() { env -- "$@" ; }
This question came up multiple times on SO and Unix SE, but I have never seen it asked about supporting all three important parts, namely: environment overrides; execution in the current shell; and correct handling of arguments containing spaces (and other characters that are lexically special to bash).
One thing I could potentially use is that environment overrides are rarely used with builtins (but cf. IFS= read ...), so I could select between the "$@" ; and eval -- "$@" ; patterns based on whether $1 is syntactically a variable assignment. But that is, again, not as simple as spotting an = in it, as the equals sign may be a quoted part of a command, albeit that is not likely sane. Still, I usually prefer correct code to mostly correct code, and this approach has two consecutive leaps of faith.
Addressing a possible question of why I need a function replicating the default behavior of the shell ("just drop the f"): in reality, f() is more complex than just running a command, implementing a pattern that repeats in the script in a few dozen locations; this is only the part I cannot get right.
If you can make eval see your arguments properly quoted, it should work. To this end, you can use the %q format specification of printf, which works as follows:
$ printf '%q ' CXX="ccache gcc" make -k XOPTIONS="--test1 --test2"
CXX=ccache\ gcc make -k XOPTIONS=--test1\ --test2
This would result in a function like
f () {
    eval "$(printf '%q ' "$@")"
}
Notice that this appends an extra space at the end of the command, but this shouldn't hurt.
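For example, with the invocations from the question, the sketch behaves like the bare commands would:
f cd src                                                # changes directory in the current shell
f CXX="ccache gcc" make -k XOPTIONS="--test1 --test2"
# eval sees: CXX=ccache\ gcc make -k XOPTIONS=--test1\ --test2
As an aside (an assumption on my part, since it requires bash 4.4 rather than just v4+): the ${parameter@Q} expansion can produce the same quoting natively, e.g. f () { eval "${@@Q}" ; }.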
Tricky. You could do this, but it's going to pollute the environment of the shell:
f() {
    # process any leading "var=value" assignments
    while [[ $1 == ?*=* ]]; do
        declare -x "$1"
        shift
    done
    "$@"
}
Just did a quick test: the env vars declared in the function are still local to the scope of the function and will not actually pollute the script's environment.
$ f() {
    declare -x FOO=bar
    sh -c 'echo subshell FOO=$FOO'
    echo function FOO=$FOO
}
$ unset FOO
$ f
subshell FOO=bar
function FOO=bar
$ echo main shell FOO=$FOO
main shell FOO=

How to recall a string in shell script

I made a script like this:
#! /usr/bin/bash
a=`ls ../wrfprd/wrfout_d0${i}* | cut -c22-25`
b=`ls ../wrfprd/wrfout_d0${i}* | cut -c27-28`
c=`ls ../wrfprd/wrfout_d0${i}* | cut -c30-31`
d=`ls ../wrfprd/wrfout_d0${i}* | cut -c33-34`
f=$a$b$c$d
echo $f
sed "s/.* startdate=.*/export startdate=${f}/g" ./post_process > post_process2
The echo command works and gives 2008042118, which is what I want, but in the file post_process2 the line comes out as export startdate= and cannot pick up the variable f. I want to produce a line like export startdate=2008042118
First -- don't use ls here -- it's both expensive in terms of performance (compared to globbing, which is performed internal to the shell without starting any external programs), and doesn't guarantee useful output for the full range of possible filenames, making its use in this context inherently bug-prone. A better way to retrieve pieces from a filename, assuming a ksh-derived shell such as bash or zsh, would look like this:
#!/bin/bash
# this is an array, but we're only going to use the first element
file=( "../wrfprd/wrfout_d0${i}"* )
[[ -e $file ]] || { echo "No file found" >&2; exit 1; }
f=${file:22:4}${file:27:2}${file:30:2}${file:33:2}
Second, don't use sed to modify code -- doing so requires that your runtime user have permission to modify its own code, and moreover invites injection vulnerabilities. Just write your content out to a data file:
printf '%s\n' "$f" >startdate.txt
...and, in your second script, to read in the value from that file:
# if the shebang is #!/bin/bash
startdate=$(<startdate.txt)
# if the shebang is #!/bin/sh
startdate=$(cat startdate.txt)

bash: function + source + declare = boom

Here is a problem:
In my bash scripts I want to source several file with some checks, so I have:
if [ -r foo ] ; then
    source foo
else
    logger -t $0 -p crit "unable to source foo"
    exit 1
fi

if [ -r bar ] ; then
    source bar
else
    logger -t $0 -p crit "unable to source bar"
    exit 1
fi
# ... etc ...
Naively I tried to create a function that does this:
function safe_source() {
    if [ -r $1 ] ; then
        source $1
    else
        logger -t $0 -p crit "unable to source $1"
        exit 1
    fi
}

safe_source foo
safe_source bar
# ... etc ...
But there is a snag there.
If one of the files foo, bar, etc. has a global such as --
declare GLOBAL_VAR=42
-- it will effectively become:
function safe_source() {
    # ...
    declare GLOBAL_VAR=42
    # ...
}
thus a global variable becomes local.
The question:
An alias in bash seems too weak for this, so must I unroll the above function, and repeat myself, or is there a more elegant approach?
... and yes, I agree that Python, Perl, Ruby would make my life easier, but when working with legacy system, one doesn't always have the privilege of choosing the best tool.
It's a bit of a late answer, but declare now supports a -g option, which makes a variable global when used inside a function. The same works in a sourced file.
If you need a global variable (global to the script, though not exported to child processes), use:
declare -g DATA="Hello World, meow!"
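A minimal sketch of the fix applied to the question's setup:
# in the sourced file foo:
declare -g GLOBAL_VAR=42

# in the main script:
safe_source foo
echo "$GLOBAL_VAR"    # prints 42; -g kept the variable global despite the function scope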
Yes, Bash's 'eval' command can make this work. 'eval' isn't very elegant, and it sometimes can be difficult to understand and debug code that uses it. I usually try to avoid it, but Bash often leaves you with no other choice (like the situation that prompted your question). You'll have to weigh the pros and cons of using 'eval' for yourself.
Some background on 'eval'
If you're not familiar with 'eval', it's a Bash built-in command that expects you to pass it a string as its parameter. 'eval' dynamically interprets and executes your string as a command in its own right, in the current shell context and scope. Here's a basic example of a common use (dynamic variable assignment):
$> a_var_name="color"
$> eval ${a_var_name}="blue"
$> echo -e "The color is ${color}."
The color is blue.
See the Advanced Bash Scripting Guide for more info and examples: http://tldp.org/LDP/abs/html/internal.html#EVALREF
Solving your 'source' problem
To make 'eval' handle your sourcing issue, you'd start by rewriting your function, 'safe_source()'. Instead of actually executing the command, 'safe_source()' should just PRINT the command as a string on STDOUT:
function safe_source() { echo eval " \
    if [ -r $1 ] ; then \
        source $1 ; \
    else \
        logger -t $0 -p crit \"unable to source $1\" ; \
        exit 1 ; \
    fi \
"; }
Also, you'll need to change your function invocations, slightly, to actually execute the 'eval' command:
`safe_source foo`
`safe_source bar`
(Those are backticks/backquotes, BTW.)
How it works
In short:
We converted the function into a command-string emitter.
Our new function emits an 'eval' command invocation string.
Our new backticks call the new function in a subshell context, returning the 'eval' command string output by the function back up to the main script.
The main script executes the 'eval' command string, captured by the backticks, in the main script context.
The 'eval' command string re-parses and executes the 'eval' command string in the main script context, running the whole if-then-else block, including (if the file exists) executing the 'source' command.
It's kind of complex. Like I said, 'eval' is not exactly elegant. In particular, there are a couple of special things you should notice about the changes we made:
The entire IF-THEN-ELSE block has become one whole double-quoted string, with backslashes at the end of each line "hiding" the newlines.
Some of the shell special characters (like '"') have been backslash-escaped, while others (like '$') have been left un-escaped.
'echo eval' has been prepended to the whole command string.
Extra semicolons have been appended to all of the lines where a command gets executed to terminate them, a role that the (now-hidden) newlines originally performed.
The function invocation has been wrapped in backticks.
Most of these changes are motivated by the fact that 'eval' won't handle newlines. It can only deal with multiple commands if we combine them into a single line delimited by semicolons instead. The new function's line breaks are purely a formatting convenience for the human eye.
If any of this is unclear, run your script with Bash's '-x' (debug execution) flag turned on, and that should give you a better picture of exactly what's happening. For instance, in the function context, the function actually produces the 'eval' command string by executing this command:
echo eval ' if [ -r <INCL_FILE> ] ; then source <INCL_FILE> ; else logger -t <SCRIPT_NAME> -p crit "unable to source <INCL_FILE>" ; exit 1 ; fi '
Then, in the main context, the main script executes this:
eval if '[' -r <INCL_FILE> ']' ';' then source <INCL_FILE> ';' else logger -t <SCRIPT_NAME> -p crit '"unable' to source '<INCL_FILE>"' ';' exit 1 ';' fi
Finally, again in the main context, the eval command executes these two commands if the file exists:
'[' -r <INCL_FILE> ']'
source <INCL_FILE>
Good luck.
declare inside a function makes the variable local to that function. export affects the environment of child processes, not the current or parent environments.
You can set the values of your variables inside the functions and do the declare -r, declare -i or declare -ri after the fact.
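For example (a sketch; the names are hypothetical):
init() { DATA="Hello World"; }    # a plain assignment inside a function is global
init
declare -r DATA                   # make it read-only after the fact, at top level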
