Exporting the full environment to GNU Parallel - bash

I find it somewhat annoying that I cannot use aliases in GNU Parallel:
alias gi="grep -i"
parallel gi bar ::: foo
/bin/bash: gi: command not found
I had somewhat come to terms with it just being the way it is. But after reading Accessing Associative Arrays in GNU Parallel I am starting to think: does it really have to be this way?
Is it possible to make a bash function that collects all of the environment into a function, exports that function, and calls GNU Parallel, which will then import the environment into the spawned shell using that function?
So I am not talking about a specialized solution for the gi-alias, but a bash function that will take all aliases/functions/variables (without me having to name them explicitly) and package them into a function that can be activated by GNU Parallel.
Something similar to:
env_parallel() {
# [... gather all environment/all aliases/all functions into parallel_environment() ...]
foreach alias in all aliases {
append alias definition to definition of parallel_environment()
}
foreach variable in all variables (including assoc arrays) {
append variable definition to definition of parallel_environment()
# Code somewhat similar to https://stackoverflow.com/questions/24977782/accessing-associative-arrays-in-gnu-parallel
}
foreach function in all functions {
append function definition to definition of parallel_environment()
}
# make parallel_environment visible to GNU Parallel
export -f parallel_environment
# Running parallel_environment will now create an environment with
# all variables/all aliases/all functions set in current state
# (with the exception of the function parallel_environment of course)
# Inside GNU parallel:
# if set parallel_environment(): prepend it to the command to run
`which parallel` "$@"
}
# Set an example alias
alias fb="echo fubar"
# Set an example variable
BAZ=quux
# Make an example function
myfunc() {
echo $BAZ
}
# This will record the current environment including the 3 examples
# put it into parallel_environment
# run parallel_environment (to set the environment)
# use the 3 examples
env_parallel parallel_environment\; fb bar {}\; myfunc ::: foo
# It should give the same output as running:
fb bar foo; myfunc
# Outputs:
# fubar bar foo
# quux
Progress: This seems to be close to what I want:
env_parallel() {
export parallel_environment='() {
'"$(echo "shopt -s expand_aliases"; alias;typeset -p | grep -vFf <(readonly);typeset -f)"'
}'
`which parallel` "$@"
}
VAR=foo
myfunc() {
echo $VAR $1
}
alias myf=myfunc
env_parallel parallel_environment';
' myfunc ::: bar # Works (but gives errors)
env_parallel parallel_environment';
' myf ::: bar # Works, but requires the \n after ;
So now I am down to 1 issue:
weed out all the variables that cannot be assigned a value (e.g. BASH_ARGC)
How do I list those?

GNU Parallel 20140822 implements this. To activate it you will need to run this once (e.g. in .bashrc):
env_parallel() {
export parallel_bash_environment='() {
'"$(echo "shopt -s expand_aliases 2>/dev/null"; alias;typeset -p | grep -vFf <(readonly; echo GROUPS; echo FUNCNAME; echo DIRSTACK; echo _; echo PIPESTATUS; echo USERNAME) | grep -v BASH_;typeset -f)"'
}'
# Run as: env_parallel ...
`which parallel` "$@"
unset parallel_bash_environment
}
And call GNU Parallel as:
env_parallel ...
That should put to rest the myth that it is impossible to export aliases: all you need is a little dexterity (thanks a lot to @rici for the inspiration).
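As a quick sanity check, the alias from the top of the question should now survive the trip into the spawned shell (a sketch; the file name and contents are made up, and it assumes GNU Parallel >= 20140822 with env_parallel defined as above):
alias gi="grep -i"
echo FooBar > foo
env_parallel gi bar ::: foo
# FooBar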

In principle, it should be possible. But, as usual, there are a lot of details.
First, it is quite possible in bash for a name to be simultaneously a function, a variable (scalar or array) and an alias. Also, the function and the variable can be exported independently.
So there would be a certain ambiguity in env_parallel foo ... in the case that foo has more than one definition. Possibly the best solution would be to detect the situation and report an error, using a syntax like:
env_parallel -a foo -f bar
in order to be more specific, if necessary.
A simpler possibility is to just export the ambiguity, which is what I do below.
So the basic logic to the importer used in env_parallel might be something like this, leaving out lots of error checking and other niceties:
# Helper functions for clarity. In practice, since they are all short,
# I'd probably in-line all of these by hand to reduce name pollution.
get_alias_() { alias "$1" 2>/dev/null; }
get_func_() { declare -f "$1" 2>/dev/null; }
get_var_() { [[ -v "$1" ]] && declare -p "$1" | sed '1s/--\?/-g/'; }
make_importer() {
local name_
export $1='() {
'"$(for name_ in "${#:2}"; do
local got_=()
get_alias_ "$name_" && got_+=(alias)
get_func_ "$name_" && got_+=(function)
get_var_ "$name_" && got_+=(variable)
if [[ -z $got_ ]]; then
echo "Not found: $name_" >>/dev/stderr
elif (( ${#got_[@]} > 1 )); then
printf >>/dev/stderr \
"Ambiguous: %s is%s\n" \
"$name_" "$(printf " %s" "${got_[@]}")"
fi
done)"'
}'
}
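A hypothetical end-to-end use (all names here are illustrative): build an importer for one variable and one function, then call it in the shell that parallel spawns. Note that this relies on the pre-Shellshock exported-function format, so bash versions patched after September 2014 will not import the definition in this exact form:
BAZ=quux
myfunc() { echo "$BAZ $1"; }
make_importer PARALLEL_ENV BAZ myfunc
# In the spawned shell, running PARALLEL_ENV re-creates BAZ and myfunc:
parallel PARALLEL_ENV\; myfunc ::: foo
# quux foo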
In practice, there's no real point defining the function in the local environment if the only purpose is to transmit it to a remote shell. It would be sufficient to print the export command. And, while it is convenient to package the import into a function, as in Accessing Associative Arrays in GNU Parallel,
it's not strictly necessary. It does make it a lot easier to pass the definitions through utilities like GNU Parallel, xargs or find, which is what I typically use this hack for. But depending on how you expect to use the definitions, you might be able to simplify the above by simply prepending the list of definitions to the given command. (If you do that, you won't need to fiddle the global flag with the sed in get_var_.)
Finding out what is in the environment
In case it is useful, here is how to get a list of all aliases, functions and variables:
Functions
declare -F | cut -d' ' -f3
Aliases (Note 1)
alias | awk '/^alias /{print substr($2,1,index($2,"=")-1)}'
Variables (Note 1)
declare -p | awk '$1=="declare"{o=(index($3, "="));print o?substr($3,1,o-1):$3}'
In the awk program, you could check the variable type by looking at $2, which is usually -- but could be -A for an associative array, -a for an array with integer keys, -i for an integer, -x for exported and -r for readonly. (More than one option may apply; -aix is an "exported" integer array, although exporting arrays is not actually implemented.)
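For example, building on the listing above, this sketch prints only the names of associative arrays:
declare -p | awk '$1=="declare" && $2 ~ /A/ {o=index($3,"="); print o ? substr($3,1,o-1) : $3}'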
Note 1
The alias and declare -p commands produce "reusable" output, which could be eval'ed or piped into another bash, so the values are quoted. Unfortunately, the quoting is just good enough for eval; it's not good enough to avoid confusion. It is possible to define, for example:
x='
declare -a FAKE
'
in which case the output of declare -p will include:
declare -- x="
declare -a FAKE
"
Consequently, the lists of aliases and variables need to be treated as "possible names": all names will be included, but not everything included is necessarily a name. Mostly that means being sure to ignore errors:
for a in "${_aliases[@]}"; do
if
defn=$(alias "$a" 2>/dev/null)
then
# do something with $defn
fi
done

As is often the case, the solution is to use a function, not an alias. You must first export the function (since parallel and bash are both developed by GNU, parallel knows how to deal with functions as exported by bash).
gi () {
grep -i "$@"
}
export -f gi
parallel gi bar ::: foo

Related

Is there a way to unpack a config file to cli flags in general?

Basically what foo(**bar) does in Python. Here I'd want something like
foo **bar.yaml
and that would become
foo --bar1=1 --bar2=2
Where bar.yaml would be
bar1: 1
bar2: 2
You could use a combination of sed and xargs:
sed -E 's/^(.+):[[:space:]]+(.+)$/--\1=\2/' bar.yaml | xargs -d '\n' foo
sed converts the format of bar.yaml lines (e.g. bar1: 1 -> --bar1=1) and xargs feeds the converted lines as arguments to foo.
You could of course modify/extend the sed part to support other formats or single-dash options like -v.
To test if this does what you want, you can run this Bash script instead of foo:
#!/usr/bin/env bash
echo "Arguments: $#"
for ((i=1; i <= $#; i++)); do
echo "Argument $i: '${!i}'"
done
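For instance, saving that test script as ./argtest.sh (a made-up name) and running the pipeline from above against the sample bar.yaml should print something like:
sed -E 's/^(.+):[[:space:]]+(.+)$/--\1=\2/' bar.yaml | xargs -d '\n' ./argtest.sh
Arguments: 2
Argument 1: '--bar1=1'
Argument 2: '--bar2=2'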
Here's a version for zsh. Run this code or add it to ~/.zshrc:
function _yamlExpand {
setopt local_options extended_glob
# 'words' array contains the current command line
# yaml filename is the last value
yamlFile=${words[-1]}
# parse 'key : value' lines from file, create associative array
typeset -A parms=("${(@s.:.)${(f)"$(<${yamlFile})"}}")
# trim leading and trailing whitespace from keys and values
# requires extended_glob
parms=("${(kv#)${(kv#)parms##[[:space:]]##}%%[[:space:]]##}")
# add -- and = to create flags
typeset -a flags
for key val in "${(@kv)parms}"; do
flags+=("--${key}='${val}'")
done
# replace the value on the command line
compadd -QU -- "${flags[@]}"
}
# add the function as a completion and map it to ctrl-y
compdef -k _yamlExpand expand-or-complete '^Y'
At the zsh shell prompt, type in the command and the yaml file name:
% print -l -- ./bar.yaml▃
With the cursor immediately after the yaml file name, hit ctrl+y. The yaml filename will be replaced with the expanded parameters:
% print -l -- --bar1='1' --bar2='2' ▃
Now you're set; you can hit enter, or add parameters, just like any other command line.
Notes:
This only supports the yaml subset in your example.
You can add more yaml parsing to the function, possibly with yq.
In this version, the cursor must be next to the yaml filename - otherwise the last value in words will be empty. You can add code to detect that case and then alter the words array with compset -n.
compadd and compset are described in the zshcompwid man page.
zshcompsys has details on compdef; the section on autoloaded files describes another way to deploy something like this.

Modify a shell variable inside awk block of code

Is there any way to modify a shell variable inside an awk block of code?
--------- [shell_awk.sh]---------------------
#!/bin/bash
shell_variable_1=<value A>
shell_variable_2=<value B>
shell_variable_3=<value C>
awk 'function A(X)
{ return X+1 }
{ a=A('$shell_variable_1')
b=A('$shell_variable_2')
c=A('$shell_variable_3')
shell_variable_1=a
shell_variable_2=b
shell_variable_3=c
}' FILE.TXT
--------- [shell_awk.sh]---------------------
This is a very simple example; the real script loads a file and makes some changes using functions. I need to keep each value from before the change in a specific variable, so that I can record the before and after values in MySQL.
The after value is received from parameters ($1, $2 and so on).
I already know how to get the before value from the file.
Everything works well except setting the shell variables from the awk variables. Outside the awk block they are easy to set, but inside, is it possible?
No program -- in awk, shell, or any other language -- can directly modify a parent process's memory. That includes variables. However, of course, your awk can write contents to stdout, and the parent shell can read that content and modify its own variables itself.
Here's an example of awk that writes key/value pairs out to be read by bash. It's not perfect -- read the caveats below.
#!/bin/bash
shell_variable_1=10
shell_variable_2=20
shell_variable_3=30
run_awk() {
awk -v shell_variable_1="$shell_variable_1" \
-v shell_variable_2="$shell_variable_2" \
-v shell_variable_3="$shell_variable_3" '
function A(X) { return X+1 }
{ a=A(shell_variable_1)
b=A(shell_variable_2)
c=A(shell_variable_3) }
END {
print "shell_variable_1=" a
print "shell_variable_2=" b
print "shell_variable_3=" c
}' <<<""
}
while IFS="=" read -r key value; do
printf -v "$key" '%s' "$value"
done < <(run_awk)
for var in shell_variable_{1,2,3}; do
printf 'New value for %s is %s\n' "$var" "${!var}"
done
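Run as-is with the initial values above, that final loop should print:
New value for shell_variable_1 is 11
New value for shell_variable_2 is 21
New value for shell_variable_3 is 31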
Advantages
Doesn't use eval. Content such as $(rm -rf ~) in the output from awk won't be executed by your shell.
Disadvantages
Can't handle variable contents with newlines. (You could fix this by NUL-delimiting output from your awk script and adding -d '' to the read command; a sketch follows below.)
A hostile awk script could modify PATH, LD_LIBRARY_PATH, or other security-sensitive variables. (You could fix this by reading variables into an associative array, rather than the global namespace, or by enforcing a prefix on their names).
The code above uses several ksh extensions also available in bash; however, it will not run with POSIX sh. Thus, be sure not to run this via sh scriptname (which only guarantees POSIX functionality).
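For completeness, here is a sketch of the NUL-delimited variant mentioned in the disadvantages above (it assumes an awk, such as GNU awk, where printf "%c" with 0 emits a NUL byte; the variable name is illustrative):
# awk emits key NUL value NUL; read -d '' consumes one NUL-terminated token at a time
while IFS= read -r -d '' key && IFS= read -r -d '' value; do
    printf -v "$key" '%s' "$value"
done < <(awk 'BEGIN { printf "%s%c%s%c", "multiline_var", 0, "line1\nline2", 0 }')
printf '%s\n' "$multiline_var"
# line1
# line2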

why function ls { ls; } hangs in there?

The following shell function definition hangs in a bash console (RHEL/Ubuntu); in Cygwin it just quits the terminal when it is invoked.
$ function ls { ls; }
$ ls
What is the reason for this behavior?
Your defined ls command is recursively calling itself rather than the previous ls command.
If you want to call the actual ls from your redefined one, you can simply use which to get the full path name, such as redefining ls to give you the long format:
function ls { $(which ls) -l; }
That's effectively the same as:
function ls { /bin/ls -l; }
which won't give you the problems your solution has with recursion.
Another option is to use
function ls { command ls -l; }
command will suppress shell function lookup and only allow for built-ins or programs on the path.
Builtins (like cd) are handled slightly differently to programs since they aren't actually located on the file system. In that case, you can use builtin, rather than which, to call the built-in version.
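For example, here is a minimal sketch that wraps the cd builtin so it lists the target directory after changing into it:
function cd { builtin cd "$@" && ls; }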
If you want to define a function in terms of something that may already be a function, that's a bit trickier. You can use declare -f to get the current definition, then manipulate that to create a new definition.
An example of this (though contrived) follows. Let's say you declare a function to show all text files:
pax> showtxt()
...> {
...> ls *.txt
...> }
and you now want to give it a pretty heading. Using declare -f showtxt, you can see its definition:
pax> declare -f showtxt
showtxt ()
{
ls *.txt
}
Running that may result in the following output:
pax> showtxt
passwords.txt p0rnsites.txt results.txt
Now say you wanted to change it to give it a heading. You can capture the output of declare -f and modify it to make a script which will redefine the function thus:
pax> declare -f showtxt | awk '$1=="ls"{print "echo Text files:"}{print}' >tmp.sh
pax> cat tmp.sh
showtxt ()
{
echo Text files:
ls *.txt
}
You can see that you now have a modified function definition which, when run, will replace the function:
pax> . ./tmp.sh
pax> declare -f showtxt
showtxt ()
{
echo Text files:;
ls *.txt
}
And, when you run the new function, its behaviour has changed:
pax> showtxt
Text files:
passwords.txt p0rnsites.txt results.txt
Now, that contrived example isn't that useful since you probably could have typed it in yourself. Where this comes in handy is when the original function is more complex or the changes you want to make to it are many and varied.
You named your function ls. This shadows the real ls command, so the ls inside the function body now resolves to the function itself, and it calls itself recursively, forever...
The best idea is to use unique names for your functions, e.g. this works fine:
function myls { ls; }

How can you override function redirection in bash?

I recently discovered some bash code that used the little-known (well, little known to me anyway) feature of function redirection, such as the greatly simplified:
function xyzzy () {
echo hello
} >/dev/null
When you call the function with a simple xyzzy, it automatically applies the redirections attached to the function regardless of what you've done when calling it.
What I'd like to know is if there's any way to override this behaviour in the call to the function itself, to see the message being generated. I'm reluctant to change the file containing all the functions since (1) it's large, (2) it changes regularly, and (3) it's heavily protected by the group that supports it.
I've tried:
xyzzy >&1
to try to override it but the output still doesn't show up (possibly because >&1 may be considered a no-op).
In other words, given the script:
function xyzzy () {
echo hello
} >/tmp/junk
rm -f /tmp/junk
echo ================
echo Standard output
echo ----------------
xyzzy # something else here
echo ================
echo Function capture
echo ----------------
cat /tmp/junk
echo ================
it currently outputs:
================
Standard output
----------------
================
Function capture
----------------
hello
================
What can I change the xyzzy call to, so as to get hello printed in the standard output section rather than the function capture section?
And this needs to be without trying to read the file /tmp/junk after it's created since the actual redirections may be to /dev/null so they won't be in a file.
The only thing I can think of would be to parse the output of declare -f function_name and remove the redirection.
This is perhaps the easiest approach. Note that you need to tailor the awk script to the specific function layout and it doesn't modify the body of the function at all. That means you can only turn off redirection at the top level. You could modify whole call trees of functions to turn off redirection but that would require a bash parser capable of recognising and changing function calls within the body.
The following script shows how to do it with your sample function. All the awk command does is create a new function my_xyzzy which mirrors the xyzzy function except for the final line, effectively turning it into:
function my_xyzzy () {
echo hello
}
And the complete script as per the specifications:
function xyzzy () {
echo hello
} >/tmp/qqqq
declare -f xyzzy | awk '
NR==1 {print "my_xyzzy ()"}
NR==2 {prev=$0}
NR>2 {print prev;prev=$0}
END {print "}"}' >$$.bash
. $$.bash
rm -f $$.bash
rm -f /tmp/qqqq
echo ================
echo Standard output
echo ----------------
my_xyzzy
echo ================
echo Function capture
echo ----------------
cat /tmp/qqqq
echo ================
The output of that is:
================
Standard output
----------------
hello
================
Function capture
----------------
cat: /tmp/qqqq: No such file or directory
================
I don't think Bash function redirections can be overridden dynamically in the call to the function itself, although a temporarily altered shell context can be made use of by combining Bash aliases and functions (see Magic Aliases: A Layering Loophole in the Bourne Shell).
Non-dynamically, it is the last redirection expression, i.e. the rightmost one, that overrides the previous ones when several redirections refer to the same file descriptor.
# example
ls -ld / no_such_file 1>/dev/null 1>/dev/tty 1>&2 1>redirtest.txt
cat redirtest.txt
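Here only the rightmost 1>redirtest.txt takes effect for stdout when ls starts, so the listing of / lands in the file while the no_such_file error still reaches the terminal on stderr. The session should look something like:
ls: cannot access 'no_such_file': No such file or directory
drwxr-xr-x 23 root root 4096 Jan  1 00:00 /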
Therefore, glenn jackman's suggestion to use declare -f function_name seems the way to add a final stdout redirection expression to override the previous ones.
xyzzy() { echo 'Hello, world!'; } 1>/dev/null
#func="$(declare -f xyzzy) 1>&2"
func="$(declare -f xyzzy) 1>/dev/tty"
eval "$func"
xyzzy
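Checking the redefined function should now show both redirections, with the appended one winning (this assumes a terminal is attached for /dev/tty):
declare -f xyzzy
xyzzy ()
{
    echo 'Hello, world!'
} 1>/dev/null 1>/dev/tty
xyzzy
Hello, world!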

Capturing multiple line output into a Bash variable

I've got a script 'myscript' that outputs the following:
abc
def
ghi
in another script, I call:
declare RESULT=$(./myscript)
and $RESULT gets the value
abc def ghi
Is there a way to store the result either with the newlines, or with '\n' character so I can output it with 'echo -e'?
Actually, RESULT contains what you want — to demonstrate:
echo "$RESULT"
What you show is what you get from:
echo $RESULT
As noted in the comments, the difference is that (1) the double-quoted version of the variable (echo "$RESULT") preserves internal spacing of the value exactly as it is represented in the variable — newlines, tabs, multiple blanks and all — whereas (2) the unquoted version (echo $RESULT) replaces each sequence of one or more blanks, tabs and newlines with a single space. Thus (1) preserves the shape of the input variable, whereas (2) creates a potentially very long single line of output with 'words' separated by single spaces (where a 'word' is a sequence of non-whitespace characters; there needn't be any alphanumerics in any of the words).
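A quick illustration of the difference:
RESULT=$'abc\ndef\nghi'
echo "$RESULT"   # prints three lines, exactly as stored
echo $RESULT     # prints: abc def ghi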
Another pitfall with this is that command substitution — $() — strips trailing newlines. Probably not always important, but if you really want to preserve exactly what was output, you'll have to use another line and some quoting:
RESULTX="$(./myscript; echo x)"
RESULT="${RESULTX%x}"
This is especially important if you want to handle all possible filenames (to avoid undefined behavior like operating on the wrong file).
In case you're interested in specific lines, use a result array (note that the unquoted command substitution splits on any whitespace and glob-expands, not just on newlines):
declare RESULT=($(./myscript)) # (..) = array
echo "First line: ${RESULT[0]}"
echo "Second line: ${RESULT[1]}"
echo "N-th line: ${RESULT[N]}"
In addition to the answer given by @l0b0, I just had a situation where I needed to both keep any trailing newlines output by the script and check the script's return code.
The problem with l0b0's answer is that the echo x resets $? back to zero... so I managed to come up with this very cunning solution:
RESULTX="$(./myscript; echo x$?)"
RETURNCODE=${RESULTX##*x}
RESULT="${RESULTX%x*}"
Parsing multiple output
Introduction
So your myscript outputs 3 lines; it could look like:
myscript() { echo $'abc\ndef\nghi'; }
or
myscript() { local i; for i in abc def ghi ;do echo $i; done ;}
OK, this is a function, not a script (no need for the ./ path), but the output is the same:
myscript
abc
def
ghi
Considering result code
To check the result code, the test function becomes:
myscript() { local i;for i in abc def ghi ;do echo $i;done;return $((RANDOM%128));}
1. Storing multiple output in one single variable, showing newlines
Your operation is correct:
RESULT=$(myscript)
To capture the result code as well, you could add:
RCODE=$?
or even on the same line:
RESULT=$(myscript) RCODE=$?
Then
echo $RESULT $RCODE
abc def ghi 66
echo "$RESULT"
abc
def
ghi
echo ${RESULT@Q}
$'abc\ndef\nghi'
printf '%q\n' "$RESULT"
$'abc\ndef\nghi'
but to show the variable definitions, use declare -p:
declare -p RESULT RCODE
declare -- RESULT="abc
def
ghi"
declare -- RCODE="66"
2. Parsing multiple output into an array, using mapfile
Storing the output in the myvar array:
mapfile -t myvar < <(myscript)
echo ${myvar[2]}
ghi
Showing $myvar:
declare -p myvar
declare -a myvar=([0]="abc" [1]="def" [2]="ghi")
Considering result code
If you also have to check the result code, you could:
RESULT=$(myscript) RCODE=$?
mapfile -t myvar <<<"$RESULT"
declare -p myvar RCODE
declare -a myvar=([0]="abc" [1]="def" [2]="ghi")
declare -- RCODE="40"
3. Parsing multiple output with consecutive reads in a command group
{ read firstline; read secondline; read thirdline;} < <(myscript)
echo $secondline
def
Showing variables:
declare -p firstline secondline thirdline
declare -- firstline="abc"
declare -- secondline="def"
declare -- thirdline="ghi"
I often use:
{ read foo;read foo total use free foo ;} < <(df -k /)
Then
declare -p use free total
declare -- use="843476"
declare -- free="582128"
declare -- total="1515376"
Considering result code
With the same preliminary step:
RESULT=$(myscript) RCODE=$?
{ read firstline; read secondline; read thirdline;} <<<"$RESULT"
declare -p firstline secondline thirdline RCODE
declare -- firstline="abc"
declare -- secondline="def"
declare -- thirdline="ghi"
declare -- RCODE="50"
After trying most of the solutions here, the easiest thing I found was the obvious - using a temp file. I'm not sure what you want to do with your multiple line output, but you can then deal with it line by line using read. About the only thing you can't really do is easily stick it all in the same variable, but for most practical purposes this is way easier to deal with.
./myscript.sh > /tmp/foo
while read line ; do
echo 'whatever you want to do with $line'
done < /tmp/foo
Quick hack to make it do the requested action:
result=""
./myscript.sh > /tmp/foo
while read line ; do
result="$result$line\n"
done < /tmp/foo
echo -e $result
Note this adds an extra line. If you work on it you can code around it, I'm just too lazy.
EDIT: While this case works perfectly well, people reading this should be aware that you can easily squash your stdin inside the while loop, giving you a script that runs one line, clears stdin, and exits. ssh will do that, I think? I just saw it recently; other code examples are here: https://unix.stackexchange.com/questions/24260/reading-lines-from-a-file-with-bash-for-vs-while
One more time! This time with a different filehandle (stdin, stdout, stderr are 0-2, so we can use &3 or higher in bash).
result=""
./test>/tmp/foo
while read line <&3; do
result="$result$line\n"
done 3</tmp/foo
echo -e $result
you can also use mktemp, but this is just a quick code example. Usage for mktemp looks like:
filenamevar=`mktemp /tmp/tempXXXXXX`
./test > $filenamevar
Then use $filenamevar like you would the actual name of a file. Probably doesn't need to be explained here but someone complained in the comments.
How about this? It will read each line into a variable, which can then be used subsequently.
Say the output of myscript is redirected to a file called myscript_output:
awk '{while ( (getline var < "myscript_output") >0){print var;} close ("myscript_output");}'
