Transferring a variable into a shell function - bash

Is there a way to use a shell function that accepts a variable from the bash script (or rather, to transfer a variable into a shell function)?
The following procedure works just fine (note: I'm using this procedure as part of my need to implement output redirection, as explained here):
mycmd() { cat <(head -3 MyProgrammingBook.txt|awk -F "\t" '{OFS="\t"}{print "Helloworld",$0}') > outputfile.txt; };
export -f mycmd;
bsub -q short "bash -c mycmd"
However, I would like to provide the input file name as a variable rather than as a hardcoded name. I tried something like the following, but it doesn't work:
myinputfile="MyProgrammingBook.txt";
mycmd() { cat <(head -3 ${myinputfile}|awk -F "\t" '{OFS="\t"}{print "Helloworld",$0}') > outputfile.txt; };
export -f mycmd;
bsub -q short "bash -c mycmd"
Ultimately, mycmd() would be called inside a loop and used each time with a different variable value.

Export the variable too:
myinputfile="MyProgrammingBook.txt";
mycmd() { cat <(head -3 ${myinputfile}|awk -F "\t" '{OFS="\t"}{print "Helloworld",$0}') > outputfile.txt; };
export -f mycmd;
export myinputfile; # Here
bsub -q short "bash -c mycmd"
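For the loop mentioned in the question, a minimal sketch (the file names are illustrative; it assumes, as above, that bsub forwards the submission-time environment to each job, and note that mycmd still writes to the same outputfile.txt unless that is parameterized the same way):
for f in Book1.txt Book2.txt Book3.txt; do   # illustrative file names
    export myinputfile="$f"                  # re-export a different value before each submission
    bsub -q short "bash -c mycmd"
done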

Related

Passing args to defined bash functions through GNU parallel

Let me show you a snippet of my Bash script and how I try to run parallel:
parallel -a "$file" \
-k \
-j8 \
--block 100M \
--pipepart \
--bar \
--will-cite \
_fix_col_number {} | _unify_null_value {} >> "$OUTPUT_DIR/$new_filename"
So, I am basically trying to process each line in a file in parallel using Bash functions defined inside my script. However, I am not sure how to pass each line to my defined functions "_fix_col_number" and "_unify_null_value". Whatever I do, nothing gets passed to the functions.
I am exporting the functions like this in my script:
declare -x NUM_OF_COLUMNS
export -f _fix_col_number
export -f _add_tabs
export -f _unify_null_value
The mentioned functions are:
_unify_null_value()
{
    _string=$(echo "$1" | perl -0777 -pe "s/(?<=\t)\.(?=\s)//g" | \
        perl -0777 -pe "s/(?<=\t)NA(?=\s)//g" | \
        perl -0777 -pe "s/(?<=\t)No Info(?=\s)//g")
    echo "$_string"
}
_add_tabs()
{
    _tabs=""
    for (( c=1; c<=$1; c++ ))
    do
        _tabs="$_tabs\t"
    done
    echo -e "$_tabs"
}
_fix_col_number()
{
    line_cols=$(echo "$1" | awk -F"\t" '{ print NF }')
    if [[ $line_cols -gt $NUM_OF_COLUMNS ]]; then
        new_line=$(echo "$1" | cut -f1-"$NUM_OF_COLUMNS")
        echo -e "$new_line\n"
    elif [[ $line_cols -lt $NUM_OF_COLUMNS ]]; then
        missing_columns=$(( NUM_OF_COLUMNS - line_cols ))
        new_line="${1//$'\n'/}$(_add_tabs $missing_columns)"
        echo -e "$new_line\n"
    else
        echo -e "$1"
    fi
}
I tried removing {} from parallel. Not really sure what I am doing wrong.
I see two problems in the invocation plus additional problems with the functions:
With --pipepart there are no arguments. The blocks read from -a file are passed over stdin to your functions. Try the following commands to confirm this:
seq 9 > file
parallel -a file --pipepart echo
parallel -a file --pipepart cat
Theoretically, you could read stdin into a variable and pass that variable to your functions, ...
parallel -a file --pipepart 'b=$(cat); someFunction "$b"'
... but I wouldn't recommend it, especially since your blocks are 100MB each.
Bash interprets the pipe | in your command before parallel even sees it. To run a pipe, quote the entire command:
parallel ... 'b=$(cat); _fix_col_number "$b" | _unify_null_value "$b"' >> ...
_fix_col_number seems to assume its argument to be a single line, but receives 100MB blocks instead.
_unify_null_value does not read stdin, so _fix_col_number {} | _unify_null_value {} is equivalent to _unify_null_value {}.
That being said, your functions can be drastically improved. They start a lot of processes which becomes incredibly expensive for larger files. You can do some trivial improvements like combining perl ... | perl ... | perl ... into a single perl. Likewise, instead of storing everything in variables, you can process stdin directly: Just use f() { cmd1 | cmd2; } instead of f() { var=$(echo "$1" | cmd1); var=$(echo "$var" | cmd2); echo "$var"; }.
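For instance, _unify_null_value can be reduced to a single perl process that reads stdin; a minimal sketch of that idea (not the author's code, and essentially equivalent for typical input):
_unify_null_value() {
    # one perl call reading stdin, instead of three piped calls on "$1"
    perl -0777 -pe 's/(?<=\t)(\.|NA|No Info)(?=\s)//g'
}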
However, don't waste time on small things like these. A complete rewrite in sed, awk, or perl is easy and should outperform any optimization of the existing functions.
Try
n="INSERT NUMBER OF COLUMNS HERE"
tabs=$(perl -e "print \"\t\" x $n")
perl -pe "s/\r?\$/$tabs/; s/\t\K(\.|NA|No Info)(?=\s)//g;" file |
cut -f "1-$n"
If you still find this too slow, leave out file; pack the command into a function, export that function and then call parallel -a file -k --pipepart nameOfTheFunction. The option --block is not necessary as pipepart will evenly split the input based on the number of jobs (can be specified with -j).
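A hedged sketch of that setup (the function name, column count, and output file are placeholders, not the original script):
n=10                                     # assumed number of columns
fixAndUnify() {
    local tabs
    tabs=$(perl -e "print \"\t\" x $n")
    perl -pe "s/\r?\$/$tabs/; s/\t\K(\.|NA|No Info)(?=\s)//g;" |
        cut -f "1-$n"
}
export -f fixAndUnify
export n
parallel -a file -k --pipepart fixAndUnify >> output.tsv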

Why can't I pass the variable's value into a file in the /etc directory?

I want to pass the value of the $ip variable into the file /etc/test.json with bash.
ip="xxxx"
sudo bash -c 'cat > /etc/test.json <<EOF
{
"server":"$ip",
}
EOF'
I expect the content of /etc/test.json to be
{
"server":"xxxx",
}
However the real content in /etc/test.json is:
{
"server":"",
}
But if I replace the target directory /etc/ with /tmp
ip="xxxx"
cat > /tmp/test.json <<EOF
{
"server":"$ip",
}
EOF
the value of the $ip variable gets passed into /tmp/test.json:
$ cat /tmp/test.json
{
"server":"xxxx",
}
In Kamil Cuk's example, the subprocess is cat > /etc/test.json which contains no variable.
sudo sh -c 'cat > /etc/test.json' << EOF
{
"server":"$ip",
}
EOF
It does not export the $ip variable at all.
Now let's analyze the following:
ip="xxxx"
sudo bash -c "cat > /etc/test.json <<EOF
{
"server":\""$ip"\",
}
EOF"
The different parts in
"cat > /etc/test.json <<EOF
{
"server":\""$ip"\",
}
EOF"
will be concatenated into one long string and run as a command. Why can the $ip variable inherit its value from its parent process here?
There are two reasons for this behavior:
By default, variables are not passed to the environment of subsequently executed commands.
The variable is not expanded in the current context, because your command is wrapped in single quotes.
Exporting the variable
Place an export statement before the variable; see man 1 bash:
The supplied names are marked for automatic export to the environment of subsequently executed commands.
And, as noted by Léa Gris, you also need to tell sudo to preserve the environment with the -E or --preserve-environment flag.
export ip="xxxx"
sudo -E bash -c 'cat > /etc/test.json <<EOF
{
"server":"$ip",
}
EOF'
Expand the variable in the current context:
This is the reason your second command works: you do not have any quotes around the here document in that example.
But if I replace the target directory /etc/ with /tmp [...] the value of the $ip variable gets passed into /tmp/test.json
You can change your original snippet by replacing the single quotes with double quotes and escaping the quotes around your ip:
ip="xxxx"
sudo bash -c "cat > /etc/test.json <<EOF
{
"server":\""$ip"\",
}
EOF"
Edit: For your additional questions:
In Kamil Cuk's example, the subprocess is cat > /etc/test.json which contains no variable.
sudo sh -c 'cat > /etc/test.json' << EOF
{
"server":"$ip",
}
EOF
It does not export the $ip variable at all.
Correct, and you did not wrap the here document in single quotes. Therefore $ip is substituted in the current context, and the string passed to the subprocess's standard input is
{
"server":"xxxx",
}
So in this example the subprocess does not need to know the $ip variable.
Simple example
$ x=1
$ sudo -E sh -c 'echo $x'
[sudo] Password for kalehmann:
This echoes nothing because
'echo $x' is wrapped in single quotes. $x is therefore not substituted in the current context
$x is not exported. Therefore the subprocess does not know its value.
$ export y=2
$ sudo -E sh -c 'echo $y'
[sudo] Password for kalehmann:
2
This echoes 2 because
'echo $y' is wrapped in single quotes. $y is therefore not substituted in the current context
$y is exported. Therefore the subprocess does know its value.
$ z=3
$ sudo -E sh -c "echo $z"
[sudo] Password for kalehmann:
3
This echoes 3 because
"echo $z" is wrapped in double quotes. $z is therefore substituted in the current context
There is little need to put the here document inside the subshell. Just do it outside:
sudo tee /etc/test.json <<EOF
{
"server":"$ip",
}
EOF
or
sudo sh -c 'cat > /etc/test.json' << EOF
{
"server":"$ip",
}
EOF
Generally, it is not safe to build a fragment of JSON using string interpolation, because it requires you to ensure the variables are properly encoded. Let a tool like jq do that for you.
Pass the output of jq to tee, and use sudo to run tee to ensure that the only thing you do as root is open the file with the correct permissions.
ip="xxxx"
jq -n --arg x "$ip" '{server: $x}' | sudo tee /etc/test.json > /dev/null

Problem setting a command in a variable in a bash script

Trying to run a command stored in a variable, but I am getting strange results.
Expected result "1":
grep -i nosuid /etc/fstab | grep -iq nfs
echo $?
1
Unexpected result as a variable command:
cmd="grep -i nosuid /etc/fstab | grep -iq nfs"
$cmd
echo $?
0
It seems it returns 0 because the command string itself was valid, not because of the actual outcome. How can I do this better?
You can only execute exactly one command stored in a variable. The pipe is passed as an argument to the first grep.
Example
$ printArgs() { printf %s\\n "$@"; }
# Two commands. The 1st command has parameters "a" and "b".
# The 2nd command prints stdin from the first command.
$ printArgs a b | cat
a
b
$ cmd='printArgs a b | cat'
# Only one command with parameters "a", "b", "|", and "cat".
$ $cmd
a
b
|
cat
How to do this better?
Don't execute the command using variables.
Use a function.
$ cmd() { grep -i nosuid /etc/fstab | grep -iq nfs; }
$ cmd
$ echo $?
1
Solution to the actual problem
I see three options for your actual problem:
Use a DEBUG trap and the BASH_COMMAND variable inside the trap (see the sketch after this list).
Enable bash's history feature for your script and use the history builtin.
Use a function which takes a command string and executes it using eval.
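For the first option, a minimal sketch of how a DEBUG trap and BASH_COMMAND expose each command just before it runs (the commands shown are illustrative, not from the original answer):
trap 'printf "about to run: %s\n" "$BASH_COMMAND" >&2' DEBUG   # fires before every simple command
ls /etc/fstab   # the trap prints this command line before it executes
date            # same here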
Regarding your comment on the last approach: You only need one function. Something like
execAndLog() {
    description="$1"
    shift
    if eval "$*"; then
        info="PASSED: $description: $*"
        passed+=("${FUNCNAME[1]}")
    else
        info="FAILED: $description: $*"
        failed+=("${FUNCNAME[1]}")
    fi
}
You can use this function as follows
execAndLog 'Scanned system' 'grep -i nfs /etc/fstab | grep -iq noexec'
The first argument is the description for the log, the remaining arguments are the command to be executed.
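If useful, the collected results can be reported once all checks have run; a small follow-up sketch (the report format is an assumption, not part of the original answer):
printf 'PASSED in: %s\n' "${passed[@]}"   # the arrays hold the names of the calling functions (FUNCNAME[1])
printf 'FAILED in: %s\n' "${failed[@]}"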
Using bash -x or set -x will allow you to see what bash executes:
> cmd="grep -i nosuid /etc/fstab | grep -iq nfs"
> set -x
> $cmd
+ grep -i nosuid /etc/fstab '|' grep -iq nfs
As you can see, your pipe | is passed as an argument to the first grep command.

How to set a bash variable in a compound xargs statement

I am looking for a way to set a variable in the statements passed to xargs. The value is to be manipulated in one of the commands. Using a file or another utility is an option, but I am not sure why the bash variable set in the sequence always comes up empty.
$ ls c*txt
codebase.txt consoleText.txt
$ ls c*txt | xargs -i bash -c "echo processing {}; v1={} && echo ${v1/txt/file}"
codebase.txt consoleText.txt
processing codebase.txt
processing consoleText.txt
The example above distills the question to the basics. I was expecting the behavior to be something like this but inline:
$ fname=codebase.txt; echo ${fname/txt/file}
codebase.file
Thank you.
This line is resolving ${v1/txt/file} to a value before the command is executed:
$ ls c*txt | xargs -i bash -c "echo processing {}; v1={} && echo ${v1/txt/file}"
And that means the bash -c doesn't even see ${v1/txt/file}.
In this line, the single quotes inhibit the variable substitution, so echo processing {}; v1={} && echo ${v1/txt/file} is actually passed to bash -c as a parameter:
$ ls c*txt | xargs -i bash -c 'echo processing {}; v1={} && echo ${v1/txt/file}'
You could accomplish the same thing by escaping the dollar sign:
$ ls c*txt | xargs -i bash -c "echo processing {}; v1={} && echo \${v1/txt/file}"

Bash function - second parameter in a function not taken

For some reason I cannot pass the 2nd parameter to a function which is in another file, specifically here:
$lsValidLocal | xargs -n 1 -I {} bash -c 'Push "{}" "$inFolder"'
The Push function in functions.sh does not receive the 2nd parameter, $inFolder.
I tried several different ways; the only one working so far is exporting the variable to make it globally accessible (not a good solution, though).
script.sh
#!/bin/bash
#other machine
export otherachine="IP_address_otherachine"
#folders
inFolder="$HOME/folderIn"
outFolder="$HOME/folderOut"
#loading functions.sh
. /home/ec2-user/functions.sh
export lsValidLocal="lsValid $inFolder"
echo $inFolder
#execution
$lsValidLocal | xargs -n 1 -I {} bash -c 'Push "{}" "$inFolder"'
functions.sh
function Push() {
    local FILE=$1
    local DEST=$2
    scp $FILE $otherachine:$DEST &&
        rm $FILE ${FILE}_0 &&
        ssh $otherachine "touch ${FILE}_0"
}
function lsValid() { #from directory
    local DIR=$1
    ls $DIR/*_0 | sed 's/.\{2\}$//'
}
export -f Push
export -f Pull
export -f lsValid
The problem with the code you have written is that $inFolder is inside single quotes ('), which prevents it from being expanded:
$lsValidLocal | xargs -n 1 -I {} bash -c 'Push "{}" "$inFolder"'
This will be executed as three separate layers of processes:
bash <your script>
  |
  \ xargs ...
      |
      \ bash -c Push ...
Your code is not transferring the value from the outer shell to the inner shell, yet you are expanding the variable inFolder in the inner shell. As you correctly point out, it can be done with an exported environment variable.
The alternative is to have the outer shell expand it before passing to xargs.
$lsValidLocal | xargs -n 1 -I {} bash -c "Push '{}' '**$inFolder**'"
Notice I have reversed ' and " to allow $inFolder to be expanded before xargs is called.
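An alternative sketch (not from the original answer) is to hand the values to the inner shell as positional parameters, so nothing has to be exported or expanded inside the quoted string:
$lsValidLocal | xargs -n 1 -I {} bash -c 'Push "$1" "$2"' _ {} "$inFolder"
Here the outer shell expands "$inFolder" as an ordinary argument, xargs substitutes the file name for {}, and the inner bash receives them as $1 and $2 (the _ fills $0).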

Resources