bash functions with loops and pipes

I have a bash script that pipes the contents of a file into a series of user-defined functions, each of which performs a sed operation on stdin, sending output to stdout.
For example:
#!/bin/bash
MOD_config ()
{
sed 's/config/XYZ/g'
}
MOD_ABC ()
{
sed 's/ABC/WHOA!/g'
}
cat demo.txt \
| MOD_config \
| MOD_ABC
So far so good. Everything is working great.
Now I want to allow additional pairs of pattern changes specified via the script's command line. For example, I'd like to allow the user to run any of these:
demo.sh # changes per script (MOD_config and MOD_ABC)
demo.sh CDE 345 # change all 'CDE' to '345'
demo.sh CDE 345 AAAAA abababa # .. also changes 'AAAAA' to 'abababa'
so I tried adding this to the script:
USER_MODS ()
{
if [ $# -lt 1 ]; then
#just echo stdin to stdout if no cmd line args exist
grep .
else
STARTING_ARGC=$#
I=1
while [ $I -lt $STARTING_ARGC ]; do
sed "s/$1/$2/g"
shift
shift
I=`expr $I + 2`
done
fi
}
cat demo.txt \
| MOD_config \
| MOD_ABC \
| USER_MODS
This approach works only if I have no command line args, or if I have only two. However, adding additional args on the command line has no effect.
Not sure exactly how to send stdout of one iteration of the while loop to the stdin of the next iteration. I think that's the crux of my problem.
Is there a fix for this? Or should I take a different approach altogether?

To have a dynamic list of pipes, you'll want a recursive solution. Have a function which applies one set of modifications and then calls itself with two fewer arguments. If the function has no arguments then simply call cat to copy stdin to stdout unchanged.
mod() {
if (($# >= 2)); then
search=$1
replace=$2
shift 2
sed "s/$search/$replace/g" | mod "$#"
else
cat
fi
}
# Apply two base modifications, plus any additional user mods ("$@")
mod config XYZ ABC 'WHOA!' "$@"
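To drop this into the question's demo.sh (with demo.txt as the input, as in the original script), the call would be wired up like this:
cat demo.txt | mod config XYZ ABC 'WHOA!' "$@"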

A remark: with more than 2 arguments, your later seds do run, but only after the first one has already consumed all the input, so they have nothing left to read. Instead you want to build up a chain of sed commands.
#!/bin/bash
mod_config() { sed 's/config/XYZ/g'; }
mod_abc() { sed 's/ABC/WHOA!/g'; }
user_mods() {
local IFS=';'
local sed_subs=()
while (($#>=2)); do
sed_subs+=( "s/$1/$2/g" )
shift 2
done
# at this point you have an array of sed s commands (maybe empty!).
# Just join them with a semicolon using the IFS already set
sed "${sed_subs[*]}"
}
cat demo.txt \
| mod_config \
| mod_abc \
| user_mods "$#" # <--- don't forget to pass the arguments to your function
And pray that your users aren't going to input stuff that will confuse sed, e.g., a slash!
(And sorry, I lowercased all your variables. Uppercases are sooooo ugly).
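If you do need to defend against that, one possible approach (an untested sketch of mine, not part of the answer above) is to escape the characters that are special in the search pattern and in the replacement before building the s commands:
# escape BRE metacharacters and the / delimiter in the search pattern
escape_pattern() { printf '%s\n' "$1" | sed 's/[][\.*^$\/]/\\&/g'; }
# escape backslash, & and the / delimiter in the replacement text
escape_replacement() { printf '%s\n' "$1" | sed 's/[\/&]/\\&/g'; }
# then, inside user_mods, build each command from the escaped pieces:
#   sed_subs+=( "s/$(escape_pattern "$1")/$(escape_replacement "$2")/g" )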

Try this; it uses a recursive call to go down the list of replacement pairs, calling USER_MODS again for each pair.
#!/bin/bash
MOD_config ()
{
sed 's/config/XYZ/g'
}
MOD_ABC ()
{
sed 's/ABC/WHOA!/g'
}
USER_MODS ()
{
if [ $# -lt 1 ]; then
#just echo stdin to stdout if no args exist
grep .
else
# grab the next two arguments
arg1=$1
arg2=$2
# remove them from the argument list
shift
shift
# do the replacement for these two and recursively pipe to the function with
# the new argument list
sed "s/$arg1/$arg2/g" | USER_MODS "$@"
fi
}
cat demo.txt \
| MOD_config \
| MOD_ABC \
| USER_MODS "$@"
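For illustration (my expansion, not part of the answer): a call like demo.sh CDE 345 AAAAA abababa effectively unrolls into this pipeline (note that the final grep . also drops empty lines):
cat demo.txt | sed 's/config/XYZ/g' | sed 's/ABC/WHOA!/g' | sed 's/CDE/345/g' | sed 's/AAAAA/abababa/g' | grep .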

Related

Create a chained command line in bash function

I have a question: I would like to create a function that (depending on the number of entered arguments) would create a so-called "chained" command line. The current code I wrote looks as follows:
function ignore {
if [ -n "$#" ] && [ "$#" > 0 ]; then
count=$#
if [ ${count} -eq 1 ]; then
return "grep -iv $1"
else
for args in "$#" do
## Here should be code that would put (using pseudo code) as many "grep -iv $a | grep -iv $(a+1) | ... | grep -iv $(a+n)", where the part "$(a+..) represent the placeholder of next argument"
done
fi
fi
}
Any ideas? Thanks
Update
I would like to clarify the above. The function would be used as follows:
some_bash_function | ignore
example:
apt-cache search apache2 | ignore doc lib
Maybe this will help a bit more.
This seems horribly inefficient. A much better solution would look like grep -ive "${array[0]}" -e "${array[1]}" -e "${array[2]}" etc. Here's a simple way to build that.
# don't needlessly use Bash-only function declaration syntax
ignore () {
local args=()
local t
for t; do
args+=(-e "$t")
done
grep -iv "${args[#]}"
}
In the end, git status | ignore foo bar baz is not a lot simpler than git status | grep -ive foo -e bar -e baz so this function might not be worth these 116 bytes (spaces included). But hopefully at least this can work to demonstrate a way to build command lines programmatically. The use of arrays is important; there is no good way to preserve quoting of already quoted values if you smash everything into a single string.
A more sustainable solution still is to just combine everything into a single regex. You can do that with grep -iv 'foo\|bar\|baz' though personally, I would probably switch to the more expressive regex dialect of grep -E; grep -ivE 'foo|bar|baz'.
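For completeness, a minimal sketch of that single-regex variant (assuming the keywords are plain words with no ERE metacharacters in them):
ignore () {
  # join all the arguments with '|' into one alternation and filter once
  local IFS='|'
  grep -ivE "$*"
}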
If you really wanted to build a structure of pipes, I guess a recursive function would work.
# FIXME: slow and ugly, prefer the one above
ignore_slowly () {
if [[ $# -eq 1 ]]; then
grep -iv "$1"
else
local t=$1
shift
grep -iv "$t" | ignore_slowly "$#"
fi
}
But generally, you want to minimize the number of processes you create.
Though inefficient, what you want can be done like this:
#!/bin/bash
ignore () {
printf -v pipe 'grep -iv %q | ' "$@"
pipe=${pipe%???} # remove trailing ' | '
bash -c "$pipe"
}
ignore 'regex1' 'regex2' … 'regexn' < file

Passing args to defined bash functions through GNU parallel

Let me show you a snippet of my Bash script and how I try to run parallel:
parallel -a "$file" \
-k \
-j8 \
--block 100M \
--pipepart \
--bar \
--will-cite \
_fix_col_number {} | _unify_null_value {} >> "$OUTPUT_DIR/$new_filename"
So, I am basically trying to process each line in a file in parallel using Bash functions defined inside my script. However, I am not sure how to pass each line to my defined functions "_fix_col_number" and "_unify_null_value". Whatever I do, nothing gets passed to the functions.
I am exporting the functions like this in my script:
declare -x NUM_OF_COLUMNS
export -f _fix_col_number
export -f _add_tabs
export -f _unify_null_value
The mentioned functions are:
_unify_null_value()
{
_string=$(echo "$1" | perl -0777 -pe "s/(?<=\t)\.(?=\s)//g" | \
perl -0777 -pe "s/(?<=\t)NA(?=\s)//g" | \
perl -0777 -pe "s/(?<=\t)No Info(?=\s)//g")
echo "$_string"
}
_add_tabs()
{
_tabs=""
for (( c=1; c<=$1; c++ ))
do
_tabs="$_tabs\t"
done
echo -e "$_tabs"
}
_fix_col_number()
{
line_cols=$(echo "$1" | awk -F"\t" '{ print NF }')
if [[ $line_cols -gt $NUM_OF_COLUMNS ]]; then
new_line=$(echo "$1" | cut -f1-"$NUM_OF_COLUMNS")
echo -e "$new_line\n"
elif [[ $line_cols -lt $NUM_OF_COLUMNS ]]; then
missing_columns=$(( NUM_OF_COLUMNS - line_cols ))
new_line="${1//$'\n'/}$(_add_tabs $missing_columns)"
echo -e "$new_line\n"
else
echo -e "$1"
fi
}
I tried removing {} from parallel. Not really sure what I am doing wrong.
I see two problems in the invocation plus additional problems with the functions:
With --pipepart there are no arguments. The blocks read from -a file are passed over stdin to your functions. Try the following commands to confirm this:
seq 9 > file
parallel -a file --pipepart echo
parallel -a file --pipepart cat
Theoretically, you could read stdin into a variable and pass that variable to your functions, ...
parallel -a file --pipepart 'b=$(cat); someFunction "$b"'
... but I wouldn't recommend it, especially since your blocks are 100MB each.
Bash interprets the pipe | in your command before parallel even sees it. To run a pipe, quote the entire command:
parallel ... 'b=$(cat); _fix_col_number "$b" | _unify_null_value "$b"' >> ...
_fix_col_number seems to assume its argument to be a single line, but receives 100MB blocks instead.
_unify_null_value does not read stdin, so _fix_col_number {} | _unify_null_value {} is equivalent to _unify_null_value {}.
That being said, your functions can be drastically improved. They start a lot of processes, which becomes incredibly expensive for larger files. You can do some trivial improvements like combining perl ... | perl ... | perl ... into a single perl. Likewise, instead of storing everything in variables, you can process stdin directly: Just use f() { cmd1 | cmd2; } instead of f() { var=$(echo "$1" | cmd1); var=$(echo "$var" | cmd2); echo "$var"; }.
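For illustration, here is roughly what that shape would look like for _unify_null_value, folding the three perl passes into one filter that reads stdin directly (the combined alternation is my assumption, built from the three patterns in the question):
_unify_null_value() {
  # one perl pass over stdin instead of three passes over a copied variable
  perl -0777 -pe 's/(?<=\t)(\.|NA|No Info)(?=\s)//g'
}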
However, don't waste time on small things like these. A complete rewrite in sed, awk, or perl is easy and should outperform every optimization on the existing functions.
Try
n="INSERT NUMBER OF COLUMNS HERE"
tabs=$(perl -e "print \"\t\" x $n")
perl -pe "s/\r?\$/$tabs/; s/\t\K(\.|NA|No Info)(?=\s)//g;" file |
cut -f "1-$n"
If you still find this too slow, leave out file; pack the command into a function, export that function and then call parallel -a file -k --pipepart nameOfTheFunction. The option --block is not necessary as pipepart will evenly split the input based on the number of jobs (can be specified with -j).
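A sketch of that wiring (the helper name fix_file is mine; NUM_OF_COLUMNS, $file, $OUTPUT_DIR and $new_filename are taken from the question):
fix_file() {
  # runs once per --pipepart block, reading the whole block on stdin
  local n="$NUM_OF_COLUMNS"
  local tabs
  tabs=$(perl -e "print \"\t\" x $n")
  perl -pe "s/\r?\$/$tabs/; s/\t\K(\.|NA|No Info)(?=\s)//g;" | cut -f "1-$n"
}
export -f fix_file
export NUM_OF_COLUMNS
parallel -a "$file" -k --pipepart fix_file >> "$OUTPUT_DIR/$new_filename"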

How can I pass input to output in bash?

I am trying to streamline a README so that I can easily pass commands and their outputs into a document. This step seems harder than I thought it would be.
I am trying to pass the input and output to a file, but everything I try just displays either echo test or test.
The latest iteration, which is becoming absurd is:
echo test | xargs echo '#' | cat <(echo) <(cat -) just shows # test
I would like the results to be:
echo test
# test
You can make a bash function to demonstrate a command and its output like this:
democommand() {
printf '#'
printf ' %q' "$#"
printf '\n'
"$#"
}
This prints "#", then each argument the function was passed (i.e. the command and its arguments) with a space before each one (and the %q makes it quote/escape them as needed), then a newline, and then finally it runs all of its arguments as a command. Here's an example:
$ democommand echo test
# echo test
$ democommand ls
# ls
Desktop Downloads Movies Pictures Sites
Documents Library Music Public
Now, as for why your command didn't work... well, I'm not clear what you thought it was doing, but here's what it's actually doing:
The first command in the pipeline, echo test, simply prints the string "test" to its standard output, which is piped to the next command in the chain.
xargs echo '#' takes its input ("test") and adds it to the command it's given (echo '#') as additional arguments. Essentially, it executes the command echo '#' test. This outputs "# test" to the next command in the chain.
cat <(echo) <(cat -) is rather complicated, so let's break it down:
echo prints a blank line
cat - simply copies its input (which at this point in the pipeline is still coming from the output of the xargs command, i.e. "# test").
cat <(echo) <(cat -) takes the output of those two <() commands and concatenates them together, resulting in a blank line followed by "# test".
Pass the command as a literal string so that you can both print and evaluate it:
doc() { printf '$ %s\n%s\n' "$1" "$(eval "$1")"; }
Running:
doc 'echo foo | tr f c' > myfile
Will make myfile contain:
$ echo foo | tr f c
coo

Read a file and replace ${1}, ${2}... value with string

I have a file template.txt and its content is below:
param1=${1}
param2=${2}
param3=${3}
I want to replace the ${1}, ${2}, ${3}...${n} strings with elements of the scriptParams variable.
The code below only replaces the first line.
scriptParams="test1,test2,test3"
cat template.txt | for param in ${scriptParams} ; do i=$((++i)) ; sed -e "s/\${$i}/$param/" ; done
RESULT:
param1=test1
param2=${2}
param3=${3}
EXPECTED:
param1=test1
param2=test2
param3=test3
Note: I don't want to save the replaced file, I want to use its replaced value.
If you intend to use an array, use a real array. sed is not needed either:
$ cat template
param1=${1}
param2=${2}
param3=${3}
$ scriptParams=("test one" "test two" "test three")
$ while read -r l; do for((i=1;i<=${#scriptParams[@]};i++)); do l=${l//\$\{$i\}/${scriptParams[i-1]}}; done; echo "$l"; done < template
param1=test one
param2=test two
param3=test three
Learn to debug:
cat template.txt | for param in ${scriptParams} ; do i=$((++i)) ; echo $i - $param; done
1 - test1,test2,test3
Oops..
scriptParams="test1 test2 test3"
cat template.txt | for param in ${scriptParams} ; do i=$((++i)) ; echo $i - $param; done
1 - test1
2 - test2
3 - test3
Ok, looks better...
cat template.txt | for param in ${scriptParams} ; do i=$((++i)) ; sed -e "s/\${$i}/$param/" ; done
param1=test1
param2=${2}
param3=${3}
Ooops... so what's the problem? Well, the first sed command "eats" all the input. You haven't built a pipeline where one sed command feeds the next... You have three seds trying to read the same input. Obviously the first one processed the whole input.
Ok, let's take a different approach, let's create the arguments for a single sed command (note: the "" is there to force echo not to interpret -e as a command-line switch).
sedargs=$(for param in ${scriptParams} ; do i=$((++i)); echo "" -e "s/\${$i}/$param/"; done)
cat template.txt | sed $sedargs
param1=test1
param2=test2
param3=test3
That's it. Note that this isn't perfect: you can have all sorts of problems if the replacement texts are complex (e.g. contain spaces).
Let me think how to do this in a better way... (well, the obvious solution which comes to mind is not to use a shell script for this task...)
Update:
If you want to build a proper pipeline, here are some solutions: How to make a pipe loop in bash
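As a follow-up to the space problem noted above (my addition, not part of the original answer): keeping the sed arguments in a bash array preserves the quoting, though it still assumes the replacement text contains no / or &:
scriptParams=("test one" "test two" "test three")
sedargs=()
i=0
for param in "${scriptParams[@]}"; do
  i=$((i + 1))
  sedargs+=( -e "s/\${$i}/$param/" )
done
sed "${sedargs[@]}" template.txt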
You can do that with just bash alone:
#!/bin/bash
scriptParams=("test1" "test2" "test3") ## Better store it as arrays.
while read -r line; do
for i in "${!scriptParams[@]}"; do ## Indices of array scriptParams would be populated to i starting at 0.
line=${line/"\${$((i + 1))}"/"${scriptParams[i]}"} ## ${var/p/r} replaces patterns (p) with r in the contents of var. Here we also add 1 to the index to fit with the targets.
done
echo "<br>$line</br>"
done < template.txt
Save it in a script and run bash script.sh to get an output like this:
<br>param1=test1</br>
<br>param2=test2</br>
<br>param3=test3</br>

How can I expand arguments to a bash function into a chain of piped commands?

I often find myself doing something like this:
something | grep cat | grep bat | grep rat
when all I recall is that those three words must have occurred somewhere, in some order, in the output of something... Now, I could do something like this:
something | grep '.*cat.*bat.*rat.*'
but that implies ordering (bat appears after cat). As such, I was thinking of adding a bash function to my environment called mgrep which would turn:
mgrep cat bat rat
into
grep cat | grep bat | grep rat
but I'm not quite sure how to do it (or whether there is an alternative?). One idea would be to for loop over the parameters like so:
while (($#)); do
grep $1 some_thing > some_thing
shift
done
cat some_thing
where some_thing is possibly some fifo, like when one does >(cmd) in bash, but I'm not sure. How would one proceed?
I believe you could generate a pipeline one command at a time, by redirecting stdin at each step. But it's much simpler and cleaner to generate your pipeline as a string and execute it with eval, like this:
CMD="grep '$1' " # consume the first argument
shift
for arg in "$#" # Add the rest in a pipeline
do
CMD="$CMD | grep '$arg'"
done
eval $CMD
This will generate a pipeline of greps that always reads from standard input, as in your model. Note that it protects spaces in quoted arguments, so that it works correctly if you write:
mgrep 'the cat' 'the bat' 'the rat'
Thanks to Alexis, this is what I did:
function mgrep() #grep multiple keywords
{
CMD=''
while (($#)); do
CMD="$CMD grep \"$1\" | "
shift
done
eval ${CMD%| }
}
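For example (the output shown is simply what the generated chain of greps produces on this sample input):
$ printf '%s\n' 'the cat and the bat' 'only a cat' 'rat bat cat' | mgrep cat bat rat
rat bat cat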
You can write a recursive function; I'm not happy with the base case, but I can't think of a better one. It seems a waste to need to call cat just to pass standard input to standard output, and the while loop is a bit inelegant:
mgrep () {
local e=$1;
# shift && grep "$e" | mgrep "$@" || while read -r; do echo "$REPLY"; done
shift && grep "$e" | mgrep "$@" || cat
# Maybe?
# shift && grep "$e" | mgrep "$@" || echo "$(</dev/stdin)"
}
