Bash alias with an argument that works with pipe and globs

I have a convenience function in my bashrc file that looks like following:
function get_pattern()
{
grep "P1" $1 | grep -v "BUT_NOT_THIS" | awk -F":" '{print $(1)}' | sort -u
}
alias gp='get_pattern'
This works fine if I run it on an individual file, like gp file_1.c. However, I am unable to run it like find . -name "*.c" -type f | xargs gp. It also fails when run as gp *.c. How do I write get_pattern so that I can have all these conveniences?
NOTE: I have simplified the function for easier understanding. I'm not expecting clever grep/awk/sed/sort hacks or tweaks. The question is: I have an alias that takes filenames as arguments, and I want it to work with pipes, and preferably with globs.

As pointed out by the experts, aliases are not suited for your requirement.
In your function, you are only passing the first argument to grep, as in grep "P1" $1. Change it to use all arguments, this way:
function get_pattern() {
    grep "P1" "$@" | grep -v "BUT_NOT_THIS" | awk -F":" '{print $(1)}' | sort -u
}
Note:
When you invoke your function as get_pattern *.c and there are matching files, the function doesn't see *.c, it sees the list of matching files. The glob expansion is done by the shell when invoking the function, not inside the function.
In the present format, the function doesn't read from stdin. So, piping the results of another command into your function may not work. To make it accept stdin, you need to change the first grep to:
grep "P1" - "$#"
That would mess up the invocation when you intend the function to only read the files. So, it would be better to rewrite the function this way:
function get_pattern() {
    if (($# > 0)); then
        # arguments passed, use them as file names to grep from
        grep_args=("$@")
    else
        # no arguments passed, grep from stdin
        grep_args=(-)
    fi
    grep "P1" "${grep_args[@]}" | grep -v "BUT_NOT_THIS" | awk -F":" '{print $(1)}' | sort -u
}
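With that change, the function covers the direct and glob invocations, and piped input too (a quick demonstration, reusing the filenames from the question):
gp file_1.c                  # single file
gp *.c                       # the shell expands the glob into a file list
cat file_1.c file_2.c | gp   # no arguments, so the function reads stdin
Note that find . -name "*.c" -type f | xargs gp still fails, because xargs cannot see shell functions or aliases; the next answer addresses that.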

I'm just going to focus on the issue of piping data to xargs. That doesn't work because xargs doesn't know anything about the alias. However, you can effectively pass the function definition in to make it work with export -f (a non-portable solution which works in bash and possibly some other shells, but don't expect it to work everywhere):
$ foo() { echo foo: $@; }
$ echo bar baz | xargs bash -c 'foo $@' _
_: foo: command not found
$ export -f foo
$ echo bar baz | xargs bash -c 'foo $@' _
foo: bar baz
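Applied to the function from the question, that might look like the following sketch (-print0 and -0 are used so filenames with spaces survive the trip through xargs):
export -f get_pattern
find . -name "*.c" -type f -print0 | xargs -0 bash -c 'get_pattern "$@"' _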
I am not aware of any way to do this with an alias, but I'm also not aware of any reason to ever use an alias. Stop using aliases.

How to properly pass filenames with spaces with $* from xargs to sed via sh?

Disclaimer: this happens on macOS (Big Sur); more info about the context below.
I have to write (and almost did) a script which will replace image URLs in big text (XML) files with their Base64-encoded values.
The script should run the same way with single filenames or patterns, or both, e.g.:
./replace-encode single.xml
./replace-encode pattern*.xml
./replace-encode single.xml pattern*.xml
./replace-encode folder/*.xml
Note: it should properly handle files\ with\ spaces.xml
So I ended up with this script:
#!/bin/bash
#needed for `ls` command
IFS=$'\n'
ls -1 $* | xargs -I % sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' % | xargs -tI % sh -c 'sed -i "" "s#%#`curl -s % | base64`#" $0' "$*"
What it does: ls all the files, pipe the list to xargs, then search for all URLs surrounded by anchors (hence the > and < in the search expression; I also had to use sed because grep is limited on macOS), then pipe again to an sh script which runs the sed search & replace, where the replacement is the big Base64 string.
This works perfectly fine... but only for fileswithoutspaces.xml
I tried to play with $0 vs $1, $* vs $@, with or without ", but to no avail.
I don't understand exactly how variable substitution (is that what it's called? I'm not a native English speaker, and above all not a script writer at all, just a Java dev all day long...) works between xargs, sh, or even bash with arguments like filenames.
The xargs -t is there to let me check how the substitution works, and that's how I noticed that using a pattern worked, but I have to keep the " around the last $*, otherwise only the 1st file is searched & replaced; the output looks like:
user@host % ./replace-encode pattern*.xml
sh -c sed -i "" "s#https://www.some.com/public/123456.jpg#`curl -s https://www.some.com/public/123456.jpg | base64`#" $0 pattern_123.xml
pattern_456.xml
Both pattern_123.xml and pattern_456.xml are handled here; with $* instead of "$*" at the end of the command, only pattern_123.xml is handled.
So is there a simple way to "fix" this?
Thank you.
Note: macOS commands have some limitations (I know), but as this script is intended for non-technical users, I can't ask them to install (or have the IT team install on their behalf) alternate GNU versions, e.g. pcregrep or ggrep, like I've read suggested many times...
Also: I don't intend to change from xargs to for loops or such because 1/ I don't have the time, and 2/ I might want to optimize the 2nd step, where some URLs might be duplicated.
There's no reason for your software to use ls or xargs, and certainly not $*.
./replace-encode single.xml
./replace-encode pattern*.xml
./replace-encode single.xml pattern*.xml
./replace-encode folder/*.xml
...will all work fine with:
#!/usr/bin/env bash
while IFS= read -r line; do
    replacement=$(curl -s "$line" | base64)
    in="$line" out="$replacement" perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' "$@"
done < <(sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$@" | sort | uniq)
Finally ended up with this single-line script:
sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$@" | xargs -I% sh -c 'sed -i "" "s#%#`curl -s % | base64`#" "$@"' _ "$@"
which does properly support filenames with or without spaces.

How to make a script to make multiple grep's over a file?

I want to make a script that can do the following automatically:
grep 'string1' file.txt | grep 'string2' | grep 'string3' ... | grep 'stringN'
The idea is that the script can be run like this:
myScript.sh file.txt string1 string2 string3 ... stringN
and the script has to return all the lines of file.txt that contain all the strings.
For instance, if file.txt looks like this:
hello world
hello world run
hello planet world
And I can make a grep like this:
grep hello file.txt | grep world
and I get:
hello world
hello world run
hello planet world
I want to make a script that makes this automatically, with an undefined number of strings as parameters.
I found that it is hard to achieve this, since the number of strings can be variable. First, I tried to create an array called args like this in myScript.sh:
#!/bin/bash
args=("$#")
with the purpose of storing the arguments. I know that ${args[0]} is going to be my file.txt and the rest are the strings that I need to use in the distinct greps, but I don't know how to proceed, or whether this is the best approach to solve the problem. I would appreciate any suggestions on how to program this.
sed is capable of doing this perfectly with a single process, and avoids these eval shenanigans. The resulting script is actually quite simple.
#!/bin/sh
file=$1
shift
printf '\\?%s?!d\n' "$@" |
sed -f - "$file"
We generate a line of sed script for each expression; if the expression is not (!) found, we delete (d) this input line, and start over with the next one.
This assumes your sed accepts - as the argument to -f to read the script from standard input. This is not completely portable; you would perhaps need to store the generated script in a temporary file instead if this is a problem.
This uses ? as the internal regex separator. If you need a literal ? in one of the patterns, you will need to backslash-escape it. In the general case, creating a script which finds an alternative separator which is in none of the search expressions would perhaps be possible, but at that point, I'd move to a proper scripting language (Python would be my preference) instead.
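If your sed rejects -f -, the temporary-file variant might look like this (an untested sketch):
#!/bin/sh
file=$1
shift
script=$(mktemp) || exit 1
trap 'rm -f "$script"' EXIT
printf '\\?%s?!d\n' "$@" > "$script"
sed -f "$script" "$file"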
You can generate the pattern of operation and save it in a variable:
pattern="$(printf 'grep %s file.txt' "$1"; printf ' | grep %s' "${#:2}" ; printf '\n')"
and then
eval "$pattern"
Example:
% cat file.txt
foo bar
bar spam
egg
% grep_gen () { pattern="$(printf 'grep %s file.txt' "$1"; printf ' | grep %s' "${@:2}" ; printf '\n')"; eval "$pattern" ;}
% grep_gen foo bar
foo bar
You can create the command in a loop and then use eval to evaluate it.
This uses cat so that all the greps can be chained uniformly.
#! /bin/bash
file="$1"
shift
args=( "$@" )
cmd="cat '$file'"
for a in "${args[@]}"
do
    cmd+=' | '
    cmd+="grep '$a'"
done
eval "$cmd"
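For example, myScript.sh file.txt hello world builds and evals the string:
cat 'file.txt' | grep 'hello' | grep 'world'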
An eval-free alternative:
#!/bin/bash
temp1="$(mktemp)"
temp2="$(mktemp)"
grep "$2" "$1" > "$temp1"
for arg in "${@:3}"; do
    grep "$arg" "$temp1" > "$temp2"
    mv "$temp2" "$temp1"
done
cat "$temp1"
rm "$temp1"
mktemp generates a temporary file with a unique name and returns its name; it should be widely available.
The loop then executes grep for each argument and renames the second temp file for the next loop.
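With the example file from the question (assuming the script is saved as myScript.sh), this behaves the same as the chained greps:
$ ./myScript.sh file.txt hello world
hello world
hello world run
hello planet world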
This is an optimization of Diego Torres Milano's code, and the answer to my original question:
#! /bin/bash
file=$1
shift
cmd="cat '$file'"
for a in "$@"
do
    cmd+=" | grep '$a'"
done
eval "$cmd"

Bash code error unexpected syntax error

I am not sure why I am getting the unexpected syntax error near '(':
#!/bin/bash
DirBogoDict=$1
BogoFilter=/home/nikhilkulkarni/Downloads/bogofilter-1.2.4/src/bogofilter
echo "spam.."
for i in 'cat full/index |fgrep spam |awk -F"/" '{if(NR>1000)print$2"/"$3}'|head -500'
do
cat $i |$BogoFilter -d $DirBogoDict -M -k 1024 -v
done
echo "ham.."
for i in 'cat full/index | fgrep ham | awk -F"/" '{if(NR>1000)print$2"/"$3}'|head -500'
do
cat $i |$BogoFilter -d $DirBogoDict -M -k 1024 -v
done
Error:
./score.bash: line 7: syntax error near unexpected token `('
./score.bash: line 7: `for i in 'cat full/index |fgrep spam |awk -F"/" '{if(NR>1000)print$2"/"$3}'|head -500''
Uh, because you have massive syntax errors.
The immediate problem is that you have an unpaired single quote before the cat, which exposes the Awk script to the shell, and the shell of course cannot parse it as shell script code.
Presumably you want to use backticks instead of single quotes, although you should actually not read input with for.
With a fair bit of refactoring, you might want something like
for type in spam ham; do
    awk -F"/" -v type="$type" '$0 ~ type && NR>1000 && i++<500 {
        print $2"/"$3 }' full/index |
        xargs $BogoFilter -d $DirBogoDict -M -k 1024 -v
done
This refactors the useless cat | grep | awk | head into a single Awk script, and avoids the silly loop over each output line. I assume bogofilter can read file name arguments; if not, you will need to refactor the xargs slightly. If you can pipe all the files in one go, try
... xargs cat | $BogoFilter -d $DirBogoDict -M -k 1024 -v
or if you really need to pass in one at a time, maybe
... xargs sh -c 'for f; do $BogoFilter -d $DirBogoDict -M -k 1024 -v <"$f"; done' _
... in which case you will need to export the variables BogoFilter and DirBogoDict to expose them to the subshell (or just inline their values; why do you need them to be variables in the first place? Putting command names in variables is particularly weird; just update your PATH and then simply use the command's name).
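For instance, a minimal sketch of the export variant, reusing the loop above:
export BogoFilter DirBogoDict
... xargs sh -c 'for f; do "$BogoFilter" -d "$DirBogoDict" -M -k 1024 -v <"$f"; done' _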
In general, if you find yourself typing the same commands more than once, you should think about how to avoid that. This is called the DRY principle.
The syntax error is due to bad quoting. The expression whose output you want to loop over should be in command substitution syntax ($(...) or backticks), not single quotes.
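To see the difference (a minimal illustration):
for i in 'echo a b'; do echo "$i"; done    # loops once, over the literal string
for i in $(echo a b); do echo "$i"; done   # loops over a and b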

how to read a value from filename and insert/replace it in the file?

I have to run many python scripts which differ in just one parameter. I name them runv1.py, runv2.py, ..., runv20.py. I have the original script, say runv1.py. Then I make all the copies that I need by
cat runv1.py | tee runv{2..20..1}.py
So I have runv1.py, ..., runv20.py. But the parameter is still v=1 in all of them.
Q: how can I also replace the v parameter so it is read from the file name? So that, e.g., in runv4.py, v=4. I would like to know if there is a one-line shell command or combination of commands. Thank you!
PS: directly editing each file is not a proper solution when there are too many files.
The below for loop will serve your purpose, I think:
for i in `ls | grep "runv[0-9][0-9]*\.py"`
do
    l=`echo $i | tr -d '[a-z.]'`
    sed -i 's/v=[0-9][0-9]*/v='"$l"'/' "runv$l.py"
done
The below command passes the parameter, extracted from the filename itself, to each script:
ls | grep "runv[0-9][0-9]*\.py" | tr -d '[a-z.]' | awk '{print "./runv"$0".py "$0}' | xargs -L 1 sh
At the end, instead of sh you can use python or bash or ksh.
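For illustration, if runv2.py and runv3.py exist, the pipeline hands xargs lines like the following, which xargs -L 1 sh then runs one at a time:
./runv2.py 2
./runv3.py 3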

how to grep multiple variables in bash

I need to grep multiple strings, but I don't know the exact number of strings.
My code is:
s2=( $(echo $1 | awk -F"," '{ for (i=1; i<=NF ; i++) {print $i} }') )
for pattern in "${s2[@]}"; do
    ssh -q host tail -f /some/path |
        grep -w -i --line-buffered "$pattern" > some_file 2>/dev/null &
done
Now, the code is not doing what it's supposed to do. For example, if I run ./script s1,s2,s3,s4,..... it prints all lines that contain any of s1,s2,s3....
The script is supposed to do something like grep "$s1" | grep "$s2" | grep "$s3" ..., i.e. keep only lines that match all of them.
grep doesn't have an option to match all of a set of patterns. So the best solution is to use another tool, such as awk (or your choice of scripting languages, but awk will work fine).
Note, however, that awk and grep have subtly different regular expression implementations. It's not clear from the question whether the target strings are literal strings or regular expression patterns, and if the latter, what the expectations are. However, since the argument comes delimited with commas, I'm assuming that the pieces are simple strings and should not be interpreted as patterns.
If you want the strings to be interpreted as patterns, you can change index to match in the following little program:
ssh -q host tail -f /some/path |
awk -v STRINGS="$1" -v IGNORECASE=1 \
    'BEGIN{split(STRINGS,strings,/,/)}
     {for(i in strings)if(!index($0,strings[i]))next}
     {print;fflush()}'
Note:
IGNORECASE is only available in gnu awk; in (most) other implementations, it will do nothing. It seems that this is what you want, based on the fact that you used -i in your grep invocation.
fflush() is also an extension, although it works with both gawk and mawk. In Posix awk, fflush requires an argument; if you were using Posix awk, you'd be better off printing to stderr.
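Saved as a script, it would be invoked like the original, with the comma-separated strings as the first argument (hypothetical name):
./myScript.sh "s1,s2,s3" > some_file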
You can use extended grep:
egrep "$s1|$s2|$s3" fileName
If you don't know how many patterns you need to grep for, but you have all of them in an array called s, you can use
egrep $(sed 's/ /|/g' <<< "${s[@]}") fileName
This creates a herestring with all the elements of the array; sed replaces bash's field separator (a space) with |, and if we feed that to egrep, we grep for the strings in the array s.
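An alternative is to join the array with | yourself, using IFS in a subshell rather than sed (a sketch):
pattern=$(IFS='|'; echo "${s[*]}")
egrep "$pattern" fileName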
test.sh:
#!/bin/bash -x
a=" $#"
grep ${a// / -e } .bashrc
it works this way:
$ ./test.sh 1 2 3
+ a=' 1 2 3'
+ grep -e 1 -e 2 -e 3 .bashrc
(here is lots of text that matches the arguments)
