Find lines containing all keywords in bash script - bash

Essentially, I would like something that behaves similarly to:
cat file | grep -i keyword1 | grep -i keyword2 | grep -i keyword3
How can I do this with a bash script that takes a variable-length list of keyword arguments? The script should do a case-insensitive match of lines containing all keywords.

Use this as a script
#! /bin/bash
awk -v IGNORECASE=1 -f <(
P=; for k; do [ -z "$P" ] && P="/$k/" || P="$P&&/$k/"; done
echo "$P{print}"
)
and invoke it as
script.sh keyword1 keyword2 keyword3 < file
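For illustration (hypothetical keywords foo, bar, baz), the process substitution above generates a gawk program like the one below; note that IGNORECASE is a GNU awk feature:
$ bash -c 'P=; for k; do [ -z "$P" ] && P="/$k/" || P="$P&&/$k/"; done; echo "$P{print}"' _ foo bar baz
/foo/&&/bar/&&/baz/{print}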

I don't know how efficient this is, and I think it's ugly (there may well be an existing utility for it), but:
#!/bin/bash
unset keywords matchlist
keywords=("$@")
for kw in "${keywords[@]}"; do
    matchlist="$matchlist /$kw/ &&"
done
matchlist="${matchlist% &&}"
# awk "$matchlist { print; }" < <(tr '[:upper:]' '[:lower:]' <file)
awk "$matchlist { print; }" file
And yes, it needs some robustness regarding special characters and stuff. It's just to show the idea.
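A sketch of a somewhat more robust take on the same idea, chaining one case-insensitive fixed-string grep per keyword so no awk pattern has to be built (the function name filter and the script name allmatch.sh are made up here; invoke it as allmatch.sh keyword1 keyword2 < file):
#!/bin/bash
# Recursively build the pipeline: each keyword adds one `grep -iF` stage.
filter() {
    if [ "$#" -eq 0 ]; then
        cat                       # no keywords left: pass everything through
    else
        local kw=$1; shift
        grep -iF -- "$kw" | filter "$@"
    fi
}
filter "$@"
Because -F treats the keywords as fixed strings, regex metacharacters in them are harmless.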

Give this a try:
shopt -s nocasematch
keywords="keyword1|keyword2|keyword3"
while read -r line; do [[ $line =~ $keywords ]] && echo "$line"; done < file
Edit:
Here's a version that tests for all keywords being present, not just any (the nocasematch setting from above still applies):
keywords=(keyword1 keyword2 keyword3)   # or keywords=("$@")
qty=${#keywords[@]}
while read -r line
do
    count=0
    for keyword in "${keywords[@]}"
    do
        [[ "$line" =~ $keyword ]] && (( count++ ))
    done
    if (( count == qty ))
    then
        echo "$line"
    fi
done < textlines

Found a way to do this with grep.
KEYWORDS="$@"
MATCH_EXPR="cat file"
for keyword in ${KEYWORDS};
do
MATCH_EXPR="${MATCH_EXPR} | grep -i ${keyword}"
done
eval ${MATCH_EXPR}

With bash 4.0+ you can use case with the ;;& fall-through operator (the flags are reset for each line so a match on one line cannot leak into the next):
shopt -s nocasematch
while read -r line
do
    f=0 g=0
    case "$line" in
        *keyword1*) f=1;;&
        *keyword2*) g=1;;&
        *keyword3*)
            [ "$f" -eq 1 ] && [ "$g" -eq 1 ] && echo "$line";;
    esac
done < "file"
shopt -u nocasematch
or gawk (IGNORECASE=1 keeps the match case-insensitive):
gawk -v IGNORECASE=1 '/keyword1/&&/keyword2/&&/keyword3/' file

I'd do it in Perl.
For finding all lines that contain at least one of them:
perl -ne'print if /(keyword1|keyword2|keyword3)/i' file
For finding all lines that contain all of them:
perl -ne'print if /keyword1/i && /keyword2/i && /keyword3/i' file
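A hedged variant of the same Perl idea that accepts a variable-length keyword list (the keywords are passed here as a single space-separated first argument; adjust the split if your keywords may contain spaces):
perl -ne 'BEGIN { @k = split " ", shift @ARGV }
          my $line = $_;
          print if !grep { $line !~ /\Q$_\E/i } @k' "keyword1 keyword2 keyword3" file
The \Q...\E quoting treats each keyword as a literal string, and /i keeps the match case-insensitive.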

Here is a script called search.sh in bash that will search lines within a file or folder for all keywords specified:
#!/bin/bash
if [ $# -lt 2 ]; then
    echo "[-] $0 file_to_search/folder_to_search keyword1 keyword2 keyword3 ..."
    exit
fi
all_args=("$@")
i=0
results="" # this will store the cumulative results from each keyword search
for arg in "${all_args[@]}"; do
    if [ $i -eq 0 ]; then
        # first argument is the file/folder to search
        file_to_search="$arg"
        i=$(($i + 1))
    elif [ $i -eq 1 ]; then
        # search the file/folder with the first keyword (first search)
        results=`grep --color=always -r -n -i "$arg" "$file_to_search"`
        i=$(($i + 1))
    else
        # now keep filtering the results from the first search with the other keywords
        results=`echo "$results" | grep --color=always -i "$arg"`
        i=$(($i + 1))
    fi
done
echo "$results"
An example invocation of the script above searches the 'tools.txt' file for the 'python' and 'jira' keywords:
./search.sh tools.txt python jira
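Since the first argument is special, a slightly tidier sketch of the same script uses shift instead of the index counter (this is only a restructuring of the idea above, not the original answer's code):
#!/bin/bash
if [ $# -lt 2 ]; then
    echo "[-] $0 file_to_search/folder_to_search keyword1 keyword2 keyword3 ..."
    exit
fi
file_to_search=$1; shift
# first keyword: search the file/folder itself
results=$(grep --color=always -r -n -i -- "$1" "$file_to_search"); shift
# remaining keywords: keep filtering the accumulated results
for kw in "$@"; do
    results=$(printf '%s\n' "$results" | grep --color=always -i -- "$kw")
done
printf '%s\n' "$results"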

Related

Expand shell glob in variable into array

In a bash script I have a variable containing a shell glob expression that I want to expand into an array of matching file names (nullglob turned on), like in
pat='dir/*.config'
files=($pat)
This works nicely, even for multiple patterns in $pat (e.g., pat="dir/*.config dir/*.conf"); however, I cannot use escape characters in the pattern. Ideally, I would like to be able to do
pat='"dir/*" dir/*.config "dir/file with spaces"'
to include the file *, all files ending in .config and file with spaces.
Is there an easy way to do this? (Without eval if possible.)
As the pattern is read from a file, I cannot place it in the array expression directly, as proposed in this answer (and various other places).
Edit:
To put things into context: What I am trying to do is to read a template file line-wise and process all lines like #include pattern. The includes are then resolved using the shell glob. As this tool is meant to be universal, I want to be able to include files with spaces and weird characters (like *).
The "main" loop reads like this:
template_include_pat='^#include (.*)$'
while IFS='' read -r line || [[ -n "$line" ]]; do
    if printf '%s' "$line" | grep -qE "$template_include_pat"; then
        glob=$(printf '%s' "$line" | sed -nrE "s/$template_include_pat/\\1/p")
        cwd=$(pwd -P)
        cd "$targetdir"
        files=($glob)
        for f in "${files[@]}"; do
            printf "\n\n%s\n" "# FILE $f" >> "$tempfile"
            cat "$f" >> "$tempfile" ||
                die "Cannot read '$f'."
        done
        cd "$cwd"
    else
        echo "$line" >> "$tempfile"
    fi
done < "$template"
Using the Python glob module:
#!/usr/bin/env bash
# Takes literal glob expressions as argv; emits a NUL-delimited match list on stdout
expand_globs() {
    python -c '
import sys, glob
for arg in sys.argv[1:]:
    for result in glob.iglob(arg):
        sys.stdout.write("%s\0" % (result,))
' _ "$@"
}
template_include_pat='^#include (.*)$'
template=${1:-/dev/stdin}
# record the patterns we were looking for
patterns=( )
while read -r line; do
    if [[ $line =~ $template_include_pat ]]; then
        patterns+=( "${BASH_REMATCH[1]}" )
    fi
done <"$template"
results=( )
while IFS= read -r -d '' name; do
    results+=( "$name" )
done < <(expand_globs "${patterns[@]}")
# Let's display our results:
{
    printf 'Searched for the following patterns, from template %q:\n' "$template"
    (( ${#patterns[@]} )) && printf ' - %q\n' "${patterns[@]}"
    echo
    echo "Found the following files:"
    (( ${#results[@]} )) && printf ' - %q\n' "${results[@]}"
} >&2
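For a quick test of the script above (assuming you saved it as expand.sh and a python interpreter is available as python; the file and directory names below are made up):
$ mkdir demo && cd demo
$ touch a.config b.config 'file with spaces'
$ printf '#include *.config\n#include file with spaces\n' > template.txt
$ bash ../expand.sh template.txt
This should list both #include patterns and all three matching files.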

How to browse a line from a file?

I have a file that contains 10 lines with this sort of content:
aaaa,bbb,132,a.g.n.
I want to walk through every line, character by character, and write the data that appears before each ',' to an output file.
if [ $# -eq 2 ] && [ -f $1 ]
then
    echo "Read nr of fields to be saved or nr of commas."
    read n
    nrLines=$(wc -l < $1)
    while $nrLines!="1" read -r line || [[ -n "$line" ]]; do
    do
        for (( i=1; i<=$n; ++i ))
        do
            while [ read -r -n1 temp ]
            do
                if [ temp != "," ]
                then
                    echo $temp > $(result$i)
                else
                fi
            done
            paste -d"\n" $2 $(result$i)
        done
        nrLines=$($nrLines-1)
    done
else
    echo "File not found!"
fi
}
Parameter $2 is an empty file in which I will store the data from file $1 after extracting it without the ',' and adding a couple of comments.
Example:
My input_file contains:
a.b.c.d,aabb,comp,dddd
My output_file is empty.
I call my script: ./script.sh input_file output_file
After execution the output_file contains:
First line info: a.b.c.d
Second line info: aabb
Third line info: comp
(yes, without the 4th line info)
You can do what you want very simply with parameter-expansion and substring-removal using bash alone. For example, take an example file:
$ cat dat/10lines.txt
aaaa,bbb,132,a.g.n.
aaaa,bbb,133,a.g.n.
aaaa,bbb,134,a.g.n.
aaaa,bbb,135,a.g.n.
aaaa,bbb,136,a.g.n.
aaaa,bbb,137,a.g.n.
aaaa,bbb,138,a.g.n.
aaaa,bbb,139,a.g.n.
aaaa,bbb,140,a.g.n.
aaaa,bbb,141,a.g.n.
A simple one-liner using native bash string handling could be the following, and gives these results:
$ while read -r line; do echo ${line%,*}; done <dat/10lines.txt
aaaa,bbb,132
aaaa,bbb,133
aaaa,bbb,134
aaaa,bbb,135
aaaa,bbb,136
aaaa,bbb,137
aaaa,bbb,138
aaaa,bbb,139
aaaa,bbb,140
aaaa,bbb,141
Parameter expansion with substring removal works as follows:
var=aaaa,bbb,132,a.g.n.
Beginning at the left and removing up to, and including, the first ',' is:
${var#*,} # bbb,132,a.g.n.
Beginning at the left and removing up to, and including, the last ',' is:
${var##*,} # a.g.n.
Beginning at the right and removing up to, and including, the first ',' is:
${var%,*} # aaaa,bbb,132
Beginning at the right and removing up to, and including, the last ',' is:
${var%%,*} # aaaa
Note: the text to remove above is represented with a wildcard '*', but wildcard use is not required. It can be any allowable text. For example, to only remove ,a.g.n. where the preceding number is 136, you can do the following:
${var%,136*},136 # aaaa,bbb,136 (when var contains ",136")
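Putting the expansions above together, a small sketch that produces the labelled output from the question (the ordinal labels are hard-coded for the four-field sample line):
labels=(First Second Third)
line='a.b.c.d,aabb,comp,dddd'
rest=$line
for lab in "${labels[@]}"; do
    printf '%s line info: %s\n' "$lab" "${rest%%,*}"   # field before the first ','
    rest=${rest#*,}                                    # drop that field and its ','
done
which prints:
First line info: a.b.c.d
Second line info: aabb
Third line info: comp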
To print the 2016th line from a file named file.txt, run a command like this:
sed -n '2016p' < file.txt
More examples:
sed -n '2p' < file.txt
will print the 2nd line,
sed -n '2011p' < file.txt
the 2011th line,
sed -n '10,33p' < file.txt
lines 10 up to 33,
sed -n '1p;3p' < file.txt
the 1st and 3rd lines,
and so on...
For more detail, please have a look in this tutorial and this answer.
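If sed is not at hand, awk can address lines by number just as easily; a couple of equivalents:
awk 'NR==2016' file.txt            # print line 2016
awk 'NR>=10 && NR<=33' file.txt    # print lines 10 through 33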
In native bash the following should do what you want, assuming you replace the contents of your script.sh with the below:
#!/bin/bash
IN_FILE=${1}
OUT_FILE=${2}
IFS=\,
while read -r line; do
    set -- ${line}
    for ((i=1; i<=${#}; i++)); do
        ((${i}==4)) && continue
        ((n+=1))
        printf '%s\n' "Line ${n} info: ${!i}"
    done
done < "${IN_FILE}" > "${OUT_FILE}"
This prints each field of each line on its own line in the output file, skipping the 4th field (I assume this is your requirement, as per your comment).
[wspace@wspace sandbox]$ awk -F"," 'BEGIN{OFS="\n"}{for(i=1; i<=NF-1; i++){print "line Info: "$i}}' data.txt
line Info: a.b.c.d
line Info: aabb
line Info: comp
This little snippet can ignore the last field.
updated:
#!/usr/bin/env bash
if [ ! -f "$1" -o $# -ne 2 ];then
    echo "Usage: $(basename $0) input_file out_file"
    exit 127
fi
input_file=$1
output_file=$2
: > $output_file
if [ "$(wc -l < $1)" -ne 0 ];then
    while true
    do
        read -r -n1 char
        if [ "$char" == "" ];then
            break
        elif [ $char != "," ];then
            temp=$temp$char
        else
            echo "line info: $temp" >> $output_file
            temp=""
        fi
    done < $input_file
else
    echo "file $1 is empty"
fi
Maybe this is what you want
Did you try
sed "s|,|\n|g" $1 | head -n -1 > $2
I assume that only the last word would not have a comma on its right.
Try this (tested with your sample line):
#!/bin/bash
# script.sh
echo "Number of fields to save ?"
read nf
while IFS=',' read -r -a arr; do
    newarr=("${arr[@]:0:${nf}}")
done < "$1"
for i in "${newarr[@]}"; do
    printf "%s\n" "$i"
done > "$2"
Execute script with :
$ ./script.sh inputfile outputfile
Number of fields ?
3
$ cat outputfile
a.b.c.d
aabb
comp
All comma-separated words of a line are stored in the array $arr.
A temporary array $newarr keeps only the first $nf elements ($nf comes from the read command).
The final loop walks over the new array and prints the result to $2, the output file.
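Note that the loop above only keeps the fields of the last input line; a sketch of a variant that handles every line (still assuming the fields themselves contain no newlines) moves the printing inside the read loop:
#!/bin/bash
read -p "Number of fields to save? " nf
while IFS=',' read -r -a arr; do
    printf '%s\n' "${arr[@]:0:nf}"    # print the first nf fields, one per line
done < "$1" > "$2"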

bash, adding string after a line

I'm trying to put together a bash script that will search a bunch of files and if it finds a particular string in a file, it will add a new line on the line after that string and then move on to the next file.
#! /bin/bash
echo "Creating variables"
SEARCHDIR=testfile
LINENUM=1
find $SEARCHDIR* -type f -name *.xml | while read i; do
    echo "Checking $i"
    ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
    if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
        echo "found $i"
        cat $i | while read LINE; do
            ((LINENUM=LINENUM+1))
            if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
                echo "editing $i"
                awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
            fi
        done
    fi
    LINENUM=1
done
The bit I'm having trouble with is:
awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
If I just use $i at the end, it outputs the content to the screen; if I use $i > $i it just erases the file; and if I use $i >> $i it gets stuck in a loop until the disk fills up.
Any suggestions?
Unfortunately awk doesn't have an in-place editing option similar to sed's -i, so you can write to a temp file and then move it over the original:
awk '{commands}' file > tmpfile && mv tmpfile file
or if you have GNU awk 4.1.0 or newer, the -i inplace is added, so you can do:
awk -i inplace '{commands}' file
to modify the original
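Putting that together with the original goal, a minimal sketch of the insert-after-match edit for one file "$i" (STRING_TO_SEARCH_FOR and the inserted text are taken from the question; plain awk is enough):
awk -v s="new line to insert" '
    { print }                          # keep the current line
    /STRING_TO_SEARCH_FOR/ { print s } # and add the new line right after a match
' "$i" > tmpfile && mv tmpfile "$i"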
#cat $i | while read LINE; do
# ((LINENUM=LINENUM+1))
# if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
# echo "editing $i"
# awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
# fi
# done
# replaced by
sed -i 's/STRING_TO_SEARCH_FOR/&\n/g' ${i}
or use awk in place of sed
also
# ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
# if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
#by
if [ $( grep -c 'STRING_TO_SEARCH_FOR' ${i} ) -gt 0 ]; then
# if the files are huge, running sed on them directly without the grep test is faster (but then there is no echo about finding the file)
If you can, maybe use a temporary file?
~$ awk ... $i > tmpfile
~$ mv tmpfile $i
Or simply awk ... $i > tmpfile && mv tmpfile $i
Note that you can use mktemp to create this temporary file.
Otherwise, with sed you can insert a line right after a match:
~$ cat f
auie
nrst
abcd
efgh
1234
~$ sed '/abcd/{a\
new_line
}' f
auie
nrst
abcd
new_line
efgh
1234
The command checks whether the line matches /abcd/; if so, it appends (a\) the line new_line.
And since sed has the -i option to edit in place, you can do:
if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
echo "found $i"
echo "editing $i"
sed -i "/STRING_TO_SEARCH_FOR/{a
\new line to insert
}" $i
fi

count number of lines from several files and store count from all text files into one variable using for loop

I want to count the number of lines in many text files and then store the counts in a variable to find the lowest number. I am trying to do this in a for loop, but it only stores the result from the last text file in the loop.
for txt in home/data/*.txt
do
count_txt=$(cat $txt | wc -l) | bc
done
Thanks
give this one-liner a try:
wc -l /path/*.txt|awk 'NR==1{m=$1}{m=($1*1)<m?($1*1):m}END{print m}'
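The same idea without awk, assuming GNU coreutils and that no file is literally named "total" (wc appends a total line when given several files):
wc -l /path/*.txt | grep -v ' total$' | sort -n | head -n 1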
shopt -s nullglob
FILES=(home/data/*.txt) LOWEST_COUNT='N/A' FILE=''
[[ ${#FILES[@]} -gt 0 ]] && read -r LOWEST_COUNT FILE < <(exec wc -l "${FILES[@]}" | sort -n)
echo "$LOWEST_COUNT | $FILE"
You just need something like this (using GNU awk for ENDFILE):
awk 'ENDFILE{min = (min != "" && min < FNR ? min : FNR)} END{print min}' home/data/*.txt
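If GNU awk (ENDFILE) is not available, a hedged sketch in plain awk records each file's last FNR when the next file begins, and once more at the end (it assumes none of the files are empty):
awk 'FNR==1 && NR>1 { if (min=="" || last<min) min=last }
     { last=FNR }
     END { if (min=="" || last<min) min=last; print min }' home/data/*.txt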
Update: according to EdMorton's comment, awk is the right tool for this kind of problem; the approach below isn't a final implementation and it fails for some filenames (such as filenames with spaces). In short, awk is more performant and reliable.
If you want to use a for loop, you can do something like this:
#!/bin/bash
MAX="0"
MIN="INIT"
for F in home/data/*.txt
do
    NBLINE=$(cat $F | wc -l)
    if [[ "$NBLINE" -gt "$MAX" ]] ; then
        MAX="$NBLINE"
        BIG_FILE="$F"
    fi
    if [[ "$MIN" == "INIT" ]] ; then
        MIN="$NBLINE"
        SMA_FILE="$F"
    fi
    if [[ "$NBLINE" -lt "$MIN" ]] ; then
        MIN="$NBLINE"
        SMA_FILE="$F"
    fi
done
echo "File = $BIG_FILE -- Lines = $MAX"
echo "File = $SMA_FILE -- Lines = $MIN"
exit

Bash: Native way to check if an entry is one line?

I have a find script that automatically opens a file if just one file is found. The way I currently handle it is doing a word count on the number of lines of the search results. Is there an easier way to do this?
if [ "$( cat "$temp" | wc -l | xargs echo )" == "1" ]; then
edit `cat "$temp"`
fi
EDITED - here is the context of the whole script.
term="$1"
temp=".aafind.txt"
find src sql common -iname "*$term*" | grep -v 'src/.*lib' >> "$temp"
if [ ! -s "$temp" ]; then
echo "ø - including lib..." 1>&2
find src sql common -iname "*$term*" >> "$temp"
fi
if [ "$( cat "$temp" | wc -l | xargs echo )" == "1" ]; then
# just open it in an editor
edit `cat "$temp"`
else
# format output
term_regex=`echo "$term" | sed "s%\*%[^/]*%g" | sed "s%\?%[^/]%g" `
cat "$temp" | sed -E 's%//+%/%' | grep --color -E -i "$term_regex|$"
fi
rm "$temp"
Unless I'm misunderstanding, the variable $temp contains one or more filenames, one per line, and if there is only one filename it should be edited?
[ $(wc -l <<< "$temp") = "1" ] && edit "$temp"
If $temp is a file containing filenames:
[ $(wc -l < "$temp") = "1" ] && edit "$(cat "$temp")"
Several of the results here will read through an entire file, whereas one can stop and have an answer after one line and one character:
if { IFS='' read -r result && ! read -n 1 _; } <file; then
echo "Exactly one line: $result"
else
echo "Either no valid content at all, or more than one line"
fi
For safely reading from find, if you have GNU find and bash as your shell, replace <file with < <(find ...) in the above. Even better, in that case, is to use NUL-delimited names, such that filenames with newlines (yes, they're legal) don't trip you up:
if { IFS='' read -r -d '' result && ! read -r -d '' -n 1 _; } \
< <(find ... -print0); then
printf 'Exactly one file: %q\n' "$result"
else
echo "Either no results, or more than one"
fi
Well, given that you are storing these results in the file $temp this is a little easier:
[ "$( wc -l < $temp )" -eq 1 ] && edit "$( cat $temp )"
Instead of 'cat $temp' you can do '< $temp', but it might take away some readability if you are not very familiar with redirection 8)
If you want to test whether the file is empty or not, test -s does that.
if [ -s "$temp" ]; then
edit `cat "$temp"`
fi
(A non-empty file contains at least one line of content. Note, though, that wc -l counts newline characters, so a file whose last line has no trailing newline reports one line fewer.)
If you genuinely want a line count of exactly one, then yes, it can be simplified substantially;
if [ $( wc -l <"$temp" ) = 1 ]; then
edit `cat "$temp"`
fi
You can use arrays:
x=($(find . -type f))
[ "${#x[*]}" -eq 1 ] && echo "just one" || echo "many"
But you might have problems in case of filenames with whitespace, etc.
Still, something like this would be a native way
No, this is the way, though you're making it over-complicated:
if [ "`wc -l $temp | cut -d' ' -f1`" = "1" ]; then
edit "$temp";
fi
What's complicating it is:
the useless use of cat,
the unnecessary use of xargs,
and I'm not sure you really want the edit `cat $temp`, which edits the file(s) named by the contents of $temp.
