Shell: parse a file and tell whether a pattern exists

I'm trying to write a fairly simple script that parses a file and tells me whether the string I'm looking for exists.
I can read the text file line by line and then use grep. But I can't test for the case where the string does not exist, and I don't know why.
#!/bin/bash
cat file.txt | grep '<span>my name is john</span>' -i | while IFS= read line ; do
if test -z "$line"
then
echo "\$line is empty" # <--- Can't get here
else
echo "\$line is NOT empty"
fi
done

If you are trying to see which lines match and which don't:
while read line # simplistic - see other posts on handling with more finesse
do case "$line" in # replaces grep
*"$yourString"*) echo "found" ;;
*) echo "none" ;;
esac
done < file.txt # no need for cat
Alternately,
grep -i '<span>my name is john</span>' file.txt
gives you all the hits, and
grep -iv '<span>my name is john</span>' file.txt
gives you all the non-hits. Otherwise, you should probably put more info in your output for it to be useful.
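The "is empty" branch in the question can never run because the loop body only executes for lines that grep actually prints. If the goal is simply to know whether the string occurs anywhere in the file, grep's exit status answers that without a loop. A minimal sketch, assuming a hypothetical helper name has_pattern and a throwaway file created with mktemp for the demonstration:

```shell
#!/bin/sh
# has_pattern FILE STRING: succeed if STRING occurs in FILE.
# -q prints nothing, -i ignores case, -F treats STRING as literal text.
# The helper name is illustrative, not from the original post.
has_pattern() {
    grep -qiF -- "$2" "$1"
}

# Demo on a temporary file so the example is self-contained.
demo=$(mktemp)
printf '%s\n' '<p>hello</p>' '<span>My Name Is John</span>' > "$demo"

if has_pattern "$demo" '<span>my name is john</span>'; then
    echo "found"
else
    echo "not found"
fi
rm -f "$demo"
```

The else branch is where the "string does not exist" case ends up, which is the test the question was missing.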

Related

How can I grep a list of names from case?

So as an example, I have a bunch of apps that are constantly writing to /var/log/app/*/nonsence.file. There's nothing else in those folders, just logs from this one set of apps, so I can easily do:
cat /var/log/app/*/nonsence.file
and I'll get a nice stream of the app logs.
Mixed into this stream are periodic references to people. I'd like to build a script to trigger when certain names appear in the stream.
I can do this easily enough:
cat /var/log/app/*/nonsence.file | grep 'greg\|john\|suzy\|stacy'
and I can put THAT into a simple script thusly:
#!/bin/sh
NAME=`cat /var/log/app/*/nonsence.file | grep 'greg\|john\|suzy\|stacy'`
case "$NAME" in
"greg" ) echo "I found greg!" >> ~/names.meh ;;
"john" ) echo "I found john!" >> ~/names.meh ;;
"suzy" ) echo "I found suzy!" >> ~/names.meh ;;
"stacy" ) echo "I found stacy!" >> ~/names.meh ;;
* ) echo "forever alone..." >> ~/names.meh ;;
esac
easy peasy!
The trouble is, the list of names changes from time to time and I would really like a neater list.
After some thinking, I believe what I REALLY want is to add each name in the case section only. So what do I need to do in the NAME variable section to tell the command to grep for the names referenced in the case section?
cat file | grep is a useless use of cat. Just grep file.
Most commands block-buffer their output by default when writing to a pipe.
The >> ~/names.meh is just repetition. Just specify it once for the whole block.
The backticks ` are discouraged. It's preferred to use $(..) instead.
Each time NAME=... is assigned, the whole file is read, while you seem to want:
... I'd like to build a script to trigger when certain names appear in the stream.
which suggests you want to react as soon as a name appears in the stream, not some time later.
You may try:
patterns=(greg john suzy stacy)
printf "%s\n" /var/log/app/*/nonsence.file |
# tail each file at the same time by spawning for each a background process
xargs -P0 -n1 tail -F -n+1 |
# grep for the patterns
# pass the patterns from a file
# the <(...) is a process substitution, a bash extension
grep --line-buffered -f <(printf "%s\n" "${patterns[@]}") -o |
# for each grepped content execute different action
while IFS= read -r line; do
case "$line" in
"greg") someaction; ;;
# etc
*) echo "Internal error - unhandled pattern"; ;;
esac
done >> ~/names.meh
Because specifying the patterns twice is lame, you could use an associative array to map the patterns to function names, or just use unique function names and generate the pattern list from them:
pattern_greg() { echo "greg"; }
pattern_kamil() { echo "well, not greg"; }
patterns=($(declare -F | sed 's/declare -f //; /^pattern_/!d; s/pattern_//'))
... |
while IFS= read -r line; do
if declare -f pattern_"$line" >/dev/null 2>&1; then
pattern_"$line"
else
echo "Internal error occurred"
fi
done
alternatively, but I like the functions better:
greg_function() { echo do something; }
kamil_callback() { echo do something else; }
declare -A patterns
patterns=([greg]=greg_function [kamil]=kamil_callback)
... | grep -f <(printf "%s\n" "${!patterns[@]}") ... |
while IFS= read -r line; do
# I think this is how to check if array element is set
if [[ -n "${patterns[$line]}" ]]; then
"${patterns[$line]}"
else
echo error
fi
done
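To see the function-name approach end to end, here is a small self-contained sketch. It assumes bash (for declare -F and process substitution) and a grep that supports -o. It reads sample lines from printf instead of tailing real logs. The dispatch function name and the handler messages are made up for illustration.

```shell
#!/bin/bash
# Handlers: one function per name. Names and messages are illustrative.
pattern_greg() { echo "I found greg!"; }
pattern_suzy() { echo "I found suzy!"; }

# Derive the pattern list from the pattern_* function names,
# so each name is written in exactly one place.
patterns=($(declare -F | sed -n 's/^declare -f pattern_//p'))

# dispatch: read log lines on stdin and call the handler for each name found.
dispatch() {
    grep -oF -f <(printf '%s\n' "${patterns[@]}") |
    while IFS= read -r name; do
        if declare -F "pattern_$name" >/dev/null; then
            "pattern_$name"
        else
            echo "unhandled pattern: $name" >&2
        fi
    done
}

# Sample input standing in for the log stream.
printf '%s\n' 'login by greg' 'nothing here' 'suzy logged out' | dispatch
```

Adding a new name then means adding one pattern_NAME function and nothing else.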

Issues with grep and get a count of a string in a loop

I have a set of search strings in a file (File1) and a content file (File2). I am trying to loop through all the search strings in File1, count the occurrences of each one in File2, and output the counts. I want to automate this and make it generic so I can search through multiple content files. However, I can't seem to get the correct count when I run this loop: I get a count of "0" for every string, although the strings are present in the file. I can't figure out what I am doing wrong and could use some help!
Below is the script I came up with:
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
count=$(echo cat "$2" | grep -c "$line")
echo "$count - $line"
done < "$1"
Command I am using to run this script:
./scanscript.sh File1.log File2.log
I say this because running the command on its own gives the right value. It works by itself, but I want to put it in a loop:
cat File2.log | grep -c "Search String"
Sample Data for File 1 (Search Strings):
/SERVER_NAME/Root/DEV/Database/NJ-CONTENT/Procs/
/SERVER_NAME3/Root/DEV/Database/NJ-CONTENT/Procs/
Sample Data for File 2 (Content File):
./SERVER_NAME/Root/DEV/Database/NJ-CONTENT/Procs/test.test_proc.sql:29:
./SERVER_NAME2/Root/DEV/Database/NJ-CONTENT/Procs/test.test_proc.sql:100:
./SERVER_NAME3/Root/DEV/Database/NJ-CONTENT/Procs/test.test_proc.sql:143:
./SERVER_NAME4/Root/DEV/Database/NJ-CONTENT/Procs/test.test_proc.sql:223:
./SERVER_NAME5/Root/DEV/Database/NJ-CONTENT/Procs/test.test_proc.sql:5589:
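The problem is this line, where echo prints the literal words cat and the file name into the pipe instead of running cat: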
count=$(echo cat "$2" | grep -c "$line")
That should be changed to:
count=$(grep -Fc "$line" "$2")
Also note that -F makes grep search for a fixed string instead of a regular expression.
Full code:
while IFS='' read -r line || [[ -n "$line" ]]; do
count=$(grep -Fc "$line" "$2");
echo "$count - $line";
done < "$1"
Run it as:
./scanscript.sh File1.log File2.log
Output:
1 - /SERVER_NAME/Root/DEV/Database/NJ-CONTENT/Procs/
1 - /SERVER_NAME3/Root/DEV/Database/NJ-CONTENT/Procs/
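The difference between the two forms is easy to check side by side. The following sketch builds a one-line content file with mktemp and runs both commands against it. The variable names content, needle, broken and fixed are only for this demonstration.

```shell
#!/bin/sh
# Self-contained demo: a content file with one matching line.
content=$(mktemp)
printf '%s\n' './SERVER_NAME/Root/DEV/Database/NJ-CONTENT/Procs/test.test_proc.sql:29:' > "$content"
needle='/SERVER_NAME/Root/DEV/Database/NJ-CONTENT/Procs/'

# Broken form: echo prints the word "cat" and the file name.
# The file is never read, so grep only sees that one short line.
broken=$(echo cat "$content" | grep -c "$needle")

# Fixed form: grep reads the file itself; -F treats the needle as literal text.
fixed=$(grep -Fc "$needle" "$content")

echo "broken=$broken fixed=$fixed"
rm -f "$content"
```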

Check if file has been modified

How can I validate that this replace command succeeded:
perl -pi -e 's/contoso/'"$hostname"'/g' /etc/inet/hosts
I have tried checking the return value:
if [ $? -eq 0 ]; then
echo "OK"
else
echo "Error."
fi
But the return value is not being set when the command fails.
Thanks
One option is to check whether the file has been modified. You can do that by adding a backup-file extension to the -i option:
perl -pi.orig -e 's/contoso/'"$hostname"'/g' /etc/inet/hosts
This command stores the original content of /etc/inet/hosts in /etc/inet/hosts.orig and then runs the substitution. You can then check whether the files differ, for example with cmp:
if ! cmp -s /etc/inet/hosts /etc/inet/hosts.orig; then
echo OK
else
echo ERROR
fi
Remove the .orig file after that.
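The backup-and-compare idea can be wrapped in a small function. This is only a sketch under a few assumptions: it uses sed writing to the original path instead of perl -i (the comparison step is the same), the function name replace_and_check is hypothetical, and the search and replacement text must not contain a slash.

```shell
#!/bin/sh
# replace_and_check FILE SEARCH REPLACE
# Keeps FILE.orig as a backup, rewrites FILE with sed, then compares the two.
# Returns 0 if the content changed, 1 if nothing changed or a step failed.
# SEARCH and REPLACE go straight into a sed s/// command, so no "/" allowed.
replace_and_check() {
    cp -- "$1" "$1.orig" || return 1
    sed "s/$2/$3/g" "$1.orig" > "$1" || return 1
    if cmp -s "$1" "$1.orig"; then
        rm -f "$1.orig"
        return 1        # identical: nothing was replaced
    fi
    rm -f "$1.orig"
    return 0            # different: the replacement took effect
}

# Demo on a temporary file standing in for the hosts file.
hosts=$(mktemp)
printf '%s\n' '10.0.0.1 contoso' > "$hosts"
if replace_and_check "$hosts" contoso myhost; then
    echo "OK"
else
    echo "Error."
fi
cat "$hosts"
rm -f "$hosts"
```

With this shape the caller gets a usable exit status, which is what the check on $? in the question was looking for.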
The other option is to modify the script to read the content of the file, replace the required entry, check whether a change actually happened, and return a proper status at the end that the shell can verify using $?. A solution along these lines is given in this answer.
I don't know Perl, but you can handle several kinds of "error" (no match, or no permission to write the file) with a small Bash script like this:
#!/bin/bash
FILE="/etc/inet/hosts"
SEARCH="contoso"
REPLACE="$hostname"
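NB=$(grep -c -- "$SEARCH" "$FILE") # number of matching lines
if [ "$NB" -ne 0 ]; then
# The shell variables must sit outside the single quotes to be expanded
perl -pi -e 's/'"$SEARCH"'/'"$REPLACE"'/g' "$FILE" && echo "${NB} line(s) replaced" || echo "Error (permission maybe)"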
else
echo "No match in file"
fi
I think there is a better way, either by improving the Perl code or by using sed, but this should work.
If you expect your perl script to return a value that has some meaning, you will need to write your perl script to return a meaningful value. In your case, perhaps something as simple as:
perl -p -e 's/contoso/'"$hostname"'/g; $rv=1 if $&; END{ exit !$rv }'
Generally, checksums are a very efficient way to detect changes in files.
md5sum [filename]
root@miaoulis:~# echo 'line 1' >>1.txt
root@miaoulis:~# md5sum 1.txt
5c2ce561e1e263695dbd267271b86fb8 1.txt
root@miaoulis:~# echo 'line 2' >>1.txt
root@miaoulis:~# md5sum 1.txt
c7253b64411b3aa485924efce6494bb5 1.txt
I guess the sum could be extracted from the output with AWK
root@miaoulis:~# echo $(md5sum 1.txt) | awk 'BEGIN{FS=" *"}{print "MD5:",$1}'
MD5: c7253b64411b3aa485924efce6494bb5
root@miaoulis:~# echo $(md5sum 1.txt) | awk 'BEGIN{FS=" *"}{print "filename:",$2}'
filename: 1.txt
FS=" *" tells AWK to split the line on runs of spaces (strictly, " *" matches zero or more spaces; " +" is the precise form for one or more). AWK's default field separator already splits on whitespace, so the BEGIN block is optional. $1 will be the MD5 and $2 will be the filename.
MD5 checksums are fast to compute and work for files of any size. The downside is that a checksum tells you only that the file changed, not what changed in it. That should be good enough for most scenarios.
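A before-and-after comparison along these lines might look like the sketch below. It swaps md5sum for the POSIX cksum utility purely for portability (the idea is identical), and the helper name file_sum is hypothetical. Reading the file on standard input keeps the file name out of the output, so no AWK step is needed.

```shell
#!/bin/sh
# file_sum FILE: print a checksum of FILE's content only (no file name),
# because cksum omits the name when it reads from standard input.
file_sum() {
    cksum < "$1"
}

f=$(mktemp)
echo 'line 1' > "$f"
before=$(file_sum "$f")      # checksum before the change

echo 'line 2' >> "$f"        # the modification being detected
after=$(file_sum "$f")       # checksum after the change

if [ "$before" != "$after" ]; then
    echo "modified"
else
    echo "unchanged"
fi
rm -f "$f"
```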

Bash (split) file name comparison fails

In my directory I have files (*fastq.gz.fasta) and directories, whose names contain the filenames (*fastq.gz.fasta-blastdb):
IVC6_Meino.clust.gz.fasta-blastdb
IVC5_Mehiv.clust.gz.fasta-blastdb
....
IVC6_Meino.clust.gz.fasta
IVC5_Mehiv.clust.gz.fasta
....
In a bash script I want to compare the filenames with the directories, using cut on the latter to extract only the filename part. If the two names match I want to do further stuff (for now, echo match or no match respectively).
I have written the following piece of code:
#!/bin/bash
for file in *.fasta
do
for db in *-blastdb
do
echo $file, $db | cut -d '-' -f 1
if [[ $file = "$db | cut -d '-' -f 1" ]]; then
echo "match"
else
echo "no match"
fi
done
done
But it does not detect matches. The output looks like this:
...
IVC6_Meino.clust.gz.fasta, IIIA11_Meova.clust.gz.fasta
no match
IVC6_Meino.clust.gz.fasta, IVC5_Mehiv.clust.gz.fasta
no match
IVC6_Meino.clust.gz.fasta, IVC6_Meino.clust.gz.fasta
no match
The last line should read match: as you can see, the strings look the same.
What am I missing?
You can use parameter expansion to do this more easily:
for file in *.fasta
do
for db in *-blastdb
do
echo "$file", "$db"
if [[ "${file%%.fasta}" = "${db%%.fasta-blastdb}" ]]; then
echo "match"
else
echo "no match"
fi
done
done
If you want to fix yours, the problem is the use of $db | cut -d '-' -f 1. In the echo line, the whole output of echo is piped through cut, so what you see printed is cut's output, not the value being compared. Inside [[ $file = "$db | cut -d '-' -f 1" ]] nothing is executed: the double-quoted text is just a literal string (the value of $db followed by | cut -d '-' -f 1), so it never equals $file.
You need the $(..) shell construct to capture the output of the pipeline, and you need echo to feed the contents of $db into it. Quote "$db" so that the contents of the variable are not subject to word splitting or globbing.
Like so:
for file in *.fasta
do
for db in *-blastdb
do
ts=$(echo "$db" | cut -d '-' -f 1)
echo "$file", "$ts"
if [[ "$file" = "$ts" ]]; then
echo "match"
else
echo "no match"
fi
done
done # this works I think -- not tested...
Please be careful with your quoting with Bash and liberally use ShellCheck.
The structure you have is also not the most efficient. You will loop over the *-blastdb glob once for every file in *.fasta. If you have a lot of files, that could get really slow.
To solve that, you could rewrite this loop with Bash arrays (best if you have Bash 4+) or use awk:
ext1=.fasta
ext2=.fasta-blastdb
awk 'FNR==NR{
s=$0
sub("\\"ext1"$","",s)
seen[s]=$0
next}
{
s=$0
sub("\\"ext2"$","",s)
if (s in seen)
print seen[s], $0
}
' ext1="$ext1" ext2="$ext2" <(for fn in *$ext1; do echo "$fn"; done) <(for fn in *$ext2; do echo "$fn"; done)
Each glob is expanded only once, and awk uses an array to test whether the base names are the same.
Best
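Since each directory name in the question is just the file name with -blastdb appended, a simpler route than either nested loops or awk is a single loop that tests whether that directory exists. This is a different technique from the ones above, sketched under that naming assumption. The function name report_matches is made up.

```shell
#!/bin/sh
# report_matches DIR: for each *.fasta file in DIR, report whether DIR also
# holds a directory named "<file>-blastdb". One pass, no inner loop.
report_matches() {
    for file in "$1"/*.fasta; do
        [ -e "$file" ] || continue          # glob matched nothing
        if [ -d "$file-blastdb" ]; then
            echo "match: ${file##*/}"
        else
            echo "no match: ${file##*/}"
        fi
    done
}

# Demo in a scratch directory with two files and one database directory.
work=$(mktemp -d)
touch "$work/IVC6_Meino.clust.gz.fasta" "$work/IVC5_Mehiv.clust.gz.fasta"
mkdir "$work/IVC6_Meino.clust.gz.fasta-blastdb"
report_matches "$work"
rm -rf "$work"
```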

Passing empty strings to grep command

I have this script where I ask for 4 patterns and then use those in a grep command. That is, I want to see if a line matches any of the patterns.
echo -n "Enter pattern1"
read pat1
echo -n "Enter pattern2"
read pat2
echo -n "Enter pattern3"
read pat3
echo -n "Enter pattern4"
read pat4
cat somefile.txt | grep $pat1 | grep $pat2 | grep $pat3 | grep $pat4
The problem I'm running into is that if the user doesn't supply one of the patterns (which I want to allow) the grep command doesn't work.
So, is there a way to have grep ignore one of the patterns if it's returned empty?
Your code has lots of problems:
Code duplication
Interactive asking for potentially unused information
using echo -n is not portable
useless use of cat
Here is what I wrote that is closer to what you should use instead:
i=1
printf %s "Enter pattern $i: "
read -r input
while [[ $input ]]; do
pattern+=(-e "$input")
let i++
printf %s "Enter pattern $i (Enter or Ctrl+D to stop entering patterns): "
read -r input
done
echo
grep "${pattern[@]}" somefile.txt
EDIT: This does not answer OP's question, this searches for multiple patterns with OR instead of AND...
Here is a working AND solution (it will stop prompting for patterns on the first empty one or after the 4th one):
pattern=
for i in {1..4}; do
printf %s "Enter pattern $i: "
read -r input
[[ $input ]] || break
pattern="${pattern:+"$pattern && "}/${input//\//\\/}/"
done
echo # skip a line
awk "$pattern" somefile.txt
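If you would rather keep the chained grep from the question, the empty patterns can simply be skipped when the chain is built. A minimal sketch with a hypothetical function grep_all that filters standard input and keeps only lines matching every non-empty pattern:

```shell
#!/bin/sh
# grep_all PATTERN...: filter stdin, keeping lines that match every
# non-empty PATTERN. Empty patterns are skipped instead of passed to grep.
grep_all() {
    if [ "$#" -eq 0 ]; then
        cat                         # no patterns left: pass everything through
    elif [ -z "$1" ]; then
        shift
        grep_all "$@"               # skip an empty pattern
    else
        # Apply the first pattern, then recurse on the remaining ones.
        grep -e "$1" | { shift; grep_all "$@"; }
    fi
}

# Demo: the second and fourth patterns were left empty by the user.
printf '%s\n' 'red apple pie' 'green apple' 'red cherry' |
    grep_all 'red' '' 'apple' ''
```

In the original script this would be called as grep_all "$pat1" "$pat2" "$pat3" "$pat4" < somefile.txt, with the quotes keeping empty answers as empty arguments.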
Here are some links from which you can learn how to program in bash:
Bash Guide
Bash FAQ
