Searching Inside a Document for Multiple Strings with 'GREP'

I am trying to search inside a word document for strings with specific text. So far, I have figured out how to search inside this document for a single string and return a message if this text is found using the below script. The challenge that I am now facing is figuring out how to search inside this document for either one of two strings.
Any idea of how I could write this script using the 'grep' command?
1 - Searching inside a document for a matching string.
#!/bin/bash
FILE="document.doc"
ISSUE_1="Identifies inactive services"
if grep -c "$ISSUE_1" $FILE
then
echo "There is an Issue"
else
echo "There is NO Issue"
fi
2 - Searching inside a document for more than one string.
#!/bin/bash
FILE="document.doc"
ISSUE_1="Identifies inactive services"
ISSUE_2="Determines the percentage CPU idle time"
if [[grep -c "$ISSUE_1" $FILE]] || [[grep -c "$ISSUE_2" $FILE]]
then
echo "There is an Issue"
else
echo "There is NO Issue"
fi

If you have a list of strings, the easiest approach might be to put them in a file and use -f to have grep read its patterns from that file. For example, create a file called, say, patterns:
Identifies inactive services
Determines the percentage CPU idle time
then use grep like
grep -f patterns -c "$FILE"
If you don't want a separate file, note that grep takes a pattern, not (necessarily) a fixed string, and that pattern can express "or" with \| (a GNU extension in basic regular expressions):
grep -c "$ISSUE_1\|$ISSUE_2" "$FILE"
Though if you just want to know whether any matches were found, you don't need to get the count and check that value: grep tells you through its exit status, so you can use -q to suppress the output:
if grep -q -f patterns "$FILE"; then
echo "At least one match"
else
echo "No matches"
fi
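As a minimal, self-contained sketch of the -f approach: the temporary directory and the sample report text below are made up for illustration, and -F is added on the assumption that the strings should be matched literally rather than as regular expressions.

```shell
#!/bin/bash
# Hypothetical working files, created only for this demonstration.
workdir=$(mktemp -d)

# One pattern per line, as in the "patterns" file described above.
printf '%s\n' \
    'Identifies inactive services' \
    'Determines the percentage CPU idle time' > "$workdir/patterns"

# Made-up sample document containing only the second string.
printf '%s\n' 'Report: Determines the percentage CPU idle time' > "$workdir/report.txt"

# -q: no output, rely on the exit status; -F: treat patterns as fixed strings.
if grep -q -F -f "$workdir/patterns" "$workdir/report.txt"; then
    status="There is an Issue"
else
    status="There is NO Issue"
fi
echo "$status"
```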
If you want to see the actual strings that were matched you can use -o to output only the portions of the line(s) that match one of the patterns. For example:
grep -f patterns -o "$FILE"
or
grep -o "$ISSUE_1\|$ISSUE_2" "$FILE"
As an aside, you should generally avoid using upper case for your variable names. All-caps names are used for system environment variables, and applications are encouraged to use lowercase names.
Environment variable names used by the utilities in the Shell and Utilities volume of POSIX.1-2008 consist solely of uppercase letters, digits, and the ( '_' ) from the characters defined in Portable Character Set and do not begin with a digit. Other characters may be permitted by an implementation; applications shall tolerate the presence of such names. Uppercase and lowercase letters shall retain their unique identities and shall not be folded together. The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities.

You can simply use -e multiple times:
if grep -e issue1 -e issue2 file ; then
do_something
fi
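Applied to the two strings from the question, this could look roughly like the sketch below. The lowercase variable names, the has_issue helper, and the sample file are illustrative assumptions; -F is added so the strings are matched literally.

```shell
#!/bin/bash
issue_1="Identifies inactive services"
issue_2="Determines the percentage CPU idle time"

# has_issue FILE: succeeds if FILE contains either string.
# (Hypothetical helper name, not part of the original question.)
has_issue() {
    grep -q -F -e "$issue_1" -e "$issue_2" -- "$1"
}

# Made-up sample file for demonstration purposes.
sample=$(mktemp)
printf '%s\n' 'Check 7: Identifies inactive services' > "$sample"

if has_issue "$sample"; then
    echo "There is an Issue"
else
    echo "There is NO Issue"
fi
```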

Related

Bash File names will not append to file from script

Hello I am trying to get all files with Jane's name to a separate file called oldFiles.txt. In a directory called "data" I am reading from a list of file names from a file called list.txt, from which I put all the file names containing the name Jane into the files variable. Then I'm trying to test the files variable with the files in list.txt to ensure they are in the file system, then append the all the files containing jane to the oldFiles.txt file(which will be in the scripts directory), after it tests to make sure the item within the files variable passes.
#!/bin/bash
> oldFiles.txt
files= grep " jane " ../data/list.txt | cut -d' ' -f 3
if test -e ~data/$files; then
for file in $files; do
if test -e ~/scripts/$file; then
echo $file>> oldFiles.txt
else
echo "no files"
fi
done
fi
The above code gets the desired files and displays them correctly, as well as creates the oldFiles.txt file, but when I open the file after running the script I find that nothing was appended to the file. I tried changing the file assignment to a pointer instead files= grep " jane " ../data/list.txt | cut -d' ' -f 3 ---> files=$(grep " jane " ../data/list.txt) to see if that would help by just capturing raw data to write to file, but then the error comes up "too many arguments on line 5" which is the 1st if test statement. The only way I get the script to work semi-properly is when I do ./findJane.sh > oldFiles.txt on the shell command line, which is me essentially manually creating the file. How would I go about this so that I create oldFiles.txt and append to the oldFiles.txt all within the script?

The biggest problem you have is matching names like "jane" or "Jane's", etc. while not matching "Janes". grep provides the options -i (case-insensitive match) and -w (whole-word match), which can tailor your search to what you appear to want without having to use the kludge (" jane ") of adding spaces before and after your search term. (To do that properly you would use [[:space:]]jane[[:space:]].)
You also have the problem of determining your "script dir" if you call your script from a directory other than the one containing it, such as calling it from your $HOME directory with bash scripts/findJane.sh. In that case your script will attempt to append to $HOME/oldFiles.txt. The positional parameter $0 holds the path that was used to invoke the current script (which may be a relative path), so you can capture the script directory no matter where you call the script from with:
dirname "$0"
You are using bash, so store all the filenames resulting from your grep command in an array, not some general variable (especially since your use of " jane " suggests that your filenames contain whitespace)
You can make your script much more flexible if you take the input file (e.g. list.txt), the term to search for (e.g. "jane"), the location where to check for existence of the files (e.g. $HOME/data) and the output filename to append the names to (e.g. "oldFiles.txt") as command line [positional] parameters. You can give each a default value so the script behaves as you currently desire when no arguments are provided.
Even with the additional scripting flexibility of taking the command line arguments, the script actually has fewer lines, simply filling an array using mapfile (synonymous with readarray) and then looping over the contents of the array. You could also avoid the additional subshell for dirname with a simple parameter expansion, testing whether the path component is empty and replacing it with '.'; that part is up to you.
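The parameter-expansion alternative to dirname mentioned above might be sketched as follows. The helper name set_script_dir is hypothetical; it sets a variable directly rather than printing, so no command substitution (and hence no subshell) is needed.

```shell
#!/bin/bash
# set_script_dir PATH: store the directory part of PATH in $script_dir
# using only parameter expansion (hypothetical helper for illustration).
set_script_dir() {
    script_dir=${1%/*}              # strip the last "/component"
    if [ "$script_dir" = "$1" ]; then
        script_dir=.                # no slash in PATH: current directory
    elif [ -z "$script_dir" ]; then
        script_dir=/                # PATH was directly under the root
    fi
}

set_script_dir "$0"
echo "script directory: $script_dir"
```

Unlike dirname, this sketch does not normalize trailing slashes, which should not matter for a script path in $0.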
If I've understood your goal correctly, you can put all the pieces together with:
#!/bin/bash
# positional parameters
src="${1:-../data/list.txt}" # 1st param - input (default: ../data/list.txt)
term="${2:-jane}" # 2nd param - search term (default: jane)
data="${3:-$HOME/data}" # 3rd param - file location (default: $HOME/data)
outfn="${4:-oldFiles.txt}" # 4th param - output (default: oldFiles.txt)
# save the path to the current script in script
script="$(dirname "$0")"
# if outfn not given, prepend path to script to outfn to output
# in script directory (if script called from elsewhere)
[ -z "$4" ] && outfn="$script/$outfn"
# split names w/term into array
# using the -iw option for case-insensitive whole-word match
mapfile -t files < <(grep -iw "$term" "$src" | cut -d' ' -f 3)
# loop over files array
for ((i=0; i<${#files[@]}; i++)); do
# test existence of file in data directory, redirect name to outfn
[ -e "$data/${files[i]}" ] && printf "%s\n" "${files[i]}" >> "$outfn"
done
(note: test expression and [ expression ] are synonymous, use what you like, though you may find [ expression ] a bit more readable)
(further note: "Janes" being plural is not considered the same as the singular -- adjust the grep expression as desired)
Example Use/Output
As was pointed out in the comment, without a sample of your input file, we cannot provide an exact test to confirm your desired behavior.
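Purely for illustration, here is how the mapfile and grep -iw step behaves on an invented input. The line layout (an id, a name, then the file name as the third space-separated field) is an assumption and may not match your real list.txt. Requires bash 4 or later for mapfile.

```shell
#!/bin/bash
# Invented sample data; the real list.txt layout is unknown.
tmpdir=$(mktemp -d)
cat > "$tmpdir/list.txt" <<'EOF'
001 jane /data/jane_profile.txt
002 kwood /data/kwood_profile.txt
003 Jane /data/jane_pic.jpg
004 janes /data/janes_notes.txt
EOF

# -i: ignore case, -w: whole-word match, so "janes" is skipped.
mapfile -t files < <(grep -iw "jane" "$tmpdir/list.txt" | cut -d' ' -f 3)

# With the sample above this prints the two file names from lines 001 and 003.
printf '%s\n' "${files[@]}"
```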
Let me know if you have questions.

As far as I can tell, this is what you're going for. This is totally a community effort based on the comments, catching your bugs. Obviously credit to Mark and Jetchisel for finding most of the issues. Notable changes:
Fixed $files to use command substitution
Fixed path to data/$file, assuming you have a directory at ~/data full of files
Fixed the test to not test for a string of files, but just the single file (also using -f to make sure it's a regular file)
Using double brackets — you could also use double quotes instead, but you explicitly have a Bash shebang so there's no harm in using Bash syntax
Adding a second message about not matching files, because there are two possible cases there; you may need to adapt depending on the output you're looking for
Removed the initial empty redirection — if you need to ensure that the file is clear before the rest of the script, then it should be added back, but if not, it's not doing any useful work
Changed the shebang to make sure you're using the user's preferred Bash, and added set -e because you should always add set -e
#!/usr/bin/env bash
set -e
files=$(grep " jane " ../data/list.txt | cut -d' ' -f 3)
for file in $files; do
if [[ -f $HOME/data/$file ]]; then
if [[ -f $HOME/scripts/$file ]]; then
echo "$file" >> oldFiles.txt
else
echo "no matching file"
fi
else
echo "no files"
fi
done

How could I change this egrep script to a zgrep script and still have it work?

I'm trying to look for phone numbers in any of the following formats: +1.570.555.1212, 570.555.1212, (570)555-1212, and 570-555-1212. We also need to look in compressed folders using zgrep; however, when I tried that, my code came back with "No matches found". The code below works as it is for finding phone numbers in txt files. It is very bad, but here it is:
Code:
#!/bin/bash
egrep '[0-9]{3}-[0-9]{3}-[0-9]{4}|[0-9]{3}.[0-9]{3}.[0-9]{4}|([0-9]{3})[0-9]{3}-[0-9]{4}|+(1).[0-9]{3}.[0-9]{3}.[0-9]{4}' *
if [ $? -eq 0 ] ; then echo $1 ; else echo "No matches found" ; fi 2>/dev/null

zgrep without any options is equivalent in its regex capabilities to grep; you need to say zgrep -E if you want to use grep -E (aka egrep) regex syntax when searching compressed files.
#!/bin/bash
if zgrep -E -q '[0-9]{3}-[0-9]{3}-[0-9]{4}|[0-9]{3}.[0-9]{3}.[0-9]{4}|([0-9]{3})[0-9]{3}-[0-9]{4}|+(1).[0-9]{3}.[0-9]{3}.[0-9]{4}' *
then
echo "$1"
else
echo "No matches found" >&2
fi
Notice also the questions "Why is testing “$?” to see if a command succeeded or not, an anti-pattern?" and "When to wrap quotes around a shell variable", as well as the preference for -q over redirecting to /dev/null, and the displaying of error messages on standard error (>&2 redirection).
Your regex could also use some refactoring; maybe try
(\+\(1\).)?[0-9]{3}.[0-9]{3}.[0-9]{4}
Notice how round brackets and the plus sign need to be backslash-escaped to match literally, and how after refactoring out the +(1) prefix as optional the rest of the regex subsumes all the other variants you had enumerated, because . matches - and ( and . and many other characters. (The optional prefix could also be dropped completely and this would still match the same strings, but I had to guess some things so I am leaving it in with this remark.)
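A quick way to convince yourself of this is to run the refactored pattern against the four formats from the question. This is only a sketch; the matches_phone helper and the non-matching sample string are made up for the test, and plain grep -E is used so it runs without any compressed files (zgrep -E accepts the same pattern).

```shell
#!/bin/bash
# The refactored pattern from above, in ERE syntax.
pattern='(\+\(1\).)?[0-9]{3}.[0-9]{3}.[0-9]{4}'

# matches_phone STRING: succeeds if STRING contains a match (hypothetical helper).
matches_phone() {
    printf '%s\n' "$1" | grep -E -q "$pattern"
}

for sample in '+1.570.555.1212' '570.555.1212' '(570)555-1212' '570-555-1212' 'no number here'; do
    if matches_phone "$sample"; then
        echo "match:    $sample"
    else
        echo "no match: $sample"
    fi
done
```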

Test existing a substring in string shell

I have two parameters:
the list of linux group of user
groupsUsers=$(id -nG ${utilisateur})
the list of all the group linux of users (linux groups + applications)
list_all_groups=$(curl -u GET "${edge_admin_nodes}_${port_http}"/applications/lists)
How can I test if the groupsUsers exist in list_all_groups or no, and in the case "no" I store the result in a variable ?
I did this solution but I'm not sure that working.
for groupUser in ${groupsUsers}
do
if echo "$list_all_groups" | grep -o "$groupUser" then
echo "${groupUser}"
then
my_result=$( echo "$groups,$groupUser" )
result="${groups},\"${groupUser}\""
fi
done

Generally, I prefer proper parsing. However, a common solution is to put the delimiter around both strings, e.g.:
if echo ",$list_all_groups," | grep -q ",$groupUser,"; then
This checks a string that has commas around the original list (so we don't have to deal with beginning/end-of-string differences) against a particular entry, also with its delimiters, so that we don't match a groupUser of foo with an entry in list_all_groups of foobar.
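A small sketch of the delimiter trick, assuming the list really is a single comma-separated string (the actual format of the curl response is not shown in the question, so the sample value is invented). The in_list helper is hypothetical, and -F is added so group names are compared literally.

```shell
#!/bin/bash
# in_list LIST ITEM: succeeds if ITEM is an exact entry of the
# comma-separated LIST (hypothetical helper for illustration).
in_list() {
    printf '%s\n' ",$1," | grep -q -F -- ",$2,"
}

list_all_groups='foobar,admin,dev'   # invented sample value

for groupUser in admin foo; do
    if in_list "$list_all_groups" "$groupUser"; then
        echo "$groupUser: present"
    else
        echo "$groupUser: missing"
    fi
done
```

Here foo is reported as missing even though foobar is in the list, which is the point of wrapping both sides in delimiters.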

Multiple simultaneous patterns for grep

I need to see if user exists in /etc/passwd. I'm using grep, but I'm having a hard time passing multiple patterns to grep.
I tried
if [[ ! $( cat /etc/passwd | egrep "$name&/home" ) ]];then
#user doesn't exist, do something
fi
I used ampersand instead of | because both conditions must be true, but it's not working.

Try doing this :
$ getent passwd foo bar base
Finally :
if getent &>/dev/null passwd user_X; then
do_something...
else
do_something_else...
fi

Contrary to your assumptions, regex does not recognize & for intersection, even though it would be a logical extension.
To locate lines which match multiple patterns, try
grep -e 'pattern1.*pattern2' -e 'pattern2.*pattern1' file
to match the patterns in any order, or switch to e.g. Awk:
awk '/pattern1/ && /pattern2/' file
(though in your specific example, just "$name.*/home" ought to suffice because the matches must always occur in this order).
As an aside, your contorted if condition can be refactored to just
if grep -q pattern file; then ...
The if conditional takes as its argument a command, runs it, and examines its exit code. Any properly written Unix command is written to this specification, and returns zero on success, a nonzero exit code otherwise. (Notice also the absence of a useless cat -- almost all commands accept a file name argument, and those which don't can be handled with redirection.)
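To use the Awk variant directly in an if, it needs to report the result through its exit status, which a bare pattern does not do by itself. One possible sketch follows; the both_on_line helper and the sample passwd-style data are made up for illustration.

```shell
#!/bin/bash
# both_on_line PATTERN1 PATTERN2 FILE: succeeds if some line of FILE
# matches both patterns, in either order (hypothetical helper).
both_on_line() {
    awk -v a="$1" -v b="$2" '
        $0 ~ a && $0 ~ b { found = 1 }
        END { exit !found }      # exit status 0 only if a line matched
    ' "$3"
}

# Invented sample data in /etc/passwd format.
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
svc:x:999:999::/var/lib/svc:/usr/sbin/nologin
EOF

if both_on_line alice /home "$passwd_sample"; then
    echo "alice has an entry with a /home directory"
fi
```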

Shell Script : If a string is present in a file

I am a newbie to shell scripting and I want to check if 3 strings ("hello", "who", "when " etc.) are present in a file.
I find many ways when I google it (awk, cat, grep, etc.). What is the best way, and how can I do it?
I just need to know if the strings are present or not.

Your question is a little incomplete:
do you want to find strings or words? So when the word Othello appears, does that count as hello?
in your question there is whitespace behind the when. Is that intentional?
do you want to know whether all three words are in the file, or is one of the words enough?
The general solution is to use grep or egrep to search for text in a file. The exact command line depends on the answers to the above questions.
to search for words (Othello doesn't count as hello) you need to pass the -w option to grep.
I'm assuming that the whitespace was a mistake.
When you need all the words, you can do egrep -wo 'hello|who|when' | sort -u. The egrep command finds all instances of the given words, and prints them out one per line. At that point, you will have many duplicates. Therefore the sort -u command sorts them and only keeps the unique lines (that's what the -u means). In a complete program, I would do it as follows:
filename="story.txt"
words=$(egrep -wo 'hello|who|when' "$filename" | sort -u)
n=$(echo "$words" | wc -l)
if [ $n = 3 ]; then
echo "found all words in the file"
else
echo "didn't find all words, only \""$words"\"."
fi
There's a lot more that I could tell you about this little piece of code, and why I wrote it exactly like that, but for a beginner, it's already enough to understand.
But just in case that you need a simple solution and the file is small anyway, so performance is not critical, you can do this:
filename="story.txt"
if egrep -wl 'hello' "$filename" 1>/dev/null; then
if egrep -wl 'when' "$filename" 1>/dev/null; then
if egrep -wl 'who' "$filename" 1>/dev/null; then
echo "found all three words"
fi
fi
fi
[Update:]
This second code snippet also checks whether the given file contains all three words. Each of the if clauses checks for one of the words. The option -l (lowercase ell) to egrep makes it potentially faster, but you probably don't need that option at all.
Normally egrep prints all lines that match the given expressions (your three words in this case). Since we don't need that output, we redirect it using the arrow operator > to a special file called /dev/null. Whatever you write into that file is discarded.
The if statement takes another command as its argument, and if that command returns successfully, the then branch is taken. The nice thing about the egrep command is that it returns successfully iff the given search expression is contained in the file, so these two things perfectly fit together.
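The nested if statements can also be written as a loop, which scales to any number of words. This is a sketch rather than the only way to do it: the contains_all_words helper and the sample file are made up, grep -q replaces the redirection to /dev/null, and -F is added so the words are matched literally.

```shell
#!/bin/bash
# contains_all_words FILE WORD...: succeeds only if every WORD occurs
# in FILE as a whole word (hypothetical helper for illustration).
contains_all_words() {
    local file=$1 word
    shift
    for word in "$@"; do
        grep -q -w -F -- "$word" "$file" || return 1
    done
    return 0
}

# Invented sample file.
story=$(mktemp)
printf '%s\n' 'hello there' 'who said that, and when?' > "$story"

if contains_all_words "$story" hello who when; then
    echo "found all three words"
else
    echo "at least one word is missing"
fi
```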
For further reading you should try the reference documentation from the Open Group website: http://www.google.com/search?q=opengroup+grep
