Shell Script: If a string is present in a file

I am a newbie to shell scripting and I want to check if 3 strings ("hello", "who", "when ", etc.) are present in a file.
When I google it, I find many ways using awk, cat, grep, etc. What is the best way, and how can I do it?
I just need to know whether the strings are present or not.

Your question is a little incomplete:
Do you want to find strings or words? When the word Othello appears, does that count as hello?
In your question there is whitespace after the when. Is that intentional?
Do you want to know whether all three words are in the file, or is one of the words enough?
The general solution is to use grep or egrep to search for text in a file. The exact command line depends on the answers to the above questions.
To search for words (so that Othello doesn't count as hello), you need to pass the -w option to grep.
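To see the difference -w makes, here is a minimal sketch that feeds one made-up line containing "Othello" to grep through printf and counts matching lines with -c:

```shell
#!/bin/bash
# Without -w, "hello" matches inside "Othello", so one line is counted.
printf 'Othello was here\n' | grep -c 'hello'              # prints 1
# With -w, the match must be a whole word, so no line is counted.
# grep exits with status 1 when nothing matches; || true keeps that
# from stopping a script that runs under set -e.
printf 'Othello was here\n' | grep -wc 'hello' || true     # prints 0
```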
I'm assuming that the whitespace was a mistake.
When you need all the words, you can do egrep -wo 'hello|who|when' | sort -u. The egrep command finds all instances of the given words and prints them one per line. At that point you will probably have many duplicates, so the sort -u command sorts them and keeps only the unique lines (that's what the -u means). In a complete program, I would do it as follows:
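filename="story.txt"
# list each of the three words found in the file, once
words=$(egrep -wo 'hello|who|when' "$filename" | sort -u)
# count them (an empty result still counts as one line, which is harmless here)
n=$(echo "$words" | wc -l)
if [ $n -eq 3 ]; then
    echo "found all words in the file"
else
    echo "didn't find all words, only \"$words\"."
fi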
There's a lot more that I could tell you about this little piece of code, and why I wrote it exactly like that, but for a beginner, it's already enough to understand.
But in case you need a simple solution and the file is small anyway, so that performance is not critical, you can do this:
filename="story.txt"
if egrep -wl 'hello' "$filename" 1>/dev/null; then
    if egrep -wl 'when' "$filename" 1>/dev/null; then
        if egrep -wl 'who' "$filename" 1>/dev/null; then
            echo "found all three words"
        fi
    fi
fi
This second code snippet also checks whether the given file contains all three words. Each of the if clauses checks for one of the words. The option -l (lowercase ell) to egrep makes it potentially faster, but you probably don't need that option at all.
Normally egrep prints all lines that match the given expression (with -l, only the name of the matching file). Since we don't need that output, we redirect it using the > operator (written 1> here, which names standard output explicitly) to a special file called /dev/null. Whatever you write into that file is discarded.
The if statement takes another command as its argument, and if that command returns successfully, the then branch is taken. The nice thing about the egrep command is that it returns successfully iff the given search expression is contained in the file, so these two things perfectly fit together.
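The same exit-status idea also works in a loop, which avoids nesting one if per word and can report which words are missing. A minimal sketch; the function name missing_words is hypothetical, and -q (print nothing, only set the exit status) takes the place of the redirect to /dev/null:

```shell
#!/bin/bash
# Report which of the given words are missing from a file.
# Usage: missing_words FILE WORD...   (hypothetical helper)
# Prints each missing word on its own line; returns 0 only if all are present.
missing_words() {
    local file=$1 word status=0
    shift
    for word in "$@"; do
        # -q: no output, exit status only; -w: whole words only
        if ! grep -qw -- "$word" "$file"; then
            printf '%s\n' "$word"
            status=1
        fi
    done
    return "$status"
}

# Example with the story.txt file from above:
# if missing_words story.txt hello who when >/dev/null; then
#     echo "found all three words"
# fi
```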
For further reading you should try the reference documentation from the Open Group website: http://www.google.com/search?q=opengroup+grep

Related

Searching Inside a Document for Multiple Strings with 'GREP'

I am trying to search inside a Word document for strings with specific text. So far, I have figured out how to search this document for a single string and print a message if the text is found, using the script below. The challenge I am now facing is figuring out how to search the document for either of two strings.
Any idea of how I could write this script using the 'grep' command?
1 - Searching inside a document for a matching string.
#!/bin/bash
FILE="document.doc"
ISSUE_1="Identifies inactive services"
if grep -c "$ISSUE_1" $FILE
then
echo "There is an Issue"
else
echo "There is NO Issue"
fi
2 - Searching inside a document for more than one string.
#!/bin/bash
FILE="document.doc"
ISSUE_1="Identifies inactive services"
ISSUE_2="Determines the percentage CPU idle time"
if [[grep -c "$ISSUE_1" $FILE]] || [[grep -c "$ISSUE_2" $FILE]]
then
echo "There is an Issue"
else
echo "There is NO Issue"
fi
If you have a list of strings, the easiest approach might be to put them in a file and use -f, so that grep reads a list of patterns from that file. For example, create a file called, say, patterns:
Identifies inactive services
Determines the percentage CPU idle time
Then use grep like this:
grep -f patterns -c "$FILE"
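Wrapped as a reusable check, the -f approach might look like the sketch below. The helper name has_issue is hypothetical, and -F (treat each pattern line as a fixed string rather than a regular expression) is an addition: the two issue strings contain no regex metacharacters, but -F keeps that from mattering for future entries. It also assumes the document's text is stored as plain text; a binary Word file may not contain these strings literally.

```shell
#!/bin/bash
# Succeed (exit 0) if DOC contains any line of PATTERN_FILE.
# has_issue is a hypothetical helper name.
has_issue() {
    local doc=$1 pattern_file=$2
    # -F: each pattern line is a fixed string, not a regex
    # -q: print nothing; the exit status carries the answer
    grep -F -q -f "$pattern_file" -- "$doc"
}

# Usage, with the patterns file described above:
# if has_issue "$FILE" patterns; then echo "There is an Issue"; fi
```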
If you don't want a separate file: grep actually takes a pattern, not (necessarily) a fixed string, and that pattern can express "or" with \| (a GNU extension in basic regular expressions; with grep -E you write a plain |):
grep -c "$ISSUE_1\|$ISSUE_2" "$FILE"
Though if you just want to know whether any matches were found, you don't need to get the count and check that value: grep reports it through its exit status, so you can use -q to suppress the output:
if grep -q -f patterns "$FILE"; then
echo "At least one match"
else
echo "No matches"
fi
If you want to see the actual strings that were matched you can use -o to output only the portions of the line(s) that match one of the patterns. For example:
grep -f patterns -o "$FILE"
or
grep -o "$ISSUE_1\|$ISSUE_2" "$FILE"
As an aside, you should generally avoid using upper case for your variable names. All-caps names are used for system environment variables, and POSIX reserves names containing lowercase letters for applications:
Environment variable names used by the utilities in the Shell and Utilities volume of POSIX.1-2008 consist solely of uppercase letters, digits, and the <underscore> ( '_' ) from the characters defined in Portable Character Set and do not begin with a digit. Other characters may be permitted by an implementation; applications shall tolerate the presence of such names. Uppercase and lowercase letters shall retain their unique identities and shall not be folded together. The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities.
You can simply use -e multiple times:
if grep -e issue1 -e issue2 file ; then
do_something
fi
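Applied to the question, the -e approach might look like this sketch. It follows the lowercase-naming advice above, adds -q so nothing is printed and -F so the issue strings are matched literally, and wraps the check in a function (check_issues is a hypothetical name) so the file can be passed in. As with any grep on a .doc file, it only works if the text is stored as plain text in the file.

```shell
#!/bin/bash
# Print whether FILE contains either issue string (hypothetical helper).
check_issues() {
    local file=$1
    local issue_1="Identifies inactive services"
    local issue_2="Determines the percentage CPU idle time"
    # Each -e adds one more pattern; grep succeeds if any of them matches.
    if grep -q -F -e "$issue_1" -e "$issue_2" -- "$file"; then
        echo "There is an Issue"
    else
        echo "There is NO Issue"
    fi
}

# Usage: check_issues document.doc
```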

Write a script that prints from a dir k strokes sorted by date

I have a directory "Main Dir" and I want to write a script that takes 2 parameters, id-worker and results-num. It should find the id-worker directory inside "Main Dir" (it does exist) and, from a file in it called "sent.txt", print results-num (an integer) strokes sorted by date.
I'm a beginner in bash (my knowledge and skills are mainly in C), and I still haven't learned how to write scripts, but I've tried to put something together from the few commands I've learned and a little searching on the internet.
Can somebody help a newbie like me with my first script?
I'll paste here my first try:
#!/bin/bash
id_worker = "$1"
results_num = "$2"
sort -k3 -t "./Main Dir/id_worker/sent.text"
head -n+3 $results_num
I'm going to go out on a limb here, and assume your sort command is producing the information you want from the id_worker sent.txt file and that you are talking about the number of lines you want when you say strokes. Given the extended discussion in the comments, that is about the only thing I see that makes sense.
With that in mind, you were not that far off in your first attempt. What you needed to do to fix the sort command was to dereference your id_worker with $ to get the value you passed. In bash you assign variables as id_worker="something", but to get the value back, you must precede the variable with a $, just as you see with your id_worker="$1". NOTE: there are NO spaces allowed on either side of the '=' sign in bash. Putting that together, it looks like you intended:
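sort -k3 "./Main Dir/$id_worker/sent.txt"
This assumes you run the script from the directory that contains Main Dir, because "./Main Dir/..." is a relative path. Two more fixes are folded in: the file name is sent.txt, as in your question, and -t is dropped, because -t expects a single separator character as its argument and would otherwise swallow the file name. Without -t, sort splits fields on whitespace; if your file uses another separator, put it back as, for example, -t ','.
Now, if you want to limit the output to the first results_num lines, you can use head, but you need to remove the "+" sign (which is only relevant with the tail command) and pass the number directly to -n. To use it with the sorted output, you must pipe the results of sort to head using the '|' pipe character. For example:
sort -k3 "./Main Dir/$id_worker/sent.txt" | head -n "$results_num"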
Putting all of the pieces that I think you intended, and including a short check to make sure both id_worker and results_num are given on the command line, you would end up with something like:
#!/bin/bash
## verify both arguments given
if [ -z "$1" ] || [ -z "$2" ]; then
    printf "error: insufficient input. usage: %s worker num\n" "${0##*/}" >&2
    exit 1
fi
id_worker="$1"
results_num="$2"
## pipe the results of sort to head to print the first $results_num lines
## (fields are split on whitespace; add -t 'X' if sent.txt uses another separator)
sort -k3 "./Main Dir/$id_worker/sent.txt" | head -n "$results_num"
Note: if you are having trouble with your script, run it with:
bash -x scriptname id_worker results_num
to enable line-by-line debugging output from bash. Let me know if I have not understood what you were saying or if the results are not what you intended. There are several ways of approaching this problem, but I do need to clearly understand what you want to go further. Good luck.
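If you want the script to fail more gracefully, a slightly more defensive sketch could also check that results_num is a number and that the file exists. Several things here are assumptions, not from the question: the fields are whitespace-separated, field 3 holds a date in a form that sorts correctly as text (such as YYYY-MM-DD), and the base directory is passed as a parameter so the function (print_sorted_sent, a hypothetical name) can be tried anywhere. sort -k3,3 limits the sort key to field 3 alone, whereas -k3 extends it to the end of the line.

```shell
#!/bin/bash
# Print the first N lines of "<base_dir>/<worker>/sent.txt", sorted on field 3.
# Assumes whitespace-separated fields and a text-sortable date in field 3.
print_sorted_sent() {
    local base_dir=$1 id_worker=$2 results_num=$3
    local file="$base_dir/$id_worker/sent.txt"

    # results_num must be a non-negative integer
    case $results_num in
        ''|*[!0-9]*)
            printf 'error: "%s" is not a number\n' "$results_num" >&2
            return 1 ;;
    esac
    if [ ! -f "$file" ]; then
        printf 'error: %s not found\n' "$file" >&2
        return 1
    fi
    sort -k3,3 -- "$file" | head -n "$results_num"
}

# Hypothetical usage: print_sorted_sent "./Main Dir" 1234 5
```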

Find negative numbers in file with grep

I have this script that reads a file; the file looks like this:
711324865,438918283,2
-333308476,886548365,2
1378685449,-911401007,2
-435117907,560922996,2
259073357,714183955,2
...
the script:
#!/bin/bash
while IFS=, read childId parentId parentLevel
do
grep "\$parentId" parent_child_output_level2.csv
resul=$?
echo "child is $childId, parent is $parentId parentLevel is $parentLevel resul is $resul"
done < parent_child_output_level1.csv
But it is not working: resul is always 1, which is a false negative.
I know that because I can run the following command, which I think is equivalent:
[core#dub-vcd-vms165 generated-and-saved-to-hdfs]$
grep "\-911401007" parent_child_output_level2.csv
-911401007,-157143722,3
Please help.
A grep command to print only the negative numbers:
$ grep -oP '(^|,)\K-\d+' file.csv
-333308476
-911401007
-435117907
(^|,) matches the start of a line or comma.
\K discards the previously matched characters.
-\d+ Matches - plus the following one or more numbers.
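-P (Perl-compatible regular expressions) is not part of POSIX grep and is not available in every grep. Where it is missing, awk can do the same job portably by splitting each line on commas and printing the fields that look like negative integers. A sketch; print_negatives is a hypothetical name:

```shell
#!/bin/sh
# Print every comma-separated field that is a negative integer.
# Reads the files given as arguments, or standard input if there are none.
print_negatives() {
    awk -F, '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^-[0-9]+$/) print $i
    }' "$@"
}

# Usage: print_negatives file.csv
```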
Your title is inconsistent with your question. Your title asks how to grep negative numbers, which Avinash Raj answered well, although I'd suggest you don't even need the (Perl-style) (^|,)\K prefix, which works like a positive look-behind to match the start of a field: if the file is well-formed, -\d+ matches all the negative numbers just as well. Because the pattern starts with -, you need -- to prevent grep from taking it as an option, so you could just run:
grep -oP -- '-\d+' file.csv;
Your question includes a script whose intention seems to be to grep for any number (positive or negative) in the first field (childId) of one file (parent_child_output_level2.csv) that occurs in the second field (parentId) of another file (parent_child_output_level1.csv). To accomplish this, I wouldn't use grep, because you're trying to do an exact numerical equality test, which can even be done as an exact string equality test assuming your numbers are always consistently represented (e.g. no redundant leading zeroes). Repeatedly grepping through the entire file just to search for a number in one column is also wasteful of CPU.
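For reference, the grep-based loop from the question can also be repaired. It always reported 1 because in "\$parentId" the backslash escapes the $, so grep searches for the literal text $parentId instead of the variable's value. A minimal sketch of the fix, wrapped in a function (check_parents is a hypothetical name) so the file names can be passed in; anchoring with ^ and a trailing comma restricts the match to the whole first field:

```shell
#!/bin/bash
# For each line of LEVEL1, report whether its parentId appears as the
# first field of some line in LEVEL2 (resul 0 = found, 1 = not found).
check_parents() {
    local level1=$1 level2=$2 childId parentId parentLevel resul
    while IFS=, read -r childId parentId parentLevel; do
        # No backslash before $, so grep receives the id's value.
        if grep -q -e "^$parentId," "$level2"; then
            resul=0
        else
            resul=1
        fi
        echo "child is $childId, parent is $parentId parentLevel is $parentLevel resul is $resul"
    done < "$level1"
}

# Usage: check_parents parent_child_output_level1.csv parent_child_output_level2.csv
```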
Here's what I would do:
parentIdList=($(cut -d, -f2 parent_child_output_level1.csv))
childIdList=($(cut -d, -f1 parent_child_output_level2.csv))
for parentId in "${parentIdList[@]}"; do
    for childId in "${childIdList[@]}"; do
        if [[ "$childId" == "$parentId" ]]; then
            echo "$parentId"
        fi
    done
done
With this approach, you precompute both the parent id list and the child id list just once, using cut to extract the appropriate field from each file. Then you can use the shell-builtin for loop, shell-builtin if conditional, and shell-builtin [[ test command to accomplish the check, and finally finish with a shell-builtin echo to print the matches. Everything is shell-builtin, after the initial command substitutions that run the cut external executable.
If you also want to filter these results on negative numbers, you could grep for ^- in the results of the above script, or grep for it in the results of each (or just the first) cut command, or add the following line just inside the outer for loop:
if [[ "${parentId:0:1}" != '-' ]]; then continue; fi;
Alternative approach:
if [[ "$parentId" != -* ]]; then continue; fi;
Either approach will skip non-negatives.
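The nested loops compare every parent id with every child id, so the work grows with the product of the two list lengths. If that becomes slow, a different technique (not the one this answer uses) is a hash lookup in awk: read the child ids of the second file into an array, then test each parent id against it in one pass. A sketch, assuming the same comma-separated layout; matching_parent_ids is a hypothetical name, and it prints each matching parentId once per line of the first file:

```shell
#!/bin/bash
# Print each parentId (field 2 of LEVEL1) that occurs as a childId
# (field 1 of LEVEL2). Usage: matching_parent_ids LEVEL1 LEVEL2
matching_parent_ids() {
    # LEVEL2 is read first (NR == FNR only while reading the first file).
    awk -F, '
        NR == FNR { child[$1] = 1; next }   # remember every child id
        $2 in child { print $2 }            # look up each parent id
    ' "$2" "$1"
}
```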

Match exact word in bash script, extract number from string

I'm trying to create a very simple bash script that will open a new link based on the input command.
Use case #1
$ ./myscript longname55445
It should take the number 55445 and assign it to a variable, which will later be used to open a new link based on the given number.
Use case #2
$ ./myscript l55445
It should do the exact same thing as above by taking the number and then open the same link.
Use case #3
$ ./myscript 55445
If no prefix is given, we simply open that same link as a fallback.
So far this is what I have
#!/bin/sh
BASE_URL=http://api.domain.com
input=$1
command=${input:0:1}
if [ "$command" == "longname" ]; then
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
elseif [ "$command" == "l" ]; then
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
else
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
fi
But this will always fall back to the elseif branch.
I'm using zsh at the moment.
input=$1
command=${input:0:1}
This sets command to the first character of the first argument. A one-character string can never be equal to an eight-character string ("longname"), so the if condition always fails.
Furthermore, both your elseif and your else clauses set
number=${input:1:${#input}}
Which you could have written more simply as
number=${input:1}
But in both cases, you're dropping the first character of input. Presumably in the else case, you wanted the entire first argument.
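Putting that diagnosis together, one way to repair the script is to match the prefixes with a case statement and strip them with ${1#prefix}. Both are POSIX, so this also runs under /bin/sh. A sketch; the helper name extract_number is hypothetical, and open is left in the usage comment as in the question:

```shell
#!/bin/sh
# Strip a leading "longname" or "l" prefix, if present, and print the rest.
extract_number() {
    case $1 in
        longname*) printf '%s\n' "${1#longname}" ;;  # must come before l*
        l*)        printf '%s\n' "${1#l}" ;;
        *)         printf '%s\n' "$1" ;;             # no prefix: use as-is
    esac
}

# Usage, mirroring the question:
# BASE_URL=http://api.domain.com
# number=$(extract_number "$1")
# open "$BASE_URL?id=$number"
```

The order of the patterns matters: "longname55445" also starts with l, so the longname* branch has to be tried first.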
See whether this construct is helpful for your purpose:
#!/bin/bash
name="longname55445"
echo "${name##*[A-Za-z]}"
This assumes that a letter immediately precedes the number.
The following is NOT another way to write the same thing, because it is wrong.
Please see the comments below by mklement0, who noticed this. Mea culpa.
echo "${name##*[:letter:]}"
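For completeness, the character-class spelling that does work puts the class inside a bracket expression, [[:alpha:]], rather than [:letter:] (there is no class called letter, and a bare [:letter:] is just a bracket expression matching one of the characters :, l, e, t, r). A quick check:

```shell
#!/bin/bash
name="longname55445"
# ## removes the longest prefix matching *[[:alpha:]], i.e. everything
# up to and including the last letter.
echo "${name##*[[:alpha:]]}"    # prints 55445
```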
You have command=${input:0:1}, which takes only the first character.
You then compare it to "longname", so of course the test fails and execution goes to the elseif.
The key problem is to check whether the input begins with l, longname, or nothing. In each of the 3 cases, take the trailing numbers.
A single grep line can do it: just grep the input and capture the returned text:
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"l234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"longname234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"foobar234"
<we got nothing>
You can use regex matching in bash.
[[ $1 =~ [0-9]+ ]] && number=$BASH_REMATCH
You can also use regex matching in zsh.
[[ $1 =~ [0-9]+ ]] && number=$MATCH
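Note that [0-9]+ on its own also accepts input such as foobar234. If only the three forms from the question should be accepted, an anchored pattern with a capture group is one option. A bash sketch; parse_id is a hypothetical name, and ${BASH_REMATCH[2]} holds what the second parenthesized group matched:

```shell
#!/bin/bash
# Accept only "longname<digits>", "l<digits>" or "<digits>"; print the digits.
parse_id() {
    # Keeping the regex in a variable avoids quoting problems:
    # a quoted pattern after =~ would be matched literally.
    local re='^(longname|l)?([0-9]+)$'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}
```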
Based on the OP's following clarification in a comment,
I'm only looking for the numbers [...] given in the input.
the solution can be simplified as follows:
#!/bin/bash
BASE_URL='http://api.domain.com'
# Strip all non-digits from the 1st argument to get the desired number.
number=$(tr -dC '[:digit:]' <<<"$1")
open "$BASE_URL?id=$number"
Note the use of a bash shebang, given the use of 'bashism' <<< (which could easily be restated in a POSIX-compliant manner).
Similarly, the OP's original code should use a bash shebang, too, due to use of non-POSIX substring extraction syntax.
However, judging by the use of open to open a URL, the OP appears to be on OSX, where sh is essentially bash (though invocation as sh does change behavior), so it'll still work there. Generally, though, it's safer to be explicit about the required shell.
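As noted above, the <<< here-string is the only bashism in that script. One POSIX restatement pipes printf into tr instead. A sketch; digits_only is a hypothetical name:

```shell
#!/bin/sh
# Print only the digit characters of the first argument (POSIX sh, no <<<).
digits_only() {
    printf '%s' "$1" | tr -dc '[:digit:]'
}

# Usage, as in the answer above:
# BASE_URL='http://api.domain.com'
# number=$(digits_only "$1")
# open "$BASE_URL?id=$number"
```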

Handle special characters in bash for...in loop

Suppose I've got a list of files
file1
"file 1"
file2
A for...in loop splits it on whitespace, not newlines:
for x in $( ls ); do
echo $x
done
results:
file
1
file1
file2
I want to execute a command on each file. "file" and "1" above are not actual files. How can I do that if the file names contain things like spaces or commas?
It's a little trickier than I think find -print0 | xargs -0 could handle, because I actually want the command to be something like "convert input/file1.jpg .... output/file1.jpg", so I need to transform the file name in the process.
Actually, Mark's suggestion works fine without even touching the internal field separator. The problem is that the output of ls, captured with backticks or $( ), is split on whitespace, so the for loop cannot tell a space inside a name from the separator between names. Simply using
for f in *
instead of the ls solves the problem.
#!/bin/bash
for f in *
do
echo "$f"
done
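The glob loop also covers the convert case from the question, because each name arrives intact in "$f" and parameter expansion can build the output path. A sketch, assuming input/ and output/ directories as in the question; process_all is a hypothetical name, and cp stands in for ImageMagick's convert so the example runs anywhere:

```shell
#!/bin/bash
# Process every .jpg in IN_DIR, writing a file of the same name into OUT_DIR.
process_all() {
    local in_dir=$1 out_dir=$2 f name
    for f in "$in_dir"/*.jpg; do
        [ -e "$f" ] || continue          # no match: the glob stays literal
        name=${f##*/}                    # strip the directory part
        cp -- "$f" "$out_dir/$name"      # e.g. convert "$f" ... "$out_dir/$name"
    done
}

# Usage: process_all input output
```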
UPDATE BY OP: this answer sucks and shouldn't be on top ... @Jordan's post below should be the accepted answer.
One possible way:
ls -1 | while IFS= read -r x; do
    echo "$x"
done
I know this one is LONG past "answered", and with all due respect to eduffy, I came up with a better way and I thought I'd share it.
What's "wrong" with eduffy's answer isn't that it's wrong, but that it imposes what for me is a painful limitation: there's an implied creation of a subshell when the output of the ls is piped and this means that variables set inside the loop are lost after the loop exits. Thus, if you want to write some more sophisticated code, you have a pain in the buttocks to deal with.
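For comparison, bash itself offers a way around that particular limitation: feeding the loop with process substitution, done < <(...), runs the while loop in the current shell, so variables set inside it survive. IFS= and -r keep leading spaces and backslashes in names intact (names containing newlines would still be split). A sketch; count_entries is a hypothetical name:

```shell
#!/bin/bash
# Count the entries of a directory, keeping the counter after the loop.
count_entries() {
    local dir=$1 name count=0
    while IFS= read -r name; do
        count=$((count + 1))
    done < <(ls -1 -- "$dir")
    echo "$count"    # still set here: the loop ran in this shell
}
```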
My solution was to take the "readline" function and write a program around it that lets you pick any specific line from the output of a given command. ... As a simple example, starting with eduffy's:
ls_output=$(ls -1)
# The cut at the end of the following line removes any trailing new line character
declare -i line_count=$(echo "$ls_output" | wc -l | cut -d ' ' -f 1)
declare -i cur_line=1
while [ $cur_line -le $line_count ] ;
do
# NONE of the values in the variables inside this do loop are trapped here.
filename=$(echo "$ls_output" | readline -n $cur_line)
# Now filename contains a file name from the preceding ls command
cur_line=cur_line+1
done
Now you have wrapped up all the subshell activity into neat little contained packages and can go about your shell coding without having to worry about the scope of your variable values getting trapped in subshells.
I wrote my version of readline in GNU C. If anyone wants a copy, it's a little big to post here, but maybe we can find a way...
Hope this helps,
RT
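If bash 4 or later is available, the built-in mapfile (also called readarray) gives similar line-by-line access without a separate program: it reads all lines into an array once, and any line can then be indexed directly. This is a swapped-in technique, not RT's readline program; list_files is a hypothetical name:

```shell
#!/bin/bash
# Read the ls output into an array, one element per line (bash 4+).
list_files() {
    local dir=$1 i
    local -a lines
    mapfile -t lines < <(ls -1 -- "$dir")
    for ((i = 0; i < ${#lines[@]}; i++)); do
        # any element, e.g. "${lines[2]}", can be accessed directly
        printf '%s\n' "${lines[i]}"
    done
}
```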
