Search for value and print something if found (BASH) - bash

I have the following list:
COX1
COX1
COX1
COX1
COX1
Cu-oxidase
Cu-oxidase_3
Cu-oxidase_3
Fer4_NifH
and I want to check it as follows: if both COX1 and Cu-oxidase are in the list, print xyz; if both Cu-oxidase_3 and Fer4_NifH are in the list (regardless of whether the first two are), print abc.
This is what I could script so far:
if grep 'COX1' file.txt; then echo xyz; else exit 0; fi
but it is of course incomplete.
Any solution to that?
ideally my output would be:
xyz
abc

Awk lets you easily search for multiple regular expressions and print something else than the matched string itself. (grep can easily search for multiple patterns, too, but it will print the match or its line number or file name, not some arbitrary string.)
The following assumes that you have a single token per line. This assumption makes the script really simple, though it would also not be hard to support other scenarios.
awk '{ a[$1]++ }
END { if (("COX1" in a) && ("Cu-oxidase" in a)) print "xyz";
if (("Cu-oxidase_3" in a) && ("Fer4_NifH" in a)) print "abc" }' file.txt
This builds an associative array of each token (actually the first whitespace-separated token on each line) and then at the end, when it has read every line in the file, checks whether the sought tokens exist as keys in the array.
Performing a single pass over the input file is a big win, especially if you have a large input file and many patterns. Just for completeness, the syntax for performing multiple passes with grep is very straightforward:
if grep -qx 'COX1' file.txt && grep -qx 'Cu-oxidase' file.txt
then
echo xyz
fi
which can be further abbreviated to
grep -qx 'COX1' file.txt && grep -qx 'Cu-oxidase' file.txt && echo xyz
Notice the -x switch to require the whole line to match (otherwise the regex 'Cu-oxidase' would also match on the Cu-oxidase_3 lines).
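To see the effect of -x concretely, here is a small sketch. The helper name has_all_lines and the sample data are made up for illustration, and -F is added on the assumption that the tokens should be treated as fixed strings rather than regexes.

```bash
#!/bin/bash
# has_all_lines FILE TOKEN... (hypothetical helper)
# Succeeds only if every TOKEN occurs as a complete line in FILE.
has_all_lines() {
    local file=$1 token
    shift
    for token in "$@"; do
        # -q: quiet, -x: whole-line match, -F: fixed string,
        # -e: safe even if a token starts with "-"
        grep -qxF -e "$token" "$file" || return 1
    done
}

# Sample data (made up): note that a bare Cu-oxidase line is absent.
sample=$(mktemp)
printf '%s\n' COX1 Cu-oxidase_3 Fer4_NifH > "$sample"

has_all_lines "$sample" COX1 Cu-oxidase && echo xyz
has_all_lines "$sample" Cu-oxidase_3 Fer4_NifH && echo abc
```

With this sample only abc is printed. Without -x the first check would also succeed, because Cu-oxidase is a substring of Cu-oxidase_3.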

Below is a very verbose way to achieve this. There are ways to write the same with fewer ifs and fewer greps, but I really wanted to show you the logic:
you run a grep command, check its return value with $?, and finally act on the conditions.
# default values
HAS_COX1=0
HAS_CUOX=0
HAS_CUO3=0
HAS_FER4=0
# run silently grep
grep -q 'COX1' file.txt
# check for return value and set variable accordingly
if [ $? -eq 0 ]; then HAS_COX1=1; fi
# same as above
grep -q 'Cu-oxidase' file.txt
if [ $? -eq 0 ]; then HAS_CUOX=1; fi
grep -q 'Cu-oxidase_3' file.txt
if [ $? -eq 0 ]; then HAS_CUO3=1; fi
grep -q 'Fer4_NifH' file.txt
if [ $? -eq 0 ]; then HAS_FER4=1; fi
if [ $HAS_COX1 -eq 1 ]; then
if [ $HAS_CUOX -eq 1 ]; then
echo 'xyz'
exit 0
fi
fi
if [ $HAS_CUO3 -eq 1 ]; then
if [ $HAS_FER4 -eq 1 ]; then
echo 'abc'
exit 0
fi
fi
echo 'None of the checks were matched'
exit 1
Beware: this code is untested, so there might be bugs ☺
The code isn't perfect: it cannot print both 'xyz' and 'abc' when both conditions are met (but that would be an easy fix with the syntax I provide). Also, $HAS_CUOX will be set to 1 whenever Cu-oxidase_3 is present, because the grep regex has no boundary checking.
You could take that code further by using a single grep for each set of conditions to check, using something like 'COX1\|Cu-oxidase' as the regex for grep. Keep in mind that an alternation matches when either token is present, so on its own it does not prove that both are. You could also fix the minor issues I mentioned above.
ideally my output would be:
xyz
abc
You added your expected output after I wrote the above script, but given the elements I gave you, you should be able to figure out how to improve it (basically by removing the exit 0 where I placed them, and doing exit 1 only when no output has been given).
Or just remove all exits as a dirty solution.
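For reference, here is a sketch of the script with those fixes applied: no early exit, -x so that Cu-oxidase does not match the Cu-oxidase_3 lines, and a non-zero status only when nothing was printed. Wrapping the logic in a function called check_list, and the temporary sample file, are my own choices for the sake of a self-contained example.

```bash
#!/bin/bash
# check_list FILE (hypothetical wrapper around the logic above)
check_list() {
    local file=$1 printed=0

    # grep -qx: quiet, and the whole line must match
    if grep -qx 'COX1' "$file" && grep -qx 'Cu-oxidase' "$file"; then
        echo 'xyz'
        printed=1
    fi

    if grep -qx 'Cu-oxidase_3' "$file" && grep -qx 'Fer4_NifH' "$file"; then
        echo 'abc'
        printed=1
    fi

    # fail only if neither pair was found
    if [ "$printed" -eq 0 ]; then
        echo 'None of the checks were matched' >&2
        return 1
    fi
}

# Sample data taken from the question
list=$(mktemp)
printf '%s\n' COX1 COX1 Cu-oxidase Cu-oxidase_3 Fer4_NifH > "$list"
check_list "$list"
```

On the sample list this prints xyz followed by abc.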

Related

Finding presence of substring within a string in BASH

I have a script that is trying to find the presence of a given string inside a file of arbitrary text.
I've settled on something like:
#!/bin/bash
file="myfile.txt"
for j in `cat blacklist.txt`; do
echo Searching for $j...
unset match
match=`grep -i -m1 -o "$j" $file`
if [ $match ]; then
echo "Match: $match"
fi
done
Blacklist.txt contains lines of potential matches, like so:
matchthis
"match this too"
thisisasingleword
"This is multiple words"
myfile.txt could be something like:
I would matchthis if I could match things with grep. I really wish I could.
When I ask it to match this too, it fails to matchthis. It should match this too - right?
If I run this at a bash prompt, like so:
j="match this too"
grep -i -m1 -o "$j" myfile.txt
...I get "match this too".
However, when the batch file runs, despite the variables being set correctly (verified via echo lines), it never greps properly and returns nothing.
Where am I going wrong?
Wouldn't
grep -owF -f blacklist.txt myfile.txt
instead of writing an inefficient loop, do what you want?
Would you please try:
#!/bin/bash
file="myfile.txt"
while IFS= read -r j; do
j=${j#\"}; j=${j%\"} # remove surrounding double quotes
echo "Searching for $j..."
match=$(grep -i -m1 -o "$j" "$file")
if (( $? == 0 )); then # if match
echo "Match: $match" # then print it
fi
done < blacklist.txt
Output:
Searching for matchthis...
Match: matchthis
Searching for match this too...
Match: match this too
match this too
Searching for thisisasingleword...
Searching for This is multiple words...
I wound up abandoning grep entirely and using sed instead.
match=`sed -n "s/.*\($j\).*/\1/p" $file`
Works well, and I was able to use unquoted multiple word phrases in the blacklist file.
With this:
if [ $match ]; then
you are passing random arguments to test. This is not how you properly check for a variable not being empty. Use test -n:
if [ -n "$match" ]; then
You might also use grep's exit code instead:
if [ "$?" -eq 0 ]; then
for ... in X splits X at spaces by default, and you are expecting the script to match whole lines.
Define IFS properly:
IFS='
'
for j in `cat blacklist.txt`; do
blacklist.txt contains "match this too" with quotes, and it is read like this by for loop and matched literally.
j="match this too" does not cause j variable to contain quotes.
j='"match this too"' does, and then it will not match.
Since whole lines are read properly from the blacklist.txt file now, you can probably remove quotes from that file.
Script:
#!/bin/bash
file="myfile.txt"
IFS='
'
for j in `cat blacklist.txt`; do
echo Searching for $j...
unset match
match=`grep -i -m1 -o "$j" "$file"`
if [ -n "$match" ]; then
echo "Match: $match"
fi
done
Alternative to the for ... in ... loop (no IFS= needed):
while read; do
j="$REPLY"
...
done < 'blacklist.txt'
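A quick way to see the difference between the two kinds of loop, using a throwaway blacklist created on the spot. The file contents and the two function names are only for illustration, and the default IFS is assumed.

```bash
#!/bin/bash
blacklist=$(mktemp)
printf '%s\n' 'matchthis' 'match this too' > "$blacklist"

# Unquoted command substitution: the text is split on spaces, tabs and newlines
count_for_items() {
    local n=0 j
    for j in $(cat "$1"); do
        n=$((n + 1))
    done
    echo "$n"
}

# while read: one iteration per line, spaces inside the line are kept
count_read_items() {
    local n=0 j
    while IFS= read -r j; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

echo "for loop items:   $(count_for_items "$blacklist")"    # 4
echo "while loop items: $(count_read_items "$blacklist")"   # 2
```

The for loop sees four words, while the while loop sees the two lines that were actually written to the file.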

How can I get the return value and matched line by grep in bash at once?

I am learning bash. I would like to get the return value and matched line by grep at once.
if cat 'file' | grep 'match_word'; then
match_by_grep="$(cat 'file' | grep 'match_word')"
read a b <<< "${match_by_grep}"
fi
In the code above, I used grep twice. I cannot think of how to do it with a single grep. I am also not sure that match_by_grep is always empty when there is no match, because cat may output an error message.
match_by_grep="$(cat 'file' | grep 'match_word')"
if [[ -n ${match_by_grep} ]]; then
# match_by_grep may be an error message by cat.
# So following a and b may have wrong value.
read a b <<< "${match_by_grep}"
fi
Please tell me how to do it. Thank you very much.
You can avoid the double use of grep by storing the search output in a variable and seeing if it is not empty.
Your version of the script without double grep.
#!/bin/bash
grepOutput="$(grep 'match_word' file)"
if [ ! -z "$grepOutput" ]; then
read a b <<< "${grepOutput}"
fi
An optimization over the above script ( you can remove the temporary variable too)
#!/bin/bash
grepOutput="$(grep 'match_word' file)"
# no ( ) around read: a subshell would throw away a and b
[[ -n "$grepOutput" ]] && read a b <<< "${grepOutput}"
Using double-grep once for checking if-condition and once to parse the search result would be something like:-
#!/bin/bash
if grep -q 'match_word' file; then
grepOutput="$(grep 'match_word' file)"
read a b <<< "${grepOutput}"
fi
When assigning a variable with a string containing a command expansion, the return code is that of the (rightmost) command being expanded.
In other words, you can just use the assignment as the condition:
if grepOutput="$(cat 'file' | grep 'match_word')"
then
echo "There was a match"
read -r a b <<< "${grepOutput}"
(etc)
else
echo "No match"
fi
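A compact sketch of the same idea wrapped in a function (first_match is a made-up name, and the sample file is invented). One detail worth knowing when doing this inside a function: writing local out=$(...) on a single line reports the status of local itself, so the declaration is kept separate from the assignment.

```bash
#!/bin/bash
# first_match PATTERN FILE (hypothetical helper)
# Prints the first two fields of the first matching line; fails if nothing matches.
first_match() {
    local out a b    # declared first: "local out=$(...)" would hide grep's status
    if out=$(grep -m1 -- "$1" "$2"); then
        # _ swallows any remaining fields so that b holds exactly one word
        read -r a b _ <<< "$out"
        echo "a=$a b=$b"
    else
        return 1
    fi
}

demo=$(mktemp)
printf '%s\n' 'alpha beta gamma' 'match_word 42 extra' > "$demo"

first_match 'match_word' "$demo"               # prints: a=match_word b=42
first_match 'absent' "$demo" || echo "No match"
```

grep runs only once per call, and its exit status decides which branch is taken.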
Is this what you want to achieve?
grep 'match_word' file ; echo $?
$? has a return value of the command run immediately before.
If you would like to keep track of the return value, it will be also useful to have PS1 set up with $?.
Ref: Bash Prompt with Last Exit Code

Using bash, separate servers into separate file depending on even or odd numbers

The output comes from a command I run from our netscaler. It outputs the following ... One thing to note is that the middle two numbers change but the even/odd criteria is always on the last digit. We never have more than 2 digits, so we'll never hit 10.
WC-01-WEB1
WC-01-WEB4
WC-01-WEB3
WC-01-WEB5
WC-01-WEB8
I need to populate a file called "even" and "odds." If we're dealing with numbers I can figure it out, but having the number within a string is throwing me off.
Example code but I'm missing the part where I need to match the string.
if [ $even_servers -eq 0 ]
then
echo $line >> evenfile
else
echo $line >> oddfile
fi
This is a simple awk command:
awk '/[02468]$/{print > "evenfile"}; /[13579]$/{print > "oddfile"}' input.txt
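If you would rather stay in the shell and fill in the missing part of the if from the question, a sketch along these lines should work. It assumes every server name ends in a digit; the function name split_servers and the temporary directory are hypothetical.

```bash
#!/bin/bash
# split_servers INPUT EVENFILE ODDFILE (hypothetical helper)
split_servers() {
    local line last
    : > "$2"; : > "$3"               # start with empty output files
    while IFS= read -r line; do
        last=${line: -1}             # last character of the name
        case $last in
            [0-9]) ;;                # ends in a digit: classify below
            *) continue ;;           # skip anything else
        esac
        if (( last % 2 == 0 )); then
            echo "$line" >> "$2"
        else
            echo "$line" >> "$3"
        fi
    done < "$1"
}

workdir=$(mktemp -d)
printf '%s\n' WC-01-WEB1 WC-01-WEB4 WC-01-WEB3 WC-01-WEB5 WC-01-WEB8 > "$workdir/input.txt"
split_servers "$workdir/input.txt" "$workdir/evenfile" "$workdir/oddfile"
cat "$workdir/evenfile"
```

With the sample names from the question, evenfile ends up with WC-01-WEB4 and WC-01-WEB8, and the other three go to oddfile.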
There must be better way.
How about this version:
for v in `cat <my_file>`; do export type=`echo $v | awk -F 'WEB' '{print $2%2}'`; if [ $type -eq 0 ]; then echo $v >> evenfile ; else echo $v >> oddfile; fi; done
I assume your list of servers is stored in the filename <my_file>. The basic idea is to tokenize on WEB using awk and process the chars after WEB to determine even-ness. Once this is known, we export the value to a variable type and use this to selectively dump to the appropriate file.
For the case when the name is the output of another command:
export var=`<another command>`; export type=`echo $var | awk -F 'WEB' '{print $2%2}'`; if [ $type -eq 0 ]; then echo $var >> evenfile ; else echo $var >> oddfile; fi;
Replace <another command> with your perl script.
As always grep is your friend:
grep "[02468]$" input_file > evenfile
grep "[13579]$" input_file > oddfile
I hope this helps.

BASH - Tell if duplicate lines exist (y/n)

I am writing a script to manipulate a text file.
The first thing I want to do is check if duplicate entries exist and, if so, ask the user whether they want to keep or remove them.
I know how to display duplicate lines if they exist, but what I want to learn is just to get a yes/no answer to the question "Do duplicates exist?"
It seems uniq will return 0 whether or not duplicates were found, as long as the command completed without issues.
What is that command that I can put in an if-statement just to tell me if duplicate lines exist?
My file is very simple, it is just values in single column.
I'd probably use awk to do this but, for the sake of variety, here is a brief pipe to accomplish the same thing:
$ { sort | uniq -d | grep . -qc; } < noduplicates.txt; echo $?
1
$ { sort | uniq -d | grep . -qc; } < duplicates.txt; echo $?
0
sort + uniq -d make sure that only duplicate lines (which don't have to be adjacent) get printed to stdout. grep . -c counts those lines, emulating wc -l, with the useful side effect that it returns 1 if nothing matches (i.e. a zero count). -q silences the output so the line count is not printed and you can use the pipeline quietly in your script.
has_duplicates()
{
{
sort | uniq -d | grep . -qc
} < "$1"
}
if has_duplicates myfile.txt; then
echo "myfile.txt has duplicate lines"
else
echo "myfile.txt has no duplicate lines"
fi
You can use awk combined with the boolean || operator:
# Ask question if awk found a duplicate
awk 'a[$0]++{exit 1}' test.txt || (
echo -n "remove duplicates? [y/n] "
read answer
# Remove duplicates if answer was "y". I'm using `[`, the shorthand
# of the test command. Check `help [`. awk '!seen[$0]++' is used
# because plain uniq only drops adjacent repeats.
[ "$answer" = "y" ] && awk '!seen[$0]++' test.txt > test.uniq.txt
)
The block after the || will only get executed if the awk command returns 1, meaning it found duplicates.
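One caveat for the removal step: uniq only collapses repeats that sit next to each other, whereas the awk check flags repeats anywhere in the file. The sketch below, using made-up sample data and hypothetical function names, compares plain uniq with an order-preserving awk filter.

```bash
#!/bin/bash
sample=$(mktemp)
printf '%s\n' a b a c > "$sample"    # "a" repeats, but not on adjacent lines

# Succeeds if some line occurs more than once anywhere in the file
has_dupes() { ! awk 'seen[$0]++ { exit 1 }' "$1"; }

# Prints each distinct line once, keeping first-seen order
dedupe() { awk '!seen[$0]++' "$1"; }

has_dupes "$sample" && echo "duplicates found"
echo "uniq keeps $(( $(uniq "$sample" | wc -l) )) lines"     # 4: nothing removed
echo "awk keeps $(( $(dedupe "$sample" | wc -l) )) lines"    # 3: second "a" dropped
```

If the order of lines does not matter, sort -u gives the same set of lines as the awk filter.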
However, for a basic understanding I'll also show an example using an if block
awk 'a[$0]++{exit 1}' test.txt
# $? contains the return value of the last command
if [ $? != 0 ] ; then
echo -n "remove duplicates? [y/n] "
read answer
# check answer
if [ "$answer" == "y" ] ; then
# plain uniq only drops adjacent repeats, so filter with awk instead
awk '!seen[$0]++' test.txt > test.uniq.txt
fi
fi
However, the [ ] are not just brackets like in other programming languages. [ is a synonym for the test bash builtin command and ] is its last argument. Read help [ to understand how it works.
A quick bash solution:
#!/bin/bash
INPUT_FILE=words
declare -A a
while IFS= read -r line ; do
[ "${a[$line]}" = 'nonempty' ] && duplicates=yes && break
a[$line]=nonempty
done < "$INPUT_FILE"
[ "$duplicates" = yes ] && echo -n "Keep duplicates? [Y/n]" && read keepDuplicates
removeDuplicates() {
sort -u $INPUT_FILE > $INPUT_FILE.tmp
mv $INPUT_FILE.tmp $INPUT_FILE
}
[ "$keepDuplicates" != "Y" ] && removeDuplicates
The script reads line by line from the INPUT_FILE and stores each line in the associative array a as the key and sets the string nonempty as value. Before storing the value, it first checks whether it is already there - if it is it means it found a duplicate and it sets the duplicates flag and then it breaks out of the cycle.
Later it only checks if the flag is set and asks the user whether to keep the duplicates. If they answer anything else than Y then it calls the removeDuplicates function which uses sort -u to remove the duplicates. ${a[$line]} evaluates to the value of the associative array a for the key $line. [ "$duplicates" = yes ] is a bash builtin syntax for a test. If the test succeeds then whatever follows after && is evaluated.
But note that the awk solutions will likely be faster so you may want to use them if you expect to process bigger files.
You can do uniq=yes/no using this awk one-liner:
awk '!seen[$0]{seen[$0]++; i++} END{print (NR>i)?"no":"yes"}' file
awk keeps an array called seen, holding the lines encountered so far.
Every time a new line is added to seen, the counter i is incremented.
Finally, in the END block, the number of records is compared with the number of unique records: (NR>i)?
If the condition is true, there are duplicate records and no is printed; otherwise yes is printed.

bash picking arguments

I want to write a function for when I have something like the following
echo 1 2 3|pick
Pick will then take the arguments and I will do something with them.
How do I do this?
Are you looking for xargs?
pick() {
read -r arg1 arg2 remainder
echo first arg is $arg1
echo The remaining args are $remainder
}
--EDIT (response to question in comment)
One way to loop through the arguments:
pick() {
read args;
set $args;
while test $# -ne 0; do
echo $1
shift
done
}
On each iteration of the loop, $1 refers to an argument.
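A variant of the same loop that reads the words into an array, which avoids both the unquoted set $args and overwriting the function's positional parameters. The name pick_words is hypothetical.

```bash
#!/bin/bash
# pick_words (hypothetical): read one line from stdin, print each word on its own line
pick_words() {
    local -a words
    local w
    read -r -a words              # split the line on whitespace into an array
    for w in "${words[@]}"; do
        echo "$w"
    done
}

echo 1 2 3 | pick_words
```

This prints 1, 2 and 3 on separate lines; the echo inside the loop is where you would do something with each argument.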
If I'm not mistaken, the OP wants the same thing I do: you feed it a string, and if the string contains multiple {words,lines}, it presents you a menu, and you pick one, and it returns the one you pick on stdout.
If there's only one item, it just returns it.
This is useful for--to use my particular use-case--a log file viewer script: you give it a substring of a filename, and it greps through find /var/log -name \*$arg\* -print to see what it can find. If it gets a unique hit, it hands it back to your script, which runs less against it. If it gets more than one hit, it shows you a menu, and lets you pick one.
ISTR that KSH has a builtin for this, but that I wasn't all that impressed with it; I don't recall if bash has one.
I am here because I was searching to see if someone had already written it before writing it myself. :-)
UPDATE: Nope; I wrote it myself:
Here's some example code:
/usr/local/bin/msg:
PATH=$PATH:/usr/local/bin
[ $UID = 0 ] || exec sudo su root -c "$0 $*"
FILE=/var/log/messages
[ $# -eq 1 ] &&
FILE=`find /var/log/ -name \*$1\* -print |
egrep -v '2011|.[0-9]$' |
pick`
echo "$FILE"
less +F $FILE
Since I'm piping the name to less +F I want to grep out archived log files; this is for interactive log viewing.
/usr/local/bin/pick:
# Present the user a bash Select menu, and let them pick
# Try to be smart about multi-line responses
# must take input on stdin if it might be multiline
# get multiline input from stdin
while read LINE </dev/stdin
do
CHOICES+=( $LINE )
done
# add on anything specified as arguments
while [ $# -gt 0 ]
do
CHOICES+=( $1 )
shift
done
# if only one thing to pick, just pick it
if [ ${#CHOICES[*]} -eq 1 ]
then
echo $CHOICES
exit
fi
# eval set $CHOICES
select CHOSEN in "${CHOICES[@]}"
do
echo $CHOSEN
exit
done </dev/tty

Resources