In bash, how do I test if a word is not in a list?

Good day everybody,
I want to make an if conditional with the following aim:
I have two files. The script should check whether a word from file1 (stored in the variable $word2test) exists in file2 (each word of which is read into the variable $wordINlist). If the word exists, do nothing; if the word is not in file2, print it to stdout.
My first approach is:
if ! [[ "$word2test" =~ "$wordINlist" ]] ; then
    echo $word2test
fi
Thanks in advance for any suggestions.

Try this simple bash sample script:
word=foobar
grep -q "\<$word\>" FILE || echo "$word is *not* in FILE"
Another way, with a regex:
word=foobar
grep -q "^$word *$" FILE || echo "$word is *not* in FILE"

If your files are simple lists of one word per line, try this:
grep -Fvf file2 file1
or
join -v 1 <(sort file1) <(sort file2)
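For instance, with hypothetical one-word-per-line files like these, the first command prints only the word missing from file2:
printf 'apple\nbanana\ncherry\n' > file1
printf 'banana\ncherry\n' > file2
grep -Fvf file2 file1    # prints: apple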

Assuming $wordINlist is an array (you say "list" but I'm assuming you meant array), you can iterate through it like so:
found=
for item in "${wordINlist[@]}"; do
    [[ $item == "$word2test" ]] && found=1
done
[[ $found ]] || echo "$word2test"
If $wordINlist is a file, then you can simply grep through it:
egrep -q "\b${word2test}\b" "$wordINlist" || echo "$word2test"
When egrep finds a match it returns true, otherwise it returns false. So that simply says, "either a match was found, or echo $word2test"
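Spelled out as an if statement, that one-liner is equivalent to something like this sketch:
if ! egrep -q "\b${word2test}\b" "$wordINlist"; then
    echo "$word2test"
fi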
If all you're wanting to do is see which items are in file1 and NOT in file2, use comm:
comm -23 <(sort -u file1) <(sort -u file2)

Related

How to compare 2 files word by word and store the differing words in a result output file

Suppose there are two files:
File1.txt
My name is Anamika.
File2.txt
My name is Anamitra.
I want result file storing:
Result.txt
Anamika
Anamitra
I use PuTTY, so I can't use wdiff. Is there any other alternative?
Not my greatest script, but it works. Others might come up with something more elegant.
#!/bin/bash
if [ $# != 2 ]
then
    echo "Arguments: file1 file2"
    exit 1
fi
file1=$1
file2=$2
# Do this for both files
for F in "$file1" "$file2"
do
    if [ ! -f "$F" ]
    then
        echo "ERROR: $F does not exist."
        exit 2
    else
        # Create a temporary file with every word from the file
        for w in $(cat "$F")
        do
            echo "$w" >> "${F}.tmp"
        done
    fi
done
# Compare the temporary files, since they are now 1 word per line.
# The grep keeps only the lines where diff marks a difference (starting with > or <).
# The awk keeps only the word (i.e. removes < or >).
# The sed removes any character that is not alphanumeric.
# Removes a . at the end, for example.
diff "${file1}.tmp" "${file2}.tmp" | grep -E '^[<>]' | awk '{print $2}' | sed 's/[^a-zA-Z0-9]//g' > Result.txt
# Cleanup!
rm -f "${file1}.tmp" "${file2}.tmp"
This uses a trick with the for loop. If you use for to loop over a file's contents like this, it loops over each word, NOT each line, as beginners in bash tend to believe. Here that is actually a nice thing to know, since it transforms the files into one word per line.
Ex: file content == This is a sentence.
After the for loop is done, the temporary file will contain:
This
is
a
sentence.
Then it is trivial to run diff on the files.
One last detail: your sample output did not include a . at the end, hence the sed command to keep only alphanumeric characters.
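A minimal way to see that word-splitting behaviour for yourself (a sketch assuming a throwaway file named sample.txt):
printf 'This is a sentence.\n' > sample.txt
for w in $(cat sample.txt); do
    echo "$w"    # prints This / is / a / sentence. on separate lines
done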

Read each line of a column of a file and execute grep

I have file.txt, shown here as an example:
This line contains ABC
This line contains DEF
This line contains GHI
and here is list.txt:
contains ABC<TAB>ABC
contains DEF<TAB>DEF
Now I am writing a script that executes the following commands for each line of this external file list.txt:
take the string from column 1 of list.txt and search for it in a third file, file.txt
if the first command finds a match, return the string from column 2 of list.txt
So my output.txt is:
ABC
DEF
This is my code for grep/echo with the query/return strings put in manually:
if grep -i -q 'contains abc' file.txt
then
    echo ABC >output.txt
else
    echo -n
fi
if grep -i -q 'contains def' file.txt
then
    echo DEF >>output.txt
else
    echo -n
fi
I have about 100 search terms, which makes the task laborious if done manually. So how do I combine while read line; do [commands]; done < list.txt with the commands for column 1 and column 2 inside that script?
I would like to use simple grep/echo/awk commands if possible.
Something like this?
$ awk -F'\t' 'FNR==NR { a[$1] = $2; next } {for (x in a) if (index($0, x)) {print a[x]}} ' list.txt file.txt
ABC
DEF
For the lines of the first file (FNR==NR), read the key-value pairs into array a. Then, for the lines of the second file, loop through the array, check if the key is found on the line, and if so, print the stored value. index($0, x) tries to find the contents of x within (the current line) $0. $0 ~ x would instead take x as a regex to match against.
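For example, that regex-matching variant would look like this (a sketch; note that column 1 of list.txt is then treated as a regex rather than a literal string):
awk -F'\t' 'FNR==NR { a[$1] = $2; next } { for (x in a) if ($0 ~ x) print a[x] }' list.txt file.txt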
If you want to do it in the shell, starting a separate grep for each and every line of list.txt, something like this:
while IFS=$'\t' read k v ; do
    grep -qFe "$k" file.txt && echo "$v"
done < list.txt
read k v reads a line of input and splits it (based on IFS) into k and v.
grep -F takes the pattern as a fixed string, not a regex, and -q prevents it from outputting the matching line. grep returns true if any matching lines are found, so $v is printed if $k is found in file.txt.
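A quick way to see the splitting in isolation (a sketch using a single hypothetical input line):
printf 'contains ABC\tABC\n' | while IFS=$'\t' read -r k v; do
    echo "k=$k v=$v"    # prints: k=contains ABC v=ABC
done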
Using awk and grep:
for text in $(awk '{print $4}' file.txt)
do
    grep "contains $text" list.txt | awk -F $'\t' '{print $2}'
done

Bash Shell: Infinite Loop

The problem is the following: I have a file where each line has this form:
id|lastName|firstName|gender|birthday|joinDate|IP|browser
I want to sort all the first names in that file alphabetically and print them one per line, but each name only once.
I have created the following program, but for some reason it creates an infinite loop:
array1=()
while read LINE
do
    if [ ${LINE:0:1} != '#' ]
    then
        IFS="|"
        array=($LINE)
        if [[ "${array1[@]}" != "${array[2]}" ]]
        then
            array1+=("${array[2]}")
        fi
    fi
done < $3
echo ${array1[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
NOTES
if [ ${LINE:0:1} != '#' ] : this test is used because there are comments in the file that I don't want to print
$3 : filename
array1 : is used for all the separate names
Wow, there's a MUCH simpler and cleaner way to achieve this, without having to mess with the IFS variable or use arrays. You can use "for" to do this:
First I created a file with the same structure as yours:
$ cat file
id|lastName|Douglas|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
id|lastName|Andrew|gender|birthday|joinDate|IP|browser
id|lastName|Sasha|gender|birthday|joinDate|IP|browser
#id|lastName|Carly|gender|birthday|joinDate|IP|browser
id|lastName|Madson|gender|birthday|joinDate|IP|browser
Here's the script I wrote using "for":
#!/bin/bash
for LINE in `cat file | grep -v "^#" | awk -F'|' '{print$3}' | sort -u`
do
    echo $LINE
done
And here's the output of this script:
$ ./script.sh
Andrew
Douglas
Madson
Sasha
Tim
Explanation:
for LINE in `cat file`
Creates a loop that reads each line of "file". The commands between backticks are run by the shell; for example, if you wanted to store the date in a variable you could use VARDATE=`date`.
grep -v "^#"
The -v option is used to exclude results matching the pattern, in this case "^#". The "^" character means "line begins with", so grep -v "^#" means "exclude lines beginning with #".
awk -F'|' '{print$3}'
The -F option switches the field delimiter from the default (whitespace) to whatever you put between the quotes after it, in this case the "|" character.
The '{print$3}' prints the 3rd column.
sort -u
And the "sort -u" command to sort the names alphabetically.

bash script: check if all words from one file are contained in another, otherwise issue error

I was wondering if you could help. I am new to bash scripting.
I want to be able to compare two lists. File1.txt will contain a long list of parameters and file2.txt will contain only a subset of those parameters.
File1.txt
dbipAddress=192.168.175.130
QAGENT_QCF=AGENT_QCF
QADJUST_INVENTORY_Q=ADJUST_INVENTORY_Q
QCREATE_ORDER_Q=CREATE_ORDER_Q
QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
File2.txt
AGENT_QCF
ADJUST_INVENTORY_Q
CREATE_ORDER_Q
I want to check if all the Qs in file1.txt are contained in file2.txt (after the =). If they aren't, then the bash script should stop and echo a message.
So, in the example above the script should stop as File2.txt does not contain the following Q: LOAD_INVENTORY_Q.
The Qs in file1.txt or file2.txt do not follow any particular order.
The following command will print out lines in file1.txt with values (anything appearing after =) that do not appear in file2.txt.
[me#home]$ awk -F= 'FNR==NR{keys[$0];next};!($2 in keys)' file2.txt file1.txt
dbipAddress=192.168.175.130
QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
Breakdown of the command:
awk -F= 'FNR==NR{keys[$0];next};!($2 in keys)' file2.txt file1.txt
-F= : change the delimiter to '='
FNR==NR{keys[$0];next} : store each line in file2.txt as a key in the keys[] array
!($2 in keys) : target lines in file1.txt where the second column (delimited by '=') does not exist in the keys[] array
To do something more elaborate, say if you wish to run the command as a pre-filter to make sure the file is valid before proceeding with your script, you can use:
awk -F= 'FNR==NR{K[$0];N++;next};!($2 in K) {print "Line "(NR-N)": "$0; E++};END{exit E}' file2.txt file1.txt
ERRS=$?
if [ $ERRS -ne 0 ]; then
    # errors found, do something ...
fi
That will print out all lines (with line numbers) in file1.txt that do not fit the bill, and it returns an exit code that matches the number of non-conforming lines. That way your script can detect the errors easily by checking $? and respond accordingly.
Example output:
[me#home]$ awk -F= 'FNR==NR{K[$0];N++;next};!($2 in K) {print "Line "(NR-N)": "$0;E++};END{exit E}' file2.txt file1.txt
Line 1: dbipAddress=192.168.175.130
Line 5: QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
[me#home]$ echo $?
2
You can use cut to get only the part after =. comm can be used to output the lines contained in the first file but not the second one:
grep ^Q File1.txt | cut -d= -f2- | sort | comm -23 - <(sort File2.txt)
The following command line expression will print the lines of file1.txt that do not contain any of the strings listed in file2.txt:
cat file1.txt | grep -Fvf file2.txt | grep '^Q'
explanation:
-F : match patterns as fixed strings (no regex expansion etc.); much faster
-v : only print lines that don't match
-f : get your patterns from the file specified
| grep '^Q' : pipe the output into grep, and look for lines that start with "Q"
This isn't exactly "stop the bash script when..." since it will process and print every mismatch; also, it doesn't test that there's an "=" in front of the pattern, but I hope it's useful.
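If you do want the script to stop as soon as anything is missing, one way, a sketch combining this with the cut approach above, is:
# stop if any Q value from file1.txt is missing from file2.txt (sketch)
if grep '^Q' file1.txt | cut -d= -f2- | grep -Fvxq -f file2.txt; then
    echo "file2.txt is missing at least one Q value" >&2
    exit 1
fi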
Here's another way:
missing=($(comm -23 <(awk -F= '/^Q/ {print $2}' file1.txt | sort) <(sort file2.txt)))
if (( ${#missing[@]} )); then
    echo >&2 "The following items are missing from file2.txt:"
    printf '%s\n' "${missing[@]}"
    exit 1
fi
Assuming that the relevant lines in file1.txt always start with a Q:
grep "^Q" file1.txt | while IFS= read -r line
do
what=${line#*=}
grep -Fxq "$what" file2.txt || echo "error: $what not found"
done
Output:
error: LOAD_INVENTORY_Q not found

Nested for loop comparing files

I am trying to write a bash script that looks at two files with the same name, each in a different directory.
I know this can be done with diff -r; however, I would like to take everything that is in the second file but not in the first file and output it into a new file (also with the same file name).
I have written a (nested) loop with a grep command, but it's not good and gives back a syntax error:
#!/bin/bash
FILES=/Path/dir1/*
FILES2=/Path/dir2/*
for f in $FILES
do
    for i in $FILES2
    do
        if $f = $i
        grep -vf $i $f > /Path/dir3/$i
    done
done
Any help much appreciated.
Try this:
#!/bin/bash
cd /Path/dir1/
for f in *; do
    comm -13 <(sort "$f") <(sort "/Path/dir2/$f") > "/Path/dir3/$f"
done
The if syntax in the shell is
if test_command; then commands; fi
The commands are executed if the exit code of test_command is 0:
if [ $f = $i ] ; then grep ... ; fi
but in your case it will be more efficient to derive the file name instead:
for i in $FILES; do
    f=/Path/dir2/`basename $i`
    grep ...
done
Finally, maybe this will be more efficient than grep -v:
comm -13 <(sort $f) <(sort $i)
comm -13 will get everything which is in the second file and not in the first. comm without arguments generates 3 columns of output: the first is lines only in the first file, the second lines only in the second file, and the third lines common to both.
-13 (or -1 -3) suppresses the first and third columns.
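A quick illustration of those columns with two hypothetical sorted files (columns 2 and 3 are indented with tabs):
$ printf 'apple\nbanana\n' > a
$ printf 'banana\ncherry\n' > b
$ comm a b
apple
		banana
	cherry
$ comm -13 a b
cherry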
#!/bin/bash
DIR1=/Path/dir1
DIR2=/Path/dir2
DIR3=/Path/dir3
for f in $DIR1/*
do
    for i in $DIR2/*
    do
        if [ "$(basename $f)" = "$(basename $i)" ]
        then
            grep -vf "$i" "$f" > "$DIR3/$(basename $i)"
        fi
    done
done
This assumes no special characters in the filenames (e.g. whitespace; use double quotes if that is unacceptable):
a=/path/dir1
b=/path/dir2
for i in $a/*; do test -e $b/${i##*/} &&
    diff $i $b/${i##*/} | sed -n '/^> /s///p'; done
