File comparison using shell script - shell

I have two files named file1 and file2.
Content of file1 --->
Hello/Good/Morning
World/India
Content of file2 --->
Hello/Good/Morning
World/China
I need to check whether the contents of these files are equal or not. Since both files have "Hello/Good/Morning" in common, it should print "EQUAL" as per my requirement. I have written this code:
file1=/app/webmcore1/Demo/FORLOOP/Kasturi/xyz/pqr.txt
file2=/app/webmcore1/Demo/FORLOOP/Prashast/xyz/pqr.txt
IFS=` `
for i in cat $file1
do
    if [ "$i" != '' ]; then
        echo "$i"
        for j in cat $file2
        do
            if [ "$j" != '' ]; then
                echo "$j"
                if [[ $i -eq $j ]]; then
                    echo "EQUAL"
                fi
            fi
        done
    fi
done
But it is not displaying the output properly.

diff compares files line by line. If diff file1 file2 outputs anything, the files are different.
If the output of diff is empty, they are the same.
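In a script, it is usually simpler to test diff's exit status than to inspect its output. A minimal sketch (the paths are placeholders, not the asker's real files):
file1=/path/to/file1    # placeholder paths
file2=/path/to/file2
if diff "$file1" "$file2" >/dev/null 2>&1; then
    echo "files are the same"
else
    echo "files are different"
fi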

There already is a tool to compare files: it's called diff (and it is actually much more powerful than just deciding equal or not, but it can be used for this).
diff -q file1 file2 >/dev/null && echo "EQUAL"
If you also want to print something in case the files are not equal:
diff -q file1 file2 >/dev/null && echo "EQUAL" || echo "NOT EQUAL"

So, the files are "equal" if they have any single word in common?
result=$(
comm -12 <(tr '[:space:]' '\n' <file1 | sort) <(tr '[:space:]' '\n' <file2 | sort)
)
[[ -n $result ]] && echo EQUAL
Or, just in bash
words=( $(< file1) )
for word in $(< file2); do
    if [[ " ${words[*]} " == *" $word "* ]]; then
        echo "EQUAL due to $word"
        break
    fi
done
EQUAL due to Hello/Good/Morning

Related

Separate Directories from Files with "----" Bash Scripting

I want to separate directories from files in a list. I would like them to appear as follows:
DirectoryName1
DirectoryNameA
DirectoryName_Two
--
FileName1
FileNameA
FileName_Two
Basically, I want two or three dashes in between my directories and files.
Here is what the output of my code currently looks like:
DirectoryName1
DirectoryNameA
DirectoryName_Two
FileName1
FileNameA
FileName_Two
Here is my code:
#!/bin/bash
if [[ $# -ge 1 ]]; then
    cd "$1" 2> /dev/null
    if [[ $? = 1 ]]; then
        echo "Please enter a valid directory."
    else
        ls -a | sort -k 1 | awk '{printf "(%d) %s\n", NR, $0;}'
    fi
else
    ls -a | sort -k 1 | awk '{printf "(%d) %s\n", NR, $0;}'
fi
Here's one possible solution:
#!/bin/bash
if [[ $# -ge 1 ]]; then
    dir_to_list=$1
    if [[ ! -d ${dir_to_list} ]]; then
        echo "Please enter a valid directory."
        exit
    fi
else
    dir_to_list="."
fi
files=`ls --group-directories-first $dir_to_list`
DIRS="TRUE"
i=0
for f in ${files}; do
    if [[ ${DIRS} == "TRUE" && ! -d ${dir_to_list}/${f} ]]; then
        # First non-directory entry
        echo ----
        DIRS="FALSE"
    fi
    (( i++ ))
    echo ${i}. ${f}
done
Cheers
Update: fixed bug for listing other directories
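As a rough usage sketch (assuming the script above is saved as listsep.sh, a name chosen here only for illustration), the output for the example listing would look something like:
$ ./listsep.sh someDirectory
1. DirectoryName1
2. DirectoryNameA
3. DirectoryName_Two
----
4. FileName1
5. FileNameA
6. FileName_Two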

bash, adding string after a line

I'm trying to put together a bash script that will search a bunch of files and if it finds a particular string in a file, it will add a new line on the line after that string and then move on to the next file.
#! /bin/bash
echo "Creating variables"
SEARCHDIR=testfile
LINENUM=1
find $SEARCHDIR* -type f -name *.xml | while read i; do
    echo "Checking $i"
    ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
    if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
        echo "found $i"
        cat $i | while read LINE; do
            ((LINENUM=LINENUM+1))
            if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
                echo "editing $i"
                awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
            fi
        done
    fi
    LINENUM=1
done
the bit I'm having trouble with is
awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
if I just use $i at the end, it outputs the content to the screen; if I use $i > $i, it just erases the file; and if I use $i >> $i, it gets stuck in a loop until the disk fills up.
any suggestions?
Unfortunately awk doesn't have an in-place replacement option similar to sed's -i, so you can write to a temp file and then move it over the original:
awk '{commands}' file > tmpfile && mv tmpfile file
or if you have GNU awk 4.1.0 or newer, the -i inplace is added, so you can do:
awk -i inplace '{commands}' file
to modify the original
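Applied to the question's task of adding a line right after the matched string, a sketch with GNU awk 4.1.0+ could look like this (STRING_TO_SEARCH_FOR and the inserted text are the placeholders from the question; file.xml is a hypothetical file name):
gawk -i inplace -v s="new line to insert" '{ print } /STRING_TO_SEARCH_FOR/ { print s }' file.xml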
#cat $i | while read LINE; do
# ((LINENUM=LINENUM+1))
# if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
# echo "editing $i"
# awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
# fi
# done
# replaced by
sed -i 's/STRING_TO_SEARCH_FOR/&\n/g' ${i}
or use awk in place of sed
also
# ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
# if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
#by
if [ $( grep -c 'STRING_TO_SEARCH_FOR' ${i} ) -gt 0 ]; then
# worth it if the files are huge; if not, just run sed directly on the file, it will be faster (but there is no echo announcing that the file matched)
If you can, maybe use a temporary file?
~$ awk ... $i > tmpfile
~$ mv tmpfile $i
Or simply awk ... $i > tmpfile && mv tmpfile $i
Note that you can use mktemp to create this temporary file.
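For example, reusing the awk command from the question (a sketch, not tested against your files):
tmpfile=$(mktemp) || exit 1
awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' "$i" > "$tmpfile" && mv "$tmpfile" "$i"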
Otherwise, with sed you can insert a line right after a match:
~$ cat f
auie
nrst
abcd
efgh
1234
~$ sed '/abcd/{a\
new_line
}' f
auie
nrst
abcd
new_line
efgh
1234
The command checks whether the line matches /abcd/; if so, it appends (a\) the line new_line.
And since sed has the -i option to edit in place, you can do:
if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
echo "found $i"
echo "editing $i"
sed -i "/STRING_TO_SEARCH_FOR/{a
\new line to insert
}" $i
fi

Bash - sometimes creates only empty output

I am trying to create a bash dictionary script that accepts a first argument and creates a file named after it; the script then accepts further arguments (files inside the same folder) and writes their content into that file (the first argument). It also sorts, deletes symbols, etc. The main problem is that sometimes the output file is empty (I am passing one non-empty file and one non-existing file); after deleting it and running the script a few more times, it is sometimes empty and sometimes not.
#!/bin/bash
numberoffileargs=$(( $# - 1 ))
exitstat=0
counterexit=0
acceptingstdin=0;
> "$1";
# check if we have been given input files
if [ "$#" -gt 1 ]; then
    # for cycle going through input files
    for i in "${@:2}"
    do
        # check whether input file is readable
        if [ -r "${i}" ]; then
            cat "${i}" >> "$1"
        # else redirect to standard output
        else
            exitstat=2
            counterexit=$((counterexit + 1))
            echo "file does not exist" 1>&2
        fi
    done
else
    echo "stdin code to be done"
    acceptingstdin=1
    # stdin input to output file
    # stdin=$(cat)
fi
# one word for each line, alphabetical sort, alphabet only, remove duplicates
# all lowercase
# sort -u >> "$1"
if [ "$counterexit" -eq "$numberoffileargs" ] && [ "$acceptingstdin" -eq 0 ]; then
    exitstat=3
fi
cat "$1" | sed -r 's/[^a-zA-Z\-]+/ /g' | tr A-Z a-z | tr ' ' '\n' | sort -u | sed '/^$/d' > "$1"
echo "$numberoffileargs"
echo "$counterexit"
echo "$exitstat"
exit $exitstat
Here is your script with some syntax improvements. Your trouble came from the fact that the dictionary was both the input and the output of your pipeline; I added a temp file to fix it.
#!/bin/bash
(($# >= 1)) || { echo "Usage: $0 dictionary file ..." >&2 ; exit 1;}
dict="$1"
shift
echo "Creating $dict ..."
>| "$dict" || { echo "Failed." >&2 ; exit 1;}
numberoffileargs=$#
exitstat=0
counterexit=0
acceptingstdin=0
if (($# > 0)); then
    for i ; do
        # check whether input file is readable
        if [ -r "${i}" ]; then
            cat "${i}" >> "$dict"
        else
            exitstat=2
            let counterexit++
            echo "file does not exist" >&2
        fi
    done
else
    echo "stdin code to be done"
    acceptingstdin=1
fi
if ((counterexit == numberoffileargs && acceptingstdin == 0)); then
    exitstat=3
fi
sed -r 's/[^a-zA-Z\-]+/ /g' < "$dict" | tr '[:upper:]' '[:lower:]' | tr ' ' '\n' |
    sort -u | sed '/^$/d' >| tmp$$
mv -f tmp$$ "$dict"
echo "$numberoffileargs"
echo "$counterexit"
echo "$exitstat"
exit $exitstat
The pipeline might be improved.
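To see why the original version sometimes produced an empty file: when the same file is both read at the start of a pipeline and written by its final redirection, the redirection truncates the file before (or while) it is read, so the content can be lost. A tiny illustration with a throwaway file name:
printf 'b\na\n' > demo.txt
sort demo.txt > demo.txt                             # wrong: the redirection truncates demo.txt before sort reads it, leaving it empty
printf 'b\na\n' > demo.txt
sort demo.txt > demo.tmp && mv demo.tmp demo.txt     # safe: write to a temp file, then move it into place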

how to map one csv file's content to a second csv file and write it to another csv using unix

After writing some unix scripts I managed to get data from different xml files into csv format, and now I am stuck with the following problem.
file1.csv : contains
1,5,6,7,8
2,3,4,5,9
1,6,10,11,12
1,5,11,12
file2.csv : contains
1,Mango,Tuna,Webby,Through,Franky,Sam,Sumo
2,Franky
3,Sam
4,Sumo
5,Mango,Tuna,Webby
6,Tuna,Webby,Through
7,Through,Sam,Sumo
8,Nothing
9,Sam,Sumo
10,Sumo,Mango,Tuna
11,Mango,Tuna,Webby,Through
12,Mango,Tuna,Webby,Through,Franky
output I want is
1,5,6,7,8
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Tuna,Webby,Through
Through,Sam,Sumo
Nothing
Common word:None
2,3,4,5,9
Franky
Sam
Sumo
Mango,Tuna,Webby
Sam, Sumo
Common Word:None
1,6,10,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Tuna,Webby,Through
Sumo,Mango,Tuna
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Tuna
1,5,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Mango,Tuna,Webby
I appreciate any help.
Thanks
I have a partial solution, but it is not complete:
##!/bin/bash
count=1
count_2=1
for i in `cat file1.csv`
do
    echo $i > $count.txt
    cat $count.txt | tr "," "\n" > $count_2.txt
    count=`expr $count + 1`
    count_2=`expr $count_2 + 1`
done;
# this code will create separate files for each line in file1.csv
bash file3_search.sh
##########################
file3_search.sh
================
##!/bin/bash
cat file2.csv | sed '/^$/d' | sed 's/[ ]*$//' > trim.txt
dos2unix -q 1.txt 1.txt
dos2unix 2.txt 2.txt
dos2unix 3.txt 3.txt
echo "1st Combination results"
for i in `cat 1.txt`
do
cat trim.txt | egrep -w $i
done > Combination1.txt;
echo "2nd Combination results"
for i in `cat 2.txt`
do
cat trim.txt | egrep -w $i
done > Combination2.txt;
echo "3rd Combination results"
for i in `cat 3.txt`
do
cat trim.txt | egrep -w $i
done > Combination3.txt;
I am not good at programming (I am a software tester). Could someone please re-factor my code and also tell me how to get the common word in those Combination.txt files?
IMHO it works:
for line in $(cat 1.csv) ; do
echo $line ;
grepline=`echo $line | sed 's/ \+//g;s/,/,|/g;s/^\(.*\)$/^(\1,)/'`;
egrep $grepline 2.csv
egrep $grepline 2.csv | \
awk -F "," '
{ for (i=2;i<=NF;i++)
{s[$i]+=1}
}
END { for (key in s)
{if (s[key]==NR) { tp+=key "," }
}
if (tp!="") {print "Common word(s): " gensub(/,$/,"","g",tp)}
else {print "Common word: None"}}'
echo
done
HTH
Here's an answer for you. It depends on the associative array capabilities of bash version 4:
IFS=,
declare -a words
# read and store the words in file2
while read line; do
    set -- $line
    n=$1
    shift
    words[$n]="$*"
done < file2.csv
# read file1 and process
while read line; do
    echo "$line"
    set -- $line
    indexes=( "$@" )
    NF=${#indexes[@]}
    declare -A common
    for (( i=0; i<$NF; i++)); do
        echo "${words[${indexes[$i]}]}"
        set -- ${words[${indexes[$i]}]}
        for word; do
            common[$word]=$(( ${common[$word]} + 1))
        done
    done
    printf "Common words: "
    n=0
    for word in "${!common[@]}"; do
        if [[ ${common[$word]} -eq $NF ]]; then
            printf "%s " $word
            (( n++ ))
        fi
    done
    [[ $n -eq 0 ]] && printf "None"
    unset common
    printf "\n\n"
done < file1.csv
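The key trick in the loop above is counting, for each line of file1.csv, how many of the referenced file2.csv lines contain each word; a word is "common" when its count equals the number of indexes on the line. A minimal standalone sketch of that counting pattern (bash 4+, sample words chosen arbitrarily):
declare -A count
for word in Mango Tuna Webby Mango Tuna Mango; do
    count[$word]=$(( ${count[$word]:-0} + 1 ))   # unset entries start at 0
done
for word in "${!count[@]}"; do
    echo "$word appears ${count[$word]} time(s)"
done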

Find lines containing all keywords in bash script

Essentially, I would like something that behaves similarly to:
cat file | grep -i keyword1 | grep -i keyword2 | grep -i keyword3
How can I do this with a bash script that takes a variable-length list of keyword arguments? The script should do a case-insensitive match of lines containing all keywords.
Use this as a script
#! /bin/bash
awk -v IGNORECASE=1 -f <(
P=; for k; do [ -z "$P" ] && P="/$k/" || P="$P&&/$k/"; done
echo "$P{print}"
)
and invoke it as
script.sh keyword1 keyword2 keyword3 < file
I don't know if this is efficient, and I think this is ugly, also there might be some utility for that, but:
#!/bin/bash
unset keywords matchlist
keywords=("$@")
for kw in "${keywords[@]}"; do
    matchlist="$matchlist /$kw/ &&"
done
matchlist="${matchlist% &&}"
# awk "$matchlist { print; }" < <(tr '[:upper:]' '[:lower:]' <file)
awk "$matchlist { print; }" file
And yes, it needs some robustness regarding special characters and stuff. It's just to show the idea.
Give this a try:
shopt -s nocasematch
keywords="keyword1|keyword2|keyword3"
while read line; do [[ $line =~ $keywords ]] && echo $line; done < file
Edit:
Here's a version that tests for all keywords being present, not just any:
keywords=(keyword1 keyword2 keyword3) # or keywords=("$@")
qty=${#keywords[@]}
while read line
do
    count=0
    for keyword in "${keywords[@]}"
    do
        [[ "$line" =~ $keyword ]] && (( count++ ))
    done
    if (( count == qty ))
    then
        echo $line
    fi
done < textlines
Found a way to do this with grep.
KEYWORDS=$@
MATCH_EXPR="cat file"
for keyword in ${KEYWORDS};
do
    MATCH_EXPR="${MATCH_EXPR} | grep -i ${keyword}"
done
eval ${MATCH_EXPR}
You can use bash 4.0+:
shopt -s nocasematch
while read -r line
do
case "$line" in
*keyword1*) f=1;;&
*keyword2*) g=1;;&
*keyword3*)
[ "$f" -eq 1 ] && [ "$g" -eq 1 ] && echo $line;;
esac
done < "file"
shopt -u nocasematch
or gawk
gawk '/keyword1/&&/keyword2/&&/keyword3/' file
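Since the question asks for case-insensitive matching, a sketch of the same one-liner combined with gawk's IGNORECASE variable (gawk-specific) would be:
gawk -v IGNORECASE=1 '/keyword1/ && /keyword2/ && /keyword3/' file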
I'd do it in Perl.
For finding all lines that contain at least one of them:
perl -ne'print if /(keyword1|keyword2|keyword3)/i' file
For finding all lines that contain all of them:
perl -ne'print if /keyword1/i && /keyword2/i && /keyword3/i' file
Here is a script called search.sh in bash that will search lines within a file or folder for all keywords specified:
#!/bin/bash
if [ $# -lt 2 ]; then
    echo "[-] $0 file_to_search/folder_to_search keyword1 keyword2 keyword3 ..."
    exit
fi
all_args="$@"
i=0
results="" # this will store the cumulative results from each keyword search
for arg in $all_args; do
    if [ $i -eq 0 ]; then
        # first argument is the file/folder to search
        file_to_search="$arg"
        i=$(($i + 1))
    elif [ $i -eq 1 ]; then
        # search the file/folder with first keyword (first search)
        results=`grep --color=always -r -n -i "$arg" "$file_to_search"`
        i=$(($i + 1))
    else
        # now keep searching the results from first search for other keywords
        results=`echo "$results" | grep --color=always -i "$arg"`
        i=$(($i + 1))
    fi
done
echo "$results"
An example invocation of the script above searches the 'tools.txt' file for the 'python' and 'jira' keywords:
./search.sh tools.txt python jira
