Comparison function that compares two text files in Unix - bash

I was wondering if anyone could tell me whether there is a function available in Unix/bash that compares two files line by line. Depending on whether they differ, it should output true/false, or -1, 0, 1. I know such cmp functions exist in other languages. I have been looking through the man pages but have been unsuccessful. If it is not available, could someone help me come up with an alternative solution?
Thanks

There are several ways to do this:
cmp -s file1 file2: Look at the value of $?. Zero if both files match or non-zero otherwise.
diff file1 file2 > /dev/null: Some forms of the diff command can take a parameter that tells it not to output anything. However, most don't. After all, you use diff to see the differences between two files. Again, the exit code (you can check the value of $?) will be 0 if the files match and non-zero otherwise.
You can use these commands in a shell if statement:
if cmp -s file1 file2
then
echo "The files match"
else
echo "The files are different"
fi
The diff command is made specifically for text files. The cmp command compares the files byte by byte, so it works with binary files too.
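The question also asked for a true/false or -1/0/1 style result. A minimal sketch of a wrapper, assuming the made-up function name files_match; it relies only on the exit status of cmp (0 when identical, 1 when different, greater than 1 on an error such as a missing file):

```bash
# files_match is a hypothetical helper name, not a standard command.
# Prints "same", "different", or "error" and returns cmp's exit status.
files_match() {
    cmp -s "$1" "$2"
    local status=$?
    case $status in
        0) echo "same" ;;
        1) echo "different" ;;
        *) echo "error" ;;      # e.g. a file is missing or unreadable
    esac
    return "$status"
}
```

It can be used directly in a condition, for example: if files_match file1 file2 >/dev/null; then echo "The files match"; fi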

There is a simple cmp file1 file2 command that does just that. It returns 0 if they are equal and 1 if they are different, so it's trivial to use in ifs:
if cmp file1 file2; then
...
fi
Hope this helps =)

#!/bin/bash
file1=old.txt
file2=new.txt
echo " TEST 1 : "
echo
if cmp -s "${file1}" "${file2}"
then
echo "The files match : ${file1} - ${file2}"
else
echo "The files are different : ${file1} - ${file2}"
fi
echo
echo " TEST 2 : "
echo
# cmp -s prints nothing, so there is no output to capture; test its exit status directly
if cmp -s "$file1" "$file2"
then
echo "The files match"
else
echo "The files are different"
fi
echo
echo " TEST 3 : md5 / md5sum - compute and check MD5 message digest"
echo
md1=$(md5 ${file1});
md2=$(md5 ${file2});
mdd1=$(echo $md1 | awk '{print $4}' )
mdd2=$(echo $md2 | awk '{print $4}' )
# or md5sum depends on your linux flavour :D
#md1=$(md5sum ${file1});
#md2=$(md5sum ${file2});
#mdd1=$(echo $md1 | awk '{print $1}' )
#mdd2=$(echo $md2 | awk '{print $1}' )
echo $md1
echo $mdd1
echo $md2
echo $mdd2
echo
# Digests are strings, so compare with = (the -eq operator is for integers)
if [ "$mdd1" = "$mdd2" ];
then
echo "The files match : ${file1} - ${file2}"
else
echo "The files are different : ${file1} - ${file2}"
fi

You could do an md5 on the two files, then compare the results in bash.
No Unix box here to test, but this should be right.
#!/bin/bash
# Read each file on stdin so the output holds only the digest, not the file name
md1=$(md5 < file1)
md2=$(md5 < file2)
if [ "$md1" = "$md2" ]; then
echo The same
else
echo Different
fi
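A sketch of the same idea that works whether the digest tool is md5sum (common on Linux) or md5 (common on BSD and macOS); which one is installed is an assumption about your system. The helper names file_digest and same_digest are made up for illustration. Reading the file on standard input keeps the file name out of the output, and the comparison uses = because digests are strings:

```bash
# file_digest is a hypothetical helper: prints only the MD5 hex digest of "$1".
file_digest() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | awk '{print $1}'
    else
        md5 < "$1" | awk '{print $1}'   # assumption: a BSD/macOS style md5 is installed
    fi
}

# same_digest is a hypothetical helper: returns 0 when both files are readable
# and have the same digest, non-zero otherwise.
same_digest() {
    [ -r "$1" ] && [ -r "$2" ] &&
        [ "$(file_digest "$1")" = "$(file_digest "$2")" ]
}
```

For a plain same-or-different check, cmp -s is still simpler and stops at the first differing byte; digests are mainly useful when one of the files is not at hand and only its checksum is known.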

echo "read first file"
read f1
echo "read second file"
read f2
diff -s "$f1" "$f2" # -s reports when both files are identical

Related

How to use bash variable prefixes under sh, ksh, csh

I have a bash script which checks for the presence of certain files and that their content has a valid format. It uses variable prefixes so I can easily add or remove files without further adjustments.
The problem is that I need to run this on AIX servers where bash is not present. I've adjusted the script except for the part with variable prefixes. After some attempts I am lost and have no idea how to properly migrate the following piece of code so it runs under sh ( $(echo ${!ifile_@}) ). Alternatively I have ksh or csh if plain sh is not an option.
Thank you in advance for any help or hints.
#!/bin/sh
# Source files
ifile_one="/path/to/file/one.csv"
ifile_two="/path/to/file/two.csv"
ifile_three="/path/to/file/three.csv"
ifile_five="/path/to/file/four.csv"
min_columns='10'
existing_files=""
nonexisting_files=""
valid_files=""
invalid_files=""
# Check that defined input-files exists and can be read.
for input_file in $(echo ${!ifile_@})
do
if [ -r ${!input_file} ]; then
existing_files+="${!input_file} "
else
nonexisting_files+="${!input_file} "
fi
done
echo "$existing_files"
echo "$nonexisting_files"
# Check that defined input files have proper number of columns.
for input_file_a in $(echo "$existing_files")
do
check=$(grep -v "^$" $input_file_a | sed 's/[^;]//g' | awk -v min_columns="$min_columns" '{ if (length == min_columns) {print "OK"} else {print "KO"} }' | grep -i KO)
if [ ! -z "$check" ]; then
invalid_files+="${input_file_a} "
else
valid_files+="${input_file_a} "
fi
done
echo "$invalid_files"
echo "$valid_files"
Bash returns expected output (of the four ECHOes):
/path/to/file/one.csv /path/to/file/two.csv /path/to/file/three.csv
/path/to/file/four.csv
/path/to/file/three.csv
/path/to/file/one.csv /path/to/file/two.csv
ksh/sh throws:
./report.sh[14]: "${!ifile_@}": 0403-011 The specified substitution is not valid for this command.
Thanks @Benjamin W. and @user1934428, ksh93 arrays are the answer.
The code below works for me as desired.
#!/bin/ksh93
typeset -A ifile
ifile[one]="/path/to/file/one.csv"
ifile[two]="/path/to/file/two.csv"
ifile[three]="/path/to/file/three.csv"
ifile[whatever]="/path/to/file/something.csv"
existing_files=""
nonexisting_files=""
for input_file in "${!ifile[@]}"
do
if [ -r ${ifile[$input_file]} ]; then
existing_files+="${ifile[$input_file]} "
else
nonexisting_files+="${ifile[$input_file]} "
fi
done
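If ksh93 were not available either, the same check can be written for plain sh without arrays or +=. A minimal sketch, assuming the file list is passed as positional parameters instead of being discovered by variable prefix; the function name classify_files is hypothetical and not part of the thread:

```bash
# classify_files is a hypothetical helper for plain sh.
# It sorts its arguments into the space-separated lists
# existing_files and nonexisting_files.
classify_files() {
    existing_files=""
    nonexisting_files=""
    # "$@" plays the role of the array; plain string concatenation replaces +=
    for input_file in "$@"; do
        if [ -r "$input_file" ]; then
            existing_files="${existing_files}${input_file} "
        else
            nonexisting_files="${nonexisting_files}${input_file} "
        fi
    done
}
```

It would be called as classify_files /path/to/file/one.csv /path/to/file/two.csv, after which the two variables hold the results. Adding or removing a file then means editing that one argument list.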

Get number of line of word number X in file

I need to write a shell script that splits every CSV file (with \n as the line separator) into parts. The limit per part is a number of words, and a line must never be cut in half.
Finished script with the help of a wizard!
Example:
bash SliceByWords.sh 1000 .
Slices every file into parts of roughly 1000 words and puts every part into a subfolder. (The script uses bash-only syntax, so run it with bash rather than sh.)
function has_number_number_of_words {
re='^[0-9]+$'
if ! [[ $1 =~ $re ]] ; then
echo "error: Not a number, please run the command with the number of words per file" >&2; exit 1
fi
}
#MAIN
has_number_number_of_words $1
declare -i WORDLIMIT=$1 # number of words per output part
subdir="Result"
mkdir -p "$subdir"
format=*.csv
for name in $format; do mv "$name" "${name// /___}"; done
for i in $format;
do
if [[ "$i" == "$format" ]]
then
echo "No Files"
else
( locali=$(echo $i | awk '{gsub(/ /,"\\ ");print}');
localword=$i;
FILENAMEWITHOUTEXTENSION="${localword%.*}" ;
subnoext=$subdir"/"$FILENAMEWITHOUTEXTENSION;
echo Processing file "$FILENAMEWITHOUTEXTENSION";
awk -v NOEXT=$subnoext -v wl=$WORDLIMIT -F" " 'BEGIN{fn=1}{c+=NF}{sv=NOEXT"_snd_"fn".csv";print $0>sv;}c>wl{c=0;++fn;close(sv);}' $localword;
)&
fi
done
wait #wait
for name in $format; do mv "$name" "${name//___/ }"; done
echo All files done.
Since I couldn't figure out how to pass file names containing spaces to awk, I rename them before processing and restore the original names afterwards using
for name in $format; do mv "$name" "${name//___/ }"; done
I think this would be a lot easier to handle with awk:
awk -F" " 'BEGIN{filenumber=1}{counter+=NF}{print $0 > FILENAME"_part_"filenumber} counter>1000{counter=0;++filenumber}' yourinputfile
Here is what awk is doing:
Splitting each line by space -F" "
Before processing the file set the filenumber variable to 1
Bump the counter variable by the number of fields in the line {counter+=NF}
Print out the line to the file, numbered by a variable. Using the FILENAME built-in variable here to pull through yourinputfile. {print $0 > FILENAME"_part_"filenumber}
If the counter has popped over 1000, then send it back to 0 and bump the filenumber variable by 1 counter>1000{counter=0;++filenumber}
Minimized a bit:
awk -F" " 'BEGIN{fn=1}{c+=NF}{print $0>FILENAME"_part_"fn}c>1000{c=0;++fn}' yourinputfile
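The awk one-liner can be wrapped so the limit and output prefix are parameters. This is a sketch under assumptions: the function name split_by_words and its argument order are made up for illustration, and it closes each finished part so that many parts do not exhaust open file handles.

```bash
# split_by_words is a hypothetical wrapper around the awk approach.
# $1 = word limit per part, $2 = input file, $3 = output prefix.
split_by_words() {
    awk -v limit="$1" -v prefix="$3" '
        BEGIN { part = 1 }
        {
            out = prefix "_part_" part
            print > out              # the whole line goes to the current part
            count += NF              # NF = number of whitespace-separated words
        }
        count > limit {              # limit exceeded: start a new part
            close(out)
            count = 0
            part++
        }
    ' "$2"
}
```

Because whole lines are written before the counter is checked, a part can exceed the limit by up to one line's worth of words; that is the price of never cutting a line in half. Quoting "$2" also means input file names with spaces work without renaming.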

How to list files with words exceeding n characters in all subdirectories

I have to write a shell script that creates a file containing the name of each text file from a folder (given as a parameter) and its subfolders that contains words longer than n characters (n is read from the keyboard).
I wrote the following code so far :
#!/bin/bash
Verifies if the first given parameter is a folder:
if [ ! -d $1 ]
then echo $1 is not a directory\!
exit 1
fi
Reading n
echo -n "Give the number n: "
read n
echo "You entered: $n"
Destination where to write the name of the files:
destinatie="destinatie"
The actual part that I think is causing me problems:
nr=0;
#while read line;
#do
for fisier in `find $1 -type f`
do
counter=0
for word in $(<$fisier);
do
file=`basename "$fisier"`
length=`expr length $word`
echo "$length"
if [ $length -gt $n ];
then counter=$(($counter+1))
fi
done
if [ $counter -gt $nr ];
then echo "$file" >> $destinatie
fi
done
break
done
exit
The script works, but it does a few extra steps that I don't need. It seems to read some files more than once. Can anyone help me, please?
Does this help?
egrep -lr "\w{$n,}" $1/* >$destinatie
Some explanation:
\w means: a character that words consist of
{$n,} means: number of consecutive characters is at least $n
Option -l lists files and does not print the grepped text and -r performs a recursive scan on your directory in $1
Edit:
a bit more complete version around the egrep command:
#!/bin/bash
die() { echo "$@" 1>&2 ; exit 1; }
[ -z "$1" ] && die "which directory to scan?"
dir="$1"
[ -d "$dir" ] || die "$dir isn't a directory"
echo -n "Give the number n: "
read n
echo "You entered: $n"
[ $n -le 0 ] && die "the number should be > 0"
destinatie="destinatie"
egrep -lr "\w{$n,}" "$dir"/* | while read f; do basename "$f"; done >$destinatie
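A variant of the same search that does not depend on the GNU-specific \w or on grep's -r option, sketched under assumptions: the function name list_long_word_files is made up, and "word character" is taken to mean letters, digits, and underscore.

```bash
# list_long_word_files is a hypothetical helper.
# $1 = directory to scan, $2 = minimum word length n.
# Prints the path of every regular file containing a word of at least n characters.
list_long_word_files() {
    # [[:alnum:]_] is the POSIX spelling of what GNU grep calls \w
    find "$1" -type f -exec grep -lE "[[:alnum:]_]{$2,}" {} +
}
```

Note that {n,} matches words of at least n characters; to find words strictly longer than n, pass n+1, for example list_long_word_files "$dir" "$((n + 1))".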
This code has syntax errors, probably leftovers from your commented-out while loop: It would be best to remove the last 3 lines: done causes the error, break and exit are unnecessary as there is nothing to break out from and the program always terminates at its end.
The program appears to output files multiple times because you just append to $destinatie. You could simply delete that file when you start:
rm "$destinatie"
You echo the numbers to stdout (echo "$length") and the file names to $destinatie (echo "$file" >> $destinatie). I do not know if that is intentional.
I found the problem. It was the directory I was searching in. Because I had worked on the files in that directory and modified them, some leftover files remained that were not displayed in the file explorer but that the script would still find. I created another directory, gave it as the parameter, and it works. Thank you for your answers.

Using bash, separate servers into separate file depending on even or odd numbers

The output comes from a command I run from our netscaler. It outputs the following ... One thing to note is that the middle two numbers change but the even/odd criteria is always on the last digit. We never have more than 2 digits, so we'll never hit 10.
WC-01-WEB1
WC-01-WEB4
WC-01-WEB3
WC-01-WEB5
WC-01-WEB8
I need to populate a file called "even" and "odds." If we're dealing with numbers I can figure it out, but having the number within a string is throwing me off.
Example code but I'm missing the part where I need to match the string.
if [ $even_servers -eq 0 ]
then
echo $line >> evenfile
else
echo $line >> oddfile
fi
This is a simple awk command:
awk '/[02468]$/{print > "evenfile"}; /[13579]$/{print > "oddfile"}' input.txt
There must be a better way.
How about this version:
for v in `cat <my_file>`; do export type=`echo $v | awk -F 'WEB' '{print $2%2}'`; if [ $type -eq 0 ]; then echo $v >> evenfile ; else echo $v >> oddfile; fi; done
I assume your list of servers is stored in the filename <my_file>. The basic idea is to tokenize on WEB using awk and process the chars after WEB to determine even-ness. Once this is known, we export the value to a variable type and use this to selectively dump to the appropriate file.
For the case when the name is the output of another command:
export var=`<another command>`; export type=`echo $var | awk -F 'WEB' '{print $2%2}'`; if [ $type -eq 0 ]; then echo $var >> evenfile ; else echo $var >> oddfile; fi;
Replace <another command> with your perl script.
As always grep is your friend:
grep "[02468]$" input_file > evenfile
grep "[13579]$" input_file > oddfile
I hope this helps.
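For the original if-style loop, the missing piece is matching on the last character of the string, which a case pattern can do without any arithmetic. A minimal sketch, assuming the server names arrive one per line on standard input; the function name split_even_odd is hypothetical:

```bash
# split_even_odd is a hypothetical helper: reads server names on stdin and
# appends each one to "$1" (even) or "$2" (odd) based on its last character.
split_even_odd() {
    local line
    while IFS= read -r line; do
        case $line in
            *[02468]) printf '%s\n' "$line" >> "$1" ;;
            *[13579]) printf '%s\n' "$line" >> "$2" ;;
        esac
    done
}
```

The command output can be piped straight in, for example: your_command | split_even_odd evenfile oddfile. Lines that do not end in a digit are skipped.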

I want to compare a variable with files in a directory and output the equals

I am making a bash script where I want to find files that are equal to a variable. The equals will then be used.
I want to use "mogrify" to shrink a couple of image files that have the same name as the ones I gather from a list (similar to "dpkg -l"). It is not "dpkg -l" I am using, but it is similar. My problem is that it prints all the files, not just the equals. I am pretty sure this could be done with awk instead of a for-loop, but I do not know how.
prog="`dpkg -l | awk '{print $1}'`"
for file in $dirone* $dirtwo*
do
if [ "basename ${file}" = "${prog}" ]; then
echo ${file} are equal
else
echo ${file} are not equal
fi
done
Could you please help me get this working?
First, I think there's a small typo. if [ "basename ${file}" =... should have backticks inside the double quotes, just like the prog=... line at the top does.
Second, if $prog is a multi-line string (like dpkg -l) you can't really compare a filename to the entire list. Instead you have to compare one item at a time to the filename.
Here's an example using dpkg and /usr/bin
#!/bin/bash
progs="`dpkg -l | awk '{print $2}'`"
for file in /usr/bin/*
do
base=`basename ${file}`
for prog in ${progs}
do
if [ "${base}" = "${prog}" ]; then
echo "${file}" matches "${prog}"
fi
done
done
The condition "$file = $prog" is a single string. You should try "$file" = "$prog" instead.
The following transcript shows the fix:
pax> ls -1 qq*
qq
qq.c
qq.cpp
pax> export xx=qq.cpp
pax> for file in qq* ; do
if [[ "${file} = ${xx}" ]] ; then
echo .....${file} equal
else
echo .....${file} not equal
fi
done
.....qq equal
.....qq.c equal
.....qq.cpp equal
pax> for file in qq* ; do
if [[ "${file}" = "${xx}" ]] ; then
echo .....${file} equal
else
echo .....${file} not equal
fi
done
.....qq not equal
.....qq.c not equal
.....qq.cpp equal
You can see in the last bit of output that only qq.cpp is shown as equal since it's the only one that matches ${xx}.
The reason you're getting true is because that's what non-empty strings will give you:
pax> if [[ "" ]] ; then
echo .....equal
fi
pax> if [[ "x" ]] ; then
echo .....equal
fi
.....equal
That's because that form is the string length checking variation. From the bash manpage under CONDITIONAL EXPRESSIONS:
string
-n string
True if the length of string is non-zero.
Update:
The new code in your question won't quite work as expected. You need:
if [[ "$(basename ${file})" = "${prog}" ]]; then
to actually execute basename and use its output as the first part of the equality check.
You can also use case/esac:
case "$file" in
"$prog" ) echo "same";;
esac
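When the variable holds a whole list of names, one per line, the inner loop can be replaced by a single whole-line, fixed-string grep. A sketch assuming a newline-separated list; the function name name_in_list is made up for illustration:

```bash
# name_in_list is a hypothetical helper: succeeds when "$1" is exactly equal
# to one complete line of the newline-separated list in "$2".
name_in_list() {
    # -F: treat the name as a literal string, -x: match whole lines only, -q: no output
    printf '%s\n' "$2" | grep -Fxq -- "$1"
}
```

In the loop over files it would be used as: if name_in_list "$(basename "$file")" "$progs"; then echo "$file matches"; fi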
