compare multiple directories on remote hosts - bash

Is there a way to diff on contents of multiple directories instead of two directories? Or diff a single directory on multiple hosts.
I wrote the following bash script to diff a directory on three hosts:
#!/bin/bash
if [[ -n "$(diff <(ssh user@host1 ls -r /user/test1) <(ssh user@host2 ls -r /user/test1))" || -n "$(diff <(ssh user@host2 ls -r /user/test1) <(ssh user@host3 ls -r /user/test1))" ]]; then
    echo "There are differences"
fi
Is there a better way to do this?

Yes, GNU diff has an option --from-file that allows the comparison of one reference file or directory to many others.
diff -r --from-file=ref-dir dir1 dir2 ... dirN
Note that it will only compare ref-dir to dir1, ..., dirN; it won't compare dir1 to dir2, ..., dirN.
As for your remote directories, since you have ssh access to the machines, you can mount them locally with sshfs in order to execute diff over them.
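For example, a minimal sketch of that approach (the /tmp mount points are placeholders, and sshfs/FUSE must be installed):
mkdir -p /tmp/host1 /tmp/host2 /tmp/host3
sshfs user@host1:/user/test1 /tmp/host1
sshfs user@host2:/user/test1 /tmp/host2
sshfs user@host3:/user/test1 /tmp/host3
# diff exits non-zero if any compared pair differs
if diff -r --from-file=/tmp/host1 /tmp/host2 /tmp/host3 > /dev/null; then
    echo "No differences"
else
    echo "There are differences"
fi
# unmount when done
for d in /tmp/host1 /tmp/host2 /tmp/host3; do fusermount -u "$d"; done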

You could use MD5 checksums of the file lists for each host. That lets you use the same script for any number of servers: if the lists are identical, the checksums will be identical. Then you just compare each checksum with the previous one; if any of them differ, you have differences.
#!/bin/bash
MD5SUMS=$(
    for hostname in host{1,2,3}
    do
        result=$(ssh user@${hostname} ls -r /user/test1 | md5sum)
        result=${result%% *}
        echo "${result}"    # emit the checksum so it is captured in MD5SUMS
    done
)
PREVSUM=""
for SUM in ${MD5SUMS}
do
    if [ -z "$PREVSUM" ]
    then
        PREVSUM=$SUM
        continue
    else
        if [ "$PREVSUM" != "$SUM" ]
        then
            echo "There are differences"
        fi
        PREVSUM=$SUM
    fi
done
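A more compact sketch of the same idea, assuming the hosts and path from the question: collect one checksum per host and report differences whenever more than one distinct value shows up.
#!/bin/bash
sums=$(for h in host1 host2 host3; do
    ssh user@"$h" ls -r /user/test1 | md5sum | cut -d' ' -f1
done)
# $sums is deliberately unquoted so each checksum lands on its own line
if [ "$(printf '%s\n' $sums | sort -u | wc -l)" -gt 1 ]; then
    echo "There are differences"
fi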

Related

Listing only directories using ls in Bash but preserve ls format and without dirname

There are already many answers to similar questions, but none of them satisfy my requirements. I want the following:
List all directories under a directory without using glob (*) syntax, i.e. I want to directly use lsdir somedir
The output should contain only the basenames of the directories, just like plain ls, for example:
$ lsdir path/to/some/dir
dir1 dir2 dir3 dir4
but not this:
$ lsdir path/to/dir
path/to/dir/dir1 path/to/dir/dir2 path/to/dir/dir3 path/to/dir/dir4
To satisfy requirement 1, it seems feasible to define a function, but either way we are going to use the -d option so that ls lists the directories themselves rather than their contents.
However, when using the -d option, ls lists directory names with their parent path prepended, as shown above.
The ls output format (color, alignment, sorting) should be preserved.
To satisfy requirement 2, we could use find, but then we lose all of the ls output formatting: coloring (based on a customized dircolors theme), alignment (output in aligned columns), sorting (customized with various flags and in a column-first manner), and maybe other things.
I know it's too greedy to want this many features simultaneously, and indeed I can live without all of them.
It's possible to emulate ls output format manually but that's too inconsistent.
I wonder if there is a way to achieve this and still utilize ls, i.e. how to achieve requirement 2 using ls.
This may be what you're looking for:
cd path/to/dir && dirs=(*/) && ls -d "${dirs[@]%?}"
or, perhaps
(shopt -s nullglob; cd path/to/dir && dirs=(*/) && ((${#dirs[@]} > 0)) && ls -d "${dirs[@]%?}")
The second version runs in a subshell and prints nothing if there are no subdirectories inside path/to/dir.
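For reference, here is what each piece of those one-liners does:
dirs=(*/)             # the trailing slash makes the glob match directories only
ls -d "${dirs[@]%?}"  # %? strips the trailing slash so ls prints bare names
shopt -s nullglob     # without it, an empty directory leaves the literal '*/' in the array and ls fails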
Based on @M. Nejat Aydin's excellent answer, I am going to improve it a little more to make it a useful command, especially with respect to processing options and multiple directories.
list_directories() {
    local opts=()
    local args=()
    # separate ls options (starting with '-') from directory arguments
    for i in $(seq $#); do
        if [[ "${!i}" == -* ]]; then
            opts+=("${!i}")
        else
            args+=("${!i}")
        fi
    done
    (( ${#args[@]} == 0 )) && args=('.')
    local -i args_n_1=${#args[@]}-1
    for i in $(seq 0 $args_n_1); do
        # print a "dir:" header when more than one directory was given
        if (( ${#args[@]} > 1 )); then
            (( i > 0 )) && echo
            echo "${args[i]}:"
        fi
        (
            shopt -s nullglob
            cd "${args[i]}" &&
                dirs=(*/) &&
                (( ${#dirs[@]} > 0 )) &&
                ls -d "${opts[@]}" "${dirs[@]%?}"
        )
    done
}
alias lsd=list_directories
This lsd can be used with any number of ls options and directories freely mixed.
$ lsd -h dir1 dir2 -rt ~
Note: the semantics change when you use globs with lsd.
lsd path/to/dir* lists all directories under each directory whose name starts with "path/to/dir".
To list all directories starting with "path/to/dir", use plain old ls -d path/to/dir*.

Bash script backup, check if directory contains the files from another directory

I am making a bash backup script and I want to implement functionality that checks whether the files from a directory are already contained in another directory; if they are not, I want to output the names of those files:
#!/bin/bash
TARGET_DIR=$1
INITIAL_DIR=$2
TARG_ls=$(ls -A $1)
INIT_ls=$(ls -A $2)
if [[ "$(ls -A $2)" ]]; then
    if [[ ! -n "$(${TARG_ls} | grep ${INIT_ls})" ]]; then
        echo All files in ${INITIAL_DIR} have backups for today in ${TARGET_DIR}
        exit 0
    else
        : # code for listing the missing files (placeholder so the else branch is not empty)
    fi
else
    echo Error!! ${INITIAL_DIR} has no files
    exit 1
fi
I have thought about storing the ls output of both directories inside strings and comparing them, as it is shown in the code, but in the event where I have to list the files from INITIAL_DIR that are missing in TARGET_DIR, I just don't know how to proceed.
I tried using the diff command comparing the two directories but that takes into account the preexisting files of TARGET_DIR.
In the above code, if [[ "$(ls -A $2)" ]]; checks whether ${INITIAL_DIR} contains any files, and if [[ ! -n "$(${TARG_ls} | grep ${INIT_ls})" ]]; checks whether the target directory contains all of the initial directory's files.
Does anyone have a suggestion or hint?
You can use the comm command:
$ comm <(ls -A a) <(ls -A b)
This will give you three columns: files only in a, files in both a and b, and files only in b. For example, to get the list of files only in a:
$ comm -23 <(ls -A a) <(ls -A b)
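Note that comm expects sorted input; ls already sorts its output, so this works as-is. To list the files that are only in b (i.e. missing from a), suppress columns 1 and 2 instead:
$ comm -13 <(ls -A a) <(ls -A b)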
rsync has a --dry-run switch that will show you what files have changed between two directories. Before doing rsync copies of my home directory, I preview the changes this way to see if there is any evidence of mass malicious encryption or corruption before proceeding.
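A minimal sketch of that preview, reusing the variable names from the question (no files are modified):
# -a preserves attributes, -i itemizes each change, -n (--dry-run) makes no changes
rsync -ain "$INITIAL_DIR"/ "$TARGET_DIR"/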

Why is the 'ls' command printing the directory content multiple times

I have the following shell script, in which I want to check a specific directory's content on the remote machines and print it to a file.
file=serverList.csv
n=0
while [ $n -le 2 ]
do
    while IFS=: read -r f1 f2
    do
        # echo line is stored in $line
        if echo $f1 | grep -q "xx.xx.xxx";
        then
            ssh user@$f1 ls path/*war_* > path/$f1.txt < /dev/null; ls path/*zip_* >> path/$f1.txt < /dev/null;
            ssh user@$f1 ls -d /apps/jetty*_* >> path/$f1.txt < /dev/null;
        fi
    done < "$file"
    sleep 15
    n=$(( n+1 ))
done
I am running this script in a cron job every 2 minutes, as follows:
*/2 * * * * /path/myscript.sh
but somehow I am ending up with the following output file:
/apps/jetty/webapps_wars/test_new.war
path/ReleaseTest.static.zip_2020-08-05
path/ReleaseTest.static.zip_2020-08-05
path/ReleaseTest.static.zip_2020-08-05
path/jetty_xx.xx_2020-08-05
path/jetty_new
path/jetty_xx.xx_2020-08-05
path/jetty_new
I am not sure why I am getting the files in the list twice, sometimes three times, but when I execute the script directly from PuTTY, it works fine. What do I need to change in order to correct this script?
Example:
~$ cd tmp
~/tmp$ mkdir test
~/tmp$ cd !$
cd test
~/tmp/test$ mkdir -p apps/jetty/webapp_wars/ && touch apps/jetty/webapp_wars/test_new.war
~/tmp/test$ mkdir path
~/tmp/test$ touch path/{ReleaseTest.static.zip_2020-08-05,jetty_xx.xx_2020-08-05,jetty_new}
~/tmp/test$ cd ..
~/tmp$ listpath=$(find test/path \( -name "*2020-08-05" -o -name "*new" \) )
~/tmp$ listapps=$(find test/apps/ -name "*war" )
~/tmp$ echo ${listpath[@]}" "${listapps[@]} | tr " " "\n" | sort > resultfile
~/tmp$
~/tmp$ cat resultfile
test/apps/jetty/webapp_wars/test_new.war
test/path/jetty_new
test/path/jetty_xx.xx_2020-08-05
test/path/ReleaseTest.static.zip_2020-08-05
~/tmp$ rm -rf test/ && unset listapps && unset listpath && rm resultfile
~/tmp$
This way you get only one result for each pattern you are looking for in your if...then...else block of code.
Just adapt the ssh ..... find commands and take care of quotes and parentheses; this is the easiest solution, and this way you do not have to rewrite the script from scratch. And be careful about local vs. remote variables if you use them.
You really should not use ls but the fundamental problem is probably that three separate commands with three separate wildcards could match the same file three times.
Also, one of your commands is executed locally (you forgot to put ssh etc in front of the second one), so if the wildcard matches on your local computer, that would produce a result which doesn't reflect the situation on the remote server.
Try this refactoring.
file=serverList.csv
n=0
while [ $n -le 2 ]
do
    while IFS=: read -r f1 f2
    do
        # echo line is stored in $line <- XXX this is not true
        if echo "$f1" | grep -q "xx.xx.xxx";
        then
            ssh user@$f1 "printf '%s\n' path/*war_* path/*zip_* /apps/jetty*_*" | sort -u >path/"$f1".txt < /dev/null
        fi
    done < "$file"
    sleep 15
    n=$(( n+1 ))
done
The sort gets rid of any duplicates. This assumes none of your file names contain newlines; if they do, you'd need to use something which robustly handles them (try printf '%s\0' and sort -z but these are not portable).
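A sketch of that newline-safe variant (printf '%s\0' and sort -z are GNU extensions; the resulting file then contains NUL-separated names):
ssh user@$f1 "printf '%s\0' path/*war_* path/*zip_* /apps/jetty*_*" | sort -zu > path/"$f1".txt < /dev/null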
ls would definitely also accept three different wildcards, but as noted above, you really never want to use ls in scripts.

Recursively compare filenames in two directories, ignoring contents and return an exit code depending on result

This is for a bash script to compare a local and remote mount.
Local files and directories are symlinked to remote, except for new files.
I need to replace diff because it compares the contents and becomes very slow over the internet.
I have been trying things like diff <( ls /local/ ) <( ls /remote/ ) and diff <( tree /local/ ) <( tree /remote/ ), but cannot make them work, because they are not recursive or the symlinks get in the way.
rsync would be my go to for determining missing files, but I cannot find a way to manage exit codes and integrate into the script.
Script looks something like this:
#!/bin/bash
set -e
echo "Backup Command"
sleep 10
while :; do
    echo "Testing backup"
    if diff -r /local/ /remote/ ; then
        echo "diff matches!"
        break
    else
        echo "diff didn't match, waiting for cache"
        sleep 600
    fi
done
echo "Finished!"
Use cmp to check if two things are the same.
Use find to get all the possible paths.
sort the paths before passing them to cmp, so both lists are in the same order.
Use zero terminated strings to handle all special characters in filenames.
if cmp -s <(cd local && find -print0 | sort -z) <(cd remote && find -print0 | sort -z); then
    echo "local and remote have the same directory and file structure"
else
    echo "Oooh! local and remote differ. Or there was trouble."
fi
The answer was rather simple and obvious and I had tried both, but not together. To make this work, due to the symlinks, find -L is required to follow the links and then sort is necessary to make them match.
I replaced the diff command with the following:
diff <( find -L local |sort) <( find -L remote |sort)
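Since diff exits 0 when the listings match and non-zero otherwise, this drops straight into the loop from the script above, for example:
if diff <( find -L /local/ | sort ) <( find -L /remote/ | sort ) ; then
    echo "diff matches!"
    break
else
    echo "diff didn't match, waiting for cache"
    sleep 600
fi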

What is the purpose of the command rsync -rvzh

I'm trying to understand what these two commands are doing:
config=$(date +%s)
rsync -rvzh $1 /var/lib/tomcat7/webapps/ROOT/DataMining/target > /var/lib/tomcat7/webapps/ROOT/DataMining/$config
These lines appear in a bigger script, script.sh, which looks like this:
#! /bin/bash
config=$(date +%s)
rsync -rvzh $1 /var/lib/tomcat7/webapps/ROOT/DataMining/target > /var/lib/tomcat7/webapps/ROOT/DataMining/$config
countC=0
countS=`wc -l /var/lib/tomcat7/webapps/ROOT/DataMining/$config | sed 's/^\([0-9]*\).*$/\1/'`
let countS--
let countS--
let countS--
while read LINEC #read line
do
    if [ "$countC" -gt 0 ]; then
        if [ "$countC" -lt "$countS" ]; then
            FILENAME="/var/lib/tomcat7/webapps/ROOT/DataMining/target/"$LINEC
            count=0
            countW=0
            while read LINE
            do
                for word in $LINE;
                do
                    echo "INSERT INTO data_mining.data (word, line, numWordLine, file) VALUES ('$word', '$count', '$countW', '$FILENAME');" >> /var/lib/tomcat7/webapps/ROOT/DataMining/query
                    mysql -u root -Alaba1515< /var/lib/tomcat7/webapps/ROOT/DataMining/query
                    echo > /var/lib/tomcat7/webapps/ROOT/DataMining/query
                    let countW++
                done
                countW=0
                let count++
            done < $FILENAME
            count=0
            rm -f /var/lib/tomcat7/webapps/ROOT/DataMining/query
            rm -f /var/lib/tomcat7/webapps/ROOT/DataMining/$config
        fi
    fi
    let countC++
done < /var/lib/tomcat7/webapps/ROOT/DataMining/$config #finish while
I was able to find lots of documentation about rsync and what it does, but I don't understand what the rest of the command does. Any help please?
The first command assigns the current time (in seconds since epoch) to the shell variable config. For example:
$ config=$(date +%s)
$ echo $config
1446506996
rsync is a file copying utility. The second command thus makes a backup copy of the directory listed in argument 1 (referred to as $1). The backup copy is placed in /var/lib/tomcat7/webapps/ROOT/DataMining/target. A log file of what was copied is saved in /var/lib/tomcat7/webapps/ROOT/DataMining/$config:
rsync -rvzh $1 /var/lib/tomcat7/webapps/ROOT/DataMining/target > /var/lib/tomcat7/webapps/ROOT/DataMining/$config
The rsync options mean:
-r tells rsync to copy files, recursing into subdirectories.
-v tells it to be verbose so that it shows what is copied.
-z tells it to compress files during their transfer from one location to the other.
-h tells it to show any numbers in the output in human-readable format.
Note that because $1 is not inside double-quotes, this script will fail if the name of directory $1 contains whitespace.
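For example, a sketch with the quoting fixed (paths unchanged from the script):
rsync -rvzh "$1" /var/lib/tomcat7/webapps/ROOT/DataMining/target > "/var/lib/tomcat7/webapps/ROOT/DataMining/$config"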
