Compare the contents of all files inside a directory - bash

I have a directory with multiple txt files. I need to compare the contents of all the txt files and print "Yay, all files are same" if they are identical, else "Oops, Files are not the same".
cd tmp_dir
ls
abc.txt cde.txt fgh.txt ... xyz.txt
[ tmp_dir ]$ cat abc.txt
2022-08-01_20:14:36
[ tmp_dir ]$ cat def.txt
2022-08-01_07:40:29
[ tmp_dir ]$
How do I loop through the files and compare their contents?
for file in tmp_dir/*; do
    if [ -f "$file" ]; then
        cmp -s -- files # need to compare all the files under directory
    fi
done
Expected Output:
#If contents are same [Output should be in Green color]
Yay, all files are same
#If contents are not same [In red color]
Oops, Files are not the same

This can be done in a single pipeline by hashing all of the files and then counting how many unique hashes you get. If the answer is 1, all of the files are the same.
distinct_hashes="$(
    find dir/ -type f -exec sha512sum {} + |  # hash all files in dir/ (and its subdirectories)
    awk '{print $1}' |                        # strip file names from output
    sort -u |                                 # remove duplicate hashes
    wc -l                                     # count distinct hashes
)"
case "$distinct_hashes" in
    0) echo "no files";;
    1) echo "all the same";;
    *) echo "not all the same";;
esac
Alternatively, you could use cmp as you tried, which is more efficient: cmp stops at the first differing byte instead of reading and hashing every file in full. You'll just have to loop over the files manually. Note that you don't have to compare every pair of files, which would be O(n²); you can keep it O(n) by comparing each file to the first one.
first_file=
same=1
for file in dir/*; do
    [[ -f "$file" ]] || continue
    if [[ -z "$first_file" ]]; then
        first_file="$file"
    elif ! cmp -s "$file" "$first_file"; then
        same=0
        break
    fi
done
if [[ -z "$first_file" ]]; then
    echo "no files"
elif ((same)); then
    echo "all the same"
else
    echo "not all the same"
fi
Advanced shell scripters might point out that the quotes in distinct_hashes="$(...)", case "$distinct_hashes", [[ -f "$file" ]], [[ -z "$first_file" ]], and first_file="$file" are unnecessary. I like to include optional quotes anyway: quoting variable expansions is a really important habit to develop, and not everyone knows the intricacies of when quotes are and aren't required.
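Neither version above prints the colored messages the question asks for. Here is a minimal sketch that wraps the cmp loop in a function and colors the result. It assumes bash and a terminal that understands ANSI color escapes; the function names all_files_same and report are made up for this example.

```shell
#!/bin/bash
# Sketch: are all regular files directly inside a directory identical?
# Compares each file with the first one found, like the cmp loop above.
all_files_same() {
    local dir=$1 first= file
    for file in "$dir"/*; do
        [[ -f $file ]] || continue
        if [[ -z $first ]]; then
            first=$file
        elif ! cmp -s "$first" "$file"; then
            return 1                # found a file that differs
        fi
    done
    [[ -n $first ]]                 # an empty directory also returns failure
}

# Print the question's messages in green (same) or red (different),
# using ANSI escape codes.
report() {
    local green=$'\e[32m' red=$'\e[31m' reset=$'\e[0m'
    if all_files_same "$1"; then
        printf '%sYay, all files are same%s\n' "$green" "$reset"
    else
        printf '%sOops, Files are not the same%s\n' "$red" "$reset"
    fi
}
```

Usage: report tmp_dir. Where terminfo is available, tput setaf 2 (green), tput setaf 1 (red) and tput sgr0 (reset) are an alternative to hard-coded escape codes.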

Related

rename all files in one directory based on a matching substring from files in another directory

I would like to rename every file in one directory to the full name of the file in another directory that contains its name as a substring. For example, I have two directories with 1200 files each, similar to the example below, and I want the files in the second directory renamed to the full file names, based on the unique matching ID/substring:
Directory 1:
ABC_MA123.bed
EFG_MA124.bed
XYZ_MA125.bed
Directory 2:
MA123.bed
MA124.bed
MA125.bed
Desired result:
Directory 2:
ABC_MA123.bed
EFG_MA124.bed
XYZ_MA125.bed
Is there an easy way to do this with a bash/awk script?
Are you sure each file in directory 2 has exactly one match in directory 1?
You could do something like this, assuming the directories are /tmp/1 and /tmp/2:
(cd /tmp/2 && ls -1) |
while read -r f2; do
    f1=(/tmp/1/*"$f2")
    echo mv "/tmp/2/$f2" "/tmp/2/${f1##*/}"   # drop "echo" to actually rename
done
For each file in directory 2, this finds a file in directory 1 with the same suffix and prints the mv command that renames the file in directory 2 to the name of the matching file in directory 1; remove the echo to perform the renames. The ${f1##*/} strips the path off f1.
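A quick illustration of how ${f1##*/} behaves on a plain variable and on an array; the paths are hypothetical and never touch the filesystem.

```shell
#!/bin/bash
# ${var##*/} deletes the longest prefix matching "*/", i.e. everything up
# to and including the last slash, leaving the bare file name.
f1=/tmp/1/ABC_MA123.bed
echo "${f1##*/}"           # prints ABC_MA123.bed

# On an array (as produced by f1=(/tmp/1/*MA123.bed)), plain ${f1##*/}
# uses only the first element; ${f1[@]##*/} applies to every element.
f1=(/tmp/1/ABC_MA123.bed /tmp/1/XYZ_MA123.bed)
echo "${f1##*/}"           # prints ABC_MA123.bed
echo "${f1[@]##*/}"        # prints ABC_MA123.bed XYZ_MA123.bed
```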
Assuming the filenames in dir2 are substrings of filenames in dir1, simple globbing should work:
d1="/full/path/to/dir1"
d2="/full/path/to/dir2"
cd "$d2" || exit 1
for f2 in *; do
    f1=("$d1"/*"$f2"*)
    if [[ ${#f1[@]} != 1 ]]; then
        printf "ERROR: multiple matches: %s\n" "$f2" 1>&2
    elif [[ ! -f $f1 ]]; then
        printf "ERROR: no match: %s\n" "$f2" 1>&2
    else
        new="${f1##*/}"  # strip to bare filename
        if [[ $f2 == "$new" ]]; then
            printf "WARNING: no change: %s\n" "$f2" 1>&2
        elif [[ -f $new ]]; then
            printf "ERROR: exists: %s\n" "$new" 1>&2
        else
            mv -- "$f2" "$new"
        fi
    fi
done
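One subtlety in this approach: when a glob matches nothing, bash (with nullglob off, the default) leaves the pattern as a literal word, so the array still holds one element. A count of 1 alone therefore doesn't prove a match, which is why the extra -f test is needed. A small sketch; count_matches is a hypothetical helper.

```shell
#!/bin/bash
# Count how many files in a directory contain a given substring, using the
# same kind of glob as f1=("$d1"/*"$f2"*). Without nullglob, a glob with no
# match stays as the literal pattern, so the array has ONE element even
# though nothing matched; checking that the element exists tells them apart.
count_matches() {
    local dir=$1 sub=$2
    local -a f1
    f1=("$dir"/*"$sub"*)
    if [[ -e ${f1[0]} ]]; then
        echo "${#f1[@]}"     # number of real matches
    else
        echo 0               # unexpanded pattern (or empty array with nullglob)
    fi
}

d=$(mktemp -d)
touch "$d/ABC_MA123.bed" "$d/XYZ_MA123.bed" "$d/EFG_MA124.bed"
count_matches "$d" MA123.bed    # prints 2
count_matches "$d" MA124.bed    # prints 1
count_matches "$d" MA999.bed    # prints 0
rm -rf "$d"
```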

Count filename characters, then copy those files to another directory

I need to write a bash script that copies to dir2 the files whose filename length (in characters) equals an integer passed to the script as an argument. I've tried something, but I can't get the files copied at all.
read number
list=`for file in *; do echo -n "$file" | wc -m; done`
for file in $list
do
    if [ $file -eq $number ]
    then
        cp file dir2
    fi
done
In your code, list is a list of filename lengths, not filenames, so $file is just a number. You also left off the leading $ in cp file dir2.
You don't need to use the wc program; you can get the length of a variable's value with ${#name}. I think you need something like this:
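shopt -s extglob   # for the +([0-9]) pattern; harmless where [[ ]] already allows it
while [[ $number != +([0-9]) ]]
do
    read -p "Enter number: " number
done
for file in *
do
    # skip directories (such as dir2 itself) and compare the name length
    if [[ -f $file ]] && (( ${#file} == number ))
    then
        cp -- "$file" dir2
    fi
done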
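The question says the number is given as an argument rather than read interactively. A sketch of that variant, assuming the number is the first argument and the destination (dir2 by default) already exists; copy_by_name_length is a hypothetical name.

```shell
#!/bin/bash
# Sketch: copy every regular file in the current directory whose name is
# exactly N characters long into a destination directory.
# Assumptions: N is the first argument; the destination directory is the
# optional second argument (default dir2) and already exists.
copy_by_name_length() {
    local number=$1 dest=${2:-dir2} file
    if [[ ! $number =~ ^[0-9]+$ ]]; then
        printf 'usage: copy_by_name_length N [dest]\n' >&2
        return 1
    fi
    for file in *; do
        [[ -f $file ]] || continue          # skip directories such as dest
        if (( ${#file} == number )); then
            cp -- "$file" "$dest"/
        fi
    done
}
```

In a script you would call it as copy_by_name_length "$1" dir2. Validating with a regular expression first keeps arbitrary text out of the arithmetic test.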

Iterating through a folder that's passed in as a parameter to a Bash script

I'm trying to iterate over a folder, run grep on each file, and put the results into separate files tagged with a .res extension. Here's what I have so far:
#!/bin/bash
directory=$(pwd)
searchterms="searchterms.txt"
extension=".end"
usage() {
    echo "usage: fmat [[[-f file ] [-d directory ] [-e ext]] | [-h]]"
    echo " file - text file containing a return-delimited list of materials"
    echo " directory - directory to process"
    echo " ext - file extension of files to process"
    echo ""
}
while [ "$1" != "" ]; do
    case $1 in
        -d | --directory ) shift
            directory=$1
            ;;
        -f | --file ) shift
            searchterms=$1
            ;;
        -e | --extension ) shift
            extension=$1
            ;;
        -h | --help ) usage
            exit
            ;;
        * ) usage
            exit 1
    esac
    shift
done
if [ ! -d "$directory" ]; then
    echo "Sorry, the directory '$directory' does not exist"
    exit 1
fi
if [ ! -f "$searchterms" ]; then
    echo "Sorry, the searchterms file '$searchterms' does not exist"
    exit 1
fi
echo "Searching '$directory' ..."
for file in "${directory}/*"; do
    printf "File: %s\n" ${file}
    [ -e "$file" ] || continue
    printf "%s\n" ${file}
    if [ ${file: -3} == ${extension} ]; then
        printf "%s will be processed\n" ${file}
        #
        # lots of processing here
        #
    fi
done
I know it's down to my poor understanding of globbing... but I can't get the test on the extension to work.
Essentially, I want to be able to specify a source directory, a file with search terms, and an extension to search for.
Now, I realise there may be quicker ways to do this, e.g.
grep -f searchterms.txt *.end > allchanges.end.res
but I may need to do other processing on the files, and I want to save the results into separate files: bing.end and bong.end would be grep'ed into bing.end.res and bong.end.res.
Please let me know, just how stupid I'm being ;-)
Just for completeness' sake, here's the last part, now working, thanks to @chepner and @Gordon Davisson:
echo "Searching '$directory' ..."
for file in "${directory}"/*; do
    [ -e "$file" ] || continue
    # process only files with the requested extension
    if [[ $file = *.${extension#.} ]]; then
        printf "Processing %s \n" "$file"
        head -n 1 "${file}" > "${file}.res"
        grep -f "$searchterms" "${file}" >> "${file}.res"
    fi
done
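Why the first extension test failed and this one works can be checked directly: ${file: -3} takes the last three characters, which for bing.end is end and never equals .end, while the pattern form strips an optional leading dot from the extension. matches_extension is a hypothetical helper for this illustration.

```shell
#!/bin/bash
# ${extension#.} removes one leading dot if present, so "-e .end" and
# "-e end" both produce the pattern *.end.
matches_extension() {
    local file=$1 extension=$2
    [[ $file = *.${extension#.} ]]
}

matches_extension bing.end .end && echo yes   # prints yes
matches_extension bing.end end  && echo yes   # prints yes
matches_extension bing.txt end  || echo no    # prints no

file=bing.end
echo "${file: -3}"   # prints end: three characters, so it never equals ".end"
```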
You just need to leave the * out of the quotes, so that it isn't treated as a literal *:
for file in "${directory}"/*; do
Unlike most languages, the quotes don't define a string (as everything in bash is already a string: it's the only data type). They simply escape each character inside the quotes. "foo" is exactly the same as \f\o\o, which (because escaping most characters doesn't really have any effect) is the same as foo. Quoted or not, all characters not separated by word-splitting characters are part of the same word.
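The difference between a quoted and an unquoted * is easy to see in a throwaway directory (created with mktemp -d, so nothing real is touched):

```shell
#!/bin/bash
d=$(mktemp -d)
touch "$d/a.end" "$d/b.end"

# Quoted: the * is escaped, so the loop runs once with the literal ".../*".
for file in "$d/*"; do echo "$file"; done

# Unquoted *: the glob expands, so the loop runs once per file.
for file in "$d"/*; do echo "$file"; done

# Quotes only escape characters; they don't create a separate string type.
[ "foo" = \f\o\o ] && echo same     # prints same
rm -rf "$d"
```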
http://shellcheck.net will catch this, although not with the most useful error message. (It will also catch the other parameter expansions that you did not quote but should.)

Copy last modified binary file over the other one

I would like to compare two binary files (very small, about 100 KB each) and replace the older one with the more recently modified one.
I have written a simple script, but I need your help to make it run properly:
#!/bin/sh
# select the two files
FILE1="/dir1/file1.binary"
FILE2="/dir2/file2.binary"
# create the hash of the two files
HASH1="$(md5sum $FILE1 | cut -c 1-32)"
HASH2="$(md5sum $FILE2 | cut -c 1-32)"
# compare the two hashes
if [ "$HASH1" == "$HASH2" ];
# if the two hashes are the same, exit
then
    echo "the two files are identical"
    exit 0
# otherwise compare which of them has been last modified
fi
DATE1="(stat -c %Y $FILE1)"
DATE2="(stat -c %Y $FILE2)"
# if FILE1 is newer than FILE2, replace FILE2 with FILE1
if [ "${DATE1}" -gt "${DATE2}" ];
then
    cp $FILE1 $FILE2
    echo "${FILE2} was replaced by ${FILE1}"
# if FILE2 is newer than FILE1, replace FILE1 with FILE2
fi
cp $FILE2 $FILE1
echo "${FILE1} was replaced by ${FILE2}"
exit 0
The script seems to work (at least when the two files are identical), but if one file has been modified, I get the following error:
line 24: [: {(stat -c %Y test1)}: integer expression expected
What is wrong?
By the way, is there a better way to solve this problem?
Thanks
Thank you so much, everybody, for your help. Here is how the script looks now. It also sends a notification to QTS on QNAP, which can be removed if you run it elsewhere or don't need it.
#!/bin/sh
# select the two files
FILE1="/dir1/file1"
FILE2="/dir2/file2"
# use or create a log file with timestamp of the output
LOG="/dir1/ScriptLog.txt"
TIMESTAMP=$(date +"%Y-%m-%d %Hh:%M")
if [ ! -e "$LOG" ]; then
    touch "$LOG"
    echo "$TIMESTAMP - INFO: '$LOG' did not exist but has been created." >&2
# else
#     echo "$TIMESTAMP - INFO: '$LOG' exists and it will be used if any change to '$FILE1'
#     or to '$FILE2' is needed." >&2
fi
# You can also pass the two file names as arguments for the script
# ([ ... -eq ... ] rather than [[ ... ]], which /bin/sh may not support)
if [ $# -eq 2 ]; then
    FILE1=$1
    FILE2=$2
fi
# check if the two files exist and are regular
if [ -f "$FILE1" ] && [ -f "$FILE2" ]; then
    # compare FILE1 against FILE2; if files are identical, stop there
    # (tee -a appends each message to the log while still showing it on stderr)
    if cmp -s "$FILE1" "$FILE2"; then
        echo "$TIMESTAMP - INFO: '$FILE1' and '$FILE2' are identical." | tee -a "$LOG" >&2
    # if FILE1 is newer than FILE2, copy FILE1 over FILE2
    elif [ "$FILE1" -nt "$FILE2" ]; then
        if cp -p "$FILE1" "$FILE2"; then
            echo "$TIMESTAMP - INFO: '$FILE1' replaced '$FILE2'." | tee -a "$LOG" >&2
            # if copy is successful, notify it into QTS
            /sbin/notice_log_tool -a "$TIMESTAMP - INFO: '$FILE1' replaced '$FILE2'." --severity=5 >&2
        else
            echo "$TIMESTAMP - ERROR: FAILED to replace '$FILE2' with '$FILE1'." | tee -a "$LOG" >&2
            exit 1
        fi
    # if FILE1 is older than FILE2, copy FILE2 over FILE1
    elif [ "$FILE1" -ot "$FILE2" ]; then
        if cp -p "$FILE2" "$FILE1"; then
            echo "$TIMESTAMP - INFO: '$FILE2' replaced '$FILE1'." | tee -a "$LOG" >&2
            # if copy is successful, notify it into QTS
            /sbin/notice_log_tool -a "$TIMESTAMP - INFO: '$FILE2' replaced '$FILE1'." --severity=5 >&2
        else
            echo "$TIMESTAMP - ERROR: FAILED to replace '$FILE1' with '$FILE2'." | tee -a "$LOG" >&2
            exit 1
        fi
    # if the two files are not identical but have the same modification time
    else
        echo "$TIMESTAMP - ERROR: We should never reach this point. Something is wrong in the script." | tee -a "$LOG" >&2
        exit 1
    fi
# if one file does not exist or is not a regular file, exit
else
    echo "$TIMESTAMP - ERROR: One of the files does not exist, has been moved or renamed." | tee -a "$LOG" >&2
    # if error, notify it into QTS
    /sbin/notice_log_tool -a "$TIMESTAMP - ERROR: One of the files does not exist, has been moved or renamed." --severity=5 >&2
    exit 1
fi
I'm also going to suggest refactoring this, both to simplify the code, and to save your CPU cycles.
#!/bin/sh
# If both files exist....
if [ -f "$1" -a -f "$2" ]; then
    # If they have the same content...
    if cmp "$1" "$2" >/dev/null 2>/dev/null; then
        echo "INFO: These two files are identical." >&2
    # If one is newer than the other...
    elif [ "$1" -nt "$2" ]; then
        if cp -p "$1" "$2"; then
            echo "INFO: Replaced file '$2' with '$1'." >&2
        else
            echo "ERROR: FAILED to replace file." >&2
            exit 1
        fi
    # If the other is newer than the one...
    elif [ "$1" -ot "$2" ]; then
        if cp -p "$2" "$1"; then
            echo "INFO: Replaced file '$1' with '$2'." >&2
        else
            echo "ERROR: FAILED to replace file." >&2
            exit 1
        fi
    else
        echo "ERROR: we should never reach this point. Something is wrong." >&2
        exit 1
    fi
else
    echo "ERROR: One of these files does not exist." >&2
    exit 1
fi
A few things that you may find useful.
This avoids calculating an md5 on each of the files. While comparing sums may be fine for small files like yours, it gets mighty expensive as your files grow. And it's completely unnecessary, because you have the cmp command available. Better to get in the habit of writing code that will work with less modification when you recycle it for the next project.
An if statement runs a command, usually [ or [[, but it can be any command. Here, we're running cmp and cp within an if, so that we can easily check the results.
This doesn't use stat anymore. stat -c is specific to GNU coreutils (BSD and macOS stat uses -f with different format codes), so even if you never look beyond Linux, it's a good idea to keep portability in mind, and if you can make your script portable, that's great.
This is not a bash script. Neither was your script -- if you call your script with /bin/sh, then you're in POSIX compatibility mode, which already makes this more portable than you thought. :-)
Indenting helps. You might want to adopt it in your own scripts, so that you have a better visual idea of which commands belong to which of the conditions being tested.
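The points about if and portability can be tried out in a small sketch. compare_files is a hypothetical helper, the files get fixed timestamps with touch -t, and mktemp is assumed to be available. Note that -nt and -ot, while supported by bash, dash and BusyBox sh, are not part of older POSIX editions of test.

```shell
#!/bin/sh
# `if` branches on the exit status of any command, so cmp -s and
# test's -nt / -ot can drive the logic directly, without hashes or stat.
compare_files() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    elif [ "$1" -nt "$2" ]; then
        echo "$1 is newer"
    elif [ "$1" -ot "$2" ]; then
        echo "$2 is newer"
    else
        echo "different content, same modification time"
    fi
}

# Demo in a throwaway directory; touch -t takes [[CC]YY]MMDDhhmm.
d=$(mktemp -d)
printf 'old\n' > "$d/file1"
printf 'new\n' > "$d/file2"
touch -t 202201010000 "$d/file1"
touch -t 202301010000 "$d/file2"
compare_files "$d/file1" "$d/file2"    # reports that file2 is newer
rm -rf "$d"
```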
What about something a bit simpler like the following?
#!/bin/sh
# select the two files from cli
# $1 = current file
# $2 = new file
FILE1=$1
FILE2=$2
# get the last-modification times (GNU stat, seconds since the epoch)
DATE1=$(stat -c %Y "$FILE1")
DATE2=$(stat -c %Y "$FILE2")
# dry run: print the copy if the new file is more recent;
# uncomment the cp line to actually do it
if [ "$DATE2" -gt "$DATE1" ]; then
    echo "cp -f $FILE2 $FILE1"
    # cp -f "$FILE2" "$FILE1"
fi
Almost there. After cleaning up your code and tweaking it a bit, here is what I got:
#!/bin/bash
# select the two files (default option)
FILE1="/dir1/file1.binary"
FILE2="/dir1/file2.binary"
# You can also pass the two file names as arguments for the script
if [ $# -eq 2 ]; then
    FILE1=$1
    FILE2=$2
fi
# create the hash of the two files
# (the sed expression expects BSD-style output: MD5 (file) = hash)
HASH1="$(md5sum "$FILE1" | sed -n -e 's/^.*= //p')"
HASH2="$(md5sum "$FILE2" | sed -n -e 's/^.*= //p')"
# get the dates of last modification (BSD/macOS stat syntax)
DATE1="$(stat -f '%m%t%Sm' "$FILE1" | cut -c 1-10)"
DATE2="$(stat -f '%m%t%Sm' "$FILE2" | cut -c 1-10)"
# Uncomment to see the values
#echo "$FILE1 = hash: $HASH1 date: $DATE1"
#echo "$FILE2 = hash: $HASH2 date: $DATE2"
# compare the two hashes
if [ "$HASH1" == "$HASH2" ]; then
    # if the two hashes are the same, exit
    echo "the two files are identical"
    exit 0
fi
# compare the dates
if [ "$DATE1" -gt "$DATE2" ]; then
    # if FILE1 is newer than FILE2, replace FILE2 with FILE1
    cp "$FILE1" "$FILE2"
    echo "${FILE2} was replaced by ${FILE1}"
elif [ "$DATE1" -lt "$DATE2" ]; then
    # else if FILE2 is newer than FILE1, replace FILE1 with FILE2
    cp "$FILE2" "$FILE1"
    echo "${FILE1} was replaced by ${FILE2}"
else
    # else the contents differ but the modification times are equal
    echo "the two files differ but have the same modification time"
fi
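Your way of getting the date was wrong: the assignment was missing the $ of the command substitution, so DATE1 held the literal text (stat -c %Y ...) instead of a number, which is what triggered "integer expression expected". On my machine stat also needs the BSD syntax (-f), so I rewrote it; on GNU/Linux, stat -c %Y "$FILE1" is the equivalent.
The sed expression keeps only what follows "= ", which matches BSD-style output of the form MD5 (file) = hash. GNU md5sum prints the 32-character hash first, so on GNU/Linux your original cut -c 1-32 already extracts it correctly.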
You also misused the conditional statements, as HuStmpHrrr pointed out: in your script the second cp comes after the fi, so it runs unconditionally.
The rest is cosmetics.
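The missing $ from the question is worth seeing in isolation: without it, the parentheses are just characters inside a double-quoted string, so the numeric test receives text instead of a number. A minimal sketch, using echo in place of stat:

```shell
#!/bin/sh
# Without the $, "(...)" is just text inside a double-quoted string.
WRONG="(echo 42)"    # the command never runs
RIGHT="$(echo 42)"   # command substitution: captures the command's output

echo "$WRONG"        # prints (echo 42)
echo "$RIGHT"        # prints 42

[ "$RIGHT" -gt 10 ] && echo "numeric test works"
# [ "$WRONG" -gt 10 ] fails instead: bash reports
# "integer expression expected"; other shells word the error differently.
```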

How do I manipulate filenames in bash?

I have a bunch of images that I need to rename so that I can use them, and I was wondering how to do this.
The first 5 characters must be kept, and the 6th should be a number from 1 to 3. The first 5 characters are static for pictures belonging to the same "family", so they can be used for comparison; the 6th character is not known in advance.
Example:
12345random.jpg
12345randomer.jpg
0987654more_random.jpg
09876awesome.jpg
09876awesomer.jpg
09876awesomest.jpg
09876soawesomegalaxiesexplode.jpg
would become:
12345.jpg
123452.jpg
09876.jpg
098761.jpg
098762.jpg
It would be nice if the loop renamed only 3 pictures and skipped the rest.
I found some examples of removing characters up to a certain point, but nothing I could use, since I am quite poor at bash scripting.
Here is my approach. It kind of sucks, since I tried modifying scripts I found, but the idea is there:
//I could not figure out how to remove the chars after the 5th, only the other way around
for file in .....*; do echo mv $file `echo $file | cut -c6-`; done
done
//problem also is that once the names conflict, it produces only 1 file named 12345.jpg; the 2nd one will not be created
//do not know how to read file names into an array
name=somefile
if [[ -e $name.jpg]] ; then
    i=0
    while [[ -e $name-$i.jpg]] ; do
        let i++
    done
    name=$name-$i
fi
touch $name.jpg
You can have:
new_file=${file%%[^0-9]*.jpg}.jpg
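Applied to the question's sample names, this expansion keeps the leading run of digits (not strictly the first five characters), and it maps different files of one family to the same name. A small sketch; digits_prefix_name is a hypothetical helper.

```shell
#!/bin/bash
# ${file%%[^0-9]*.jpg} deletes the longest suffix that starts with a
# non-digit and ends in .jpg, so what remains is the leading run of digits.
digits_prefix_name() {
    local file=$1
    printf '%s\n' "${file%%[^0-9]*.jpg}.jpg"
}

for f in 12345random.jpg 12345randomer.jpg 0987654more_random.jpg 09876awesome.jpg; do
    printf '%s -> %s\n' "$f" "$(digits_prefix_name "$f")"
done
# prints:
# 12345random.jpg -> 12345.jpg
# 12345randomer.jpg -> 12345.jpg
# 0987654more_random.jpg -> 0987654.jpg
# 09876awesome.jpg -> 09876.jpg
```

The first two names collide, so a rename loop built on this must not overwrite blindly; ${file:0:5} is the expansion to use for a strict five-character prefix.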
As a concept you can have this to rename files:
for file in *.jpg; do
    [[ $file == [0-9]*[^0-9]*.jpg ]] || continue  ## Just a simple check.
    new_file=${file%%[^0-9]*.jpg}.jpg
    [[ -e $new_file ]] && continue  ## Do not overwrite. Delete line if not wanted.
    echo "Renaming $file to $new_file."  ## Optional message.
    mv -- "$file" "$new_file" || echo "Failed to rename $file to $new_file."
done
If you're going to process paths that include directory names, you'll need a few more changes:
for file in /path/to/other/dirs/*.jpg *.jpg; do
    base=${file##*/}
    [[ $base == [0-9]*[^0-9]*.jpg ]] || continue
    if [[ $file == */* ]]; then
        new_file=${file%/*}/${base%%[^0-9]*.jpg}.jpg
    else
        new_file=${file%%[^0-9]*.jpg}.jpg
    fi
    [[ -e $new_file ]] && continue  ## Do not overwrite.
    echo "Renaming $file to $new_file."
    mv -- "$file" "$new_file"
done
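Neither loop above adds the 1-3 number the question asks for. Here is a sketch that does. It assumes bash 4+ (for associative arrays), treats the first five characters as the family, and numbers files 1, 2, 3 in glob order (e.g. 123451.jpg, 123452.jpg, 123453.jpg), skipping the rest. The question's own example is not fully consistent, so adjust the naming to taste. rename_families is a hypothetical name; try it on a copy of the pictures first.

```shell
#!/bin/bash
# Sketch: rename at most three .jpg files per "family" (first five
# characters of the name) to FAMILY1.jpg, FAMILY2.jpg, FAMILY3.jpg.
# Files beyond the third in a family, or whose target exists, are skipped.
rename_families() {
    local dir=$1 path base key n new
    local -A seen=()                       # family key -> files renamed so far
    for path in "$dir"/*.jpg; do
        [[ -f $path ]] || continue
        base=${path##*/}
        (( ${#base} > 9 )) || continue     # need more than 5 chars before .jpg
        key=${base:0:5}
        n=$(( ${seen[$key]:-0} + 1 ))
        (( n <= 3 )) || continue           # keep only three per family
        new="$dir/$key$n.jpg"
        if [[ -e $new ]]; then
            printf 'skip (exists): %s\n' "$new" >&2
            continue
        fi
        mv -- "$path" "$new" && seen[$key]=$n
    done
}
```

Usage: rename_families /path/to/pics. The glob is expanded once when the loop starts, so freshly renamed files are not visited again.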
You can also try the following code. Be careful: all the files must be .jpg files, and you pass the name of the folder as an argument.
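#!/bin/bash
# usage: ./script.sh /path/to/folder
# Renames the first three .jpg files to their first five characters plus a
# counter from 1 to 3 (e.g. 12345random.jpg -> 123451.jpg).
i=0
for path in "$1"/*.jpg
do
    [ -f "$path" ] || continue
    b=${path##*/}            # bare file name
    echo "$b"
    if (( i < 3 ))
    then
        c=${b:0:5}           # first five characters
        i=$((i + 1))
        c="$c$i.jpg"
        echo "$c"
        mv -- "$path" "$1/$c"
    else
        break
    fi
done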
