BASH while loop to check lines in file displays too many times - bash

I am writing a script where I want to take each line from a file and check for a match in another file.
If I find a match I want to say that I found a match and if not, say that I did not find a match.
The 2 files contain md5 hashes. The old file is the original and the new file is to check if there have been any changes since the original file.
Original file: chksum
New file: chksum1
#!/bin/bash
while read e; do
while read f; do
if [[ $e = $f ]]
then
echo $e "is the same"
else
if [[ $e != $f ]]
then
echo $e "has been changed"
fi
fi
done < chksum1
done < chksum
My issue is that for the files that have been changed I get an echo for every time the check in the loop is done and I only want it to display the file once and say that it was not found.
Hope this is clear.

You could use the same script but add a flag that remembers whether a match was found.
#!/bin/bash
while read -r e; do
    rem=0
    while read -r f; do
        if [[ $e = "$f" ]]
        then
            rem=1
        fi
    done < chksum1
    # note the $: without it the literal string "rem" is compared with "1"
    if [[ $rem = 1 ]]
    then
        echo "$e is the same"
    else
        echo "$e has been changed"
    fi
done < chksum
This should work correctly.

You were really close. This will work:
while read -r e; do
    found=0    # reset once per line of chksum, before scanning chksum1
    while read -r f; do
        if [[ $e = "$f" ]]
        then
            found=1
            break
        fi
    done < chksum1
    if [ "$found" -ne 0 ]
    then
        echo "$e is the same"
    else
        echo "$e has been changed"
    fi
done < chksum

A slightly simplified version that avoids reading the same file multiple times (requires bash 4.0 or later). I assume that the files contain unique file names and that the file format is the output of the md5sum command.
#!/bin/bash
declare -A hash
while read md5 file; do hash[$file]=$md5; done <chksum
while read md5 file; do
[ -z "${hash[$file]}" ] && echo "$file new file" && continue
[ ${hash[$file]} == $md5 ] && echo "$file is same" && continue
echo "$file has been changed"
done <chksum1
This script reads the first file into an associative array called hash. The key is the file name and the value is the MD5 checksum. The second loop reads the second checksum file: if the file name is not in hash, it prints "file new file"; if it is in hash and the checksums are equal, it prints "file is same"; otherwise it prints "file has been changed".
Input files:
$ cat chksum
eed0fc0313f790cec0695914f1847bca ./a.txt
9ee9e1fffbb3c16357bf80c6f7a27574 ./b.txt
a91a408e113adce865cba3c580add827 ./c.txt
$ cat chksum1
eed0fc0313f790cec0695914f1847bca ./a.txt
8ee9e1fffbb3c16357bf80c6f7a27574 ./b.txt
a91a408e113adce865cba3c580add827 ./d.txt
Output:
./a.txt is same
./b.txt has been changed
./d.txt new file
EXTENDED VERSION
Also detect deleted files.
#!/bin/bash
declare -A hash
while read md5 file; do hash[$file]=$md5; done <chksum
while read md5 file; do
[ -z "${hash[$file]}" ] && echo "$file new file" && continue
if [ ${hash[$file]} == $md5 ]; then echo "$file is same"
else echo "$file has been changed"
fi
unset hash[$file]
done <chksum1
# whatever is still in hash was never seen in chksum1
for file in "${!hash[@]}"; do echo "$file deleted file"; done
Output:
./a.txt is same
./b.txt has been changed
./d.txt new file
./c.txt deleted file

I'd like to suggest an alternative: instead of reading line by line, use sort and uniq -c to find the differences. There is no need for a loop where a simple pipeline can do the job.
In this case you want all the lines that have changed in file chksum1. Listing chksum1 twice means that (assuming no line is repeated within a file) a line found only in chksum1 gets a count of 2, a line found in both files gets 3, and a line found only in chksum gets 1, so
sort chksum chksum1 chksum1 | uniq -c | egrep '^\s+2\s' | sed 's%\s\+2\s%%'
This also reads chksum1 only twice, whereas the loop-based example reads it once per line of chksum.
Reusing the input files from one of the other answers:
samveen#precise:~/so$ cat chksum
eed0fc0313f790cec0695914f1847bca ./a.txt
9ee9e1fffbb3c16357bf80c6f7a27574 ./b.txt
a91a408e113adce865cba3c580add827 ./c.txt
samveen#precise:~/so$ cat chksum1
eed0fc0313f790cec0695914f1847bca ./a.txt
8ee9e1fffbb3c16357bf80c6f7a27574 ./b.txt
a91a408e113adce865cba3c580add827 ./d.txt
samveen#precise:~/so$ sort chksum chksum1 chksum1 |uniq -c | egrep '^\s+2\s' |sed 's%\s\+2\s%%'
8ee9e1fffbb3c16357bf80c6f7a27574 ./b.txt
a91a408e113adce865cba3c580add827 ./d.txt
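The same list can probably be produced with comm as well. This is only a sketch, not something from the original answer: the function name only_in_new is made up, it assumes bash (for the process substitution) and that no line is repeated within a file.

```shell
#!/bin/bash
# only_in_new OLD NEW: print the lines that appear in NEW but not in OLD.
# (hypothetical helper name, for illustration only)
only_in_new() {
    # comm needs sorted input; -1 drops lines unique to OLD, -3 drops common lines,
    # which leaves the lines unique to NEW
    comm -13 <(sort -- "$1") <(sort -- "$2")
}
```

Called as only_in_new chksum chksum1, it should print the same two lines as the pipeline above.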
Another possible solution is (as suggested in the question's comments) to use diff in conjunction with sort:
diff <(sort chksum) <(sort chksum1) |grep '^>'
The output:
samveen#precise:~/so$ diff <(sort chksum) <(sort chksum1) |grep '^>'
> 8ee9e1fffbb3c16357bf80c6f7a27574 ./b.txt
> a91a408e113adce865cba3c580add827 ./d.txt

Simple solution:
diff -q chksum1 chksum

What about using grep? Each line read from chksum serves as a search pattern in chksum1. If grep finds a match, "$?", which holds grep's exit status, is 0; otherwise it is 1.
while read -r e; do
    # -q: print nothing, -x: match whole lines only, -F: treat the pattern as a fixed string
    grep -qxF -- "$e" chksum1
    if [ $? -eq 0 ]; then
        echo "$e is the same"
    else
        echo "$e has been changed"
    fi
done < chksum
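If only the unmatched lines are of interest, a single grep call can stand in for the loop. A minimal sketch, assuming exact whole-line comparison is what is wanted; the function name changed_lines is invented for this example.

```shell
#!/bin/sh
# changed_lines OLD NEW: print the lines of OLD that have no exact match in NEW.
# (hypothetical helper name, for illustration only)
changed_lines() {
    # -f FILE: take one pattern per line from FILE, -F: fixed strings,
    # -x: whole-line matches only, -v: print the lines that do NOT match
    grep -Fxv -f "$2" -- "$1"
}
```

For example, changed_lines chksum chksum1 lists the entries of chksum that are missing from chksum1 or whose checksum differs there.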

Related

Getting errors while imitating tail command in shell script without using inbuilt tail command

I have written this code to imitate the tail command without using the built-in command, and it gives an error in the for loop even though the syntax looks correct.
I cannot figure out where the error actually is.
Expected output:
It should print the last n lines of the file.
n is read from the user.
#!/bin/sh
echo -n "Enter file name: "
read file_name
echo -n "Enter number of lines: "
read n
touch $file_name
str="end"
temp="temp"
echo "Start entering data of file and write end for stopping: "
while [ 1 ]
do
    read string
    touch $temp
    echo $string > $temp
    grep -w "$str" $temp
    if [ $? -eq 0 ]
    then
        break
    else
        cat $temp >> $file_name
    fi
done
rm $temp
echo "---------------------------------------------------------"
total_lines= wc -l < $file_name
echo $total_lines
a=expr $total_lines - $n
exec < $file_name
for ((i=$a;i<$total_lines;i++ ))
do
    read a1
    echo $a1
done

Loop through first 12 files in directory and break out if file is found

In Bash, what is the best way to loop through the first 12 folders in a directory in search of a file and, if the file is found, exit both loops?
This is my attempt so far. It doesn't:
limit the search scope to the first 12 folders
break out of the nested for loop when a file is found
How do I fix this?
#!/bin/bash
value="test.txt"
file_found = false
cd /backup/logs || exit 1
ls -1tr | head -n -12 | while read -r folder; do
cd /backup/logs/${folder}
ls -1tr | while read -r file; do
if [[ ${file} == ${value} ]]; then
file_found=true
echo "file found in ${folder}"
break
fi
done
if [[ ${file_found} == true ]]; then
break
fi
done
Use head -n 12 to list the first 12 lines of the output, not head -n -12, which lists everything except the last 12 lines.
Use break 2 to break out of two nested loops.
Use ls -d */ to list only directories (i.e. don't show files, and don't show the contents of the directories).
Use a for loop, not a while loop. You don't need a second loop; you can test directly whether a file named "$value" exists.
i=0
for folder in /backup/logs/*/; do    # trailing slash: match directories only
    if [ "$i" = 12 ]; then break; fi
    # $folder already ends in a slash, so no cd is needed
    if [ -f "$folder$value" ]; then
        echo "file found in $folder"
        break
    fi
    i=$((i+1))
done
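The answer mentions break 2, but its single loop never needs it. If you do keep two nested loops, a sketch could look like the one below. The function name find_in_first_dirs and its arguments are made up for illustration, and folders are visited in glob (alphabetical) order, not the time order of ls -1tr. Using plain for loops instead of ls | while also matters here: by default bash runs each part of a pipeline in a subshell, so a variable such as file_found set inside ls | while is lost once the loop ends.

```shell
#!/bin/bash
# find_in_first_dirs BASE NAME LIMIT: look for a file called NAME in the first
# LIMIT subdirectories of BASE. (hypothetical helper, for illustration only)
find_in_first_dirs() {
    local base=$1 name=$2 limit=$3
    local count=0 folder file
    for folder in "$base"/*/; do
        [ -d "$folder" ] || continue           # the glob matched nothing
        [ "$count" -lt "$limit" ] || break     # stop after LIMIT folders
        count=$((count + 1))
        for file in "$folder"*; do
            if [ "${file##*/}" = "$name" ]; then
                echo "file found in ${folder%/}"
                break 2                        # leave both loops at once
            fi
        done
    done
}
```

It would be called as, say, find_in_first_dirs /backup/logs test.txt 12.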

can't read all file lines in bash pipeline

I searched and couldn't find anything; maybe I don't understand the problem properly.
I have a bash function that reads files in the current directory and its subdirectories. I'm trying to arrange the text and analyze the data, but somehow I lose lines when I use a pipeline.
The code:
function recursiveFindReq {
for file in *.request; do
if [[ -f "$file" ]]; then
echo handling "$file"
echo ---------------with pipe-----------------------
cat "$file" | while read -a line; do
if (( ${#line} > 1 )); then
echo ${line[*]}
fi
done
echo ----------------without pipe----------------------
cat "$file"
echo
echo num of lines: `cat "$file" | wc -l`
echo --------------------------------------
fi
done
for dir in ./*; do
if [[ -d "$dir" ]]; then
echo cd to $dir
cd "$dir"
recursiveFindReq "$1"
cd ..
fi
done
}
the output is:
losing lines even when they meet requirements
I marked with 2 red arrows the place I'm losing info
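A possible explanation, offered without having seen the screenshot: after read -a line, ${#line} is the length of the first word (${line[0]}), not the number of words, so any line whose first word is a single character fails the test. Separately, read returns non-zero for a last line that has no trailing newline, so the loop body never runs for it. The sketch below assumes the check is meant to keep lines with more than one word; the function name print_multiword_lines is invented for this example.

```shell
#!/bin/bash
# print_multiword_lines FILE: echo every line of FILE that has more than one word.
# (hypothetical helper; assumes the test is meant to count words)
print_multiword_lines() {
    local -a line
    # the "|| (( ... ))" part still processes a final line without a trailing newline
    while read -r -a line || (( ${#line[@]} )); do
        if (( ${#line[@]} > 1 )); then     # number of words, not length of the first word
            echo "${line[*]}"
        fi
    done < "$1"
}
```

Redirecting the file into the loop (done < "$1") instead of piping from cat also keeps the loop in the current shell.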

Copy last modified binary file over the other one

I would like to compare two binary files (very small, 100 KB each) and replace the older one with the most recently modified one.
I have created a simple script, but I need your help to make it run properly:
#!/bin/sh
# select the two files
FILE1="/dir1/file1.binary"
FILE2="/dir2/file2.binary"
# create the hash of the two files
HASH1="$(md5sum $FILE1 | cut -c 1-32)"
HASH2="$(md5sum $FILE2 | cut -c 1-32)"
# compare the two hashes
if [ "$HASH1" == "$HASH2" ];
# if the two hashes are the same, exit
then
echo "the two files are identical"
exit 0
# otherwise compare which of them has been last modified
fi
DATE1="(stat -c %Y $FILE1)"
DATE2="(stat -c %Y $FILE2)"
# if FILE1 is newer than FILE2, replace FILE2 with FILE1
if [ "${DATE1}" -gt "${DATE2}" ];
then
cp $FILE1 $FILE2
echo "${FILE2} was replaced by ${FILE1}"
# if FILE2 is newer than FILE1, replace FILE1 with FILE2
fi
cp $FILE2 $FILE1
echo "${FILE1} was replaced by ${FILE2}"
exit 0
The script seems to work (at least when the two files are identical), but if one file has been modified I get the following error:
line 24: [: {(stat -c %Y test1)}: integer expression expected
What is wrong?
By the way, is there a better way to solve this problem?
Thanks
Thank you so much, everybody, for your help. Here is what the script looks like now. It also sends a notification to QTS on QNAP, which can be removed if you run it elsewhere or don't need it.
#!/bin/sh
# select the two files
FILE1="/dir1/file1"
FILE2="/dir2/file2"
# use or create a log file with timestamp of the output
LOG="/dir1/ScriptLog.txt"
TIMESTAMP=$(date +"%Y-%m-%d %Hh:%M")
if [ ! -e "$LOG" ]; then
    touch "$LOG"
    echo "$TIMESTAMP - INFO: '$LOG' did not exist but has been created." >&2
# else
#     echo "$TIMESTAMP - INFO: '$LOG' exists and it will be used if any change to '$FILE1'
#     or to '$FILE2' is needed." >&2
fi
# You can also pass the two file names as arguments for the script
# ([ ... -eq ... ] rather than [[ ... == ... ]], which /bin/sh may not support)
if [ $# -eq 2 ]; then
    FILE1=$1
    FILE2=$2
fi
# check if the two files exist and are regular
if [ -f "$FILE1" -a -f "$FILE2" ]; then
    # meanwhile compare FILE1 against FILE2
    # if files are identical, stop there
    if cmp "$FILE1" "$FILE2" >/dev/null 2>&1; then
        # tee -a appends the message to the log and passes it on, here to stderr
        echo "$TIMESTAMP - INFO: '$FILE1' and '$FILE2' are identical." | tee -a "$LOG" >&2
    # if FILE1 is newer than FILE2, copy FILE1 over FILE2
    elif [ "$FILE1" -nt "$FILE2" ]; then
        if cp -p "$FILE1" "$FILE2"; then
            echo "$TIMESTAMP - INFO: '$FILE1' replaced '$FILE2'." | tee -a "$LOG" >&2
            # if copy is successful, notify it into QTS
            /sbin/notice_log_tool -a "$TIMESTAMP - INFO: '$FILE1' replaced '$FILE2'." --severity=5 >&2
        else
            echo "$TIMESTAMP - ERROR: FAILED to replace '$FILE2' with '$FILE1'." | tee -a "$LOG" >&2
            exit 1
        fi
    # if FILE1 is older than FILE2, copy FILE2 over FILE1
    elif [ "$FILE1" -ot "$FILE2" ]; then
        if cp -p "$FILE2" "$FILE1"; then
            echo "$TIMESTAMP - INFO: '$FILE2' replaced '$FILE1'." | tee -a "$LOG" >&2
            # if copy is successful, notify it into QTS
            /sbin/notice_log_tool -a "$TIMESTAMP - INFO: '$FILE2' replaced '$FILE1'." --severity=5 >&2
        else
            echo "$TIMESTAMP - ERROR: FAILED to replace '$FILE1' with '$FILE2'." | tee -a "$LOG" >&2
            exit 1
        fi
    # if two files are not identical but with same modification date
    else
        echo "$TIMESTAMP - ERROR: We should never reach this point. Something is wrong in the script." | tee -a "$LOG" >&2
        exit 1
    fi
# if one file does not exist or is not valid, exit
else
    echo "$TIMESTAMP - ERROR: One of the files does not exist, has been moved or renamed." | tee -a "$LOG" >&2
    # if error, notify it into QTS
    /sbin/notice_log_tool -a "$TIMESTAMP - ERROR: One of the files does not exist, has been moved or renamed." --severity=5 >&2
    exit 1
fi
I'm also going to suggest refactoring this, both to simplify the code, and to save your CPU cycles.
#!/bin/sh
# If both files exist....
if [ -f "$1" -a -f "$2" ]; then
    # If they have the same content...
    if cmp "$1" "$2" >/dev/null 2>/dev/null; then
        echo "INFO: These two files are identical." >&2
    # If one is newer than the other...
    elif [ "$1" -nt "$2" ]; then
        if cp -p "$1" "$2"; then
            echo "INFO: Replaced file '$2' with '$1'." >&2
        else
            echo "ERROR: FAILED to replace file." >&2
            exit 1
        fi
    # If the other is newer than the one...
    elif [ "$1" -ot "$2" ]; then
        if cp -p "$2" "$1"; then
            echo "INFO: Replaced file '$1' with '$2'." >&2
        else
            echo "ERROR: FAILED to replace file." >&2
            exit 1
        fi
    else
        echo "ERROR: we should never reach this point. Something is wrong." >&2
        exit 1
    fi
else
    echo "ERROR: One of these files does not exist." >&2
    exit 1
fi
A few things that you may find useful.
This avoids calculating an md5 on each of the files. While comparing sums may be fine for small files like yours, it gets mighty expensive as your files grow. And it's completely unnecessary, because you have the cmp command available. Better to get in the habit of writing code that will work with less modification when you recycle it for the next project.
An if statement runs a command, usually [ or [[, but it can be any command. Here, we're running cmp and cp within an if, so that we can easily check the results.
This doesn't use stat anymore. While it's possible that you may never look beyond Linux, it's always a good idea to keep portability in mind, and if you can make your script portable, that's great.
This is not a bash script. Neither was your script -- if you call your script with /bin/sh, then you're in POSIX compatibility mode, which already makes this more portable than you thought. :-)
Indenting helps. You might want to adopt it for your own scripts, so that you can have a better visual idea of what commands are associated with the various conditions that are being tested.
What about something a bit simpler like the following?
#!/bin/sh
# select the two files from cli
# $1 = current file
# $2 = new file
FILE1=$1
FILE2=$2
# otherwise compare which of them has been last modified
DATE1=`(stat -c %Y $FILE1)`
DATE2=`(stat -c %Y $FILE2)`
if [ $DATE2 -gt $DATE1 ]; then
echo "cp -f $FILE2 $FILE1"
# cp -f $FILE2 $FILE1
fi
Almost there. After cleaning up your code and tweaking it a bit, here is what I got:
#!/bin/bash
# select the two files (default option)
FILE1="/dir1/file1.binary"
FILE2="/dir1/file2.binary"
# You can also pass the two file names as arguments for the script
if [ $# -eq 2 ]; then
FILE1=$1
FILE2=$2
fi
# create the hash of the two files
HASH1="$(md5sum $FILE1 | sed -n -e 's/^.*= //p')"
HASH2="$(md5sum $FILE2 | sed -n -e 's/^.*= //p')"
# get the dates of last modification
DATE1="$(stat -f '%m%t%Sm' $FILE1 | cut -c 1-10)"
DATE2="$(stat -f '%m%t%Sm' $FILE2 | cut -c 1-10)"
# Uncomment to see the values
#echo $FILE1 ' = hash: ' $HASH1 ' date: ' $DATE1
#echo $FILE2 ' = hash: ' $HASH2 ' date: ' $DATE2
# compare the two hashes
if [ $HASH1 == $HASH2 ]; then
# if the two hashes are the same, exit
echo "the two files are identical"
exit 0
fi
# compare the dates
if [ $DATE1 -gt $DATE2 ]; then
# if FILE1 is newer than FILE2, replace FILE2 with FILE1
cp $FILE1 $FILE2
echo "${FILE2} was replaced by ${FILE1}"
elif [ $DATE1 -lt $DATE2 ]; then
# else if FILE2 is newer than FILE1, replace FILE1 with FILE2
cp $FILE2 $FILE1
echo "${FILE1} was replaced by ${FILE2}"
else
# else the files are identical
echo "the two files are identical"
fi
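Your way of getting the date was wrong, at least on my machine, so I rewrote it. Note that stat -f is the BSD/macOS syntax; with GNU stat the equivalent is stat -c %Y, as in your script.
Your hash extraction did not work for me either. The sed expression above assumes BSD-style output of the form "MD5 (file) = hash" and keeps only what follows "= ". With GNU md5sum, which prints the hash first, cutting the first 32 characters as in your script is fine.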
You also misused the conditional statements as HuStmpHrrr pointed out.
The rest is cosmetics.
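For reference, the "integer expression expected" error in the question comes from DATE1="(stat -c %Y $FILE1)": without a leading $ the parentheses are just part of a string, not a command substitution, so [ ... -gt ... ] receives text instead of a number. A minimal sketch of the corrected comparison, assuming GNU stat (the -c %Y form used in the question); the helper names mtime and newer_file are made up for illustration.

```shell
#!/bin/sh
# mtime FILE: print FILE's modification time in seconds since the epoch.
# Assumes GNU stat; on BSD/macOS the equivalent is: stat -f %m FILE
mtime() {
    stat -c %Y -- "$1"
}

# newer_file A B: print whichever of A and B was modified more recently
# (prints nothing when both timestamps are equal).
newer_file() {
    date1=$(mtime "$1")    # $(...) runs the command and captures its output
    date2=$(mtime "$2")
    if [ "$date1" -gt "$date2" ]; then
        echo "$1"
    elif [ "$date1" -lt "$date2" ]; then
        echo "$2"
    fi
}
```

The -nt and -ot tests shown in the refactored script avoid stat altogether and are the more portable choice.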

How do I manipulate filenames in bash?

I have a bunch of images that I need to rename so I can use them, and I was wondering how to do this.
The first 5 characters of each name should be kept, and the 6th should be a number from 1 to 3. All I know is that the first 5 characters are the same for pictures belonging to the same "family", so they can be used for comparison; the 6th character is not known.
Example:
12345random.jpg
12345randomer.jpg
0987654more_random.jpg
09876awesome.jpg
09876awesomer.jpg
09876awesomest.jpg
09876soawesomegalaxiesexplode.jpg
would become.
12345.jpg
123452.jpg
09876.jpg
098761.jpg
098762.jpg
Ideally the loop would rename only the first 3 pictures of each family and skip the rest.
I found some material on removing letters up to a certain point, but nothing I could use, since I am quite poor at bash scripting.
Here is my approach. It is not great, since I pieced it together from scripts I found, but the idea is there:
//I could not figure how to remove the chars after 5th not the other way around
for file in .....*; do echo mv $file `echo $file | cut -c6-`; done
done
//problem also is that once the names conflict it produces only 1 file named 12345.jpg 2nd one will not be created
//do not know how to read file names to array
name=somefile
if [[ -e $name.jpg]] ; then
i=0
while [[ -e $name-$i.jpg]] ; do
let i++
done
name=$name-$i
fi
touch $name.jpg
You can have:
new_file=${file%%[^0-9]*.jpg}.jpg
As a concept you can have this to rename files:
for file in *.jpg; do
[[ $file == [0-9]*[^0-9]*.jpg ]] || continue ## Just a simple check.
new_file=${file%%[^0-9]*.jpg}.jpg
[[ -e $new_file ]] && continue ## Do not overwrite: skip if the target already exists. Delete line if not wanted.
echo "Renaming $file to $new_file." ## Optional message.
mv -- "$file" "$new_file" || echo "Failed to rename $file to $new_file."
done
If you're going to process files that also contain directory names, you'll need some more changes:
for file in /path/to/other/dirs/*.jpg *.jpg; do
base=${file##*/}
[[ $base == [0-9]*[^0-9]*.jpg ]] || continue
if [[ $file == */* ]]; then
new_file=${file%/*}/${base%%[^0-9]*.jpg}.jpg
else
new_file=${file%%[^0-9]*.jpg}.jpg
fi
[[ -e $new_file ]] && continue ## Skip if the target already exists.
echo "Renaming $file to $new_file."
mv -- "$file" "$new_file"
done
You can also try the following code.
Be careful: all the files must be in .jpg format, and you must pass the folder name (with a trailing slash) as the argument.
#!/bin/bash
a=`ls $1`
for b in $a
do
echo $b
if (( i<4 ))
then
c=`echo $b | cut -c1-5`
let i=i+1
c="$c$i.jpg"
echo $c
else
c=`echo $b | cut -c1-5`
c="$c.jpg"
break
fi
mv $1$b $1$c
done
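None of the snippets above quite produces the naming from the question's example, so here is a rough sketch of one way to do it. It rests on several assumptions: bash 4 or later (associative arrays), every name has at least 5 characters before ".jpg", the first file of a family becomes PREFIX.jpg and the next two PREFIX1.jpg and PREFIX2.jpg, and any further files in that family are left alone. The function name rename_by_prefix is made up for illustration.

```shell
#!/bin/bash
# rename_by_prefix DIR: rename DIR/*.jpg to their first 5 characters, adding a
# counter on collisions (PREFIX.jpg, PREFIX1.jpg, PREFIX2.jpg). At most 3 files
# per prefix are renamed. (hypothetical helper; requires bash 4+)
rename_by_prefix() {
    local dir=$1 file base prefix n target
    local -A seen=()                      # prefix -> number of slots used so far
    for file in "$dir"/*.jpg; do
        [[ -e $file ]] || continue        # the glob matched nothing
        base=${file##*/}                  # strip the directory part
        prefix=${base:0:5}                # first 5 characters of the name
        n=${seen[$prefix]:-0}
        (( n < 3 )) || continue           # keep at most 3 per family
        if (( n == 0 )); then
            target=$dir/$prefix.jpg
        else
            target=$dir/$prefix$n.jpg
        fi
        seen[$prefix]=$(( n + 1 ))        # the slot is used up even if we skip below
        [[ $file == "$target" ]] && continue
        [[ -e $target ]] && continue      # never overwrite an existing file
        mv -- "$file" "$target"
    done
}
```

Try it on a copy of the folder first, for example rename_by_prefix ./pics. Which file of a family gets which number depends on the glob's sort order.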
