Creating a shell script with diff function to compare multiple files - bash

I have five different files and all are in different directory, I want to check matching files and find out the unique files as well.
I am not sure how should I handle this.

You can look to the output of
chksum "path1/file1" "path2/f2" "p3/f3" "p4/f4" "p5/f5" | sort
You can also make a script looping through the files with
files=("path1/file1" "path2/f2" "p3/f3" "p4/f4" "p5/f5")
for i in {0..4}; do
((j=$i+1))
while [ $j -le 4 ]; do
diff "${files[i]}" "${files[j]}" >/dev/null
if [ $? -eq 0 ]; then
echo "${files[i]} and ${files[j]} are the same."
else
echo "${files[i]} and ${files[j]} are different."
fi
((j++))
done
done

You can use cksum ou md5sum to detect identical files :
find . -type f | while read f; do md5sum "$f"; done > tmp.txt
cat tmp.txt | cut -d" " -f1 | while read c
do n=`grep $c tmp.txt | wc -l`
if [ "$n" != "1" ]; then
grep $c tmp.txt
fi
done | sort -u

Related

Looping through each file in directory - bash

I'm trying to perform certain operation on each file in a directory but there is a problem with order it's going through. It should do one file at the time. The long line (unzipping, grepping, zipping) works fine on a single file without a script, so there is a problem with a loop. Any ideas?
Script should grep through through each zipped file and look for word1 or word2. If at least one of them exist then:
unzip file
grep word1 and word2 and save it to file_done
remove unzipped file
zip file_done to /donefiles/ with original name
remove file_done from original directory
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
for file in *.gz; do
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
done
else
echo "nothing to do here"
fi
done
The code snipped you've provided has a few problems, e.g. unneeded nested for cycle and erroneous pipeline
(the whole line gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip...).
Note also your code will work correctly only if *.gz files don't have spaces (or special characters) in names.
Also zgrep -c 'word1\|word2' will also match strings like line_starts_withword1_orword2_.
Here is the working version of the script:
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c -E 'word1|word2' $file) # now counter is the number of word1/word2 occurences in $file
if [[ $counter -gt 0 ]]; then
name=$(basename $file .gz)
zcat $file | grep -E 'word1|word2' > ${name}_done
gzip -f -c ${name}_done > /donefiles/$file
rm -f ${name}_done
else
echo 'nothing to do here'
fi
done
What we can improve here is:
since we unzipping the file anyway to check for word1|word2 presence, we may do this to temp file and avoid double-unzipping
we don't need to count how many word1 or word2 is inside the file, we may just check for their presence
${name}_done can be a temp file cleaned up automatically
we can use while cycle to handle file names with spaces
#!/bin/bash
tmp=`mktemp /tmp/gzip_demo.XXXXXX` # create temp file for us
trap "rm -f \"$tmp\"" EXIT INT TERM QUIT HUP # clean $tmp upon exit or termination
find . -maxdepth 1 -mindepth 1 -type f -name '*.gz' | while read f; do
# quotes around $f are now required in case of spaces in it
s=$(basename "$f") # short name w/o dir
gunzip -f -c "$f" | grep -P '\b(word1|word2)\b' > "$tmp"
[ -s "$tmp" ] && gzip -f -c "$tmp" > "/donefiles/$s" # create archive if anything is found
done
It looks like you have an inner loop inside the outer one :
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
for file in *.gz; do #<<< HERE
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
done
else
echo "nothing to do here"
fi
done
The inner loop goes through all the files in the directory if one of them contains file1 or file2. You probably want this :
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
else
echo "nothing to do here"
fi
done

Need to remove the extra empty lines from the output of shell script

i'm trying to write a code which will print all files taking more than min_size (lets say 10G) in a directory. the problem is output off the below code is all files irrespective of the min_size. i will be getting other details like mtime , owner as well later in the code but this part itself doesnt work fine, whats wrong here ?
#!/bin/sh
if (( $# <3 )); then
echo "$0 dirname min_size count"
exit 1
else
dirname="$1";
min_size="$2";
count="$3";
#shift 3
fi
tmpfile=$(mktemp /lawdump/pulkit/files.XXXXXX)
exec 3> "$tmpfile"
find "${dirname}" -type f -print0 2>&1 | grep -v "Permission denied" | xargs -0 -I {} echo "{}" > "$tmpfile"
for i in `cat tmpfile`
do
x="`du -ah $i | awk '{print $1}' | grep G | sort -nr -k 1`"
size=$(echo $x | sed 's/[A-Za-z]*//g')
if [ size > $min_size ];then
echo $size
fi
done
Note : i know this can be done through find or du but i need to write a shell script to have an email sent out regularly with all the details.

Shell script: check if any files in one directory are newer than any files in another directory

I want to run a command in a shell script if files in one directory have changed more recently than files in another directory.
I would like something like this
if [ dir1/* <have been modified more recently than> dir2/* ]; then
echo 'We need to do some stuff!'
fi
As described in BashFAQ #3, broken down here into reusable functions:
newestFile() {
local latest file
for file; do
[[ $file && $file -nt $latest ]] || latest=$file
done
}
directoryHasNewerFilesThan() {
[[ "$(newestFile "$1"/*)" -nt "$(newestFile "$2" "$2"/*)" ]]
}
if directoryHasNewerFilesThan dir1 dir2; then
echo "We need to do something!"
else
echo "All is well"
fi
If you want to count the directories themselves as files, you can do that too; just replace "$(newestFile "$1"/*)" with "$(newestFile "$1" "$1"/*)", and likewise for the call to newestFile for $2.
Using /bin/ls
#!/usr/bin/ksh
dir1=$1
dir2=$2
#get modified time of directories
integer dir1latest=$(ls -ltd --time-style=+"%s" ${dir1} | head -n 2 | tail -n 1 | awk '{print $6}')
integer dir2latest=$(ls -ltd --time-style=+"%s" ${dir2} | head -n 2 | tail -n 1 | awk '{print $6}')
#get modified time of the latest file in the directories
integer dir1latestfile=$(ls -lt --time-style=+"%s" ${dir1} | head -n 2 | tail -n 1 | awk '{print $6}')
integer dir2latestfile=$(ls -lt --time-style=+"%s" ${dir2} | head -n 2 | tail -n 1 | awk '{print $6}')
#sort the times numerically and get the highest time
val=$(/bin/echo -e "${dir1latest}\n${dir2latest}\n${dir1latestfile}\n${dir2latestfile}" | sort -n | tail -n 1)
#check to which file the highest time belongs to
case $val in
#(${dir1latest}|${dir1latestfile})) echo $dir1 is latest ;;
#(${dir2latest}|${dir2latestfile})) echo $dir2 is latest ;;
esac
It's simple, get times stamps of both the folders in machine format(epoch time) then do simple comparison. that's all

Bash script loop through subdirectories and write to file

I have no idea I have spent a lot of hours dealing with this problem. I need to write script. Script should loop recursively through subdirectories in current directory. It should check files count in each directory. If file count is greater than 10 it should write all names of these file in file named "BigList" otherwise it should write in file "ShortList". This should look like
---<directory name>
<filename>
<filename>
<filename>
<filename>
....
---<directory name>
<filename>
<filename>
<filename>
<filename>
....
My script only works if subdirecotries don't include subdirectories in turn.
I am confused about this. Because it doesn't work as I expect. It will take less than 5 minutes to write this on any programming language for my.
Please help to solve this problem , because I have no idea how to do this.
Here is my script
#!/bin/bash
parent_dir=""
if [ -d "$1" ]; then
path=$1;
else
path=$(pwd)
fi
parent_dir=$path
loop_folder_recurse() {
local files_list=""
local cnt=0
for i in "$1"/*;do
if [ -d "$i" ];then
echo "dir: $i"
parent_dir=$i
echo before recursion
loop_folder_recurse "$i"
echo after recursion
if [ $cnt -ge 10 ]; then
echo -e "---"$parent_dir >> BigList
echo -e $file_list >> BigList
else
echo -e "---"$parent_dir >> ShortList
echo -e $file_list >> ShortList
fi
elif [ -f "$i" ]; then
echo file $i
if [ $cur_fol != $main_pwd ]; then
file_list+=$i'\n'
cnt=$((cnt + 1))
fi
fi
done
}
echo "Base path: $path"
loop_folder_recurse $path
I believe that this does what you want:
find . -type d -exec env d={} bash -c 'out=Shortlist; [ $(ls "$d" | wc -l) -ge 10 ] && out=Biglist; { echo "--$d"; ls "$d"; echo; } >>"$out"' ';'
If we don't want either to count subdirectories to the cut-off or to list them in the output, then use this version:
find . -type d -exec env d={} bash -c 'out=Shortlist; [ $(ls -p "$d" | grep -v "/$" | wc -l) -ge 10 ] && out=Biglist; { echo "--$d"; ls -p "$d"; echo; } | grep -v "/$" >>"$out"' ';'

Bash: Native way to check if an entry is one line?

I have a find script that automatically opens a file if just one file is found. The way I currently handle it is doing a word count on the number of lines of the search results. Is there an easier way to do this?
if [ "$( cat "$temp" | wc -l | xargs echo )" == "1" ]; then
edit `cat "$temp"`
fi
EDITED - here is the context of the whole script.
term="$1"
temp=".aafind.txt"
find src sql common -iname "*$term*" | grep -v 'src/.*lib' >> "$temp"
if [ ! -s "$temp" ]; then
echo "ΓΈ - including lib..." 1>&2
find src sql common -iname "*$term*" >> "$temp"
fi
if [ "$( cat "$temp" | wc -l | xargs echo )" == "1" ]; then
# just open it in an editor
edit `cat "$temp"`
else
# format output
term_regex=`echo "$term" | sed "s%\*%[^/]*%g" | sed "s%\?%[^/]%g" `
cat "$temp" | sed -E 's%//+%/%' | grep --color -E -i "$term_regex|$"
fi
rm "$temp"
Unless I'm misunderstanding, the variable $temp contains one or more filenames, one per line, and if there is only one filename it should be edited?
[ $(wc -l <<< "$temp") = "1" ] && edit "$temp"
If $temp is a file containing filenames:
[ $(wc -l < "$temp") = "1" ] && edit "$(cat "$temp")"
Several of the results here will read through an entire file, whereas one can stop and have an answer after one line and one character:
if { IFS='' read -r result && ! read -n 1 _; } <file; then
echo "Exactly one line: $result"
else
echo "Either no valid content at all, or more than one line"
fi
For safely reading from find, if you have GNU find and bash as your shell, replace <file with < <(find ...) in the above. Even better, in that case, is to use NUL-delimited names, such that filenames with newlines (yes, they're legal) don't trip you up:
if { IFS='' read -r -d '' result && ! read -r -d '' -n 1 _; } \
< <(find ... -print0); then
printf 'Exactly one file: %q\n' "$result"
else
echo "Either no results, or more than one"
fi
Well, given that you are storing these results in the file $temp this is a little easier:
[ "$( wc -l < $temp )" -eq 1 ] && edit "$( cat $temp )"
Instead of 'cat $temp' you can do '< $temp', but it might take away some readability if you are not very familiar with redirection 8)
If you want to test whether the file is empty or not, test -s does that.
if [ -s "$temp" ]; then
edit `cat "$temp"`
fi
(A non-empty file by definition contains at least one line. You should find that wc -l agrees.)
If you genuinely want a line count of exactly one, then yes, it can be simplified substantially;
if [ $( wc -l <"$temp" ) = 1 ]; then
edit `cat "$temp"`
fi
You can use arrays:
x=($(find . -type f))
[ "${#x[*]}" -eq 1 ] && echo "just one || echo "many"
But you might have problems in case of filenames with whitespace, etc.
Still, something like this would be a native way
no this is the way, though you're making it over-complicated:
if [ "`wc -l $temp | cut -d' ' -f1`" = "1" ]; then
edit "$temp";
fi
what's complicating it is:
useless use of cat,
unuseful use of xargs
and I'm not sure if you really want the editcat $temp`` which is editing the file at the content of $temp

Resources