Bash : Find and Remove duplicate files from different folders

Bash : Find and Remove duplicate files from different folders - bash

I have two folders with some common files, I want to delete duplicate files from xyz folder.
folder1:
/abc/file1.csv
/abc/file2.csv
/abc/file3.csv
/abc/file4.csv
folder2:
/xyz/file1.csv
/xyz/file5.csv
I want to compare both folders and remove duplicate from /xyz folder. Output should be: file5.csv
For now I am using :
find "/xyz" "/abc" "/abc" -printf '%P\n' | sort | uniq -u | -exec rm {} \;
But it failing with reason : if -exec is not a typo you can run the following command to lookup the package that contains the binary:
command-not-found -exec
-bash: -exec: command not found

-exec is an option to find, you've already exited the command find when you started the pipes.
Try xargs instead, it take all the data from stdin and appends to the program.
UNTESTED
find "/xyz" "/abc" "/abc" -printf '%P\n' | sort | uniq -u | xargs rm

Find every file in 234 and 123 directory get filename by -printf, sort them, uniq -d give list of duplications, give back path by sed, using 123 directory to delete the duplications from, and pass files to xargs rm
Command:
find ./234 ./123 -type f -printf '%P\n' | sort | uniq -d | sed 's/^/.\/123\//g' | xargs rm
sed don't needed if you are in the ./123 directory and using full path for folders in find.

Another approach: just find the files in abc and attempt to remove them from xyz:
UNTESTED
find /abc -type f -printf 'rm -f /xyz/%P' | sh

Remove Duplicate Files From Particular Directory
FileList=$(ls)
for D1 in $FileList ;do
if [[ -f $D1 ]]; then
for D2 in $FileList ;do
if [[ -f $D2 ]]; then
if [[ $D1 == $D2 ]]; then
: 'Skip Orignal File'
else
if [[ $(md5sum $D1 | cut -d'=' -f 2 | cut -d ' ' -f 1 ) == $(md5sum $D2 | cut -d'=' -f 2 | cut -d ' ' -f 1 ) ]]; then
echo "Duplicate File Found : $D2"
rm -rf $D2
fi #Detect Duplicate Using MD5
fi #Skip Orginal File
fi #D2 File available Then Next
done
fi #D1 File available Then Next
done

Related

execute an if statement on every folder

I have for example 3 files (it could 1 or it could be 30) like this :
name_date1.tgz
name_date2.tgz
name_date3.tgz
When extracted it will look like :
name_date1/data/info/
name_date2/data/info/
name_date3/data/info/
Here how it looks inside each folder:
name_date1/data/info/
you.log
you.log.1.gz
you.log.2.gz
you.log.3.gz
name_date2/data/info/
you.log
name_date3/data/info/
you.log
you.log.1.gz
you.log.2.gz
What I want to do is concatenate all you file from each folder and concatenate one more time all the concatenated one to one single file.
1st step: extract all the folder
for a in *.tgz
do
a_dir=${a%.tgz}
mkdir $a_dir 2>/dev/null
tar -xvzf $a -C $a_dir >/dev/null
done
2nd step: executing an if statement on each folder available and cat everything
myarray=(`find */data/info/ -maxdepth 1 -name "you.log.*.gz"`)
ls -d */ | xargs -I {} bash -c "cd '{}' &&
if [ ${#myarray[#]} -gt 0 ];
then
find data/info -name "you.log.*.gz" -print0 | sort -z -rn -t. -k4 | xargs -0 zcat | cat -
data/info/you.log > youfull1.log
else
cat - data/info/you.log > youfull1.log
fi "
cat */youfull1.log > youfull.log
My issue when I put multiple name_date*.tgzit gives me this error:
gzip: stdin: unexpected end of file
With the error, I still have all my files concatenated, but why error message ?
But when I put only one .tgz file then I don't have any issue regardless the number you file.
any suggestion please ?

Try something simpler. No need for myarray. Pass files one at a time as they are inputted and decide what to do with them one at a time. Try:
find */data/info -type f -maxdepth 1 -name "you.log*" -print0 |
sort -z |
xargs -0 -n1 bash -c '
if [[ "${1##*.}" == "gz" ]]; then
zcat "$1";
else
cat "$1";
fi
' --
If you have to iterate over directories, don't use ls, still use find.
find . -maxdepth 1 -type d -name 'name_date*' -print0 |
sort -z |
while IFS= read -r -d '' dir; do
cat "$dir"/data/info/you.log
find "$dir"/data/info -type f -maxdepth 1 -name 'you.log.*.gz' -print0 |
sort -z -t'.' -n -k3 |
xargs -r -0 zcat
done
or (if you have to) with xargs, which should give you the idea how it's used:
find . -maxdepth 1 -type d -name 'name_date*' -print0 |
sort -z |
xargs -0 -n1 bash -c '
cat "$1"/data/info/you.log
find "$1"/data/info -type f -maxdepth 1 -name "you.log.*.gz" -print0 |
sort -z -t"." -n -k3 |
xargs -r -0 zcat
' --
Use -t option with xargs to see what it's doing.

Find and count compressed files by extension

I have a bash script that counts compressed files by file extension and prints the count.
#!/bin/bash
FIND_COMPRESSED=$(find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -rn | grep -Ei '(deb|tgz|tar|gz|zip)$')
COUNT_LINES=$($FIND_COMPRESSED | wc -l)
if [[ $COUNT_LINES -eq 0 ]]; then
echo "No archived files found!"
else
echo "$FIND_COMPRESSED"
fi
However, the script works only if there are NO files with .deb .tar .gz .tgz .zip.
If there are some, say test.zip and test.tar in the current folder, I get this error:
./arch.sh: line 5: 1: command not found
Yet, if I copy the contents of the FIND_COMPRESSED variable into the COUNT_LINES, all works fine.
#!/bin/bash
FIND_COMPRESSED=$(find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -rn | grep -Ei '(deb|tgz|tar|gz|zip)$')
COUNT_LINES=$(find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -rn | grep -Ei '(deb|tgz|tar|gz|zip)$'| wc -l)
if [[ $COUNT_LINES -eq 0 ]]; then
echo "No archived files found!"
else
echo "$FIND_COMPRESSED"
fi
What am I missing here?

So when you do that variable like that, it tries to execute it like a command, which is why it fails when it has contents. When it's empty, wc simply returns 0 and it marches on.
Thus, you need to change that line to this:
COUNT_LINES=$(echo $FIND_COMPRESSED | wc -l)
But, while we're at it, you can also simplify the other line with something like this:
FIND_COMPRESSED=$(find . -type f -iname "*deb" -or -iname "*tgz" -or -iname "*tar*") #etc

you can do
mapfile FIND_COMPRESSED < <(find . -type f -regextype posix-extended -regex ".*(deb|tgz|tar|gz|zip)$" -exec bash -c '[[ "$(file {})" =~ compressed ]] && echo {}' \;)
COUNT_LINES=${#FIND_COMPRESSED[#]}

Rename files to unique names and move them into a single destination directory

i have 100s of directories with same filename of content.html along with other files.
I am trying to copy all these content.html files under 1 directory, but since they have same name, it overwrites each other
so how can i rename and move all these under 1 directory
Eg:
./0BD3D9D2-F8B1-4472-95C2-13319650A45C:
card.png content.html note.xhtml quickLook.png snippet.txt
./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0:
card.png content.html note.xhtml quickLook.png related snippet.txt
./1A33F29E-3938-4C2F-BA99-6B98FD045742:
card.png content.html note.xhtml quickLook.png snippet.txt
command i tried:
rename content.html to content
find . -type f | grep content.html | while read f; do mv $f ${f/.html/}; done
append number to filename "content" to make it unique
find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
MacBook-Pro$ find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
mv ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content1.html
mv ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content1.html
mv ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content1.html
once above step is successful, i should be able do this to achieve my desired output:
find . -type f | grep content | while read f; do mv $f ../; done
however, i am sure i can do this in 1 step command and also my step 2 is not working (incrementing i)
any idea why step2 is not working??

bash script:
#!/bin/bash
find . -type f -name content.html | while IFS= read -r f; do
name=$(basename $f)
((++i))
mv "$f" "for_content/${name%.*}$i.html"
done
replace for_content with your destination folder name

Suppose in your base directory, you create a folder named final for storing
content.html files, then do something like below
find . -path ./final -prune -o -name "content.html" -print0 |
while read -r -d '' name
do
mv "$name" "./final/content$(mktemp -u XXXX).html"
# mktemp with -u option just creates random characters, or it is just a dry run
done
At the end you'll get all the content.html files under ./final folder in the format contentXXXX.html where XXXX are random characters.
Note:-path ./final -prune -o in find prevents it from descending to our results folder.

The inode of the of the files should be unique and so you could use the following:
find $(pwd) -name "content.html" -printf %f" "%i" "%p"\n" | awk '{ system("mv "$3" <directorytomoveto>"$2$1) }'

I'd use something like this:
find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
You can replace ./output-dir/ with your destination directory
Example:
[root#sl7-o2 test]# ls -R
.:
1 2 3 output-dir
./1:
test
./2:
test
./3:
test
./output-dir:
[root#sl7-o2 test]# find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
[root#sl7-o2 test]# ls ./output-dir/
content_000.txt content_001.txt content_002.txt

You can use shopt -s globstar to grab all content.html files recursively and then use a loop to rename them:
#!/bin/bash
set -o globstar
counter=0
dest_dir=/path/to/destination
for f in **/content.html; do # pick up all content.html files
[[ -f "$f" ]] || continue # skip if not a regular file
mv "$f" "$dest_dir/content_$((++counter).html"
done

List files (recursively) which have no matching pair

I have a set of files in multiple directories. Most of them have a related pair with a different extension and the same base name. The related files are always within the same directory. I need to list only files (and path) without pairs within a directory including all sub directories. How can I do that in bash?
file1.xxx
file1.yyy
file2.xxx
file2.zzz
file3.xxx
file3.aaa
file4.xxx
Any help is much appreciated!

You could use find and pipe to perl to sort the data
find . -type f -print0 |\
perl -0 -l012 -ne 'if(/.*\/(.*)\./){$x{$1}++;$y{$1}=$_}
}{for(keys %x){print $y{$_} if $x{$_}==1}'
This adds the name with no suffix to a hash and incremements for each match, whilst adding the full line to another hash with the same key.
In the end it just checks which have a single match and prints.
As the filenames are null delimited it should work with all filenames.

You can list all the files under your directory and then count how many matches you can find of their whole name in the same tree directory which has the same path name (excluding extension).
If your file matches with less or one names, that means it has not "companion" files:
for f in $(find -type f); do
c=$(find -wholename "$(echo $f | rev | cut --complement -d . -f 1 | rev).*" | wc -l);
if [ "$c" -le "1" ]; then echo $f; fi;
done
Edit:
It might more readable if the pattern composition is performed in a different line:
for f in $(find -type f); do
compPattern="$(echo $f | rev | cut --complement -d . -f 1 | rev).*"
c=$(find -wholename "$compPattern" | wc -l);
if [ "$c" -le "1" ]; then echo $f; fi;
done
Edit (2)
To avoid parsing the output of the find you can use read:
find -type f | while read f; do
if [ $(find -wholename "$(echo $f | rev | cut --complement -d . -f 1 | rev).*" | wc -l) -le "1" ]; then echo $f; fi;
done
Edit(3)
To handle special chars, spaces etc. you can use the following.
while IFS= read -r -d '' f ; do
c=$(find -wholename "$(echo $f | rev | cut --complement -d . -f 1 | rev).*" | wc -l);
if [ "$c" -le "1" ]; then echo $f; fi;
done < <(find -type f -print0)

iterate over lines in file then find in directory

I am having trouble looping and searching. It seems that the loop is not waiting for the find to finish. What am I doing wrong?
I made a loop the reads a file line by line. I then want to use that "name" to search a directory looking to see if a folder has that name. If it exists copy it to a drive.
#!/bin/bash
DIRFIND="$2"
DIRCOPY="$3"
if [ -d $DIRFIND ]; then
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "$line"
FILE=`find "$DIRFIND" -type d -name "$line"`
if [ -n "$FILE" ]; then
echo "Found $FILE"
cp -a "$FILE" "$DIRCOPY"
else
echo "$line not found."
fi
done < "$1"
else
echo "No such file or directory"
fi

Have you tried xargs...
Proposed Solution
cat filenamelist | xargs -n1 -I {} find . -type d -name {} -print | xargs -n1 -I {} mv {} .
what the above does is pipe a list of filenames into find (one at a time), when found find prints the name and passes to xarg which moves the file...
Expansion
file = yogo
yogo -> | xargs -n1 -I yogo find . -type d -name yogo -print | xargs -n1 -I {} mv ./<path>/yogo .
I hope the above helps, note that xargs has the advantage that you do not run out of command line buffer.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Bash : Find and Remove duplicate files from different folders - bash

-exec is an option to find, you've already exited the command find when you started the pipes. Try xargs instead, it take all the data from stdin and appends to the program. UNTESTED find "/xyz" "/abc" "/abc" -printf '%P\n' | sort | uniq -u | xargs rm

Another approach: just find the files in abc and attempt to remove them from xyz: UNTESTED find /abc -type f -printf 'rm -f /xyz/%P' | sh

Related

execute an if statement on every folder

Find and count compressed files by extension

Rename files to unique names and move them into a single destination directory

List files (recursively) which have no matching pair

iterate over lines in file then find in directory

Categories

Resources