bash - find duplicate file in directory and rename

I have a directory that has thousands of files in it with various extensions. I also have a drop location where users drop files to be migrated to this directory. I'm looking for a script that will scan the target directory for a duplicate file name, if found, rename the file in the drop folder, then move it to the target directory.
Example:
/target/file.doc
/drop/file.doc
Script will rename file.doc to file1.doc then move it to /target/.
It needs to maintain the file extension too.

for fil in /drop/*
do
    fn=${fil##*/}                                    # filename without the /drop/ path prefix
    if test -f "/target/$fn"
    then
        suff=$(awk -F. '{ print "."$NF }' <<<"$fn")  # extension, including the dot
        bdot=$(basename -s "$suff" "$fn")            # filename without the extension
        mv "/drop/$fn" "/drop/${bdot}1$suff"         # rename in the drop folder first
        cp "/drop/${bdot}1$suff" "/target/${bdot}1$suff"
    fi
done
Take each file in the drop directory and check whether it already exists in /target using test -f. If it does, rename it with mv and then copy it across.

You have to take a bit more care than simply checking whether a file exists before moving, in order to provide a flexible solution that can handle files with or without extensions. You may also want to form duplicate filenames in a way that preserves sort order: e.g. if file.txt already exists, you may want to use file_001.txt as the duplicate in target rather than file1.txt, since once you reach 10 you will no longer have a canonical sort by filename.
Also, you never want to iterate with for i in $(ls dir); that is fraught with pitfalls. See Bash Pitfalls No. 1.
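A safer pattern is to iterate the glob directly and let the shell expand the names. A minimal sketch (the /drop path is illustrative):
for f in /drop/*; do
    [ -f "$f" ] || continue    # skip anything that is not a regular file
    printf '%s\n' "$f"
done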
Putting those pieces together, and including detail in the comments below, you could do something similar to the following and have a reasonably flexible solution allowing you to specify either just the filename.ext to move or /path/to/drop/filename.ext. You must set the drop and target directories in the script to match your circumstances, e.g.
#!/bin/bash
tgt=target ## set target and drop directories as required
drp=drop
declare -i cnt=1 ## counter for filename_$cnt
test -z "$1" && { ## validate one argument given
printf "error: insufficient input\nusage: %s filename\n" "${0##*/}"
exit 1
}
test -w "$1" || test -w "$drp/$1" || { ## validate valid filename is writeable
printf "error: file not found or lack permission to move '%s'.\n" "$1"
exit 1
}
fn="${1##*/}" ## strip any path info from filename
if test "$1" != "${1%.*}" ; then
ext="${fn##*.}" ## get file extension
fnwoe="${fn%."$ext"}" ## get filename without extension
test "$fnwoe" = '' && ext= ## was a dotfile, reset ext
fi
vfn="$fn" ## set valid filename = filename
## form valid filename e.g. "${fnwoe}_001.$ext" if duplicate found
while test -e "$tgt/$vfn"; do
if test -n "$ext" ## did we have an extension?
then
printf -v vfn "%s_%03d.%s" "$fnwoe" "$((cnt++))" "$ext"
else
printf -v vfn "%s_%03d" "$fn" "$((cnt++))"
fi
done
mv "$drp/$fn" "$tgt/$vfn" ## move file under non-conflicting name
Example drop and target
$ ls -1 drop
file
file.txt
$ ls -1 target
file.txt
file_001.txt
file_002.txt
Example Use
$ bash mvdrop.sh file
$ bash mvdrop.sh drop/file.txt
Resulting drop and target
$ ls -1 drop
$ ls -1 target
file
file.txt
file_001.txt
file_002.txt
file_003.txt

This will test to see if it exists, preserve the extension (along with any structure before the extension such as in the case of FILE.tar.gz), and move it to the target directory.
#!/bin/bash
TARGET="\target\"
DROP="\drop\"
for F in `ls $DROP`; do
if [[ -f $TARGET$F ]]; then
EXT=`echo $F | awk -F "." '{print $NF}'`
PRE=`echo $F | awk -F "." '{$NF="";print $0}' | sed -e 's/ $//g;s/ /./g'`
mv $DROP$F $DROP$PRE"1".$EXT
F=$PRE"1".$EXT
fi
mv $DROP$F $TARGET
done
Additionally, you may want to do some restricting in the ls command, so that you aren't copying entire directories.
Display only regular files (no directories or symbolic links)
ls -p "$DROP" | grep -v /
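Alternatively, find can restrict the match to regular files directly. A sketch using GNU find (-printf '%f\n' prints just the filename):
find "$DROP" -maxdepth 1 -type f -printf '%f\n'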

Related

Bash script backup, check if directory contains the files from another directory

I am making a bash backup script and I want to implement a functionality that checks whether the files from one directory are already contained in another directory; if they are not, I want to output the names of those files.
#!/bin/bash
TARGET_DIR=$1
INITIAL_DIR=$2
TARG_ls=$(ls -A $1)
INIT_ls=$(ls -A $2)
if [[ "$(ls -A $2)" ]]; then
if [[ ! -n "$(${TARG_ls} | grep ${INIT_ls})" ]]; then
echo All files in ${INITIAL_DIR} have backups for today in ${TARGET_DIR}
exit 0
else
#code for listing the missing files
fi
else
echo Error!! ${INITIAL_DIR} has no files
exit 1
fi
I have thought about storing the ls output of both directories inside strings and comparing them, as it is shown in the code, but in the event where I have to list the files from INITIAL_DIR that are missing in TARGET_DIR, I just don't know how to proceed.
I tried using the diff command comparing the two directories but that takes into account the preexisting files of TARGET_DIR.
In the above code, if [[ "$(ls -A $2)" ]]; checks whether INITIAL_DIR contains any files, and if [[ ! -n "$(${TARG_ls} | grep ${INIT_ls})" ]]; is meant to check whether the target directory contains all of the initial directory's files.
Anyone have a suggestion, hint?
You can use the comm command:
$ comm <(ls -A a) <(ls -A b)
will give you the files only in a, the files in both a and b, and the files only in b, in three columns (comm expects its inputs sorted, which ls already provides). To get the list of files only in a, for example:
$ comm -23 <(ls -A a) <(ls -A b)
rsync has a --dry-run switch that will show you which files have changed between two directories. Before doing rsync copies of my home directory I preview the changes this way, to see if there could be evidence of mass malicious encryption or corruption before proceeding.
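For example, to preview what would change without copying anything (the paths are illustrative):
# -a archive mode, -v verbose, -n (--dry-run): report what would be transferred, do nothing
rsync -avn /home/user/ /backup/user/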

Change date modified of multiple folders to match that of their most recently modified file

I've been using the following bash script as an app which I can drop a folder on, and it will update the date modified of the folder to match the most recently modified file in that folder.
for f in each "$@"
do
echo "$f"
done
$HOME/setMod "$@"
This gets the folder name, and then passes it to this setMod script in my home folder.
#!/bin/bash
# Check that exactly one parameter has been specified - the directory
if [ $# -eq 1 ]; then
# Go to that directory or give up and die
cd "$1" || exit 1
# Get name of newest file
newest=$(stat -f "%m:%N" * | sort -rn | head -1 | cut -f2 -d:)
# Set modification date of folder to match
touch -r "$newest" .
fi
However, if I drop more than one folder on it at a time, it won't work, and I can't figure out how to make it work with multiple folders at once.
Also, I learned from Apple Support that the reason so many of my folders keep getting the mod date updated is due to some Time Machine-related process, despite the fact I haven't touched some of them in years. If anyone knows of a way to prevent this from happening, or to somehow automatically periodically update the date modified of folders to match the date/time of the most-recently-modified file in them, that would save me from having to run this step manually pretty regularly.
The setMod script currently accepts only one parameter.
You could either make it accept many parameters and loop over them,
or you could make the calling script use a loop.
I take the second option, because the caller script has some mistakes and weak points. Here it is, corrected and extended for your purpose:
for dir; do
echo "$dir"
"$HOME"/setMod "$dir"
done
Or to make setMod accept multiple parameters:
#!/bin/bash
setMod() {
cd "$1" || return 1
# Get name of newest file
newest=$(stat -f "%m:%N" * | sort -rn | head -1 | cut -f2 -d:)
# Set modification date of folder to match
touch -r "$newest" .
}
for dir; do
if [ ! -d "$dir" ]; then
echo "not a directory, skipping: $dir"
continue
fi
(setMod "$dir")
done
Notes:
for dir; do is equivalent to for dir in "$@"; do
The parentheses around (setMod "$dir") make it run in a sub-shell, so the script itself doesn't change its working directory; the effect of the cd operation is limited to the sub-shell within (...)
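A quick illustration of that isolation (the paths shown are just examples):
pwd                  # e.g. /home/user
(cd /tmp && pwd)     # prints /tmp, but only inside the sub-shell
pwd                  # still /home/user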

changing the name of many files by increasing the number

I want to change the file names from a terminal. I have many files, so I cannot change all of them one by one.
a20170606_1257.txt -> a20170606_1300.txt
a20170606_1258.txt -> a20170606_1301.txt
I am only able to change it by:
rename 57.txt 00.txt *57.txt
but it is not enough.
Just playing with parameter expansion: the ${str%%pattern} and ${str##pattern} forms strip the longest match of pattern from the end and the beginning of the string, respectively.
offset=43
for file in *.txt; do
    [ -f "$file" ] || continue
    woe="${file%%.*}"; ext="${file##*.}"   # name without extension, and the extension
    num="${woe##*_}"                       # the digits after the underscore
    echo "$file" "${woe%%_*}_$((num+offset)).${ext}"
done
Once you have it working, remove the echo line and do the rename with mv -v instead, as shown just below. Change the offset variable as you wish, depending on where you want your renamed files to start from.
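That is, the last line of the loop becomes something like this (a sketch of the substitution just described):
mv -v -- "$file" "${woe%%_*}_$((num+offset)).${ext}"    # -v reports each rename; -- protects names starting with -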
Perl's e flag to the rescue:
rename -n -v 's/(?<=_)(\d+)/$1+43/e' *.txt
test
dir $ ls | cat -n
1 a20170606_1257.txt
2 a20170606_1258.txt
dir $
dir $
dir $ rename -n -v 's/(?<=_)(\d+)/$1+43/e' *.txt
rename(a20170606_1257.txt, a20170606_1300.txt)
rename(a20170606_1258.txt, a20170606_1301.txt)
dir $
dir $ rename -v 's/(?<=_)(\d+)/$1+43/e' *.txt
a20170606_1257.txt renamed as a20170606_1300.txt
a20170606_1258.txt renamed as a20170606_1301.txt
dir $
dir $ ls | cat -n
1 a20170606_1300.txt
2 a20170606_1301.txt
dir $
rename --help:
Usage:
rename [ -h|-m|-V ] [ -v ] [ -n ] [ -f ] [ -e|-E perlexpr]*|perlexpr
[ files ]
Options:
-v, -verbose
Verbose: print names of files successfully renamed.
-n, -nono
No action: print names of files to be renamed, but don't rename.
-f, -force
Over write: allow existing files to be over-written.
-h, -help
Help: print SYNOPSIS and OPTIONS.
-m, -man
Manual: print manual page.
-V, -version
Version: show version number.
-e Expression: code to act on files name.
May be repeated to build up code (like "perl -e"). If no -e, the
first argument is used as code.
-E Statement: code to act on files name, as -e but terminated by ';'.

How to identify files which are not in list using bash?

Unfortunately my knowledge of bash is not so good, and I have a very non-standard task.
I have a file containing a list of files. Example:
/tmp/my/file1.txt
/tmp/my/file2.txt
How can I write a script which checks that the files from the list exist in the folder /tmp/my, and prints two types of messages after the script is done?
1 - The files exist; show the files:
/tmp/my/file1.txt
/tmp/my/file2.txt
2 - The folder /tmp/my includes files and folders which are not in the list. The files and folders:
/tmp/my/test
/tmp/my/1.txt
You speak of files and folders, which seems unclear.
Anyways, I wanted to try it with arrays, so here we go :
unset valid_paths; declare -a valid_paths
unset invalid_paths; declare -a invalid_paths
while read -r line
do
if [ -e "$line" ]
then
valid_paths=("${valid_paths[@]}" "$line")
else
invalid_paths=("${invalid_paths[@]}" "$line")
fi
done < files.txt
echo "VALID PATHS:"; echo "${valid_paths[#]}"
echo "INVALID PATHS:"; echo "${invalid_paths[#]}"
You can check for the files' existence (assuming a list of files, one filename per line) and print the existing ones with a prefix using this
# Part 1 - check list contents for files
while IFS= read -r thefile; do
if [[ -n "$thefile" ]] && [[ -f "/tmp/my/$thefile" ]]; then
echo "Y: $thefile"
else
echo "N: $thefile"
fi
done < filelist.txt | sort
# Part 2 - check existing files against list
for filepath in /tmp/my/* ; do
filename="$(basename "$filepath")"
grep "$filename" filelist.txt -q || echo "U: $filename"
done
The files that exist are prefixed here with Y:, all others are prefixed with N:
In the second section, files in the tmp directory that are not in the file list are labelled with U: (unaccounted for/unexpected)
You can swap the -f test, which checks that a path exists and is a regular file, for -d (exists and is a directory) or -e (exists).
See
man test
for more options.
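If your file list is sorted (or you sort it on the fly), the comm approach shown earlier also answers the second part directly. A minimal sketch, assuming filelist.txt holds bare filenames:
# lines unique to the second input, i.e. files present in /tmp/my but not in the list
comm -13 <(sort filelist.txt) <(ls -A /tmp/my | sort)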

BASH filepath and file commands

I am trying to recreate the folder structure from a source in a target location, and perform a command on each file found in the process, using bash. Based on some feedback and some searches I am trying to get this solution to work properly. Right now it is breaking because the Windows folders have directories with spaces in their names, which it refuses to find.
I was able to get this to work after installing some additional features for my Cygwin.
source='/cygdrive/z/austin1/QA/Platform QA/8.0.0/Test Cases'
target='/cygdrive/c/FullBashScripts'
# let ** be recursive
shopt -s globstar
for file in "$source"/**/*.restomatic; do
cd "${file%/test.restomatic}"
locationNew="$target${file#$source}"
mkdir -p "$(dirname "$target${file#$source}")"
sed -e 's/\\/\//g' test.restomatic | awk '{if ($1 ~ /^(LOAD|IMPORT)/) system("cat " $2); else print;}' | sed -e 's/\\/\//g' | awk '{if ($1 ~ /^(LOAD|IMPORT)/) system("cat " $2); else print;}' > "$locationNew"
done
If your bash version is 4 or above, this should work:
source="testing/web testing/"
target="c:/convertedFiles/"
# let ** be recursive
shopt -s globstar
for file in "$source"/**/*.test; do
newfile= "$target/${file#$source}"
mkdir -p "$(dirname "$newfile")"
conversion.command "$file" > "$newfile"
done
${file#$source} lops $source off the beginning of $file.
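For instance, with illustrative values:
source="testing/web testing/"
file="testing/web testing/suite1/login.test"
echo "${file#$source}"    # prints: suite1/login.test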
If you can guarantee that no files have newlines in their names, you can use find to get the names:
source="testing/web testing/"
target="c:/convertedFiles/"
find "$source" -name \*.test | while read file; do
newfile= "$target/${file#$source}"
mkdir -p "$(dirname "$newfile")"
conversion.command "$file" > "$newfile"
done
Your best bet would be to use find to get the list of files. You can do it as follows:
IFS=$'\n'                          # set field separator to newlines only
cd testing # change to the source directory
find . -type d > /tmp/test.dirs # make a list of local directories
for i in `cat /tmp/test.dirs`; do # for each directory
mkdir -p "c:/convertedFiles/$i" # create it in the new location
done
find . -iname '*.test' > /tmp/test.files # record local file paths as needed
for i in `cat /tmp/test.files`; do # for each test file
process "$i" > "c:/convertedFiles/$i" # process it and store in new dir
done
Note that this is not the most optimal way, but it is the easiest to understand and follow. It should work with spaces in filenames. You may have to tweak it further to get it to work under Windows.
I would look into a tool called sshfs, or Secure Shell File System. It lets you mount a portion of a remote file system somewhere local to you.
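Mounting typically looks like this (the host and paths are hypothetical):
mkdir -p ~/remote-testing
sshfs user@example.com:/srv/testing ~/remote-testing
# ... work on the files locally, then unmount (on Linux):
fusermount -u ~/remote-testing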
Once you have the remote fs mounted locally, you can run the following shell script:
for f in *.*;
do
echo "do something to $f file..";
done
EDIT: I initially did not realize that target was always local anyway.
