Bash Script to find the most recently modified file - bash

I have two folders, for argument's sake /Volumes/A and /Volumes/B. They are mounted network shares from a Windows server containing a load of .bkf files from Backup Exec.
I am looking for a script that will look in both folders, find the most recently modified .bkf file, and copy it to another location. There are other files in the folders which must be ignored.
Thanks in advance!!
Shaun
Edit:
I knocked this together:
cp `ls -alt /Volumes/E /Volumes/F | grep bkf | head -n 1 | awk '{print $8}'` /Volumes/$xservedisk/Windows/
Can anyone think of any reasons why I shouldn't use it?
Thanks again
Shaun

I prefer this for finding the most recently modified file:
find . -type f -printf '%TY-%Tm-%Td %TT %p\n' | sort
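For the original question, that can be extended into a copy step; a minimal sketch, assuming GNU find, no newlines in the filenames, and /other-location as a stand-in destination:
# Sort timestamped lines, keep the last (newest), strip the date and time fields.
NEWEST=$(find /Volumes/A /Volumes/B -name '*.bkf' -printf '%TY-%Tm-%Td %TT %p\n' | sort | tail -n 1 | cut -d' ' -f3-)
[ -n "$NEWEST" ] && cp "$NEWEST" /other-location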

NEWEST=
# Walk both folders' .bkf files; -nt compares modification times.
for f in /Volumes/A/*.bkf /Volumes/B/*.bkf
do
    if [ -z "$NEWEST" ]
    then
        NEWEST=$f
    elif [ "$f" -nt "$NEWEST" ]
    then
        NEWEST=$f
    fi
done
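One caveat: if either glob matches nothing, bash passes the pattern through literally and the loop ends up testing a nonexistent name. Setting nullglob before the loop avoids that (bash only):
shopt -s nullglob   # unmatched globs expand to nothing instead of themselves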

This goes through some twists just to make sure filenames with odd characters are handled well, which mouviciel's answer doesn't:
NEWEST=$(find /Volumes/A /Volumes/B -name '*.bkf' -printf '%T@ %p\0' | \
    sort -rnz | xargs -0n1 2>/dev/null | head -n1 | cut -d' ' -f2-)
[[ -n "$NEWEST" ]] && cp -v "$NEWEST" /other-location
Actually, since these files are coming from Windows and are thus pretty much guaranteed not to have odd characters in their names (like embedded newlines), this simpler version suffices:
NEWEST=$(find /Volumes/A /Volumes/B -name '*.bkf' -printf '%T@ %p\n' | \
    sort -rn | head -n1 | cut -d' ' -f2-)
[[ -n "$NEWEST" ]] && cp -v "$NEWEST" /other-location

Finding files is done with: find /Volumes/[AB] -name '*.bkf'
Sorting files by modification time is done with: ls -t
If the number of files is not that large, you can simply use:
ls -lrt `find /Volumes/[AB] -name '*.bkf'`
The last displayed file is the most recently modified.
Edit:
A more robust solution (thanks ephemient) is:
find /Volumes/[AB] -type f -name '*.bkf' -print0 | xargs -0 ls -lrt
Note that if the file list exceeds the argument-length limit, xargs will invoke ls more than once and the overall sort order breaks.

# %T@ is the modification time as seconds since the epoch (GNU find);
# sort newest first, keep the top entry, and copy it.
cp `find /Volumes/[AB] -name '*.bkf' -type f -printf "%T@\t%p\n" | sort -nr | head -1 | cut -f2` dst_directory/

Related

Bash command: head

I am trying to find all files matching dummy* in the folder named dummy. Then I need to sort them according to time of creation and get the first 10 files. The command I am trying is:
find -L /home/myname/dummy/dummy* -maxdepth 0 -type f -printf '%T# %p\n' | sort -n | cut -d' ' -f 2- | head -n 10 -exec readlink -f {} \;
But this doesn't seem to work with the following error:
head: invalid option -- 'e'
Try 'head --help' for more information.
How do I make bash not read -exec as part of the head command?
UPDATE1:
Tried the following:
find -L /home/myname/dummy/dummy* -maxdepth 0 -type f -exec readlink -f {} \; -printf '%T# %p\n' | sort -n | cut -d' ' -f 2- | head -n 10
But this doesn't sort by timestamp, because both -exec and -printf print the files, and sort sorts all of that output together.
Files in dummy are as follows:
dummy1, dummy2, dummy3 etc. This is the order in which they are created.
How do I make bash not read -exec as part of the head command?
The -exec and subsequent arguments appear intended to be directed to find. The find command stops at the first |, so you would need to move those arguments ahead of that:
find -L /home/myname/dummy/dummy* -maxdepth 0 -type f -printf '%T# %p\n' -exec readlink -f {} \; | sort -n | cut -d' ' -f 2- | head -n 10
However, it doesn't make much sense to both -printf file details and -exec readlink the results. Possibly you wanted to run readlink on each filename that makes it past head. In that case, you might want to look into the xargs command, which serves exactly the purpose of converting data read from the standard input into arguments to a command. For example:
find -L /home/myname/dummy/dummy* -maxdepth 0 -type f -printf '%T# %p\n' |
sort -n |
cut -d' ' -f 2- |
head -n 10 |
xargs -rd '\n' readlink -f
I think you are over-complicating things here. Using just ls and head should get you the results you want:
ls -lt /home/myname/dummy/dummy* | head -10
To sort by ctime (inode change time) specifically, use the -c flag for ls:
ls -ltc /home/myname/dummy/dummy* | head -10

bash shell script not working as intended using cmp with output redirection

I am trying to write a bash script that remove duplicate files from a folder, keeping only one copy.
The script is the following:
#!/bin/sh
for f1 in `find ./ -name "*.txt"`
do
    if test -f $f1
    then
        for f2 in `find ./ -name "*.txt"`
        do
            if [ -f $f2 ] && [ "$f1" != "$f2" ]
            then
                # if cmp $f1 $f2 &> /dev/null # DOES NOT WORK
                if cmp $f1 $f2
                then
                    rm $f2
                    echo "$f2 purged"
                fi
            fi
        done
    fi
done
I want to redirect stdout and stderr to /dev/null to avoid printing them to the screen. But with the commented statement in place, the script does not work as intended and removes all files but the first.
I'll give more informations if needed.
Thanks
A few comments:
First, the:
for f1 in `find ./ -name "*.txt"`
do
if test -f $f1
then
is the same as this (find only plain files with the txt extension):
for f1 in `find ./ -type f -name "*.txt"`
Better syntax (bash only) is
for f1 in $(find ./ -type f -name "*.txt")
and finally, the whole construct is wrong, because if a filename contains a space, the f1 variable will not get the full path name. So instead of the for, do:
find ./ -type f -name "*.txt" -print | while read -r f1
and, as @Sir Athos pointed out, filenames can contain \n, so it's best to use:
find . -type f -name "*.txt" -print0 | while IFS= read -r -d '' f1
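Putting that together, a minimal skeleton of the safe loop (a sketch):
find . -type f -name "*.txt" -print0 | while IFS= read -r -d '' f1
do
    printf 'checking %s\n' "$f1"   # per-file work goes here
done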
Second:
Use "$f1" instead of $f1 - again, because the $f1 can contain space.
Third:
doing N*N comparisons is not very efficient. You should make a checksum (md5, or better sha256) for every txt file. When the checksums are identical, the files are dups.
If you don't trust checksums, simply compare only the files that have identical checksums. Files with different checksums are SURE not to be duplicates. ;)
Making checksums is slow too, so you should first compare only files with the same size. Files of different sizes are not duplicates...
You can skip empty txt files - they are all duplicates of each other :).
so the final command can be:
find . -not -empty -type f -name \*.txt -printf "%s\n" | sort -rn | uniq -d |\
    xargs -I% -n1 find . -type f -name \*.txt -size %c -print0 | xargs -0 md5sum |\
    sort | uniq -w32 --all-repeated=separate
The same pipeline, commented:
#find all non-empty file with the txt extension and print their size (in bytes)
find . -not -empty -type f -name \*.txt -printf "%s\n" |\
#sort the sizes numerically, and keep only duplicated sizes
sort -rn | uniq -d |\
#for each size that is duplicated, find all files with the given size and print their names (paths)
xargs -I% -n1 find . -type f -name \*.txt -size %c -print0 |\
#make an md5 checksum for them
xargs -0 md5sum |\
#sort the checksums and keep duplicated files separated with an empty line
sort | uniq -w32 --all-repeated=separate
Now you can simply inspect the output and decide which files you want to remove and which to keep.
&> is bash syntax, you'll need to change the shebang line (first line) to #!/bin/bash (or the appropriate path to bash.
Or if you're really using the Bourne Shell (/bin/sh), then you have to use old-style redirection, i.e.
cmp ... >/dev/null 2>&1
Also, I think the &> was only introduced in bash 4, so if you're using bash, 3.X you'll still need the old-style redirections.
IHTH
Credit to @kobame for his answer: this is really a comment, but it needs the formatting.
You don't need to call find twice; print out the size and the filename in the same find command:
find . -not -empty -type f -name \*.txt -printf "%8s %p\n" |
# find the files that have duplicate sizes
sort -n | uniq -Dw 8 |
# strip off the size and get the md5 sum
cut -c 10- | xargs md5sum
An example
$ cat a.txt
this is file a
$ cat b.txt
this is file b
$ cat c.txt
different contents
$ cp a.txt d.txt
$ cp b.txt e.txt
$ find . -not -empty -type f -name \*.txt -printf "%8s %p\n" |
sort -n | uniq -Dw 8 | cut -c 10- | xargs md5sum
76fd4c1589ef708d9203f3cf09cfd032 ./a.txt
e2d75fd6a1080efb6230d0608b1f9014 ./b.txt
76fd4c1589ef708d9203f3cf09cfd032 ./d.txt
e2d75fd6a1080efb6230d0608b1f9014 ./e.txt
To keep one and delete the rest, I would pipe the output into:
... | awk '++seen[$1] > 1 {print $2}' | xargs echo rm
rm ./d.txt ./e.txt
Remove the echo if your testing is satisfactory.
Like many complex pipelines, filenames containing newlines will break it.
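If newline-proofing is wanted, the same idea can be kept NUL-separated end to end; a sketch, assuming bash and GNU find/sort:
# Emit "md5 path" records separated by NULs, sort so identical sums are
# adjacent, then print an rm command for every file after the first of
# each group. Drop the echo once the output looks right.
find . -type f -name '*.txt' -print0 |
while IFS= read -r -d '' f; do
    printf '%s %s\0' "$(md5sum < "$f" | cut -d' ' -f1)" "$f"
done |
sort -z |
while IFS= read -r -d '' rec; do
    sum=${rec%% *}    # the 32-char md5 before the first space
    file=${rec#* }    # everything after the first space
    [ "$sum" = "$prev" ] && echo rm -- "$file"
    prev=$sum
done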
All nice answers, so only one short suggestion: you can install and use the fdupes utility:
fdupes -r .
From the man page:
Searches the given path for duplicate files. Such files are found by
comparing file sizes and MD5 signatures, followed by a byte-by-byte
comparison.
Added by @Francesco:
fdupes -rf . | xargs rm -f
to remove the dupes. (The -f flag makes fdupes omit the first occurrence of each file, so it lists only the dupes.)
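Alternatively, fdupes can do the deletion itself, keeping the first copy of each set; a short sketch (assuming a reasonably recent fdupes):
fdupes -rdN .   # -d deletes duplicates, -N keeps the first file without prompting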

using pipes with a find command

I have a series of delimited files, some of which have some bad data and can be recognized by doing a column count on them. I can find them with the following command:
find ./ -name 201201*gz -mtime 12
They are all gzipped and I do not want to un-archive them all. So to check the column counts, I'm running this as a second command on each file:
zcat ./path/to/file.data | awk '{print NF}' | head
I know I can run a command on each file through find with -exec, but how can I also get it to run through the pipes? A couple of things I tried, neither of which I expected to work and neither of which did:
find ./ -name 201201*gz -mtime 12 -print -exec zcat {} \; | awk '{print NF}'| head
find ./ -name 201201*gz -mtime 12 -print -exec "zcat {} | awk '{print NF}'| head" \;
I'd use an explicit loop approach:
find . -name '201201*gz' -mtime 12 | while IFS= read -r file; do
    echo "$file:"
    zcat "$file" | awk '{print NF}' | head
done
More or less you pipe things through find like:
find . -name "foo" -print0 | xargs -0 echo
So your command would look like:
find ./ -name "201201*gz" -mtime 12 -print0 | xargs -0 zcat | awk '{print NF}'| head
-print0 and xargs -0 just help to make sure that files with special characters don't break the pipe. Note, though, that xargs -0 zcat concatenates all the files, so awk and head see one combined stream rather than each file separately.
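For completeness, the whole per-file pipeline can also be pushed into -exec by wrapping it in a shell; a sketch (the pattern is quoted so the outer shell doesn't expand it):
find . -name '201201*gz' -mtime 12 -exec sh -c '
    echo "$1:"
    zcat "$1" | awk "{print NF}" | head
' sh {} \;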

How to find the executable files in the current directory and find out their extensions?

I need to find all executable files in /bin. How can I do it using
find . -executable
and how can I check whether a file is a script (for example, sh, pl, bash)?
#!/bin/bash
for file in `find /bin` ; do
    if [ -x "$file" ] ; then
        file "$file"
    fi
done
and it is even better to do:
find /bin -type f -perm /111 -print0 | xargs -0 file
find /bin/ -executable returns all executable files from the /bin/ directory.
To filter by extension, the -name flag is usable. For example, find /bin/ -executable -name "*.sh" returns sh scripts.
UPD:
If a file is not a binary and does not have an extension, it's possible to figure out its type from the shebang.
For example, find ~/bin/ -executable | xargs grep --files-with-matches '#!/bin/bash' returns files from the ~/bin/ directory which contain #!/bin/bash.
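Note that grep matches that string anywhere in a file, not only on the shebang line. A stricter first-line check, as a sketch assuming GNU awk:
# Print only files whose first line is a bash shebang.
find ~/bin/ -type f -executable -exec awk '
    FNR == 1 && /^#!.*\/bash/ { print FILENAME }
    { nextfile }   # gawk: skip to the next file after line 1
' {} +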
To find all the shell scripts:
find . -type f -executable | xargs file -i | grep x-shellscript | cut -d":" -f1
To find all the executables:
find . -type f -executable | xargs file -i | grep x-exec | cut -d":" -f1
To find all the shared libraries:
find . -type f -executable | xargs file -i | grep x-sharedlib | cut -d":" -f1
This worked for me & thought of sharing...
find ./ -type f -exec sh -c '
    case "$(head -n 1 "$1")" in
        ?ELF*) exit 0;;              # ELF binaries
        MZ*) exit 0;;                # Windows PE binaries
        "#!"*/ocamlrun*) exit 0;;    # OCaml bytecode (quoted so # is not a comment)
    esac
    exit 1
' sh {} \; -print

Get the newest directory to a variable in Bash

I would like to find the newest sub directory in a directory and save the result to variable in bash.
Something like this:
ls -t /backups | head -1 > $BACKUPDIR
Can anyone help?
BACKUPDIR=$(ls -td /backups/*/ | head -1)
$(...) evaluates the statement in a subshell and returns the output.
There is a simple solution to this using only ls:
BACKUPDIR=$(ls -td /backups/*/ | head -1)
-t orders by time (latest first)
-d lists the directory entries themselves rather than their contents
*/ only lists directories
head -1 returns the first item
I didn't know about */ until I found Listing only directories using ls in bash: An examination.
This is a pure Bash solution:
topdir=/backups
BACKUPDIR=
# Handle subdirectories beginning with '.', and empty $topdir
shopt -s dotglob nullglob
for file in "$topdir"/* ; do
    [[ -L $file || ! -d $file ]] && continue
    [[ -z $BACKUPDIR || $file -nt $BACKUPDIR ]] && BACKUPDIR=$file
done
printf 'BACKUPDIR=%q\n' "$BACKUPDIR"
It skips symlinks, including symlinks to directories, which may or may not be the right thing to do. It skips other non-directories. It handles directories whose names contain any characters, including newlines and leading dots.
Well, assuming the directory names sort chronologically (as date-stamped backup directories usually do), I think this solution is the most efficient:
path="/my/dir/structure/*"
backupdir=$(find $path -type d -prune | tail -n 1)
An explanation of why this is a little better:
We do not need sub-shells (aside from the one for getting the result into the bash variable).
We do not need a useless -exec ls -d at the end of the find command, it already prints the directory listing.
We can easily alter this, e.g. to exclude certain patterns. For example, if you want the second newest directory, because backup files are first written to a tmp dir in the same path:
backupdir=$(find $path -type d -prune -not -name "*temp_dir" | tail -n 1)
The above solution doesn't take into account files being written to and removed from the directory, which can result in the parent directory being returned instead of the newest subdirectory.
The other issue is that this solution assumes the directory contains only other directories and no regular files.
Let's say I create a file called "test.txt" and then run this command again:
echo "test" > test.txt
ls -t /backups | head -1
test.txt
The result is test.txt showing up instead of the last modified directory.
The proposed solution "works" but only in the best case scenario.
Assuming you have a maximum of 1 directory depth, a better solution is to use:
find /backups/* -type d -prune -exec ls -d {} \; |tail -1
Just swap the "/backups/" portion for your actual path.
If you want to avoid showing an absolute path in a bash script, you could always use something like this:
LOCALPATH=/backups
DIRECTORY=$(cd $LOCALPATH; find * -type d -prune -exec ls -d {} \; |tail -1)
With GNU find you can get list of directories with modification timestamps, sort that list and output the newest:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\0" | sort -z -n | cut -z -f2- | tail -z -n1
or newline separated
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\n" | sort -n | cut -f2- | tail -n1
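To capture that into a variable, as the question asks (a sketch):
BACKUPDIR=$(find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\n" | sort -n | cut -f2- | tail -n1)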
With POSIX find (that does not have -printf) you may, if you have it, run stat to get file modification timestamp:
find . -mindepth 1 -maxdepth 1 -type d -exec stat -c '%Y %n' {} \; | sort -n | cut -d' ' -f2- | tail -n1
Without stat, a pure shell solution may be used by replacing the [[ bash extension with [, as in this answer.
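For reference, a sketch of such a [-based loop; note that -nt inside [ is a widely supported extension (bash, dash, ksh) rather than strict POSIX:
#!/bin/sh
BACKUPDIR=
for d in /backups/*/; do
    [ -d "$d" ] || continue
    if [ -z "$BACKUPDIR" ] || [ "$d" -nt "$BACKUPDIR" ]; then
        BACKUPDIR=$d
    fi
done
printf '%s\n' "$BACKUPDIR"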
Your "something like this" was almost a hit:
BACKUPDIR=$(ls -t ./backups | head -1)
Combining what you wrote with what I have learned solved my problem too. Thank you for raising this question.
Note: I ran the line above from Git Bash in a Windows environment, in a file called ./something.bash.
