find oldest file from list - bash

I've a file with a list of files in different directories and want to find the oldest one.
It feels like something that should be easy with some shell scripting but I don't know how to approach this. I'm sure it's really easy in perl and other scripting languages but I'd really like to know if I've missed some obvious bash solution.
Example of the contents of the source file:
/home/user2/file1
/home/user14/tmp/file3
/home/user9/documents/file9

#!/bin/sh
while IFS= read -r file; do
[ "${file}" -ot "${oldest=$file}" ] && oldest=${file}
done < filelist.txt
echo "the oldest file is '${oldest}'"

You can use stat to find the last modification time of each file, looping over your source file:
oldest=5555555555
while read file; do
modtime=$(stat -c %Y "$file")
[[ $modtime -lt $oldest ]] && oldest=$modtime && oldestf="$file"
done < sourcefile.txt
echo "Oldest file: $oldestf"
This uses the %Y format of stat, which is the last modification time. You could also use %X for last access time, or %Z for last change time.

Use find() to find the oldest file:
find /home/ -type f -printf '%T+ %p\n' | sort | head -1 | cut -d' ' -f2-
And with source file:
find $(cat /path/to/source/file) -type f -printf '%T+ %p\n' | sort | head -1 | cut -d' ' -f2-

Related

bash iterate over a directory sorted by file size

As a webmaster, I generate a lot of junk files of code. Periodically I have to purge the unneeded files filtered by extention. Example: "cleaner txt" Easy enough. But I want to sort the files by size and process them for the "for" loop. How can I do that?
cleaner:
#/bin/bash
if [ -z "$1" ]; then
echo "Please supply the filename suffixes to delete.";
exit;
fi;
filter=$1;
for FILE in *.$filter; do clear;
cat $FILE; printf '\n\n'; rm -i $FILE; done
You can use a mix of find (to print file sizes and names), sort (to sort the output of find) and cut (to remove the sizes). In case you have very unusual file names containing any possible character including newlines, it is safer to separate the files by a character that cannot be part of a name: NUL.
#/bin/bash
if [ -z "$1" ]; then
echo "Please supply the filename suffixes to delete.";
exit;
fi;
filter=$1;
while IFS= read -r -d '' -u 3 FILE; do
clear
cat "$FILE"
printf '\n\n'
rm -i "$FILE"
done 3< <(find . -mindepth 1 -maxdepth 1 -type f -name "*.$filter" \
-printf '%s\t%p\0' | sort -zn | cut -zf 2-)
Note that we must use a different file descriptor than stdin (3 in this example) to pass the file names to the loop. Else, if we use stdin, it will also be used to provide the answers to rm -i.
Inspired from this answer, you could use the find command as follows:
find ./ -type f -name "*.yaml" -printf "%s %p\n" | sort -n
find command prints the the size of the files and the path so that the sort command prints the results from the smaller one to the larger.
In case you want to iterate through (let's say) the 5 bigger files you can do something like this using the tail command like this:
for f in $(find ./ -type f -name "*.yaml" -printf "%s %p\n" |
sort -n |
cut -d ' ' -f 2)
do
echo "### $f"
done
If the file names don't contain newlines and spaces
while read filesize filename; do
printf "%-25s has size %10d\n" "$filename" "$filesize"
done < <(du -bs *."$filter"|sort -n)
while read filename; do
echo "$filename"
done < <(du -bs *."$filter"|sort -n|awk '{$0=$2}1')

How to get list of certain strings in a list of files using bash?

The title is maybe not really descriptive, but I couldn't find a more concise way to describe the problem.
I have a directory containing different files which have a name that e.g. looks like this:
{some text}2019Q2{some text}.pdf
So the filenames have somewhere in the name a year followed by a capital Q and then another number. The other text can be anything, but it won't contain anything matching the format year-Q-number. There will also be no numbers directly before or after this format.
I can work something out to get this from one filename, but I actually need a 'list' so I can do a for-loop over this in bash.
So, if my directory contains the files:
costumerA_2019Q2_something.pdf
costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerB_2019Q3_something.pdf
costumerC_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerD2020Q2something.pdf
I want a for loop that goes over 2019Q2, 2019Q3, 2020Q1, and 2020Q2.
EDIT:
This is what I have so far. It is able to extract the substrings, but it still has doubles. Since I'm already in the loop and I don't see how I can remove the doubles.
find original/*.pdf -type f -print0 | while IFS= read -r -d '' line; do
echo $line | grep -oP '[0-9]{4}Q[0-9]'
done
# list all _filanames_ that end with .pdf from the folder original
find original -maxdepth 1 -name '*.pdf' -type f -print "%p\n" |
# extract the pattern
sed 's/.*\([0-9]{4}Q[0-9]\).*/\1/' |
# iterate
while IFS= read -r file; do
echo "$file"
done
I used -print %p to print just the filename, instead of full path. The GNU sed has -z option that you can use with -print0 (or -print "%p\0").
With how you have wanted to do this, if your files have no newline in the name, there is no need to loop over list in bash (as a rule of a thumb, try to avoid while read line, it's very slow):
find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]'
or with a zero seprated stream:
find original -maxdepth 1 -name '*.pdf' -type f -print0 |
grep -zoP '[0-9]{4}Q[0-9]' | tr '\0' '\n'
If you want to remove duplicate elements from the list, pipe it to sort -u.
Try this, in bash:
~ > $ ls
costumerA_2019Q2_something.pdf costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf other.pdf
costumerA_2020Q1_something.pdf someother.file.txt
~ > $ for x in `(ls)`; do [[ ${x} =~ [0-9]Q[1-4] ]] && echo $x; done;
costumerA_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerB_2019Q2_something.pdf
~ > $ (for x in *; do [[ ${x} =~ ([0-9]{4}Q[1-4]).+pdf ]] && echo ${BASH_REMATCH[1]}; done;) | sort -u
2019Q2
2019Q3
2020Q1

Bash : Find and Remove duplicate files from different folders

I have two folders with some common files, I want to delete duplicate files from xyz folder.
folder1:
/abc/file1.csv
/abc/file2.csv
/abc/file3.csv
/abc/file4.csv
folder2:
/xyz/file1.csv
/xyz/file5.csv
I want to compare both folders and remove duplicate from /xyz folder. Output should be: file5.csv
For now I am using :
find "/xyz" "/abc" "/abc" -printf '%P\n' | sort | uniq -u | -exec rm {} \;
But it failing with reason : if -exec is not a typo you can run the following command to lookup the package that contains the binary:
command-not-found -exec
-bash: -exec: command not found
-exec is an option to find, you've already exited the command find when you started the pipes.
Try xargs instead, it take all the data from stdin and appends to the program.
UNTESTED
find "/xyz" "/abc" "/abc" -printf '%P\n' | sort | uniq -u | xargs rm
Find every file in 234 and 123 directory get filename by -printf, sort them, uniq -d give list of duplications, give back path by sed, using 123 directory to delete the duplications from, and pass files to xargs rm
Command:
find ./234 ./123 -type f -printf '%P\n' | sort | uniq -d | sed 's/^/.\/123\//g' | xargs rm
sed don't needed if you are in the ./123 directory and using full path for folders in find.
Another approach: just find the files in abc and attempt to remove them from xyz:
UNTESTED
find /abc -type f -printf 'rm -f /xyz/%P' | sh
Remove Duplicate Files From Particular Directory
FileList=$(ls)
for D1 in $FileList ;do
if [[ -f $D1 ]]; then
for D2 in $FileList ;do
if [[ -f $D2 ]]; then
if [[ $D1 == $D2 ]]; then
: 'Skip Orignal File'
else
if [[ $(md5sum $D1 | cut -d'=' -f 2 | cut -d ' ' -f 1 ) == $(md5sum $D2 | cut -d'=' -f 2 | cut -d ' ' -f 1 ) ]]; then
echo "Duplicate File Found : $D2"
rm -rf $D2
fi #Detect Duplicate Using MD5
fi #Skip Orginal File
fi #D2 File available Then Next
done
fi #D1 File available Then Next
done

Print the content of all the files in the newest directory in BASH [duplicate]

Is there any sort option available in find command to get directory with least access date/time
find . -type d -printf "%A# %p\n" | sort -n | tail -n 1 | cut -d " " -f 2-
If you prefer the filename without leading path, replace %p by %f.
the below linux command displays the access and modified time along with size
stat -f
find -type d -printf '%T+ %p\n' | sort | head -1
source
find -type d -printf '%T+ %p\n' | sort
This sound like more of a job for ls:
ls -ultd *|grep ^d
The problem with using find, at least on my system (cygwin/bash), is that find accesses the dirs, so all access-times result in current time, defeating your apparent purpose.
A simple shell script will also do:
unset -v oldest
for i in "$dir"/*; do
[ "$i" -ot "$oldest" -o "$oldest" = "" ] && oldest="$i"
done
note: to find the oldest directory use "$dir"/*/ above (thanks Cyrus) and -type d below with the find command.
In bash if you need a recursive solution, then you can rewrite it as a while loop with process substitution using find
unset -v oldest
while IFS= read -r i; do
[ "$i" -ot "$oldest" -o "$oldest" = "" ] && oldest="$i"
done < <(find "$dir" -type f)

Bash script to limit a directory size by deleting files accessed last

I had previously used a simple find command to delete tar files not accessed in the last x days (in this example, 3 days):
find /PATH/TO/FILES -type f -name "*.tar" -atime +3 -exec rm {} \;
I now need to improve this script by deleting in order of access date and my bash writing skills are a bit rusty. Here's what I need it to do:
check the size of a directory /PATH/TO/FILES
if size in 1) is greater than X size, get a list of the files by access date
delete files in order until size is less than X
The benefit here is for cache and backup directories, I will only delete what I need to to keep it within a limit, whereas the simplified method might go over size limit if one day is particularly large. I'm guessing I need to use stat and a bash for loop?
I improved brunner314's example and fixed the problems in it.
Here is a working script I'm using:
#!/bin/bash
DELETEDIR="$1"
MAXSIZE="$2" # in MB
if [[ -z "$DELETEDIR" || -z "$MAXSIZE" || "$MAXSIZE" -lt 1 ]]; then
echo "usage: $0 [directory] [maxsize in megabytes]" >&2
exit 1
fi
find "$DELETEDIR" -type f -printf "%T#::%p::%s\n" \
| sort -rn \
| awk -v maxbytes="$((1024 * 1024 * $MAXSIZE))" -F "::" '
BEGIN { curSize=0; }
{
curSize += $3;
if (curSize > maxbytes) { print $2; }
}
' \
| tac | awk '{printf "%s\0",$0}' | xargs -0 -r rm
# delete empty directories
find "$DELETEDIR" -mindepth 1 -depth -type d -empty -exec rmdir "{}" \;
Here's a simple, easy to read and understand method I came up with to do this:
DIRSIZE=$(du -s /PATH/TO/FILES | awk '{print $1}')
if [ "$DIRSIZE" -gt "$SOMELIMIT" ]
then
for f in `ls -rt --time=atime /PATH/TO/FILES/*.tar`; do
FILESIZE=`stat -c "%s" $f`
FILESIZE=$(($FILESIZE/1024))
DIRSIZE=$(($DIRSIZE - $FILESIZE))
if [ "$DIRSIZE" -lt "$LIMITSIZE" ]; then
break
fi
done
fi
I didn't need to use loops, just some careful application of stat and awk. Details and explanation below, first the code:
find /PATH/TO/FILES -name '*.tar' -type f \
| sed 's/ /\\ /g' \
| xargs stat -f "%a::%z::%N" \
| sort -r \
| awk '
BEGIN{curSize=0; FS="::"}
{curSize += $2}
curSize > $X_SIZE{print $3}
'
| sed 's/ /\\ /g' \
| xargs rm
Note that this is one logical command line, but for the sake of sanity I split it up.
It starts with a find command based on the one above, without the parts that limit it to files older than 3 days. It pipes that to sed, to escape any spaces in the file names find returns, then uses xargs to run stat on all the results. The -f "%a::%z::%N" tells stat the format to use, with the time of last access in the first field, the size of the file in the second, and the name of the file in the third. I used '::' to separate the fields because it is easier to deal with spaces in the file names that way. Sort then sorts them on the first field, with -r to reverse the ordering.
Now we have a list of all the files we are interested in, in order from latest accessed to earliest accessed. Then the awk script adds up all the sizes as it goes through the list, and begins outputting them when it gets over $X_SIZE. The files that are not output this way will be the ones kept, the other file names go to sed again to escape any spaces and then to xargs, which runs rm them.

Resources