complicated find on bash - bash

I have the following task: delete "builds" older than 30 days. This solution works perfectly:
find $jenkins_jobs -type d -name builds -exec find {} -type d -mtime +30 \; >> $filesToBeDelete
cat $filesToBeDelete | xargs rm -rf
But later some conditions were added: delete only when there are more than 30 builds, and clean up the oldest ones. So as a result we should keep the 30 newest builds and delete the rest.
Also, I have found that I can wrap find in an if statement like this:
if [ $(find bla-bla | wc -l) -gt 30 ]; then
...
fi
but I am wondering how I can delete those files.
Is it clear? For example, if the "builds" folder has 100 builds and all of them are older than 30 days, I want to keep the 30 newest builds and delete the other 70.

Pretty hacky, but it should be fairly robust for weird filenames:
find -type d -name "builds" -mtime +30 -printf "%T@ %p\0" |\
awk -vRS="\0" -vORS="\0" '{match($0,/([^ ]* )(.*)/,a);b[a[2]]=a[1];c[a[1]]=a[2]}END{x=asort(b);for(i=x-30;i>0;i--)print c[b[i]]}' |\
xargs -0 -I{} rm -r {}
I tested with echo and it seems to work, but I'd make sure it's showing the right files before using rm -r.
What it does is pass null-terminated strings all the way through, so filenames with odd characters are preserved.
The main limitation is that if two directories have the same modification timestamp (to the second), it will miss one, because the timestamp is used as an associative-array key.
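For a dry run, the same pipeline can be tested with echo in place of rm -r (identical to the command above except for the final xargs stage); once the printed list looks right, switch back to rm -r:
find -type d -name "builds" -mtime +30 -printf "%T@ %p\0" |\
awk -vRS="\0" -vORS="\0" '{match($0,/([^ ]* )(.*)/,a);b[a[2]]=a[1];c[a[1]]=a[2]}END{x=asort(b);for(i=x-30;i>0;i--)print c[b[i]]}' |\
xargs -0 -I{} echo {}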

Here is a relatively safe answer to list the dirs, if your stat is close enough to mine (cygwin/bash):
now=$(date +%s)
find $jenkins_jobs -type d -name builds -exec find {} -type d \; |
while read f; do stat -c'%Y %n' "$f"; done |
sort -nr |
tail -n +31 |
awk $now'-$1>2592000'|
sed 's/^[0-9]* //'
This is working with epoch time (seconds since 1970) as provided by the %s of date and the %Y of stat. The sort and tail are removing the newest 30, and the awk is removing any 30 days old or newer. (2592000 is the number of seconds in 30 days.) The final sed is just removing what stat added, leaving only the dirname.
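If the listing looks right, one low-risk way to act on it is to capture the output in a file first and only then delete; a sketch of the same pipeline with the deletion appended (assumes GNU xargs and directory names without newlines; the /tmp/dirs_to_delete name is just illustrative):
now=$(date +%s)
find $jenkins_jobs -type d -name builds -exec find {} -type d \; |
while read f; do stat -c'%Y %n' "$f"; done |
sort -nr |
tail -n +31 |
awk $now'-$1>2592000' |
sed 's/^[0-9]* //' > /tmp/dirs_to_delete
# review /tmp/dirs_to_delete, then:
xargs -d '\n' rm -rf < /tmp/dirs_to_delete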

This will list all but the 30 newest directories.
find -type d -name builds -exec ls -d -l --time-style="+%s" {} \;|sed "s#[^ ]\+ \w\+ \w\+ \w\+ \w\+ ##"|sort -r |sed "s#[^ ]\+ ##"|tail -n +31
After you are sure you want to remove them, you can append | xargs rm -rf (a full command is sketched after the step list below).
It reads this way:
find all build dirs
list them with time from epoch
drop (sed away) the rights, user, group etc., leaving only time and name
sort by time from newest
drop those times
tail will show everything from the 31st entry onward (so it skips the 30 newest)
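Putting it together (the same pipeline with the deletion appended; verify the listing first, and note that plain xargs will mangle paths containing spaces):
find -type d -name builds -exec ls -d -l --time-style="+%s" {} \;|sed "s#[^ ]\+ \w\+ \w\+ \w\+ \w\+ ##"|sort -r |sed "s#[^ ]\+ ##"|tail -n +31|xargs rm -rf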

Related

How to use the find command to find any file where the creation time and modified time are equal?

I know I can use the find command with options like -mtime and -ctime, but those expect a number to be set in the command.
In my case I don't care what the time is; I just want to find any files where the ctime and the mtime are equal to each other. (I'm on a Mac, so technically it's -mtime and -Btime.)
I'm having a harder time than I expected figuring out how to do this.
Edit: I’m trying to do this in macOS and the file system is APFS
As you can see here, creation time is not really stored on Unix-like systems. Some filesystems may support this feature, and you can check the output of the stat file command; for me the last line of that output is Birth: -. So in case you do have creation times, you could get files that were never modified like this:
find . -type f -print0 | xargs -0 stat -c "%n %W %Y" |
awk '$NF==$(NF-1) {$(NF-1)=$NF=""; print}'
%W will print birth time (probably 0 if not supported) and %Y the last modification time. The last awk command above prints only filenames where these times are matching.
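A quick way to check whether your filesystem records birth time at all (a sketch, assuming GNU coreutils stat; the test file name is arbitrary):
touch /tmp/birthtest
stat -c 'birth=%W mtime=%Y' /tmp/birthtest   # birth=0 means birth time is not recorded
rm /tmp/birthtest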
for macOS:
find . -type f -print0 | xargs -0 stat -f "%N %B %m" |
awk '$NF==$(NF-1) {$(NF-1)=$NF=""; print}'
see also macOS stat man page
I think this is not possible with just find, but you may filter these files using external tools, e.g. shell:
find . -type f -printf '%T@ %C@ %f\n' | while read mtime ctime fname; do
[ "$mtime" == "$ctime" ] && echo "$fname"
done

Unix Count Multiple Folders Needed

I have a directory on unix server.
cd /home/client/files
It has multiple client folders like these below.
cd /home/client/files/ibm
cd /home/client/files/aol
cd /home/client/files/citi
All of them send us a file starting with either lower or upper case like below:
pre-ibm-03222017
PRE-aol-170322
Once we receive the files, we process them and convert pre to pro as below:
pro-ibm-03222017
PRO-aol-170322
I want to count the files processed each day. Here is what I am looking for:
If I can just get the total count per client, that would be perfect. If not, then the total count overall.
Keep in mind it has all files as below:
cd /home/client/files/ibm
pre-ibm-03222017
pro-ibm-03222017
cd /home/client/files/aol
PRE-aol-170322
PRO-aol-170322
And I want to COUNT ONLY the PRO/pro files, which can be either lower or upper case. One folder can get more than 1 file per day.
I am using the below command:
find /home/client/files -type f -mtime -1 -exec ls -1 {} \;| wc -l
But it is giving me the total count of pre and pro files, and it is also counting files for the last 24 hours, not the current day only.
For example, it is currently 09:00 PM. The above command includes files received yesterday between 09:00 PM and 12:00 AM as well. I don't want those. In other words, if I run it at 01:00 AM, it should include files for that 1 hour only and not the last 24 hours.
Thanks
---- Update -----
This works great for me.
touch -t 201703230000 first
touch -t 201703232359 last
find /home/client/files/ -newer first ! -newer last | grep -i pro | wc -l
Now, I was just wondering if I can pass the above as parameters.
For example, instead of using touch -t with a date and an alias, I want to type shortcuts and dates only to get the output. I have made the following aliases:
alias reset='touch -t `date +%m%d0000` /tmp/$$'
alias count='find /home/client/files/ -type f -newer /tmp/$$ -exec ls -1 {} \; | grep -i pro | wc -l'
This way as soon as I logon to the server, I type reset and then I type count and I get my daily number.
I was wondering if I can do something similar for any duration of days by setting date1 and date2 as aliases. If not, then perhaps a short script that would ask for parameters.
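One way to parameterize it is a small function built on the same touch -t / -newer trick from the update above (a sketch; the function name and the YYYYMMDD argument format are just illustrative):
# usage: count_pro 20170323 [20170325]  -> count pro/PRO files received in that date range
count_pro() {
    local start="$1" end="${2:-$1}"
    touch -t "${start}0000" /tmp/count_first.$$
    touch -t "${end}2359" /tmp/count_last.$$
    find /home/client/files/ -type f -newer /tmp/count_first.$$ ! -newer /tmp/count_last.$$ | grep -i pro | wc -l
    rm -f /tmp/count_first.$$ /tmp/count_last.$$
}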
What about this?
touch -t `date +%m%d0000` /tmp/$$
find /home/client/files -type f -newer /tmp/$$ -exec ls -1 {} \; | grep -i pro | wc -l
rm /tmp/$$
Other options for finding a file created today can be found in this question:
How do I find all the files that were created today
Actually, a better way to do this is to just use this:
find /home/client/files -type f -mtime 0 | grep -i pro | wc -l
You can replace -mtime 0 with -mtime 5 to find files 5 days old.
For the same-day issue you can use -daystart (GNU find).
The regex matches paths containing /pro in either case:
find /home/client/files -regex '.*\/[pP][rR][oO].*' -type f -daystart -mtime -1 -exec ls -1 {} \;| wc -l

remove files from subfolders without the last three

I have a structure like that:
/usr/local/a/1.txt
/usr/local/a/2.txt
/usr/local/a/3.txt
/usr/local/b/4.txt
/usr/local/b/3.txt
/usr/local/c/1.txt
/usr/local/c/7.txt
/usr/local/c/6.txt
/usr/local/c/12.txt
...
I want to delete all the *.txt files in the subfolders except the three files with the greatest modification date. In the current directory I can do it like this:
ls -tr *.txt | head -n-3 |xargs rm -f
I need to combine that with the code:
find /usr/local/**/* -type f
Should I use the maxdepth option?
Thanks for helping,
aola
Added a maxdepth option to find for one level, ls to sort the files by modification time (oldest first), head to drop the 3 most recently modified files from the deletion list, and xargs with -r to remove the files only if any are found.
for folder in $(find /usr/local/ -type d)
do
find $folder -maxdepth 1 -type f -name "*.txt" | xargs -r ls -1tr | head -n -3 | xargs -r rm -f
done
Run the above command once without rm to ensure that the previous commands pick the proper files for deletion.
You've almost got the solution: use find to get the files, ls to sort them by modification date and tail to omit the three most recently modified ones:
find /usr/lib -type f | xargs ls -t | tail -n +4 | xargs rm
If you would like to remove only the files at a specified depth add -mindepth 4 -maxdepth 4 to find parameters.
You can use find's -printf option, to print the modification time in front of the file name and then sort and strip the date off. This avoids using ls at all.
find /usr/local -type f -name '*.txt' -printf '%T@|%p\n' | sort -r | cut -d '|' -f 2 | head -n-3 | xargs rm -f
The other answers using xargs ls -t can lead to incorrect results when there are more files than xargs can pass to a single ls -t invocation.
But I need it for each subfolder, so when I have
/usr/local/a/1.txt
/usr/local/a/2.txt
/usr/local/a/3.txt
/usr/local/a/4.txt
/usr/local/b/4.txt
/usr/local/b/3.txt
/usr/local/c/1.txt
/usr/local/c/7.txt
/usr/local/c/6.txt
/usr/local/c/12.txt
I want to use the code for each subfolder separately:
head -n-3 |xargs rm -f
so if I have them sorted by date, then the files to delete would be:
/usr/local/a/4.txt
/usr/local/c/12.txt
I want to keep the three newest files in each subfolder.
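A per-subfolder sketch that combines the -printf approach above with a directory loop (assumes GNU find/sort/xargs and file names without newlines; keep the echo until the output looks right):
find /usr/local -mindepth 1 -type d | while read -r dir; do
    find "$dir" -maxdepth 1 -type f -name '*.txt' -printf '%T@ %p\n' |
        sort -rn |                      # newest first
        tail -n +4 |                    # skip the three newest
        cut -d ' ' -f 2- |              # strip the timestamp
        xargs -r -d '\n' echo rm -f     # drop "echo" to actually delete
done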

Remove all files older than X days, but keep at least the Y youngest [duplicate]

This question already has answers here:
How can I delete files over (n) days old but leave (n) files regardless of age?
(2 answers)
Closed 2 years ago.
I have a script that removes DB dumps that are older than say X=21 days from a backup dir:
DB_DUMP_DIR=/var/backups/dbs
RETENTION=$((21*24*60)) # 3 weeks
find ${DB_DUMP_DIR} -type f -mmin +${RETENTION} -delete
But if for whatever reason the DB dump job fails to complete for a while, all dumps will eventually be thrown away. So as a safeguard I want to keep at least the youngest Y=7 dumps, even if all or some of them are older than 21 days.
I look for something that is more elegant than this spaghetti:
DB_DUMP_DIR=/var/backups/dbs
RETENTION=$((21*24*60)) # 3 weeks
KEEP=7
find ${DB_DUMP_DIR} -type f -printf '%T@ %p\n' | \ # list all dumps with epoch
sort -n | \ # sort by epoch, oldest 1st
head --lines=-${KEEP} |\ # Remove youngest/bottom 7 dumps
while read date filename ; do # loop through the rest
find $filename -mmin +${RETENTION} -delete # delete if older than 21 days
done
(This snippet might have minor bugs - ignore them. It's to illustrate what I can come up with myself, and why I don't like it.)
Edit: The find option -mtime is off by one: "-mtime +21" actually means "at least 22 days old". That always confused me, so I use -mmin instead. Still off by one, but only by a minute.
Use find to get all files that are old enough to delete, filter out the $KEEP youngest with tail, then pass the rest to xargs.
find ${DB_DUMP_DIR} -type f -printf '%T@ %p\n' -mmin +$RETENTION |
sort -nr | tail -n +$((KEEP + 1)) |
xargs -r echo
Replace echo with rm if the reported list of files is the list you want to remove.
(I assume none of the dump files have newlines in their names.)
I'm opening a second answer because I have a different solution, one using awk: just add the 21-day period (in seconds) to the file's timestamp, subtract the current time, and remove the negative ones (after sorting and removing the newest 7 from the list):
DB_DUMP_DIR=/var/backups/dbs
RETENTION=21*24*60*60 # 3 weeks
CURR_TIME=`date +%s`
find ${DB_DUMP_DIR} -type f -printf '%T@ %p\n' | \
awk '{ print int($1) -'${CURR_TIME}' + '${RETENTION}' ":" $2}' | \
sort -n | head -n -7 | grep '^-' | cut -d ':' -f 2- | xargs rm -rf
None of these answers quite worked for me, so I adapted chepner's answer and came to this, which simply retains the last $KEEP backups.
find ${DB_DUMP_DIR} -printf '%T@ %p\n' | # print entries with modification time
sort -n | # sort in date-ascending order
head -n -$KEEP | # remove the $KEEP most recent entries
awk '{ print $2 }' | # select the file paths
xargs -r rm # remove the file paths
I believe chepner's code retains the $KEEP oldest, rather than the youngest.
You can use -mtime instead of -mmin which means you don't have to calculate the number of minutes in a day:
find $DB_DUMP_DIR -type f -mtime +21
Instead of deleting them, you could use the stat command to sort the files in order:
find $DB_DUMP_DIR -type f -mtime +21 | while read file
do
stat -f "%-10m %40N" $file
done | sort -r | awk 'NR > 7 {print $2}'
This will list all files older than 21 days, but not the seven youngest that are older than 21 days.
From there, you could feed this into xargs to do the remove:
find $DB_DUMP_DIR -type f -mtime +21 | while read file
do
stat -f "%-10m %40N" $file
done | sort -r | awk 'NR > 7 {print $2}' | xargs rm
Of course, this is all assuming that you don't have spaces in your file names. If you do, you'll have to take a slightly different tack.
This will also keep the seven youngest files over 21 days old. You might have files younger than that which you don't really want to keep either. However, you could simply run the same sequence again (except remove the -mtime parameter):
find $DB_DUMP_DIR -type f | while read file
do
stat -f "%-10m %40N" $file
done | sort -r | awk 'NR > 7 {print $2}' | xargs rm
You need to look at your stat command to see what the options are for the format. This varies from system to system. The one I used is for OS X. Linux is different.
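For reference, the rough GNU/Linux (coreutils) equivalent of the BSD stat call used above would be something like:
stat -c '%Y %n' "$file"    # epoch mtime, then the file name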
Let's take a slightly different approach. I haven't thoroughly tested this, but:
If all of the files are in the same directory, and none of the file names have whitespace in them:
ls -t | awk 'NR > 7 {print $0}'
Will print out all of the files except for the seven youngest files. Maybe we can go with that?
current_seconds=$(date +%s) # Seconds since the epoch
((days = 60 * 60 * 24 * 21)) # Number of seconds in 21 days
((oldest_allowed = $current_seconds - $days)) # Oldest allowed file
ls -t | awk 'NR > 7 {print $0}' | xargs stat -f "%Dm %N" | while read date file
do
    [ "$date" -lt "$oldest_allowed" ] && rm "$file"
done
The ls ... | awk will shave off the seven youngest. After that, stat gives us the date and the name of each remaining file. Since the date is in seconds since the epoch, we had to calculate what 21 days prior to the current time would be, also in seconds since the epoch.
After that, it's pretty simple. We look at the date of the file. If it's older than the 21-day cutoff (i.e., its timestamp is lower), we can delete it.
As I said, I haven't thoroughly tested this, but it will delete all files over 21 days old, and only files over 21 days old, while always keeping the seven youngest.
What I ended up using is:
always keep last N items
then for the rest, if the file is older than X days, delete it
for f in $(ls -1t | tail -n +31); do
if [[ $(find "$f" -mtime +30 -print) ]]; then
echo "REMOVING old backup: $f"
rm $f
fi
done
explanation:
ls, sort by time, skip first 30 items: $(ls -1t | tail -n +31)
test if find can find the file being older than 30 days: if [[ $(find "$f" -mtime +30 -print) ]]
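A variant of the same loop that tolerates spaces in file names (a sketch; same logic, just safer reading and quoting):
ls -1t | tail -n +31 | while IFS= read -r f; do
    if [[ $(find "$f" -mtime +30 -print) ]]; then
        echo "REMOVING old backup: $f"
        rm -- "$f"
    fi
done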
You could do the loop yourself:
t21=$(date -d "21 days ago" +%s)
cd "$DB_DUMP_DIR"
for f in *; do
if (( $(stat -c %Y "$f") <= $t21 )); then
echo rm "$f"
fi
done
I'm assuming you have GNU date
Here is a BASH function that should do the trick. I couldn't avoid two invocations of find easily, but other than that, it was a relative success:
# A "safe" function for removing backups older than REMOVE_AGE + 1 day(s), always keeping at least the ALWAYS_KEEP youngest
remove_old_backups() {
local file_prefix="${backup_file_prefix:-$1}"
local temp=$(( REMOVE_AGE+1 )) # for inverting the mtime argument: it's quirky ;)
# We consider backups made on the same day to be one (commonly these are temporary backups in manual intervention scenarios)
local keeping_n=`/usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime -"$temp" -printf '%Td-%Tm-%TY\n' | sort -d | uniq | wc -l`
local extra_keep=$(( $ALWAYS_KEEP-$keeping_n ))
/usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime +$REMOVE_AGE -printf '%T@ %p\n' | sort -n | head -n -$extra_keep | cut -d ' ' -f2 | xargs -r rm
}
It takes a backup_file_prefix env variable, or the prefix can be passed as the first argument, and it expects the environment variables ALWAYS_KEEP (minimum number of files to keep) and REMOVE_AGE (number of days to pass to -mtime). It expects a gz or tgz extension. There are a few other assumptions, as you can see in the comments, mostly in the name of safety.
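An illustrative invocation (the values and the prefix are made up; run it from the backup directory, since the function searches .):
cd /var/backups/dbs
ALWAYS_KEEP=7
REMOVE_AGE=21
remove_old_backups mydump    # the prefix can also be supplied via $backup_file_prefix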
Thanks to ireardon and his answer (which doesn't quite answer the question) for the inspiration!
Happy safe backup management :)
From the solutions given in the other answers, I experimented and found many bugs or unwanted situations.
Here is the solution I finally came up with:
# Sample variable values
BACKUP_PATH='/data/backup'
DUMP_PATTERN='dump_*.tar.gz'
NB_RETENTION_DAYS=10
NB_KEEP=2 # keep at least the 2 most recent files in all cases
find ${BACKUP_PATH} -name ${DUMP_PATTERN} \
-mtime +${NB_RETENTION_DAYS} > /tmp/obsolete_files
find ${BACKUP_PATH} -name ${DUMP_PATTERN} \
-printf '%T@ %p\n' | \
sort -n | \
tail -n ${NB_KEEP} | \
awk '{ print $2 }' > /tmp/files_to_keep
grep -F -f /tmp/files_to_keep -v /tmp/obsolete_files > /tmp/files_to_delete
cat /tmp/files_to_delete | xargs -r rm
The ideas are :
Most of the time, I just want to keep files that are not aged more than NB_RETENTION_DAYS.
However, shit happens, and when for some reason there are no recent files anymore (backup scripts are broken), I don't want to remove the NB_KEEP most recent ones, for safety (NB_KEEP should be at least 1).
In my case, I have 2 backups a day and set NB_RETENTION_DAYS to 10 (thus, I normally have 20 files).
One could think that I would thus set NB_KEEP=20, but in fact I chose NB_KEEP=2, and here is why:
Let's imagine my backup scripts are broken and I have had no backups for a month. I really don't care about keeping my 20 latest files if they are all more than 30 days old. Having at least one is what I want.
However, being able to easily spot that there is a problem is very important (obviously my monitoring system is really blind, but that's another point), and a backup folder holding 10 times fewer files than usual is something that might ring a bell...

bash shell script not working as intended using cmp with output redirection

I am trying to write a bash script that remove duplicate files from a folder, keeping only one copy.
The script is the following:
#!/bin/sh
for f1 in `find ./ -name "*.txt"`
do
if test -f $f1
then
for f2 in `find ./ -name "*.txt"`
do
if [ -f $f2 ] && [ "$f1" != "$f2" ]
then
# if cmp $f1 $f2 &> /dev/null # DOES NOT WORK
if cmp $f1 $f2
then
rm $f2
echo "$f2 purged"
fi
fi
done
fi
done
I want to redirect the output and stderr to /dev/null to avoid printing them to the screen, but using the commented statement the script does not work as intended and removes all files but the first.
I'll give more information if needed.
Thanks
A few comments:
First, the:
for f1 in `find ./ -name "*.txt"`
do
if test -f $f1
then
is the same as (find only plain files with the txt extension)
for f1 in `find ./ -type f -name "*.txt"`
Better syntax (bash only) is
for f1 in $(find ./ -type f -name "*.txt")
and finally the whole thing is wrong, because if the filename contains a space, the f1 variable will not get the full path name. So instead of the for, do:
find ./ -type f -name "*.txt" -print | while read -r f1
and as @Sir Athos pointed out, the filename can contain \n, so the best is to use
find . -type f -name "*.txt" -print0 | while IFS= read -r -d '' f1
Second:
Use "$f1" instead of $f1 - again, because $f1 can contain spaces.
Third:
doing N*N comparisons is not very efficient. You should make a checksum (md5, or better sha256) for every txt file. When the checksums are identical, the files are duplicates.
If you don't fully trust checksums, byte-compare only the files that have identical checksums. Files with different checksums are SURE not to be duplicates. ;)
Computing checksums is slow too, so you should first compare only files with the same size. Files of different sizes are not duplicates...
You can skip empty txt files - they are all duplicates of each other :).
so the final command can be:
find -not -empty -type f -name \*.txt -printf "%s\n" | sort -rn | uniq -d |\
xargs -I% -n1 find -type f -name \*.txt -size %c -print0 | xargs -0 md5sum |\
sort | uniq -w32 --all-repeated=separate
commented:
#find all non-empty file with the txt extension and print their size (in bytes)
find . -not -empty -type f -name \*.txt -printf "%s\n" |\
#sort the sizes numerically, and keep only duplicated sizes
sort -rn | uniq -d |\
#for each size that is duplicated, find all files with that size and print their names (paths)
xargs -I% -n1 find . -type f -name \*.txt -size %c -print0 |\
#make an md5 checksum for them
xargs -0 md5sum |\
#sort the checksums and keep duplicated files separated with an empty line
sort | uniq -w32 --all-repeated=separate
With this output, you can simply decide which files you want to remove and which you want to keep.
&> is bash syntax; you'll need to change the shebang line (the first line) to #!/bin/bash (or the appropriate path to bash).
Or if you're really using the Bourne Shell (/bin/sh), then you have to use old-style redirection, i.e.
cmp ... >/dev/null 2>&1
Also, I think the &> was only introduced in bash 4, so if you're using bash 3.x you'll still need the old-style redirections.
IHTH
Credit to @kobame for this answer: this is really a comment, but posted as an answer for the formatting.
You don't need to call find twice; print out the size and the filename in the same find command:
find . -not -empty -type f -name \*.txt -printf "%8s %p\n" |
# find the files that have duplicate sizes
sort -n | uniq -Dw 8 |
# strip off the size and get the md5 sum
cut -c 10- | xargs md5sum
An example
$ cat a.txt
this is file a
$ cat b.txt
this is file b
$ cat c.txt
different contents
$ cp a.txt d.txt
$ cp b.txt e.txt
$ find . -not -empty -type f -name \*.txt -printf "%8s %p\n" |
sort -n | uniq -Dw 8 | cut -c 10- | xargs md5sum
76fd4c1589ef708d9203f3cf09cfd032 ./a.txt
e2d75fd6a1080efb6230d0608b1f9014 ./b.txt
76fd4c1589ef708d9203f3cf09cfd032 ./d.txt
e2d75fd6a1080efb6230d0608b1f9014 ./e.txt
To keep one and delete the rest, I would pipe the output into:
... | awk '++seen[$1] > 1 {print $2}' | xargs echo rm
rm ./d.txt ./e.txt
Remove the echo if your testing is satisfactory.
Like many complex pipelines, filenames containing newlines will break it.
All nice answers, so only one short suggestion: you can install and use the
fdupes -r .
from the man:
Searches the given path for duplicate files. Such files are found by
comparing file sizes and MD5 signatures, followed by a byte-by-byte
comparison.
Added by @Francesco
fdupes -rf . | xargs rm -f
to remove the dupes. (The -f in fdupes omits the first occurrence of each file, so it lists only the dupes.)
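If your fdupes build supports it, the -d/-N combination avoids piping file names through xargs altogether: it keeps the first file in each set and deletes the rest without prompting. Check your fdupes man page before relying on it:
fdupes -rdN .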
