Removing old folders in bash backup script - bash

I have a bash script that rsyncs files onto my NAS to the directory below:
mkdir /backup/folder_`date +%F`
How would I go about writing a cleanup script that removes directories more than 7 days old, based on the date in each directory's name?

#!/bin/bash
shopt -s extglob
OLD=$(exec date -d "now - 7 days" '+%s')
cd /backup || exit 1 ## If necessary.
while read -u 3 DIR; do ## Read from fd 3 so that rm -i can still prompt on stdin.
if read DATE < <(exec date -d "${DIR#*folder_}" '+%s') && [[ $DATE == +([[:digit:]]) && DATE -lt OLD ]]; then
echo "Removing $DIR." ## Just an example message. Alternatively, drop this line and add the -v option to rm.
rm -ir "$DIR" ## Change to -fr to skip confirmation.
fi
done 3< <(exec find . -maxdepth 1 -type d -name 'folder_*')
exit 0
We could use more careful approaches, such as read -rd $'\0' together with find -print0 and IFS=, but I don't think they are really necessary here.
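For reference, here is a sketch of that more careful variant. It assumes GNU find and GNU date (for -d) and the same folder_YYYY-MM-DD naming; the function name cleanup_old_backups and its "delete" mode argument are hypothetical, and by default it only prints what it would remove.

```shell
#!/bin/bash
# Hypothetical helper: list (default) or delete ("delete" as $3) the
# folder_YYYY-MM-DD directories directly under $1 whose name-date is more
# than $2 days in the past. Requires GNU date for -d.
cleanup_old_backups() {
    local base=$1 days=$2 mode=${3:-dry-run} cutoff stamp dir
    cutoff=$(date -d "now - $days days" +%s) || return 1
    # NUL-delimited paths from find, read with IFS= and read -r -d '',
    # survive spaces, newlines and backslashes in directory names.
    while IFS= read -r -d '' dir; do
        # Skip names whose suffix is not a date that GNU date can parse.
        stamp=$(date -d "${dir##*/folder_}" +%s 2>/dev/null) || continue
        if (( stamp < cutoff )); then
            if [[ $mode == delete ]]; then
                rm -rf -- "$dir"
            else
                printf 'would remove %s\n' "$dir"
            fi
        fi
    done < <(find "$base" -mindepth 1 -maxdepth 1 -type d -name 'folder_*' -print0)
}
```

Run cleanup_old_backups /backup 7 first to review the list, then cleanup_old_backups /backup 7 delete.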

Create a list of folders with the pattern you want to remove, remove the folders you want to keep from the list, delete everything else.
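A sketch of that keep-list idea, assuming GNU date and the folder_YYYY-MM-DD naming from the question: it builds the names of the folders for today and the previous 6 days, then removes every other folder_* directory (including ones whose suffix is not a date). The function name prune_except_recent is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: keep folder_<date> for today and the previous $2-1
# days under $1; remove every other folder_* directory.
prune_except_recent() {
    local base=$1 days=$2 i dir
    declare -A keep=()
    # Build the keep-list: folder_YYYY-MM-DD for the last $days days.
    for (( i = 0; i < days; i++ )); do
        keep["folder_$(date -d "$i days ago" +%F)"]=1
    done
    shopt -s nullglob                  # note: stays enabled after the call
    for dir in "$base"/folder_*/; do
        dir=${dir%/}
        # Anything not on the keep-list is deleted.
        if [[ -z ${keep[${dir##*/}]} ]]; then
            rm -rf -- "$dir"
        fi
    done
}
```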

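How about a simple find? Note that it goes by the directories' timestamps (ctime here), not by the dates in their names, and that -ctime +7 means more than 7 days, whereas -ctime 7 matches exactly 7:
find /backup -mindepth 1 -maxdepth 1 -type d -name 'folder_*' -ctime +7 -exec rm -rf {} +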

Related

bash move 500 directories at a time to subdirectory from a total of 160,000 directories

I needed to move a large S3 bucket to a local file store for a variety of reasons, and the files were stored as 160,000 directories with subdirectories.
As this is far too many folders to browse with something like a GUI FTP client, I'd like to move the 160,000 root directories into, say, 320 directories, with 500 directories in each.
I'm a newbie at bash scripting and just wrote this, but I'm scared I'm going to mangle everything and have to redo the transfer. I tested with [[ "$i" -ge 3 ]]; on some directories with subdirectories and it looked like it worked, but I'm quite nervous. I do not want to retransfer all this data.
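i=0
j=0
for file in *; do
    # Only real directories, not symlinks to directories
    if [[ -d "$file" && ! -L "$file" ]]; then
        mkdir -p "./assets_$j"        # mv fails if the target directory is missing
        echo "directory $file is being written to assets_$j"
        mv -- "$file" "./assets_$j/"
        ((i++))
        if [[ "$i" -ge 500 ]]; then   # 500 directories per assets_N
            ((j++))
            ((i=0))
        fi
    fi
done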
Thanks for the help!
Find all the directories in the current folder.
Read their names in fixed-size chunks.
Run mv for each chunk.
j=1
# Exclude assets_* so newly created target directories are never picked up
find . -mindepth 1 -maxdepth 1 -type d ! -name 'assets_*' |
while readarray -n 500 -t files && ((${#files[@]})); do
    dest="./assets_$((j++))/"
    echo mkdir -v -p "$dest"
    echo mv -v "${files[@]}" "$dest"
done
On the condition that assets_1, assets_2, etc. do not exist in the working directory yet:
dirs=(./*/)
for (( i=0,j=1; i<${#dirs[@]}; i+=500,j++ )); do
    echo mkdir ./assets_$j/
    echo mv "${dirs[@]:i:500}" ./assets_$j/
done
If you're happy with the output, remove the echo commands.
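The chunking in that loop comes from the array-slice expansion ${dirs[@]:i:500}. A minimal sketch of the same technique as a function that really moves the directories, assuming bash 4+ and that no assets_N directories exist yet; move_in_chunks is a hypothetical name, and the chunk size is a parameter so it can be tried on a small test tree first. Like ./*/ above, the glob also matches symlinks to directories.

```shell
#!/bin/bash
# Hypothetical helper: move the subdirectories of the current directory into
# assets_1, assets_2, ... with at most $1 directories in each.
# Assumes no assets_N directories exist yet (mkdir fails if one does).
move_in_chunks() {
    local size=$1 i j
    local -a dirs
    dirs=(./*/)                        # trailing slash: directories only
    [[ -d ${dirs[0]} ]] || return 0    # the glob matched nothing
    for (( i = 0, j = 1; i < ${#dirs[@]}; i += size, j++ )); do
        mkdir -- "./assets_$j" || return 1
        # ${dirs[@]:i:size} expands to at most $size elements starting at index i
        mv -- "${dirs[@]:i:size}" "./assets_$j/"
    done
}
```

For example, move_in_chunks 3 in a scratch directory holding 7 subdirectories should leave 3, 3 and 1 of them in assets_1, assets_2 and assets_3.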
Another possible way, though you have no control over the counter, is:
find . -mindepth 1 -maxdepth 1 -type d -print0 \
| xargs -0 -n 500 sh -c 'echo mkdir -v ./assets_$$ && echo mv -v "$@" ./assets_$$' _
This takes the assets counter from the PID of each sh process. PIDs are only reused after the counter wraps around (Linux PID recycling), but the resulting numbers are neither consecutive nor predictable.
The order in which find returns entries is slightly different from that of the glob * (find's default sorting order).
If you want alphabetical order, you can add a simple sort:
find . -mindepth 1 -maxdepth 1 -type d -print0 | sort -z \
| xargs -0 -n 500 sh -c 'echo mkdir -v ./assets_$$ && echo mv -v "$@" ./assets_$$' _
Note: remove the echo commands once you are pleased with the output.

shell script: find files older than x days and delete them if they are not listed in log files

I am new to scripting and I need a small shell script that does the following:
find all .txt files that are older than x days
delete them if they are not listed in log files (plain text and gzipped text files)
I know the basics of find -mtime, grep, zgrep, etc., but it is very tricky for me to turn this into a working script.
I tried something like this:
#! /bin/sh
for file in $(find /test/ -iname '*.txt')
do
echo "$file" ls -l "$file"
echo $(grep $file /test/log/log1)
done
IFS='
'
for i in $(find /test/ -type f -iname '*.txt' -ctime +10); do
    grep -qF -- "$i" log || echo "$i"   # replace echo with rm if satisfied
done
Sets the internal field separator (IFS) to a newline, so filenames containing spaces are not split.
Finds the files under /test/ that are older than 10 days.
Greps for each path in the log file and prints the ones that are not found.
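Setting IFS to a newline still breaks on filenames that contain newlines or glob characters, and plain grep treats each path as a regular expression. A sketch of the same check without those problems, assuming GNU find and grep; list_unlogged is a hypothetical name, and like the loop above it only prints the candidates.

```shell
#!/bin/bash
# Hypothetical helper: print .txt files under $1 older than $2 days whose
# path does not appear (as a fixed string) in log file $3.
list_unlogged() {
    local dir=$1 days=$2 log=$3 f
    # NUL-separated output from find needs no IFS tricks.
    while IFS= read -r -d '' f; do
        # -F: match the path literally; -q: only the exit status matters.
        grep -qF -e "$f" -- "$log" || printf '%s\n' "$f"
    done < <(find "$dir" -type f -name '*.txt' -mtime +"$days" -print0)
}
```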
I would use something like this:
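#!/bin/bash
# $1 is the number of days
log_dir=/var/log   # where the (plain or gzipped) log files live
while IFS= read -r -d '' f; do
    found="false"
    base=$(basename "$f")
    for logfile in "$log_dir"/*; do
        [ -f "$logfile" ] || continue
        # zgrep reads both plain and gzipped files
        if zgrep -qF -e "$base" "$logfile"; then
            found="true"
            break
        fi
    done
    # Delete only the files that were NOT found in any log
    if [ "$found" = "false" ]; then
        rm -- "$f"
    fi
done < <(find . -type f -iname '*.txt' -mtime +"$1" -print0)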
and call it:
#> ./find_and_delete.sh 10
You could create a small bash script that checks whether a file is in the logs or not:
$ cat ~/bin/checker.sh
#!/usr/bin/env bash
# Exit status 0 if the basename of $1 appears (as a fixed string) in log file $2
n=$(basename -- "$1")
grep -qF -- "$n" "$2"
$ chmod +x ~/bin/checker.sh
And then use it in a single find command (here limited to .txt files older than 10 days):
$ find . -type f -name '*.txt' -mtime +10 ! -exec ~/bin/checker.sh {} log \; -exec echo {} \;
This should print only the files to be deleted. Once you are convinced that it does what you want:
$ find . -type f -name '*.txt' -mtime +10 ! -exec ~/bin/checker.sh {} log \; -exec rm {} \;
deletes them.
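The helper script can also be inlined with sh -c, which avoids keeping a separate file on disk. A sketch assuming GNU find; find_unlogged is a hypothetical wrapper name, and the -name '*.txt' -mtime filter is an assumption based on the question's "older than x days" requirement. Inside sh -c, ${1##*/} is the basename of the path find substitutes for {}.

```shell
#!/bin/bash
# Hypothetical wrapper: print .txt files under $1 older than $2 days whose
# basename does not appear in log file $3.
find_unlogged() {
    # In the sh -c script, $1 is the path from {} and $2 is the log file.
    find "$1" -type f -name '*.txt' -mtime +"$2" \
        ! -exec sh -c 'grep -qF -e "${1##*/}" -- "$2"' sh {} "$3" \; \
        -print
}
```

Once the list looks right, replacing -print with -exec rm -- {} + deletes the same files.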

how to get basename in -exec of find?

I cannot get the following piece of script (which is part of a larger backup script) to work correctly:
BACKUPDIR=/BACKUP/db01/physical/incremental # Backups base directory
FULLBACKUPDIR=$BACKUPDIR/full # Full backups directory
INCRBACKUPDIR=$BACKUPDIR/incr # Incremental backups directory
KEEP=5 # Number of full backups (and its incrementals) to keep
...
FIRST_DELETE=`expr $KEEP + 1` # add one to the number of backups to keep, this will be the first deleted
FILE0=`ls -ltr $FULLBACKUPDIR | awk '{print $9}' | tail -$FIRST_DELETE | head -1` # search for the first backup to be deleted
...
find $FULLBACKUPDIR -maxdepth 1 -type d ! -newer $FULLBACKUPDIR/$FILE0 -execdir echo "removing: "$FULLBACKUPDIR/$(basename {}) \; -execdir bash -c 'rm -rf $FULLBACKUPDIR/$(basename {})' \; -execdir echo "removing: "$INCRBACKUPDIR/$(basename {}) \; -execdir bash -c 'rm -rf $INCRBACKUPDIR/$(basename {})' \;
On its own, the find works correctly and outputs something like this:
/BACKUPS/db01/physical/incremental/full/2013-08-12_17-51-28
/BACKUPS/db01/physical/incremental/full/2013-08-12_17-25-07
What I want is the -exec to echo a line showing what is being removed and then remove the folder from both directories.
I've tried various ways to get just the basename, but nothing seems to work. I get this:
removing: /BACKUPS/mysql/physical/incremental/full/"/BACKUPS/mysql/physical/incremental/full/2013-08-12_17-51-28"
removing: /BACKUPS/mysql/physical/incremental/incr/"/BACKUPS/mysql/physical/incremental/full/2013-08-12_17-51-28"
removing: /BACKUPS/mysql/physical/incremental/full/"/BACKUPS/mysql/physical/incremental/full/2013-08-12_17-25-07"
And of course the folders aren't deleted, because those paths don't exist; rm just fails silently because of the -f option. If I remove the -f, I get a 'cannot be found' error from each rm.
How do I accomplish this? Because backups and parts of backups may be stored on different storage systems, I really need to get just the folder name so I can use it with any known path.
The $(basename {}) is expanded by your shell before find even runs: basename {} prints {} unchanged, so "removing: "$INCRBACKUPDIR/$(basename {}) becomes "removing: "$INCRBACKUPDIR/{}, and only afterwards does find replace {} with the full path.
A way around it is to defer basename to a child bash, which runs after find has substituted {}:
-exec bash -c 'echo "removing: $1/$(basename "$2")"' _ "$INCRBACKUPDIR" {} \;
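The ordering problem can be seen without running find at all: printing, one per line, the arguments the shell hands to find shows that the command substitution has already produced a literal {}. The /backup/incr path is only an illustrative placeholder.

```shell
#!/bin/bash
# Print the arguments find would actually receive. The outer shell runs
# basename on the literal string "{}" first; it has no directory part, so the
# result is still "{}", and find later substitutes the full path for it.
printf '%s\n' find . -exec echo "removing: /backup/incr/$(basename {})" \;
```

The fifth line printed is removing: /backup/incr/{}, which is exactly the argument find later fills in with the whole path.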
A lot is broken here:
All-caps variable names are by convention used for environment variables and should be avoided for ordinary script variables.
Legacy backticks are used instead of $().
The output of ls is parsed (!).
The output of ls -l is parsed (!!!).
Variables known to contain paths are expanded without quotes.
All you strictly need to fix this is to call bash properly from -exec, e.g. (the -- is assigned to $0, so {} arrives as $1):
-execdir bash -c 'filepath="$1" ; base=$(basename "$filepath") ; echo "use $filepath and $base here"' -- {} \;
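Applied to the question's layout, that pattern could look like the sketch below. The function name prune_sets and its three parameters are hypothetical stand-ins for the question's $FULLBACKUPDIR, $INCRBACKUPDIR and $FULLBACKUPDIR/$FILE0; the two directories are passed to the child bash as extra arguments, and {} + batches many paths into one bash call.

```shell
#!/bin/bash
# Hypothetical helper: remove every full backup not newer than $3, plus the
# incremental directory with the same name.
prune_sets() {
    local full_dir=$1 incr_dir=$2 keep_ref=$3
    # Inside the child bash: $1 = full dir, $2 = incr dir, rest = found paths.
    find "$full_dir" -mindepth 1 -maxdepth 1 -type d ! -newer "$keep_ref" \
        -exec bash -c '
            full=$1 incr=$2
            shift 2
            for path in "$@"; do
                base=$(basename "$path")
                echo "removing: $full/$base"
                rm -rf -- "$full/$base"
                echo "removing: $incr/$base"
                rm -rf -- "$incr/$base"
            done
        ' _ "$full_dir" "$incr_dir" {} +
}
```

Note that ! -newer also matches the reference directory itself, which fits the question's description of FILE0 as the first backup to be deleted.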
But how about this instead:
#!/usr/bin/env bash
backup_base=/BACKUP/db01/physical/incremental
full_backup="$backup_base"/full
incremental_backup="$backup_base"/incr
keep=5
rm=echo
# Newest first: -printf puts the mtime (seconds since the epoch) before each path
n=0
while IFS= read -r -d $'\0' line ; do
    file="${line#* }"    # strip the "mtime " prefix added by -printf
    if [[ $n -lt $keep ]] ; then
        n=$((n+1))
        continue
    fi
    base=$(basename "$file")
    echo "removing: $full_backup/$base"
    "$rm" -rf -- "$full_backup"/"$base"
    echo "removing: $incremental_backup/$base"
    "$rm" -rf -- "$incremental_backup"/"$base"
done < <(find "$full_backup" -mindepth 1 -maxdepth 1 -printf '%T@ %p\0' 2>/dev/null | sort -z -r -n)
This iterates over the files and directories directly under the full-backup directory, newest first, and skips the 5 newest. For each remaining entry, it deletes the entry with the same name from both the full and incremental directories.
This is an essentially safe version, except, of course, for timing attacks (race conditions between listing and deleting).
I have defined rm as being echo to avoid accidental deletes; swap it back to rm for actual deletion once you're sure it's correct.
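Because the backup names in the question are timestamps such as 2013-08-12_17-51-28, they already sort chronologically as strings, so the mtime ordering can be replaced by a plain glob. A sketch under that assumption (it relies on the naming, not on mtimes); prune_by_name is a hypothetical name, and its fourth argument plays the role of the rm=echo switch above.

```shell
#!/bin/bash
# Hypothetical helper: keep the $3 newest backup sets, judged by name.
# Assumes every entry in $1 (full) is named like 2013-08-12_17-51-28, so the
# glob's sorted order is also chronological order, oldest first.
# $4 is the removal command; the default "echo" only prints what would happen.
prune_by_name() {
    local full=$1 incr=$2 keep=$3 rm=${4:-echo}
    local -a backups
    local base i
    shopt -s nullglob                 # note: stays enabled after the call
    backups=("$full"/*/)              # trailing slash: directories only
    for (( i = 0; i < ${#backups[@]} - keep; i++ )); do
        base=$(basename "${backups[i]}")
        "$rm" -rf -- "$full/$base" "$incr/$base"
    done
}
```

For a dry run: prune_by_name /BACKUP/db01/physical/incremental/full /BACKUP/db01/physical/incremental/incr 5; append rm as a fourth argument to actually delete.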

bash: After testing mtime by following a symlink, I need to delete the symlink itself and not the target file

Right now I have a script that creates symlinks to anything newer than 2 weeks in the public folders into another folder. However, I can't find any good way of getting rid of the stale symlinks individually as opposed to wiping everything out. I need to test the symlink target mtime and if it's older than 2 weeks, delete the symlink itself and not the linked file.
#!/bin/bash
source="/media/public/"
dest="/pool/new/"
if [[ ! -d $dest ]]; then
exit 1
fi
if [ `hostname` == "punk" ] && [ `uname -o` == "GNU/Linux" ]; then
#rm -f $dest/*
find -L $dest -mtime 14 -type f -exec echo "delete symlink: " {} \;
find -L $source -mtime -14 -type f -exec ln -s -t $dest {} \;
fi
Right now the first find command will delete the target as opposed to the symlink.
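In the first find command, simply use
-exec rm {} +
With -L, find tests the target's mtime, but {} is still the path of the symlink, so rm deletes the link itself, not the target. Also note that -mtime 14 matches only files whose age, rounded down to whole days, is exactly 14; use -mtime +14 for anything older than that.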
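Put together, the cleanup could look like the sketch below, assuming GNU find. Under -L, -type f and -mtime describe the link target, while -xtype l is true only for the symlinks themselves, so a regular file stored directly in the destination folder is never removed. The function name prune_stale_links is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: delete the symlinks in $1 whose target is a regular
# file last modified more than $2 days ago. The targets are left untouched.
prune_stale_links() {
    # -L: tests look through links; -xtype l: the entry itself is a symlink
    find -L "$1" -xtype l -type f -mtime +"$2" -exec rm -- {} +
}
```

In the question's script this would be prune_stale_links "$dest" 14 in place of the first find.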

How do I copy directory structure containing placeholders

I have a situation where a template directory, containing files and links (!), needs to be copied recursively to a destination directory, preserving all attributes. The template directory contains any number of placeholders (__NOTATION__) that need to be renamed to certain values.
For example, the template looks like this:
./template/__PLACEHOLDER__/name/__PLACEHOLDER__/prog/prefix___FILENAME___blah.txt
The destination becomes this:
./destination/project1/name/project1/prog/prefix_customer_blah.txt
What I tried so far is this:
# first create dest directory structure
while read line; do
dest="$(echo "$line" | sed -e 's#__PLACEHOLDER__#project1#g' -e 's#__FILENAME__#customer#g' -e 's#template#destination#')"
if ! [ -d "$dest" ]; then
mkdir -p "$dest"
fi
done < <(find ./template -type d)
# now copy files
while read line; do
dest="$(echo "$line" | sed -e 's#__PLACEHOLDER__#project1#g' -e 's#__FILENAME__#customer#g' -e 's#template#destination#')"
cp -a "$line" "$dest"
done < <(find ./template -type f)
However, I realized that taking care of permissions and links this way is going to be endless and very complicated. Is there a better way to replace __PLACEHOLDER__ with "value", maybe using cp, find or rsync?
I suspect that your script will already do what you want, if only you replace
find ./template -type f
with
find ./template ! -type d
Otherwise, the obvious solution is to use cp -a to make an "archive" copy of the template, complete with all links, permissions, etc, and then rename the placeholders in the copy.
cp -a ./template ./destination
while IFS= read -r path; do
    dir=$(dirname "$path")
    file=$(basename "$path")
    mv -v "$path" "$dir/${file//__PLACEHOLDER__/project1}"
done < <(find ./destination -depth -name '*__PLACEHOLDER__*')
Note that you'll want -depth; otherwise a directory is renamed before its contents, and the paths find then reports for those contents no longer exist.
If it's very important to you that the directory tree is created with the names already changed (i.e. you must never see placeholders in the destination), then I'd recommend simply using an intermediate location.
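A sketch of that intermediate-location approach, assuming GNU cp, find and mv: the copy and renames happen in a hidden scratch directory created next to the destination (so the final mv is a rename on the same filesystem), and only the finished tree is moved into place. copy_template is a hypothetical name; project1 and customer are the question's example values.

```shell
#!/bin/bash
# Hypothetical helper: copy template dir $1 to $2 (which must not exist yet),
# replacing __PLACEHOLDER__ with $3 and __FILENAME__ with $4 in all names.
copy_template() {
    local src=$1 dest=$2 placeholder=$3 filename=$4 work path dir name new
    [[ -e $dest ]] && return 1
    work=$(mktemp -d "$(dirname -- "$dest")/.tmpl.XXXXXX") || return 1
    cp -a -- "$src" "$work/tree" || return 1     # keeps links, modes, times
    # -depth renames children before their parent directory.
    while IFS= read -r -d '' path; do
        dir=${path%/*}
        name=${path##*/}
        new=${name//__PLACEHOLDER__/"$placeholder"}
        new=${new//__FILENAME__/"$filename"}
        [[ $new != "$name" ]] && mv -- "$path" "$dir/$new"
    done < <(find "$work/tree" -depth -mindepth 1 \( -name '*__PLACEHOLDER__*' -o -name '*__FILENAME__*' \) -print0)
    mv -- "$work/tree" "$dest" && rmdir -- "$work"
}
```

Usage: copy_template ./template ./destination project1 customer.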
First copy with rsync, preserving all the properties and links etc.
Then change the placeholder strings in the destination filenames:
#!/bin/bash
TEMPL="$PWD/template" # somewhere else
DEST="$PWD/dest" # wherever it is
mkdir "$DEST"
(cd "$TEMPL" && rsync -Hra . "$DEST")   # -a keeps permissions, times and symlinks; -H keeps hard links
MyRen=$(mktemp)
trap 'rm -f "$MyRen"' 0 1 2 3 13 15
# No template->destination substitution here: -execdir passes ./name, and DEST is already the destination.
cat >"$MyRen" <<'EOF'
#!/bin/bash
fn="$1"
newfn="$(echo "$fn" | sed -e 's#__PLACEHOLDER__#project1#g' -e 's#__FILENAME__#customer#g')"
test "$fn" != "$newfn" && mv "$fn" "$newfn"
EOF
chmod +x "$MyRen"
find "$DEST" -depth -execdir "$MyRen" {} \;
