Backup script: How to keep the last N entries?

For a backup script, I need to clean old backups. How can I keep the last N backups and delete the rest?
A backup is either a single folder or a single file and the script will either keep all backups in folders or files (no mixing).
If possible, I'd like to avoid parsing the output of ls. Even though all the entries in the backup folder should have been created by the backup script and there should be no funny characters in the entry names, a hacker might be able to create new entries in there.

This should do it (untested!):
#!/usr/bin/env bash
set -o errexit -o noclobber -o nounset -o pipefail
i=0
max=7 # Could be anything you want
while IFS= read -r -d '' -u 9
do
    let ++i
    if [ "$i" -gt "$max" ]
    then
        rm -- "$REPLY"
    fi
done 9< <(find /var/backup -maxdepth 1 -type f -regex '.*/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\.tar\.gz' -print0 | sort -rz)
Explained from the outside in:
Ensure that the script stops at any common errors.
Find all files in /var/backup (and not subdirectories) matching a YYYY-MM-DD.tar.gz format.
Reverse sort these, so the latest are listed first.
Send these to file descriptor 9. This avoids any problems with cat, ssh or other programs which read standard input by default.
Read files one by one from FD 9, separated by NUL.
Count files until you get past your given max.
Nuke the rest from orbit.
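If you'd like to see what it would do before trusting it with rm, a dry-run variant is easy to derive from the same loop. A minimal sketch (untested, with the directory and count taken as hypothetical positional arguments):
#!/usr/bin/env bash
# Dry-run sketch: print what would be removed instead of deleting it.
# Usage: ./prune_backups.sh /var/backup 7
set -o errexit -o nounset -o pipefail
dir=${1:?backup directory}
max=${2:-7}
i=0
while IFS= read -r -d '' -u 9
do
    let ++i
    if [ "$i" -gt "$max" ]
    then
        echo "would remove: $REPLY"   # swap echo for rm -- once the list looks right
    fi
done 9< <(find "$dir" -maxdepth 1 -type f -regex '.*/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\.tar\.gz' -print0 | sort -rz)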

How to recursively find & replace whole files with bash?

I have hundreds of files that I need to recursively replace as the files are currently stored like so:
/2019/01/
    file1.pdf
    file2.pdf
/2019/02/
    file3.pdf
    file4.pdf
etc
I then have all of the updated files in another directory like so:
/new-files
    file1.pdf
    file2.pdf
    file3.pdf
    file4.pdf
Could someone please tell me the best way of doing this with a bash script? I'd basically like to read the new-files directory and then replace any matching file names in the other folders.
Thanks in advance for any help!
Assuming that the 'new-files' directory and all the directory trees containing PDF files are under the current directory, try this Shellcheck-clean Bash code:
#! /bin/bash -p
find . -path ./new-files -prune -o -type f -name '*.pdf' -print0 \
    | while IFS= read -r -d '' pdfpath; do
        pdfname=${pdfpath##*/}
        new_pdfpath=new-files/$pdfname
        if [[ -f $new_pdfpath ]]; then
            printf "Replace '%s' with '%s'\n" "$pdfpath" "$new_pdfpath" >&2
            # cp -- "$new_pdfpath" "$pdfpath"
        fi
    done
The -path ./new-files -prune in the find command stops the 'new-files' directory from being searched.
The -o in the find command means "or": if the path was not pruned as 'new-files', the tests and actions that follow it are tried instead.
See BashFAQ/001 (How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?) for an explanation of the use of the -print0 option to find and the while IFS= read -r -d '' .... In short, the code can handle arbitrary file paths, including ones with whitespace and newline characters in them.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${pdfpath##*/}.
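For example, with a hypothetical path:
$ pdfpath=./2019/01/file1.pdf
$ echo "${pdfpath##*/}"     # removes everything up to and including the last /
file1.pdf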
It's not clear to me if you want to copy or move the new file to replace the old file, or do something else. Run the code as it is to check if it is identifying the correct replacements to be done. If you are happy with it, uncomment the cp line, and modify it to do something different if that is what you want.
The -- in the cp command protects against arguments beginning with dash characters being interpreted as options. It's unnecessary in this case, but I always use it when arguments begin with variable (or other) expansions so the code will remain safe if it is used in other contexts.
I think this calls for a bash array.
#!/usr/bin/env bash

# Make an associative array
declare -A files=()

# Populate array as $files[file.pdf]="/path/to/file.pdf"
for f in 20*/*/*.pdf; do
    files[${f##*/}]="$f"
done

# Step through files and replace
for f in new-files/*.pdf; do
    if [[ ! -e "${files[${f##*/}]}" ]]; then
        echo "ERROR: missing $f" >&2
        continue
    fi
    mv -v "$f" "${files[${f##*/}]}"
done
Note that associative arrays require bash version 4 or above. If you're using the native bash on a Mac, this won't work as-is.
Note also that the continue is important: without it, the mv command would run with an empty target for any file in new-files that has no counterpart in the dated directories, since no destination is known for it.
If you wanted further protection you might use test -nt or friends to confirm that an update is happening in the right direction.
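For example, a hedged sketch of such a guard using -nt, so that an older file in new-files never overwrites a newer one (a drop-in for the mv line above; the SKIP message is my own addition):
# Hypothetical variant of the final step: only move if the new file is newer.
if [[ "$f" -nt "${files[${f##*/}]}" ]]; then
    mv -v "$f" "${files[${f##*/}]}"
else
    echo "SKIP: $f is not newer than ${files[${f##*/}]}" >&2
fi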

Extracting *.gz files and moving the original file to another folder

I am fairly new to shell scripting and don't know some of the commands.
I am trying to write the shell script below; please give me some direction.
1. Read *.gz files from a specific directory
2. Extract them to another folder
3. Move the original file to another folder
I can do it with three separate shell scripts, but I want to include it all in one script. This script will then be a cron job and will run every 5 minutes.
I was trying to start like below, but I am a bit confused about how to get the file list. I could do this in another script, but I want to include it in one script.
#!/bin/bash
while IFS= read file; do
gzip -c "$file" > "zipdir/$(basename "$file").gz"
done < filelist
PS: Files are created every 5 minutes.
There are several ways to implement what you're looking for (I would consider inotify). Anyhow... this is a very simple implementation:
$ source=~/tmp/source # directory where .gz files will be created
$ target=~/tmp/target # target directory for uncompressed files
$ archive=~/tmp/archive # archive dir for .gz files
$ shopt -s nullglob # avoid returning unexpanded patterns
$ for gz in ${source}/*.gz ; do gzip -dc "$gz" > ${target}/$(basename "$gz" .gz) ; mv "$gz" ${archive}/ ; done
$ shopt -u nullglob # reset nullglob
If you know for sure that the "source" directory will always contain .gz files, you can skip the shopt.
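To see why nullglob matters here: without it, a glob that matches nothing stays as the literal pattern and would be handed to gzip. A quick illustration in a hypothetical empty directory:
$ cd "$(mktemp -d)"   # empty scratch directory
$ for gz in *.gz ; do echo "got: $gz" ; done
got: *.gz
$ shopt -s nullglob
$ for gz in *.gz ; do echo "got: $gz" ; done
With nullglob set, the pattern expands to nothing and the loop body never runs.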
Another solution (not requiring shopt) is this:
find ${source} -name '*.gz' -print0 | while read -d '' -r gz; do
    gzip -dc "$gz" > ${target}/$(basename "$gz" .gz)
    mv "$gz" ${archive}/
done
The first line looks a little bit complicated because it manages source file names containing spaces...
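For instance, with a hypothetical file "my archive.gz" under a $source of /home/user/tmp/source, a naive for loop over $(find ...) would split the path into two words, while the -print0 / read -d '' pair keeps it intact:
$ touch "${source}/my archive.gz"
$ for gz in $(find ${source} -name '*.gz'); do echo "[$gz]"; done
[/home/user/tmp/source/my]
[archive.gz]
$ find ${source} -name '*.gz' -print0 | while read -d '' -r gz; do echo "[$gz]"; done
[/home/user/tmp/source/my archive.gz]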

Bash: remove first line of file, create new file with prefix in new dir

I have a bunch of files in a directory, old_dir. I want to:
remove the first line of each file (e.g. using "sed '1d'")
save the output as a new file with a prefix, new_, added to the original filename (e.g. using "{,new_}old_filename")
add these files to a different directory, new_dir, overwriting any conflicting filenames
How do I do this with a Bash script? Having trouble putting the pieces together.
#!/usr/bin/env bash

old_dir="/path/to/somewhere"
new_dir="/path/to/somewhere_else"
prefix="new_"

if [ ! -d "$old_dir" -o ! -d "$new_dir" ]; then
    echo "ERROR: We're missing a directory. Aborting." >&2
    exit 1
fi

for file in "$old_dir"/*; do
    tail +2 "$file" > "$new_dir"/"${prefix}${file##*/}"
done
The important parts of this are:
The for loop, which allows you do to work on each $file.
tail +2, which prints a file starting at its second line, i.e. removes the first line. If your tail does not support this old-style syntax, tail -n +2 or sed -e 1d gives the same result.
${file##*/} which is functionally equivalent to basename "$file" but without spawning a child.
Really, none of this is bash-specific. You could run this in /bin/sh in most operating systems.
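If either of those is unfamiliar, you can check them interactively (hypothetical file names):
$ file=/path/to/somewhere/report.txt
$ echo "${file##*/}"          # same output as: basename "$file"
report.txt
$ printf 'header\nbody 1\nbody 2\n' > demo.txt
$ tail -n +2 demo.txt         # equivalent to the script's tail +2, or: sed -e 1d demo.txt
body 1
body 2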
Note that the code above is intended to explain a process. Once you understand that process, you may be able to come up with faster, shorter strategies for achieving the same thing. For example:
find "$old_dir" -depth 1 -type f -exec sh -c "tail +2 \"{}\" > \"$new_dir/$prefix\$(basename {})\"" \;
Note: I haven't tested this. If you plan to use either of these solutions, do make sure you understand them before you try, so that you don't clobber your data by accident.

Shell Script to delete specific image-files recursively

I have a third-party program which uploads files to a webserver. These files are images, in different folders and with different names. The files are referenced in a database. The program imports new images and uploads them to those folders. If a file already exists, it takes the name, adds a counter, creates a new reference in the database and removes the old one. But instead of removing the old file as well, it keeps a copy.
Let's say we have an image file named "109101.jpg".
A new version of the file comes in and is uploaded with the filename "109101_1.jpg". This goes on up to "109101_103.jpg", for example.
Now, all the 103 files before this one are outdated and could be deleted.
Since the program is third-party and not editable, I am not able to change that behavior. Instead, I need a shell script that walks through those folders and deletes all the images before the latest one. So only "109101_103.jpg" will survive and all the others before this number will be deleted.
As a side effect, there are also images with a double-underscored name (only these, no triple ones or more).
For example: "109013_35_1.jpg" is the original one, the next one is "109013_35_1_1.jpg", and now it's at "109013_35_1_24.jpg". So only "109013_35_1_24.jpg" has to survive.
Right now I don't even have an idea how to solve this problem. Any ideas?
Here's a one line pipeline, because I felt like it. Shown with newlines inserted because I'm not evil.
for F in $(find . -iname '*.jpg' -exec basename {} .jpg \; |
        sed -r -e 's/^([^_]+|[^_]+_[^_]+_[^_]+)_[0-9]+$/\1/' |
        sort -u); do
    find -regex ".*${F}_[0-9]*.jpg" |
        sort -t _ -k 2 -n | sort -n -t _ -k 4 -s | head -n -1;
done
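As written, the inner pipeline only lists the outdated files for each base name. Once you have checked that output, one way to actually delete them (assuming GNU xargs and file names without newlines) would be to extend that inner part like so:
find -regex ".*${F}_[0-9]*.jpg" | sort -t _ -k 2 -n | sort -n -t _ -k 4 -s | head -n -1 | xargs -r -d '\n' rm --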
The following script deletes the files in a given directory:
#! /bin/bash
cd "$1" || exit
shopt -s extglob # Turn on extended patterns.
shopt -s nullglob # Non matched pattern expands to null.
delete=()
for file in +([^_])_+([0-9]).jpg \
            +([^_])_+([0-9])_+([0-9])_+([0-9]).jpg ; do # Only loop over non original files.
    [[ $file ]] || continue # No files in the directory.
    base=${file%_*} # Delete everything after the last _.
    num=${file##*_} # Delete everything before the last _.
    num=${num%.jpg} # Delete the extension.
    [[ -f $base.jpg ]] && rm "$base.jpg" # Delete the original file.
    [[ -f "$base"_$((num+1)).jpg ]] && delete+=("$file") # The file itself is scheduled for deletion.
done
(( ${#delete[@]} )) && rm "${delete[@]}"
The numbered files are not deleted immediately, because that could remove a "following" file for another file. They are just remembered in an array and deleted at the end.
To apply the script recursively, you can run
find /top/directory -type d -exec script.sh {} \;

Script for deleting old files in directory except latest N

I want to write a script that will clean old files out of a backup directory, leaving only the latest N there. I also want to do it without using ls.
Yesterday I ended up with the following piece of code:
counter=0
while IFS=/ read -rd '' time file; do
    ((counter++ < ${KEEP_NUMBER:?})) && continue
    rm -f "$file"
done < <(find . -maxdepth 1 -type f -printf "%T@/%p\0" | sort -znr)
However, this isn't portable at all, as it relies on find -printf, leaving behind all the boxes without GNU extensions.
Are there any better ways to accomplish such a task?
It can be done relatively easily provided you have
A shell that supports arrays (as bash does)
Access to test (which can compare the age of 2 files)
Here's my attempt (it probably doesn't cope with spaces in filenames, but that could be fixed easily enough):
#!/bin/bash
declare -a files
for file in *
do
    files[${#files[@]}]=$file
done

# Bubble sort the files from newest to oldest
declare -i i j count=${#files[@]}
while true
do
    declare -i sorted=1
    for ((i=1;i<count;i++))
    do
        let j=i-1
        if [ "${files[$j]}" -ot "${files[$i]}" ]
        then
            sorted=0
            f=${files[$j]}
            files[$j]=${files[$i]}
            files[$i]=$f
        fi
    done
    [ $sorted -eq 1 ] && break
done

# Delete everything except the first 5 files
for ((i=5;i<count;i++))
do
    rm "${files[$i]}"
done
Well, find has the POSIX -newer primary, which, with some minor coding, can be used to separate files by their modification time. I would do something like this: take one file and find the ones newer than it. Now there are two sets, newer and older. If there are more than N newer files, repeat the process on the newer set only; otherwise remember the newer set, subtract its size from N, and repeat the process on the older set.
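A sketch of that idea in its simplest (non-recursive) form: instead of partitioning, count for each file how many files are newer than it with -newer; anything with at least N newer files is outdated. Untested, hypothetical script; it assumes a flat directory with plain file names, and files sharing the same modification time are kept:
#!/bin/sh
# keep_latest.sh DIR N -- print the files that are NOT among the newest N
dir=${1:?directory}
N=${2:-7}
cd "$dir" || exit 1
for f in *; do
    [ -f "$f" ] || continue
    # POSIX stand-in for -maxdepth 1: prune everything below the top level
    newer=$(find . ! -name . -prune -type f -newer "$f" | wc -l)
    if [ "$newer" -ge "$N" ]; then
        echo "would delete: $f"    # replace echo with: rm -- "$f"
    fi
done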
Or you can use pax to create an archive and then use -o listopt to extract the modification time in the format you want; there you have all the information. :)
http://www.google.fr/search?q=zfs
else ...
man rsnapshot
else ...
Stop trying to reinvent a backup suite
else ...
man find | grep -- -ctime
man find | grep -- -delete
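(Spelled out, and hedged: -ctime selects by age in days rather than keeping the last N, and -maxdepth/-delete are GNU/BSD extensions, so preview before deleting.)
find /var/backup -maxdepth 1 -type f -name '*.tar.gz' -ctime +7 -print    # preview: backups older than 7 days
find /var/backup -maxdepth 1 -type f -name '*.tar.gz' -ctime +7 -delete   # then actually delete them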
