Script for deleting old files in directory except latest N - bash

I want to write a script that will clean a backup directory of old files, leaving only the latest N there. I also want to do it without using ls.
Yesterday I ended up with the following piece of code:
counter=0
while IFS=/ read -rd '' time file; do
((counter++ < ${KEEP_NUMBER:?})) && continue
rm -f "$file"
done < <(find . -maxdepth 1 -type f -printf "%T@/%p\0" | sort -znr)
However, this isn't portable at all, since it relies on find -printf, which leaves behind all the boxes without GNU extensions.
Are there any better ways to accomplish such a task?

It can be done relatively easily provided you have
A shell that supports arrays (as bash does)
Access to test (which can compare the age of 2 files)
Here's my attempt (with the array elements quoted it should cope with spaces in filenames, too):
#!/bin/bash
declare -a files
for file in *
do
files[${#files[@]}]=$file
done
# Bubble sort the files from newest to oldest
declare -i i j count=${#files[@]}
while true
do
declare -i sorted=1
for ((i=1;i<count;i++))
do
let j=i-1
if [ "${files[$j]}" -ot "${files[$i]}" ]
then
sorted=0
f=${files[$j]}
files[$j]=${files[$i]}
files[$i]=$f
fi
done
[ $sorted -eq 1 ] && break
done
# Delete everything except the first 5 files
for ((i=5;i<count;i++))
do
rm "${files[$i]}"
done

Well, there's the POSIX -newer primary for find. With some minor coding it can be used to separate files by their modification time. I would do something like taking the first file and finding the ones newer than it. That gives two sets: newer and older. If there are more newer files than N, I repeat the process for the newer set only; otherwise I keep the newer set, subtract its size from N, and repeat the process for the older set.
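To make that concrete, here is a minimal sketch of just the partitioning step; the pivot filename is hypothetical, and the ! -name . -prune combination stands in for the non-POSIX -maxdepth 1:
#!/bin/sh
# Partition the current directory's files around an arbitrary pivot file.
pivot=720-some-backup.tar.gz    # hypothetical pivot picked from the directory
newer=$(find . ! -name . -prune -type f -newer "$pivot")
newer_count=$(printf '%s\n' "$newer" | grep -c .)
echo "files newer than $pivot: $newer_count"
# If newer_count > N, repeat on the newer set only; otherwise keep the newer
# set, subtract newer_count from N, and repeat on the older set.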
Or you can use pax to create an archive and then use -o listopt to extract the modification time in the format you want; then you have all the information. :)

http://www.google.fr/search?q=zfs
else ...
man rsnapshot
else ...
Stop trying to reinvent a backup suite
else ...
man find | grep -- -ctime
man find | grep -- -delete
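That is, if "keep the latest N files" can be relaxed to "keep everything younger than N days", find alone does the whole job. A minimal sketch, assuming /var/backup and a 7-day window (-maxdepth and -delete are GNU/BSD extensions; a portable fallback for the latter is -exec rm -f {} +):
# Delete regular files directly under /var/backup last modified more than 7 days ago.
find /var/backup -maxdepth 1 -type f -mtime +7 -delete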

Related

BASH: copy only images from directory, not copying folder structure and rename copied files in sequential order

I have found an old HDD which was used in the family computer back in 2011. There are a lot of images on it which I would love to copy to my computer and print out in a nice photobook as a surprise to my parents and sister.
However, I have a problem: these photos were taken with older cameras, which means I have a lot of photos with names such as 01, 02, etc. These are now in hundreds of sub-folders.
I have already tried the following command but I still get exceptions where the file cannot be copied because one with the same name already exists.
Example: cp: cannot create regular file 'C:/Users/patri/Desktop/Fotoboek/battery.jpg': File exists
The command I execute:
$ find . -type f -regex '.*\(jpg\|jpeg\|png\|gif\|bmp\|mp4\)' -exec cp --backup=numbered '{}' C:/Users/patri/Desktop/Fotoboek \;
I had hoped that --backup=numbered would solve my problem (I thought it would add a 0, 1, 2, etc. to the filename if it already exists, which it unfortunately doesn't do successfully).
Is there a way to find only media files such as images and videos like I have above and make it so that every file copied gets renamed to a sequential number? So the first copied image would have the name 0, then the 2nd 1, etc.
"doesn't do successfully" is not a clear description of the problem. If I try your find command on sample directories on my system (Linux Mint 20), it works just fine: it creates files with ~1~, ~2~, ... added to the filename (mind you, after the extension).
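For reference, this is roughly what GNU cp --backup=numbered does when it meets a second file with the same name (hypothetical paths):
$ cp --backup=numbered a/battery.jpg Fotoboek/
$ cp --backup=numbered b/battery.jpg Fotoboek/
$ ls Fotoboek
battery.jpg  battery.jpg.~1~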
If you want a quick and dirty solution, you could do:
#!/bin/bash
counter=1
find sourcedir -type f -print0 | while IFS= read -r -d '' file
do
filename=$(basename -- "$file")
extension="${filename##*.}"
fileonly="${filename%.*}"
cp "$file" "targetdir/${fileonly}_${counter}.$extension"
(( counter += 1 ))
done
In this solution the counter is incremented every time a file is copied. The numbers are not sequential for each filename.
Yes, I know it is an anti-pattern and not ideal, but it works.
If you want a "more evolved" version of the previous, where the numbers are sequential, you could do:
#!/bin/bash
find sourcedir -type f -print0 | while IFS= read -r -d '' file
do
filename=$(basename -- "$file")
extension="${filename##*.}"
fileonly="${filename%.*}"
counter=1
while [[ -f "targetdir/${fileonly}_${counter}.$extension" ]]
do
(( counter += 1 ))
done
cp "$file" "targetdir/${fileonly}_${counter}.$extension"
done
This version increments the counter every time a file with that counter already exists. E.g. if you have 3 a.jpg files, they will be named a_1.jpg, a_2.jpg, a_3.jpg.
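And if you literally want the copies numbered 0, 1, 2, ... regardless of the original names, as the question asks, a small variation of the first script (same assumed sourcedir/targetdir placeholders) keeps one global counter and reuses only the extension:
#!/bin/bash
counter=0
find sourcedir -type f -print0 | while IFS= read -r -d '' file
do
    filename=$(basename -- "$file")
    extension="${filename##*.}"
    cp "$file" "targetdir/${counter}.${extension}"
    (( counter += 1 ))
done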

Bash: Find exclude directory error

I have this folder structure:
incoming/
Printing/
|------ done/
\------ error/
The server is monitoring the Printing folder, waiting for .txt files to appear in it. When a new file is detected, it sends it to a printer and moves the file to done on success or to error on failure.
The script I am working on must do the following: scan the incoming directory for files, and transfer them one by one to the Printing folder. I started with this script I found here on StackOverflow:
#!/usr/bin/env bash
while true; do
target="/var/www/test";
dest="/var/www/incoming";
find $dest -maxdepth 1 -type f | sort -r | while IFS= read -r file; do
counter=0;
while [ $counter -eq 0 ]; do
if find "$target" -maxdepth 0 -mindepth 0 -empty | read; then
mv -v "$file" "$target" && counter=1;
else
echo "Directory not empty: $(find "$target" -mindepth 1)"
sleep 2;
fi;
done;
done
done
The problem is that it detects the two subfolders done and error and refuses to copy files, always emitting the "Directory not empty" message.
I need a way to make the script ignore those folders.
I tried variations on the find command involving -prune and ! -path, but I did not find anything that worked. How can I fix the find command in the inner loop to do as I require?
The command at issue is this:
find "$target" -maxdepth 0 -mindepth 0 -empty
Start by recognizing what it does:
it operates on the directory, if any, named by "$target"
because of -maxdepth 0, it tests only that path itself
the -empty predicate matches empty regular files and directories
(the -mindepth 0 is the default; expressing it explicitly has no additional effect)
Since your expectation is that the target directory will never be empty (it will contain at least the two subdirectories you described), you need an approach that is not based on the -empty predicate. find offers no way to modulate what "empty" means.
There are multiple ways to approach this, some including find and others not. Since find is kinda heavyweight, and it has a somewhat obscure argument syntax for complex tests, I suggest an alternative: ls + grep. Example:
# File names to ignore in the target directory
ignore="\
.
..
done
error"
# ...
while /bin/true; do
files=$(ls -a "$target" | grep -Fxv "$ignore")
if [ -z "$files" ]; then
mv -v "$file" "$target"
break
else
# non-ignored file(s) found
echo "Directory not empty:"
echo "$files"
sleep 2
fi
done
Things to note:
the -a option is presented to ls to catch dotfiles and thereby match the behavior of find's -empty predicate. It is possible that you instead would prefer to ignore dotfiles, in which case you can simply drop the -a.
the -F option to grep specifies that it is to match fixed strings (not patterns) and the -x option tells it that it must match whole lines. The -v option inverts the sense of the matching, so those three together result in matching lines (filenames) other than those specified in the ignore variable.
capturing the file list in a variable is more efficient than recomputing it, and avoids a race condition in which a file is detected just before it is moved. By capturing the file list, you can be sure to recapitulate the exact data on which the script bases its decision to delay.
It is possible for filenames to include newlines, and carefully crafted filenames containing newlines could fool this script into thinking the directory (effectively) empty when in fact it isn't. If that's a concern for you then you'll need something a bit more robust, maybe using find after all.
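For completeness, here is a sketch of that find-based route: treat the directory as empty enough when it contains nothing besides the done and error entries (-print -quit, a GNU/BSD extension, stops at the first offending entry):
# Replacement for the inner "is the target empty?" test; any entry other
# than done or error blocks the move.
blocking=$(find "$target" -mindepth 1 -maxdepth 1 ! -name done ! -name error -print -quit)
if [ -z "$blocking" ]; then
    mv -v "$file" "$target"
else
    echo "Directory not empty: $blocking"
    sleep 2
fi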

Efficiently moving half a million files based on extension in bash

Scenario:
With the Locky virus on the rampage, the computer center I work for has found that the only method of file recovery is using tools like Recuva; the problem with that is it dumps all the recovered files into a single directory. I would like to move all those files into categories based on their file extensions: all JPG in one, all BMP in another, etc. I have looked around Stack Overflow, and based off of various other questions and responses I managed to build a small bash script (sample provided) that kinda does that; however, it takes forever to finish, and I think I have the extensions messed up.
Code:
#!/bin/bash
path=$2 # Starting path to the directory of the junk files
var=0 # How many records were processed
SECONDS=0 # reset the clock so we can time the event
clear
echo "Searching $2 for file types and then moving all files into grouped folders."
# Only want to move Files from first level as Directories are ok were they are
for FILE in `find $2 -maxdepth 1 -type f`
do
# Split the EXT off for the directory name using AWK
DIR=$(awk -F. '{print $NF}' <<<"$FILE")
# DEBUG ONLY
# echo "Moving file: $FILE into directory $DIR"
# Make a directory in our path then Move that file into the directory
mkdir -p "$DIR"
mv "$FILE" "$DIR"
((var++))
done
echo "$var Files found and orginized in:"
echo "$(($diff / 3600)) hours, $((($diff / 60) % 60)) minutes and $(($diff % 60)) seconds."
Question:
How can I make this more efficient while dealing with 500,000+ files? The find takes forever to grab a list of files, and in the loop it's attempting to create a directory (even if that path is already there). I would like to deal with those two particular aspects of the loop more efficiently, if at all possible.
The bottleneck of any bash script is usually the number of external processes you start. In this case, you can vastly reduce the number of calls to mv you make by recognizing that a large percentage of the files you want to move will have a common suffix like jpg, etc. Start with those.
for ext in jpg mp3; do
mkdir -p "$ext"
# For simplicity, I'll assume your mv command supports the -t option
find "$2" -maxdepth 1 -name "*.$ext" -exec mv -t "$ext" {} +
done
Using -exec mv -t "$ext" {} + means find will pass as many files as possible to each call to mv. For each extension, this means one call to find and a minimum number of calls to mv.
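If your mv does not support -t (BSD mv, for instance), the same batching can be kept with a small sh -c wrapper; a sketch:
for ext in jpg mp3; do
    mkdir -p "$ext"
    # sh -c receives the destination directory as $0 and the batch of files
    # as "$@", so each invocation still moves many files with one mv call.
    find "$2" -maxdepth 1 -name "*.$ext" -exec sh -c 'mv "$@" "$0"' "$ext" {} +
done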
Once those files are moved, then you can start analyzing files one at a time.
for f in "$2"/*; do
ext=${f##*.}
# Probably more efficient to check in-shell if the directory
# already exists than to start a new process to make the check
# for you.
[[ -d $ext ]] || mkdir "$ext"
mv "$f" "$ext"
done
The trade-off occurs in deciding how much work you want to do beforehand identifying the common extensions to minimize the number of iterations of the second for loop.
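To decide which extensions are worth pre-handling, you can survey the dump first; a quick sketch, assuming GNU find for -printf:
# Count how many recovered files carry each extension and show the ten most
# common, so the first loop can be seeded with the extensions that matter.
find "$2" -maxdepth 1 -type f -name '*.*' -printf '%f\n' \
    | awk -F. '{ print tolower($NF) }' \
    | sort | uniq -c | sort -rn | head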

Rename all files in a directory by omitting last 3 characters

I am trying to write a bash command that will rename all the files in the current directory by omitting the last 3 characters. I am not sure if it is possible; that's why I am asking here.
I have lots of files named like this: 720-1458907789605.ts
I need to rename all of them by omitting the last 3 characters before the extension, going from 720-1458907789605.ts to 720-1458907789.ts, for all files in the current directory.
Is it possible using bash commands? I am new to bash scripts.
Thank you!
Native bash solution:
for f in *.ts; do
[[ -f "$f" ]] || continue # if you do not need to rename directories
mv "$f" "${f:: -6}.ts"
done
This solution is slow if you have a very large number of files: the star expansion in the for loop will take up memory and time.
Ref: bash substring extraction.
If you have a really large data set, a slightly more complex but faster solution is:
find . -maxdepth 1 -type f -name '*.ts' -print0 | while IFS= read -r -d '' f; do
mv "$f" "${f%???.ts}.ts"
done
With Larry Wall's rename:
rename -n 's/...\.ts$/.ts/' *.ts
If everything looks okay, remove the dry-run option -n.

Backup script: How to keep the last N entries?

For a backup script, I need to clean old backups. How can I keep the last N backups and delete the rest?
A backup is either a single folder or a single file and the script will either keep all backups in folders or files (no mixing).
If possible, I'd like to avoid parsing of the output of ls. Even though all the entries in the backup folder should have been created by the backup script and there should be no funny characters in the entry names, a hacker might be able to create new entries in there.
This should do it (untested!):
#!/usr/bin/env bash
set -o errexit -o noclobber -o nounset -o pipefail
i=0
max=7 # Could be anything you want
while IFS= read -r -d '' -u 9
do
let ++i
if [ "$i" -gt "$max" ]
then
rm -- "$REPLY"
fi
done 9< <(find /var/backup -maxdepth 1 -type f -regex '.*/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\.tar\.gz' -print0 | sort -rz)
Explained from the outside in:
Ensure that the script stops at any common errors.
Find all files in /var/backup (and not subdirectories) matching a YYYY-MM-DD.tar.gz format.
Reverse sort these, so the latest are listed first.
Send these to file descriptor 9. This avoids any problems with cat, ssh or other programs which read standard input by default.
Read files one by one from FD 9, separated by NUL.
Count files until you get past your given max.
Nuke the rest from orbit.
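Since the question says a backup may also be a single folder, the same pattern works for directory backups too; a sketch, assuming the directories follow the same YYYY-MM-DD naming:
#!/usr/bin/env bash
set -o errexit -o noclobber -o nounset -o pipefail
i=0
max=7
while IFS= read -r -d '' -u 9
do
    let ++i
    if [ "$i" -gt "$max" ]
    then
        rm -r -- "$REPLY"   # -r because each backup is a whole directory here
    fi
done 9< <(find /var/backup -mindepth 1 -maxdepth 1 -type d -regex '.*/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' -print0 | sort -rz)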
