bash script to remove previous backup - bash

I set up a daily cron job to back up my server.
In my backup folder, the backup command generates 2 files: the archive itself (.tar.gz) and a .info.json file, like the ones below:
-rw-r--r-- 1 root root 1617 Feb 2 16:17 20200202-161647.info.json
-rw-r--r-- 1 root root 48699726 Feb 2 16:17 20200202-161647.tar.gz
-rw-r--r-- 1 root root 1617 Feb 3 06:25 20200203-062501.info.json
-rw-r--r-- 1 root root 48737781 Feb 3 06:25 20200203-062501.tar.gz
-rw-r--r-- 1 root root 1618 Feb 4 06:25 20200204-062501.info.json
-rw-r--r-- 1 root root 48939569 Feb 4 06:25 20200204-062501.tar.gz
How do I write a bash script that will keep only the last 2 archives and delete all the other backups (both the .tar.gz and the .info.json)?
In this example, that would mean deleting 20200202-161647.info.json and 20200202-161647.tar.gz.
Edit:
I replaced -name with -wholename in the script, but when I run it, it apparently has no effect. The old archives are still there and have not been deleted.
The script:
#!/bin/bash
DEBUG="";
DEBUG="echo DEBUG..."; #put last to safely debug without deleting files
keep=2;
for suffix in /home/archives .json .tar; do
    list=( $( find . -wholename "*$suffix" ) ); #allow for zero names
    if [ ${#list[@]} -gt $keep ]; then
        # delete all but last $keep oldest files
        ${DEBUG}rm -f "$( ls -tr "${list[@]}" | head -n-$keep )";
    fi
done
Edit 2:
If I run @sorin's script, does it actually delete everything, if I believe the script output?
The archive folder before running the script:
https://pastebin.com/7WtwVHCK
The script I run:
find home/archives/ \( -name '*.json' -o -name '*.tar.gz' \) -print0 |\
sort -zr |\
sed -z '3,$p' | \
xargs -0 echo rm -f
The script output:
https://pastebin.com/zd7a2zcq
Edit 3 :
The command find /home/archives/ -daystart \( -name '*.json' -o -name '*.tar.gz' \) -mtime +1 -exec echo rm -f {} + works and does the job.
Marked as solved

If the file is generated daily, a simple approach would be to take advantage of the -mtime find condition:
find /home/archives/ -daystart \( -name '*.json' -o -name '*.tar.gz' \) -mtime +1 -exec echo rm -f {} +
-daystart - use the start of the day for comparing modification times
\( -name '*.json' -o -name '*.tar.gz' \) - select files that end either in *.json or *.tar.gz
-mtime +1 - modification time is more than one day old (counting whole days from the start of today)
-exec echo rm -f {} + - remove the files (remove the echo after testing and verifying the result is what you want)
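Since the backups come from a daily cron job, the cleanup can be scheduled the same way. A hypothetical crontab entry (the 06:30 run time is an assumption; adjust it to run after your backup job) could look like this, once the echo has been removed:
# m h dom mon dow  command -- run the cleanup every day at 06:30
30 6 * * * find /home/archives/ -daystart \( -name '*.json' -o -name '*.tar.gz' \) -mtime +1 -exec rm -f {} +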
A simpler solution, avoiding ls and its pitfalls and not depending on the modification time of the files:
find /home/archives/ \( -name '*.json' -o -name '*.tar.gz' \) -print0 |\
sort -zr |\
sed -nz '3,$p' | \
xargs -0 echo rm -f
\( -name '*.json' -o -name '*.tar.gz' \) - find files whose names end in either *.json or *.tar.gz
-print0 - print them null separated
sort -zr - -z tells sort to use null as a line separator, -r sorts them in reverse
sed -nz '3,$p' - -z same as above; '3,$p' prints lines from the 3rd to the last ($), i.e. everything except the two newest names
xargs -0 echo rm -f - execute rm with the piped arguments (remove the echo after you have tested and are satisfied with the command)
Note: not all sort and sed implementations support -z, but most do. If yours doesn't, you might have to fall back to a higher-level language.
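Before reaching for another language, though, a rough newline-based sketch of the same idea may be enough, assuming the filenames contain no whitespace or newlines (which holds for the timestamped names above):
# Mirrors the sed '3,$p' pipeline without -z; keep the echo until you have verified the output
find /home/archives/ \( -name '*.json' -o -name '*.tar.gz' \) |
    sort -r |
    tail -n +3 |
    xargs echo rm -f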

Find the two most recent files in path:
most_recent_json=$(ls -t *.json | head -1)
most_recent_tar_gz=$(ls -t *.tar.gz | head -1)
Remove everything else, ignoring the recent files found above:
rm -i $(ls -I $most_recent_json -I $most_recent_tar_gz)

Automatic deletion can be hazardous to your mental state if it deletes unwanted files or aborts long scripts early due to unexpected errors, say when there are fewer than 1+2 files as in your example. Be sure the script does not fail if there are no files at all.
tdir=/home/archives/; #target dir
DEBUG="";
DEBUG="echo DEBUG..."; #put last to safely debug without deleting files
keep=2;
for suffix in .json .tar.gz; do
    list=( $( find "$tdir" -name "*$suffix" ) ); #allow for zero names
    if [ ${#list[@]} -gt $keep ]; then
        # delete all but the $keep newest files (assumes filenames without whitespace)
        ${DEBUG}rm -f $( ls -tr "${list[@]}" | head -n-$keep );
    fi
done

Assuming that you have fewer than 10 files and that they are created in pairs, then you can do something straightforward like this:
files_to_delete=$(ls -t1 | tail -n +3)
rm $files_to_delete
The -t1 tells the ls command to list the files in reverse chronological order by modification time (newest first), each on its own line.
The tail -n +3 tells the tail command to start at the third line, skipping the first two lines (the two newest files).
If you have more than 10 files, a more complicated solution will be necessary, or you would need to run this multiple times.

Related

Keep latest pair of files and move older files to another (Unix)

For example I have following files in a directory
FILE_1_2021-01-01.csum
FILE_1_2021-01-01.csv
FILE_1_2021-01-02.csum
FILE_1_2021-01-02.csv
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
I want to keep FILE_1_2021-01-03.csum and FILE_1_2021-01-03.csv in current directory but zip and move rest of the older files to another directory.
So far I have tried the following, but I am stuck on how to correctly identify the pairs:
file_count=0
PATH=/path/to/dir
ARCH=/path/to/dir
for file in ${PATH}/*
do
    if [[ ! -d $file ]]
    then
        file_count=$(($file_count+1))
    fi
done
echo "file count $file_count"
if [ $file_count -gt 2 ]
then
    echo "moving old files to $ARCH"
    # How to do it
fi
Since the timestamps are in a format that naturally sorts earliest first and newest last, an easy approach is to just use filename expansion to store the .csv and .csum filenames in a pair of arrays, and then do something with all but the last element of both:
declare -a csv=( FILE_*.csv ) csum=( FILE_*.csum )
mv "${csv[@]:0:${#csv[@]}-1}" "${csum[@]:0:${#csum[@]}-1}" new_directory/
(Or tar them up first, or whatever.)
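If you do want to tar them up first, here is a hedged sketch along the same lines (the archive name old_pairs.tar.gz and new_directory/ are placeholders, and it assumes at least two pairs exist so the array slices are non-empty):
# Bundle all but the newest pair into one archive, then remove the originals
tar -czf new_directory/old_pairs.tar.gz "${csv[@]:0:${#csv[@]}-1}" "${csum[@]:0:${#csum[@]}-1}" &&
    rm -- "${csv[@]:0:${#csv[@]}-1}" "${csum[@]:0:${#csum[@]}-1}"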
First off ...
it's bad practice to use all uppercase variables as these can clash with OS-level variables (also all uppercase); case in point ...
PATH is an OS-level variable used to keep track of where to locate binaries, but in this case ...
OP has just wiped out the OS-level variable with the assignment PATH=/path/to/dir
As for the question, some assumptions:
each *.csv file has a matching *.csum file
the 2 files to 'keep' can be determined from the first 2 lines of output resulting from a reverse sort of the filenames
not sure what OP means by 'zip and move' (eg, zip? gzip? tar all old files into a single .tar and then (g)zip?), so for the sake of this answer I'm going to just gzip each file and move it to a new directory (OP can adjust the code to fit the actual requirement)
Setup:
srcdir='/tmp/myfiles'
arcdir='/tmp/archive'
rm -rf "${srcdir}" "${arcdir}"
mkdir -p "${srcdir}" "${arcdir}"
cd "${srcdir}"
touch FILE_1_2021-01-0{1..3}.{csum,csv} abc XYZ
ls -1
FILE_1_2021-01-01.csum
FILE_1_2021-01-01.csv
FILE_1_2021-01-02.csum
FILE_1_2021-01-02.csv
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
XYZ
abc
Get list of *.csum/*.csv files and sort in reverse order:
$ find . -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r
/tmp/myfiles/FILE_1_2021-01-03.csv
/tmp/myfiles/FILE_1_2021-01-03.csum
/tmp/myfiles/FILE_1_2021-01-02.csv
/tmp/myfiles/FILE_1_2021-01-02.csum
/tmp/myfiles/FILE_1_2021-01-01.csv
/tmp/myfiles/FILE_1_2021-01-01.csum
Eliminate first 2 files (ie, generate list of files to zip/move):
$ find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail +3
/tmp/myfiles/FILE_1_2021-01-02.csv
/tmp/myfiles/FILE_1_2021-01-02.csum
/tmp/myfiles/FILE_1_2021-01-01.csv
/tmp/myfiles/FILE_1_2021-01-01.csum
Process our list of files:
while read -r fname
do
    gzip "${fname}"
    mv "${fname}".gz "${arcdir}"
done < <(find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail +3)
NOTE: the find|sort|tail results could be piped to xargs (or parallel) to perform the gzip/mv operations, but without more details on what OP means by 'zip and move' I've opted for a simpler, albeit less performant, while loop.
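For reference, a rough sketch of that xargs variant under the same gzip-then-move assumption (it relies on GNU xargs and on filenames without embedded newlines):
find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail +3 |
    xargs -r -I{} sh -c 'gzip "$1" && mv "$1.gz" "$2"' _ {} "${arcdir}"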
Results:
$ ls -1 "${srcdir}"
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
XYZ
abc
$ ls -1 "${arcdir}"
FILE_1_2021-01-01.csum.gz
FILE_1_2021-01-01.csv.gz
FILE_1_2021-01-02.csum.gz
FILE_1_2021-01-02.csv.gz
Your algorithm for counting files can be simplified using find. You seem to be looking for non-directories; the option -not -type d does exactly that. By default find recurses into subdirectories, so you need to pass -maxdepth 1 to limit the search to a depth of 1.
find "$PATH" -maxdepth 1 -not -type d
If you want to get the number of files, you may pipe the command to wc:
file_count=$(find "$PATH" -maxdepth 1 -not -type d | wc -l)
Now there are two ways of detecting which file is the more recent: by looking at the filename, or by looking at the date when the files were last created/modified/etc. Since your naming convention looks pretty solid, I would recommend the first option. Sorting by creation/modification date is more complex and there are numerous cases where this information is not reliable, such as copying files, zipping/unzipping them, touching files, etc.
You can sort with sort and then grab the last element with tail -1:
find "$PATH" -maxdepth 1 -not -type d | sort | tail -1
You can do the same thing by sorting in reverse order using sort -r and then grabbing the first element with head -1. From a functional point of view it is strictly equivalent, but it is slightly faster because it stops at the first result instead of parsing all results. Plus, it will be more relevant later on.
find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1
Once you have the filename of the most recent file, you can extract the base name in order to create a pattern out of it.
most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1)
most_recent_file=${most_recent_file%.*}
most_recent_file=${most_recent_file##*/}
Let’s explain this:
first, we grab the filename into a variable called most_recent_file
then we remove the extension using ${most_recent_file%.*} ; the % symbol will cut at the end, and .* will cut everything after the last dot, including the dot itself
finally, we remove the folder using ${most_recent_file##*/} ; the ## symbol will cut at the beginning with a greedy catch, and */ will cut everything before the last slash, including the slash itself
The difference between # and ## is how greedy the pattern is. If your file is /path/to/file.csv then ${most_recent_file#*/} (single #) will cut the first slash only, i.e. it will output path/to/file.csv, while ${most_recent_file##*/} (double #) will cut all paths, i.e. it will output file.csv.
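A quick interactive check of those expansions, using the example path from above:
most_recent_file=/path/to/file.csv
echo "${most_recent_file%.*}"     # prints /path/to/file
echo "${most_recent_file#*/}"     # prints path/to/file.csv
echo "${most_recent_file##*/}"    # prints file.csv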
Once you have this string, you can make a pattern to include/exclude similar files using find.
find "$PATH" -maxdepth 1 -not -type d -name "$most_recent_file.*"
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*"
The first line will list all files which match your pattern, and the second line will list all files which do not match the pattern.
Since you want to move your 'old' files to a folder, you may execute a mv command on the latter list.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -exec mv {} "$ARCH" \;
If your version of find supports it, you may use + in order to batch the move operations.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -exec mv -t "$ARCH" {} +
Otherwise you can pipe to xargs.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" | xargs mv -t "$ARCH"
Putting it all together:
file_count=0
PATH=/path/to/dir
ARCH=/path/to/dir
file_count=$(find "$PATH" -maxdepth 1 -not -type d | wc -l)
echo "file count $file_count"
if [ $file_count -gt 2 ]
then
    echo "moving old files to $ARCH"
    most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1)
    most_recent_file=${most_recent_file%.*}
    most_recent_file=${most_recent_file##*/}
    find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" | xargs mv -t "$ARCH"
fi
As a last note, if your file names contain newlines, this will not work. If you want to handle that case, you need a few modifications. Counting files would be done like this:
file_count=$(find "$PATH" -maxdepth 1 -not -type d -printf '.' | wc -c)
Getting the most recent file:
most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d -print0 | sort -rz | grep -zm1 .)
Moving files with xargs:
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -print0 | xargs -0 mv -t "$ARCH"
(There is no problem if the files are moved using -exec.)
I won’t go into details, but just know that the issue is known and these are the kinds of solutions you can apply if need be.

How can I use sed to generate a list of files to be deleted via rm?

I have a compute cluster that takes input files and generates two output files from stdout and stderr; the error file is empty if everything ran correctly. Since we do a lot of runs in parallel, I just want to keep the inputs and logs when there is an error. This breaks down to the following tasks:
Find all of the empty error logs.
Find the prefix that corresponds to a given batch.
Delete all of the files related to that batch.
A batch of files looks like the following:
25584-0.8-0.170-bfa.yml
25584-0.8-0.175-bfa.pbs
25584-0.8-0.175-bfa.pbs.e20693449
25584-0.8-0.175-bfa.pbs.o20693449
So far I've sorted out how to find the files:
find . -name '*.pbs.e*' -size 0
and how to extract the generic prefix that applies to all of the files:
sed 's/\(.*\)\.pbs.e.*/\1.*/'
so piping all of these together I would expect the following to delete all of the files associated with compute jobs that had no errors:
find . -name '*.pbs.e*' -size 0 | sed 's/\(.*\)\.pbs.e.*/\1.*/' | xargs -d '\n' rm
However the output I get for all matching cases is the following:
rm: cannot remove `./25584-0.8-0.170-bfa.*': No such file or directory
Manually typing out the command (e.g., rm 25584-0.8-0.170-bfa.*) works as expected, and it appears that sed appends a \n to the output, which means the | xargs rm or | xargs -d '\n' rm pipes are generating the error.
How can I format the output from sed (or similar tool) so that the matching files will be deleted?
The reason you get
rm: cannot remove './25584-0.8-0.170-bfa.*': No such file or directory
is that this command was executed:
rm "./25584-0.8-0.170-bfa.*"
where the argument is a string containing a star and not a glob expression. A glob needs to be expanded by a shell and given to the command as arguments.
For this directory
-rw-r--r-- 1 t users 0 Jul 20 22:10 25584-0.8-0.175-bfa.pbs
-rw-r--r-- 1 t users 0 Jul 20 22:10 25584-0.8-0.175-bfa.pbs.e20693449
-rw-r--r-- 1 t users 0 Jul 20 22:10 25584-0.8-0.175-bfa.pbs.o20693449
-rw-r--r-- 1 t users 0 Jul 20 22:10 25584-0.8-0.176-bfa.pbs
-rw-r--r-- 1 t users 0 Jul 20 22:10 25584-0.8-0.176-bfa.pbs.e20693449
-rw-r--r-- 1 t users 0 Jul 20 22:10 25584-0.8-0.176-bfa.pbs.o20693449
Here we print two strings:
> find . -name '*.pbs.e*' -size 0 | sed 's/\(.*\)\.pbs.e.*/\1.*/' |\
xargs -I@ echo @
./25584-0.8-0.175-bfa.*
./25584-0.8-0.176-bfa.*
Here we call a shell which is expanding the arguments before giving them to the command:
> find . -name '*.pbs.e*' -size 0 | sed 's/\(.*\)\.pbs.e.*/\1.*/' |\
xargs -I@ bash -c "echo @"
./25584-0.8-0.175-bfa.pbs ./25584-0.8-0.175-bfa.pbs.e20693449 ./25584-0.8-0.175-bfa.pbs.o20693449
./25584-0.8-0.176-bfa.pbs ./25584-0.8-0.176-bfa.pbs.e20693449 ./25584-0.8-0.176-bfa.pbs.o20693449
which is what you need, so your command could be modified to:
find . -name '*.pbs.e*' -size 0 | sed 's/\(.*\)\.pbs.e.*/\1.*/' | xargs -I@ bash -c "rm -- @"
I suspect that there is more than one way of doing this, but since a one liner isn't needed the issues with sed can be bypassed by just using a loop:
for item in `find . -name '*.pbs.e*' -size 0 | sed 's/\(.*\)\.pbs.e.*/\1.*/'`; do
    rm $item
done
The asterisk * is taken as a literal character and does not undergo filename expansion. You can pass your sed output to another find, which accepts a literal * in its -name pattern without the shell evaluating it:
find . -maxdepth 1 -type f -name '*.pbs.e*' -size 0 | sed 's,^\./\(.*\)\.pbs\.e.*,\1.*,' | xargs -n1 -I{} find . -maxdepth 1 -type f -name '{}' -print -delete
To make the sed a little safer you can run it on the full path; the only catch is that you must escape meta-characters in the path name:
# search dir
dir=.
dir=$(realpath "$dir")
# that should escape meta-characters in non-trivial dir name
# stackoverflow.com/q/15783701
printf -v dirstr '%q' "$dir"
# $dirstr is used for sed but can replaced with $dir for simple dir name
find "$dir" -maxdepth 1 -type f -name '*.pbs.e*' -size 0 | sed "s,^$dirstr/\(.*\)\.pbs\.e.*,\1.*," | xargs -n1 -I{} find "$dir" -maxdepth 1 -type f -name {} -print -delete
This should work with whitespace and other non-trivial file names (except those containing \n).

How to remove all directories except for the last one

I can get the amount of dirs with command
ls -dtr */ | wc -l
But how do I specifically delete the N-1 directories, leaving the most recent one?
As usual for any bash operation on "a bunch of files", you have to be aware of the pain that is spaces and newlines, which may legally appear in file names. These break naïve xargs/for-based approaches and require jumping through some extra hoops.
Many tools support -z or -0 options, which use NUL bytes as line separator instead of newlines -- NUL may never be part of a file name.
Unfortunately, ls is not one of them, so we have to go through find to get the latest directory.
find . -maxdepth 1 -mindepth 1 -type d
This gets you all the directories in the current directory. (Note: As opposed to your ls -dtr */, this will also find "hidden" directories, i.e. ones starting with a .. Add ! -name ".*" to avoid that.)
-maxdepth 1 avoids recursion, -mindepth 1 keeps the parent directory (.) out of the list.
find . -maxdepth 1 -mindepth 1 -type d -printf "%T+ %f\0"
This lists the directories and their timestamps, using NUL instead of newline for line separation. (NUL can never be part of a file name.)
find . -maxdepth 1 -mindepth 1 -type d -printf "%T+ %f\0" | sort -z
This sorts the results, using NUL instead of newline for line separation.
find . -maxdepth 1 -mindepth 1 -type d -printf "%T+ %f\0" | sort -z | head -z -n -1
This takes everything but the last entry (the latest directory and its timestamp) from the list, using NUL instead of newline for line separation.
find . -maxdepth 1 -mindepth 1 -type d -printf "%T+ %f\0" | sort -z | head -z -n -1 | cut -z -d' ' -f 2-
Using NUL instead of newline for line separation and space as field delimiter, this filters the first field (the timestamp) from the output.
find . -maxdepth 1 -mindepth 1 -type d -printf "%T+ %f\0" | sort -z | head -z -n -1 | cut -z -d' ' -f 2- | xargs -0 rm -rf
Using NUL instead of newline for line separation, this calls rm -rf for each entry.
Use tail:
ls -dt */ | tail -n +2 | while read dirName; do rmdir "$dirName"; done
When the -n option of tail is given with a leading +, it starts output at that line number, skipping the lines before it.
This will list the directories sorted by date, skip the most recent one and keep the rest, which will be deleted. You can replace rmdir with rm -rf following your use case.
You can try it out:
mkdir test && cd test
for i in {1..100}; do mkdir subdir-$i; done
ls -dt */ | tail -n +2 | while read dirName; do rmdir "$dirName"; done
ls
Output:
subdir-100
EDIT: if any of the directory names contain newline characters, this approach will fail because of tail. See @DevSolar's answer for a more complete one in that case.
This is a pure Bash solution:
shopt -s nullglob # Globs that match nothing expand to nothing
shopt -s dotglob  # Globs include files whose names start with '.'
newest_dir=
for sdir in */ ; do
    dir=${sdir%/}             # Remove slash to enable a symlink check
    [[ -L $dir ]] && continue # Skip symlinks to directories
    if [[ -z $newest_dir ]] ; then
        newest_dir=$dir
    elif [[ $dir -nt $newest_dir ]] ; then
        echo rm -rf -- "$newest_dir"
        newest_dir=$dir
    else
        echo rm -rf -- "$dir"
    fi
done
In its current form it just prints the commands to remove the old subdirectories. Remove the echos to make it functional.
So many ways, here's another. :)
rm -rf `ls -1dt * | tail -n +2 | tr '\n' ' '`
First, ls -1dt * lists all the entries (here, the directories) sorted by modification time, newest first, one per line. Then tail -n +2 removes the first, most recently modified one. Then tr '\n' ' ' puts them all onto a single line, and rm -rf removes those directories.

remove files from subfolders without the last three

I have a structure like that:
/usr/local/a/1.txt
/usr/local/a/2.txt
/usr/local/a/3.txt
/usr/local/b/4.txt
/usr/local/b/3.txt
/usr/local/c/1.txt
/usr/local/c/7.txt
/usr/local/c/6.txt
/usr/local/c/12.txt
...
I want to delete all the *.txt files in the subfolders, except the three with the most recent modification dates. In the current directory I can do:
ls -tr *.txt | head -n -3 | xargs rm -f
I need to combine that with the code:
find /usr/local/**/* -type f
Should I use the maxdepth option?
Thanks for helping,
aola
Add the -maxdepth option to find to limit it to one level, sort the files by modification time (oldest first), use head to drop the three most recently modified files from the deletion list, and give xargs the -r flag so it only acts when files are actually found.
for folder in $(find /usr/local/ -type d)
do
    find "$folder" -maxdepth 1 -type f -name "*.txt" | xargs -r ls -1tr | head -n -3 | xargs -r rm -f
done
Run the above command once without rm to ensure that the previous commands pick the proper files for deletion.
You've almost got the solution: use find to get the files, ls to sort them by modification date and tail to omit the three most recently modified ones:
find /usr/local -type f | xargs ls -t | tail -n +4 | xargs rm
If you would like to remove only the files at a specific depth, add -mindepth 4 -maxdepth 4 to the find parameters.
You can use find's -printf option to print the modification time in front of each file name and then sort and strip the date off. This avoids using ls at all.
find /usr/local -type f -name '*.txt' -printf '%T@|%p\n' | sort -n | cut -d '|' -f 2- | head -n -3 | xargs rm -f
The other answers using xargs ls -t can lead to incorrect results when there are more files than xargs can pass to a single ls -t invocation.
But I need this for each subfolder. For example, when I have:
/usr/local/a/1.txt
/usr/local/a/2.txt
/usr/local/a/3.txt
/usr/local/a/4.txt
/usr/local/b/4.txt
/usr/local/b/3.txt
/usr/local/c/1.txt
/usr/local/c/7.txt
/usr/local/c/6.txt
/usr/local/c/12.txt
I want to apply the code to each subfolder separately:
head -n -3 | xargs rm -f
So if I have the files sorted by date, the files to delete would be:
/usr/local/a/4.txt
/usr/local/c/12.txt
I want to keep the three newest files in each subfolder.
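A rough per-subfolder sketch that combines the directory loop above with the find -printf approach (it assumes GNU find/sort, and directory and file names without whitespace or newlines):
# For each subdirectory, delete all but the three newest *.txt files
for folder in $(find /usr/local -type d); do
    find "$folder" -maxdepth 1 -type f -name '*.txt' -printf '%T@|%p\n' |
        sort -n | head -n -3 | cut -d '|' -f 2- | xargs -r rm -f
done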

Get the newest directory to a variable in Bash

I would like to find the newest subdirectory in a directory and save the result to a variable in bash.
Something like this:
ls -t /backups | head -1 > $BACKUPDIR
Can anyone help?
BACKUPDIR=$(ls -td /backups/*/ | head -1)
$(...) evaluates the statement in a subshell and returns the output.
There is a simple solution to this using only ls:
BACKUPDIR=$(ls -td /backups/*/ | head -1)
-t orders by time (latest first)
-d lists the directories themselves instead of their contents
*/ only lists directories
head -1 returns the first item
I didn't know about */ until I found Listing only directories using ls in bash: An examination.
This is a pure Bash solution:
topdir=/backups
BACKUPDIR=
# Handle subdirectories beginning with '.', and empty $topdir
shopt -s dotglob nullglob
for file in "$topdir"/* ; do
    [[ -L $file || ! -d $file ]] && continue
    [[ -z $BACKUPDIR || $file -nt $BACKUPDIR ]] && BACKUPDIR=$file
done
printf 'BACKUPDIR=%q\n' "$BACKUPDIR"
It skips symlinks, including symlinks to directories, which may or may not be the right thing to do. It skips other non-directories. It handles directories whose names contain any characters, including newlines and leading dots.
Well, I think this solution is the most efficient:
path="/my/dir/structure/*"
backupdir=$(find $path -type d -prune | tail -n 1)
Explanation why this is a little better:
We do not need sub-shells (aside from the one for getting the result into the bash variable).
We do not need a useless -exec ls -d at the end of the find command, it already prints the directory listing.
We can easily alter this, e.g. to exclude certain patterns. For example, if you want the second newest directory, because backup files are first written to a tmp dir in the same path:
backupdir=$(find $path -type d -prune -not -name "*temp_dir" | tail -n 1)
The above solution doesn't take into account things like files being written to and removed from the directory, which can result in the parent directory being returned instead of the newest subdirectory.
The other issue is that this solution assumes the directory contains only other directories and no files being written.
Let's say I create a file called "test.txt" and then run this command again:
echo "test" > test.txt
ls -t /backups | head -1
test.txt
The result is test.txt showing up instead of the last modified directory.
The proposed solution "works" but only in the best case scenario.
Assuming you have a maximum of 1 directory depth, a better solution is to use:
find /backups/* -type d -prune -exec ls -d {} \; |tail -1
Just swap the "/backups/" portion for your actual path.
If you want to avoid showing an absolute path in a bash script, you could always use something like this:
LOCALPATH=/backups
DIRECTORY=$(cd $LOCALPATH; find * -type d -prune -exec ls -d {} \; |tail -1)
With GNU find you can get a list of directories with their modification timestamps, sort that list and output the newest:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\0" | sort -z -n | cut -z -f2- | tail -z -n1
or newline separated:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\n" | sort -n | cut -f2- | tail -n1
With POSIX find (that does not have -printf) you may, if you have it, run stat to get file modification timestamp:
find . -mindepth 1 -maxdepth 1 -type d -exec stat -c '%Y %n' {} \; | sort -n | cut -d' ' -f2- | tail -n1
Without stat, a pure shell solution may be used by replacing the [[ bash extension with [, as in this answer.
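For completeness, a rough sketch of that pure-shell variant (note that the -nt test is a widespread extension of test/[ rather than strict POSIX, and unlike the bash version above this does not skip symlinks to directories):
topdir=/backups
BACKUPDIR=
for dir in "$topdir"/*/ ; do
    dir=${dir%/}
    [ -d "$dir" ] || continue   # also guards against an unexpanded glob when $topdir is empty
    if [ -z "$BACKUPDIR" ] || [ "$dir" -nt "$BACKUPDIR" ]; then
        BACKUPDIR=$dir
    fi
done
printf 'BACKUPDIR=%s\n' "$BACKUPDIR"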
Your "something like this" was almost a hit:
BACKUPDIR=$(ls -t ./backups | head -1)
Combining what you wrote with what I have learned solved my problem too. Thank you for raising this question.
Note: I run the line above from Git Bash in a Windows environment, in a file called ./something.bash.

Resources