Delete all files in a directory matching a time pattern - shell

I back up an important folder every day using cron. The backup is stored with the current date in its name.
My requirement is to keep only the current day's and the previous two days' backups.
i.e. I want to keep only:
test_2016-11-04.tgz
test_2016-11-03.tgz
test_2016-11-02.tgz
The remaining archives should be deleted automatically. How can I do this in a shell script?
Below is my backup folder's contents.
test_2016-10-30.tgz test_2016-11-01.tgz test_2016-11-03.tgz
test_2016-10-31.tgz test_2016-11-02.tgz test_2016-11-04.tgz

With ls -lrt | head -n -3 | awk '{print $9}'
you can print all but the 3 newest files in your directory.
Passing this output to rm you obtain the desired result. (Beware that this parses ls output, so it will break on file names containing spaces or newlines.)

You could append this to the end of your backup script:
find ./backupFolder -name "test_*.tgz" -mtime +3 -type f -delete
You could also use this:
ls -1 test_*.tgz | sort -r | awk 'NR > 3 { print }' | xargs -d '\n' rm -f --
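Since the date is embedded in the file name and YYYY-mm-dd sorts chronologically, you can also select old backups by name alone, without trusting mtimes. A minimal bash sketch, run in a throwaway directory (the hard-coded reference date is only for reproducibility; a real script would use plain date -d '2 days ago', and this assumes GNU date):

```shell
#!/usr/bin/env bash
backup_dir=$(mktemp -d)   # stand-in for the real backup folder
touch "$backup_dir"/test_2016-10-30.tgz \
      "$backup_dir"/test_2016-11-02.tgz \
      "$backup_dir"/test_2016-11-04.tgz

# Oldest name we want to keep: "2 days ago" relative to 2016-11-04 (GNU date)
cutoff="test_$(date -d '2016-11-04 -2 days' +%Y-%m-%d).tgz"

for f in "$backup_dir"/test_*.tgz; do
    # lexicographic comparison works because of the YYYY-mm-dd format
    [[ $(basename "$f") < "$cutoff" ]] && rm -- "$f"
done
ls "$backup_dir"
```

This never deletes more than intended even if cron missed a day, because it compares names, not file counts.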

Generate an array of the files you want to keep:
names=()
for d in {0..2}; do
names+=( "test_"$(date -d"$d days ago" "+%Y-%m-%d")".tgz" )
done
so that it looks like this:
$ printf "%s\n" "${names[@]}"
test_2016-11-04.tgz
test_2016-11-03.tgz
test_2016-11-02.tgz
Then, loop through the files and keep those that are not in the array:
for file in test_*.tgz; do
[[ ! ${names[*]} =~ "$file" ]] && echo "remove $file" || echo "keep $file"
done
If run on your directory, this would result in output like:
remove test_2016-10-30.tgz
remove test_2016-10-31.tgz
remove test_2016-11-01.tgz
keep test_2016-11-02.tgz
keep test_2016-11-03.tgz
keep test_2016-11-04.tgz
So now it is just a matter of replacing those echo commands with something more meaningful, like rm.
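A runnable version of the whole approach, in a throwaway directory (the keep-list dates are hard-coded here so the run is reproducible; the real script would fill names with date -d "$d days ago" as shown above):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d); cd "$dir"
touch test_2016-10-30.tgz test_2016-10-31.tgz test_2016-11-01.tgz \
      test_2016-11-02.tgz test_2016-11-03.tgz test_2016-11-04.tgz

names=(test_2016-11-04.tgz test_2016-11-03.tgz test_2016-11-02.tgz)

for file in test_*.tgz; do
    # remove the file unless it appears in the keep-list
    [[ ! " ${names[*]} " =~ " $file " ]] && rm -- "$file"
done
ls
```

The spaces padding both sides of the =~ match avoid partial-name false positives (e.g. a file named 11-02.tgz matching inside test_2016-11-02.tgz).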

Related

Automator/Apple Script: Move files with same prefix on a new folder. The folder name must be the files prefix

I'm a photographer and I have multiple jpg files of clothing in one folder. The file name structure is:
TYPE_FABRIC_COLOR (Example: BU23W02CA_CNU_RED, BU23W02CA_CNU_BLUE, BU23W23MG_LINO_WHITE)
I have to move files of the same TYPE (BU23W02CA) into one folder named as the TYPE.
For example:
MAIN FOLDER>
BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg, BU23W23MG_LINO_WHITE.jpg
Became:
MAIN FOLDER>
BU23W02CA_CNU > BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg
BU23W23MG_LINO > BU23W23MG_LINO_WHITE.jpg
Here are some scripts.
V1
#!/bin/bash
find . -maxdepth 1 -type f -name "*.jpg" -print0 | while IFS= read -r -d '' file
do
# Extract the directory name
dirname=$(echo "$file" | cut -d'_' -f1-2 | sed 's#\./\(.*\)#\1#')
#DEBUG echo "$file --> $dirname"
# Create it if not already existing
if [[ ! -d "$dirname" ]]
then
mkdir "$dirname"
fi
# Move the file into it
mv "$file" "$dirname"
done
It assumes all files that the find lists are of the format you described in your question, i.e. TYPE_FABRIC_COLOR.ext.
dirname is the extraction of the first two words delimited by _ in the file name.
Since find lists the files with a ./ prefix, it is removed from the dirname as well (that is what the sed command does).
The find specifies the names of the files to consider as *.jpg. You can change this to something else if you want to restrict which files are considered for the move.
This version loops through each file, creates a directory named after its first two sections (if it does not already exist), and moves the file into it.
If you want to see what the script is doing to each file, you can add the -v option to the mv command; I used it to debug.
However, since it loops through the files one by one, this might take time with a large number of files, hence the next version.
V2
#!/bin/bash
while IFS= read -r dirname
do
echo ">$dirname"
# Create it if not already existing
if [[ ! -d "$dirname" ]]
then
mkdir "$dirname"
fi
# Move the file into it
find . -maxdepth 1 -type f -name "${dirname}_*" -exec mv {} "$dirname" \;
done < <(find . -maxdepth 1 -type f -name "*.jpg" -print | sed 's#^\./\(.*\)_\(.*\)_.*\..*$#\1_\2#' | sort | uniq)
This version loops over the directory names instead of over each file.
The last line does the "magic". It finds all files and extracts the first two words right away (with sed). Then these words are sorted and de-duplicated with uniq.
The while loop then creates each directory one by one.
The find inside the while loop moves all files that match the directory being processed into it. Why did I not simply do mv ${dirname}_* ${dirname}? Because the expansion of the * wildcard could result in an argument list too long for the mv command. Doing it with find ensures that it will work even on a LARGE number of files.
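To see V1 in action without touching real photos, here is the same loop run against a few empty stand-in files in a scratch directory:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d); cd "$dir"
touch BU23W02CA_CNU_RED.jpg BU23W02CA_CNU_BLUE.jpg BU23W23MG_LINO_WHITE.jpg

find . -maxdepth 1 -type f -name "*.jpg" -print0 | while IFS= read -r -d '' file
do
    # TYPE_FABRIC, with find's leading ./ stripped
    dirname=$(echo "$file" | cut -d'_' -f1-2 | sed 's#\./\(.*\)#\1#')
    mkdir -p "$dirname"
    mv "$file" "$dirname"
done
ls
```

mkdir -p replaces the explicit "create if not existing" test from V1; the behavior is the same.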
Suggesting oneliner awk script:
echo "$(ls -1 *.jpg)"| awk '{system("mkdir -p "$1 OFS $2);system("mv "$0" "$1 OFS $2)}' FS=_ OFS=_
Explanation:
echo "$(ls -1 *.jpg)": List all jpg files in current directory one file per line
FS=_ : Set awk field separator to _ $1=type $2=fabric $3=color.jpg
OFS=_ : Set awk output field separator to _
awk script explanation
{ # for each file name from list
system ("mkdir -p "$1 OFS $2); # execute "mkdir -p type_fabric"
system ("mv " $0 " " $1 OFS $2); # execute "mv current-file to type_fabric"
}
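The one-liner can be sanity-checked the same way in a scratch directory (it assumes file names without spaces, since awk builds the mv command by pasting strings together):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d); cd "$dir"
touch BU23W02CA_CNU_RED.jpg BU23W02CA_CNU_BLUE.jpg BU23W23MG_LINO_WHITE.jpg

# split each name on _, mkdir TYPE_FABRIC, then move the file into it
echo "$(ls -1 *.jpg)" |
awk '{system("mkdir -p "$1 OFS $2); system("mv "$0" "$1 OFS $2)}' FS=_ OFS=_
ls
```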

I am trying to delete the oldest files in a directory once there are 8 files in the directory

I am trying to delete the oldest files in a directory once there are 8 files in the directory.
ls -Ct /tmp/test/ | awk '{$1=$2=$3=$4=""; print $0}' | xargs rm
I would like it delete the output of:
ls -Ct /tmp/test/ | awk '{$1=$2=$3=$4=""; print $0}'
but I keep getting an error showing those files don't exist. I know it is because xargs is looking in the directory I'm currently in, but I need it to look in /tmp/test/ instead. Is there any way this can be done?
You wrote "delete the oldest files in a directory once there are 8 files". I'm not quite sure what you mean by that, but I'm going to assume that you keep the 8 newest files and delete the rest. Also, since you tagged awk, I'm using GNU awk (for stat()) for the job.
First some test material:
$ mkdir test # create test dir
$ cd test # use it
$ for i in $(seq 1 10 | shuf) ; do touch $i ;sleep 1 ; done # touch some test files
$ ls -t # 1 possible distribution
8 10 6 7 1 2 4 9 3 5
The gawk program:
$ gawk '
@load "filefuncs" # enable stat()
BEGIN {
for(i=1;i<ARGC;i++) { # iterate argument files
ret=stat(ARGV[i],fdata) # use stat to get the mtime
mtimes[ARGV[i]]=fdata["mtime"] # hash to an array
}
PROCINFO["sorted_in"]="#val_num_desc" # set for traverse order to newest first
for(f in mtimes) # use that order
if(++c>8) # leave 8 newest
cmd=cmd OFS f # gather the list of files to rm
if(cmd) { # if any
cmd="rm -f" cmd # add rm to the beginning
print cmd # print the command to execute
# # # system(cmd) # this is the actual remove command
} # by uncommenting it you admit you
}' * # understand how the script works
# and accept all responsibility
If you want to use bash, this is easily done with two commands:
$ t=$(find . -maxdepth 1 -type f -printf "%T@\n" | sort -rg | awk '(NR==9)')
$ [[ "$t" != "" ]] && find . -maxdepth 1 -type f ! -newermt "@${t}" -delete
The first command finds the 9th youngest file and picks its modification time in epoch seconds. The second command deletes all files that are not newer than this modification time. (That is why we pick the 9th time: to keep the 8 youngest.)
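Here is that pair of commands exercised in a scratch directory with synthetic mtimes (GNU find and touch assumed; touch -d @N sets the mtime to epoch second N):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d); cd "$dir"
for i in $(seq 1 10); do touch -d "@$((1600000000 + i))" "f$i"; done

# mtime (epoch) of the 9th youngest file
t=$(find . -maxdepth 1 -type f -printf "%T@\n" | sort -rg | awk '(NR==9)')
# delete everything not newer than it, keeping the 8 youngest
[[ "$t" != "" ]] && find . -maxdepth 1 -type f ! -newermt "@${t}" -delete
ls | wc -l
```

Only f1 and f2 (the two oldest) are removed; with 8 or fewer files, t is empty and nothing is deleted.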
When you want to do it in zsh, you can do the following:
$ files=( *(.om) )
$ (( ${#files[9,-1]} != 0 )) && rm "${files[9,-1]}"
This creates an array files which contains files (.) which are sorted (om) by modification time. Then we select the files starting at position 9 till the end. To be safe, we check first if there are files in that sublist.
Both methods presented here avoid any issues you might have with funny filenames.
I'm going to assume that you keep the 8 newest files and delete the rest.
$ mkdir test
$ cd test
$ touch "0 0"
$ for i in $(seq 1 10) ; do touch $i ;sleep 1 ; done
$ touch "a a"
$ ls -t
$ ls -t1 | sed -n '9,$p' | while read f; do rm "$f" ; done
$ ls -t1
a a
10
9
8
7
6
5
4
Assuming the filenames do not contain tabs or newlines, please try the following:
dir="/tmp/test"
while IFS= read -r line; do
files+=( "$line" )
done < <(find "$dir" -maxdepth 1 -type f -printf "%T#\t%p\n" | sort -n -k 1 -t $'\t' | cut -f 2)
(( ${#files[@]} < 8 )) && exit
n=8
# try the following to remove the oldest n files
rm -- "${files[@]:0:$n}"
# try the following to keep the newest n files and remove older ones
m=$(( ${#files[@]} - n ))
rm -- "${files[@]:0:$m}"
As @JamesBrown points out, it is not clear to me how many files should be removed or kept. Please try either of the options above, depending on your requirement.
You could use:
read filename < <(ls -rt)
to know the oldest file (or directory) into the variable filename.
This line invokes ls(1) to create a list of files ordered by time (-t) reversed (-r): the first filename returned will be the oldest.
To know how many files (or directories) are there, you can use:
ls | wc -w
Hope it helps.
how about
$ ls -t | tail -n +9 | xargs rm
Sort the files in time order, select from the 9th onward, and delete them.
If you don't want to see an rm error when there are fewer than 9 files, just add the -f option.
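For example, in a scratch directory with synthetic mtimes (like all ls-parsing approaches, this assumes filenames without whitespace or newlines; GNU touch assumed for -d @N):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d); cd "$dir"
for i in $(seq 1 10); do touch -d "@$((1600000000 + i))" "f$i"; done

# newest first; skip the first 8 lines; remove whatever is left
ls -t | tail -n +9 | xargs rm -f
ls | wc -l
```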

Shell script to loop over all files in a folder and pick them in numerical order

I have the following code to loop through the files of a folder. Files are named 1.txt, 2.txt, all the way to 15.txt:
for file in .solutions/*; do
if [ -f "$file" ]; then
echo "test case ${file##*/}:"
cat ./testcases/${file##*/}
echo
echo "result:"
cat "$file"
echo
echo
fi
done
My issue: I get 1.txt, then 10.txt to 15.txt displayed.
I would like the files displayed in numerical order instead of lexicographical order; in other words, I want the loop to iterate through the files in numerical order. Is there any way to achieve this?
ls *.txt | sort -n
This would solve the problem, provided .solutions is a directory and no directory is named with an extension .txt.
and if you want complete accuracy,
ls -al *.txt | awk '$0 ~ /^-/ {print $9}' | sort -n
Update:
As per your edits,
you can simply do this,
ls | sort -n |
while read file
do
#do whatever you want here
:
done
Looping through ls is usually a bad idea since file names can have newlines in them. Redirecting using process substitution instead of piping the results will keep the scope the same (variables you set will stay after the loop).
#!/usr/bin/env bash
while IFS= read -r -d '' file; do
echo "test case ${file##*/}:"
cat ./testcases/${file##*/}
echo
echo "result:"
cat "$file"
echo
echo
done < <(find '.solutions/' -name '*.txt' -type f -print0 | sort -zV)
Setting IFS to "" keeps the leading/trailing spaces, -r stops backslashes from being interpreted, and -d '' makes read use NUL instead of newlines.
The find command matches normal files (-type f), so the if [ -f "$file" ] check isn't needed. It finds -name '*.txt' files in '.solutions/' and prints them NUL-terminated (-print0).
The sort command accepts NUL-terminated strings with the -z option and sorts them with -V (version sort), so 2.txt comes before 10.txt. Plain -n would not help here, since every line starts with the non-numeric prefix .solutions/ and would fall back to byte-wise comparison.
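Whether the ordering really comes out numeric is easy to check in a scratch directory. GNU sort's -V (version sort) compares the embedded numbers numerically, which plain -n cannot do here because every line starts with the non-numeric .solutions/ prefix:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d); cd "$dir"
mkdir .solutions
for i in 1 2 10 15; do touch ".solutions/$i.txt"; done

# print the NUL-separated, version-sorted list one path per line
find '.solutions/' -name '*.txt' -type f -print0 | sort -zV | tr '\0' '\n'
```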

Faster way to list files with similar names (using bash)?

I have a directory with more than 20K files, all with a random number prefix (e.g. 12345--name.jpg). I want to find files with similar names and remove all but one. I don't care which one, because they are duplicates.
To find duplicated names I've used
find . -type f \( -name "*.jpg" \) | sed -e 's/^[0-9]*--//g' | sort | uniq -d
as the list of a for/next loop.
To find all but one to delete, I'm currently using
rm $(ls -1 *name.jpg | tail -n +2)
This operation is pretty slow. I want to speed this up. Any suggestions?
I would do it like this.
*Note that you are dealing with the rm command, so make sure you have a backup of the existing directory in case something goes south.
Create a backup directory and back up the existing files. Once done, check that all the files are there.
mkdir bkp_dir; cp *.jpg bkp_dir/
Create another temp directory where we will keep only one file for each similar name. So all the unique file names will be here.
$ mkdir tmp
$ for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
*Explanation of the command is at the end. Once executed, check in the tmp directory that you got one instance of each file.
Remove all *.jpg files from the main directory. Saying it again: please verify that all files have been backed up before executing the rm command.
rm *.jpg
Copy the unique instances back from the temp directory.
cp tmp/*.jpg .
Explanation of command in step 2.
Command to get unique file names for step 2 will be
for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
$(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq) will get the unique file names like file1.jpg , file2.jpg
for i in $(...);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done will copy one file for each filename to tmp/ directory.
You should not be using ls in scripts, and there is no reason to use a separate file list as in user unknown's reply.
keepone () {
shift
rm "$@"
}
keepone *name.jpg
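For instance, in a scratch directory (the glob expands in sorted order, so the lowest-numbered duplicate is the one kept):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d); cd "$dir"
touch 111--name.jpg 222--name.jpg 333--name.jpg

keepone () {
    shift          # drop the first match (the one we keep)
    rm -- "$@"     # remove the rest
}
keepone *name.jpg
ls
```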
If you are running find to identify the files you want to isolate anyway, traversing the directory twice is inefficient. Filter the output from find directly.
find . -type f -name "*.jpg" |
awk '{ f=$0; sub(/^\.\/[0-9]*--/, "", f); if (a[f]++) print }' |
xargs echo rm
Take out the echo if the results look like what you expect.
As an aside, the /g flag to sed is useless for a regex which can only match once. The flag says to replace all occurrences on a line instead of the first occurrence on a line, but if there can be only one, the first is equivalent to all.
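The find-and-filter pipeline can be checked in a scratch directory like this (a sort is added so which duplicate survives is deterministic; note the regex has to account for the ./ prefix that find prepends to each path):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d); cd "$dir"
touch 111--a.jpg 222--a.jpg 333--b.jpg 555--c.jpg

# strip "./NNN--" to build the duplicate key; print every
# occurrence after the first, then echo the rm command
find . -type f -name "*.jpg" | sort |
awk '{ f=$0; sub(/^\.\/[0-9]*--/, "", f); if (a[f]++) print }' |
xargs echo rm
```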
Assuming no subdirectories and no whitespace-in-filenames involved:
find . -type f -name "*.jpg" | sed -e 's/^[0-9]*--//' | sort | uniq -d > namelist
removebutone () { shift; echo rm "$@"; }; cat namelist | while read n; do removebutone "*--$n"; done
or, better readable:
removebutone () {
shift
echo rm "$@"
}
cat namelist | while read n; do removebutone "*--$n"; done
shift takes the first parameter off of $@.
Note that the parens around the -name parameter are superfluous, and that there shouldn't be two pipes before sed. Maybe you had something else there which needed to be covered.
If it looks promising, you have, of course, to remove the echo in front of rm.

c shell script: find directory and rename the output of find

I am still new to shell scripting. I have been given a task which I am having difficulty executing.
I have a lot of directories which are named by date as ddMonYYYY.
03Mar2014 08Aug2013 11Jan2015 16Jan2014 22Feb2014 26Mar2014
03Nov2013 08Jan2014 11Jul2013 16Jul2013 22Jul2013 26Oct2014
03Oct2013 08Jan2015 11Nov2014 16May2014 22Mar2014 26Sep2013
The task is to rename the directories to MonYYYY.
So far, my code is
foreach file (`find . -type d | awk -F/ 'NF == 3'`)
echo $file
set newmove = `echo $file | cut -c 1-2,5-`
echo $newmove
mv $file $newmove
end
Output:
For find . -type d | awk -F/ 'NF == 3':
./24Jan2015/W51A
For echo $file | cut -c 1-2,5-:
./Jan2015/W51A
For mv $file $newmove:
mv: cannot rename ./24Jan2015/W51A to ./Jan2015/W51A: No such file or directory
but the script didn't work.
Do you guys have any idea how to do this?
First of all, the issue is that mv is asked to move ./24Jan2015/W51A to ./Jan2015/W51A, but the target directory ./Jan2015 does not exist yet, hence the error.
As I understand it, the idea is to rename the folders in the current working directory to the desired format, and by this actually merge the contents of folders with the same MonYYYY format (since, for example, 08Jan2014 and 16Jan2014 will both be renamed to Jan2014).
you can try this:
foreach dir ( `ls` )
set newdir = `echo $dir| cut -c 3-`
mkdir -p $newdir
mv -f $dir/* $newdir
rmdir $dir
end
-p will create the folder and will do nothing if the folder already exists.
Some assumptions:
The folders are at the same place which is pwd
There are always two leading digits in date
You want to merge the folders content of the same mmYYYY format
The files with the same name in different folders will be overwritten
There are only folders in pwd (add check if it's not the case)
In case these folders are in different places (which is not the case according to your output: ./24Jan2015 ) and collision is not an issue - the code should be changed to :
Use find
Create the new folder with the correct path
No merge and overwrite will occur, so 1,3,4 and 5 are not relevant.
UPDATE:
After additional input: if I understand correctly, your find is looking only for the directories at depth 2. I can't say why, but you can achieve the same much faster with
find . -mindepth 2 -maxdepth 2 -type d
The output is the list of the folders that have subfolders.
Then you need to get the second folder's name (assuming it will always be of the expected format).
set olddir = `echo $file| cut -f 1-3 -d '/'`
set newdir = `echo $olddir | cut -c 1-2,5-`
and finally
foreach file (`find . -mindepth 2 -maxdepth 2 -type d`)
set olddir = `echo $file| cut -f 1-3 -d '/'`
set newdir = `echo $olddir | cut -c 1-2,5-`
mkdir -p $newdir
mv -f $file $newdir
end
This will also handle the case if two folders were found under the same path.
UPDATE 2:
Since the script will run on Unix - the following updates should be made:
The original find was restored, since that Unix find lacks the -mindepth/-maxdepth options.
We should also try to remove olddir to clean up the empty folders; rmdir will fail if the folder is not empty, but the script will continue to run.
foreach file(`find . -type d | awk -F/ 'NF == 3'`)
set olddir = `echo $file| cut -f 1-2 -d '/'`
set newdir = `echo $olddir | cut -c 1-2,5-`
set dir_name=`basename "$file"`
if ( -d "$newdir/$dir_name" ) then
mv -f $file/* $newdir/$dir_name/
else
mkdir -p $newdir
mv -f $file $newdir
endif
rmdir $olddir
end
I really think c shell is the wrong tool for just about anything that involves programming. That said, this looks like it would do what you want with only a little help from an external tool:
#!/bin/csh
foreach file ([0-9][0-9][A-Z][a-z][a-z][0-9][0-9][0-9][0-9])
set new = `echo $file:q | cut -c 3-`
if ( -d "$new" ) then
echo "skipping $file because $new already exists"
continue
endif
mv -v "$file" "$new"
end
Note the glob that matches your list of directories to rename. This script isn't bothering to confirm whether the matched files ARE in fact directories, so if there's the possibility they might not be, you should account for that somehow.
Note that we are using a back-quoted expression to use an external tool, cut to grab a substring from each directory name. We use this (as you did in your question) because CSH IS NOT A PROGRAMMING LANGUAGE, and has no string processing capabilities of its own.
The if statement within the loop will skip any directory whose target already exists. So for example, if you were to go through the top row of your input in your question, 26Mar2014 would be converted to Mar2014, which already exists due to 03Mar2014. Since you haven't specified how this should be handled, this script skips that condition.
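For comparison, the same rename-and-merge logic is straightforward to test in bash (run here in a scratch directory; ${d:2} strips the two leading day digits, and directories that map to the same MonYYYY are merged rather than skipped):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d); cd "$dir"
mkdir 03Mar2014 26Mar2014 11Jan2015
touch 03Mar2014/a 26Mar2014/b 11Jan2015/c

for d in [0-9][0-9][A-Z][a-z][a-z][0-9][0-9][0-9][0-9]; do
    new=${d:2}                 # e.g. 03Mar2014 -> Mar2014
    mkdir -p "$new"
    # move the contents, then drop the now-empty old directory
    find "$d" -mindepth 1 -maxdepth 1 -exec mv -f {} "$new"/ \;
    rmdir "$d"
done
ls
```

As with the csh version, files with the same name in colliding directories would overwrite each other because of mv -f.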
