I want to find files older than N days from a given timestamp in format YYYYMMDDHH
I can find files older than 2 days with the command below, but it works relative to the present time:
find /path/to/dir -mtime -2 -type f -ls
Let's say I give the input timestamp=2011093009; I want to find files older than 2 days from 2011093009.
I've been doing my research, but can't seem to figure it out.
Putting together one of the answers from here and using $() as suggested here:
(updated as per comment by sputnick)
date=2011060109; find /home/kenjal/ -mtime $(( $(date +%Y%m%d%H) - $(date -d $date +%Y%m%d%H) ))
Basically this is accomplished by finding files in a range of dates...
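A minimal sketch of that range idea, assuming GNU find (for -newermt) and GNU date are both available (neither is guaranteed by the question):

ts=2011093009
base="${ts:0:4}-${ts:4:2}-${ts:6:2} ${ts:8:2}:00"
cutoff=$(date -d "$base 2 days ago" +'%Y-%m-%d %H:%M')
find /path/to/dir -type f ! -newermt "$cutoff" -ls   # files older than 2 days before the given timestamp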
I used perl to calculate the days from today to the given timestamp, since GNU date is not available on my system, so -d is not an option. The code below accepts a date in the format YYYYDDMM:
#!/usr/bin/perl
use Time::Local;
my($day, $month, $year) = (localtime)[3,4,5];
$month = sprintf '%02d', $month+1;
$day = sprintf '%02d', $day;
my($currentYear, $currentDM) = ($year+1900, "$day$month");
my $todaysDate = "$currentYear$currentDM";
#print $todaysDate;
sub to_epoch {
my ($t) = @_;
my ($y, $d, $m) = ($t =~ /(\d{4})(\d{2})(\d{2})/);
return timelocal(0, 0, 0, $d+0, $m-1, $y-1900);
}
sub diff_days {
my ($t1, $t2) = @_;
return (abs(to_epoch($t2) - to_epoch($t1))) / 86400;
}
print diff_days($todaysDate, $ARGV[0]);
Note: I'm no expert in Perl, and this is the very first piece of code I've modified/written. Having said that, I'm sure there are better ways to accomplish the above in Perl.
Then I used the Korn shell script below to perform what I needed.
#!/bin/ksh
daysFromToday=$(dateCalc.pl 20110111)
let daysOld=$daysFromToday+31
echo $daysFromToday "\t" $daysOld
find /path/to/dir/ -mtime +$daysFromToday -mtime -$daysOld -type f -ls
I'm finding all files older than $daysFromToday days, then narrowing the search to files newer than $daysOld days.
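For example, with daysFromToday=3 and daysOld=34, the find above reduces to a 31-day window:

find /path/to/dir/ -mtime +3 -mtime -34 -type f -ls   # modified between 34 and 3 days ago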
#!/usr/bin/env bash
# getFiles arrayName olderDate newerDate [ pathName ]
getFiles() {
    local i
    while IFS= read -rd '' "$1"'[(_=$(read -rd "" x; echo "${x:-0}")) < $2 && _ > $3 ? ++i : 0]'; do
        :
    done < <(find "${4:-.}" -type f -printf '%p\0%Ts\0')
}

# main date1 date2 [ pathName ]
main() {
    local -a dates files
    local x
    for x in "${@:1:2}"; do
        dates+=( "$(date -d "$x" +%s)" ) || return 1
    done
    _=$dates let 'dates[1] > dates && (dates=dates[1], dates[1]=_)'
    getFiles files "${dates[@]}" "$3"
    declare -p files
}

main "$@"
# vim: set fenc=utf-8 ff=unix ts=4 sts=4 sw=4 ft=sh nowrap et:
This Bash script takes two dates and a pathname for find. getFiles takes an array name and the files with mtimes between the two dates are assigned to that array. This example script simply prints the array.
Requires a recent Bash, and GNU date. If it really has to be "N days", or you don't have GNU date, it gets harder: no shell can do the date parsing using only standard utilities, so you'd likely need a different language.
Technically, you can calculate an offset in days using printf '%(%s)T' ... and some arithmetic, but without GNU date there is no good way to turn the YYYYMMDDHH base timestamp into epoch seconds, so I'm afraid you're out of luck there.
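For what it's worth, a small sketch of that printf arithmetic (assumes Bash 4.2 or later):

printf -v now '%(%s)T' -1    # current time in epoch seconds, no external date call
echo $(( now / 86400 ))      # whole days since the epoch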
Edit
I see this question has a ksh tag, in which case I spoke too soon: apparently ksh93's printf accepts a GNU date -d like string. I have no idea whether that's portable, and of course it requires a system with ksh93 installed, but you could do it in that case with some modification to the above script.
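A minimal sketch of that ksh93 route (assuming ksh93's printf really does accept such strings, per the above):

#!/bin/ksh93
base=$(printf '%(%s)T' '2011-09-30 09:00:00')   # epoch seconds for the given timestamp
now=$(printf '%(%s)T' now)                      # epoch seconds for the current time
print $(( (now - base) / 86400 ))               # whole days between them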
Related
I need help with the command to delete files on the server.
I have an archive folder.
The file names are of the form app-XXXXXX.tar.gz, where XXXXXX is the backup date, for example app-231019.tar.gz.
I need to delete files older than 14 days, but not the last 2 files.
I found a command
find /folder -mtime +14 -type f -delete
but it is not suitable for me
The filter "older than 14 days" should be applied based on the file name, and not by the date of recording to the server.
I cannot find a command on how to set a limit so that the last 2 files are not deleted, even if they are older than 14 days.
Would you please try the following:
dir="dir" # replace with your pathname
fortnightago=$(awk 'BEGIN {print strftime("%y%m%d", systime() - 86400 * 14)}')
# If your date command supports -d option, you can also say as:
# fortnightago=$(date -d "14 days ago" +%y%m%d)
for i in "$dir"/app-*.tar.gz; do
if [[ $i =~ app-([0-9]{2})([0-9]{2})([0-9]{2})\.tar\.gz ]]; then
yy="${BASH_REMATCH[3]}"
mm="${BASH_REMATCH[2]}"
dd="${BASH_REMATCH[1]}"
if (( $yy$mm$dd <= $fortnightago )); then
printf "%d%d%d%c%s\n" "${yy#0}" "${mm#0}" "${dd#0}" $'\t' "$i"
fi
fi
done | sort -rn -k1 | tail -n +3 | cut -f 2 | xargs rm --
[Explanation]
- First it extracts the date string from each filename and rearranges it into "%y%m%d" order for numeric comparison.
- It prints the filenames which are older than 14 days, adding the rearranged date as the 1st field.
- Then it sorts the lines by the 1st field in descending order (the latest file first).
- Then it skips the first two lines to keep those two files.
- It cuts out the filenames as a removal list.
- The filenames are passed to xargs, which invokes rm on them.
As an alternative, if perl is an option, you can say:
perl -e '
    $dir = "dir";
    @t = localtime(time() - 86400 * 14);
    $fortnightago = sprintf("%02d%02d%02d", $t[5] - 100, $t[4] + 1, $t[3]);
    @ary = map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] }
           grep { $_->[1] <= $fortnightago }
           map  { [ $_, m/app-(\d{2})(\d{2})(\d{2})\.tar\.gz/ && "$3$2$1" ] }
           (<$dir/app-*.tar.gz>);
    unlink splice(@ary, 2);
'
Hope this helps.
I have 4 different files with different fileName.date formats, each having a date embedded as part of the name. I want to identify the files older than 3 months based on their names only, because the files may be edited/changed later as well. I want to create a shell script and run it as a cron job.
Here are the files, all under the same directory:
fileone.log.2018-03-23
file_two_2018-03-23.log
filethree.log.2018-03-23
file_four_file_four_2018-03-23.log
I have checked existing examples but have not found what I am actually looking for!
Working on the premise that you mean 90 days - if you need specifically months, we can check that too, but it's different logic.
Here's some code you could work from.
(you said you don't want to work from a list, so I edited to use the current directory.)
$: cat chkDates
# while read f # replaced with -
for f in *[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*
do  # first get the epoch timestamp of the file based on the date string embedded in the name
    filedate=$(
        date +%s -d $(
            echo $f | sed -E 's/.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/'
        ) # this returns the date substring
    ) # this converts it to an epoch integer of seconds since 1/1/70
    # now see if it's > 90 days ( you said 3 months. if you need *months* we have to do some more...)
    daysOld=$(( ( $(date +%s) - $filedate ) / 86400 )) # this should give you an integer result, btw
    if (( 90 < $daysOld ))
    then echo $f is old
    else echo $f is not
    fi
done # < listOfFileNames # not reading list now
You can pass date a date to report, and a format to present it.
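For instance (using -u here so the epoch result is timezone-independent):

$: date -u -d 2018-03-23 +%s
1521763200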
sed pattern explanation
Note the sed -E 's/.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/' command. This assumes the date format will be consistently YYYY-MM-DD, and does no validations of reasonableness. It will happily accept any 4 digits, then 2, then 2, delimited by dashes.
-E uses expanded regexes, so parens () can denote values to be remembered, without needing \'s. . means any character, and * means any number (including zero) of the previous pattern, so .* means zero or more characters, eating up all the line before the date. [0-9] means any digit. {x,y} sets a minimum(x) and maximum(y) number of consecutive matches - with only one value {4} means only exactly 4 of the previous pattern will do. So, '.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*' means ignore as many characters as you can until seeing 4 digits, then a dash, 2 digits, then a dash, then 2 digits; remember that pattern (the ()'s), then ignore any characters behind it.
In a substitution, \1 means the first remembered match, so
sed -E 's/.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/'
means find and remember the date pattern in the filenames, and replace the whole name with just that part in the output. This assumes the date will be present - on a filename where there is no date, the pattern will not match, and the whole filename will be returned, so be careful with that.
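For example:

$: echo fileone.log.2018-03-23 | sed -E 's/.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/'
2018-03-23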
(hope that helped.)
By isolating the date string from the filenames with sed (your examples were format-consistent, so I used that) we pass it in and ask for the UNIX Epoch timestamp of that date string using date +%s -d $(...), to represent the file with a math-handy number.
Subtract that from the current date in the same format, you get the approximate age of the file in seconds. Divide that by the number of seconds in a day and you get days old. The file date will default to midnight, but the math will drop fractions, so it sorts out.
here's the file list I made, working from your examples
$: cat listOfFileNames
fileone.log.2018-03-23
fileone.log.2018-09-23
file_two_2018-03-23.log
file_two_2018-08-23.log
filethree.log.2018-03-23
filethree.log.2018-10-02
file_four_file_four_2018-03-23.log
file_four_file_four_2019-03-23.log
I added a file for each that would be within the 90 days as of this posting - including one that is "post-dated", which can easily happen with this sort of thing.
Here's the output.
$: ./chkDates
fileone.log.2018-03-23 is old
fileone.log.2018-09-23 is not
file_two_2018-03-23.log is old
file_two_2018-08-23.log is not
filethree.log.2018-03-23 is old
filethree.log.2018-10-02 is not
file_four_file_four_2018-03-23.log is old
file_four_file_four_2019-03-23.log is not
That what you had in mind?
An alternate pure-bash way to get just the date string
(You still need date to convert to the epoch seconds...)
instead of
filedate=$(
date +%s -d $(
echo $f | sed -E 's/.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/'
) # this returns the date substring
) # this converts it to an epoch integer of seconds since 1/1/70
which doesn't seem to be working for you, try this:
tmp=${f%[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*} # unwanted prefix
d=${f#$tmp} # prefix removed
tmp=${f#*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]} # unwanted suffix
filedate=${d%$tmp} # suffix removed
filedate=$( date +%s --date=$filedate ) # epoch time
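Tracing that with one of the sample filenames may make it clearer (values shown as comments, assuming f=file_two_2018-03-23.log):

# f=file_two_2018-03-23.log
#   tmp=file_two_         (everything before the date)
#   d=2018-03-23.log      (prefix stripped)
#   tmp=.log              (everything after the date)
#   filedate=2018-03-23   (suffix stripped, then converted to epoch seconds)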
This is hard to read, but doesn't have to spawn as many subprocesses to get the work done. :)
If that doesn't work, then I'm suspicious of your version of date. Mine:
$: date --version
date (GNU coreutils) 8.26
UPDATE:
Simple Version:
A method using the date inside the file's name:
typeset stamp=$(date --date="90 day ago" +%s)
for file in /directory/*.log; do
    fdate="$(echo "$file" | sed 's/[^0-9-]*//g')"
    fstamp=$(date -d "${fdate} 00:00:00" +"%s")
    if [ ${fstamp} -le ${stamp} ] ; then
        echo "${file} : ${fdate} (${fstamp})"
    fi
done
A More Complete Version:
This version will look at all files; if it fails to make a date value from a file, it moves on.
typeset stamp=$(date --date="90 day ago" +%s)
for file in /tmp/* ; do
    fdate="$(echo "$file" | sed 's/[^0-9-]*//g')"
    fstamp=$(date -d "${fdate} 00:00:00" +"%s" 2> /dev/null)
    [[ $? -ne 0 ]] && continue
    if [ ${fstamp} -le ${stamp} ] ; then
        echo "${file} : ${fdate} (${fstamp})"
    fi
done
output:
/tmp/file_2016-05-23.log : 2016-05-23 (1463976000)
/tmp/file_2017-05-23.log : 2017-05-23 (1495512000)
/tmp/file_2018-05-23.log : 2018-05-23 (1527048000)
/tmp/file_2018-06-23.log : 2018-06-23 (1529726400)
/tmp/file_2018-07-23.log : 2018-07-23 (1532318400)
In this example, the following were ignored:
/tmp/file_2018-08-23.log : 2018-08-23 (1534996800)
/tmp/file_2018-10-18.log : 2018-10-18 (1539835200)
I have a file with more than 10K lines of record.
Within each line, there are two date+time info. Below is an example:
"aaa bbb ccc 170915 200801 12;ddd e f; g; hh; 171020 122030 10; ii jj kk;"
I want to filter out the lines where the number of days between these two dates is less than 30.
Below is my source code:
#!/bin/bash
filename="$1"
echo $filename
touch filterfile
totalline=`wc -l $filename | awk '{print $1}'`
i=0
j=0
echo $totalline lines
while read -r line
do
    i=$[i+1]
    if [ $i -gt $[j+9] ]; then
        j=$i
        echo $i
    fi
    shortline=`echo $line | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'`
    date1=`echo $shortline | awk '{print $1}'`
    date2=`echo $shortline | awk '{print $2}'`
    if [ $date1 -gt 700000 ]
    then
        continue
    fi
    d1=`date -d $date1 +%s`
    d2=`date -d $date2 +%s`
    diffday=$[(d2-d1)/(24*3600)]
    #diffdays=`date -d $date2 +%s` - `date -d $date1 +%s`)/(24*3600)
    if [ $diffday -lt 30 ]
    then
        echo $line >> filterfile
    fi
done < "$filename"
I am running it in Cygwin. It takes about 10 seconds to handle 10 lines; I use echo $i to show the progress.
Am I doing something wrong in my script?
This answer does not answer your question but gives an alternative method to your shell script. The answer to your question is given by Sundeep's comment :
Why is using a shell loop to process text considered bad practice?
Furthermore, you should be aware that every time you call sed, awk, echo, date, ... you are asking the system to execute a binary which needs to be loaded into memory, etc. So if you do this in a loop, it is very inefficient.
alternative solution
awk programs are commonly used to process log files containing timestamp information, indicating when a particular log record was written. gawk extended the awk standard with time-handling functions. The one you are interested in is :
mktime(datespec [, utc-flag ]) Turn datespec into a timestamp in the
same form as is returned by systime(). It is similar to the function
of the same name in ISO C. The argument, datespec, is a string of the
form "YYYY MM DD HH MM SS [DST]". The string consists of six or seven
numbers representing, respectively, the full year including century,
the month from 1 to 12, the day of the month from 1 to 31, the hour of
the day from 0 to 23, the minute from 0 to 59, the second from 0 to
60, and an optional daylight-savings flag.
The values of these numbers need not be within the ranges specified;
for example, an hour of -1 means 1 hour before midnight. The
origin-zero Gregorian calendar is assumed, with year 0 preceding year
1 and year -1 preceding year 0. If utc-flag is present and is either
nonzero or non-null, the time is assumed to be in the UTC time zone;
otherwise, the time is assumed to be in the local time zone. If the
DST daylight-savings flag is positive, the time is assumed to be
daylight savings time; if zero, the time is assumed to be standard
time; and if negative (the default), mktime() attempts to determine
whether daylight savings time is in effect for the specified time.
If datespec does not contain enough elements or if the resulting time
is out of range, mktime() returns -1.
As your date format is of the form yymmdd HHMMSS, we need to write a parser function convertTime for it. Be aware that we will pass this function times of the form yymmddHHMMSS. Furthermore, using space-delimited fields, your times are located in $4$5 and $11$12. As mktime converts the time to seconds since 1970-01-01, all we need to do is check whether the delta time is smaller than 30*24*3600 seconds.
awk 'function convertTime(t) {
         s = "20" substr(t,1,2) " " substr(t,3,2) " " substr(t,5,2) " "
         s = s substr(t,7,2) " " substr(t,9,2) " " substr(t,11,2)
         return mktime(s)
     }
     { t1 = convertTime($4$5); t2 = convertTime($11$12) }
     (t2 - t1 < 30*3600*24) { print }' <file>
If you are not interested in the real delta time (your sed line removes the actual time of day), then you can adapt it to:
awk 'function convertTime(t) {
         s = "20" substr(t,1,2) " " substr(t,3,2) " " substr(t,5,2) " "
         s = s "00 00 00"
         return mktime(s)
     }
     { t1 = convertTime($4); t2 = convertTime($11) }
     (t2 - t1 < 30*3600*24) { print }' <file>
If the dates are not in the fields, you can use match to find them :
awk 'function convertTime(t) {
         s = "20" substr(t,1,2) " " substr(t,3,2) " " substr(t,5,2) " "
         s = s substr(t,7,2) " " substr(t,9,2) " " substr(t,11,2)
         return mktime(s)
     }
     { match($0, /[0-9]{6} [0-9]{6}/)
       # drop the space between date and time so convertTime sees yymmddHHMMSS
       t1 = convertTime(substr($0, RSTART, 6) substr($0, RSTART + 7, 6))
       a = substr($0, RSTART + RLENGTH)
       match(a, /[0-9]{6} [0-9]{6}/)
       t2 = convertTime(substr(a, RSTART, 6) substr(a, RSTART + 7, 6)) }
     (t2 - t1 < 30*3600*24) { print }' <file>
With some modifications, many of them not even aimed at speed, I can reduce the processing time by 50% - which is a lot:
#!/bin/bash
filename="$1"
echo "$filename"
# touch filterfile
totalline=$(wc -l < "$filename")
i=0
j=0
echo "$totalline" lines
while read -r line
do
    i=$((i+1))
    if (( i > j+9 )); then
        j=$i
        echo $i
    fi
    shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
    date1=${shortline[0]}
    date2=${shortline[1]}
    if (( date1 > 700000 ))
    then
        continue
    fi
    d1=$(date -d "$date1" +%s)
    d2=$(date -d "$date2" +%s)
    diffday=$(((d2-d1)/(24*3600)))
    # diffdays=$(date -d $date2 +%s) - $(date -d $date1 +%s))/(24*3600)
    if (( diffday < 30 ))
    then
        echo "$line" >> filterfile
    fi
done < "$filename"
Some remarks:
# touch filterfile
Well - the later echo "$line" >> filterfile creates the file if it doesn't exist, so the touch isn't needed.
totalline=$(wc -l < "$filename")
You don't need awk here. The filename is suppressed from the output when wc reads from stdin.
Capturing the output in an array:
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
gives us array access and saves two more calls to awk.
On my machine, your code took about 42s for 2880 lines (on your machine 2880 s?) and about 19s for the same file with my code.
So I suspect, if you aren't running it on an i486 machine, that Cygwin might be a slowdown. It's a Linux environment for Windows, isn't it? Well, I'm on a core Linux system. Maybe you could try the GNU utils for Windows - the last time I looked for them, they were advertised as gnu-utils x32 or something; maybe there is a 64-bit version available by now.
And the next thing I would have a look at, is the date calculation - that might be a slowdown too.
2880 lines isn't that much, so I don't suspect that my SSD drive plays a huge role in the game.
I have files in a directory that are date based but not obviously date-stamped.
File_yyyymmdd_record.log
These are lying around in a directory for a few years worth of time.
Now if these were simply numbers, all I would need to do is get the difference and increment a counter to push the value:
var1=substring( File_yyyymmdd_record.log )  /* get the yyyymmdd part */
var2=substring( File2_yyyymmdd_record.log ) /* get the yyyymmdd part */
delta=var2-var1
Then set i=delta and loop through to get the values for all these recordIDs (the recordID is the yyyymmdd part).
The problem is that if I have 2 different months, and also years, in the directory, say 20131210 and 20140110, the plain numeric difference is not going to give me all the recordIDs in that directory: when the date spills over to the next month, the numeric calculation is no longer applicable; it should be a date-based calculation.
What I want to do is pass 2 input parameters to the shell script:
shell.sh recordID1 recordID2
and based on these it will find all the records, store them some place, and loop through each record as input, like this:
find <dir> -iname recordID* ...<some awk and sed here> |
while read recordID ;
do <stuff >
done
Anyway, can this be achieved? There are two parts to it: first the date calculation, and second storing these recordIDs so I can cycle through them. Echoing them to a tmp file is what comes off the bat.
For the date calculation part I tried this, and it works, but I'm not sure whether it will falter in some situation:
echo $((($(date -u -d 2010-04-29 +%s) - $(date -u -d 2010-03-28 +%s)) / 86400))
So given recordID1 as 20100328, I have 32 days' worth of recordIDs to look for in that directory.
The dates would have to be advanced for 32 days from recordID1 and stored some place, roughly as sketched below.
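A minimal sketch of that advancing step (assuming GNU date; /tmp/recordIDs.txt is just a hypothetical "some place"):

d=20100328
for (( i = 0; i < 32; i++ )); do
    echo "$d" >> /tmp/recordIDs.txt    # collect this recordID
    d=$(date -d "$d + 1 day" +%Y%m%d)  # advance one calendar day
done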
How best can all this be done.
I got your point: you need to find log files whose names fall between 20131210 and 20140110.
(No need to convert to epoch time.)
#! /usr/bin/bash
sep=20131210
eep=20140110
find /DIR -type f -name "*.log" | while read file
do
    d=${file##*/}    # strip the directory part
    d=${d:5:8}       # extract yyyymmdd (assumes a 5-char prefix like "File_")
    if [ "$d" -ge "$sep" ] && [ "$d" -le "$eep" ]; then
        <stuff>
    fi
done
Something like this should do:
s=20130102 # start date
e=20130202 # end date
sep=$(date +"%s" -d"$s") # conv to epoch
eep=$(date +"%s" -d"$e")
for f in *.log; do
    d=$(date +"%s" -d$(sed -n 's/^[^_]*_\([^_]*\)_[^_]*.log/\1/p' <<< "$f"))
    if [ "$d" -ge "$sep" ] && [ "$d" -le "$eep" ]; then
        echo $f
    fi
done
I have a directory full of log files in the form
${name}.log.${year}${month}${day}
such that they look like this:
logs/
production.log.20100314
production.log.20100321
production.log.20100328
production.log.20100403
production.log.20100410
...
production.log.20100314
production.log.old
I'd like to use a bash script to filter out all the logs older than X months and dump them into *.log.old:
X=6 #months
LIST=*.log.*;
for file in LIST; do
    is_older = file_is_older_than_months( ${file}, ${X} );
    if is_older; then
        cat ${c} >> production.log.old;
        rm ${c};
    fi
done;
How can I get all the files older than X months? And how can I avoid having the *.log.old file included in LIST?
The following script expects GNU date to be installed. You can call it in the directory with your log files with the first parameter as the number of months.
#!/bin/sh
min_date=$(date -d "$1 months ago" "+%Y%m%d")
for log in *.log.*; do
    [ "${log%.log.old}" "!=" "$log" ] && continue     # skip the *.log.old file itself
    [ "${log%.*}.$min_date" "<" "$log" ] && continue  # skip logs newer than the cutoff
    cat "$log" >> "${log%.*}.old"
    rm "$log"
done
Presumably as a log file, it won't have been modified since it was created?
Have you considered something like this...
find ./ -name "*.log.*" -mtime +60 -exec rm {} \;
to delete files that have not been modified for 60 days. If the files have been modified more recently then this is no good of course.
You'll have to compare the logfile date with the current date. Take the difference in years and multiply by 12 to get it in months; then add the difference in months. This gives you the age of the file in months (according to the file name).
For each filename, you can use an AWK filter to extract the year:
awk -F. '{ print substr($3,1,4) }'
You also need the current year:
date "+%Y"
To calculate the difference:
$(( current_year - file_year ))
Similarly for months.
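Putting those pieces together, a rough sketch (assuming names like production.log.YYYYMMDD; the 10# prefix guards against leading-zero months being read as octal):

current_year=$(date +%Y)
current_month=$(date +%m)
for f in production.log.2*; do
    file_year=$(echo "$f" | awk -F. '{ print substr($3,1,4) }')
    file_month=$(echo "$f" | awk -F. '{ print substr($3,5,2) }')
    age=$(( (current_year - file_year) * 12 + (10#$current_month - 10#$file_month) ))
    echo "$f is $age months old"
done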
Assuming you are allowed to modify/remove the logs, and the filename timestamp is the more accurate one, here's a gawk script.
#!/bin/bash
awk 'BEGIN{
    months=6
    current=systime()          # get current time in sec
    sec=months*30*86400        # months in sec
    output="old.production"    # output file
}
FNR==1{                        # run once per input file, not once per line
    m=split(FILENAME,fn,".")
    yr=substr(fn[m],1,4)
    mth=substr(fn[m],5,2)
    day=substr(fn[m],7,2)
    t=mktime(yr" "mth" "day" 00 00 00")
    if ( (current-t) > sec ){
        print "file: "FILENAME" more than "months" month"
        while( (getline line < FILENAME )>0 ){
            print line > output
        }
        close(FILENAME)
        cmd="rm \047"FILENAME"\047"
        print cmd
        #system(cmd) #uncomment to use
    }
}' production*
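To try it, save the wrapper above under some name (archive_old_logs.sh here is just a hypothetical choice), run it from the logs directory, and uncomment the system(cmd) line once the printed rm commands look right:

$: cd logs && ../archive_old_logs.sh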