Filtering Filenames with bash

I have a directory full of log files in the form
${name}.log.${year}${month}${day}
such that they look like this:
logs/
production.log.20100314
production.log.20100321
production.log.20100328
production.log.20100403
production.log.20100410
...
production.log.20100314
production.log.old
I'd like to use a bash script to filter out all the logs older than X months and append them to *.log.old
X=6 # months
LIST=*.log.*
for file in $LIST; do
    # file_is_older_than_months is pseudocode, not a real command
    if file_is_older_than_months "$file" "$X"; then
        cat "$file" >> production.log.old
        rm "$file"
    fi
done
How can I find all the files older than X months? And how can I keep the *.log.old file out of the LIST glob?

The following script expects GNU date to be installed. Run it in the directory with your log files, passing the number of months as the first parameter.
#!/bin/bash
min_date=$(date -d "$1 months ago" "+%Y%m%d")
for log in *.log.*; do
    [ "${log%.log.old}" != "$log" ] && continue   # skip the archive file itself
    [ "${log%.*}.$min_date" \< "$log" ] && continue   # skip logs newer than the cutoff
    cat "$log" >> "${log%.*}.old"
    rm "$log"
done
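The string comparison in the script above works because zero-padded YYYYMMDD names sort lexicographically in the same order as chronologically, so no date parsing is needed. A standalone sketch of that idea (the filename and cutoff are made up):

```shell
min_date=20100401                    # hypothetical cutoff
log=production.log.20100314          # hypothetical log file
if [ "${log%.*}.$min_date" \< "$log" ]; then
    echo "newer than cutoff: keep"
else
    echo "not newer than cutoff: archive"
fi
# → not newer than cutoff: archive
```

Note that `\<` is the bash test builtin's string comparison, escaped so it is not taken as a redirection.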

Presumably, as log files, they won't have been modified since they were created?
Have you considered something like this...
find ./ -name "*.log.*" -mtime +60 -exec rm {} \;
to delete files that have not been modified for 60 days. If the files have been modified more recently then this is no good of course.
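If mtime is trustworthy, the same idea can also exclude the *.log.old archive by name. A hedged sketch, using GNU find and 183 days as a rough 6-month threshold; it builds a throwaway directory so it is safe to run anywhere (all paths and names are illustrative):

```shell
tmp=$(mktemp -d)
touch -d '2010-01-01' "$tmp/production.log.20100101"   # old file
touch "$tmp/production.log.$(date +%Y%m%d)"            # fresh file
touch -d '2010-01-01' "$tmp/production.log.old"        # archive, must be skipped
# List candidates first; swap -print for -delete (or an -exec) once happy.
find "$tmp" -name '*.log.*' ! -name '*.log.old' -mtime +183 -print
# prints only the 2010 file: the fresh one is too new, the .old one is excluded by name
rm -rf "$tmp"
```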

You'll have to compare the logfile date with the current date. Take the difference in years and multiply it by 12 to convert it to months, then add the difference in months. This gives you the age of the file in months (according to the file name).
For each filename, you can use an AWK filter to extract the year (awk's substr is 1-indexed):
awk -F. '{ print substr($3,1,4) }'
You also need the current year:
date "+%Y"
To calculate the difference:
$(( current_year - file_year ))
Similarly for months.
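Putting those pieces together, here is a sketch of the month-difference check in plain bash (the filename and the 6-month threshold are illustrative):

```shell
file="production.log.20100314"   # hypothetical filename
stamp=${file##*.}                # -> 20100314
file_year=${stamp:0:4}
file_month=${stamp:4:2}
# 10# forces base 10 so a leading zero ("03") is not parsed as octal
age_months=$(( (10#$(date +%Y) - 10#$file_year) * 12 + (10#$(date +%m) - 10#$file_month) ))
echo "$file is $age_months months old"
if [ "$age_months" -gt 6 ]; then
    echo "older than 6 months: archive it"
fi
```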

Assuming you can modify the logs and the filename timestamp is the more accurate one, here's a gawk script.
#!/bin/bash
awk 'BEGIN{
months=6
current=systime() #get current time in sec
sec=months*30*86400 #months in sec
output="old.production" #output file
}
{
m=split(FILENAME,fn,".")
yr=substr(fn[m],1,4)
mth=substr(fn[m],5,2)
day=substr(fn[m],7,2)
t=mktime(yr" "mth" "day" 00 00 00")
if ( (current-t) > sec){
print "file: "FILENAME" is more than "months" months old"
while( (getline line < FILENAME )>0 ){
print line > output
}
close(FILENAME)
cmd="rm \047"FILENAME"\047"
print cmd
#system(cmd) #uncomment to use
}
}' production*

Related

A command to delete files older than X days, but leave the last 2 files

I need help with a command to delete files on the server.
I have an archive folder.
The file names have the form app-XXXXXX.tar.gz, where XXXXXX is the backup date. For example, app-231019.tar.gz.
I need to delete files older than 14 days, but keep the last 2 files.
I found the command
find /folder -mtime +14 -type f -delete
but it is not suitable for me:
The "older than 14 days" filter should be applied based on the file name, not the date the file was written to the server.
I cannot find a way to ensure the last 2 files are not deleted, even if they are older than 14 days.
Would you please try the following:
dir="dir" # replace with your pathname
fortnightago=$(awk 'BEGIN {print strftime("%y%m%d", systime() - 86400 * 14)}')
# If your date command supports -d option, you can also say as:
# fortnightago=$(date -d "14 days ago" +%y%m%d)
for i in "$dir"/app-*.tar.gz; do
if [[ $i =~ app-([0-9]{2})([0-9]{2})([0-9]{2})\.tar\.gz ]]; then
yy="${BASH_REMATCH[3]}"
mm="${BASH_REMATCH[2]}"
dd="${BASH_REMATCH[1]}"
if (( 10#$yy$mm$dd <= 10#$fortnightago )); then
printf "%d%d%d%c%s\n" "${yy#0}" "${mm#0}" "${dd#0}" $'\t' "$i"
fi
fi
done | sort -rn -k1 | tail -n +3 | cut -f 2 | xargs -r rm --
[Explanation]
First it extracts the date string and rearranges it into "%y%m%d" order
for numeric comparison.
It prints the filenames which are older than 14 days, adding the date as the 1st field.
Then it sorts the filenames by the 1st field in descending order (the latest file first).
Then it skips the first two lines to keep those files.
It cuts out the filenames as a removal list.
Finally, the filenames are passed to xargs with the rm command.
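The sort/tail/cut stages can be exercised on their own with made-up data, which makes the keep-the-newest-two behaviour easy to see:

```shell
# Three tab-separated "yymmdd<TAB>filename" lines (hypothetical backups);
# after the descending sort, tail -n +3 drops the two newest entries,
# leaving only the files eligible for removal.
printf '191001\tapp-011019.tar.gz\n191023\tapp-231019.tar.gz\n190920\tapp-200919.tar.gz\n' |
    sort -rn -k1 | tail -n +3 | cut -f 2
# → app-200919.tar.gz
```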
As an alternative, if perl is your option, you can say:
perl -e '
$dir = "dir";
@t = localtime(time() - 86400 * 14);
$fortnightago = sprintf("%02d%02d%02d", $t[5] - 100, $t[4] + 1, $t[3]);
@ary = map { $_->[0] }
    sort { $b->[1] <=> $a->[1] }
    grep { $_->[1] <= $fortnightago }
    map { [ $_, m/app-(\d{2})(\d{2})(\d{2})\.tar\.gz/ && "$3$2$1" ] }
    (<$dir/app-*.tar.gz>);
unlink splice(@ary, 2);
'
Hope this helps.

Remove lines having older end time

In my bash script I want to add code which removes all entries older than x days.
To simplify this problem, I have divided it into 3 parts. 2 parts are
done; I'm looking for an answer to the 3rd part.
a) To find the latest log date - Done
b) evaluate earliest epoch time. (All entries before this epoch
time should be deleted) - Done
No_OF_DAYS=2
One_Day=86400000    # one day in milliseconds
Latest_Time=$(find . -name '*.tps' -exec sed '/endTime/!d; s/{//; s/,.*//' {} + | sort -r | head -1 | cut -d: -f2)  # latest epoch time
Days_in_Epoch=$((One_Day * No_OF_DAYS))
Earliest_Time=$((Latest_Time - Days_in_Epoch))  # earliest epoch time
c) delete all log entries older than the evaluated earliest time.
PS:
There are multiple files, distributed across different subfolders.
All files have the extension ".tps".
Time is in epoch format; endTime will be considered for the calculations ("endTime":1488902735220).
sample data
Code:
{"endTime":1488902734775,"startTime":1488902734775,"operationIdentity":"publishCacheStatistics","name":"murex.risk.control.excesses.cache.CacheStatisticsTracer","context":{"parentContext":{"id":-1,"parentContext":null},"data":[{"value":"excessCacheExcessKeysToContexts","key":"name"},{"value":"0","key":"hits"},{"value":"0","key":"misses"},{"value":"0","key":"count"},{"value":"0","key":"maxElements"},{"value":"0","key":"evictions"},{"value":"N/A","key":"policy"}],"id":0}}
{"endTime":1488902735220,"startTime":1488902735220,"operationIdentity":"publishCacheStatistics","name":"murex.risk.control.excesses.cache.CacheStatisticsTracer","context":{"parentContext":{"id":-1,"parentContext":null},"data":[{"value":"excessCacheExcessKeysToContexts","key":"name"},{"value":"0","key":"hits"},{"value":"0","key":"misses"},{"value":"0","key":"count"},{"value":"0","key":"maxElements"},{"value":"0","key":"evictions"},{"value":"N/A","key":"policy"}],"id":8}}
{"endTime":1488902735550,"startTime":1488902735550,"operationIdentity":"publishCacheStatistics","name":"murex.risk.control.excesses.cache.CacheStatisticsTracer","context":{"parentContext":{"id":-1,"parentContext":null},"data":[{"value":"excessCacheContextsToExcessIds","key":"name"},{"value":"0","key":"hits"},{"value":"0","key":"misses"},{"value":"0","key":"count"},{"value":"0","key":"maxElements"},{"value":"0","key":"evictions"},{"value":"N/A","key":"policy"}],"id":9}}
For Example:
a)
latest epoch time = 1488902735550
b)
earliest epoch time = 1488902735220
Problem: Now I am looking for a command which deletes all the entries that are older/less than the earliest epoch time. In the above example the 1st line should be deleted.
Any help/suggestions are appreciated. Thank you.
This will do the trick. Be careful to test it with backup files first, as it will overwrite your logs directly. Also change the TIME variable to whatever endTime you want to compare against.
while read -r file
do
awk -v FS=':|,' -v TIME='1488902735220' '{ if ( ($2 >= TIME) && !( $0 ~ /^ *$/ ) ) { print $0 } }' "$file" > tmp.txt && cat tmp.txt > "$file"
done < <( find ./ -name '*.tps' 2>/dev/null )
Regards!
Based on your current solution, I'd use a simple loop to read the file line by line and only output those lines whose endTime is greater than or equal to your earliest time:
while read -r line; do
    line_endTime=$(awk -F '[:,]' '{print $2}' <<< "$line")
    if [ "$line_endTime" -ge "$Earliest_Time" ]; then echo "$line"; fi
done < input_file > filtered_output_file

How do you loop through a certain date range

I have files in a directory that are date based but not obviously date-stamped.
File_yyyymmdd_record.log
These have been lying around in a directory for a few years.
Now if these were simply numbers, all I would need to do is take the difference and increment a counter to push the value:
var1=substring( File_yyyymmdd_record.log ) /* get the yyyymmdd part */
var2=substring( File2_yyyymmdd_record.log ) /* get the yyyymmdd part */
delta=var2-var1
set i=delta and loop through to get the values for all these recordIDs (the record ID is the yyyymmdd part).
The problem is if I have 2 different months and also years in the directory, say 20131210 and 20140110:
the plain numeric difference is not going to give me all the recordIDs in that directory, since when the sequence spills over to the next month the plain numeric calculation no longer applies; it should be a date-based calculation.
what I want to do is use 2 input parameters to the shell
shell.sh recordID1 recordID2
and based on these it will find all records and store them some place and loop through each record as an input like this
find <dir> -iname recordID* ...<some awk and sed here> |
while read recordID ;
do <stuff >
done
Can this be achieved, especially in 2 respects?
First the date calculation part, and second storing these recordIDs some place so I can cycle through them. Maybe echoing them to a tmp file is what comes off the bat.
For the date calculation part, I tried this and it works, but I'm not sure if it will falter in some situation:
echo $((($(date -u -d 2010-04-29 +%s) - $(date -u -d 2010-03-28 +%s)) / 86400))
So given recordID1 as 20100328, I have 32 days' worth of recordIDs to look for in that directory:
advance the date for 32 days from recordID1 and store the results some place.
How best can all this be done.
I got your point: you need to find log files with file names between 20131210 and 20140110
(no need to convert to epoch time).
#!/usr/bin/bash
sep=20131210
eep=20140110
find /DIR -type f -name "*.log" | while read -r file
do
    d=$(basename "$file")        # File_yyyymmdd_record.log
    d=${d#*_}; d=${d%%_*}        # keep only the yyyymmdd part
    if [ "$d" -ge "$sep" ] && [ "$d" -le "$eep" ]; then
        : # do <stuff> here
    fi
done
Something like this should do:
s=20130102 # start date
e=20130202 # end date
sep=$(date +"%s" -d"$s") # conv to epoch
eep=$(date +"%s" -d"$e")
for f in *.log; do
d=$(date +"%s" -d "$(sed -n 's/^[^_]*_\([^_]*\)_[^_]*\.log/\1/p' <<< "$f")")
if [ "$d" -ge "$sep" ] && [ "$d" -le "$eep" ]; then
echo $f
fi
done

Using sed to find, convert and replace lines

I don't know much about bash scripting and I'm trying to develop a bash script to do these operations:
I have a lot of .txt files in the same directory.
Every .txt file follows this structure:
file1.txt:
<name>first operation</name>
<operation>21</operation>
<StartTime>1292435633</StartTime>
<EndTime>1292435640</EndTime>
<name>second operation</name>
<operation>21</operation>
<StartTime>1292435646</StartTime>
<EndTime>1292435650</EndTime>
I want to search for every <StartTime> line and convert it to a standard date/time format (not a unix timestamp), preserving the structure: <StartTime>2010-12-15 22:52</StartTime>, for example. Could this be a search/replace job for sed? I think I could use this function that I found: date --utc --date "1970-01-01 $1 sec" "+%Y-%m-%d %T"
I want to to do the same with <EndTime> tag.
I should do this for all *.txt files in a directory.
I tried using sed but got unwanted results. As I said, I don't know much about bash scripting, so any help would be appreciated.
Thank you for your help!
Regards
sed is incapable of doing date conversions; instead I would recommend a more appropriate tool like awk:
echo '<StartTime>1292435633</StartTime>' | awk '{
match($0,/[0-9]+/);
t = strftime("%F %T",substr($0,RSTART,RLENGTH),1);
sub(/[0-9]+/,t)
}
{print}'
If your input files have one tag per line, as in your structure example, it should work flawlessly.
If you need to repeat the operation for every .txt file just use a shell for:
for file in *.txt; do
awk '/^<[^>]*Time>/{
match($0,/[0-9]+/);
t = strftime("%F %T",substr($0,RSTART,RLENGTH),1);
sub(/[0-9]+/,t)
} 1' "$file" >"$file.new"
# mv "$file.new" "$file"
done
In comparison to the previous code, I have made two minor changes:
added the condition /^<[^>]*Time>/, which checks that the current line starts with a tag whose name ends in Time>
converted {print} to the shorter 1
If the files ending with .new contain the result you were expecting, you can uncomment the line containing mv.
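The bare 1 mentioned above works because awk runs the default action {print} for any pattern that evaluates true, so these two commands produce identical output:

```shell
printf 'a\nb\n' | awk '{print}'   # prints a and b, one per line
printf 'a\nb\n' | awk '1'         # same output
```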
Using grep:
while read -r line; do
    if [[ $line == *"<StartTime>"* || $line == *"<EndTime>"* ]]; then
        n=$(echo "$line" | grep -Po '(?<=(>)).*(?=<)')
        line=${line/$n/$(date -d @"$n")}
    fi
    echo "$line" >> file1.new.txt
done < file1.txt
$ cat file1.new.txt
<name>first operation</name>
<operation>21</operation>
<StartTime>Wed Dec 15 18:53:53 CET 2010</StartTime>
<EndTime>Wed Dec 15 18:54:00 CET 2010</EndTime>
<name>second operation</name>
<operation>21</operation>
<StartTime>Wed Dec 15 18:54:06 CET 2010</StartTime>
<EndTime>Wed Dec 15 18:54:10 CET 2010</EndTime>

bash shell date parsing, start with specific date and loop through each day in month

I need to create a bash shell script starting with a day and then looping through each subsequent day, formatting the output as %Y_%m_%d.
I figure I can submit a start day and then another param for the number of days.
My issue/question is how to set a DATE (that is not now) and then add a day.
so my input would be 2010_04_01 6
my output would be
2010_04_01
2010_04_02
2010_04_03
2010_04_04
2010_04_05
2010_04_06
[radical@home ~]$ cat a.sh
#!/bin/bash
START=$(echo "$1" | tr -d _)
for (( c=0; c<$2; c++ ))
do
    echo -n "$(date --date="$START +$c day" +%Y_%m_%d) "
done
Now if you call this script with your params it will return what you wanted:
[radical@home ~]$ ./a.sh 2010_04_01 6
2010_04_01 2010_04_02 2010_04_03 2010_04_04 2010_04_05 2010_04_06
A very basic bash script should be able to do this:
#!/bin/bash
start_date=20100501
num_days=5
for i in `seq 1 $num_days`
do
date=`date +%Y/%m/%d -d "${start_date}-${i} days"`
echo $date # Use this however you want!
done
Output:
2010/04/30
2010/04/29
2010/04/28
2010/04/27
2010/04/26
Note: NONE of the solutions here will work on OS X. You would need, for example, something like this:
date -v-1d +%Y%m%d
That would print out yesterday for you. Or with underscores of course:
date -v-1d +%Y_%m_%d
So taking that into account, you should be able to adjust the loops in these examples to use this command instead. The -v option lets you add or subtract days, minutes, seconds, years, months, etc.; -v+24d would add 24 days, and so on.
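A guarded sketch that probes for the BSD/macOS -v flag and falls back to GNU date otherwise (assumption: one of the two flavours is installed):

```shell
# Yesterday in %Y_%m_%d, whichever date flavour is available.
if date -v-1d >/dev/null 2>&1; then
    yesterday=$(date -v-1d +%Y_%m_%d)          # BSD/macOS date
else
    yesterday=$(date -d yesterday +%Y_%m_%d)   # GNU date
fi
echo "$yesterday"
```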
#!/bin/bash
inputdate="${1//_/-}" # change underscores into dashes
for ((i=0; i<$2; i++))
do
date -d "$inputdate + $i day" "+%Y_%m_%d"
done
You can also use cal, for example
YYYY=2014; MM=02; for d in $(cal $MM $YYYY | grep "^ *[0-9]"); do DD=$(printf "%02d" $d); echo $YYYY$MM$DD; done
(originally posted here on my commandlinefu account)
You can pass a date via command line option -d to GNU date handling multiple input formats:
http://www.gnu.org/software/coreutils/manual/coreutils.html#Date-input-formats
Pass starting date as command line argument or use current date:
underscore_date=${1:-$(date +%Y_%m_%d)}
date=${underscore_date//_/-}
for days in $(seq 0 6);do
date -d "$date + $days days" +%Y_%m_%d;
done
You can also use gawk:
#!/bin/bash
DATE=$1
num=$2
awk -vd="$DATE" -vn="$num" 'BEGIN{
m=split(d,D,"_")
t=mktime(D[1]" "D[2]" "D[3]" 00 00 00")
print d
for(i=1;i<=n;i++){
t+=86400
print strftime("%Y_%m_%d",t)
}
}'
Output:
$ ./shell.sh 2010_04_01 6
2010_04_01
2010_04_02
2010_04_03
2010_04_04
2010_04_05
2010_04_06
2010_04_07
