Extract data from log [duplicate] - shell

This question already has an answer here:
bash only email if occurrence since last alert
(1 answer)
Closed 7 years ago.
I have logs in the following format:
##<01-Mar-2015 03:48:18 o'clock GMT> <info>
##<01-Mar-2015 03:48:20 o'clock GMT> <info>
##<01-Mar-2015 03:48:30 o'clock GMT> <info>
##<01-Mar-2015 03:48:39 o'clock GMT> <info>
I need to write a shell script that extracts the last 5 minutes of data, counted back from the last recorded entry in the log file, and then searches for a string in it. I am new to shell scripting; I tried the grep command, but it was of no use. Can anyone help me here?
I tried the script below:
#!/bin/bash
H=1 ## Hours
LOGFILE=/path/to/logfile.txt
X=$(( H * 60 * 60 )) ## Hours converted to seconds
function get_ts {
DATE="${1%%\]*}"; DATE="${DATE##*\[}"; DATE=${DATE/:/ }; DATE=${DATE//\// }
TS=$(date -d "$DATE" '+%s')
}
get_ts "$(tail -n 1 "$LOGFILE")"
LAST=$TS
while read -r LINE; do
get_ts "$LINE"
(( (LAST - TS) <= X )) && echo "$LINE"
done < "$LOGFILE"
and on running it I get the error below:
get_ts: DATE=${DATE/:/ }: 0403-011 The specified substitution is not valid for this command.

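If you use awk, you can use date to build the cutoff for, say, the last 5 minutes, like this:
awk '$0>=from' from="$(date +"##<%d-%b-%Y %H:%M:%S" -d -5min)" logfile
PS: the date format string must match your log format. This is a plain string comparison, so it is only reliable while the compared lines fall within the same month and year.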

I'd parse the date into seconds since epoch and compare that with the system time:
TZ=GMT awk -F '[#<> :-]+' 'BEGIN { split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec", mnames, ","); for(i = 1; i <= 12; ++i) m[mnames[i]] = i } mktime($4 " " m[$3] " " $2 " " $5 " " $6 " " $7) + 300 >= systime()' filename
The -F '[#<> :-]+' is to split the date into individual parts, so that $2 is the day, $3 the month, $4 the year, and so forth. Then the code works as follows:
BEGIN {
# build a mapping from month name to number (to use in mktime)
split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec", mnames, ",")
for(i = 1; i <= 12; ++i) m[mnames[i]] = i
}
# build a numerically comparable timestamp from the split date, and
# select all lines whose timestamp is not more than 300 seconds behind
# the system time.
mktime($4 " " m[$3] " " $2 " " $5 " " $6 " " $7) + 300 >= systime()
Setting the TZ environment variable to GMT (with TZ=GMT before the awk call) will make mktime interpret the time stamps as GMT.
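The question asks for the last 5 minutes counted back from the last entry in the log rather than from the current time, and the error message quoted in the question suggests the script was run by a shell that lacks bash-style ${var/pattern/replacement} substitution. A minimal sketch under those assumptions: it sticks to POSIX sh and POSIX awk (no mktime), reads the file twice, and computes epoch seconds with a standard days-from-civil calculation. The names last_window and epoch are hypothetical, and all timestamps are assumed to share one time zone.

```shell
# last_window FILE [SECONDS]: print the timestamped lines that fall within
# SECONDS (default 300) of the last timestamp found in FILE.
# Hypothetical helper; assumes lines start with "##<DD-Mon-YYYY HH:MM:SS".
last_window() (
    file=$1
    window=${2:-300}
    awk -F '[#<> :-]+' -v window="$window" '
        # Seconds since 1970-01-01 00:00:00, ignoring time zones.
        function epoch(y, mon, d, h, mi, s,    m, era, yoe, doy, doe) {
            m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", mon) + 2) / 3
            if (m <= 2) y -= 1
            era = int(y / 400)
            yoe = y - era * 400
            doy = int((153 * (m + (m > 2 ? -3 : 9)) + 2) / 5) + d - 1
            doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
            return (era * 146097 + doe - 719468) * 86400 + h * 3600 + mi * 60 + s
        }
        # First pass: remember the last timestamp in the file.
        NR == FNR { if ($0 ~ /^##</) last = epoch($4, $3, $2, $5, $6, $7); next }
        # Second pass: print timestamped lines inside the window.
        /^##</ && last - epoch($4, $3, $2, $5, $6, $7) <= window
    ' "$file" "$file"
)
```

To search the extracted window for a string, pipe the result to grep, for example: last_window /path/to/logfile.txt 300 | grep 'pattern' (the pattern is a placeholder).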

Related

grep based on timestamp and string pattern [duplicate]

This question already has answers here:
Extract data from log file in specified range of time [duplicate]
(5 answers)
Closed 8 years ago.
Hi I have the following log file structure:
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>
####<21-Jan-2015 07:16:48 o'clock UTC> <Notice> <Stdout> <example3.com>
How can I filter this file by a date interval, for example:
Show all data between 19'th and 20'th of January 2015
I tried to use awk but I have problems converting 19-Jan-2015 to 2015-01-19 to continue comparison of dates.
For an oddball date format like that, I'd outsource the date parsing to the date utility.
#!/usr/bin/awk -f
# Formats the timestamp as a number, so that higher numbers represent
# a later timestamp. This will not handle the time zone because date
# can't handle the o'clock notation. I hope all your timestamps use the
# same time zone, otherwise you'll have to hack support for it in here.
function datefmt(d) {
# make d compatible with singly-quoted shell strings
gsub(/'/, "'\\''", d)
# then run the date command and get its output
command = "date -d '" d "' +%Y%m%d%H%M%S"
command | getline result
close(command)
# that's our result.
return result;
}
BEGIN {
# Field separator, so the part of the timestamp we'll parse is in $2 and $3
FS = "[< >]+"
# start, end set here.
start = datefmt("19-Jan-2015 00:00:00")
end = datefmt("20-Jan-2015 23:59:59")
}
{
# convert the timestamp into an easily comparable format
stamp = datefmt($2 " " $3)
# then print only lines in which the time stamp is in the range.
if(stamp >= start && stamp <= end) {
print
}
}
If the name of the file is example.txt, the script below should work:
for i in `awk -F'<' {'print $2'} example.txt| awk {'print $1"_"$2'}`; do date=`echo $i | sed 's/_/ /g'`; dunix=`date -d "$date" +%s`; if [[ (($dunix -ge 1421605800)) && (($dunix -le 1421778599)) ]]; then grep "$date" example.txt;fi; done
The script converts each timestamp to a Unix timestamp, compares it with the bounds of the range, and prints the lines from the file that meet the condition.
Using string comparisons will be faster than creating date objects:
awk -F '<' '
{split($2, d, /[- ]/)}
d[3]=="2015" && d[2]=="Jan" && 19<=d[1] && d[1]<=20
' file
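The string-comparison idea also covers ranges that span months if each date is first rewritten as a sortable YYYYMMDD key, which is the 19-Jan-2015 to 2015-01-19 style conversion the question was stuck on. A minimal sketch in POSIX awk, assuming the log layout shown in the question; the function name filter_by_day and its argument order are hypothetical.

```shell
# filter_by_day FROM TO FILE: print lines whose date lies in FROM..TO
# (both inclusive, given as YYYYMMDD). Hypothetical helper name.
filter_by_day() (
    from=$1 to=$2 file=$3
    awk -F '<' -v from="$from" -v to="$to" '
        {
            # $2 looks like: 19-Jan-2015 07:16:47 ...
            split($2, d, /[- ]/)
            mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", d[2]) + 2) / 3
            key = sprintf("%04d%02d%02d", d[3], mon, d[1])
        }
        # Force a numeric comparison of the eight-digit keys.
        key + 0 >= from + 0 && key + 0 <= to + 0
    ' "$file"
)
```

For the sample data, filter_by_day 20150119 20150120 example.txt would keep the first two lines and drop the one from 21 January.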
Another way, using mktime, all in awk (GNU awk is needed for mktime and for the three-argument form of match):
awk '
BEGIN{
From=mktime("2015 01 19 00 00 00")
To=mktime("2015 01 20 00 00 00")
}
{Time=0}
match($0,/<([^ ]+) ([^ ]+)/,a){
split(a[1],b,"-")
split(a[2],c,":")
b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
}
Time<To&&Time>From
' file
Output
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
How it works
BEGIN{
From=mktime("2015 01 19 00 00 00")
To=mktime("2015 01 20 00 00 00")
}
Before processing the lines set the dates To and From where the data we want will be between the two.
This format is required for mktime to work.
The format is YYYY MM DD HH MM SS.
{Time=0}
Reset Time so that later lines that don't match are not printed.
match($0,/<([^ ]+) ([^ ]+)/,a)
Matches the first two words after the < and stores them in a.
Executes the next block if this is successful.
split(a[1],b,"-")
split(a[2],c,":")
Splits the date and time into individual numbers/Month.
b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
Converts month to number using the fact that all of them are three characters and then dividing by 3.
Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
Builds the timestamp from the collected values.
Time<To&&Time>From
If the time is later than From and earlier than To, it is inside the desired range, and the default action for awk is to print the line.
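The month lookup from the walkthrough can be tried in isolation. A small sketch in POSIX awk; the function name month_number is hypothetical, and the extra length and alignment check is an addition, since the bare index trick would also accept a misaligned substring such as "anF".

```shell
# month_number MON: print 1..12 for Jan..Dec, or 0 if MON is not an exact
# three-letter English month abbreviation. Hypothetical helper name.
month_number() {
    awk -v mon="$1" 'BEGIN {
        pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)
        # Valid abbreviations start at positions 1, 4, 7, ..., 34.
        ok = (length(mon) == 3 && pos % 3 == 1)
        print (ok ? (pos + 2) / 3 : 0)
    }'
}
```

For example, Jan is found at position 1, so (1 + 2) / 3 gives 1, and Dec is found at position 34, so (34 + 2) / 3 gives 12.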
Resources
https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html

Fill the missing dates using awk

I have some missing dates in a file. e.g.
$cat ifile.txt
20060805
20060807
20060808
20060809
20060810
20060813
20060815
20060829
20060901
20060903
20060904
20060905
20070712
20070713
20070716
20070717
The dates are in the format YYYYMMDD. I want to fill in the missing dates between consecutive entries, but only where the gap is at most 5 days, e.g.
20060805
20060806 ---- This was missed
20060807
20060808
20060809
20060810
20060811 ----- This was missed
20060812 ----- This was missed
20060813
20060814 ----- This was missed
20060815
20060829
20060830 ------ This was missed
20060831 ------ This was missed
20060901
20060902 ------ This was missed
20060903
20060904
20060905
20070712
20070713
20070714 ----- This was missed
20070715 ----- This was missed
20070716
20070717
Other dates are not needed where there is a gap of more than 5 days. For example, I don't need to fill the dates between 20060815 and 20060829, because the gap between them is more than 5 days.
I am doing it in the following way, but I don't get any output.
#!/bin/sh
awk BEGIN'{
a[NR]=$1
} {
for(i=1; i<NR; i++)
if ((a[NR+1]-a[NR]) <= 5)
for (j=1; j<(a[NR+1]-a[NR]); j++)
print a[j]
}' ifile.txt
Desired output:
20060805
20060806
20060807
20060808
20060809
20060810
20060811
20060812
20060813
20060814
20060815
20060829
20060830
20060831
20060901
20060902
20060903
20060904
20060905
20070712
20070713
20070714
20070715
20070716
20070717
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
FNR==1{
print
prev=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
next
}
{
found=i=diff=""
curr_time=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
diff=(curr_time-prev)/86400
if(diff>1 && diff<=5){
while(++i<=diff){ print strftime("%Y%m%d", prev+86400*i) }
found=1
}
prev=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
}
!found
' Input_file
The following seems to work:
stringtodate() {
echo "${1:0:4}-${1:4:2}-${1:6:2} 12:00:00"
}
datetoseconds() {
LC_ALL=C date -d "$(stringtodate "$1")" +%s
}
secondstodate() {
LC_ALL=C date -d "@$1" +%Y%m%d
}
outputdatesbetween() {
local start=$1
local stop=$2
for ((i = $1; i < $2; i += 3600*24)); do
secondstodate "$i"
done
}
prev=
while IFS= read -r line; do
now=$(datetoseconds "$line")
if [[ -n "$prev" ]] &&
((
now - prev > 3600 * 24 &&
now - prev < 3600 * 24 * 5
))
then
outputdatesbetween "$((prev + 3600 * 24))" "$now"
fi
echo "$line"
prev="$now"
done < ifile.txt
Tested on repl
Here is a quick GNU awk script. We use GNU awk to make use of the time-functions mktime and strftime:
awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
{t=mktime($1 " " $2 " " $3 " 0 0 0",1) }
(t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i,1) }
{print; p=t}' file
Using mktime we convert the time into the total seconds since 1970. The function strftime converts it back to the desired format. Be aware that we enable the UTC-flag in both functions to ensure that we do not end up with surprises around Daylight-Saving-Time. Furthermore, since we already make use of GNU awk, we can further use the FIELDWIDTHS to determine the field lengths.
note: If your awk does not support the UTC-flag in mktime and strftime, you can run the following:
TZ=UTC awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
{t=mktime($1 " " $2 " " $3 " 0 0 0") }
(t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i) }
{print; p=t}' file

Generate specific date in bash

Hi, I have written some bash to generate a date in the format YYYYMMDD.
I know that is not perfect but the final thing will be:
Generate a range of 2 dates that are 31 days apart from each other and start with a minimum of today + one day and the older ones end at the end of this year. In format YYYYMMDD
month="$(awk -v min=1 -v max=12 'BEGIN{srand(); print int(min+rand()*(max-min+1))}')"
day="$(awk -v min=1 -v max=31 'BEGIN{srand(); print int(min+rand()*(max-min+1))}')"
year="$(date +%Y)"
if (( "${month}" < 10 )); then
month_proper="$(echo 0"${month}")"
else
month_proper="$(echo "${month}")"
fi
if (( "${day}" < 10 )); then
day_proper="$(echo 0"${day}")"
else
day_proper="$(echo "${day}")"
fi
echo month "${month}"
echo month with 0 if smaller than 10 : "${month_proper}"
echo day "${day}"
echo day with 0 smaller than 10 : "${day_proper}"
ok="$(date -d ""$year""${month_proper}""${day_proper}"" +"%Y%m%d")"
echo date with proper format "${ok}"
date -d "$year""${month_proper}""${day_proper}"
In which direction would I have to expand this script to get the final result? I already have the date generation, but there is no check that the date is at least one day after today.
The requirements specify the range nearly exactly (it is Dec 31 until Jan 31 of the next year), so you can write
a_fmt=`date -d "tomorrow" +%Y1231`
a_num=`date -d "$a_fmt" +%s`
b_num=$(( a_num + 31 * 24 * 3600 ))
b_fmt=`date -d @$b_num +%Y%m%d`
echo "Range is '$a_fmt'..'$b_fmt'"

How to sum 2 hours to date column in csv file

I have a CSV file consisting of 2 columns, name and date, with times in 24-hour format:
Name, log_date
John, 11/29/2017 23:00
I want to add 2 hours to the log date so that the date and time become:
John, 11/30/2017 01:00
I tried the command below, but with no success:
awk - F 'NR>1{$4+=(2/24);}1' OFS="," IN.csv > OUT.csv
I get the output below in the values of the log date column:
2017.08
Please help.
You need a language that has datetime arithmetic. Perl for example:
perl -MTime::Piece -F'/,\s*/' -slane '
$datetime = Time::Piece->strptime($F[1], $fmt);
$F[1] = ($datetime + 7200)->strftime($fmt);
print join ", ", @F
' -- -fmt="%m/%d/%Y %H:%M" <<END
John, 11/29/2017 11:00
END
John, 11/29/2017 13:00
Given your input, there's no way to indicate that the time is 11 PM. How are you supposed to know that?
Below is a one-liner in Python. It is not really usable code, but it should give you an idea of how one-liners can be used. It could be made simpler still.
python -c "s=r'John, 11/29/2017 13:00';
print(s.replace(s.split(\" \")[-1].split(\":\")[0],str(int(s.split(\" \")[-1].split(\":\")[0])+2)));";
Output
John, 11/29/2017 15:00
However, this will not roll over the date: 23 + 2 gives 25, where it should be 1:00 on the next day.
All you're looking for is documented here.
Using space as a field separator :
{
split($2,D,"/")
split($3,H,":")
# format for mktime is "YYYY MM DD HH MM SS [DST]"
d = D[3] " " D[1] " " D[2]" " H[1] " " H[2] " 00"
t=mktime(d)
t = t + 7200 # add two hours
$2 = strftime("%m/%d/%Y",t)
$3 = strftime("%H:%M",t)
}1
awk -F',' '{if(NR>1){printf("%s, ", $1);system("date -d \"+2 hours " $2 "\" +\"%m/%d/%Y %H:%M\"")}else{print $0}}' IN.csv > OUT.csv
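A variant that avoids one date parse per awk system call is a plain shell loop that converts each timestamp to epoch seconds, adds the offset, and formats it back. This is a minimal sketch, assuming GNU date, a header line, and the two-column layout from the question; the function name add_hours_csv is hypothetical. Working in UTC with -u keeps every day at 86400 seconds, and adding seconds to an epoch value sidesteps the risk that a "+2" written directly after a time of day may be read by date as a time zone offset.

```shell
# add_hours_csv HOURS FILE: print FILE with HOURS added to the second column.
# Hypothetical helper; assumes GNU date and rows like "John, 11/29/2017 23:00".
add_hours_csv() (
    hours=$1 file=$2
    {
        # Copy the header line through unchanged.
        IFS= read -r header && printf '%s\n' "$header"
        while IFS=, read -r name stamp; do
            stamp=${stamp# }                     # drop the space after the comma
            secs=$(date -u -d "$stamp" +%s) || exit 1
            secs=$((secs + hours * 3600))
            printf '%s, %s\n' "$name" "$(date -u -d "@$secs" '+%m/%d/%Y %H:%M')"
        done
    } < "$file"
)
```

With the sample row from the question, add_hours_csv 2 IN.csv > OUT.csv turns 11/29/2017 23:00 into 11/30/2017 01:00.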

How to generate a sequence of dates given starting and ending dates using AWK or BASH scripts?

I have a data set with the following format
The first and second fields denote the dates (M/D/YYYY) of starting and ending of a study.
How can one expand the data into the desired output format, taking leap years into account, using AWK or BASH scripts?
Your help is very much appreciated.
Input
7/2/2009 7/7/2009
2/28/1996 3/3/1996
12/30/2001 1/4/2002
Desired Output
7/7/2009
7/6/2009
7/5/2009
7/4/2009
7/3/2009
7/2/2009
3/3/1996
3/2/1996
3/1/1996
2/29/1996
2/28/1996
1/4/2002
1/3/2002
1/2/2002
1/1/2002
12/31/2001
12/30/2001
It can be done nicely with bash alone:
for i in `seq 1 5`;
do
date -d "2017-12-01 $i days" +%Y-%m-%d;
done;
or with pipes:
seq 1 5 | xargs -I {} date -d "2017-12-01 {} days" +%Y-%m-%d
If you have gawk:
#!/usr/bin/gawk -f
{
split($1,s,"/")
split($2,e,"/")
st=mktime(s[3] " " s[1] " " s[2] " 0 0 0")
et=mktime(e[3] " " e[1] " " e[2] " 0 0 0")
for (i=et;i>=st;i-=60*60*24) print strftime("%m/%d/%Y",i)
}
Demonstration:
./daterange.awk inputfile
Output:
07/07/2009
07/06/2009
07/05/2009
07/04/2009
07/03/2009
07/02/2009
03/03/1996
03/02/1996
03/01/1996
02/29/1996
02/28/1996
01/04/2002
01/03/2002
01/02/2002
01/01/2002
12/31/2001
12/30/2001
Edit:
The script above suffers from a naive assumption about the length of days. It's a minor nit, but it could produce unexpected results under some circumstances. At least one other answer here also has that problem. Presumably, the date command with subtracting (or adding) a number of days doesn't have this issue.
Some answers require you to know the number of days in advance.
Here's another method which hopefully addresses those concerns:
while read -r d1 d2
do
t1=$(date -d "$d1 12:00 PM" +%s)
t2=$(date -d "$d2 12:00 PM" +%s)
if ((t2 > t1)) # swap times/dates if needed
then
temp_t=$t1; temp_d=$d1
t1=$t2; d1=$d2
t2=$temp_t; d2=$temp_d
fi
t3=$t1
days=0
while ((t3 > t2))
do
read -r -u 3 d3 t3 3<<< "$(date -d "$d1 12:00 PM - $days days" '+%m/%d/%Y %s')"
((++days))
echo "$d3"
done
done < inputfile
You can do this in the shell without awk, assuming you have GNU date (which is needed for the date -d @nnn form, and possibly the ability to strip leading zeros on single digit days and months):
while read start end ; do
for d in $(seq $(date +%s -d $end) -86400 $(date +%s -d $start)) ; do
date +%-m/%-d/%Y -d @$d
done
done
If you are in a locale that does daylight savings, then this can get messed up if requesting a date sequence where a daylight saving switch occurs in between. Use -u to force to UTC, which also strictly observes 86400 seconds per day. Like this:
while read start end ; do
for d in $(seq $(date -u +%s -d $end) -86400 $(date -u +%s -d $start)) ; do
date -u +%-m/%-d/%Y -d @$d
done
done
Just feed this your input on stdin.
The output for your data is:
7/7/2009
7/6/2009
7/5/2009
7/4/2009
7/3/2009
7/2/2009
3/3/1996
3/2/1996
3/1/1996
2/29/1996
2/28/1996
1/4/2002
1/3/2002
1/2/2002
1/1/2002
12/31/2001
12/30/2001
Another option is to use dateseq from dateutils (http://www.fresse.org/dateutils/#dateseq). -i changes the input format and -f changes the output format. -1 must be specified as an increment when the first date is later than the second date.
$ dateseq -i %m/%d/%Y -f %m/%d/%Y 7/7/2009 -1 7/2/2009
07/07/2009
07/06/2009
07/05/2009
07/04/2009
07/03/2009
07/02/2009
$ dateseq 2017-04-01 2017-04-05
2017-04-01
2017-04-02
2017-04-03
2017-04-04
2017-04-05
I prefer ISO 8601 format dates - here is a solution using them.
You can adapt it easily enough to American format if you wish.
AWK Script
BEGIN {
days[ 1] = 31; days[ 2] = 28; days[ 3] = 31;
days[ 4] = 30; days[ 5] = 31; days[ 6] = 30;
days[ 7] = 31; days[ 8] = 31; days[ 9] = 30;
days[10] = 31; days[11] = 30; days[12] = 31;
}
function leap(y){
return ((y %4) == 0 && (y % 100 != 0 || y % 400 == 0));
}
function last(m, l, d){
d = days[m] + (m == 2) * l;
return d;
}
function prev_day(date, y, m, d){
y = substr(date, 1, 4)
m = substr(date, 6, 2)
d = substr(date, 9, 2)
#print d "/" m "/" y
if (d+0 == 1 && m+0 == 1){
d = 31; m = 12; y--;
}
else if (d+0 == 1){
m--; d = last(m, leap(y));
}
else
d--
return sprintf("%04d-%02d-%02d", y, m, d);
}
{
d1 = $1; d2 = $2;
print d2;
while (d2 != d1){
d2 = prev_day(d2);
print d2;
}
}
Call this file: dates.awk
Data
2009-07-02 2009-07-07
1996-02-28 1996-03-03
2001-12-30 2002-01-04
Call this file: dates.txt
Results
Command executed:
awk -f dates.awk dates.txt
Output:
2009-07-07
2009-07-06
2009-07-05
2009-07-04
2009-07-03
2009-07-02
1996-03-03
1996-03-02
1996-03-01
1996-02-29
1996-02-28
2002-01-04
2002-01-03
2002-01-02
2002-01-01
2001-12-31
2001-12-30
You can convert the dates to Unix timestamps and then generate a sequence over them; you can even have nanosecond granularity if you want (with '%N' in date).
The following example prints time from 2020-11-07 00:00:00 to 2020-11-07 01:00:00 in intervals of 5 minutes
# Seconds since 1970-01-01 00:00:00 UTC for the start and end of the range.
# Change TZ to interpret the times in your own time zone, e.g. TZ="Asia/Kolkata".
start_time=$(date -u -d 'TZ="UTC" 2020-11-07 00:00:00' '+%s')
end_time=$(date -u -d 'TZ="UTC" 2020-11-07 01:00:00' '+%s')
# 60 seconds * 5 (i.e. 5 minutes).
# Change the interval to suit your needs, or leave it out to show every second.
interval=$((60 * 5))
# Generate the sequence and convert each value back to a timestamp in UTC.
# Again, change TZ to display the times in your own time zone.
seq ${start_time} ${interval} ${end_time} |
xargs -I{} date -u -d 'TZ="UTC" @'{} '+%F %T'
