grep based on timestamp and string pattern [duplicate] - shell

Hi, I have the following log file structure:
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>
####<21-Jan-2015 07:16:48 o'clock UTC> <Notice> <Stdout> <example3.com>
How can I filter this file by a date interval, for example:
Show all data between the 19th and 20th of January 2015.
I tried to use awk, but I had problems converting 19-Jan-2015 to 2015-01-19 so I could compare dates.

For an oddball date format like that, I'd outsource the date parsing to the date utility.
#!/usr/bin/awk -f
# Formats the timestamp as a number, so that higher numbers represent
# a later timestamp. This will not handle the time zone because date
# can't handle the o'clock notation. I hope all your timestamps use the
# same time zone, otherwise you'll have to hack support for it in here.
function datefmt(d) {
    # make d compatible with singly-quoted shell strings
    gsub(/'/, "'\\''", d)
    # then run the date command and get its output
    command = "date -d '" d "' +%Y%m%d%H%M%S"
    command | getline result
    close(command)
    # that's our result
    return result
}
BEGIN {
    # field separator, so the parts of the timestamp we'll parse are in $2 and $3
    FS = "[< >]+"
    # start and end of the range are set here
    start = datefmt("19-Jan-2015 00:00:00")
    end   = datefmt("20-Jan-2015 23:59:59")
}
{
    # convert the timestamp into an easily comparable format
    stamp = datefmt($2 " " $3)
    # then print only lines whose timestamp is in the range
    if (stamp >= start && stamp <= end) {
        print
    }
}

If the name of the file is example.txt, then the below script should work:
for i in $(awk -F'<' '{print $2}' example.txt | awk '{print $1"_"$2}'); do
    date=$(echo "$i" | sed 's/_/ /g')
    dunix=$(date -d "$date" +%s)
    if [[ $dunix -ge 1421605800 && $dunix -le 1421778599 ]]; then
        grep "$date" example.txt
    fi
done
The script just converts the time provided into a Unix timestamp, then compares the times and prints the lines from the file that meet the condition.
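For reference, this is the conversion the loop performs for each extracted timestamp (a minimal sketch assuming GNU date, shown with TZ=UTC so the number is reproducible; the hard-coded epoch bounds above correspond to 19-20 January 2015 in the answerer's local time zone):
$ TZ=UTC date -d "19-Jan-2015 07:16:47" +%s
1421651807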

Using string comparisons will be faster than creating date objects:
awk -F '<' '
{split($2, d, /[- ]/)}
d[3]=="2015" && d[2]=="Jan" && 19<=d[1] && d[1]<=20
' file
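Run against the sample log above, this prints the 19- and 20-January lines:
$ awk -F '<' '{split($2, d, /[- ]/)} d[3]=="2015" && d[2]=="Jan" && 19<=d[1] && d[1]<=20' file
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>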

Another way, using mktime, entirely in awk:
awk '
BEGIN{
From=mktime("2015 01 19 00 00 00")
To=mktime("2015 01 20 00 00 00")
}
{Time=0}
match($0,/<([^ ]+) ([^ ]+)/,a){
split(a[1],b,"-")
split(a[2],c,":")
b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
}
Time<To&&Time>From
' file
Output
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
How it works
BEGIN{
From=mktime("2015 01 19 00 00 00")
To=mktime("2015 01 20 00 00 00")
}
Before processing any lines, set the From and To dates; the data we want lies between the two.
This format is required for mktime to work.
The format is YYYY MM DD HH MM SS.
{Time=0}
Resets Time so lines that don't match are not printed.
match($0,/<([^ ]+) ([^ ]+)/,a)
Matches the first two words after the < and stores them in a.
Executes the next block if this is successful.
split(a[1],b,"-")
split(a[2],c,":")
Splits the date and the time into their individual parts (numbers, plus the month name).
b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
Converts the month name to a number: every abbreviation is three characters long, so index() returns 1, 4, 7, ... for Jan, Feb, Mar, ...; adding 2 and dividing by 3 maps that to 1-12.
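For example, "Feb" is found at position 4 in the string, and (4+2)/3 = 2:
$ awk 'BEGIN { print (index("JanFebMarAprMayJunJulAugSepOctNovDec", "Feb")+2)/3 }'
2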
Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
Builds the epoch time from the collected values.
Time<To&&Time>From
If Time is greater than From and less than To, it is inside the desired range; the pattern is true, and awk's default action is to print the line.
Resources
https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html

Related

Batch change accessed and modified date, with date from another file's content?

I'm migrating old notes from a SQL database based note taking app to separate text files.
I've managed to export the notes and date codes as separate text files.
The files are ordered like this:
$ ls -1
Note0001.txt
Note0001-date.txt
Note0002.txt
Note0002-date.txt
Note0003.txt
Note0003-date.txt
The contents of the date files looks like this:
$ cat Note0001-date.txt
388766121.742373
$ cat Note0002-date.txt
274605766.273638
$ cat Note0003-date.txt
384996285.436197
The dates are seconds since the epoch 2001-01-01. See other question about the format: What type of date format is this? And how to convert it?.
How do I batch change the accessed and modified date of the notes files, NoteNNNN.txt, to the date in the contents of respective date file, NoteNNNN-date.txt?
How to convert the date to UTC+1? Preferably with consideration of DST (daylight saving time).
I am trying to convert the dates with the method described in this question:
https://unix.stackexchange.com/questions/2987/
But it outputs an error message in bash 3.2.57 (macOS):
$ date -d '2001-01-01 UTC+1 + 388766121 seconds'
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
[-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
I am new to working with the dates and timestamps in the terminal.
Iterate over each file pair, read the timestamp, shift the timestamp so it's something Unix tools can understand, then touch the files. I.e. big problems are composed of sums of small problems.
# find all files named .txt but not -date.txt
find . -name '*.txt' '!' -name '*-date.txt' |
# remove the .txt suffix
sed 's/\.txt$//' |
{
# the reference point of files content
start=$(date -d "2001-01-01" +%s) # will not work with BSD date
# I guess just precompute the value:
start=978303600
# for each file
while IFS= read -r f; do
# get the timestamp
diff=$(<"$f"-date.txt)
# increment the timestamp to seconds since epoch
ref=$(<<<"scale=6; $start + $diff" bc)
# convert the seconds-since-epoch value to the ccyy-mm-ddTHH:MM:SS
# format BSD touch understands; BSD date -r takes integer seconds,
# so drop the fraction (on GNU systems use: date -d "@${ref%.*}")
ts=$(date -r "${ref%.*}" +%Y-%m-%dT%H:%M:%S)
# change access and modification times of .txt file
touch -d "$ts" "$f".txt
done
}
Assuming your OS's local timezone is what you want for your output, and you have a version of awk that supports the GNU awk time functions, you could use the following script. Also, from the gawk manual:
If the DST daylight-savings flag is positive, the time is assumed to
be daylight savings time; if zero, the time is assumed to be standard
time; and if negative (the default), mktime() attempts to determine
whether daylight savings time is in effect for the specified time.
file tst.awk:
BEGIN {
epoch = mktime("2001 01 01 00 00 00")
}
FNR==1 {
close(out)
out = substr(FILENAME, 1, length(FILENAME)-9) ".txt"
}
{
print strftime("%F %T %Z", epoch+$0) > out
}
Usage:
awk -f tst.awk *-date.txt
Example
Here is an example with the script, without the I/O part, just converting the datetimes.
test file:
> cat file
388766121.742373
274605766.273638
384996285.436197
script tst.awk:
BEGIN { epoch = mktime("2001 01 01 00 00 00") }
{ print strftime("%F %T %Z", epoch+$0) }
Output:
> awk -f tst.awk file
2013-04-27 15:35:21 EEST
2009-09-14 08:22:46 EEST
2013-03-14 23:24:45 EET
The timezone of my box (EET) is being used by default. If we'd like to print in a different timezone, we should set TZ accordingly. Also, DST is applied by default; notice that some dates are printed as EEST (summer time).
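For example, for the UTC+1 zone with DST that the question asks about, any CET/CEST zone name from the tz database will do (Europe/Stockholm here is just one choice):
$ TZ=Europe/Stockholm awk -f tst.awk file
2013-04-27 15:35:21 CEST
2009-09-14 08:22:46 CEST
2013-03-14 23:24:45 CET
The wall-clock values happen to match the EET run, because the mktime() base in the BEGIN block shifts along with TZ.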

Add 30 Mins Time to DateTime format YYYY-MM-DD hh:mm:ss in AIX 5.0

I'm running AIX with coreutils 5.0. I need to advance an arbitrary date (or time) given in the ISO-8601 format YYYY-MM-DD hh:mm:ss.
For example:
Value of D1 is: 2017-07-08 19:20:01, and I need to add 30 minutes.
In a modern UNIX-system I could probably write something like
date -d "$D1 + 30 minutes" +'%H:%M'
but, alas, I need it to work on an old AIX.
Try
$ date -d "$(date -d "$D1") + 30 minutes" +'%H:%M'
This works in bash, but not in ksh.
The inner call to date will parse D1 to a date, and present it in date's "native" format.
$ date -d "$D1"
Sat Jul 8 19:20:01 CEST 2017
This output will be used with + 30 minutes to create the date that you want, with the outer call to date.
The inner call to date will be expanded so that
$ date -d "$(date -d "$D1") + 30 minutes" +'%H:%M'
will be equivalent to
$ date -d "Sat Jul 8 19:20:01 CEST 2017 + 30 minutes" +'%H:%M'
which will be
19:50
date -d @$(($(date -d "$D1" +%s) + 30 * 60)) +%H:%M
$(date -d "$D1" +%s) echoes the epoch time
$((epoch + value)) calculates the wanted time
date -d @epoch +fmt formats it
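Put together (assuming GNU date; shown with TZ=UTC so the numbers are reproducible):
$ D1='2017-07-08 19:20:01'
$ TZ=UTC date -d "$D1" +%s
1499541601
$ TZ=UTC date -d @$((1499541601 + 30 * 60)) +%H:%M
19:50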
If you are running AIX from 2003 you are in dire straits, my friend, but if you only need the time, not the full date, as your question implies, I think #RamanSailopal got us halfway there.
echo $D1 | awk -F "[: ]" '{
m = $3+30;
h = ($2+int(m/60)) % 24;
printf("%02i:%02i\n", h, m%60)
}'
awk splits the input into fields, using the separator pattern given in the -F argument. The pattern matches either : or a space.
The input will be split in
$1 = 2017-07-08
$2 = 19
$3 = 20
$4 = 01
Then the script calculates a fake minute value (that can be more than or equal to 60) and stores it in m. From that value it calculates the hour, modulo 24, and the actual minutes, m modulo 60.
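For example, with 19:40:01 as the input time, m becomes 70, the hour becomes (19 + 1) % 24 = 20, and the minutes 70 % 60 = 10:
$ echo '2017-07-08 19:40:01' | awk -F "[: ]" '{ m=$3+30; h=($2+int(m/60))%24; printf("%02i:%02i\n", h, m%60) }'
20:10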
This could fail if you hit a leap second, so if you need second precision at all times, you should use some other method.
Awk solution:
awk -F '[-: ]' '{
ram=(mktime($1" "$2" "$3" "$4" "$5" "$6)+(30*60));
print strftime("%Y-%m-%d %T",ram)
}' <<< "$D1"
Convert the date string to epoch seconds using awk's mktime function, add 30 minutes (30*60), and then convert back to a date string in the required format using strftime.
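Usage, with the question's example value (this needs gawk, since mktime() and strftime() are GNU extensions):
$ D1='2017-07-08 19:20:01'
$ awk -F '[-: ]' '{ ram=(mktime($1" "$2" "$3" "$4" "$5" "$6)+(30*60)); print strftime("%Y-%m-%d %T",ram) }' <<< "$D1"
2017-07-08 19:50:01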

gawk - suppress output of matched lines

I'm running into an issue where gawk prints unwanted output. I want to find lines in a file that match an expression, test to see if the information in the line matches a certain condition, and then print the line if it does. I'm getting the output that I want, but gawk is also printing every line that matches the expression rather than just the lines that meet the condition.
I'm trying to search through files containing dates and times for certain actions to be executed. I want to show only lines that contain times in the future. The dates are formatted like so:
text... 2016-01-22 10:03:41 more text...
I tried using sed to just print all lines starting with ones that had the current hour, but there is no guarantee that the file contains a line with that hour (plus there is no guarantee that the lines all have any particular year, month, day, etc.), so I needed something more robust. I decided to try converting the times into seconds since the epoch and comparing that to the current systime. If the conversion produces a number greater than systime, I want to print that line.
Right now it seems like gawk's mktime() function is the key to this. Unfortunately, it requires input in the following format:
yyyy mm dd hh mm ss
I'm currently searching a test file (called timecomp) for a regular expression matching the date format.
Edit: the test file only contains a date and time on each line, no other text.
I used sed to replace the date separators (i.e. /, -, and :) with a space, and then piped the output to a gawk script called stime using the following statement:
sed -e 's/[-://_]/ /g' timecomp | gawk -f stime
Here is the script
# stime
# stime
BEGIN { tsec = systime() }
/.*20[1-9][0-9] [0-1][1-9] [0-3][0-9] [0-2][0-9] [0-6][0-9] [0-6][0-9]/ {
    if (tsec < mktime($0))
        print "\t" $0  # the tab just differentiates the desired output from the other lines being printed
}
$1
Right now this is getting the basic information that I want, but it is also printing every line that matches the original expression, rather than just the lines containing a time in the future. Sample output:
2016 01 22 13 23 20
2016 01 22 14 56 57
2016 01 22 15 46 46
2016 01 22 16 32 30
2016 01 22 18 56 23
2016 01 22 18 56 23
2016 01 22 22 22 28
2016 01 22 22 22 28
2016 01 22 23 41 06
2016 01 22 23 41 06
2016 01 22 20 32 33
How can I print only the lines in the future?
Note: I'm doing this on a Mac, but I want it to be portable to Linux because I'm ultimately making this for some tasks I have to do at work.
I'd like to accomplish this in one script rather than requiring the sed statement to reformat the dates, but I'm running into other issues that probably require a different question, so I'm sticking with this for now.
Any help would be greatly appreciated! Thanks!
Answered: I had a $1 on the last line of my script, and that was the cause of the additional output.
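For anyone hitting the same thing: a bare expression like $1 is a pattern with no action, and awk's default action for a true pattern is { print }, so every line with a non-empty first field gets printed a second time:
$ echo 'hello' | awk '$1'
hello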
Instead of awk, this is an (almost) pure Bash solution:
#!/bin/bash
# Regex for time string
re='[0-9]{4}-[0-9]{2}-[0-9]{2} ([0-9]{2}:){2}[0-9]{2}'
# Current time, in seconds since epoch
now=$(date +%s)
while IFS= read -r line; do
# Match time string
[[ $line =~ $re ]]
time_string="${BASH_REMATCH[0]}"
# Convert time string to seconds since epoch
time_secs=$(date -d "$time_string" +%s)
# If time is in the future, print line
if (( time_secs > now )); then
echo "$line"
fi
done < <(grep 'pattern' "$1")
This takes advantage of the Coreutils date formatting to convert a date to seconds since epoch for easy comparison of two dates:
$ date
Fri, Jan 22, 2016 11:23:59 PM
$ date +%s
1453523046
And the -d argument to take a string as input:
$ date -d '2016-01-22 10:03:41' +%s
1453475021
The script does the following:
Filter the input file with grep (for lines containing a generic pattern, but could be anything)
Loop over lines containing pattern
Match the line with a regex that matches the date/time string yyyy-mm-dd hh:mm:ss and extract the match
Convert the time string to seconds since epoch
Compare that value to the time in $now, which is the current date/time in seconds since epoch
If the time from the logfile is in the future, print the line
For an example input file like this one
text 2016-01-22 10:03:41 with time in the past
more text 2016-01-22 10:03:41 matching pattern but in the past
other text 2017-01-22 10:03:41 in the future matching pattern
some text 2017-01-23 10:03:41 in the future but not matching
blahblah 2022-02-22 22:22:22 pattern and also in the future
the result is
$ date
Fri, Jan 22, 2016 11:36:54 PM
$ ./future_time logfile
other text 2017-01-22 10:03:41 in the future matching pattern
blahblah 2022-02-22 22:22:22 pattern and also in the future
This is what I have working now. It works for a few different date formats and on the actual files that have more than just the date and time. The default format that it works for is yyyy/mm/dd, but it takes an argument to specify a mm/dd/yyyy format if needed.
BEGIN { tsec = systime(); dtstr = ""; dt[1] = "" }
/.*[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
    cur = $0
    if (fm == "mdy") {
        match($0, /[0-1][0-9][-_\/][0-3][0-9][-_\/]20[1-9][0-9]/) # mm dd yyyy
        section = substr($0, RSTART, RLENGTH)
        split(section, dt, "[-_/]")
        dtstr = dt[3] " " dt[1] " " dt[2]
        gsub(/[0-1][0-9][-_\/][0-3][0-9][-_\/]20[1-9][0-9]/, dtstr, cur)
    }
    gsub(/[-_:/,]/, " ", cur)
    match(cur, /20[1-9][0-9] [0-1][0-9] [0-3][0-9][[:space:]]*[0-2][0-9] [0-6][0-9] [0-6][0-9]/)
    arr = mktime(substr(cur, RSTART, RLENGTH))
    if (tsec < arr)
        print $0
}
I'll be adding more format options as I find more formats, but this works for all the different files I've tested so far. If they have a mm/dd/yyyy format, you call it with:
gawk -f stime fm=mdy filename
I plan on adding an option to specify the time window that you want to see, but this is an excellent start. Thank you guys again; this is going to drastically simplify a few tasks at work (I basically have to retrieve a great deal of data, often under time pressure depending on the situation).

Running shell commands within AWK

I'm trying to work on a logfile, and I need to be able to specify the range of dates. So far (before any processing), I'm converting a date/time string to timestamp using date --date "monday" +%s.
Now, I want to be able to iterate over each line in a file, but check if the date (in a human readable format) is within the allowed range. To do this, I'd like to do something like the following:
echo `awk '{if(`date --date "$3 $4 $5 $6 $7" +%s` > $START && `date --date "" +%s` <= $END){/*processing code here*/}}' myfile`
I don't even know if that's possible... I've tried a lot of variations, plus I couldn't find anything understandable/usable online.
Thanks
Update:
Example of myfile is as follows. Its logging IPs and access times:
123.80.114.20 Sun May 01 11:52:28 GMT 2011
144.124.67.139 Sun May 01 16:11:31 GMT 2011
178.221.138.12 Mon May 02 08:59:23 GMT 2011
Given what you have to do, it's really not that hard, AND it is much more efficient to do your date processing by converting to strings and comparing.
Here's a partial solution that uses associative arrays to convert the month value to a number. Then you rely on the %02d format specifier to ensure 2 digits. You can reformat the dateTime value with '.', etc or leave the colons in the hr:min:sec if you really need the human readability.
The YYYYMMDD format is a big help in these sort of problems, as LT, GT, EQ all work without any further formatting.
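A quick illustration of why that works - two equal-length digit strings compare correctly even as plain strings:
$ awk 'BEGIN { print ("20110502085923" > "20110105235959") ? "later" : "not later" }'
later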
echo "178.221.138.12 Mon May 02 08:59:23 GMT 2011" \
| awk 'BEGIN {
mons["Jan"]=1 ; mons["Feb"]=2; mons["Mar"]=3
mons["Apr"]=4 ; mons["May"]=5; mons["Jun"]=6
mons["Jul"]=7 ; mons["Aug"]=8; mons["Sep"]=9
mons["Oct"]=10 ; mons["Nov"]=11; mons["Dec"]=12
}
{
# 178.221.138.12 Mon May 02 08:59:23 GMT 2011
printf("dateTime=%04d%02d%02d%02d%02d%02d\n",
$NF, mons[$3], $4, substr($5,1,2), substr($5,4,2), substr($5,7,2) )
} ' -v StartTime=20110105235959
The -v StartTime option (which must precede the program text) illustrates how to pass in your start-time value, and the matching format.
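To turn the sketch into an actual filter (my own completion, not part of the original answer), compare the built string against the passed-in bound:
awk -v StartTime=20110105235959 '
BEGIN {
    mons["Jan"]=1 ; mons["Feb"]=2; mons["Mar"]=3
    mons["Apr"]=4 ; mons["May"]=5; mons["Jun"]=6
    mons["Jul"]=7 ; mons["Aug"]=8; mons["Sep"]=9
    mons["Oct"]=10 ; mons["Nov"]=11; mons["Dec"]=12
}
{
    # build the sortable YYYYMMDDhhmmss key for this log line
    dateTime = sprintf("%04d%02d%02d%02d%02d%02d",
        $NF, mons[$3], $4, substr($5,1,2), substr($5,4,2), substr($5,7,2))
    if (dateTime >= StartTime) print
}' myfile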
I hope this helps.
Here's an alternative approach using awk's built-in mktime() function. I'd never bothered with the month parsing until now - thanks to shelter for that part (see the accepted answer). It always feels like time to switch languages around that point.
#!/bin/bash
# input format:
#(1 2 3 4 5 6 7)
#123.80.114.20 Sun May 01 11:52:28 GMT 2011
awk -v startTime=1304252691 -v endTime=1306000000 '
BEGIN {
mons["Jan"]=1 ; mons["Feb"]=2; mons["Mar"]=3
mons["Apr"]=4 ; mons["May"]=5; mons["Jun"]=6
mons["Jul"]=7 ; mons["Aug"]=8; mons["Sep"]=9
mons["Oct"]=10 ; mons["Nov"]=11; mons["Dec"]=12;
}
{
hmsSpaced=$5; gsub(":"," ",hmsSpaced);
timeInSec=mktime($7" "mons[$3]" "$4" "hmsSpaced);
if (timeInSec > startTime && timeInSec <= endTime) print $0
}' myfile
(I've chosen example time thresholds to select only the last two log lines.)
Note that if the mktime() function were a bit smarter this whole thing would reduce to:
awk -v startTime=1304252691 -v endTime=1306000000 '{t=mktime($7" "$3" "$4" "$5); if (t > startTime && t <= endTime) print $0}' myfile
I'm not sure of the format of the data you're parsing, but I do know that you can't use the backticks within single quotes. You'll have to use double quotes. If there are too many quotes being nested, and it's confusing you, you can also just save the output of your date command to a variable beforehand.
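For example (a sketch combining that advice with the mktime() approach above; assumes GNU date and gawk, and the monday/friday bounds are placeholders):
start=$(date --date "monday" +%s)   # computed once, in the shell
end=$(date --date "friday" +%s)
awk -v startTime="$start" -v endTime="$end" '
BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
    for (i in m) mons[m[i]] = i     # month name -> number
}
{
    hms = $5; gsub(":", " ", hms)   # "11:52:28" -> "11 52 28"
    t = mktime($7 " " mons[$3] " " $4 " " hms)
    if (t > startTime && t <= endTime) print
}' myfile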

Humanized dates with awk?

I have this awk script that runs through a file and counts every occurrence of a given date. The date format in the original file is the standard date format, like this: Thu Mar 5 16:46:15 EST 2009. I use awk to throw away the weekday, time, and timezone, and then do my counting by pumping the dates into an associative array with the dates as indices.
In order to get the output to be sorted by date, I converted the dates to a different format that I could sort with bash sort.
Now, my output looks like this:
Date Count
03/05/2009 2
03/06/2009 1
05/13/2009 7
05/22/2009 14
05/23/2009 7
05/25/2009 7
05/29/2009 11
06/02/2009 12
06/03/2009 16
I'd really like the output to have more human readable dates, like this:
Mar 5, 2009
Mar 6, 2009
May 13, 2009
May 22, 2009
May 23, 2009
May 25, 2009
May 29, 2009
Jun 2, 2009
Jun 3, 2009
Any suggestions for a way I could do this? If I could do this on the fly when I output the count values that would be best.
UPDATE:
Here's my solution incorporating ghostdog74's example code:
grep -i "E[DS]T 2009" original.txt | awk '{printf "%s %2.d, %s\r\n",$2,$3,$6}' >dates.txt #outputs dates for counting
date -f dates.txt +'%Y %m %d' | awk ' #reformat dates as YYYYMMDD for future sort
{++total[$0]} #pump dates into associative array
END {
for (item in total) printf "%s\t%s\r\n", item, total[item] #output dates as yyyy mm dd with counts
}' | sort -t \t | awk ' #send to sort, then to cleanup
BEGIN {printf "%s\t%s\r\n","Date","Count"}
{t=$1" "$2" "$3" 0 0 0" #cleanup using example by ghostdog74
printf "%s\t%2.d\r\n",strftime("%b %d, %Y",mktime(t)),$4
}'
rm dates.txt
Sorry this looks so messy. I've tried to put clarifying comments in.
Use awk's sort and date's stdin to greatly simplify the script
Date will accept input from stdin so you can eliminate one pipe to awk and the temporary file. You can also eliminate a pipe to sort by using awk's array sort and as a result, eliminate another pipe to awk. Also, there's no need for a coprocess.
This script uses date for the month-name conversion, which would presumably continue to work in other languages (ignoring the timezone and month/day order issues, though).
The end result looks like "grep|date|awk". I have broken it into separate lines for readability (it would be about half as big if the comments were eliminated):
grep -i "E[DS]T 2009" original.txt |
date -f - +'%Y %m %d' | #reformat dates as YYYYMMDD for future sort
awk '
BEGIN { printf "%s\t%s\r\n","Date","Count" }
{ ++total[$0] } # pump dates into associative array
END {
idx=1
for (item in total) {
d[idx]=item;idx++ # copy the array indices into the contents of a new array
}
c=asort(d) # sort the contents of the copy
for (i=1;i<=c;i++) { # use the contents of the copy to index into the original
printf "%s\t%2.d\r\n",strftime("%b %e, %Y",mktime(d[i]" 0 0 0")),total[d[i]]
}
}'
I get testy when I see someone using grep and awk (and sed, cut, ...) in a pipeline. Awk can fully handle the work of many utilities.
Here's a way to clean up your updated code to run in a single instance of awk (well, gawk), and using sort as a co-process:
gawk '
BEGIN {
IGNORECASE = 1
}
function mon2num(mon) {
return(((index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)-1)/3)+1)
}
/ E[DS]T [[:digit:]][[:digit:]][[:digit:]][[:digit:]]/ {
month=$2
day=$3
year=$6
date=sprintf("%4d%02d%02d", year, mon2num(month), day)
total[date]++
human[date] = sprintf("%3s %2d, %4d", month, day, year)
}
END {
sort_coprocess = "sort"
for (date in total) {
print date |& sort_coprocess
}
close(sort_coprocess, "to")
print "Date\tCount"
while ((sort_coprocess |& getline date) > 0) {
print human[date] "\t" total[date]
}
close(sort_coprocess)
}
' original.txt
If you are using gawk:
awk 'BEGIN{
  s="03/05/2009"
  m=split(s,date,"/")                     # date[1]=mm date[2]=dd date[3]=yyyy
  t=date[3]" "date[1]" "date[2]" 0 0 0"   # mktime wants "YYYY MM DD hh mm ss"
  print strftime("%b %d",mktime(t))
}'
The above is just an example; as you did not show your actual code, I cannot incorporate it.
Why don't you prepend your awk-date to the original date? This yields a sortable key but stays human readable.
(Note: to sort correctly, you should make it yyyymmdd.)
If needed, cut can remove the prepended column afterwards.
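A sketch of that idea (my illustration; dates.txt is a placeholder file of Thu Mar 5 16:46:15 EST 2009-style lines):
awk 'BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
             for (i in m) mons[m[i]] = i }
     { printf "%04d%02d%02d %s %d, %d\n", $6, mons[$2], $3, $2, $3, $6 }' dates.txt |
sort |
cut -d' ' -f2-   # drop the prepended yyyymmdd key, leaving e.g. "Mar 5, 2009"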
Gawk has strftime(). You can also call the date command to format them (man). Linux Forums gives some examples.
