Humanized dates with awk? - bash

I have this awk script that runs through a file and counts every occurrence of a given date. The date format in the original file is the standard date format, like this: Thu Mar 5 16:46:15 EST 2009 I use awk to throw away the weekday, time, and timezone, and then do my counting by pumping the dates into an associative array with the dates as indices.
In order to get the output to be sorted by date, I converted the dates to a different format that I could sort with bash sort.
Now, my output looks like this:
Date Count
03/05/2009 2
03/06/2009 1
05/13/2009 7
05/22/2009 14
05/23/2009 7
05/25/2009 7
05/29/2009 11
06/02/2009 12
06/03/2009 16
I'd really like the output to have more human readable dates, like this:
Mar 5, 2009
Mar 6, 2009
May 13, 2009
May 22, 2009
May 23, 2009
May 25, 2009
May 29, 2009
Jun 2, 2009
Jun 3, 2009
Any suggestions for a way I could do this? If I could do this on the fly when I output the count values that would be best.
UPDATE:
Here's my solution incorporating ghostdog74's example code:
grep -i "E[DS]T 2009" original.txt | awk '{printf "%s %2.d, %s\r\n",$2,$3,$6}' >dates.txt #outputs dates for counting
date -f dates.txt +'%Y %m %d' | awk ' #reformat dates as YYYYMMDD for future sort
{++total[$0]} #pump dates into associative array
END {
for (item in total) printf "%s\t%s\r\n", item, total[item] #output dates as yyyy mm dd with counts
}' | sort -t \t | awk ' #send to sort, then to cleanup
BEGIN {printf "%s\t%s\r\n","Date","Count"}
{t=$1" "$2" "$3" 0 0 0" #cleanup using example by ghostdog74
printf "%s\t%2.d\r\n",strftime("%b %d, %Y",mktime(t)),$4
}'
rm dates.txt
Sorry this looks so messy. I've tried to put clarifying comments in.

Use awk's sort and date's stdin to greatly simplify the script
Date will accept input from stdin so you can eliminate one pipe to awk and the temporary file. You can also eliminate a pipe to sort by using awk's array sort and as a result, eliminate another pipe to awk. Also, there's no need for a coprocess.
This script uses date for the monthname conversion which would presumably continue to work in other languages (ignoring the timezone and month/day order issues, though).
The end result looks like "grep|date|awk". I have broken it into separate lines for readability (it would be about half as big if the comments were eliminated):
grep -i "E[DS]T 2009" original.txt |
date -f - +'%Y %m %d' | #reformat dates as YYYYMMDD for future sort
awk '
BEGIN { printf "%s\t%s\r\n","Date","Count" }
{ ++total[$0] #pump dates into associative array }
END {
idx=1
for (item in total) {
d[idx]=item;idx++ # copy the array indices into the contents of a new array
}
c=asort(d) # sort the contents of the copy
for (i=1;i<=c;i++) { # use the contents of the copy to index into the original
printf "%s\t%2.d\r\n",strftime("%b %e, %Y",mktime(d[i]" 0 0 0")),total[d[i]]
}
}'

I get testy when I see someone using grep and awk (and sed, cut, ...) in a pipeline. Awk can fully handle the work of many utilities.
Here's a way to clean up your updated code to run in a single instance of awk (well, gawk), and using sort as a co-process:
gawk '
BEGIN {
IGNORECASE = 1
}
function mon2num(mon) {
return(((index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)-1)/3)+1)
}
/ E[DS]T [[:digit:]][[:digit:]][[:digit:]][[:digit:]]/ {
month=$2
day=$3
year=$6
date=sprintf("%4d%02d%02d", year, mon2num(month), day)
total[date]++
human[date] = sprintf("%3s %2d, %4d", month, day, year)
}
END {
sort_coprocess = "sort"
for (date in total) {
print date |& sort_coprocess
}
close(sort_coprocess, "to")
print "Date\tCount"
while ((sort_coprocess |& getline date) > 0) {
print human[date] "\t" total[date]
}
close(sort_coprocess)
}
' original.txt

if you are using gawk
awk 'BEGIN{
s="03/05/2009"
m=split(s,date,"/")
t=date[3]" "date[2]" "date[1]" 0 0 0"
print strftime("%b %d",mktime(t))
}'
the above is just an example, as you did not show your actual code and so cannot incorporate it into your code.

Why don't you prepend your awk-date to the original date? This yields a sortable key, but is human readable.
(Note: to sort right, you should make it yyyymmdd)
If needed, cut can remove the prepended column.

Gawk has strftime(). You can also call the date command to format them (man). Linux Forums gives some examples.

Related

How do I sort a "MON_YYYY_day_NUM" time with UNIX tools?

I'm wondering how do i sort this example based on time. I have already sorted it based on everything else, but i just cannot figure out how to go sort it using time (the 07:30 part for example).
My current code:
sort -t"_" -k3n -k2M -k5n (still need to implement the time sort for the last sort)
What still needs to be sorted is the time:
Dunaj_Dec_2000_day_1_13:00.jpg
Rim_Jan_2001_day_1_13:00.jpg
Ljubljana_Nov_2002_day_2_07:10.jpg
Rim_Jan_2003_day_3_08:40.jpg
Rim_Jan_2003_day_3_08:30.jpg
Any help or just a point in the right direction is greatly appreciated!
Alphabetically; 24h time with a fixed number of digits is okay to sort using a plain alphabetic sort.
sort -t"_" -k3n -k2M -k5n -k6 # default sorting
sort -t"_" -k3n -k2M -k5n -k6V # version-number sort.
There's also a version sort V which would work fine.
I have to admit to shamelessly stealing from this answer on SO:
How to split log file in bash based on time condition
awk -F'[_:.]' '
BEGIN {
months["Jan"] = 1
months["Feb"] = 2
months["Mar"] = 3
months["Apr"] = 4
months["May"] = 5
months["Jun"] = 6
months["Jul"] = 7
months["Aug"] = 8
months["Sep"] = 9
months["Oct"] = 10
months["Nov"] = 11
months["Dec"] = 12
}
{ print mktime($3" "months[$2]" "$5" "$6" "$7" 00"), $0 }
' input | sort -n | cut -d\ -f2-
Use _:.\ field separator characters to parse each file name.
Initialize an associative array so we can map month names to numerical values (1-12)
Uses awk function mktime() - it takes a string in the format of "YYYY MM DD HH MM SS [ DST ]" as per https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html. Each line of input is print with a column prepending with the time in epoch seconds.
The results are piped to sort -n which will sort numerically the first column
Now that the results are sorted, we can remove the first column with cut
I have a MAC, so I had to use gawk to get the mktime function (it's not available with MacOS awk normally ). mawk is another option I've read.

Compare two timestamp columns and if difference is greater than 1 hour, trigger email alert(bash)

I have a file that looks like this:
user1,135.4,MATLAB,server1,14:53:59,15:54:28
user2,3432,Solver_HF+,server1,14:52:01,14:54:28
user3,3432,Solver_HF+,server1,14:52:01,15:54:14
user4,3432,Solver_HF+,server1,14:52:01,14:54:36
I want to run a comparison between the last two columns and if the difference is greater than an hour(such as lines 1 and 3) it will trigger something like this:
echo "individual line from file" | mail -s "subject" email#site.com
I was trying to come up with a possible solution using awk, but I'm still fairly new to linux and couldn't quite figure out something that worked.
the following awk scripts maybe is your want
awk 'BEGIN{FS=","}
{a="2019 01 01 " gensub(":"," ","g",$5);
b="2019 01 01 " gensub(":"," ","g",$6);
c = int((mktime(b)-mktime(a))/60)}
{if (c >= 60){system("echo "individual line from file" | mail -s "subject" email#site.com")}}' your_filename
then put the scritps into crontab or other trigger
for example
*/5 * * * * awk_scripts.sh
if you just want check new line . use tail -n filename may be more useful than cat
Here you go: (using gnu awk due to mktime)
awk -F, '{
split($(NF-1),t1,":");
split($NF,t2,":");
d1=mktime("0 0 0 "t1[1]" "t1[2]" "t1[3]" 0");
d2=mktime("0 0 0 "t2[1]" "t2[2]" "t2[3]" 0");
if (d2-d1>3600) print $0}' file
user1,135.4,MATLAB,server1,14:53:59,15:54:28
user3,3432,Solver_HF+,server1,14:52:01,15:54:14
Using field separator as comma to get the second last and last field.
The split the two field inn to array t1 and t2 to get hour min sec
mktime converts this to seconds.
do the math and print only lines with more than 3600 seconds
This can then be piped to other commands.
See how time function are used int gnu awk: https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html

how can text be wrapped in columns using gawk [duplicate]

This question already has answers here:
How to wrap lines within columns in Linux
(2 answers)
Closed 3 years ago.
I am trying to wrap columns of text using gawk or native bash, The fourth column (last one in this case) wraps to the next line. I would like it to wrap and all text remain under the respective heading. The output for the given input is just representative and text needing wrapped is in the last column. However id like to wrap ANY column
I have tried embedding fmt and fold commands in the awk script but have been unsuccessful in getting results required.
awk 'BEGIN{FS="|"; format="%-35s %-7s %-5s %-20s\n"
printf "\n"
printf format, "Date", "Task ID", "Code", "Description"
printf format ,"-------------------------", "-------", "-----", "------------------------------"}
{printf format, strftime("%c",$1), $2, $3, $4}'
INPUT:
1563685965|878|12015|Task HMI starting
1563686011|881|5041|Configured with engine 6000.8403 (/opt/NAI/LinuxShield/engine/lib/liblnxfv.so), dats 9322.0000 (/opt/NAI/LinuxShield/engine/dat), 197 extensions, 0 extra drivers
1563686011|882|5059|Created Scanner child id=1 pid=28,698 engine=6000.8403, dats=9322.0000
1563686139|883|12017|Task HMI Completed 2 items detected in 19 files (0 files timed out, 0 files excluded, 0 files cleaned, 0 files had errors, 0 were not scanned)
1563686139|885|5012|scanned=19 excluded=0 infected=2 cleaned=0 cleanAttempts=0 cleanRequests=0 denied=0 repaired=0 deleted=0 renamed=0 quarantined=0 timeouts=0 errors=0 uptime=174 busy=0 wait=0
I am still unclear on how to post or share on formation on this forum. This seems to work fairly well. The wrap function was taken from the duplicate post.
BEGIN{
format="%-35s %-7s %-10s %-20s\n"
printf "\n"
printf format, "Date", "Task ID", "Code", "Description"
printf format ,"-------------------------", "-------", "-----", "------------------------------"
}
{
split($0,cols,"|")
numLines=1
for(col in cols){
numLines=wrap(cols[col],80,colArr)
for(c in colArr){
fmtcol[col,c] = colArr[c]
}
maxLinesRow[col]=(numLines > maxLinesRow[col] ? numLines : maxLinesRow[col])
}
for (lineNr=1; lineNr<=maxLinesRow[col]; lineNr++) {
dt=((1,lineNr) in fmtcol ? strftime("%c",fmtcol[1,lineNr]):"")
printf format, dt, fmtcol[2,lineNr], fmtcol[3,lineNr], fmtcol[4,lineNr]
}
printf "\n"
delete colArr
}
function wrap(inStr,wid,outArr, lineEnd,numLines) {
while ( length(inStr) > wid ) {
lineEnd = ( match(substr(inStr,1,wid),/.*[[:space:]]/) ? RLENGTH - 1 : wid )
outArr[++numLines] = substr(inStr,1,lineEnd)
inStr = substr(inStr,lineEnd+1)
sub(/^[[:space:]]+/,"",inStr)
}
outArr[++numLines] = inStr
return numLines
}
I know you said you're using gawk, but tabular formatting with line wrapping like you want is really easy to do with perl, so here's a perl solution using the format feature's repeated fill mode:
#!/usr/bin/env perl
use warnings;
use strict;
use POSIX qw/strftime/;
printf "%-40s %-20s\n", 'Date', 'Description';
print '-' x 40, ' ', '-' x 20, "\n";
my ($date, $desc);
while (<>) {
chomp;
($date, $desc) = split '\|', $_;
$date = strftime '%c', localtime($date);
write;
}
format STDOUT =
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<< ~~
$date, $desc
.
Example:
$ cat input.txt
1000|this is some text
2000|this is some other text that is long enough that it will wrap around a bit.
$ perl fmt.pl input.txt
Date Description
---------------------------------------- --------------------
Wed, Dec 31, 1969 4:16:40 PM this is some text
Wed, Dec 31, 1969 4:33:20 PM this is some other
text that is long
enough that it will
wrap around a bit.

How to compare a field of a file with current timestamp and print the greater and lesser data?

How do I compare current timestamp and a field of a file and print the matched and unmatched data. I have 2 columns in a file (see below)
oac.bat 09:09
klm.txt 9:00
I want to compare the timestamp(2nd column) with current time say suppose(10:00) and print the output as follows.
At 10:00
greater.txt
xyz.txt 10:32
mnp.csv 23:54
Lesser.txt
oac.bat 09:09
klm.txt 9:00
Could anyone help me on this please ?
I used awk $0 > "10:00", which gives me only 2nd column details but I want both the column details and I am taking timestamp from system directly from system with a variable like
d=`date +%H:%M`
With GNU awk you can just use it's builtin time functions:
awk 'BEGIN{now = strftime("%H:%M")} {
split($NF,t,/:/)
cur=sprintf("%02d:%02d",t[1],t[2])
print > ((cur > now ? "greater" : "lesser") ".txt")
}' file
With other awks just set now using -v and date up front, e.g.:
awk -v now="$(date +"%H:%M")" '{
split($NF,t,/:/)
cur = sprintf("%02d:%02d",t[1],t[2])
print > ((cur > now ? "greater" : "lesser") ".txt")
}' file
The above is untested since you didn't provide input/output we could test against.
Pure Bash
The script can be implemented in pure Bash with the help of date command:
# Current Unix timestamp
let cmp_seconds=$(date +%s)
# Read file line by line
while IFS= read -r line; do
let line_seconds=$(date -d "${line##* }" +%s) || continue
(( line_seconds <= cmp_seconds )) && \
outfile=lesser || outfile=greater
# Append the line to the file chosen above
printf "%s\n" "$line" >> "${outfile}.txt"
done < file
In this script, ${line##* } removes the longest match of '* ' (any character followed by a space) pattern from the front of $line thus fetching the last column (the time). The time column is supposed to be in one of the following formats: HH:MM, or H:MM. Actually, date's -d option argument
can be in almost any common format. It can contain month names, time zones, ‘am’ and ‘pm’, ‘yesterday’, etc.
We use the flexibility of this option to convert the time (HH:MM, or H:MM) to Unix timestamp.
The let builtin allows arithmetic to be performed on shell variables. If the last let expression fails, or evaluates to zero, let returns 1 (error code), otherwise 0 (success). Thus, if for some reason the time column is in invalid format, the iteration for such line will be skipped with the help of continue.
Perl
Here is a Perl version I have written just for fun. You may use it instead of the Bash version, if you like.
# For current date
#cmp_seconds=$(date +%s)
# For specific hours and minutes
cmp_seconds=$(date -d '10:05' +%s)
perl -e '
my #t = localtime('$cmp_seconds');
my $minutes = $t[2] * 60 + $t[1];
while (<>) {
/ (\d?\d):(\d\d)$/ or next;
my $fh = ($1 * 60 + $2) > $minutes ? STDOUT : STDERR;
printf $fh "%s", $_;
}' < file >greater.txt 2>lesser.txt
The script computes the number of minutes in the following way:
HH:MM = HH * 60 + MM minutes
If the number of minutes from the file are greater then the number of minutes for the current time, it prints the next line to the standard output, otherwise to standard error. Finally, the standard output is redirected to greater.txt, and the standard error is redirected to lesser.txt.
I have written this script for demonstration of another approach (algorithm), which can be implemented in different languages, including Bash.

Running shell commands within AWK

I'm trying to work on a logfile, and I need to be able to specify the range of dates. So far (before any processing), I'm converting a date/time string to timestamp using date --date "monday" +%s.
Now, I want to be able to iterate over each line in a file, but check if the date (in a human readable format) is within the allowed range. To do this, I'd like to do something like the following:
echo `awk '{if(`date --date "$3 $4 $5 $6 $7" +%s` > $START && `date --date "" +%s` <= $END){/*processing code here*/}}' myfile`
I don't even know if thats possible... I've tried a lot of variations, plus I couldn't find anything understandable/usable online.
Thanks
Update:
Example of myfile is as follows. Its logging IPs and access times:
123.80.114.20 Sun May 01 11:52:28 GMT 2011
144.124.67.139 Sun May 01 16:11:31 GMT 2011
178.221.138.12 Mon May 02 08:59:23 GMT 2011
Given what you have to do, its really not that hard AND it is much more efficient to do your date processing by converting to strings and comparing.
Here's a partial solution that uses associative arrays to convert the month value to a number. Then you rely on the %02d format specifier to ensure 2 digits. You can reformat the dateTime value with '.', etc or leave the colons in the hr:min:sec if you really need the human readability.
The YYYYMMDD format is a big help in these sort of problems, as LT, GT, EQ all work without any further formatting.
echo "178.221.138.12 Mon May 02 08:59:23 GMT 2011" \
| awk 'BEGIN {
mons["Jan"]=1 ; mons["Feb"]=2; mons["Mar"]=3
mons["Apr"]=4 ; mons["May"]=5; mons["Jun"]=6
mons["Jul"]=7 ; mons["Aug"]=8; mons["Sep"]=9
mons["Oct"]=10 ; mons["Nov"]=11; mons["Dec"]=12
}
{
# 178.221.138.12 Mon May 02 08:59:23 GMT 2011
printf("dateTime=%04d%02d%02d%02d%02d%02d\n",
$NF, mons[$3], $4, substr($5,1,2), substr($5,4,2), substr($5,7,2) )
} ' -v StartTime=20110105235959
The -v StartTime is ilustrative of how to pass in (and the matching format) your starTime value.
I hope this helps.
Here's an alternative approach using awk's built-in mktime() function. I've never bothered with the month parsing until now - thanks to shelter for that part (see accepted answer). It always feels time to switch language around that point.
#!/bin/bash
# input format:
#(1 2 3 4 5 6 7)
#123.80.114.20 Sun May 01 11:52:28 GMT 2011
awk -v startTime=1304252691 -v endTime=1306000000 '
BEGIN {
mons["Jan"]=1 ; mons["Feb"]=2; mons["Mar"]=3
mons["Apr"]=4 ; mons["May"]=5; mons["Jun"]=6
mons["Jul"]=7 ; mons["Aug"]=8; mons["Sep"]=9
mons["Oct"]=10 ; mons["Nov"]=11; mons["Dec"]=12;
}
{
hmsSpaced=$5; gsub(":"," ",hmsSpaced);
timeInSec=mktime($7" "mons[$3]" "$4" "hmsSpaced);
if (timeInSec > startTime && timeInSec <= endTime) print $0
}' myfile
(I've chosen example time thresholds to select only the last two log lines.)
Note that if the mktime() function were a bit smarter this whole thing would reduce to:
awk -v startTime=1304252691 -v endTime=1306000000 't=mktime($7" "$3" "$4" "$5); if (t > startTime && t <= endTime) print $0}' myfile
I'm not sure of the format of the data you're parsing, but I do know that you can't use the backticks within single quotes. You'll have to use double quotes. If there are too many quotes being nested, and it's confusing you, you can also just save the output of your date command to a variable beforehand.

Resources