Awk extracting column but just from one row - bash

I have a log file in the following form:
2017-12-11 10:20:16.993 ...
2017-12-12 10:19:16.993 ...
2017-12-13 10:17:16.993 ...
and I want to extract the first column via awk -F., compare it to the actual system time in seconds, and print the line if the difference is less than 300 seconds.
SYSTEM_TIME=$(date +%s)
awk -F. -v system_time=$SYSTEM_TIME '{gsub(/[-:]/," ",$1); if(system_time-mktime($1) <= 300) {print $0}}' log.txt
This is my code, but I can't use mktime because it's not in the POSIX standard. Can it be done without it?
Thanks,
Ahmed

General remark: logfiles are often incomplete. A date-time format is given, but the time zone is often missing. When daylight saving time comes into play, a missing time zone can completely mess up your results.
Note: in all commands below, it is assumed that the dates in the logfile are in UTC and that the system runs in UTC. If this is not the case, be aware that daylight saving time will create problems when running any of the commands below around the time daylight saving kicks in.
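As a quick illustration of why the time zone matters, the same wall-clock string maps to different epoch values depending on TZ (a minimal sketch using GNU date; the zone is just an example):
# same local-time string, different epoch seconds depending on the zone
TZ=UTC date -d '2017-12-11 10:20:16' +%s              # 1512987616
TZ=America/New_York date -d '2017-12-11 10:20:16' +%s # 1513005616 (5 hours later)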
Combination of date and awk (not POSIX):
If your date command has the -d flag (not POSIX), you can run the following:
awk -v r="$(date -d '300 seconds ago' '+%F %T.%3N')" '(r < $0)' logfile
This works because timestamps in this format sort chronologically when compared as strings.
GNU awk only:
If you want to make use of mktime, it is then easier to just do:
awk 'BEGIN{s=systime(); FS=OFS="."}
     {t=$1; gsub(/[-:]/," ",t); t=mktime(t)}
     (s-t < 300)' logfile
I will work under the assumption that the log entries are not created in the future, so all times are always smaller than the system time.
POSIX:
If you cannot make use of mktime but want to use POSIX only, which also implies that date does not have the -d flag, you can create your own implementation of mktime. Be aware that the version presented here does not do any timezone corrections as mktime does; mktime_posix assumes that the datestring is in UTC.
awk -v s="$(date +%s)" '
# Algorithm from "Astronomical Algorithms" by J. Meeus
function mktime_posix(datestring,    a, t) {
    split(datestring, a, " ")
    if (a[1] < 1970) return -1
    if (a[2] <= 2) { a[1]--; a[2] += 12 }
    t = int(a[1]/100); t = 2 - t + int(t/4)
    t = int(365.25*a[1]) + int(30.6001*(a[2]+1)) + a[3] + t - 719593
    return t*86400 + a[4]*3600 + a[5]*60 + a[6]
}
BEGIN { FS=OFS="." }
{ t = $1; gsub(/[-:]/, " ", t); t = mktime_posix(t) }
(s-t <= 300)' logfile
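A quick sanity check of the implementation against GNU date, if available (both commands should print the same number, here 1512987616):
date -u -d '2017-12-11 10:20:16' +%s
awk 'function mktime_posix(datestring,    a, t) {
    split(datestring, a, " ")
    if (a[1] < 1970) return -1
    if (a[2] <= 2) { a[1]--; a[2] += 12 }
    t = int(a[1]/100); t = 2 - t + int(t/4)
    t = int(365.25*a[1]) + int(30.6001*(a[2]+1)) + a[3] + t - 719593
    return t*86400 + a[4]*3600 + a[5]*60 + a[6]
}
BEGIN { print mktime_posix("2017 12 11 10 20 16") }'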

I can think of doing it this way, as it's shorter.
#!/bin/bash
SYSTEM_TIME=$(date +%s)
LOGTIME=$(date "+%s" -d "$(awk -F'.' '{print $1}' <(head -1 inputtime.txt))")
DIFFERENCEINSECONDS=$(echo "$SYSTEM_TIME $LOGTIME" | awk '{print ($1 - $2)}')
if [[ "$DIFFERENCEINSECONDS" -gt 300 ]]
then
    echo "TRIGGERED!"
fi
Hope it's useful for you. Let me know.
Note: I assumed your input log could be called inputtime.txt. You need to change that to your actual filename, of course.

Related

Is it really slow to handle a text file (more than 10K lines) with a shell script?

I have a file with more than 10K lines of records.
Within each line, there are two date+time fields. Below is an example:
"aaa bbb ccc 170915 200801 12;ddd e f; g; hh; 171020 122030 10; ii jj kk;"
I want to filter out the lines where the number of days between these two dates is less than 30.
Below is my source code:
#!/bin/bash
filename="$1"
echo $filename
touch filterfile
totalline=`wc -l $filename | awk '{print $1}'`
i=0
j=0
echo $totalline lines
while read -r line
do
i=$[i+1]
if [ $i -gt $[j+9] ]; then
j=$i
echo $i
fi
shortline=`echo $line | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'`
date1=`echo $shortline | awk '{print $1}'`
date2=`echo $shortline | awk '{print $2}'`
if [ $date1 -gt 700000 ]
then
continue
fi
d1=`date -d $date1 +%s`
d2=`date -d $date2 +%s`
diffday=$[(d2-d1)/(24*3600)]
#diffdays=`date -d $date2 +%s` - `date -d $date1 +%s`)/(24*3600)
if [ $diffday -lt 30 ]
then
echo $line >> filterfile
fi
done < "$filename"
I am running it in Cygwin. It took about 10 seconds to handle 10 lines. I use echo $i to show the progress.
Is it because I am doing something the wrong way in my script?
This answer does not answer your question but gives an alternative method to your shell script. The answer to your question is given by Sundeep's comment:
Why is using a shell loop to process text considered bad practice?
Furthermore, you should be aware that every time you call sed, awk, echo, date, ..., you are requesting the system to execute a binary which needs to be loaded into memory, etc. So if you do this in a loop, it is very inefficient.
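A minimal sketch of the overhead (timings will vary per system; the 1000-iteration count is arbitrary):
# 1000 external processes: one date per iteration
time for i in $(seq 1000); do d=$(date +%s); done
# one awk process doing 1000 iterations internally (GNU awk for systime)
time awk 'BEGIN { for (i = 0; i < 1000; i++) d = systime() }'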
alternative solution
awk programs are commonly used to process log files containing timestamp information, indicating when a particular log record was written. gawk extends standard awk with time-handling functions. The one you are interested in is:
mktime(datespec [, utc-flag ])
Turn datespec into a timestamp in the same form as is returned by systime(). It is similar to the function of the same name in ISO C. The argument, datespec, is a string of the form "YYYY MM DD HH MM SS [DST]". The string consists of six or seven numbers representing, respectively, the full year including century, the month from 1 to 12, the day of the month from 1 to 31, the hour of the day from 0 to 23, the minute from 0 to 59, the second from 0 to 60, and an optional daylight-savings flag.
The values of these numbers need not be within the ranges specified; for example, an hour of -1 means 1 hour before midnight. The origin-zero Gregorian calendar is assumed, with year 0 preceding year 1 and year -1 preceding year 0. If utc-flag is present and is either nonzero or non-null, the time is assumed to be in the UTC time zone; otherwise, the time is assumed to be in the local time zone. If the DST daylight-savings flag is positive, the time is assumed to be daylight savings time; if zero, the time is assumed to be standard time; and if negative (the default), mktime() attempts to determine whether daylight savings time is in effect for the specified time.
If datespec does not contain enough elements or if the resulting time is out of range, mktime() returns -1.
As your date format is of the form yymmdd HHMMSS, we need to write a parser function convertTime for it. Be aware that we will pass this function times of the form yymmddHHMMSS. Furthermore, using space-delimited fields, your times are located in $4$5 and $11$12. As mktime converts the time to seconds since 1970-01-01, all we need to do is check whether the delta time is smaller than 30*24*3600 seconds.

awk 'function convertTime(t) {
    s = "20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
    s = s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)
    return mktime(s)
}
{ t1 = convertTime($4$5); t2 = convertTime($11$12) }
(t2-t1 < 30*3600*24) { print }' <file>
If you are not interested in the real delta time (your sed line removes the actual time of day), then you can adapt it to:
awk 'function convertTime(t) {
    s = "20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
    s = s "00 00 00"
    return mktime(s)
}
{ t1 = convertTime($4); t2 = convertTime($11) }
(t2-t1 < 30*3600*24) { print }' <file>
If the dates are not in fixed fields, you can use match to find them:
awk 'function convertTime(t) {
    s = "20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
    s = s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)
    return mktime(s)
}
{ match($0, /[0-9]{6} [0-9]{6}/)
  # drop the space between date and time so convertTime gets yymmddHHMMSS
  t1 = convertTime(substr($0,RSTART,6) substr($0,RSTART+7,6))
  a = substr($0, RSTART+RLENGTH)
  match(a, /[0-9]{6} [0-9]{6}/)
  t2 = convertTime(substr(a,RSTART,6) substr(a,RSTART+7,6)) }
(t2-t1 < 30*3600*24) { print }' <file>
With some modifications, mostly made without speed in mind, I can reduce the processing time by 50%, which is a lot:
#!/bin/bash
filename="$1"
echo "$filename"
# touch filterfile
totalline=$(wc -l < "$filename")
i=0
j=0
echo "$totalline" lines
while read -r line
do
    i=$((i+1))
    if (( i > j+9 )); then
        j=$i
        echo $i
    fi
    shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
    date1=${shortline[0]}
    date2=${shortline[1]}
    if (( date1 > 700000 ))
    then
        continue
    fi
    d1=$(date -d "$date1" +%s)
    d2=$(date -d "$date2" +%s)
    diffday=$(((d2-d1)/(24*3600)))
    # diffdays=$(date -d $date2 +%s) - $(date -d $date1 +%s))/(24*3600)
    if (( diffday < 30 ))
    then
        echo "$line" >> filterfile
    fi
done < "$filename"
Some remarks:
# touch filterfile
Well, the later echo ... >> filterfile appends to this file and creates it if it doesn't exist, so the touch is unnecessary.
totalline=$(wc -l < "$filename")
You don't need awk here. The filename output is suppressed if wc doesn't see the filename (it reads from stdin instead).
Capturing the output in an array:
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
gives us array access and saves another call to awk.
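One could even drop the sed call and its subshell entirely with bash's built-in regex matching (a sketch using BASH_REMATCH inside the same loop; untested against all input variants):
re='([0-9]{6}) [0-9]{6}.*([0-9]{6}) [0-9]{6}'
if [[ $line =~ $re ]]; then
    date1=${BASH_REMATCH[1]}
    date2=${BASH_REMATCH[2]}
fi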
On my machine, your code took about 42 s for 2880 lines (on your machine that would be about 2880 s?) and about 19 s for the same file with my code.
So I suspect, if you aren't running it on an i486 machine, that Cygwin might be the slowdown. It's a POSIX environment for Windows, isn't it? Well, I'm on a native Linux system. Maybe you could try the GNU utilities for Windows; the last time I looked for them, they were advertised as gnu-utils x32 or something, but maybe there is an x64 version available by now.
And the next thing I would have a look at is the date calculation; that might be a slowdown too.
2880 lines isn't that much, so I don't suspect that my SSD drive plays a huge role in the game.

Remove lines having older end time

In my bash script I want to add a code which remove all entries older than x days.
To simplify this problem, I have divided it into 3 parts. 2 parts are done; I'm looking for an answer for the 3rd part.
a) To find the latest log date - Done
b) Evaluate the earliest epoch time (all entries before this epoch time should be deleted) - Done
No_OF_DAYS=2
One_Day=86400000
Latest_Time=`find . -name '*.tps' -exec sed '/endTime/!d; s/{//; s/,.*//' {} + | sort -r | head -1 | cut -d: -f2` #latest epoch time
Days_in_Epoch=$(($One_Day * $No_OF_DAYS))
Earliest_Time=$((Latest_Time - $Days_in_Epoch)) #earliest epoch time
c) Delete all log entries older than the evaluated earliest time.
PS:
There are multiple files, distributed in different subfolders.
All files have the extension ".tps".
The time is in epoch format; endTime will be considered for the calculations ("endTime":1488902735220).
sample data
Code:
{"endTime":1488902734775,"startTime":1488902734775,"operationIdentity":"publishCacheStatistics","name":"murex.risk.control.excesses.cache.CacheStatisticsTracer","context":{"parentContext":{"id":-1,"parentContext":null},"data":[{"value":"excessCacheExcessKeysToContexts","key":"name"},{"value":"0","key":"hits"},{"value":"0","key":"misses"},{"value":"0","key":"count"},{"value":"0","key":"maxElements"},{"value":"0","key":"evictions"},{"value":"N/A","key":"policy"}],"id":0}}
{"endTime":1488902735220,"startTime":1488902735220,"operationIdentity":"publishCacheStatistics","name":"murex.risk.control.excesses.cache.CacheStatisticsTracer","context":{"parentContext":{"id":-1,"parentContext":null},"data":[{"value":"excessCacheExcessKeysToContexts","key":"name"},{"value":"0","key":"hits"},{"value":"0","key":"misses"},{"value":"0","key":"count"},{"value":"0","key":"maxElements"},{"value":"0","key":"evictions"},{"value":"N/A","key":"policy"}],"id":8}}
{"endTime":1488902735550,"startTime":1488902735550,"operationIdentity":"publishCacheStatistics","name":"murex.risk.control.excesses.cache.CacheStatisticsTracer","context":{"parentContext":{"id":-1,"parentContext":null},"data":[{"value":"excessCacheContextsToExcessIds","key":"name"},{"value":"0","key":"hits"},{"value":"0","key":"misses"},{"value":"0","key":"count"},{"value":"0","key":"maxElements"},{"value":"0","key":"evictions"},{"value":"N/A","key":"policy"}],"id":9}}
For example:
a) latest epoch time = 1488902735550
b) earliest epoch time = 1488902735220
Problem: Now I am looking for a command which deletes all the entries that are older/lesser than the earliest epoch time. In the example above, the 1st line should be deleted.
Any help/suggestions are appreciated. Thank you!
This will do the trick, buddy. Be careful to test it with backup files first, as it will overwrite your logs directly. Also, change the TIME variable to whatever you want to compare against.
while read -r file
do
    awk -v FS=':|,' -v TIME='1488902735220' '($2 >= TIME) && !($0 ~ /^ *$/)' "$file" > tmp.txt && cat tmp.txt > "$file"
done < <( find ./ -name '*.tps' 2>/dev/null )
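With GNU awk 4.1 or later you can skip the temp-file shuffle by using its inplace extension instead of the loop (a sketch; test on copies first):
# keep only entries whose endTime is at least TIME, editing each .tps file in place
find ./ -name '*.tps' -exec gawk -i inplace -v FS=':|,' -v TIME='1488902735220' \
    '($2 >= TIME) && !($0 ~ /^ *$/)' {} +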
Regards!
Based on your current solution, I'd use a simple loop to read the file line by line and only output the lines whose endTime is greater than or equal to your earliest time:
while read -r line; do
    line_endTime=$(awk -F '[:,]' '{print $2}' <<< "$line")
    if [ "$line_endTime" -ge "$Earliest_Time" ]; then echo "$line"; fi
done < input_file > filtered_output_file
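Since this loop spawns one awk process per line, a single awk invocation over the whole file should be much faster (a sketch, with Earliest_Time computed as in part b above):
awk -F '[:,]' -v t="$Earliest_Time" '$2 >= t' input_file > filtered_output_file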

How to compare a field of a file with current timestamp and print the greater and lesser data?

How do I compare the current timestamp with a field of a file and print the matched and unmatched data? I have 2 columns in a file (see below):
oac.bat 09:09
klm.txt 9:00
I want to compare the timestamp (2nd column) with the current time, say 10:00, and print the output as follows:
At 10:00
greater.txt
xyz.txt 10:32
mnp.csv 23:54
Lesser.txt
oac.bat 09:09
klm.txt 9:00
Could anyone help me with this, please?
I used awk '$0 > "10:00"', which gives me only the 2nd column details, but I want both columns. I am taking the timestamp directly from the system with a variable like:
d=`date +%H:%M`
With GNU awk you can just use its built-in time functions:
awk 'BEGIN{now = strftime("%H:%M")} {
    split($NF,t,/:/)
    cur = sprintf("%02d:%02d",t[1],t[2])
    print > ((cur > now ? "greater" : "lesser") ".txt")
}' file
With other awks just set now using -v and date up front, e.g.:
awk -v now="$(date +"%H:%M")" '{
    split($NF,t,/:/)
    cur = sprintf("%02d:%02d",t[1],t[2])
    print > ((cur > now ? "greater" : "lesser") ".txt")
}' file
The above is untested since you didn't provide input/output we could test against.
Pure Bash
The script can be implemented in pure Bash with the help of the date command:
# Current Unix timestamp
let cmp_seconds=$(date +%s)
# Read file line by line
while IFS= read -r line; do
    let line_seconds=$(date -d "${line##* }" +%s) || continue
    (( line_seconds <= cmp_seconds )) && \
        outfile=lesser || outfile=greater
    # Append the line to the file chosen above
    printf "%s\n" "$line" >> "${outfile}.txt"
done < file
In this script, ${line##* } removes the longest match of '* ' (any character followed by a space) pattern from the front of $line thus fetching the last column (the time). The time column is supposed to be in one of the following formats: HH:MM, or H:MM. Actually, date's -d option argument
can be in almost any common format. It can contain month names, time zones, ‘am’ and ‘pm’, ‘yesterday’, etc.
We use the flexibility of this option to convert the time (HH:MM, or H:MM) to Unix timestamp.
The let builtin allows arithmetic to be performed on shell variables. If the last let expression fails, or evaluates to zero, let returns 1 (error code), otherwise 0 (success). Thus, if for some reason the time column is in invalid format, the iteration for such line will be skipped with the help of continue.
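To see the skip-on-invalid behaviour in isolation, here is a small sketch with made-up lines (the second line has no parseable time, so let returns non-zero and the line is skipped):
while IFS= read -r line; do
    let t=$(date -d "${line##* }" +%s 2>/dev/null) || { echo "skipped: $line"; continue; }
    echo "ok ($t): $line"
done <<'EOF'
oac.bat 09:09
broken line
EOF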
Perl
Here is a Perl version I have written just for fun. You may use it instead of the Bash version, if you like.
# For current date
#cmp_seconds=$(date +%s)
# For specific hours and minutes
cmp_seconds=$(date -d '10:05' +%s)
perl -e '
my @t = localtime('$cmp_seconds');
my $minutes = $t[2] * 60 + $t[1];
while (<>) {
    / (\d?\d):(\d\d)$/ or next;
    my $fh = ($1 * 60 + $2) > $minutes ? *STDOUT : *STDERR;
    printf $fh "%s", $_;
}' < file >greater.txt 2>lesser.txt
The script computes the number of minutes since midnight in the following way:
HH:MM = HH * 60 + MM minutes
If the number of minutes from the file is greater than the number of minutes for the current time, it prints the line to standard output, otherwise to standard error. Finally, the standard output is redirected to greater.txt, and the standard error is redirected to lesser.txt.
I have written this script for demonstration of another approach (algorithm), which can be implemented in different languages, including Bash.

Format and compute time difference between dates (awk/sed)

I am trying to compute time difference between dates formatted as below:
dd/mm/YY;hh:mm:ss;dd/mm/YY;hh:mm:ss (the first couple dd/mm/YY;hh:mm:ss points out the start date and the second couple is the end date)
I want the output to be like this:
dd/mm/YY;hh:mm:ss;dd/mm/YY;hh:mm:ss;hh:mm:ss, where the added hh:mm:ss is the time difference between both dates.
Here is an example:
INPUT:
12/11/15;20:04:09;13/11/15;08:46:26
13/11/15;20:05:34;14/11/15;08:42:04
14/11/15;20:02:47;16/11/15;08:44:43
OUTPUT:
12/11/15;20:04:09;13/11/15;08:46:26;12:42:17
13/11/15;20:05:34;14/11/15;08:42:04;12:36:30
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
I've tried a lot of things with gsub, mktime and awk in order to format the dates, but nothing is efficient enough (too many operations to format and split).
Here is my attempt:
cat times.txt | awk -F';' '{gsub(/[/:]/," ",$0);d1=mktime("20"substr($1,7,2)" "substr($1,4,2)" "substr($1,1,2)" "$2);d2=mktime("20"substr($3,7,2)" "substr($3,4,2)" "substr($3,1,2)" "$4); print strftime("%H:%M:%S", d2-d1,1);}' > timestamps.txt
paste -d";" times.txt timestamps.txt
What do you suggest?
Thank you :)
You could try this and save some gsub and substr calls:
awk -F'[:;/]' '{d1=mktime("20"$3" "$2" "$1" "$4" "$5" "$6);
d2=mktime("20"$9" "$8" "$7" "$10" "$11" "$12);
delta = d2-d1
sec = delta%60
min = (delta - sec)%3600/60
hrs = int(delta/3600)
print $0";"(hrs < 10 ? "0"hrs : hrs)\
":"(min < 10 ? "0"min : min)\
":"(sec < 10 ? "0"sec : sec);}' time.txt
Since we cannot use strftime (thanks to Ed Morton for pointing this out), we have to handle the cases where hours > 23 or hours/minutes/seconds < 10 manually.
The above code outputs:
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
14/11/15;20:02:47;14/11/15;20:02:48;00:00:01
for the input
14/11/15;20:02:47;16/11/15;08:44:43
14/11/15;20:02:47;14/11/15;20:02:48
You cannot do this job robustly without mktime() as the time difference calculation needs to account for leap days, leap seconds, etc. I don't think you can do it any more efficiently than this:
$ cat tst.awk
BEGIN { FS="[/;:]" }
{
d1 = mktime("20"$3" "$2" "$1" "$4" "$5" "$6)
d2 = mktime("20"$9" "$8" "$7" "$10" "$11" "$12)
delta = d2 - d1
hrs = int(delta/3600)
min = int((delta - hrs*3600)/60)
sec = delta - (hrs*3600 + min*60)
printf "%s;%02d:%02d:%02d\n", $0, hrs, min, sec
}
$ awk -f tst.awk file
12/11/15;20:04:09;13/11/15;08:46:26;12:42:17
13/11/15;20:05:34;14/11/15;08:42:04;12:36:30
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
Note - you cannot use strftime() [alone] to calculate the hrs, mins, and secs because when your delta value is more than 1 day strftime() will return the hrs, mins, and secs associated with the time of day on the last day of that delta instead of the total number of hrs, mins, and secs associated with the entire delta.
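To see that pitfall concretely, a quick check with GNU awk (36:41:56 is a 132116-second delta):
# strftime reports the time of day on the delta's last day, not the total duration
gawk 'BEGIN { delta = 36*3600 + 41*60 + 56        # 132116 seconds
              print strftime("%H:%M:%S", delta, 1) }'   # prints 12:41:56, not 36:41:56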
What you're asking for will be pretty tricky in traditional awk.
Of course, gawk (GNU awk) supports mktime, but other awk implementations do not. But you can do this directly in bash, relying on the date command for your conversion. This solution uses BSD date (so it'll work in FreeBSD, NetBSD, OpenBSD, OSX, etc).
while IFS=\; read date1 time1 date2 time2; do
    stamp1=$(date -j -f '%d/%m/%y %T' "$date1 $time1" '+%s')
    stamp2=$(date -j -f '%d/%m/%y %T' "$date2 $time2" '+%s')
    d=$((stamp2-stamp1))
    printf '%s;%s;%s;%s;%02d:%02d:%02d\n' "$date1" "$time1" "$date2" "$time2" $((d/3600)) $(( (d/60)%60 )) $((d%60))
done < dates.txt
Results:
12/11/15;20:04:09;13/11/15;08:46:26;12:42:17
13/11/15;20:05:34;14/11/15;08:42:04;12:36:30
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
Of course, if you're using a non-BSD OS, you may have to install a BSD date package (if one is available) to get this functionality, or figure out how to get something equivalent using the tools you have on hand.

How to calculate this difference in unix

I have a file named Test1.dat and its content's are as follows
Abcxxxxxxxxxxx_123.dat#10:10:15
Bcdxxxxxxxxxxx_145.dat#10:15:23
Cssxxxxxxxxxxx_567.dat#10:26:56
Fgsxxxxxxxxxxx_823.dat#10:46:56
Kssxxxxxxxxxxx_999.dat#11:15:23
Please note that after the # symbol it is the HH:MM:SS format that follows. My question now is: I want to calculate the time difference between the current time and the time present in the file, and fetch only those filenames where the time difference is more than 30 minutes. So if the current time is 11:00:00, I want to fetch the files that arrived more than 30 minutes earlier, so basically the first three files.
This awk selects the entries from the last 30 minutes:
awk -F# '$2>=from && $2<=to' from="$(date +%H:%M:%S -d -30min)" to="$(date +%H:%M:%S)" file
In your case you want the opposite, the files that arrived more than 30 minutes ago (the first three files), so invert the test:
awk -F# '$2<=from' from="$(date +%H:%M:%S -d -30min)" file
You can also use a bash script to get the result:
#!/bin/bash
current=`date '+%s'`
needed_time=`echo "$current - 60 * 30" | bc`
while read -r line; do
    time=`echo $line | sed -r 's/[^#]*.(.*)/\1/g'`
    user_time=`date -d $time '+%s'`
    if [ $user_time -le $needed_time ]; then
        echo "$line"
    fi
done < file_name
Could this be the answer?
awk -F# '{ ts = systime(); thirty_mins = 30 * 60; thirty_mins_ago = ts - thirty_mins; if ($2 < strftime("%H:%M:%S", thirty_mins_ago)) print }' <file.txt
strftime and systime are GAWK (gnu awk) extensions.
