Format and compute time difference between dates (awk/sed) - bash

I am trying to compute time difference between dates formatted as below:
dd/mm/YY;hh:mm:ss;dd/mm/YY;hh:mm:ss (the first dd/mm/YY;hh:mm:ss pair is the start date and the second pair is
the end date)
I want the output to be like this:
dd/mm/YY;hh:mm:ss;dd/mm/YY;hh:mm:ss;hh:mm:ss , where the added hh:mm:ss is the time difference between both dates.
Here is an example:
INPUT:
12/11/15;20:04:09;13/11/15;08:46:26
13/11/15;20:05:34;14/11/15;08:42:04
14/11/15;20:02:47;16/11/15;08:44:43
OUTPUT:
12/11/15;20:04:09;13/11/15;08:46:26;12:42:17
13/11/15;20:05:34;14/11/15;08:42:04;12:36:30
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
I've tried a lot of things with gsub, mktime and awk, in order to format dates, but nothing is efficient enough (too many operations to format and split).
Here is my attempt:
cat times.txt | awk -F';' '{gsub(/[/:]/," ",$0);d1=mktime("20"substr($1,7,2)" "substr($1,4,2)" "substr($1,1,2)" "$2);d2=mktime("20"substr($3,7,2)" "substr($3,4,2)" "substr($3,1,2)" "$4); print strftime("%H:%M:%S", d2-d1,1);}' > timestamps.txt
paste -d";" times.txt timestamps.txt
What do you suggest?
Thank you :)

You could try this and save some gsub and substr calls:
awk -F'[:;/]' '{d1=mktime("20"$3" "$2" "$1" "$4" "$5" "$6);
d2=mktime("20"$9" "$8" "$7" "$10" "$11" "$12);
delta = d2-d1
sec = delta%60
min = (delta - sec)%3600/60
hrs = int(delta/3600)
print $0";"(hrs < 10 ? "0"hrs : hrs)\
":"(min < 10 ? "0"min : min)\
":"(sec < 10 ? "0"sec : sec);}' time.txt
Since we cannot use strftime (thanks to Ed Morton for pointing this out), we have to handle hours > 23 and the zero-padding of hour/min/sec < 10 manually.
The above code outputs:
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
14/11/15;20:02:47;14/11/15;20:02:48;00:00:01
for the input
14/11/15;20:02:47;16/11/15;08:44:43
14/11/15;20:02:47;14/11/15;20:02:48

You cannot do this job robustly without mktime() as the time difference calculation needs to account for leap days, leap seconds, etc. I don't think you can do it any more efficiently than this:
$ cat tst.awk
BEGIN { FS="[/;:]" }
{
d1 = mktime("20"$3" "$2" "$1" "$4" "$5" "$6)
d2 = mktime("20"$9" "$8" "$7" "$10" "$11" "$12)
delta = d2 - d1
hrs = int(delta/3600)
min = int((delta - hrs*3600)/60)
sec = delta - (hrs*3600 + min*60)
printf "%s;%02d:%02d:%02d\n", $0, hrs, min, sec
}
$ awk -f tst.awk file
12/11/15;20:04:09;13/11/15;08:46:26;12:42:17
13/11/15;20:05:34;14/11/15;08:42:04;12:36:30
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
Note - you cannot use strftime() [alone] to calculate the hrs, mins, and secs because when your delta value is more than 1 day strftime() will return the hrs, mins, and secs associated with the time of day on the last day of that delta instead of the total number of hrs, mins, and secs associated with the entire delta.
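The strftime() caveat can be illustrated without gawk. The following is only a sketch, and the helper name fmt_hms is made up for this example: it formats a number of seconds as total hours, minutes and seconds with plain awk arithmetic, so a delta longer than one day keeps its full hour count.

```shell
# Hypothetical helper: format a delta in seconds as HH:MM:SS,
# where HH is the total number of hours (it may exceed 23).
fmt_hms() {
    awk -v delta="$1" 'BEGIN {
        hrs = int(delta / 3600)
        min = int((delta % 3600) / 60)
        sec = delta % 60
        printf "%02d:%02d:%02d\n", hrs, min, sec
    }'
}

fmt_hms 132116   # delta of the third input line (more than one day)
fmt_hms 45737    # delta of the first input line
```

For 132116 seconds this prints 36:41:56, whereas a time-of-day format such as %H:%M:%S applied to the same delta would show 12:41:56.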

What you're asking will be pretty tricky in traditional awk.
Of course, gawk (GNU awk) supports mktime, but other awk implementations do not. You can, however, do this directly in bash, relying on the date command for the conversion. This solution uses BSD date (so it'll work in FreeBSD, NetBSD, OpenBSD, OSX, etc).
while IFS=\; read date1 time1 date2 time2; do
stamp1=$(date -j -f '%d/%m/%y %T' "$date1 $time1" '+%s')
stamp2=$(date -j -f '%d/%m/%y %T' "$date2 $time2" '+%s')
d=$((stamp2-stamp1))
printf '%s;%s;%s;%s;%02d:%02d:%02d\n' "$date1" "$time1" "$date2" "$time2" $(( d/3600 )) $(( (d/60)%60 )) $((d%60))
done < dates.txt
Results:
12/11/15;20:04:09;13/11/15;08:46:26;12:42:17
13/11/15;20:05:34;14/11/15;08:42:04;12:36:30
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
Of course, if you're using a non-BSD OS, you may have to install bsddate (if it's available) to get this functionality, or figure out how to get something equivalent using the tools you have on hand.
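For systems with GNU date, one possible equivalent is sketched below. It assumes that date -d accepts a "YYYY-MM-DD HH:MM:SS" string, that two-digit years mean 20yy, and that the timestamps can be read as UTC (the -u flag, which sidesteps DST effects but differs from the local-time behaviour of the loop above). The helper name to_epoch is hypothetical.

```shell
# Hypothetical helper: convert "dd/mm/yy" and "HH:MM:SS" to epoch seconds
# using GNU date. The date is rearranged with parameter expansion because
# date -d does not understand dd/mm/yy directly.
to_epoch() {
    rest=${1#*/}                      # "mm/yy"
    # year = ${rest#*/}, month = ${rest%%/*}, day = ${1%%/*}
    date -u -d "20${rest#*/}-${rest%%/*}-${1%%/*} $2" +%s
}

while IFS=';' read -r date1 time1 date2 time2; do
    delta=$(( $(to_epoch "$date2" "$time2") - $(to_epoch "$date1" "$time1") ))
    printf '%s;%s;%s;%s;%02d:%02d:%02d\n' "$date1" "$time1" "$date2" "$time2" \
        $(( delta / 3600 )) $(( delta / 60 % 60 )) $(( delta % 60 ))
done <<'EOF'
14/11/15;20:02:47;16/11/15;08:44:43
EOF
```

The hours field is deliberately not reduced modulo anything, so intervals longer than a day print their total hours.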

Related

Awk extracting column but just from one row

I have a log file in the following form:
2017-12-11 10:20:16.993 ...
2017-12-12 10:19:16.993 ...
2017-12-13 10:17:16.993 ...
and I want to extract the first column via awk -F. , compare it to the current system time in seconds, and print the line if the difference is less than 300 seconds.
> SYSTEM_TIME=$(date +%s)
> awk -F. -v system_time=$SYSTEM_TIME '{gsub(/[-:]/," ",$1); if(system_time-mktime($1) <= 300) {print $0}}' log.txt
This is my code, but I can't use mktime because it's not in the POSIX standard. Can it be done without it?
Thanks,
Ahmed
General Remark: log files are often incomplete. A date-time format is given, but often the time zone is missing. When daylight saving comes into play, a missing time zone can mess up everything.
Note: In all commands below, it is assumed that the date in the log file is in UTC and that the system runs in UTC. If this is not the case, be aware that daylight saving time will create problems when running any of the commands below around the time daylight saving kicks in.
Combination of date and awk: (not POSIX)
If your date command has the -d flag (not POSIX), you can run the following:
awk -v r="$(date -d '300 seconds ago' '+%F %T.%3N')" '(r < $0)' logfile
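This approach works because timestamps in YYYY-MM-DD hh:mm:ss.mmm form sort lexicographically in the same order as chronologically, so awk can compare them as plain strings. A small sketch with a fixed cutoff (instead of one computed by date) shows the comparison on its own; the helper name newer_than and the sample lines are made up for illustration.

```shell
# Hypothetical helper: print the lines on stdin whose leading timestamp
# sorts after the cutoff string given as first argument.
newer_than() {
    awk -v r="$1" '(r < $0)'
}

printf '%s\n' \
    '2017-12-11 10:20:16.993 first' \
    '2017-12-12 10:19:16.993 second' \
    '2017-12-13 10:17:16.993 third' |
    newer_than '2017-12-12 00:00:00.000'
```

Only the lines marked second and third are printed, since their timestamps sort after the cutoff.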
GNU awk only:
If you want to make use of mktime, it is then easier to just do:
awk 'BEGIN{s=systime();FS=OFS="."}
{t=$1;gsub(/[-:]/," ",t); t=mktime(t)}
(s-t < 300)' logfile
I assume that the log files are not created in the future, so all log times are earlier than the system time.
POSIX:
If you cannot make use of mktime and want to stay POSIX-only, which also implies that date does not have the -d flag, you can create your own implementation of mktime. Be aware that the version presented here does not do any time-zone corrections as mktime does: mktime_posix assumes that the date string is in UTC.
awk -v s="$(date +%s)" '
# Algorithm from "Astronomical Algorithms" By J.Meeus
function mktime_posix(datestring, a,t) {
split(datestring,a," ")
if (a[1] < 1970) return -1
if (a[2] <= 2) { a[1]--; a[2]+=12 }
t=int(a[1]/100); t=2-t+int(t/4)
t=int(365.25*a[1]) + int(30.6001*(a[2]+1)) + a[3] + t - 719593
return t*86400 + a[4]*3600 + a[5]*60 + a[6]
}
BEGIN{FS=OFS="."}
{t=$1;gsub(/[-:]/," ",t); t=mktime_posix(t)}
(s-t <= 300)' logfile
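As a sanity check, the same mktime_posix function can be wrapped in a small shell helper and run against dates whose epoch values are known. This is only a sketch: the wrapper name epoch_utc is hypothetical, and the result is printed with %.0f as a precaution, on the assumption that some awk implementations handle large values poorly with %d.

```shell
# Hypothetical wrapper around the mktime_posix function shown above.
# Input: "YYYY MM DD HH MM SS" (UTC). Output: seconds since 1970-01-01.
epoch_utc() {
    awk -v ds="$1" '
    function mktime_posix(datestring,    a, t) {
        split(datestring, a, " ")
        if (a[1] < 1970) return -1
        if (a[2] <= 2) { a[1]--; a[2] += 12 }   # treat Jan/Feb as months 13/14
        t = int(a[1] / 100); t = 2 - t + int(t / 4)
        t = int(365.25 * a[1]) + int(30.6001 * (a[2] + 1)) + a[3] + t - 719593
        return t * 86400 + a[4] * 3600 + a[5] * 60 + a[6]
    }
    BEGIN { printf "%.0f\n", mktime_posix(ds) }'
}

epoch_utc '1970 01 01 00 00 00'   # the epoch itself
epoch_utc '2016 03 01 00 00 00'   # the day after a leap day
epoch_utc '2017 12 11 10 20 16'   # first timestamp from the question
```

The three calls print 0, 1456790400 and 1512987616 respectively.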
I would do it this way, as it's shorter.
#!/bin/bash
SYSTEM_TIME=$(date +%s)
LOGTIME=$( date "+%s" -d "$( awk -F'.' '{print $1}' <( head -1 inputtime.txt ))" )
DIFFERENCEINSECONDS=$( echo "$SYSTEM_TIME $LOGTIME" | awk '{ print ($1 - $2)}' )
if [[ "$DIFFERENCEINSECONDS" -gt 300 ]]
then
echo "TRIGGERED!"
fi
Hope it's useful for you. Let me know.
Note : I assumed your input log could be called inputtime.txt. You need to change for your actual filename of course.

Is it really slow to handle text file(more than 10K lines) with shell script?

I have a file with more than 10K lines of record.
Within each line, there are two date+time info. Below is an example:
"aaa bbb ccc 170915 200801 12;ddd e f; g; hh; 171020 122030 10; ii jj kk;"
I want to extract the lines where the two dates are less than 30 days apart.
Below is my source code:
#!/bin/bash
filename="$1"
echo $filename
touch filterfile
totalline=`wc -l $filename | awk '{print $1}'`
i=0
j=0
echo $totalline lines
while read -r line
do
i=$[i+1]
if [ $i -gt $[j+9] ]; then
j=$i
echo $i
fi
shortline=`echo $line | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'`
date1=`echo $shortline | awk '{print $1}'`
date2=`echo $shortline | awk '{print $2}'`
if [ $date1 -gt 700000 ]
then
continue
fi
d1=`date -d $date1 +%s`
d2=`date -d $date2 +%s`
diffday=$[(d2-d1)/(24*3600)]
#diffdays=`date -d $date2 +%s` - `date -d $date1 +%s`)/(24*3600)
if [ $diffday -lt 30 ]
then
echo $line >> filterfile
fi
done < "$filename"
I am running it in Cygwin. It took about 10 seconds to handle 10 lines. I use echo $i to show the progress.
Is it because I am doing something wrong in my script?
This answer does not answer your question but gives an alternative method to your shell script. The answer to your question is given by Sundeep's comment :
Why is using a shell loop to process text considered bad practice?
Furthermore, you should be aware that every time you call sed, awk, echo, date, ... you are asking the system to execute a binary, which needs to be loaded into memory and so on. Doing this inside a loop is therefore very inefficient.
alternative solution
awk programs are commonly used to process log files containing timestamp information, indicating when a particular log record was written. gawk extended the awk standard with time-handling functions. The one you are interested in is :
mktime(datespec [, utc-flag ]) Turn datespec into a timestamp in the
same form as is returned by systime(). It is similar to the function
of the same name in ISO C. The argument, datespec, is a string of the
form "YYYY MM DD HH MM SS [DST]". The string consists of six or seven
numbers representing, respectively, the full year including century,
the month from 1 to 12, the day of the month from 1 to 31, the hour of
the day from 0 to 23, the minute from 0 to 59, the second from 0 to
60, and an optional daylight-savings flag.
The values of these numbers need not be within the ranges specified;
for example, an hour of -1 means 1 hour before midnight. The
origin-zero Gregorian calendar is assumed, with year 0 preceding year
1 and year -1 preceding year 0. If utc-flag is present and is either
nonzero or non-null, the time is assumed to be in the UTC time zone;
otherwise, the time is assumed to be in the local time zone. If the
DST daylight-savings flag is positive, the time is assumed to be
daylight savings time; if zero, the time is assumed to be standard
time; and if negative (the default), mktime() attempts to determine
whether daylight savings time is in effect for the specified time.
If datespec does not contain enough elements or if the resulting time
is out of range, mktime() returns -1.
As your date format is of the form yymmdd HHMMSS we need to write a parser function convertTime for this. Be aware in this function we will pass times of the form yymmddHHMMSS. Furthermore, using a space delimited fields, your times are located in field $4$5 and $11$12. As mktime converts the time to seconds since 1970-01-01 onwards, all we need to do is to check if the delta time is smaller than 30*24*3600 seconds.

awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)
return mktime(s)
}
{ t1=convertTime($4$5); t2=convertTime($11$12)}
(t2-t1 < 30*3600*24) { print }' <file>
If you are not interested in the real delta time (your sed line removes the actual time of the day), then you can adapt it to:
awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s "00 00 00"
return mktime(s)
}
{ t1=convertTime($4); t2=convertTime($11)}
(t2-t1 < 30*3600*24) { print }' <file>
If the dates are not in the fields, you can use match to find them :
awk 'function convertTime(t) {
# here t has the form "yymmdd HHMMSS", with a space at position 7
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,8,2)" "substr(t,10,2)" "substr(t,12,2)
return mktime(s)
}
{ match($0,/[0-9]{6} [0-9]{6}/);
t1=convertTime(substr($0,RSTART,RLENGTH));
a=substr($0,RSTART+RLENGTH)
match(a,/[0-9]{6} [0-9]{6}/)
t2=convertTime(substr(a,RSTART,RLENGTH))}
(t2-t1 < 30*3600*24) { print }' <file>
With some modifications, often without speed in mind, I can reduce the processing time by 50% - which is a lot:
#!/bin/bash
filename="$1"
echo "$filename"
# touch filterfile
totalline=$(wc -l < "$filename")
i=0
j=0
echo "$totalline" lines
while read -r line
do
i=$((i+1))
if (( i > ((j+9)) )); then
j=$i
echo $i
fi
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
if (( date1 > 700000 ))
then
continue
fi
d1=$(date -d "$date1" +%s)
d2=$(date -d "$date2" +%s)
diffday=$(((d2-d1)/(24*3600)))
# diffdays=$(date -d $date2 +%s) - $(date -d $date1 +%s))/(24*3600)
if (( diffday < 30 ))
then
echo "$line" >> filterfile
fi
done < "$filename"
Some remarks:
# touch filterfile
Well, the later CMD >> filterfile appends to this file and creates it if it doesn't exist, so the touch is unnecessary.
totalline=$(wc -l < "$filename")
You don't need awk here. wc omits the file name from its output when it reads from standard input instead of being given a file name.
Capturing the output in an array:
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
allows us array access and saves another call to awk.
On my machine, your code took about 42s for 2880 lines (on your machine 2880 s?) and about 19s for the same file with my code.
So I suspect, if you aren't running it on an i486-machine, that cygwin might be a slowdown. It's a linux environment for windows, isn't it? Well, I'm on a core Linux system. Maybe you try the gnu-utils for Windows - the last time I looked for them, they were advertised as gnu-utils x32 or something, maybe there is an a64-version available by now.
And the next thing I would have a look at, is the date calculation - that might be a slowdown too.
2880 lines isn't that much, so I don't suspect that my SSD drive plays a huge role in the game.
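If the date calls do turn out to be the bottleneck, one option is to compute the day number with shell arithmetic instead of spawning date twice per line. The sketch below is an assumption-laden illustration, not part of the timed script above: the function name days_from_yymmdd is made up, two-digit years are taken to mean 20yy, and the formula is the usual proleptic Gregorian day-count calculation.

```shell
# Hypothetical helper: days since 1970-01-01 for a date given as yymmdd
# (assumed to mean 20yy), using only shell arithmetic and no external process.
days_from_yymmdd() {
    yy=${1%????}; rest=${1#??}; mm=${rest%??}; dd=${rest#??}
    # Strip one leading zero so that 08 and 09 are not parsed as octal.
    yy=${yy#0}; mm=${mm#0}; dd=${dd#0}
    y=$(( 2000 + yy - (mm <= 2) ))         # Jan/Feb count towards the previous year
    mp=$(( mm > 2 ? mm - 3 : mm + 9 ))     # month index starting at March = 0
    era=$(( y / 400 ))
    yoe=$(( y - era * 400 ))
    doy=$(( (153 * mp + 2) / 5 + dd - 1 ))
    doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
    echo $(( era * 146097 + doe - 719468 ))
}

d1=$(days_from_yymmdd 170915)
d2=$(days_from_yymmdd 171020)
echo "$(( d2 - d1 )) days apart"
```

For the two dates in the question's sample line this prints "35 days apart". Calling the function through $( ) still forks a subshell, so the saving comes from not executing the external date binary.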

Add 30 Mins Time to DateTime format YYYY-MM-DD hh:mm:ss in AIX 5.0

I'm running AIX with coreutils 5.0. I need to advance an arbitrary date (or time) given in the ISO 8601-style format YYYY-MM-DD hh:mm:ss.
For example:
Value of D1 is: 2017-07-08 19:20:01, and I need to add 30 minutes.
In a modern UNIX-system I could probably write something like
date -d "$D1 + 30 minutes" +'%H:%M'
but, alas, I need it to work on an old AIX.
Try
$ date -d "$(date -d "$D1") + 30 minutes" +'%H:%M'
This works in bash, but not in ksh.
The inner call to date will parse D1 to a date, and present it in date's "native" format.
$ date -d "$D1"
Sat Jul 8 19:20:01 CEST 2017
This output will be used with + 30 minutes to create the date that you want, with the outer call to date.
The inner call to date will be expanded so that
$ date -d "$(date -d "$D1") + 30 minutes" +'%H:%M'
will be equivalent to
$ date -d "Sat Jul 8 19:20:01 CEST 2017 + 30 minutes" +'%H:%M'
which will be
19:50
date -d @$(($(date -d "$D1" +%s) + 30 * 60)) +%H:%M
$(date -d "$D1" +%s) echoes the epoch
$((epoch + value)) calculates the wanted time
date -d @epoch +fmt formats it
If you are running AIX from 2003 you are in dire straits, my friend, but if you only need the time, not the full date, as your question implies, I think @RamanSailopal got us halfway there.
echo $D1 | awk -F "[: ]" '{
m = $3+30;
h = ($2+int(m/60)) % 24;
printf("%02i:%02i\n", h, m%60)
}'
awk splits the input in different fields, with the splitter pattern given in the -F argument. The pattern denotes : or space .
The input will be split in
$1 = 2017-07-08
$2 = 19
$3 = 20
$4 = 01
Then the script calculates a fake minute value (that can be more than or equal to 60) and stores it in m. From that value it calculates the hour, modulo 24, and the actual minutes, m modulo 60.
This could fail if you hit a leap second, so if you need second precision at all times, you should use some other method.
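The wrap-around behaviour of that calculation is easy to check with a small wrapper. This is only a sketch: the function name add_minutes_hhmm and its second parameter (the number of minutes to add) are made up for illustration, and the awk body is the same arithmetic as above.

```shell
# Hypothetical wrapper: add N minutes to the time-of-day part of
# "YYYY-MM-DD hh:mm:ss" and print hh:mm. The date part is ignored.
add_minutes_hhmm() {
    echo "$1" | awk -F '[: ]' -v n="$2" '{
        m = $3 + n                       # may reach 60 or more
        h = ($2 + int(m / 60)) % 24      # carry into hours, wrap at midnight
        printf "%02d:%02d\n", h, m % 60
    }'
}

add_minutes_hhmm '2017-07-08 19:20:01' 30
add_minutes_hhmm '2017-07-08 23:45:00' 30
```

The first call prints 19:50. The second prints 00:15, which shows that the time wraps past midnight while the date itself is not advanced.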
Awk solution:
awk -F '[-: ]' '{
ram=(mktime($1" "$2" "$3" "$4" "$5" "$6)+(30*60));
print strftime("%Y-%m-%d %T",ram)
}' <<< "$D1"
Convert the date to an epoch timestamp (in seconds) using awk's mktime function (a GNU awk extension). Add 30 minutes (30*60) and then convert back to a date string in the required format using strftime.

Bash - convert time interval string to nr. of seconds

I'm trying to convert strings, describing a time interval, to the corresponding number of seconds.
After some experimenting I figured out that I can use date like this:
soon=$(date -d '5 minutes 10 seconds' +%s); now=$(date +%s)
echo $(( $soon-$now ))
but I think there should be an easier way to convert strings like "5 minutes 10 seconds" to the corresponding number of seconds, in this example 310. Is there a way to do this in one command?
Note: although portability would be useful, it isn't my top priority.
You could start at epoch
date -d"1970-01-01 00:00:00 UTC 5 minutes 10 seconds" "+%s"
310
You could also easily sub in times
Time="1 day"
date -d"1970-01-01 00:00:00 UTC $Time" "+%s"
86400
There is one way to do it, without using date command in pure bash (for portability)
Assuming the input "5 minutes 10 seconds" is already available in a bash variable in colon-delimited form, as below.
$ convertString="00:05:10"
$ IFS=: read -r hour minute second <<< "$convertString"
$ secondsValue=$(((hour * 60 + minute) * 60 + second))
$ printf "%s\n" "$secondsValue"
310
You can run the above commands directly on the command-line without the $ mark.
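One pitfall with this arithmetic is that shell arithmetic treats numbers with a leading zero as octal, so fields such as 08 or 09 cause an error. The sketch below is one way around that using only parameter expansion; the function name hms_to_seconds is hypothetical, and it assumes each field has at most two digits.

```shell
# Hypothetical helper: convert "HH:MM:SS" to seconds.
# The body runs in a subshell so the IFS and set -f changes stay local.
hms_to_seconds() (
    IFS=:
    set -f            # no pathname expansion while splitting
    set -- $1         # split the argument into $1 $2 $3 on ":"
    # Strip one leading zero so 08 and 09 are not read as octal.
    h=${1#0} m=${2#0} s=${3#0}
    echo $(( (h * 60 + m) * 60 + s ))
)

hms_to_seconds 00:05:10
hms_to_seconds 00:08:09
```

The calls print 310 and 489. In bash, forcing base 10 with the 10# prefix inside the arithmetic expression is another common way to avoid the octal problem.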
This will do (add the epoch 19700101):
$ date -ud '19700101 5 minutes 10 seconds' +%s
310
It is important to add a -u to avoid local time (and DST) effects.
$ TZ=America/Los_Angeles date -d '19700101 5 minutes 10 seconds' +%s
29110
Note that date could do some math:
$ date -ud '19700101 +5 minutes 10 seconds -47 seconds -1 min' +%s
203
The previous suggestions didn't work properly on alpine linux, so here's a small helper function that is POSIX compliant, is easy to use and also supports calculations (just as a side effect of the implementation).
The function always returns an integer based on the provided parameters.
$ durationToSeconds '<value>' '<fallback>'
$ durationToSeconds "1h 30m"
5400
$ durationToSeconds "$someemptyvar" 1h
3600
$ durationToSeconds "$someemptyvar" "1h 30m"
5400
# Calculations also work
$ durationToSeconds "1h * 3"
10800
$ durationToSeconds "1h - 1h"
0
# And also supports long forms for year, day, hour, minute, second
$ durationToSeconds "3 days 1 hour"
262800
# It's also case insensitive
$ durationToSeconds "3 Days"
259200
function durationToSeconds () {
set -f
normalize () { echo $1 | tr '[:upper:]' '[:lower:]' | tr -d "\"\\\'" | sed 's/years\{0,1\}/y/g; s/months\{0,1\}/m/g; s/days\{0,1\}/d/g; s/hours\{0,1\}/h/g; s/minutes\{0,1\}/m/g; s/min/m/g; s/seconds\{0,1\}/s/g; s/sec/s/g; s/ //g;'; }
local value=$(normalize "$1")
local fallback=$(normalize "$2")
echo $value | grep -v '^[-+*/0-9ydhms]\{0,30\}$' > /dev/null 2>&1
if [ $? -eq 0 ]
then
>&2 echo Invalid duration pattern \"$value\"
else
if [ "$value" = "" ]; then
[ "$fallback" != "" ] && durationToSeconds "$fallback"
else
sedtmpl () { echo "s/\([0-9]\+\)$1/(0\1 * $2)/g;"; }
local template="$(sedtmpl '\( \|$\)' 1) $(sedtmpl y '365 * 86400') $(sedtmpl d 86400) $(sedtmpl h 3600) $(sedtmpl m 60) $(sedtmpl s 1) s/) *(/) + (/g;"
echo $value | sed "$template" | bc
fi
fi
set +f
}
Edit: Yes. After the OP's comment I developed a one-line, POSIX-compliant command that converts the "X minutes Y seconds" format to seconds, which was the question. I checked it on Mac OS X, CentOS and Ubuntu.
echo $(($(echo "5 minutes 10 seconds" | cut -c1-2)*60 + $(echo "5 minutes 10 seconds" | cut -c1-12 | awk '{print substr($0,11)}')))
The OP told me via comment that the input is in "X minutes Y seconds" format, not HH:MM:SS. The command with date and "+%s" throws an error on (my) Mac. The OP wanted to grab the numerical values from the "X minutes Y seconds" format and convert them to seconds. First I extracted the minutes as digits (take it as equation A):
echo "5 minutes 10 seconds" | cut -c1-2
then I extracted the seconds part (take it as equation B) :
echo "5 minutes 10 seconds" | cut -c1-12 | awk '{print substr($0,11)}'
Now multiply minute by 60 then add with the other :
echo $((equation A)*60) + (equation B))
The OP should ask others to check this working, but still developmental, version of the command before using it for automated, repeated runs such as cron jobs on a production server.
If we want to run this on a log file with values in "X minutes Y seconds" format, we have to replace echo "5 minutes 10 seconds" with something like cat file | ... . I also kept a gist of it, in case I or others ever need to run it with cat on server log files that use the "x minutes y seconds" format.
Although off-topic (as I understand it, the question has little to do with the current time), the following does not work on every POSIX-compliant OS:
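Since cut -c relies on fixed character positions, it stops working once the minutes or seconds have a different number of digits. A field-based variant is sketched below, under the assumption that every input line has exactly the form "X minutes Y seconds"; the function name minsec_to_seconds is made up for this example.

```shell
# Hypothetical helper: read "X minutes Y seconds" lines on stdin and
# print the total number of seconds for each line.
minsec_to_seconds() {
    awk '{ print $1 * 60 + $3 }'   # field 1 = minutes, field 3 = seconds
}

printf '%s\n' '5 minutes 10 seconds' '15 minutes 3 seconds' | minsec_to_seconds
```

This prints 310 and 903, and it accepts a file just as well, for example minsec_to_seconds < file.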
date -d "1970-01-01 00:00:00 UTC 5 minutes 10 seconds" "+%s"
It throws an error on Mac OS X but works on most GNU/Linux distributions. The +%s format may also fail on other POSIX-compliant systems. The following commands are more portable ways to get the current time in seconds on Unix-like systems:
awk 'BEGIN{srand(); print srand()}'
perl -le 'print time'
If needed, the OP can extend this by obtaining the current time in seconds and subtracting. I hope it helps.
---- OLD Answer before EDIT ----
You can get the current time without that date -- echo | awk '{print systime();}' or wget -qO- http://www.timeapi.org/utc/now?\\s. Other way to convert time to second is echo "00:20:40.25" | awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }'.
The example with printf shown in another answer is near perfect.
That thing you want is always needed by the basic utilities of GNU/Linux - gnu.org/../../../../../Setting-an-Alarm.html
The right approach really depends on how foolproof the solution needs to be.

Calculate time (h:m:s or m:s) difference in bash

I want to display the duration of a command in my script. Depending on the duration I want something like this: 12 minutes 04 seconds or 01 hours 15 minutes 02 seconds. Always with a leading zero.
I tried different things I found here, but didn't get the result.
BTW: that's my first try in the bash.
#!/bin/bash
DATE1=$(date +%s)
# ... your commands
DATE2=$(date +%s)
DIFF=$(awk -vdate1=$DATE1 -vdate2=$DATE2 'BEGIN{print strftime("%H hour %M minutes %S seconds",date2-date1,1)}')
echo $DIFF
Time is converted to seconds and stored in variables DATE1 and DATE2
pre-condition DATE2 > DATE1
DATE1=$(date +%s)
# ... your commands
DATE2=$(date +%s)
strftime is used to get time-diff in seconds and formatted
1 is passed as 3rd argument as UTC Flag
strftime("%H hour %M minutes %S seconds",date2-date1,1)
You could use the time executable (not the bash builtin because the external program features the format option -f and delivers the time as [hours:]minutes:seconds and I am not in the mood right now to wait an hour to find out how the builtin shows us the hours :) ) and awk like this (using the example sleep 2):
/usr/bin/time -f "%E" sleep 2 2> >(
awk -F : '
{ if(NF>2) printf("%02d hours %02d minutes %02d seconds\n", $1, $2, $3)
else printf("%02d minutes %02d seconds\n", $1, $2)
}'
)
Here we use /usr/bin/time with its -f option, then feed its output into awk, which splits the string at :. (time writes to stderr, thus we need the 2> to redirect stderr into the >( awk ... ) filter.)
The awk filter decides on the number of fields in NF what printf statement it is using.
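The same formatting decision can also be made with shell arithmetic alone, which avoids the dependency on gawk's strftime or on /usr/bin/time. The sketch below is an illustration only; the function name format_duration is hypothetical, and it takes a duration in whole seconds such as the difference of two date +%s values.

```shell
# Hypothetical helper: print a duration given in seconds, with the hours
# part shown only when it is non-zero. All fields are zero-padded.
format_duration() {
    h=$(( $1 / 3600 )) m=$(( $1 / 60 % 60 )) s=$(( $1 % 60 ))
    if [ "$h" -gt 0 ]; then
        printf '%02d hours %02d minutes %02d seconds\n' "$h" "$m" "$s"
    else
        printf '%02d minutes %02d seconds\n' "$m" "$s"
    fi
}

# Typical use: start=$(date +%s); ...commands...; end=$(date +%s)
#              format_duration $(( end - start ))
format_duration 724
format_duration 4502
```

The two calls print "12 minutes 04 seconds" and "01 hours 15 minutes 02 seconds". Durations of 24 hours or more keep counting in hours rather than wrapping around.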
