Sum durations in bash

I am getting execution time of various processes in a file from their respective log files. The file with execution time looks similar to following (it may have hundreds of entries)
1:00:01.11
2:2.20
1.02
The first line is hours:minutes:seconds, the second line is minutes:seconds, and the third is just seconds.
I want to sum all entries to come to a total execution time. How can I achieve this in bash? If not bash, then can you provide me some examples from other scripting language to sum timestamps?

To complement Matt Jacob's elegant perl solution with a (POSIX-compliant) awk solution:
awk -F: '{ n=0; for(i=NF; i>=1; --i) secs += $i * 60 ^ n++ } END { print secs }' file
With the sample input, this outputs (the sum of all time spans in seconds):
3724.33
See the section below for how to format this value as a time span, similar to the input (01:02:04.33).
Explanation:
-F: splits the input lines into fields by :, so that the resulting fields ($1, $2, ...) represent the hour, minute, and seconds components individually.
n=0; for(i=NF; i>=1; --i) secs += $i * 60 ^ n++ enumerates the fields in reverse order (first seconds, then minutes, then hours, if defined; NF is the number of fields) and multiplies each field with the appropriate multiple of 60 to yield an overall value in seconds, stored in variable secs, cumulatively across lines.
END { print secs } is executed after all lines have been processed and simply prints the cumulative value in seconds.
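For instance, the command can be run directly against the sample input via a here-document (a minimal usage sketch):
awk -F: '{ n=0; for(i=NF; i>=1; --i) secs += $i * 60 ^ n++ } END { print secs }' <<'EOF'
1:00:01.11
2:2.20
1.02
EOF
The result, 3724.33, is simply 1*3600 + 0*60 + 1.11 + 2*60 + 2.20 + 1.02.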
Formatting the output as a time span:
Custom output formatting must be used:
awk -F: '
{ n=0; for(i=NF; i>=1; --i) secs += $i * 60 ^ n++ }
END {
hours = int(secs / 3600)
minutes = int((secs - hours * 3600) / 60)
secs = secs % 60
printf "%02d:%02d:%05.2f\n", hours, minutes, secs
}
' file
The above yields (the equivalent of 3724.33 seconds):
01:02:04.33
The END { ... } block splits the total number of seconds accumulated in secs back into hours, minutes, and seconds, and outputs the result with appropriate formatting of the components using printf.
The reason that utilities such as date and GNU awk's (nonstandard) date-formatting functions cannot be used to format the output is twofold:
The standard time format specifier %H wraps around at 24 hours, so if the cumulative time span exceeds 24 hours, the output would be incorrect.
Fractional seconds would be lost (the granularity of Unix time stamps is whole seconds).
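For example, with GNU date (an assumption; BSD date uses different options), formatting a 25-hour span as a time of day shows the wrap-around:
date -u -d @90000 +%H:%M:%S    # 90000 s = 25 hours, but this prints 01:00:00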

The mostly-readable full perl script:
use strict;
use warnings;
my $seconds = 0;
while (<DATA>) {
    my @fields = reverse(split(/:/));
    for my $i (0 .. $#fields) {
        $seconds += $fields[$i] * 60 ** $i;
    }
}
print "$seconds\n";
__DATA__
1:00:01.11
2:2.20
1.02
Or, the barely-readable one-liner version:
$ perl -F: -wane '@F = reverse(@F); $seconds += $F[$_] * 60 ** $_ for 0 .. $#F; END { print "$seconds\n" }' times.log
Output:
3724.33
In both cases, we're splitting each line on the H:M:S separator : and then reversing the array so that we can process from right-to-left. To get the total time in seconds, we can rely on a neat trick where we multiply each field by powers of 60.
If you want the result in H:M:S format instead of raw seconds, strftime() from the POSIX core module makes it easy:
use POSIX qw(strftime);
print strftime('%H:%M:%S', gmtime($seconds)), "\n";
Output:
01:02:04

Related

How to compare a field of a file with current timestamp and print the greater and lesser data?

How do I compare the current timestamp with a field of a file and print the matched and unmatched data? I have 2 columns in a file (see below):
oac.bat 09:09
klm.txt 9:00
I want to compare the timestamp (2nd column) with the current time, say 10:00, and print the output as follows.
At 10:00
greater.txt
xyz.txt 10:32
mnp.csv 23:54
Lesser.txt
oac.bat 09:09
klm.txt 9:00
Could anyone help me on this please ?
I used awk '$0 > "10:00"', which gives me only the 2nd column details, but I want both columns. I am taking the timestamp directly from the system with a variable like
d=`date +%H:%M`
With GNU awk you can just use its built-in time functions:
awk 'BEGIN{now = strftime("%H:%M")} {
split($NF,t,/:/)
cur=sprintf("%02d:%02d",t[1],t[2])
print > ((cur > now ? "greater" : "lesser") ".txt")
}' file
With other awks just set now using -v and date up front, e.g.:
awk -v now="$(date +"%H:%M")" '{
split($NF,t,/:/)
cur = sprintf("%02d:%02d",t[1],t[2])
print > ((cur > now ? "greater" : "lesser") ".txt")
}' file
The above is untested since you didn't provide input/output we could test against.
Pure Bash
The script can be implemented in pure Bash with the help of the date command:
# Current Unix timestamp
let cmp_seconds=$(date +%s)
# Read file line by line
while IFS= read -r line; do
let line_seconds=$(date -d "${line##* }" +%s) || continue
(( line_seconds <= cmp_seconds )) && \
outfile=lesser || outfile=greater
# Append the line to the file chosen above
printf "%s\n" "$line" >> "${outfile}.txt"
done < file
In this script, ${line##* } removes the longest match of the '* ' pattern (any sequence of characters followed by a space) from the front of $line, thus fetching the last column (the time). The time column is supposed to be in one of the following formats: HH:MM or H:MM. Actually, date's -d option argument
can be in almost any common format. It can contain month names, time zones, ‘am’ and ‘pm’, ‘yesterday’, etc.
We use the flexibility of this option to convert the time (HH:MM or H:MM) to a Unix timestamp.
The let builtin allows arithmetic to be performed on shell variables. If the last let expression fails, or evaluates to zero, let returns 1 (error code), otherwise 0 (success). Thus, if for some reason the time column is in an invalid format, the iteration for such a line is skipped with the help of continue.
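A quick illustration of that expansion and conversion (a hypothetical sample line; date -d is GNU-specific, as in the script above):
line="oac.bat 09:09"
echo "${line##* }"              # prints 09:09, the last column
date -d "${line##* }" +%s       # today's 09:09 as a Unix timestamp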
Perl
Here is a Perl version I have written just for fun. You may use it instead of the Bash version, if you like.
# For current date
#cmp_seconds=$(date +%s)
# For specific hours and minutes
cmp_seconds=$(date -d '10:05' +%s)
perl -e '
my @t = localtime('$cmp_seconds');
my $minutes = $t[2] * 60 + $t[1];
while (<>) {
/ (\d?\d):(\d\d)$/ or next;
my $fh = ($1 * 60 + $2) > $minutes ? STDOUT : STDERR;
printf $fh "%s", $_;
}' < file >greater.txt 2>lesser.txt
The script computes the number of minutes in the following way:
HH:MM = HH * 60 + MM minutes
If the number of minutes from the file is greater than the number of minutes for the current time, it prints the line to standard output, otherwise to standard error. Finally, the standard output is redirected to greater.txt, and the standard error is redirected to lesser.txt.
I have written this script for demonstration of another approach (algorithm), which can be implemented in different languages, including Bash.
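For example, a minimal pure-Bash sketch of the same minutes-based approach (the input file name file is an assumption) might look like this:
now_minutes=$(( 10#$(date +%H) * 60 + 10#$(date +%M) ))
while read -r name t; do
  # Skip lines whose last column is not H:MM or HH:MM
  [[ $t =~ ^([0-9]{1,2}):([0-9]{2})$ ]] || continue
  line_minutes=$(( 10#${BASH_REMATCH[1]} * 60 + 10#${BASH_REMATCH[2]} ))
  (( line_minutes > now_minutes )) && out=greater || out=lesser
  printf '%s %s\n' "$name" "$t" >> "$out.txt"
done < file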

Format and compute time difference between dates (awk/sed)

I am trying to compute time difference between dates formatted as below:
dd/mm/YY;hh:mm:ss;dd/mm/YY;hh:mm:ss (the first dd/mm/YY;hh:mm:ss pair is the start date and the second pair is the end date)
I want the output to be like this:
dd/mm/YY;hh:mm:ss;dd/mm/YY;hh:mm:ss;hh:mm:ss , where the added hh:mm:ss is the time difference between both dates.
Here is an example:
INPUT:
12/11/15;20:04:09;13/11/15;08:46:26
13/11/15;20:05:34;14/11/15;08:42:04
14/11/15;20:02:47;16/11/15;08:44:43
OUTPUT:
12/11/15;20:04:09;13/11/15;08:46:26;12:42:17
13/11/15;20:05:34;14/11/15;08:42:04;12:36:30
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
I've tried a lot of things with gsub, mktime and awk, in order to format dates, but nothing is efficient enough (too many operations to format and split).
Here is my attempt:
cat times.txt | awk -F';' '{gsub(/[/:]/," ",$0);d1=mktime("20"substr($1,7,2)" "substr($1,4,2)" "substr($1,1,2)" "$2);d2=mktime("20"substr($3,7,2)" "substr($3,4,2)" "substr($3,1,2)" "$4); print strftime("%H:%M:%S", d2-d1,1);}' > timestamps.txt
paste -d";" times.txt timestamps.txt
What do you suggest?
Thank you :)
You could try this and save some gsub and substr calls:
awk -F'[:;/]' '{d1=mktime("20"$3" "$2" "$1" "$4" "$5" "$6);
d2=mktime("20"$9" "$8" "$7" "$10" "$11" "$12);
delta = d2-d1
sec = delta%60
min = (delta - sec)%3600/60
hrs = int(delta/3600)
print $0";"(hrs < 10 ? "0"hrs : hrs)\
":"(min < 10 ? "0"min : min)\
":"(sec < 10 ? "0"sec : sec);}' time.txt
Since we cannot use strftime (thanks to Ed Morton for pointing this out), we have to handle the cases hours > 23 and hours/minutes/seconds < 10 manually.
The above code outputs:
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
14/11/15;20:02:47;14/11/15;20:02:48;00:00:01
for the input
14/11/15;20:02:47;16/11/15;08:44:43
14/11/15;20:02:47;14/11/15;20:02:48
You cannot do this job robustly without mktime() as the time difference calculation needs to account for leap days, leap seconds, etc. I don't think you can do it any more efficiently than this:
$ cat tst.awk
BEGIN { FS="[/;:]" }
{
d1 = mktime("20"$3" "$2" "$1" "$4" "$5" "$6)
d2 = mktime("20"$9" "$8" "$7" "$10" "$11" "$12)
delta = d2 - d1
hrs = int(delta/3600)
min = int((delta - hrs*3600)/60)
sec = delta - (hrs*3600 + min*60)
printf "%s;%02d:%02d:%02d\n", $0, hrs, min, sec
}
$ awk -f tst.awk file
12/11/15;20:04:09;13/11/15;08:46:26;12:42:17
13/11/15;20:05:34;14/11/15;08:42:04;12:36:30
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
Note - you cannot use strftime() [alone] to calculate the hrs, mins, and secs because when your delta value is more than 1 day strftime() will return the hrs, mins, and secs associated with the time of day on the last day of that delta instead of the total number of hrs, mins, and secs associated with the entire delta.
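To illustrate (a sketch assuming GNU awk): for the 36:41:56 delta above, 132116 seconds, strftime reports only the time-of-day part of the last day of that span:
gawk 'BEGIN { print strftime("%H:%M:%S", 132116, 1) }'    # prints 12:41:56, not 36:41:56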
What you're asking will be pretty tricky in traditional awk.
Of course, gawk (GNU awk) supports mktime, but other awk implementations do not. But you can do this directly in bash, relying on the date command for your conversion. This solution uses BSD date (so it'll work in FreeBSD, NetBSD, OpenBSD, OSX, etc).
while IFS=\; read date1 time1 date2 time2; do
stamp1=$(date -j -f '%d/%m/%y %T' "$date1 $time1" '+%s')
stamp2=$(date -j -f '%d/%m/%y %T' "$date2 $time2" '+%s')
d=$((stamp2-stamp1))
printf '%s;%s;%s;%s;%02d:%02d:%02d\n' "$date1" "$time1" "$date2" "$time2" $((d/3600)) $(( (d/60)%60 )) $((d%60))
done < dates.txt
Results:
12/11/15;20:04:09;13/11/15;08:46:26;12:42:17
13/11/15;20:05:34;14/11/15;08:42:04;12:36:30
14/11/15;20:02:47;16/11/15;08:44:43;36:41:56
Of course, if you're using a non-BSD OS, you may have to install bsddate (if it's available) to get this functionality, or figure out how to get something equivalent using the tools you have on hand.
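For instance, with GNU date the equivalent might look like this (a sketch under that assumption; GNU date does not parse dd/mm/yy directly, so the dates are rearranged into ISO form first):
while IFS=';' read -r date1 time1 date2 time2; do
  stamp1=$(date -d "20${date1:6:2}-${date1:3:2}-${date1:0:2} $time1" +%s)
  stamp2=$(date -d "20${date2:6:2}-${date2:3:2}-${date2:0:2} $time2" +%s)
  d=$((stamp2 - stamp1))
  printf '%s;%s;%s;%s;%02d:%02d:%02d\n' "$date1" "$time1" "$date2" "$time2" \
    $((d/3600)) $((d%3600/60)) $((d%60))
done < dates.txt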

Why does awk skip the second field in the first entry?

I have a manually created log file of the format
date start duration description
2/5 10:00p 1:45 Did this and that.
2/6 2:00a 0:20 Woke up from my slumber.
==============================================
2:05 TOTAL time spent
There are many entries in the log. To avoid manually recomputing total time every time an entry is added, I wrote the following script:
#!/bin/bash
file=`ls | grep log`
head -n -1 $file | egrep -o [0-9]:[0-9]{2}[^ap] \
| awk '{ FS = ":" ; SUM += 60*$1 ; SUM += $2 } END { print SUM }'
First, the script assumes there is exactly one file with log in its name, and that's the file I'm after. Second, it takes all lines other than the line with the current total, greps the time information from the line, and feeds it to awk, which converts it to minutes.
This is where I run into problems. The final sum would always be slightly off. Through trial and error, I discovered that awk will never count the second field of the very first record, e.g. the 45 minutes in this case. It will count the hour; it won't count the minutes. It has no such problem with the other records, but it's always off by the minutes in the first record.
What could be causing this behavior? How do I debug it?
You set FS in the main block, which is already too late for the first line: by the time the assignment runs, the first record has already been split using the default field separator.
The right way to do it is:
echo -e "1:45\n0:20" | awk 'BEGIN { FS=":" } { SUM += 60*$1 + $2 } END { print SUM }'
You did not show us how you expect the output. Is it something like this?
$ cat log
date start duration description
2/5 10:00p 1:45 Did this and that.
2/6 2:00a 0:20 Woke up from my slumber.
==============================================
2:05 TOTAL time spent
Awk Code
awk '$3~/([[:digit:]]):([[:digit:]])/ && !/TOTAL/{
split($3,A,":")
sum+=A[1]*60+A[2]
}
END{
print "Total",sum,"Minutes"
}' log
Resulting output:
Total 125 Minutes

How to figure out elapsed time in days, hours and minutes with bash? (and print it nicely)

I want to find out how many days, hours and minutes have passed since a certain time using the shell (bash). awk seems like the right tool, because I need to calculate, format and print to the command line.
Expected Output:
"Elapsed Time: "
"Days: 3123"
"Hours: 12"
"Minutes: 23"
Algorithm idea:
time_now = get_time()
time_then = some_constant
diff = time_now - time_then # this is all in seconds
days = round( diff / 86400 ) # to nearest floor integer
print( days )
diff = diff - (days*86400)
hours = round( diff/3600 )
print (hours)
diff = diff - (hours*3600)
minutes = round( diff/60 )
print( minutes )
How can I do this in awk? I came up with this:
date +%s | awk '{time_then = 815002800; diff = $1-time_then}; {print (diff/86400)}' | sed 's/\.[1-9]*//' | awk '{print "Days: " $0 }'
The sed removes the digits after the decimal point, which always rounds the number down (so it's an integer).
So how can I jam the hours and minutes in there? It feels like there must be a better way. Maybe I am using the wrong tool?
Your algorithm can be directly expressed in awk:
date +%s | awk '{
time_now = $1
time_then = 815002800
diff = time_now - time_then # this is all in seconds
days = int(diff / 86400) # to nearest floor integer
print days
diff -= days * 86400
hours = int(diff/3600)
print hours
diff -= hours*3600
minutes = int(diff/60)
print minutes
}'
This will print something like:
9432
21
40
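If you want the labelled output shown in the question, the same block can print it directly (a sketch; 815002800 is the example reference timestamp from above and should be replaced with your own constant):
date +%s | awk '{
  diff = $1 - 815002800
  days = int(diff / 86400);  diff -= days * 86400
  hours = int(diff / 3600);  diff -= hours * 3600
  minutes = int(diff / 60)
  print "Elapsed Time: "
  print "Days: " days
  print "Hours: " hours
  print "Minutes: " minutes
}'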

Comparing Two Timestamps Within a Second

I am writing a shell script that parses a CSV file and performs some calculations.
The timestamps are in the form HH:MM:SS.sss and stored in variables $t2 and $t1.
I would like to know the difference between the two stamps (it will always be less than one second) and report this as $t3 in seconds (ie: 0.020)
t3=$t2-$t1
But the above code is just printing the two variable with a minus sign between - how do I compare the two timestamps?
Here's a funky way to do it! Strip off all the whole seconds to get the milliseconds. Do the subtraction. If the result has gone negative, it's because the seconds overflowed, so add back in 1000 ms. Slot a decimal point on the front to make seconds from milliseconds.
#!/bin/bash -xv
t1="00:00:02.001"
t2="00:00:03.081"
ms1=${t1/*\./}
ms2=${t2/*\./}
t3=$((10#$ms2-10#$ms1))
(( t3 < 0 )) && t3=$((t3+1000))
t3=$(echo "scale=3; $t3/1000"|bc)
echo $t3
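A bc-free alternative for the last step (a sketch that assumes the difference stays below one second, as stated in the question):
t3=$(printf '0.%03d' "$t3")
echo "$t3"    # e.g. 0.080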
You can use awk math to compute the difference between the 2 timestamps after converting both to their millisecond values:
t1=04:13:32.234
t2=04:13:32.258
awk -F '[ :.]+' '{
t1=($1*60*60 + $2*60 + $3)*1000 + $4
t2=($5*60*60 + $6*60 + $7)*1000 + $8
print (t2-t1)/1000}' <<< "$t1 $t2"
0.024
Formula used for conversion:
timestamp value (ms) = (hours * 60 * 60 + minutes * 60 + seconds) * 1000 + milliseconds
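The same conversion can be checked with plain bash arithmetic (a sketch; the 10# prefix avoids octal interpretation of leading zeros):
t1=04:13:32.234
IFS=':.' read -r h m s ms <<< "$t1"
echo $(( (10#$h*3600 + 10#$m*60 + 10#$s) * 1000 + 10#$ms ))    # 15212234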
