Unix converting time format to integer value - bash

I have the following text file.
Account1,2h 01m 00s
Account2,4h 25m 23s
Account3,5h 43m 59s
I want to combine the hours, minutes and seconds into a single total expressed in minutes:
Account1 minute total = 121
Account2 minute total = 265
Account3 minute total = 343
I have the following bash file
cat data.txt | cut -f2 -d','
This isolates the time values; however, from here I don't know what steps I would take to isolate the time, convert it to integers and then convert it to minutes. I have tried using a PARAM but to no avail.

If awk is an option, you can try this
awk -F"[, ]" '{h=60; m=1; s=0.01666667}{split($2,a,/h/); split($3,b,/m/); split($4,c,/s/); print$1, "minute total = " int(a[1] * h + b[1] * m + c[1] * s)}' input_file
$ cat awk.script
BEGIN {
    FS = ",| "
}
{
    h = 60
    m = 1
    s = 0.01666667
}
{
    split($2, a, /h/)
    split($3, b, /m/)
    split($4, c, /s/)
    print $1, "minute total = " int(a[1] * h + b[1] * m + c[1] * s)
}
Output
awk -f awk.script input_file
Account1 minute total = 121
Account2 minute total = 265
Account3 minute total = 343
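If you'd rather stay in pure bash (no awk), the same conversion can be sketched with parameter expansion and shell arithmetic. The hms_to_minutes helper below is a made-up name, and it assumes every line matches the exact `Hh MMm SSs` layout shown above:

```shell
#!/bin/bash
# hms_to_minutes: convert a "Hh MMm SSs" duration to whole minutes (hypothetical helper)
hms_to_minutes() {
    local h m s
    read -r h m s <<< "$1"            # split "2h 01m 00s" into three tokens
    h=${h%h}; m=${m%m}; s=${s%s}      # strip the unit suffixes
    # 10# forces base 10 so values like "08" are not parsed as octal
    echo $(( 10#$h * 60 + 10#$m + 10#$s / 60 ))
}

while IFS=, read -r account hms; do
    echo "$account minute total = $(hms_to_minutes "$hms")"
done <<'EOF'
Account1,2h 01m 00s
Account2,4h 25m 23s
Account3,5h 43m 59s
EOF
```

Integer division truncates the seconds, matching the totals above.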

Related

sum by year and insert missing entries with 0

I have a report for year-month entries like below
201703 5
201708 10
201709 20
201710 40
201711 80
201712 100
201802 0
201803 25
201804 50
201805 50
201806 150
201807 300
201808 200
201902 10
I need to sum the year-month entries by year and print the total after all the months for that particular year. The data can have missing entries for any month(s); for those months, a dummy value (0) should be inserted.
Required output:
201703 5
201704 0
201705 0
201706 0
201707 0
201708 10
201709 20
201710 40
201711 80
201712 100
2017 255
201801 0
201802 0
201803 25
201804 50
201805 50
201806 150
201807 300
201808 200
201809 0
201810 0
201811 0
201812 0
2018 775
201901 0
201902 10
201903 0
2019 10
I can get the summary of year by using below command.
awk ' { c=substr($1,0,4); if(c!=p) { print p,s ;s=0} s=s+$2 ; p=c ; print } ' ym.dat
But how do I insert entries for the missing ones?
Also, the last entry should not exceed the current (system time) year-month, i.e. for this specific example, dummy values should not be inserted for 201904, 201905, etc. It should just stop with 201903.
You may use this awk script mmyy.awk:
{
    rec[$1] = $2
    yy = substr($1, 1, 4)
    mm = substr($1, 5, 2) + 0
    ys[yy] += $2
}
NR == 1 {
    fm = mm
    fy = yy
}
END {
    for (y=fy; y<=cy; y++)
        for (m=1; m<=12; m++) {
            # print previous year's sum
            if (m == 1 && y-1 in ys)
                print y-1, ys[y-1]
            if (y == fy && m < fm)
                continue
            else if (y == cy && m > cm)
                break
            # print year-month with its value, or 0 if the entry is missing
            k = sprintf("%d%02d", y, m)
            printf "%d%02d %d\n", y, m, (k in rec ? rec[k] : 0)
        }
    print y-1, ys[y-1]
}
Then call it as:
awk -v cy=$(date '+%Y') -v cm=$(date '+%m') -f mmyy.awk file
201703 5
201704 0
201705 0
201706 0
201707 0
201708 10
201709 20
201710 40
201711 80
201712 100
2017 255
201801 0
201802 0
201803 25
201804 50
201805 50
201806 150
201807 300
201808 200
201809 0
201810 0
201811 0
201812 0
2018 775
201901 0
201902 10
201903 0
2019 10
With GNU awk for strftime():
$ cat tst.awk
NR==1 {
    begDate = $1
    endDate = strftime("%Y%m")
}
{
    val[$1] = $NF
    year = substr($1,1,4)
}
year != prevYear { prt(); prevYear=year }
END { prt() }
function prt(   mth, sum, date) {
    if (prevYear != "") {
        for (mth=1; mth<=12; mth++) {
            date = sprintf("%04d%02d", prevYear, mth)
            if ( (date >= begDate) && (date <= endDate) ) {
                print date, val[date]+0
                sum += val[date]
                delete val[date]
            }
        }
        print prevYear, sum+0
    }
}
$ awk -f tst.awk file
201703 5
201704 0
201705 0
201706 0
201707 0
201708 10
201709 20
201710 40
201711 80
201712 100
2017 255
201801 0
201802 0
201803 25
201804 50
201805 50
201806 150
201807 300
201808 200
201809 0
201810 0
201811 0
201812 0
2018 775
201901 0
201902 10
201903 0
2019 10
With other awks you'd just pass in endDate from the shell, e.g. awk -v endDate="$(date +'%Y%m')" '...', and drop the strftime() line.
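A sketch of that portable variant, with the sample data piped in and endDate supplied by the shell (the script body is otherwise the same logic as tst.awk above, minus strftime()):

```shell
# Portable variant of tst.awk: endDate comes from the shell, not GNU awk's strftime().
out=$(printf '%s\n' '201703 5' '201708 10' '201709 20' '201710 40' \
                    '201711 80' '201712 100' '201802 0' '201803 25' \
                    '201804 50' '201805 50' '201806 150' '201807 300' \
                    '201808 200' '201902 10' |
awk -v endDate="$(date +'%Y%m')" '
NR==1 { begDate = $1 }
{ val[$1] = $NF; year = substr($1,1,4) }
year != prevYear { prt(); prevYear = year }
END { prt() }
function prt(   mth, date, sum) {
    if (prevYear != "") {
        for (mth = 1; mth <= 12; mth++) {
            date = sprintf("%04d%02d", prevYear, mth)
            # only print months between the first record and the current month
            if (date >= begDate && date <= endDate) {
                print date, val[date] + 0
                sum += val[date]
                delete val[date]
            }
        }
        print prevYear, sum + 0
    }
}')
echo "$out"
```

Note that run today the last year is padded with zeroes up to the current month, exactly as the GNU awk version would be.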
Perl to the rescue!
perl -lane '$start ||= $F[0];
    $Y{substr $F[0], 0, 4} += $F[1];
    $YM{$F[0]} = $F[1];
    END { for $y (sort keys %Y) {
              for $m (1 .. 12) {
                  $m = sprintf "%02d", $m;
                  next if "$y$m" lt $start;
                  print "$y$m ", $YM{$y . $m} || 0;
                  last if $y == 1900 + (localtime)[5]
                       && (localtime)[4] < $m;
              }
              print "$y ", $Y{$y} || 0;
          }
    }' -- file
-n reads the input line by line
-l removes newlines from input and adds them to output
-a splits each line on whitespace into the @F array
substr extracts the year from the YYYYMM date. Hashes %Y and %YM use the dates as keys and the counts as values; that's why the year hash uses +=, which adds each value to the already accumulated one.
The END block is evaluated after the input has been exhausted.
It just iterates over the years stored in the hash; the range 1 .. 12 is used for the months, so missing entries get a zero (that's what the || 0 supplies).
next together with $start skips the months before the start of the report.
last is responsible for skipping the rest of the current year.
The following awk script will do what you expect. The idea is:
store data in an array
print and sum only when the year changes
This gives:
# function that prints the year starting
# at month m1 and ending at m2
function print_year(m1,m2,  s,str) {
    s=0
    for (i=(m1+0); i<=(m2+0); ++i) {
        str = y sprintf("%0.2d", i)
        print str, a[str]+0; s += a[str]
    }
    print y, s
}
# This works for GNU awk; for POSIX awk, drop this BEGIN block and call as
#   awk -v stime=$(date "+%Y%m") -f script.awk file
BEGIN { stime = strftime("%Y%m") }
# initialize on the first record
(NR==1) { y=substr($1,1,4); m1=substr($1,5) }
# print intermediate year
(substr($1,1,4) != y) {
    print_year(m1,12)
    y=substr($1,1,4); m1="01"
    delete a
}
# store the value and keep track of the last month
{ a[$1]=$2; m2=substr($1,5) }
# check if the entry is still valid (not past stime)
($1 > stime) { exit }
# print all missing years in full,
# then the last year up to the system-time month
END {
    for (; y<substr(stime,1,4)+0; y++) { print_year(m1,12); m1=1; m2=12 }
    print_year(m1, substr(stime,5))
}
Nice question, btw. Friday afternoon brain fryer. Time to head home.
In awk. The optional endtime and its value are brought in as arguments:
$ awk -v arg1=201904 -v arg2=100 '       # optional parameters
function foo(ym,v) {
    while (p<ym) {
        y=substr(p,1,4)                  # get year from previous round
        m=substr(p,5,2)+0                # get month
        p=y+(m==12) sprintf("%02d",m%12+1)  # December magic
        if (m==12)
            print y, s[y]                # print the sums (delete maybe?)
        print p, (p==ym?v:0)             # print yyyymm and 0/$2
    }
}
{
    s[substr($1,1,4)]+=$2                # sums in array, year index
}
NR==1 {                                  # handle first record
    print
    p=$1
}
NR>1 {
    foo($1,$2)
}
END {
    if (arg1)
        foo(arg1,arg2)
    print y=substr($1,1,4), s[y]+arg2
}' file
Tail from output:
2018 775
201901 0
201902 10
201903 0
201904 100
2019 110

Arithmetic calculation in shell scripting-bash

I have an input notepad file as shown below:
sample input file:
vegetables and rates
kg rate total
Tomato 4 50 100
potato 2 60 120
Beans 3 80 240
Overalltotal: (100+120++240) = 460
I need to multiply column 2 by column 3 and check whether each row's total, and the overall total, are right. If they are not, an error message should be appended to the same file, as shown below.
sample output file:
vegetables and rates
kg rate vegtotal
Tomato 4 50 200
potato 2 60 120
Beans 3 80 240
Overalltotal: (200+120++240) = 560
Error in calculations:
Vegtotal for tomato is wrong: It should be 200 instead of 100
Overalltotal is wrong: It should be 560 instead of 460
Code so far:
for f in Date*.log; do
awk 'NR>1{ a[$1]=$2*$3 }{ print }END{ printf("\n");
for(i in a)
{ if(a[i]!=$4)
{ print i,"Error in calculations",a[i] }
} }' "$f" > tmpfile && mv tmpfile "$f";
done
It calculates the total but not comparing the values. How can I compare them and print to same file?
Complex awk solution:
awk 'NF && NR>1 && $0!~/total:/{
         r=$2*$3; v=(v!="")? v"+"r : r;
         if (r!=$4) { veg_er[$1]=r" instead of "$4 }
         err_t+=$4; t+=r; $4=r
     }
     $0~/total/ && err_t {
         print $1,"("v")",$3,t; print "Error in calculations:";
         for (i in veg_er) { print "Veg total for "i" is wrong: it should be "veg_er[i] }
         print "Overalltotal is wrong: It should be "t" instead of "err_t; next
     }1' inputfile
The output:
kg rate total
Tomato 4 50 200
potato 2 60 120
Beans 3 80 240
Overalltotal: (200+120+240) = 560
Error in calculations:
Veg total for Tomato is wrong: it should be 200 instead of 100
Overalltotal is wrong: It should be 560 instead of 460
Details:
NF && NR>1 && $0!~/total:/ - considering veg lines (excluding header and total lines)
r=$2*$3 - the result of product of the 2nd and 3rd fields
v=(v!="")? v"+"r : r - concatenating resulting product values
veg_er - the array containing erroneous vegs info (veg name, erroneous product value, and real product value)
err_t+=$4 - accumulating erroneous total value
t+=r - accumulating real total value
$0~/total/ && err_t - processing total line and error events
Input
akshay#db-3325:/tmp$ cat file
kg rate total
Tomato 4 50 100
potato 2 60 120
Beans 3 80 240
Output
akshay#db-3325:/tmp$ awk 'FNR>1{sum+= $2 * $3 }1;END{print "Total : "sum}' file
kg rate total
Tomato 4 50 100
potato 2 60 120
Beans 3 80 240
Total : 560
Explanation
awk ' # call awk
FNR>1{ # FNR is the number of the current record (line) within
       # the current file; FNR>1 skips the header row
sum+= $2 * $3 # sum total which is product of value
# in column2 and column3
}1; # 1 at the end does default operation,
# that is print current record ( print $0 )
# if you want to skip record being printed remove "1", so that script just prints total
END{ # end block
print "Total : "sum # print sum
}
' file
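To also compare the stored totals against the computed ones, which is the part the question asks about, a minimal sketch along the same lines (the error wording is made up):

```shell
# Sketch: flag rows where column 4 disagrees with kg * rate (column 2 * column 3).
out=$(printf '%s\n' 'kg rate total' 'Tomato 4 50 100' 'potato 2 60 120' 'Beans 3 80 240' |
awk 'FNR>1 {               # skip the header row
    r = $2 * $3            # recompute the row total
    if (r != $4)
        print $1" is wrong: should be "r" instead of "$4
}')
echo "$out"
```

With the sample file this reports only the Tomato row, since 4 * 50 is 200, not 100.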

Calculating sum of gradients with awk

I have a file that contains 4 columns such as:
A B C D
1 2 3 4
10 20 30 40
100 200 300 400
.
.
.
I can calculate gradient of columns B to D versus A such as following commands:
awk 'NR>1{print $0,($2-b)/($1-a)}{a=$1;b=$2}' file
How can I print sum of gradients as the 5th column in the file? The results should be:
A B C D sum
1 2 3 4 1+2+3+4=10
10 20 30 40 (20-2)/(10-1)+(30-3)/(10-1)+(40-4)/(10-1)=9
100 200 300 400 (200-20)/(100-10)+(300-30)/(100-10)+(400-40)/(100-10)=9
.
.
.
awk 'NR == 1 { print $0, "sum"; next } { if (NR == 2) { sum = $1 + $2 + $3 + $4 } else { t = $1 - a; sum = ($2 - b) / t + ($3 - c) / t + ($4 - d) / t } print $0, sum; a = $1; b = $2; c = $3; d = $4 }' file
Output:
A B C D sum
1 2 3 4 10
10 20 30 40 9
100 200 300 400 9
With ... | column -t:
A B C D sum
1 2 3 4 10
10 20 30 40 9
100 200 300 400 9
Update:
#!/usr/bin/awk -f
NR == 1 {
print $0, "sum"
next
}
{
sum = 0
if (NR == 2) {
for (i = 1; i <= NF; ++i)
sum += $i
} else {
t = $1 - a[1]
for (i = 2; i <= NF; ++i)
sum += ($i - a[i]) / t
}
print $0, sum
for (i = 1; i <= NF; ++i)
a[i] = $i
}
Usage:
awk -f script.awk file
If you apply the same logic to the first line of numbers as you do to the rest, taking the initial value of each column as 0, you get 9 as the result of the sum (as it was in your question originally). This approach uses a loop to accumulate the sum of the gradient from the second field up to the last one. It uses the fact that on the first time round, the uninitialised values in the array a evaluate to 0:
awk 'NR==1 { print $0, "sum"; next }
{
s = 0
for(i=2;i<=NF;++i) s += ($i-a[i])/($1-a[1]) # accumulate sum
for(i=1;i<=NF;++i) a[i] = $i # fill array to be used for next iteration
print $0, s
}' file
You can pack it all onto one line if you want but remember to separate the statements with semicolons. It's also slightly shorter to use a single for loop with an if, saving the denominator first so that updating a[1] mid-loop doesn't clobber it:
awk 'NR==1{print$0,"sum";next}{s=0;d=$1-a[1];for(i=1;i<=NF;++i){if(i>1)s+=($i-a[i])/d;a[i]=$i}print$0,s}' file
Output:
A B C D sum
1 2 3 4 9
10 20 30 40 9
100 200 300 400 9

/proc/uptime in Mac OS X

I need the EXACT same output as Linux's "cat /proc/uptime".
For example, with /proc/uptime, you'd get
1884371.64 38646169.12
but with any Mac alternative, like "uptime", you'd get
20:25 up 20:26, 6 users, load averages: 3.19 2.82 2.76
I need it to be exactly like cat /proc/uptime, but on Mac OS X.
Got it...
$ sysctl -n kern.boottime | cut -c14-18
87988
Then I just converted that to readable format (don't remember how):
1 Days 00:26:28
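That forgotten conversion step can be sketched like this; format_uptime is a hypothetical helper name, and it assumes the uptime is already available as whole seconds (like the 87988 above):

```shell
# format_uptime: render whole seconds of uptime as "D Days HH:MM:SS"
# (hypothetical helper, not a macOS command)
format_uptime() {
    local t=$1
    printf '%d Days %02d:%02d:%02d\n' \
        $(( t / 86400 )) $(( t % 86400 / 3600 )) \
        $(( t % 3600 / 60 )) $(( t % 60 ))
}

format_uptime 87988   # → 1 Days 00:26:28
```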
There simply is no "/proc" directory on the Macintosh.
On MacOS, you can do a command like:
sysctl kern.boottime
and you'll get a response like:
kern.boottime: { sec = 1362633455, usec = 0 } Wed Mar 6 21:17:35 2013
boottime=`sysctl -n kern.boottime | awk '{print $4}' | sed 's/,//g'`
unixtime=`date +%s`
timeAgo=$(($unixtime - $boottime))
uptime=`awk -v time=$timeAgo 'BEGIN { seconds = time % 60; minutes = int(time / 60 % 60); hours = int(time / 60 / 60 % 24); days = int(time / 60 / 60 / 24); printf("%.0f days, %.0f hours, %.0f minutes, %.0f seconds", days, hours, minutes, seconds); exit }'`
echo $uptime
Will return something like
1 Day, 20 hours, 10 minutes, 55 seconds
Here is what I do to get the value with awk instead of the cut method:
sysctl kern.boottime | awk '{print $5}'
where $5 is the fifth whitespace-separated field of the string (note it still carries a trailing comma).
Example:
$1 gives you "kern.boottime:"
$2 gives you "{"
$3 gives you "sec"
from the string
kern.boottime: { sec = 1604030189, usec = 263821 } Fri Oct 30 09:26:29 2020
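Putting the pieces together, the first field of /proc/uptime (uptime in seconds) can be approximated as follows. parse_boot_sec is a hypothetical helper; macOS has no cheap equivalent of the second /proc/uptime field (aggregate idle time), so only the first number is reproduced, with the fractional part faked as .00:

```shell
# parse_boot_sec: pull the epoch boot time out of the
# "{ sec = NNN, usec = NNN } ..." line that `sysctl -n kern.boottime` prints
# (hypothetical helper name)
parse_boot_sec() {
    awk -F'[ ,]+' '{ print $4 }' <<< "$1"
}

# Fall back to a dummy line on systems without kern.boottime (e.g. Linux),
# so the sketch still runs for demonstration.
boot=$(parse_boot_sec "$(sysctl -n kern.boottime 2>/dev/null || echo '{ sec = 0, usec = 0 }')")
echo "$(( $(date +%s) - boot )).00"
```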

how to group records into bucket based on the timestamp?

i have a list of entries from the logs:
15:38:52.363 1031
15:41:06.347 1259
15:41:06.597 1171
15:48:44.115 1588
15:48:44.125 1366
15:48:44.125 1132
15:53:14.525 1348
15:53:15.121 1553
15:53:15.181 1286
15:53:15.187 1293
The first column is the timestamp, the second is the value.
Now I'm trying to group them by an interval of, say, 20 sec, and either sum the values or get their average. I wonder what's the easiest way to do this? Preferably some simple shell script I can pipe my grep statement into to get a divided list. Thanks!
This gawk script completely ignores fractional seconds. It also knows nothing about spanning from one day to the next (crossing 00:00:00):
grep ... | awk -v interval=20 'function groupout() {print "----", "Timespan ending:", strftime("%T", prevtime), "Sum:", sum, "Avg:", sum/count, "----"} BEGIN {prevtime = 0} {split($1, a, "[:.]"); time = mktime(strftime("%Y %m %d") " " a[1] " " a[2] " " a[3]); if (time > prevtime + interval) {if (NR != 1) {groupout(); sum=0; count=0}}; print; sum+=$2; count++; prevtime = time} END {groupout()}'
Output:
15:38:52.363 1031
---- Timespan ending: 15:38:52 Sum: 1031 Avg: 1031 ----
15:41:06.347 1259
15:41:06.597 1171
---- Timespan ending: 15:41:06 Sum: 2430 Avg: 1215 ----
15:48:44.115 1588
15:48:44.125 1366
15:48:44.125 1132
---- Timespan ending: 15:48:44 Sum: 4086 Avg: 1362 ----
15:53:14.525 1348
15:53:15.121 1553
15:53:15.181 1286
15:53:15.187 1293
---- Timespan ending: 15:53:15 Sum: 5480 Avg: 1370 ----
Here it is again more readably:
awk -v interval=20 '
function groupout() {
    print "----", "Timespan ending:", strftime("%T", prevtime), "Sum:", sum, "Avg:", sum/count, "----"
}
BEGIN {
    prevtime = 0
}
{
    split($1, a, "[:.]")
    time = mktime(strftime("%Y %m %d") " " a[1] " " a[2] " " a[3])
    if (time > prevtime + interval) {
        if (NR != 1) { groupout(); sum=0; count=0 }
    }
    print
    sum += $2
    count++
    prevtime = time
}
END { groupout() }'
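If you want fixed, clock-aligned 20-second buckets rather than the gap-based groups above, here is a sketch along the same lines (bucketout and the output wording are made up; fractional seconds are ignored, as before):

```shell
# Sketch: clock-aligned buckets of `interval` seconds, printing sum and average.
out=$(printf '%s\n' \
    '15:38:52.363 1031' \
    '15:41:06.347 1259' \
    '15:41:06.597 1171' |
awk -v interval=20 '
function bucketout() {
    printf "bucket ending %s: sum=%d avg=%.1f\n", label, sum, sum / count
    sum = 0; count = 0
}
{
    split($1, a, "[:.]")
    secs = a[1]*3600 + a[2]*60 + a[3]   # seconds since midnight, fraction dropped
    b = int(secs / interval)            # bucket index, aligned to the clock
    if (NR > 1 && b != prev) bucketout()
    sum += $2; count++; prev = b; label = $1
}
END { bucketout() }')
echo "$out"
```

Unlike the gap-based version, two entries 19 seconds apart can land in different buckets here if a bucket boundary falls between them; which behaviour you want depends on how the logs should be reported.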
