Suppose there was a meeting and the meeting record is saved in a CSV file. How can I write a bash/awk script to find the total amount of time for which an employee stayed online? An employee may leave and rejoin the meeting, and all of his/her online time should be counted.
What I did is as follows, but I got stuck on how to compare one record with all the other records and add up the time of each joined/left pair for a person.
#!/bin/bash
inputFile=$1
startTime=$(date -u -d $2 +"%s")
endTime=$(date -u -d $3 +"%s")
awk 'BEGIN{ FS=","; totalTime=0; }
{
for (rows=1; rows <= NR; rows++) {
#I am stuck here on how to compare a record with each and every record
if (($1==?? && $2=="Joined") && ($1==?? && $2=="Left")) {
totalTime=$($(date -u -d $3 +"%s")-$(date -u -d $3 +"%s"))
print $1 "," $totalTime +"%H:%M:%S"
}' $inputFile
The start_time and end_time of the meeting are given on the command line, such as:
$ ./script.sh input.csv 10:00:00 13:00:00
The output should look like this (it can be stored in an output file):
Bob, 00:30:00
John, 01:02:00
The contents of the CSV file are as follows:
Employee_name, Joined/Left, Time
John, joined, 10:00:00
Bob, joined, 10:01:00
James, joined, 10:00:30
Bob, left, 10:20:00
Bob, joined, 10:35:00
Bob, left, 11:40:00
James, left, 11:40:00
John, left, 10:41:00
Bob, joined, 11:45:00
$ cat tst.awk
BEGIN { FS=" *, *"; OFS=", " }
NR==1 { next }
$1 in joined {
jt = time2secs(joined[$1])
lt = time2secs($3)
totSecs[$1] += (lt - jt)
delete joined[$1]
next
}
{ joined[$1] = $3 }
END {
for (name in totSecs) {
print name, secs2time(totSecs[name])
}
}
function time2secs(time, t) {
split(time,t,/:/)
return (t[1]*60 + t[2])*60 + t[3]
}
function secs2time(secs, h,m,s) {
h = int(secs / (60*60))
m = int((secs - (h*60*60)) / 60)
s = int(secs % 60)
return sprintf("%02d:%02d:%02d", h, m, s)
}
$ awk -f tst.awk file
James, 01:39:30
Bob, 01:24:00
John, 00:41:00
If you need to consider DST transitions, leap-seconds, meetings going overnight (or for multiple days), people still being in the meeting when it ends, or anything else you haven't shown in your question - that is left as an exercise :-).
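One of those cases, people still being online when the meeting ends, could be handled by passing the meeting end time in as a variable and flushing any unmatched "joined" entries in the END block. A minimal sketch of that variant (tst_end.awk and the endTime variable are names assumed here; the rest is the script above):
$ cat tst_end.awk
BEGIN { FS=" *, *"; OFS=", " }
NR==1 { next }
$1 in joined {
    totSecs[$1] += time2secs($3) - time2secs(joined[$1])
    delete joined[$1]
    next
}
{ joined[$1] = $3 }
END {
    for (name in joined)        # still online when the meeting ended
        totSecs[name] += time2secs(endTime) - time2secs(joined[name])
    for (name in totSecs)
        print name, secs2time(totSecs[name])
}
function time2secs(time,   t) { split(time,t,/:/); return (t[1]*60 + t[2])*60 + t[3] }
function secs2time(secs,   h,m,s) {
    h = int(secs/3600); m = int((secs - h*3600)/60); s = int(secs % 60)
    return sprintf("%02d:%02d:%02d", h, m, s)
}
$ awk -v endTime='13:00:00' -f tst_end.awk file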
What I have:
file.csv
car, speed, gas, color
2cv, 120, 8 , green
vw, 80, , yellow
Jaguar, 250, 15 , red
Benz, , , silver
What I have found:
This script returns exactly what I need by column number:
#!/bin/bash
awk -F', *' -v col=3 '
FNR>1 {
if ($col)
maxc=FNR
}
END {
print maxc
}
' file.csv
read -p "For End press Enter or Ctrl + C"
I get exactly the output which I need (the row number of the last line with a value in the column):
* for "col=1" ("car" column), the answer: 5
* for "col=2" ("speed" column), the answer: 4
* for "col=3" ("gas" column), the answer: 4
* for "col=4" ("color" column), the answer: 5
What I am looking for:
I am looking for a way to get the same result not by column number (e.g. col=3) but by column headline value (e.g. col=gas).
It may need something additional, like:
col_name=gas # selected column headline
col=get column number from $col_name # not working part
Here's one way to do exactly what you do now, but which finds the column number from the header name when FNR==1:
#!/bin/bash
columns=(car speed gas color)
for col in "${columns[@]}"
do
LINE_CNT=$(awk -F'[\t ]*,[\t ]*' -v col="${col}" '
FNR==1 {
for(i=1; i<=NF; ++i) {
if($i == col) {
col = i;
break;
}
}
if(i>NF) {
exit 1;
}
}
FNR>1 {
if($col) maxc=FNR;
}
END{
print maxc;
}' file.csv)
echo "$col $LINE_CNT"
done
Output:
car 5
speed 4
gas 4
color 5
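If the surrounding shell loop isn't actually needed, a single awk pass can report the last populated row for every column at once. A rough sketch under the same assumptions about the CSV layout:
awk -F'[\t ]*,[\t ]*' '
FNR == 1 { nh = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }   # header row: remember the column names
{ for (i = 1; i <= NF; i++) if ($i != "") last[i] = FNR }            # track the last row with a value per column
END { for (i = 1; i <= nh; i++) print name[i], last[i] }
' file.csv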
I have my command below and I want to have the result on the same line with delimiters. My command:
Array=("GET" "POST" "OPTIONS" "HEAD")
echo $(date "+%Y-%m-%d %H:%M")
for i in "${Array[@]}"
do
cat /home/log/myfile_log | grep "$(date "+%d/%b/%Y:%H")"| awk -v last5=$(date --date="-5 min" "+%M") -F':' '$3>=last5 && $3<last5+5{print}' | egrep -a "$i" | wc -l
done
The result is:
2019-01-01 13:27
1651
5760
0
0
I want to have the result below:
2019-01-01 13:27,1651,5760,0,0
It looks (to me) like the overall objective is to scan /home/log/myfile.log for entries that have occurred within the last 5 minutes and which match one of the 4 entries in ${Array[@]}, keeping count of the matches along the way and finally printing the current date and the counts to a single line of output.
I've opted for a complete rewrite that uses awk's abilities of pattern matching, keeping counts and generating a single line of output:
date1=$(date "+%Y-%m-%d %H:%M") # current date
date5=$(date --date="-5 min" "+%M") # date from 5 minutes ago
awk -v d1="${date1}" -v d5="${date5}" -F":" '
BEGIN { keep=0 # init some variables
g=0
p=0
o=0
h=0
}
$3>=d5 && $3<d5+5 { keep=1 } # do we keep processing this line?
!keep { next } # if not then skip to next line
/GET/ { g++ } # increment our counters
/POST/ { p++ }
/OPTIONS/ { o++ }
/HEAD/ { h++ }
{ keep=0 } # reset keep flag for next line
# print results to single line of output
END { printf "%s,%s,%s,%s,%s\n", d1, g, p, o, h }
' <(grep "$(date '+%d/%b/%Y:%H')" /home/log/myfile_log)
NOTE: The OP may need to revisit the <(grep "$(date ...)" /home/log/myfile.log) to handle timestamp periods that span hours, days, months and years, e.g., 14:59 - 16:04, 12/31/2019 23:59 - 01/01/2020 00:04, etc.
Yeah, it's a bit verbose but a bit easier to understand; the OP can rewrite/reduce as they see fit.
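One way to reduce it, for example, would be to replace the four scalar counters with an array keyed by the method name, which also keeps the output order in one place. A compact sketch under the same assumptions about the log layout:
date1=$(date "+%Y-%m-%d %H:%M")
date5=$(date --date="-5 min" "+%M")
awk -v d1="${date1}" -v d5="${date5}" -F':' '
BEGIN {
    n = split("GET POST OPTIONS HEAD", order, " ")   # methods to count, in output order
    for (i = 1; i <= n; i++) cnt[order[i]] = 0
}
$3 >= d5 && $3 < d5+5 {                              # line is within the last 5 minutes
    for (i = 1; i <= n; i++) if ($0 ~ order[i]) cnt[order[i]]++
}
END {
    out = d1
    for (i = 1; i <= n; i++) out = out "," cnt[order[i]]
    print out                                        # e.g. 2019-01-01 13:27,1651,5760,0,0
}
' <(grep "$(date '+%d/%b/%Y:%H')" /home/log/myfile_log)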
Basically, I've got two columns. The first one stands for the users and the second one for the time they've spent on the server. So I'd like to sum up, for each client, how many minutes he spent on the server.
user1 21:03
user2 19:55
user3 20:09
user1 18:57
user1 19:09
user3 21:05
user4 19:57
Let's say that I've got this. I know how to split, but there's one problem. Whenever I do awk -F: '{print $1}' it prints the users and the first part of the time (the number before the :), and when I do awk -F: '{print $2}' it prints only the numbers after the :. After summing everything up, I'd like to get something like
user1 59:09
user2 19:55
user3 41:14
user4 19:57
Here's a possible solution:
perl -ne '/^(\S+) (\d\d):(\d\d)$/ or next; $t{$1} += $2 * 60 + $3; END { printf "%s %02d:%02d\n", $_, $t{$_} / 60, $t{$_} % 60 for sort keys %t }'
Or with better formatting:
perl -ne '
/^(\S+) (\d\d):(\d\d)$/ or next;
$t{$1} += $2 * 60 + $3;
END {
printf "%s %02d:%02d\n", $_, $t{$_} / 60, $t{$_} % 60
for sort keys %t;
}
'
We loop over all input lines (-n). We make sure every line matches the pattern \S+ \d\d:\d\d (i.e. a sequence of 1 or more non-space characters, a space, two digits, a colon, two digits) or else we skip it.
We accumulate the number of seconds per user in the hash %t. The keys are the user names, the values are the numbers.
At the end we print the contents of %t in a nicely formatted way.
Here is an awk solution:
awk '{a[$1]+=substr($2,1,2)*60+substr($2,4)} END {for(i in a) printf("%s %02d:%02d\n", i,a[i]/60,a[i]%60)}' 1.txt
user1 59:09
user2 19:55
user3 41:14
user4 19:57
First construct an array with index = $1 and value = the time converted to an integer (minutes * 60 + seconds):
{a[$1]+=substr($2,1,2)*60+substr($2,4)}
Then print the array in the desired format, which converts the integer back to mm:ss format:
printf("%s %02d:%02d\n", i,a[i]/60,a[i]%60)
If you want to use awk (and assuming the duration is always hh:mm, though their sizes can be arbitrary), the following will do the trick:
{
split($2, flds, ":") # Get hours and minutes.
mins[$1] += flds[1] * 60 + flds[2] # Add to initially zero array item.
}
END {
for (key in mins) { # For each key in array.
printf "%s %d:%02d\n", # Output specific format.
key, # Key, hours, and minutes.
mins[key] / 60,
mins[key] % 60
}
}
That's the expanded, readable variant, the compressed one is shown in the following transcript, along with the output as expected:
pax> awk '{split($2,flds,":");mins[$1] += flds[1] * 60 + flds[2]}END{for(key in mins){printf "%s %d:%02d\n",key,mins[key]/60,mins[key]%60}}' testprog.in
user1 59:09
user2 19:55
user3 41:14
user4 19:57
Just keep in mind you haven't specified the input format for the case where a user entry has more than 24 hours. If it goes something like 25:42, the script will work as is.
If instead it breaks out days (into something like 1:01:42 rather than 25:42), you'll need to adjust how the minutes are calculated. This can be done relatively easily (including the possibility of minutes-only entries) by checking the flds array size (in the main body of the script, the non-END bit):
num = split($2, flds, ":")
if (num == 1) { add = flds[1] }
else if (num == 2) { add = flds[1] * 60 + flds[2] }
else { add = flds[1] * 1440 + flds[2] * 60 + flds[3] }
mins[$1] += add
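Plugged into the body of the script above (the END block stays the same), that might look like the following sketch:
{
    num = split($2, flds, ":")     # 1 field: minutes only, 2 fields: hh:mm, 3 fields: d:hh:mm
    if (num == 1) { add = flds[1] }
    else if (num == 2) { add = flds[1] * 60 + flds[2] }
    else { add = flds[1] * 1440 + flds[2] * 60 + flds[3] }
    mins[$1] += add
}
END {
    for (key in mins) {
        printf "%s %d:%02d\n", key, mins[key] / 60, mins[key] % 60
    }
}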
I have a CSV file consisting of 2 columns, name and date, in 24-hour time format:
Name, log_date
John, 11/29/2017 23:00
And I want to add 2 hours to the log date, so that the date and time become as below:
John, 11/30/2017 01:00
I tried to add it with the command below, but with no success:
awk - F 'NR>1{$4+=(2/24);}1' OFS="," IN.csv > OUT.csv
I get the output below in the values of the log date column:
2017.08
So please help.
You need a language that has datetime arithmetic. Perl for example:
perl -MTime::Piece -F'/,\s*/' -slane '
$datetime = Time::Piece->strptime($F[1], $fmt);
$F[1] = ($datetime + 7200)->strftime($fmt);
print join ", ", @F
' -- -fmt="%m/%d/%Y %H:%M" <<END
John, 11/29/2017 11:00
END
John, 11/29/2017 13:00
Given your input, there's no way to indicate that the time is 11 PM. How are you supposed to know that?
Below is a one-liner in Python. This is really not usable code, but I believe you can get the idea of using one-liners. This one-liner can be made simpler yet.
python -c "s=r'John, 11/29/2017 13:00';
print(s.replace(s.split(\" \")[-1].split(\":\")[0],str(int(s.split(\" \")[-1].split(\":\")[0])+2)));";
Output
John, 11/29/2017 15:00
Yet, this will not roll over the date: for example, 23+2 = 25, which is supposed to become 01:00 on the next day.
Everything you're looking for (mktime() and strftime()) is documented in the GNU awk manual.
Using space as the field separator:
{
split($2,D,"/")
split($3,H,":")
# format for mktime is "YYYY MM DD HH MM SS [DST]"
d = D[3] " " D[1] " " D[2]" " H[1] " " H[2] " 00"
t=mktime(d)
t = t + 7200 # add two hours
$2 = strftime("%m/%d/%Y",t)
$3 = strftime("%H:%M",t)
}1
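Note that mktime() and strftime() are GNU awk (gawk) extensions, so this needs gawk. A possible one-shot invocation along those lines, with an extra NR==1 rule added here so the header line passes through untouched:
gawk 'NR == 1 { print; next }      # keep the header line as-is
{
    split($2, D, "/"); split($3, H, ":")
    t = mktime(D[3] " " D[1] " " D[2] " " H[1] " " H[2] " 00") + 7200   # add two hours
    $2 = strftime("%m/%d/%Y", t)
    $3 = strftime("%H:%M", t)
}1' IN.csv > OUT.csv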
awk -F',' '{if(NR>1){printf("%s, ", $1);system("date -d \"+2 hours " $2 "\" +\"%m/%d/%Y %H:%M\"")}else{print $0}}' IN.csv > OUT.csv
I need some small help with a Unix shell script using awk.
I have a file like below:
139341 8.61248 python_dev ntoma2 r 07/17/2017 07:27:43 gpuml@acepd1641.udp.finco.com 1
139342 8.61248 python_val ntoma2 r 07/17/2017 07:27:48 gpuml@acepd1611.udp.finco.com 1
139652 8.61248 python_dev ntoma2 r 07/17/2017 10:55:57 gpuml@acepd1671.udp.finco.com 1
It is space separated. I need to get the 1st and 4th columns, which are the job-id and user-name (ntoma2 in this case), based on the 6th column (which is a date in mm/dd/yyyy format) being older than 7 days. That is, compare the 6th column with the current date and pick the records which are older than 7 days.
I have the command below to get the job id and user name of jobs older than 7 days:
cat filename.txt | awk -v dt="$(date "--date=$(date) -7 day" +%m/%d/%Y)" -F" " '/qw/{ if($6<dt) print $4,":",$1 }' >> ./longRunningJob.$$
I also have another command to get email ids, like below, using the user-name (from the 4th column above):
/ccore/pbis/bin/enum-members "adsusers" | grep ^UNIX -B3 | grep <User-Name> -B2 | grep UPN | awk '{print $2}'
I need to combine the above 2 commands and send a report to every user, like below:
echo "Hello <User Name>, There is a long running job which is of job-id: <job-id> more than 7days, so please kill the job or let us know if we can help. Thank you!" | mailx -s "Long Running Job"
NOTE: if a user name is repeated, the whole list should go in one email.
I am not sure how I can combine these 2 and send the email to each user; can someone please help me?
Thank you in advance!!
Vasu
You can certainly do this in awk -- easier in gawk because of date support.
Just to give you an outline of how to do this, I wrote this in Ruby:
$ cat file
139341 8.61248 python_dev ntoma2 r 07/10/2017 07:27:43 gpuml@acepd1641.udp.finco.com 1
139342 8.61248 python_val ntoma2 r 07/09/2017 07:27:48 gpuml@acepd1611.udp.finco.com 1
139652 8.61248 python_dev ntoma2 r 07/17/2017 10:55:57 gpuml@acepd1671.udp.finco.com 1
$ ruby -lane 'BEGIN{ require "date"
jobs=Hash.new { |h,k| h[k]=[] }
users=Hash.new()
pn=7.0
}
t=DateTime.parse("%s %s" % [$F[5].split("/").rotate(-1).join("-"), $F[6]])
ti_days=(DateTime.now-t).to_f
ts="%d days, %d hours, %d minutes and %d seconds" %
      [60,60,24].reduce([ti_days*86400]) { |m,o| m.unshift(m.shift.divmod(o)).flatten }
users[$F[3]]=$F[7]
jobs[$F[3]] << "Job: %s has been running %s" % [$F[0], ts] if (DateTime.now-t).to_f > pn
END{
jobs.map { |id, v|
w1,w2=["is a","job"]
w1,w2=["are","jobs"] if v.length>1
s="Hello #{id}, There #{w1} long running #{w2} running more than the policy of #{pn.to_i} days. Please kill the #{w2} or let us know if we can help. Thank you!\n\t" << v.join("\n\t")
puts "#{users[id]} \n#{s}"
# s is the formatted email address and body. You take it from here...
}
}
' /tmp/file
gpuml@acepd1671.udp.finco.com
Hello ntoma2, There are long running jobs running more than the policy of 7 days. Please kill the jobs or let us know if we can help. Thank you!
Job: 139341 has been running 11 days, 9 hours, 28 minutes and 44 seconds
Job: 139342 has been running 12 days, 9 hours, 28 minutes and 39 seconds
I got a solution, but there is a bug in it. Here is the solution:
#!/bin/bash
{ qstat -u \*; /ccore/pbis/bin/enum-members "adsusers"; } | awk -v dt=$(date "--date=$(date) -7 day" +%m/%d/%Y) '
/^User obj/ {
F2 = 1
FS = ":"
T1 = T2 = ""
next
}
!F2 {
if (NR < 3) next
if ($5 ~ "qw" && $6 < dt) JID[$4] = $1 "," JID[$4]
next
}
/^UPN/ {T1 = $2
}
/^Display/ {T2 = $2
}
/^Alias/ {gsub (/ /, _, $2)
EM[$2] = T1
DN[$2] = T2
}
END {for (j in JID) {print "echo -e \"Hello " DN[j] " \\n \\nJob(s) with job id(s): " JID[j] " executing more than last 7 days, hence request you to take action, else job(s) will be killed in another 1 day \\n \\n Thank you.\" | mailx -s \"Long running job for user: " DN[j] " (" j ") and Job ID(s): " JID[j] "\" " EM[j]
}
}
' | sh
The bug in the above code is that the if condition for the date comparison (shown below) is not working as expected. I am really not sure how to compare $6 and the variable dt (both in mm/dd/yyyy format). I think I should use mktime() or something else. Can someone please help?
if ($5 ~ "qw" && $6 < dt)
Thank you!!
Vasu
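For what it's worth, a minimal sketch of one way to make that comparison reliable (assuming the same space-separated layout as filename.txt above): normalize both mm/dd/yyyy values to yyyymmdd, which then compares correctly as a plain fixed-width string:
awk -v dt="$(date --date='-7 day' +%m/%d/%Y)" '
function ymd(mdy,   a) { split(mdy, a, "/"); return a[3] a[1] a[2] }   # mm/dd/yyyy -> yyyymmdd
$5 ~ "qw" && ymd($6) < ymd(dt) { print $4, ":", $1 }                   # user : job-id, older than 7 days
' filename.txt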