How to analyze time tracking reports with awk?

I'm tracking my time with two great tools, todotxt and punch. With these one
can generate reports that look like this:
2012-11-23 (2 hours 56 minutes):
	first task (52 minutes)
	second task (2 hours 4 minutes)
2012-11-24 (2 hours 8 minutes):
	second task (2 hours 8 minutes)
My question: what's a convenient way to analyze this kind of output? E.g.
how could I sum up the time spent on "first task"/"second task", or find out
my total working hours for a longer period such as "2012-11-*"?
So, I'd like to have a command such as punch.sh report /regex-for-date-or-task-i'm-interested-in/.
I've read that this should be possible with awk, but I don't know how to 1) sum minutes and hours, and 2) pass "task names with spaces" as variables to awk.
UPDATE:
I'm also tagging my tasks with +tags to mark different projects (as in first task +projecttag). So it would also be great to sum the time spent on all tasks with a certain tag.
Thanks for any help!

Before running this script, please uncomment the appropriate gsub() call for your input. Run it like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
    FS="[( \t)]"
}
/^\t/ {
    line = $0
    #### If your input has...
    ## No tags, show hrs in each task
    # gsub(/^[\t ]*| *\(.*/,"",line)
    ## Tags, show hrs in each task
    # gsub(/^[\t ]*| *\+.*/,"",line)
    ## Tags, show hrs in each tag
    # gsub(/^[^+]*\+| *\(.*/,"",line)
    ####
    for (i=1; i<=NF; i++) {
        if ($i == "hours")   h[line] += $(i-1)
        if ($i == "minutes") m[line] += $(i-1)
    }
}
END {
    for (i in m) {
        while (m[i] >= 60) { m[i] -= 60; h[i]++ }
        print i ":", (h[i] ? h[i] : "0") " hrs", m[i], "mins"
    }
}
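For example, with the first gsub() (no tags) uncommented and the sample report from the question saved as file (with the task lines tab-indented, as the /^\t/ pattern expects), the run should print something like the following; note that for (i in m) iterates in an unspecified order, so the two lines may come out in either order:
$ awk -f script.awk file
first task: 0 hrs 52 mins
second task: 4 hrs 12 mins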

Related

Dynamically get the total up/down time from a log file

Let's say I have the following log file that continuously logs a server's down/up time:
status.log
UP - "18:00:00"
..
..
DOWN - "19:00:03"
..
..
DOWN - "22:00:47"
..
..
UP - "23:59:48"
UP - "23:59:49"
UP - "23:59:50"
DOWN - "23:59:51"
DOWN - "23:59:52"
UP - "23:59:53"
UP - "23:59:54"
UP - "23:59:56"
UP - "23:59:57"
UP - "23:59:59"
Each day is logged in a separate folder under the same filename.
This is not my actual code, but a much simpler and more transparent approach:
#!/bin/ruby
downtime_log = File.readlines("path/to/log/file").select { |line| line =~ /DOWN/ }
puts "#{downtime_log.count} Downtimes for today"
Logic-wise, how can I get the total downtime per file/day in minutes and seconds, rather than just a total count?
I assume that your file contains exactly one line per second. Then the number of seconds your service was down can be evaluated like you already did in your approach:
number_of_seconds_downtime = File.readlines('path/to/log/file')
.select { |line| line =~ /DOWN/ }
.count
Or simplified:
number_of_seconds_downtime = File.readlines('path/to/log/file')
.count { |line| line =~ /DOWN/ }
To translate this into minutes and seconds, just use divmod:
minutes, seconds = number_of_seconds_downtime.divmod(60)
and output the result like this:
puts "#{minutes}:#{seconds} downtime"

AWK performance while processing big files

I have an awk script that I use to calculate how much time some transactions take to complete. The script gets the unique ID of each transaction and stores the minimum and maximum timestamp of each one. Then it calculates the difference, and at the end it shows the results that are over 60 seconds.
It works very well with a few hundred thousand lines (200k), but it takes much longer in the real world. I tested it several times and it takes about 15 minutes to process about 28 million lines. Can I consider this good performance, or is it possible to improve it?
I'm open to any kind of suggestion.
Here is the complete code:
zgrep -E "\(([a-z0-9]){15,}:" /path/to/very/big/log | awk '{
    gsub("[()]|:.*","",$4)      # just removing ugly chars
    ++cont
    min=$4"min"                 # key for the minimum value of the current transaction
    max=$4"max"                 # same as previous, just for readability
    split($2,secs,/[:,]/)       # split hours, minutes and seconds
    seconds = 3600*secs[1] + 60*secs[2] + secs[3]   # turn everything into seconds
    if (arr[min] > seconds || arr[min] == 0)
        arr[min] = seconds
    if (arr[max] < seconds)
        arr[max] = seconds
    dif = arr[max] - arr[min]
    if (dif > 60)
        result[$4] = dif
}
END {
    for (x in result)
        print x" - "result[x]
    print ":Processed "cont" lines"
}'
You don't need to calculate the dif every time you read a record. Just do it once in the END section.
You don't need that cont variable, just use NR.
You don't need to build separate min and max keys by string concatenation; string concatenation is slow in awk.
You shouldn't change $4 as that will force the record to be recompiled.
Try this:
awk '{
    name = $4
    gsub(/[()]|:.*/,"",name)    # just removing ugly chars
    split($2,secs,/[:,]/)       # split hours, minutes and seconds
    seconds = 3600*secs[1] + 60*secs[2] + secs[3]   # turn everything into seconds
    if (!(name in min)) {       # first time we see this transaction id
        min[name] = max[name] = seconds
    }
    else {
        if (min[name] > seconds) {
            min[name] = seconds
        }
        if (max[name] < seconds) {
            max[name] = seconds
        }
    }
}
END {
    for (name in min) {
        diff = max[name] - min[name]
        if (diff > 60) {
            print name, "-", diff
        }
    }
    print ":Processed", NR, "lines"
}'
After making some tests, and with the suggestions given by Ed Morton (both for code improvement and for performance testing), I found that the bottleneck was the zgrep command. Here is an example that does several things:
Checks if we have a transaction line (first if).
Cleans the transaction id.
Checks if it has already been registered (second if) by looking it up in the array.
If it is not registered, checks if it is the appropriate type of transaction and, if so, registers the timestamp in seconds.
If it is already registered, saves the new timestamp as the maximum.
After all that, it performs the operations needed to calculate the time differences.
Thank you very much to all who helped me.
zcat /veryBigLog.gz | awk '
{
    if ($4 ~ /^\([[:alnum:]]/) {
        name=$4; gsub(/[()]|:.*/,"",name)
        if (!(name in min)) {
            if ($0 ~ /TypeOFTransaction/) {
                split($2,secs,/[:,]/)
                seconds = 3600*secs[1] + 60*secs[2] + secs[3]
                max[name] = min[name] = seconds
                print length(min) " new " name " start at " seconds
            }
        } else {
            split($2,secs,/[:,]/)
            seconds = 3600*secs[1] + 60*secs[2] + secs[3]
            if (max[name] < seconds) max[name] = seconds
            print name " new max " max[name]
        }
    }
}
END {
    for (x in min) {
        dif = max[x] - min[x]
        print max[x] " max - min " min[x] " : " dif
    }
    print "Processed " NR " Records"
    print "Found " length(min) " MOs"
}'

Awk Calc Avg Rows Below Certain Line

I'm having trouble using awk to calculate an average of specific numbers in a column BELOW a specific text identifier. I have two columns of data and I'm trying to start the average at a common identifier that repeats, 01/1991. So awk should calculate the average of all lines beginning at 01/1991, using that line and the next 21 lines, for a total of 22 rows covering the years 1991-2012. The desired output is an average for each TextID/Name entry over all the Januarys (01) for 1991-2012, shown below:
TextID/Name 1
Avg: 50.34
TextID/Name 2
Avg: 45.67
TextID/Name 3
Avg: 39.97
...
sample data:
TextID/Name 1
01/1991, 57.67
01/1992, 56.43
01/1993, 49.41
..
01/2012, 39.88
TextID/Name 2
01/1991, 45.66
01/1992, 34.77
01/1993, 56.21
..
01/2012, 42.11
TextID/Name 3
01/1991, 32.22
01/1992, 23.71
01/1993, 29.55
..
01/2012, 35.10
The data continues in the same pattern for TextID/Name 4.
I'm getting an answer using the code shown below, but the average starts being calculated BEFORE the specific identifier line rather than on and below that line (01/1991).
awk '$1="01/1991" {sum+=$2} (NR%22==0){avg=sum/22;print"Average: "avg;sum=0;next}' myfile
Thanks; explanations of the solution are greatly appreciated! I have edited the original post with more description - thank you again.
If you look at your file, the first field is "01/1991," with a comma at the end, not "01/1991". Note also that $1="01/1991" is an assignment rather than a comparison (you'd need ==), which is why the block runs on every line. Also, NR%22==0 will look at line numbers divisible by 22, not 22 lines after the point it thinks you care about.
You can do something like this instead:
awk '
    BEGIN { l=-1; }
    $1 == "01/1991," {
        l=22;
        s=0;
    }
    l > 0  { s+=$2; l--; }
    l == 0 { print s/22; l--; }'
It has a counter l that it sets to the number of lines to count, then it sums up that number of lines.
You may want to consider simply summing all lines from one 01/1991 to the next though, which might be more robust.
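For instance, here is a rough sketch of that more robust approach: it treats every January data line as belonging to the most recent TextID/Name heading and averages per heading, so the block length no longer has to be exactly 22. The names sum, cnt and id are illustrative, and the output order of for (id in sum) is unspecified:
awk '
    /^01\/[0-9][0-9][0-9][0-9],/ {   # a January data line: "01/YYYY, value"
        sum[id] += $2
        cnt[id]++
        next
    }
    { id = $0 }                      # any other line starts a new TextID/Name block
    END {
        for (id in sum)
            printf "%s\nAvg: %.2f\n", id, sum[id] / cnt[id]
    }' myfile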
If you're allowed to use Perl instead of Awk, you could do:
#!/usr/bin/env perl
$have_started = 0;
$count = 0;
$sum = 0;
while (<>) {
    $line = $_;
    # Start summing values once we reach the 01/1991 line
    if ($line =~ /01\/1991,\s+[\d.]+/) {
        $have_started = 1;
    }
    # If we have started counting, grab the value after the date and comma
    if ($have_started && $line =~ /\d+\/\d+,\s+([\d.]+)/) {
        $sum += $1;
        $count++;
    }
}
print "Average of all values = " . $sum / $count . "\n";
Run it like so:
$ cat your-text-file.txt | above-perl-script.pl

Retrieving time for microbenchmarking scripts with millisecond-level accuracy

I want to read a file using a shell script, and I want to calculate the time required to read the file. I created the method below to get the time in milliseconds at the start and end of the file reading so that I can calculate the difference, but it is not adding hour+minute+seconds and instead complains that a numeric argument is required.
Method
getCurrentTimeInMili()
{
hourTime=$(($(date +%H)*3600))
minuteTime=$(($(date +%m)*60))
secondTime=$(date +%S)
timeInMili= $(($hourTime + $minuteTime + $secondTime));
return timeInMili
}
Error
./testshell.sh: line 17: return: timeInMili: numeric argument required
Omit the space between timeInMili= and $(...):
timeInMili= $(($hourTime + $minuteTime + $secondTime));
           ^
Change this to
timeInMili=$(($hourTime + $minuteTime + $secondTime));
Note also that the reported error comes from return timeInMili, which passes the literal word rather than the value; return only accepts a small integer exit status anyway, so it is better to echo the result and capture it with $( ... ). Finally, date +%m gives the month - you want %M for minutes.
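Putting those fixes together, a sketch of a corrected function could look like this (the name getCurrentTimeInSeconds is illustrative, since this version only has one-second resolution; the 10# prefix keeps values like "08" from being read as octal):
getCurrentTimeInSeconds()
{
    hourTime=$(( 10#$(date +%H) * 3600 ))
    minuteTime=$(( 10#$(date +%M) * 60 ))
    secondTime=$(( 10#$(date +%S) ))
    # echo the result instead of return, and capture it with $( ... )
    echo $(( hourTime + minuteTime + secondTime ))
}
startTime=$(getCurrentTimeInSeconds)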
Invoking date multiple times means that their return values can be a bit out of sync with each other -- which could be bad if we're invoked just before a second boundary. Better is to call date only once and retrieve all the information desired, like so:
getCurrentTimeInMili() {
date +'%H 3600 * %M 60 * + %S + 1000 * %N 1000000 / + p' | dc
}
startTime=$(getCurrentTimeInMili)
sleep 5
endTime=$(getCurrentTimeInMili)
If you don't need this much accuracy, you can simply use the time builtin, as in:
time sleep 5
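If your date is GNU date, another option (a sketch; %N nanoseconds is a GNU extension, not POSIX, so it won't work with e.g. BSD/macOS date) is to get epoch milliseconds in a single call:
getCurrentTimeInMili()
{
    # seconds since epoch concatenated with nanoseconds, divided down to milliseconds
    echo $(( $(date +%s%N) / 1000000 ))
}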

Fast editing subtitles file

I like GNU/Linux and writing bash scripts to automate my tasks. But I am a beginner and have a lot of problems with it. So, I have a subtitle file in a format like this (I'm Polish, so these are Polish subtitles):
00:00:27:W zamierzchłych czasach|ziemia pokryta była lasami.
00:00:31:Od wieków mieszkały|w nich duchy bogów.
00:00:37:Człowiek żył w harmonii ze zwierzętami.
I think you understand this simple format. The problem is that in the movie file there is 1:15 of introduction before the movie starts. I want to add 1:15 to each line of the subtitle file, so the example should look like this:
00:01:43:W zamierzchłych czasach|ziemia pokryta była lasami.
00:01:46:Od wieków mieszkały|w nich duchy bogów.
00:01:52:Człowiek żył w harmonii ze zwierzętami.
Could you help me to write this script?
BTW I'm Polish and I'm still learning English, so if you cannot understand me, please let me know.
Here's a solution in awk - probably easier than bash for this kind of problem:
#!/usr/bin/awk -f
BEGIN {
    FS=":"
}
{
    hr  = $1
    min = $2
    sec = $3
    # add 1:15 = 75 seconds: 15 seconds here, plus 1 minute below
    sec = sec + 15
    if (sec >= 60) {
        sec = sec - 60
        min = min + 1
    }
    min = min + 1
    if (min >= 60) {
        min = min - 60
        hr = hr + 1
    }
    # note: assumes the subtitle text itself contains no ":"
    printf "%02d:%02d:%02d:%s\n", hr, min, sec, $4
}
Suggestions for improvement welcome!
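One possible refinement, just as a sketch (the offset variable and the filenames are illustrative): convert each timestamp to total seconds, add a configurable offset, and convert back, so the 1:15 shift isn't hard-coded:
awk -F: -v offset=75 '{
    # offset is in seconds; 1:15 = 75
    total = $1*3600 + $2*60 + $3 + offset
    # assumes the subtitle text itself contains no ":"
    printf "%02d:%02d:%02d:%s\n", total/3600, (total%3600)/60, total%60, $4
}' subtitles.txt > shifted.txt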
