Dynamically get the total up/down time from a log file - ruby

Let's say I have the following log file that continuously logs a server's down/up time:
status.log
UP - "18:00:00"
..
..
DOWN - "19:00:03"
..
..
DOWN - "22:00:47"
..
..
UP - "23:59:48"
UP - "23:59:49"
UP - "23:59:50"
DOWN - "23:59:51"
DOWN - "23:59:52"
UP - "23:59:53"
UP - "23:59:54"
UP - "23:59:56"
UP - "23:59:57"
UP - "23:59:59"
Each day is logged in a separate folder under the same filename.
This is not my actual code, but a much simpler and more transparent approach:
#!/usr/bin/env ruby
downtime_log = File.readlines("path/to/log/file").select { |line| line =~ /DOWN/ }
puts "#{downtime_log.count} Downtimes for today"
Logic-wise, how can I get the total downtime per file/day in minutes and seconds, rather than as a total count?

I assume that your file contains exactly one line per second. Then the number of seconds your service was down can be counted just as you already did in your approach:
number_of_seconds_downtime = File.readlines('path/to/log/file')
.select { |line| line =~ /DOWN/ }
.count
Or simplified:
number_of_seconds_downtime = File.readlines('path/to/log/file')
.count { |line| line =~ /DOWN/ }
To translate this into minutes and seconds, just use divmod:
minutes, seconds = number_of_seconds_downtime.divmod(60)
and output the result like this:
puts "#{minutes}:#{seconds} downtime"

Related

Time Delta problem in Hackerrank not taking good answer / Python 3

The hackerrank challenge is in the following url: https://www.hackerrank.com/challenges/python-time-delta/problem
I got test case 0 correct, but the website is saying that I have wrong answers for test cases 1 and 2. However, in PyCharm I copied the website's expected output, compared it with my output, and they were exactly the same.
Please have a look at my code.
#!/usr/bin/env python3
# Complete the time_delta function below.
from datetime import datetime

def time_delta(tmp1, tmp2):
    dicto = {'Jan': 1, 'Feb': 2, 'Mar': 3,
             'Apr': 4, 'May': 5, 'Jun': 6,
             'Jul': 7, 'Aug': 8, 'Sep': 9,
             'Oct': 10, 'Nov': 11, 'Dec': 12}
    # extracting t1 from first timestamp without -xxxx
    t1 = datetime(int(tmp1[2]), dicto[tmp1[1]], int(tmp1[0]), int(tmp1[3][:2]), int(tmp1[3][3:5]), int(tmp1[3][6:]))
    # extracting t2 from second timestamp without -xxxx
    t2 = datetime(int(tmp2[2]), dicto[tmp2[1]], int(tmp2[0]), int(tmp2[3][:2]), int(tmp2[3][3:5]), int(tmp2[3][6:]))
    # converting -xxxx of timestamp 1
    t1_utc = int(tmp1[4][:3])*3600 + int(tmp1[4][3:])*60
    # converting -xxxx of timestamp 2
    t2_utc = int(tmp2[4][:3])*3600 + int(tmp2[4][3:])*60
    # absolute difference
    return abs(int((t1 - t2).total_seconds() - (t1_utc - t2_utc)))

if __name__ == '__main__':
    # fptr = open(os.environ['OUTPUT_PATH'], 'w')
    t = int(input())
    for t_itr in range(t):
        tmp1 = list(input().split(' '))[1:]
        tmp2 = list(input().split(' '))[1:]
        delta = time_delta(tmp1, tmp2)
        print(delta)
t1_utc = int(tmp1[4][:3])*3600 + int(tmp1[4][3:])*60
For a time zone like +0715, you correctly add “7 hours of seconds” and “15 minutes of seconds”.
For a time zone like -0715, you are adding “-7 hours of seconds” and “+15 minutes of seconds”, resulting in -6h45m instead of -7h15m.
You need to either use the same “sign” for both parts, or apply the sign afterwards.
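For example, with tmp1[4] == "-0715" the code above computes
int("-07")*3600 + int("15")*60 = -25200 + 900 = -24300 seconds (-6h45m)
whereas the intended offset is -(7*3600 + 15*60) = -26100 seconds (-7h15m). (As an aside, Python 3's datetime.strptime can parse the whole timestamp, offset included, via the %z directive, which sidesteps the manual parsing entirely.)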

AWK performance while processing big files

I have an awk script that I use to calculate how long some transactions take to complete. The script gets the unique ID of each transaction and stores the minimum and maximum timestamp of each one. Then it calculates the difference, and at the end it shows those results that are over 60 seconds.
It works very well with a few hundred thousand lines (200k), but it takes much longer in the real world. I tested it several times, and it takes about 15 minutes to process about 28 million lines. Can I consider this good performance, or is it possible to improve it?
I'm open to any kind of suggestion.
Here is the complete code:
zgrep -E "\(([a-z0-9]){15,}:" /path/to/very/big/log | awk '{
    gsub("[()]|:.*","",$4)                          # just removing ugly chars
    ++cont
    min=$4"min"                                     # name for minimum value of current transaction
    max=$4"max"                                     # same as previous, just for readability
    split($2,secs,/[:,]/)                           # split hours, minutes and seconds
    seconds = 3600*secs[1] + 60*secs[2] + secs[3]   # turn everything into seconds
    if (arr[min] > seconds || arr[min] == 0)
        arr[min]=seconds
    if (arr[max] < seconds)
        arr[max]=seconds
    dif=arr[max] - arr[min]
    if (dif > 60)
        result[$4] = dif
}
END{
    for (x in result)
        print x" - "result[x]
    print ":Processed "cont" lines"
}'
You don't need to calculate the dif every time you read a record. Just do it once in the END section.
You don't need that cont variable, just use NR.
You don't need to populate min and max as separate concatenated key strings; string concatenation is slow in awk.
You shouldn't change $4 as that will force the record to be recompiled.
Try this:
awk '{
    name = $4
    gsub(/[()]|:.*/,"",name)                        # just removing ugly chars
    split($2,secs,/[:,]/)                           # split hours, minutes and seconds
    seconds = 3600*secs[1] + 60*secs[2] + secs[3]   # turn everything into seconds
    if (!(name in min)) {                           # first time we see this transaction
        min[name] = max[name] = seconds
    }
    else {
        if (min[name] > seconds) {
            min[name] = seconds
        }
        if (max[name] < seconds) {
            max[name] = seconds
        }
    }
}
END {
    for (name in min) {
        diff = max[name] - min[name]
        if (diff > 60) {
            print name, "-", diff
        }
    }
    print ":Processed", NR, "lines"
}'
After running some tests, and with the suggestions given by Ed Morton (both for code improvement and performance testing), I found that the bottleneck was the zgrep command. Here is an example that does several things:
Checks if we have a transaction line (first if)
Cleans the transaction id
Checks if it has already been registered (second if) by looking it up in the array
If it is not registered, checks if it is the appropriate type of transaction, and if so registers the timestamp in seconds
If it is already registered, saves the new timestamp as the maximum
After all that, it performs the necessary operations to calculate the time difference
Thank you very much to all who helped me.
zcat /veryBigLog.gz | awk '
{
    if ($4 ~ /^\([[:alnum:]]/) {
        name=$4; gsub(/[()]|:.*/,"",name)
        if (!(name in min)) {
            if ($0 ~ /TypeOFTransaction/) {
                split($2,secs,/[:,]/)
                seconds = 3600*secs[1] + 60*secs[2] + secs[3]
                max[name] = min[name] = seconds
                print length(min) " new " name " start at " seconds
            }
        } else {
            split($2,secs,/[:,]/)
            seconds = 3600*secs[1] + 60*secs[2] + secs[3]
            if (max[name] < seconds) max[name]=seconds
            print name " new max " max[name]
        }
    }
}
END{
    for (x in min) {
        dif=max[x] - min[x]
        print max[x] " max - min " min[x] " : " dif
    }
    print "Processed " NR " Records"
    print "Found " length(min) " MOs"
}'

Measure estimated completion time of ruby script

I've been running a lot of scripts lately that iterate over 10k - 300k objects, and I'm thinking of writing some code that estimates the completion time of the script (they take 20-180 minutes). I've got to imagine though that there's something out there that does this already. Is there?
To Clarify (edit):
Were I to write code to do this, it would work by measuring how long it takes to perform "the operation" on a single object, multiplying that amount of time by the number of objects left, and adding it to the current time.
Granted, this would only work in situations where you have a script involving a single loop that takes up 99% of the script's total run time, and in which you could reasonably expect to calculate a semi-accurate average for each iteration of that loop. This is true of the scripts for which I'd like to estimate completion time.
Have a look at the ruby-progressbar gem: https://github.com/jfelchner/ruby-progressbar
It generates a nice progress bar and estimates completion time (ETA):
example task: 67% |oooooooooooooooooooooo | ETA: 00:01:15
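A minimal usage sketch (the workload and the format string here are illustrative; the gem computes the ETA from the observed rate of increments):
require 'ruby-progressbar'

items = (1..1000).to_a  # placeholder for the real collection of objects

# %t = title, %p = percentage, %b = bar, %e = estimated time remaining
bar = ProgressBar.create(title: 'example task', total: items.size,
                         format: '%t: %p%% |%b| %e')

items.each do |item|
  Math.sqrt(item)  # stand-in for the real per-object operation
  bar.increment
end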
You can measure the time of each method within your script at whatever granularity you like and then sum the components as described here.
You let your process run, and after a set number of iterations, you measure the elapsed time. You then use that value as an estimate of the time left. This ensures that the estimate is always dynamically adjusted according to the current task.
This example is extra verbose, like a code double whopper with triple cheese:
# Some variables for this test
iterations = 1000
probe_at   = (iterations * 0.1).to_i
time_total = 0
#======================================
iterations.times do |i|
  time_start = Time.now
  # you could yield here if this were a function
  5000.times do                     # <tedious task simulation>
    Math.sqrt(rand(200000))
  end                               # <end of tedious task simulation>
  time_total += time_taken = Time.now - time_start
  if i == probe_at
    iteration_cost = (time_total / probe_at)
    time_left = iteration_cost * (iterations - probe_at)
    puts "Time taken (ACTUAL): #{time_total} | iteration: #{i}"
    puts "Time left (ESTIMATE): #{time_left} | iteration: #{i}"
    puts "Estimated total: #{time_total + time_left} | iteration: #{i}"
  end
  if i == iterations - 1
    puts "Time taken (ACTUAL): #{time_total} | iteration: #{i}"
  end
end
You could easily rewrite this into a class or a method.
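For instance, here is a sketch of the same idea extracted into a method that yields each iteration to a block (the names are illustrative):
def report_eta(iterations, probe_ratio: 0.1)
  probe_at   = [(iterations * probe_ratio).to_i, 1].max  # avoid dividing by zero
  time_total = 0
  iterations.times do |i|
    time_start = Time.now
    yield i                        # the tedious per-iteration work
    time_total += Time.now - time_start
    if i == probe_at
      time_left = (time_total / probe_at) * (iterations - probe_at)
      puts "Estimated total: #{time_total + time_left} | iteration: #{i}"
    end
  end
  time_total
end

report_eta(1000) { 5000.times { Math.sqrt(rand(200000)) } }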

Match Multiple Patterns in a String and Return Matches as Hash

I'm working with some log files, trying to extract pieces of data.
Here's an example of a file which, for the purposes of testing, I'm loading into a variable named sample. NOTE: The column layout of the log files is not guaranteed to be consistent from one file to the next.
sample = "test script result
Load for five secs: 70%/50%; one minute: 53%; five minutes: 49%
Time source is NTP, 23:25:12.829 UTC Wed Jun 11 2014
D
MAC Address IP Address MAC RxPwr Timing I
State (dBmv) Offset P
0000.955c.5a50 192.168.0.1 online(pt) 0.00 5522 N
338c.4f90.2794 10.10.0.1 online(pt) 0.00 3661 N
990a.cb24.71dc 127.0.0.1 online(pt) -0.50 4645 N
778c.4fc8.7307 192.168.1.1 online(pt) 0.00 3960 N
"
Right now, I'm just looking for IPv4 and MAC addresses; eventually the search will need to include more patterns. To accomplish this, I'm using two regular expressions and passing them to Regexp.union:
patterns = Regexp.union(/(?<mac_address>\h{4}\.\h{4}\.\h{4})/, /(?<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/)
As you can see, I'm using named groups to identify the matches.
The result I'm trying to achieve is a Hash. The key should equal the capture group name, and the value should equal what was matched by the regular expression.
Example:
{"mac_address"=>"0000.955c.5a50", "ip_address"=>"192.168.0.1"}
{"mac_address"=>"338c.4f90.2794", "ip_address"=>"10.10.0.1"}
{"mac_address"=>"990a.cb24.71dc", "ip_address"=>"127.0.0.1"}
{"mac_address"=>"778c.4fc8.7307", "ip_address"=>"192.168.1.1"}
Here's what I've come up with so far:
sample.split(/\r?\n/).each do |line|
  hashes = []
  line.split(/\s+/).each do |val|
    match = val.match(patterns)
    if match
      hashes << Hash[match.names.zip(match.captures)].delete_if { |k, v| v.nil? }
    end
  end
  results = hashes.reduce({}) { |r, h| h.each { |k, v| r[k] = v }; r }
  puts results if results.length > 0
end
I feel like there should be a more "elegant" way to do this. My chief concern, though, is performance.
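For what it's worth, a somewhat more compact variant (a sketch reusing the patterns union from the question) scans each whole line once instead of splitting it on whitespace first, and builds one hash per line in the same shape as the example output above:
sample.each_line do |line|
  result = {}
  line.scan(patterns) do
    m = Regexp.last_match          # MatchData for the current scan hit
    m.names.each { |name| result[name] = m[name] if m[name] }
  end
  puts result if result.any?
end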

How to analyze time tracking reports with awk?

I'm tracking my time with two great tools, todotxt and punch. With these one
can generate reports that look like this:
2012-11-23 (2 hours 56 minutes):
	first task (52 minutes)
	second task (2 hours 4 minutes)
2012-11-24 (2 hours 8 minutes):
	second task (2 hours 8 minutes)
My question: what's a convenient way to analyze this kind of output? E.g.
how could I sum up the time spent on "first task"/"second task", or find out
my total working hours for a longer period such as "2012-11-*"?
So, I'd like to have a command such as punch.sh report /regex-for-date-or-task-i'm-interested-in/.
I've read that this should be possible with awk, but I don't know how to 1) sum minutes and hours, and 2) pass "task names with spaces" as variables to awk.
UPDATE:
I'm also tagging my tasks with +tags to mark different projects (as in first task +projecttag). So it would also be great to sum the time spent on all tasks with a certain tag.
Thanks for any help!
Before running this script, uncomment the appropriate gsub() line. Run it like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
    FS="[( \t)]"
}

/^\t/ {
    line = $0

    #### If your input has...
    ## No tags, show hrs in each task
    # gsub(/^[\t ]*| *\(.*/,"",line)
    ## Tags, show hrs in each task
    # gsub(/^[\t ]*| *\+.*/,"",line)
    ## Tags, show hrs in each tag
    # gsub(/^[^+]*\+| *\(.*/,"",line)
    ####

    for (i=1; i<=NF; i++) {
        if ($i == "hours")   h[line]+=$(i-1)
        if ($i == "minutes") m[line]+=$(i-1)
    }
}

END {
    for (i in m) {
        while (m[i] >= 60) { m[i]-=60; h[i]++ }
        print i ":", (h[i] ? h[i] : "0") " hrs", m[i], "mins"
    }
}
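For example, with the sample report above and the first gsub() uncommented (no tags, hours per task), the output should look something like this (awk does not guarantee the order of a for (i in m) loop):
first task: 0 hrs 52 mins
second task: 4 hrs 12 mins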
