Retrieving time for microbenchmarking scripts with millisecond-level accuracy - shell

I want to read a file in a shell script and calculate the time the read takes. I wrote the method below to get the time in milliseconds at the start and end of the file read so I can take the difference, but instead of adding hours + minutes + seconds it fails with a "numeric argument required" error.
Method
getCurrentTimeInMili()
{
    hourTime=$(($(date +%H)*3600))
    minuteTime=$(($(date +%m)*60))
    secondTime=$(date +%S)
    timeInMili= $(($hourTime + $minuteTime + $secondTime));
    return timeInMili
}
Error
./testshell.sh: line 17: return: timeInMili: numeric argument required

Omit the space between timeInMili= and the $(...):
timeInMili= $(($hourTime + $minuteTime + $secondTime));
           ^
so that it reads:
timeInMili=$(($hourTime + $minuteTime + $secondTime));
With the space, the shell runs the arithmetic result as a command name and leaves timeInMili empty. Two more problems lurk here: date +%m prints the month, not the minute (you want %M), and return timeInMili passes the literal string "timeInMili" to return, which only accepts a numeric exit status between 0 and 255 -- hence the "numeric argument required" error. Echo the value and capture it with command substitution instead.
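With those fixes applied, a working version of the original function might look like this (a sketch; despite its name it yields seconds since midnight, since the original only sums hours, minutes and seconds):
getCurrentTimeInMili()
{
    # 10# forces base 10 so values like "08" aren't misread as octal
    hourTime=$((10#$(date +%H) * 3600))
    minuteTime=$((10#$(date +%M) * 60))
    secondTime=$((10#$(date +%S)))
    # echo the result; a function's return status can't carry it
    echo $((hourTime + minuteTime + secondTime))
}
timeInMili=$(getCurrentTimeInMili)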

Invoking date multiple times means that the values can be a bit out of sync with each other -- which could be bad if we're invoked just before a second boundary. Better is to call date only once and retrieve all the information desired, like so (%N, nanoseconds, is a GNU date extension):
getCurrentTimeInMili() {
date +'%H 3600 * %M 60 * + %S + 1000 * %N 1000000 / + p' | dc
}
startTime=$(getCurrentTimeInMili)
sleep 5
endTime=$(getCurrentTimeInMili)
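The elapsed time is then just the difference between the two readings, for example:
elapsedMs=$((endTime - startTime))
echo "reading the file took ${elapsedMs} ms"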
If you don't need this much accuracy, you can simply use the time builtin, as in:
time sleep 5
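With GNU date you can also get epoch milliseconds in a single call, which avoids the hour/minute/second arithmetic entirely (a sketch):
getCurrentTimeInMili() {
    date +%s%3N    # %3N truncates nanoseconds to 3 digits, i.e. milliseconds
}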


Dynamically get the total up/down time from a log file

Let's say I have the following log file that continuously logs a server's down/up time:
status.log
UP - "18:00:00"
..
..
DOWN - "19:00:03"
..
..
DOWN - "22:00:47"
..
..
UP - "23:59:48"
UP - "23:59:49"
UP - "23:59:50"
DOWN - "23:59:51"
DOWN - "23:59:52"
UP - "23:59:53"
UP - "23:59:54"
UP - "23:59:56"
UP - "23:59:57"
UP - "23:59:59"
Each day is logged in a separate folder under the same filename.
This is not my actual code, but it's a much simpler and more transparent approach:
#!/bin/ruby
downtime_log = File.readlines("path/to/log/file").select { |line| line =~ /DOWN/ }
puts "#{downtime_log.count} Downtimes for today"
Logic-wise, how can I get the total downtime per file/day in minutes and seconds rather than as a total count?
I assume that your file contains exactly one line per second. Then the number of seconds your service was down is just the count you already computed in your approach:
number_of_seconds_downtime = File.readlines('path/to/log/file')
.select { |line| line =~ /DOWN/ }
.count
Or simplified:
number_of_seconds_downtime = File.readlines('path/to/log/file')
.count { |line| line =~ /DOWN/ }
To translate this into minutes and seconds, just use divmod:
minutes, seconds = number_of_seconds_downtime.divmod(60)
and output the result like this:
puts "#{minutes}:#{seconds} downtime"

Why doesn't this Ruby code work?

Here's a practice question: write a method that takes in a number of minutes and returns a string that formats the number as hours:minutes.
def time_conversion(minutes)
  hours = minutes / 60
  mins = minutes % 60
  time = hours + ":" + mins
  return time
end
The following are tests to see if this works; if they return true, it means my code works correctly.
puts('time_conversion(15) == "0:15": ' + (time_conversion(15) == '0:15').to_s)
puts('time_conversion(150) == "2:30": ' + (time_conversion(150) == '2:30').to_s)
puts('time_conversion(360) == "6:00": ' + (time_conversion(360) == '6:00').to_s)
Sometimes I get true for the first two tests, but the third test prints false even though the output looks exactly as required.
Other times I get the following error:
String can't be coerced into Fixnum
(repl):4:in `+'
(repl):4:in `time_conversion'
(repl):1:in `initialize'
Please assist.
The error refers to this line:
time = hours + ":" + mins
hours and mins are Fixnums, whereas ":" is a String.
As the error message indicates, "String can't be coerced into Fixnum": Fixnum#+ expects a numeric argument, not a String.
You could either do time = hours.to_s + ":" + mins.to_s or time = "#{hours}:#{mins}".
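That fixes the error, but the third test will still fail: for 360 minutes the interpolation yields "6:0", not "6:00", because to_s doesn't zero-pad. Formatting the minutes with %02d makes all three tests pass:
def time_conversion(minutes)
  hours = minutes / 60
  mins = minutes % 60
  format('%d:%02d', hours, mins)  # %02d zero-pads the minutes to two digits
end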

AWK performance while processing big files

I have an awk script that I use to calculate how long some transactions take to complete. The script gets the unique ID of each transaction and stores the minimum and maximum timestamp of each one. Then it calculates the difference, and at the end it shows those that are over 60 seconds.
It works very well with a few hundred thousand lines (200k), but it takes much longer in the real world: I tested it several times and it takes about 15 minutes to process about 28 million lines. Can I consider this good performance, or is it possible to improve it?
I'm open to any kind of suggestion.
Here is the complete code:
zgrep -E "\(([a-z0-9]){15,}:" /path/to/very/big/log | awk '{
gsub("[()]|:.*","",$4); #just removing ugly chars
++cont
min=$4"min" #name for maximun value of current transaction
max=$4"max" #same as previous, just for readability
split($2,secs,/[:,]/) #split hours,minutes and seconds
seconds = 3600*secs[1] + 60*secs[2] + secs[3] #turn everything into seconds
if(arr[min] > seconds || arr[min] == 0)
arr[min]=seconds
if(arr[max] < seconds)
arr[max]=seconds
dif=arr[max] - arr[min]
if(dif > 60)
result[$4] = dif
}
END{
for(x in result)
print x" - "result[x]
print ":Processed "cont" lines"
}'
You don't need to calculate the dif every time you read a record. Just do it once in the END section.
You don't need that cont variable, just use NR.
You don't need to populate the min and max keys via string concatenation; string concatenation is slow in awk.
You shouldn't change $4 as that will force the record to be recompiled.
Try this:
awk '{
    name = $4
    gsub(/[()]|:.*/,"",name)                      # just removing ugly chars
    split($2,secs,/[:,]/)                         # split hours, minutes and seconds
    seconds = 3600*secs[1] + 60*secs[2] + secs[3] # turn everything into seconds
    if (!(name in min)) {                         # first record for this transaction
        min[name] = max[name] = seconds
    }
    else {
        if (min[name] > seconds) {
            min[name] = seconds
        }
        if (max[name] < seconds) {
            max[name] = seconds
        }
    }
}
END {
    for (name in min) {
        diff = max[name] - min[name]
        if (diff > 60) {
            print name, "-", diff
        }
    }
    print ":Processed", NR, "lines"
}'
After making some tests, and with the suggestions given by Ed Morton (both for code improvement and for performance testing), I found that the bottleneck was the zgrep command. Here is an example that does several things:
Checks if we have a transaction line (first if)
Cleans the transaction id
Checks if it has already been registered (second if) by looking it up in the array
If it is not registered, checks if it is the appropriate type of transaction and, if so, registers the timestamp in seconds
If it is already registered, saves the new timestamp as the maximum
After all that, it performs the operations needed to calculate the time differences
Thank you very much to all who helped me.
zcat /veryBigLog.gz | awk '
{
    if ($4 ~ /^\([[:alnum:]]/) {                      # is this a transaction line?
        name = $4; gsub(/[()]|:.*/, "", name)         # clean the transaction id
        if (!(name in min)) {
            if ($0 ~ /TypeOFTransaction/) {
                split($2, secs, /[:,]/)
                seconds = 3600*secs[1] + 60*secs[2] + secs[3]
                max[name] = min[name] = seconds
                print length(min) " new " name " start at " seconds
            }
        } else {
            split($2, secs, /[:,]/)
            seconds = 3600*secs[1] + 60*secs[2] + secs[3]
            if (max[name] < seconds) max[name] = seconds
            print name " new max " max[name]
        }
    }
}
END {
    for (x in min) {
        dif = max[x] - min[x]
        print max[x] " max - min " min[x] " : " dif
    }
    print "Processed " NR " Records"
    print "Found " length(min) " MOs"
}'

PID control - value of process parameter based on PID result

I'm trying to implement a PID controller following http://en.wikipedia.org/wiki/PID_controller
The mechanism I try to control works as follows:
1. I have an input variable which I can control. Typical values would be 0.5...10.
2. I have an output value which I measure daily. My goal is to keep the output in roughly the same range.
The two variables are strongly correlated: when the process parameter goes up, the output generally goes up, but there's quite a bit of noise.
I'm following the implementation here:
http://code.activestate.com/recipes/577231-discrete-pid-controller/
Now the PID output seems to track the error term, not the measured output level. So my guess is that I'm not supposed to use it as-is for the process variable, but rather as a correction to the current value? How is that supposed to work, exactly?
For example, if we take Kp=1, Ki=Kd=0, the process (input) variable is 4, the current output level is 3, and my target is 2, I get the following:
error = 2-3 = -1
PID = -1
Then I should set the process variable to -1? or 4-1=3?
You need to think in terms of the PID controller correcting a manipulated variable (MV) for errors, and you need an I term to reach an on-target steady-state result. The I term is how the PID retains and applies memory of the prior behavior of the system.
If you are thinking in terms of the output of the controller being changes in the MV, it is more of a 'velocity form' PID, and the memory of prior errors and behavior is integrated and accumulated in the prior MV setting.
From your example, it seems like a manipulated value of -1 is not feasible and that you would like the controller to suggest a value like 3 to get a process output (PV) of 2. For a PID controller to make use of "The process (input) variable is 4,..." (MV in my terms) Ki must be non-zero, and if the system was at steady-state, whatever was accumulated in the integral (sum_e=sum(e)) would precisely equal 4/Ki, so:
Kp = Ki = 1; Kd = 0
error = SV - PV = 2 - 3 = -1
sum_e = sum_e + error = 4/Ki - 1
MV = PID = Kp*error + Ki*sum_e = -1*Kp + (4/Ki - 1)*Ki = -1 + 4 - Ki = -1 + 4 - 1 = 2
If you used a slower Ki than 1, it would smooth out the noise more and not adjust the MV so quickly:
Ki = 0.1
MV = PID = -1*Kp + (4/Ki - 1)*Ki = -1 + 4 - Ki = -1 + 4 - 0.1 = 2.9
At steady state at target (PV = SV), sum_e * Ki should produce the steady-state MV:
PV = SV
error = SV - PV = 0
Kp * error = 0
MV = 3 = PID = 0 * Kp + Ki * sum_e
A nice way to understand the PID controller is to put units on everything and think of Kp, Ki, Kd as conversions of the process error, accumulated error*timeUnit, and rate-of-change of error/timeUnit into terms of the manipulated variable, and that the controlled system converts the controller's manipulated variable into units of output.
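To make the arithmetic concrete, here is a minimal positional-form PID sketch in Python (the names Kp, Ki, Kd, sum_e, SV, PV follow the answer; the gains and setpoint are just the example's values):
class PID:
    """Positional-form PID: MV = Kp*error + Ki*sum_e + Kd*d_error."""
    def __init__(self, Kp=1.0, Ki=1.0, Kd=0.0, setpoint=2.0):
        self.Kp, self.Ki, self.Kd = Kp, Ki, Kd
        self.setpoint = setpoint       # SV, the target
        self.sum_e = 0.0               # integral accumulator, the controller's memory
        self.prev_e = None

    def update(self, pv):
        e = self.setpoint - pv                                # error = SV - PV
        self.sum_e += e                                       # accumulate (one sample per step)
        d_e = 0.0 if self.prev_e is None else e - self.prev_e
        self.prev_e = e
        return self.Kp * e + self.Ki * self.sum_e + self.Kd * d_e   # MV

# Reproducing the worked example: PV = 3, SV = 2, Kp = Ki = 1, Kd = 0,
# with the integral pre-loaded so Ki*sum_e equals the prior MV of 4.
pid = PID()
pid.sum_e = 4.0 / pid.Ki
print(pid.update(3.0))   # -> 2.0, matching the arithmetic above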

How to analyze time tracking reports with awk?

I'm tracking my time with two great tools, todotxt and punch. With these one
can generate reports that look like this:
2012-11-23 (2 hours 56 minutes):
	first task (52 minutes)
	second task (2 hours 4 minutes)
2012-11-24 (2 hours 8 minutes):
	second task (2 hours 8 minutes)
My question: what's a convenient way for analyzing this kind of output? E.g.
how could I sum up the time that is spent doing "first task"/"second task" or find out
my total working hours for a longer period such as "2012-11-*"?
So, I'd like to have a command such as punch.sh report /regex-for-date-or-task-i'm-interested-in/.
I've read that this should be possible with awk, but I don't know how to 1) sum minutes and hours and 2) pass "task names with spaces" as variables to awk.
UPDATE:
I'm also tagging my tasks with +tags to mark different projects (as in first task +projecttag). So it would also be great to sum the time spent on all tasks with a certain tag.
Thanks for any help!
Before running this script, please uncomment the appropriate gsub() call. Run it like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
    FS="[( \t)]"
}

/^\t/ {
    line = $0

    #### If your input has...
    ## No tags, show hrs in each task
    # gsub(/^[\t ]*| *\(.*/,"",line)
    ## Tags, show hrs in each task
    # gsub(/^[\t ]*| *\+.*/,"",line)
    ## Tags, show hrs in each tag
    # gsub(/^[^+]*\+| *\(.*/,"",line)
    ####

    for (i=1; i<=NF; i++) {
        if ($i == "hours")   h[line] += $(i-1)
        if ($i == "minutes") m[line] += $(i-1)
    }
}

END {
    for (i in m) {
        while (m[i] >= 60) { m[i] -= 60; h[i]++ }
        print i ":", (h[i] ? h[i] : "0") " hrs", m[i], "mins"
    }
}
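For example, with the sample report above saved as report.txt (an assumed filename) and the first gsub() uncommented, the output would look something like this (the order of a for (i in m) loop is unspecified):
$ awk -f script.awk report.txt
first task: 0 hrs 52 mins
second task: 4 hrs 12 mins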
