How can 'behave' be used to test time constraints?

When I test whether my functions run within X seconds using behave (i.e. Gherkin feature files plus behave step implementations), the code takes longer to run under behave than it does on its own, especially at the beginning of the test but also in other parts.
Has anybody tested time constraints with behave before and found a workaround that adjusts for the extra time behave adds?
Is this even possible?
EDIT: To show how I'm timing the tests:
#when("the Python script provides images to the model")
def step_impl(context):
context.x_second_requirement = 5
# TODO: Investigate why this takes so long, when I'm not using behave I can use a 0.8 second timing constraint
context.start_time = time.time()
context.car_brain.tick()
context.end_time = time.time()
#then("the model must not take more than X seconds to produce output")
def step_impl(context):
assert context.end_time - context.start_time < context.x_second_requirement

Related

Find the Run Time of Select Ruby Code

Problem
Howdy guys. I want to find the run time of a block of code in Ruby, but I'm not entirely sure how to do it. I want to run some code and then output how long it took, because I have a large program whose run time varies a lot. I want it to have a consistent run time (which I could achieve by sleeping for a fraction of a second), but that isn't the problem here: first I need to measure the actual run time so the program knows whether it needs to slow down or speed up.
My Thoughts
So, I have an idea of how it could work. I have never used Time in Ruby, but I could set a variable to the current time (in milliseconds), set another variable the same way at the end of the code block, and subtract the two. However, (1) I have never used Time and (2) I don't know whether that is the best approach.
Thanks in advance!
Ruby has the Benchmark module for timing how long things take. I've only used it in development to see whether a method is taking too long to run, and I'm not sure it's 'recommended' for production code or for keeping things above a minimum runtime (which sounds like what you might be doing), but take a look and see how it feels for your use case.
It also sounds like you might be interested in the Timeout module (for making sure things don't take longer than a set amount of time).
If you really need to make sure something takes a minimum amount of time, timing the code (with a Benchmark method, plain Time, or another solution) and then sleeping the difference is the only thing that comes to mind.
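For illustration, here is a rough sketch of that approach (the constant values and the enforce_minimum_runtime name are made up): time the block with Benchmark.realtime, guard the upper bound with Timeout, and sleep away whatever is left of the minimum runtime.
require 'benchmark'
require 'timeout'

MIN_RUNTIME = 0.5   # the block should take at least this many seconds
MAX_RUNTIME = 5.0   # Timeout raises if the block runs longer than this

def enforce_minimum_runtime
  elapsed = Benchmark.realtime do
    Timeout.timeout(MAX_RUNTIME) { yield }
  end
  # Sleep whatever remains of the minimum runtime, if anything
  sleep(MIN_RUNTIME - elapsed) if elapsed < MIN_RUNTIME
  elapsed
end

# usage (do_the_work is a placeholder):
# enforce_minimum_runtime { do_the_work }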
It is simple. Look at your watch (Time.now) and remember the time, run the code, look at your watch again, subtract.
t0 = Time.now
# your block of code
puts Time.now - t0
http://ruby-doc.org/core-1.9.3/Time.html
You want to use the Time object. (Time Docs)
For example,
start = Time.now
# code to time
finish = Time.now
diff = finish - start
diff would be in seconds, as a floating point number.
EDIT: end is a reserved word in Ruby, which is why the variable is called finish.
or you can use
require 'benchmark'

def foo
  time = Benchmark.measure {
    # code to test
  }
  puts time.real  # or save it to logs
end
Sample output:
2.2.3 :001 > foo
5.230000 0.020000 5.250000 ( 5.274806)
The values are user CPU time, system CPU time, their sum, and real elapsed time.
http://ruby-doc.org/stdlib-2.0.0/libdoc/benchmark/rdoc/Benchmark.html#method-c-bm
Source: Ruby docs.

Profiling a multi-tiered, distributed, web application (server side)

I would like to profile a complex web application from the server PoV.
According to the Wikipedia article on profiling and the Stack Overflow profiling tag description, profiling (in one of its forms) means getting a list (or a graphical representation) of the APIs/components of the application, each with its number of calls and the time spent in it at run time.
Note that unlike a traditional one-program, one-language application, a web server application may be:
Distributed over multiple machines
Different components may be written in different languages
Different components may be running on top of different OSes, etc.
So the traditional "Just use a profiler" answer is not easily applicable to this problem.
I'm not looking for:
Coarse performance stats like the ones provided by various log-analysis tools (e.g. analog), nor
client-side, per-page performance stats like the ones presented by tools such as Google's PageSpeed or Yahoo!'s YSlow (waterfall diagrams, browser component load times).
Instead, I'm looking for a classic profiler-style report:
number of calls
call durations
by function/API/component-name, on the server-side of the web application.
Bottom line, the question is:
How can one profile a multi-tiered, multi-platform, distributed web application?
A free-software based solution is much preferred.
I have been searching the web for a solution for a while and couldn't find anything satisfactory to fit my needs except some pretty expensive commercial offerings. In the end, I bit the bullet, thought about the problem, and wrote my own solution which I wanted to freely share.
I'm posting my own solution here since this practice is encouraged on SO.
This solution is far from perfect. For example, it works at a very high level (individual URLs), which may not suit all use cases. Nevertheless, it has helped me immensely in understanding where my web app spends its time.
In the spirit of open source and knowledge sharing, I welcome other approaches, especially superior ones, from others.
Thinking of how traditional profilers work, it should be straight-forward to come up with a general free-software solution to this challenge.
Let's break the problem into two parts:
Collecting the data
Presenting the data
Collecting the data
Assume we can break our web application into its individual
constituent parts (API, functions) and measure the time it takes
each of these parts to complete. Each part is called thousands of
times a day, so we could collect this data over a full day or so on
multiple hosts. When the day is over we would have a pretty big and
relevant data-set.
Epiphany #1: substitute 'function' with 'URL', and our existing
web-logs are "it". The data we need is already there:
Each part of a web API is defined by the request URL (possibly
with some parameters)
The round-trip times (often in microseconds) appear on each line
We have a day, (week, month) worth of lines with this data handy
So if we have access to standard web-logs for all the distributed
parts of our web application, part one of our problem (collecting
the data) is solved.
Presenting the data
Now we have a big data-set, but still no real insight.
How can we gain insight?
Epiphany #2: visualize our (multiple) web-server logs directly.
A picture is worth a 1000 words. Which picture can we use?
We need to condense hundreds of thousands or millions of lines of multiple
web-server logs into a short summary which would tell most of the
story about our performance. In other words: the goal is to generate
a profiler-like report, or even better: a graphical profiler report,
directly from our web logs.
Imagine we could map:
Call-latencies to one dimension
Number of calls to another dimension, and
The function identities to a color (essentially a 3rd dimension)
One such picture: a stacked-density chart of latencies by API
appears below (function names were made up for illustrative purposes).
The Chart:
Some observations from this example
We have a tri-modal distribution representing 3 radically
different 'worlds' in our application:
The fastest responses are centered around ~300 microseconds of latency. These responses come from our Varnish cache
The second fastest, taking a bit less than 0.01 seconds on
average, are coming from various APIs served by our middle-layer
web application (Apache/Tomcat)
The slowest responses, centered around 0.1 seconds and
sometimes taking several seconds to respond to, involve round-trips
to our SQL database.
We can see how dramatic caching effects can be on an application
(note that the x-axis is on a log10 scale)
We can specifically see which APIs tend to be fast vs slow, so
we know what to focus on.
We can see which APIs are most often called each day.
We can also see that some of them are so rarely called, it is hard to even see their color on the chart.
How to do it?
The first step is to pre-process the logs and extract the subset of data we need.
A trivial utility like Unix 'cut' on multiple logs
may be sufficient here. You may also need to collapse multiple
similar URLs into shorter strings describing the function/API like
'registration', or 'purchase'. If you have a multi-host unified log
view generated by a load-balancer, this task may be easier. We
extract only the names of the APIs (URLs) and their latencies, so we
end up with one big file with a pair of columns, separated by TABs
API_Name    Latency_in_microSecs
func_01 32734
func_01 32851
func_06 598452
...
func_11 232734
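If cut alone isn't flexible enough, a small script can do the same extraction. Here is an illustrative Ruby sketch (the field positions are assumptions; adjust them to your log format):
#!/usr/bin/env ruby
# Read web-log lines on STDIN (or from files given as arguments) and
# print "API_Name<TAB>Latency_in_microSecs" pairs.
ARGF.each_line do |line|
  fields  = line.split
  api     = fields[6]    # assumed position of the request URL / API name
  latency = fields[9]    # assumed position of the round-trip time in microseconds
  next unless api && latency
  puts "#{api}\t#{latency}"
end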
Now we run the R script below on the resulting data pairs to produce
the wanted chart (using Hadley Wickham's wonderful ggplot2 library).
Voilà!
The code to generate the chart
Finally, here's the code to produce the chart from the API+Latency TSV data file:
#!/usr/bin/Rscript --vanilla
#
# Generate stacked chart of API latencies by API from a TSV data-set
#
# ariel faigon - Dec 2012
#
.libPaths(c('~/local/lib/R',
            '/usr/lib/R/library',
            '/usr/lib/R/site-library'
           ))
suppressPackageStartupMessages(library(ggplot2))
# grid lib needed for 'unit()':
suppressPackageStartupMessages(library(grid))
#
# Constants: width, height, resolution, font-colors and styles
# Adapt to taste
#
wh.ratio = 2
WIDTH = 8
HEIGHT = WIDTH / wh.ratio
DPI = 200
FONTSIZE = 11
MyGray = gray(0.5)
title.theme   = element_text(family="FreeSans", face="bold.italic",
                             size=FONTSIZE)
x.label.theme = element_text(family="FreeSans", face="bold.italic",
                             size=FONTSIZE-1, vjust=-0.1)
y.label.theme = element_text(family="FreeSans", face="bold.italic",
                             size=FONTSIZE-1, angle=90, vjust=0.2)
x.axis.theme  = element_text(family="FreeSans", face="bold",
                             size=FONTSIZE-1, colour=MyGray)
y.axis.theme  = element_text(family="FreeSans", face="bold",
                             size=FONTSIZE-1, colour=MyGray)
#
# Function generating well-spaced & well-labeled y-axis (count) breaks
#
yscale_breaks <- function(from.to) {
    from <- 0
    to <- from.to[2]
    # round to 10 ceiling
    to <- ceiling(to / 10.0) * 10
    # Count major breaks on 10^N boundaries, include the 0
    n.maj = 1 + ceiling(log(to) / log(10))
    # if major breaks are too few, add minor-breaks half-way between them
    n.breaks <- ifelse(n.maj < 5, max(5, n.maj*2+1), n.maj)
    breaks <- as.integer(seq(from, to, length.out=n.breaks))
    breaks
}
#
# -- main
#
# -- process the command line args: [tsv_file [png_file]]
# (use defaults if they aren't provided)
#
argv <- commandArgs(trailingOnly = TRUE)
if (is.null(argv) || (length(argv) < 1)) {
    argv <- c(Sys.glob('*api-lat.tsv')[1])
}
tsvfile <- argv[1]
stopifnot(! is.na(tsvfile))
pngfile <- ifelse(is.na(argv[2]), paste(tsvfile, '.png', sep=''), argv[2])
# -- Read the data from the TSV file into an internal data.frame d
d <- read.csv(tsvfile, sep='\t', head=F)
# -- Give each data column a human readable name
names(d) <- c('API', 'Latency')
#
# -- Convert microseconds Latency (our weblog resolution) to seconds
#
d <- transform(d, Latency=Latency/1e6)
#
# -- Trim the latency axis:
# Drop the few 0.001% extreme-slowest outliers on the right
# to prevent them from pushing the bulk of the data to the left
Max.Lat <- quantile(d$Latency, probs=0.99999)
d <- subset(d, Latency < Max.Lat)
#
# -- API factor pruning
# Drop rows where the APIs is less than small % of total calls
#
Rare.APIs.pct <- 0.001
if (Rare.APIs.pct > 0.0) {
    d.N <- nrow(d)
    API.counts <- table(d$API)
    d <- transform(d, CallPct=100.0*API.counts[d$API]/d.N)
    d <- d[d$CallPct > Rare.APIs.pct, ]
    d.N.new <- nrow(d)
}
#
# -- Adjust legend item-height &font-size
# to the number of distinct APIs we have
#
API.count <- nlevels(as.factor(d$API))
Legend.LineSize <- ifelse(API.count < 20, 1.0, 20.0/API.count)
Legend.FontSize <- max(6, as.integer(Legend.LineSize * (FONTSIZE - 1)))
legend.theme = element_text(family="FreeSans", face="bold.italic",
                            size=Legend.FontSize,
                            colour=gray(0.3))
# -- set latency (X-axis) breaks and labels (s.b made more generic)
lat.breaks <- c(0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10)
lat.labels <- sprintf("%g", lat.breaks)
#
# -- Generate the chart using ggplot
#
p <- ggplot(data=d, aes(x=Latency, y=..count../1000.0, group=API, fill=API)) +
    geom_bar(binwidth=0.01) +
    scale_x_log10(breaks=lat.breaks, labels=lat.labels) +
    scale_y_continuous(breaks=yscale_breaks) +
    ggtitle('APIs Calls & Latency Distribution') +
    xlab('Latency in seconds - log(10) scale') +
    ylab('Call count (in 1000s)') +
    theme(
        plot.title=title.theme,
        axis.title.y=y.label.theme,
        axis.title.x=x.label.theme,
        axis.text.x=x.axis.theme,
        axis.text.y=y.axis.theme,
        legend.text=legend.theme,
        legend.key.height=unit(Legend.LineSize, "line")
    )
#
# -- Save the plot into the png file
#
ggsave(p, file=pngfile, width=WIDTH, height=HEIGHT, dpi=DPI)
Your discussion of "back in the day" profiling practice is true.
There's just one little problem it always had:
In non-toy software, it may find something, but it doesn't find much, for a bunch of reasons.
The thing about opportunities for higher performance is that if you don't find them, the software doesn't break, so you can just pretend they don't exist.
That is, until a different method is tried, and they are found.
In statistics, this is called a type 2 error - a false negative.
An opportunity is there, but you didn't find it.
What it means is if somebody does know how to find it, they're going to win, big time.
Here's probably more than you ever wanted to know about that.
So if you're looking at the same kind of stuff in a web app (invocation counts, time measurements), you're not likely to do better than the same kind of non-results.
I'm not into web apps, but I did a fair amount of performance tuning in a protocol-based factory automation app many years ago.
I used a logging technique.
I won't say it was easy, but it did work.
The people I see doing something similar are here, where they use what they call a waterfall chart.
The basic idea is rather than casting a wide net and getting a lot of measurements, you trace through a single logical thread of transactions, analyzing where delays are occurring that don't have to.
So if results are what you're after, I'd look down that line of thinking.

Is it faster to use a persistent variable than to reallocate memory each function call by e.g. zeros() if a function is called often?

Assume I have a function which is called often, say by an ODE-solver or similar. Is it faster to use a persistent variable than to reallocate it each time?
That is, which function would be faster and what is best practice?
function ret = thisfunction(a, b, c)
    A = zeros(3);
    foo = 3;
    bar = 34;
    % ...
    % process some in A
    % ...
    ret = A\c;
end
or
function ret = thatfunction(a, b, c)
    persistent A foo bar
    if isempty(A)
        A = zeros(3);
        foo = 3;
        bar = 34;
    end
    % ...
    % process some in A
    % ...
    ret = A\c;
end
Which one is faster can only be proven by testing, as it may depend on variable size etc. However, if persistent variables are not required, they are usually also not recommended.
Therefore I would definitely recommend option number one.
Side note: you probably want to check whether it exists rather than whether it is empty. Furthermore, I don't know what happens to your A when you leave the function scope; if you want to define it as persistent or global, you may have to do it one level higher.
When you have a single function such as this to test, I have found it's very easy to set up a parent function, run the function you are testing, say, 10 million times, and time the results. Then consider the difference in time AND the possible trade-offs or side effects of using a persistent variable here. It may not be worth it if the difference is a few percent over 10 million calls and you are actually only going to call the function 10 thousand times in your application. YMMV.
In regards to best practice, I would dissuade you from using persistent variables in this manner, for two reasons.
Persistent variables can be cleared externally; e.g. running clear('thatfunction') from any other function that has "thatfunction" on the path would reset your persistent variables in "thatfunction". As such, it's possible that they'll be unwittingly reset elsewhere. This may not be a problem for you in this context, but if you want to keep results between function calls (which is the primary point of persistent variables) this can cause you headaches.
Also, if you modify them, you'll have to remember to clear them when you're done running in order to reset your workspace to a clean state. Otherwise, if you (or someone else) run your program again without clearing your persistent variable(s) first, the results from the previous run will carry over. This isn't an issue if they're read-only, but you cannot enforce that they will be.

How can I increase the performance of watir-webdriver automated scripts

The main problem I'm having is pulling data from tables, but any other general tips would be welcome too. The tables I'm dealing with have roughly 25 columns and varying numbers of rows (anywhere from 5-50).
Currently I am grabbing the table and converting it to an array:
require "watir-webdriver"
b = Watir::Browser.new :chrome
b.goto "http://someurl"
# The following operation takes way too long
table = b.table(:index, 1).to_a
# The rest is fast enough
table.each do |row|
# Code for pulling data from about 15 of the columns goes here
# ...
end
b.close
The operation table = b.table(:index, 1).to_a takes over a minute when the table has 20 rows. It seems like it should be very fast to put the cells of a 20 x 25 table into an array. I need to do this for over 80 tables, so it ends up taking 1-2 hours to run. Why is it taking so long, and how can I improve the speed?
I have tried iterating over the table rows without first converting to an array as well, but there was no improvement in performance:
b.table(:index, 1).rows.each do |row|
  # ...
end
Same results using Windows 7 and Ubuntu. I've also tried Firefox instead of Chrome without a noticeable difference.
A quick workaround would be to use Nokogiri if you're just reading data from a big page:
require 'nokogiri'
doc = Nokogiri::HTML.parse(b.table(:index, 1).html)
I'd love to see more detail though. If you can provide a code + HTML example that demonstrates the issue, please file it in the issue tracker.
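From there the cells can be read without any further browser round-trips; a rough sketch continuing from the doc variable above:
rows = doc.css('tr').map do |tr|
  tr.css('th, td').map { |cell| cell.text.strip }
end

rows.each do |row|
  # row is a plain Ruby Array of cell strings; pull the ~15 columns you need here
end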
The #1 thing you can do to improve the performance of a script that uses watir is to reduce the number of remote calls into the browser. Each time you locate or operate on a DOM element, that's a call into the browser and can take 5ms or more.
In your case, you can reduce the number of remote calls by doing the work on the browser side via execute_script() and checking the result on the ruby side.
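As a rough sketch of that idea (the JavaScript below is illustrative; the URL and browser setup come from the question), the whole table can be collected in a single execute_script() call and handed back to Ruby as nested arrays:
require 'watir-webdriver'

b = Watir::Browser.new :chrome
b.goto 'http://someurl'

# One remote call: gather every cell's text in the browser itself
js = <<-JS
  var table = document.getElementsByTagName('table')[0];
  return Array.prototype.map.call(table.rows, function (row) {
    return Array.prototype.map.call(row.cells, function (cell) {
      return cell.textContent;
    });
  });
JS

table_data = b.execute_script(js)   # => array of arrays of strings

table_data.each do |row|
  # pull the columns you need out of row here
end

b.close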
When attempting to improve the speed of your code it's vital to have some means of measuring execution times (e.g. ruby benchmark). You might also like to look at ruby-prof to get a detailed breakdown of the time spent in each method.
I would start by trying to establish whether it's the to_a call, rather than locating the table, that's causing the delay on that line of code. Watir's internals (or Nokogiri, as per jarib's answer) may be quicker.
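For example, a quick Benchmark sketch around the slow line from the question (purely illustrative):
require 'benchmark'

elapsed = Benchmark.realtime { b.table(:index, 1).to_a }
puts "table -> array took #{elapsed} seconds"

# For a per-method breakdown, ruby-prof can wrap the same call:
# require 'ruby-prof'
# result = RubyProf.profile { b.table(:index, 1).to_a }
# RubyProf::FlatPrinter.new(result).print(STDOUT)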

Is there a good way to debug order dependent test failures in RSpec (RSpec2)?

Too often people write tests that don't clean up after themselves when they mess with state. Often this doesn't matter, since objects tend to be torn down and recreated for most tests, but there are some unfortunate cases where global state on objects persists for the entire test run, and tests that depend on and modify that global state fail when run in a certain order.
These tests, and possibly the implementations, obviously need to be fixed, but it's a pain to figure out what's causing the failure when the tests that affect each other may not be the only things in the full test suite. It's especially difficult when it's not initially clear that the failures are order dependent, and they may fail intermittently or on one machine but not another. For example:
rspec test1_spec.rb test2_spec.rb # failures in test2
rspec test2_spec.rb test1_spec.rb # no failures
In RSpec 1 there were some options (--reverse, --loadby) for ordering test runs, but those have disappeared in RSpec 2 and were only minimally helpful in debugging these issues anyway.
I'm not sure of the ordering that either RSpec 1 or RSpec 2 use by default, but one custom designed test suite I used in the past randomly ordered the tests on every run so that these failures came to light more quickly. In the test output the seed that was used to determine ordering was printed with the results so that it was easy to reproduce the failures even if you had to do some work to narrow down the individual tests in the suite that were causing them. There were then options that allowed you to start and stop at any given test file in the order, which allowed you to easily do a binary search to find the problem tests.
I have not found any such utilities in RSpec, so I'm asking here: What are some good ways people have found to debug these types of order dependent test failures?
There is now a --bisect flag that will find the minimum set of tests to run to reproduce the failure. Try:
$ rspec --bisect=verbose
It might also be useful to use the --fail-fast flag with it.
I wouldn't say I have a good answer, and I'd love to hear some better solutions than mine. That said...
The only real technique I have for debugging these issues is adding a global (via spec_helper) hook for printing some aspect of database state (my usual culprit) before and after each test (conditioned to check if I care or not). A recent example was adding something like this to my spec_helper.rb.
Spec::Runner.configure do |config|
  config.before(:each) do
    $label_count = Label.count
  end

  config.after(:each) do
    label_diff = Label.count - $label_count
    $label_count = Label.count
    puts "#{self.class.description} #{description} altered label count by #{label_diff}" if label_diff != 0
  end
end
We have a single test in our Continuous Integration setup that globs the spec/ directory of a Rails app and runs each pair of spec files against each other.
It takes a lot of time, but we found 5 or 6 dependencies that way.
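A sketch of that idea (not the poster's actual CI code; it shells out to rspec for every ordered pair of spec files and prints the pairs that fail):
# Illustrative sketch: run every ordered pair of spec files and report the
# pairs that fail, to flush out order-dependent tests.
specs = Dir.glob('spec/**/*_spec.rb')

specs.product(specs).each do |first, second|
  next if first == second
  ok = system('rspec', first, second, out: File::NULL, err: File::NULL)
  puts "order-dependent failure: #{first} #{second}" unless ok
end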
Here is a quick-and-dirty script I wrote to debug order-dependent failures: https://gist.github.com/biomancer/ddf59bd841dbf0c448f7
It consists of 2 parts.
The first part runs the rspec suite multiple times with different seeds and dumps the results to rspec_[ok|fail]_[seed].txt files in the current directory to gather stats.
The second part iterates through all these files, extracts the test group names, and analyzes their position relative to the affected test to make assumptions about dependencies, forming some 'risk' groups - safe, unsafe, etc. The script output explains the other details and group meanings.
This script will only work correctly for simple dependencies, and only if the affected test fails for some seeds and passes for others, but I think it's still better than nothing.
In my case it was a complex dependency whose effect could be cancelled by another test, but this script helped me get my bearings after running its analyze part multiple times on different sets of dumps, specifically only on the failed ones (I just moved the 'ok' dumps out of the current directory).
Coming back to my own question 4 years later: rspec now has an --order flag that lets you run specs in random order, and if you get order-dependent failures you can reproduce the order with --seed 123, where the seed is printed out on every spec run.
https://www.relishapp.com/rspec/rspec-core/v/2-13/docs/command-line/order-new-in-rspec-core-2-8
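The same thing can be configured once in RSpec.configure instead of being passed on the command line; a sketch based on the settings in a standard generated spec_helper:
RSpec.configure do |config|
  # Run specs in random order to surface order dependencies. Reproduce a
  # failing order by re-running with --seed from the previous output.
  config.order = :random
  Kernel.srand config.seed
end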
It's most likely some state persisting between tests, so make sure your database and any other data stores (including class variables and globals) are reset after every test. The database_cleaner gem might help.
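For reference, a common database_cleaner setup in spec_helper.rb (a sketch assuming ActiveRecord; pick the strategy that fits your app):
require 'database_cleaner'

RSpec.configure do |config|
  config.before(:suite) do
    DatabaseCleaner.strategy = :transaction
    DatabaseCleaner.clean_with(:truncation)   # start the suite from a blank slate
  end

  config.before(:each) do
    DatabaseCleaner.start
  end

  config.after(:each) do
    DatabaseCleaner.clean
  end
end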
RSpec Search and Destroy is meant to help with this problem:
https://github.com/shepmaster/rspec-search-and-destroy
