Which timer to use?

In JMeter we have multiple timers, for instance the Uniform Random Timer, the Gaussian Random Timer, etc.
While researching, I found various books and blogs which tell me:
how to add the timers in JMeter
what the internal formula/logic behind each timer is.
I am still confused about which one to use, and when.
For instance, if I am trying to wait for the user to log in, which timer is more appropriate? Uniform? or Gaussian?

As per A Comprehensive Guide to Using JMeter Timers
Uniform Random Timer
The Uniform Random Timer pauses the thread for a delay computed as:
the next pseudorandom uniformly-distributed value in the range 0.0 (inclusive) to 1.0 (exclusive)
multiplied by “Random Delay Maximum”
plus “Constant Delay Offset”
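In code form, that computation is roughly the following. This is a minimal Java sketch of the described formula, not a copy of JMeter's source; the variable names mirror the GUI fields but are otherwise illustrative.

import java.util.Random;

// Sketch of the Uniform Random Timer delay described above:
// delay = uniform[0, 1) * "Random Delay Maximum" + "Constant Delay Offset".
public class UniformTimerSketch {
    private static final Random RANDOM = new Random();

    static long uniformDelay(long randomDelayMaximum, long constantDelayOffset) {
        // nextDouble() is uniform in [0.0, 1.0)
        return (long) (RANDOM.nextDouble() * randomDelayMaximum) + constantDelayOffset;
    }

    public static void main(String[] args) {
        // e.g. a delay between 1000 ms and 1299 ms
        System.out.println(uniformDelay(300, 1000) + " ms");
    }
}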
Gaussian Random Timer
A Gaussian Random Timer calculates the thread delay time much like a Uniform Random Timer does, but instead of a uniformly-distributed pseudorandom value in the 0.0 to 1.0 range, a normally (a.k.a. Gaussian) distributed value is used as the first argument of the formula.
There are several algorithms for generating normally distributed values; in JMeter the Marsaglia polar method is used, which draws pairs of random values U and V in the -1 to 1 range until the condition S = U^2 + V^2 < 1 (and S ≠ 0) is met. Once S is defined it is used in the formula

multiplier = sqrt(-2 * ln(S) / S)
X = U * multiplier
Y = V * multiplier

to return the next pseudorandom Gaussian (“normally”) distributed value. The first time the method is called it returns X, the second time it returns Y, the third time it starts over and returns a new X, and so on.
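A minimal Java sketch of that polar method follows; it is the same approach the java.util.Random#nextGaussian() documentation describes, shown here purely as an illustration rather than JMeter's actual source.

import java.util.Random;

// Marsaglia polar method as described above: draw pairs (U, V) until
// S = U^2 + V^2 < 1 (and S != 0), then derive two Gaussian values from them.
public class PolarMethodSketch {
    private static final Random RANDOM = new Random();
    private static double spare;            // the "Y" value kept for the next call
    private static boolean haveSpare = false;

    static double nextGaussian() {
        if (haveSpare) {                     // second call: return the stored Y
            haveSpare = false;
            return spare;
        }
        double u, v, s;
        do {                                 // keep drawing pairs until S < 1 and S != 0
            u = 2 * RANDOM.nextDouble() - 1;
            v = 2 * RANDOM.nextDouble() - 1;
            s = u * u + v * v;
        } while (s >= 1 || s == 0);
        double multiplier = Math.sqrt(-2 * Math.log(s) / s);
        spare = v * multiplier;              // Y, returned on the next call
        haveSpare = true;
        return u * multiplier;               // X
    }
}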
With regards to "which timer to use and when" - there is no single answer that fits all cases. The timers you've mentioned are used to simulate think time, as real users don't hammer the application non-stop; they need some time to "think" between operations. So the decision is up to you:
on one hand, real users perform different delays between operations, so it is good to randomize this delay a little if you want the test to be as realistic as possible
on the other hand, a load test needs to be repeatable, so it makes sense to go for Constant Timers to avoid any random factor impacting the test results
a good compromise may be to use a Uniform or Gaussian Random Timer initially in order to mimic real users, and then switch to the Constant Timer for regression testing, where repeatability of results matters

Related

Improving scaling of sample from discrete distribution

I have recently started playing around with Julia and I am currently working on a Monte Carlo simulation of some stochastic process on a 2-dimensional lattice. Each site has some associated rate of activation (the number of times it "does something" per second on average) which we can assume to be approximately constant. The order in which lattice sites activate is relevant so we need a method for randomly picking one particular site with probability proportional to its rate of activation.
It looks like sample(sites,weights(rates)) from the StatsBase package is exactly what I am looking for, BUT from testing out my code structure (no logic, just loops and RNG), it turns out that sample() scales linearly with the number of sites. This means that overall my runtimes scale like N^(2+2), where N is the side length of my 2-dimensional lattice (one factor of 2 from the increase in total rate of activity, the other from the scaling of sample()).
Now, the increase in total rate of activity is unavoidable but I think the scaling of the "random pick with weights" method can be improved. More specifically, one should be able to achieve a logarithmic scaling with the number of sites (rather than linear). Consider for example the following function (and, please, forgive the poor coding)
function randompick(indices, rates)
    # cumulative rates; note this comprehension alone is O(N^2) (see the EDIT below)
    cumrates = [sum(rates[1:i]) for i in indices]
    # uniform draw scaled to the total rate
    pick = rand() * cumrates[end]
    tick = 0
    lowb = 0
    highb = indices[end]
    # binary search for the first index whose cumulative rate exceeds pick
    while tick == 0
        mid = floor(Int, (highb + lowb) / 2)
        midrate = cumrates[mid]
        if pick > midrate
            lowb = mid
        else
            highb = mid
        end
        if highb - lowb == 1
            tick = 1
        end
    end
    return highb
end
Because we halve the number of "pickable" sites at each step, it would take n steps to pick one specific site out of 2^n (hence the logarithmic scaling). However, in its current state randompick() is so much slower than sample() that the scaling is practically irrelevant. Is there any way of reducing this method to a form that can compete with sample() and hence take advantage of the improved scaling?
EDIT: calculating cumrates scales like N^2 as well but that can be solved by working with rates in the correct (cumulative) form throughout the code.
A simpler version of what I think you were trying for is:
function randompick(rates)
    cumrates = cumsum(rates)            # cumulative rates, O(N)
    pick = rand() * cumrates[end]
    searchsortedfirst(cumrates, pick)   # binary search, O(log N)
end
The call to searchsortedfirst does scale logarithmically, but cumsum scales linearly, which eliminates any advantage this might have.
If the rates are constant, you could preprocess cumrates ahead of time, but if this was the case you would be better off using an alias table which can sample in constant time. There is an implementation available in the Distributions.jl package:
using Distributions
s = Distributions.AliasTable(rates)
rand(s)
I found out about an alternative sampling method in this paper by P. Hanusse that does not seem to scale with N, at least when the allowed activity rates are of the same order of magnitude.
The idea is to assume that all sites have the same rate of activity, equal to the rate of activity of the most active site, maxrate (so that the random pick is reduced to a single RNG call, rand(1:N)). Once we have picked a site, we separate its (constant) rate of activity into two contributions: the original rate of activity and a "do-nothing" rate (the latter being the constant rate minus the original rate). Now we generate a second random number c = rand() * maxrate. If c < rates[site], we keep that site choice and proceed to activate the site; otherwise we go back to the uniform random pick.
The function containing the two RNG calls would look like this, with the second returned value determining whether the call has to be repeated.
function HanussePick(rates, maxrate)
    site = rand(1:length(rates))         # uniform pick over all lattice sites
    slider = rand() * maxrate            # second RNG call for the accept/reject test
    return (site, rates[site] - slider)  # positive: keep the site; negative: pick again
end
The advantage of this approach is that, if the allowed rates of activity are comparable to each other, there should be no scaling with N, as we only need to generate O(1) random numbers.

How to detect the precise sampling interval from samples stored in a database?

A hardware sensor is sampled with a precise sampling period by a real-time unit. However, the sampling time is not sent to the database together with the sampled value. Instead, the time of insertion of the record into the database is stored with the sample. The DATETIME type is used, and the GETDATE() function is used to get the current time (Microsoft SQL Server).
How can I reconstruct the precise sampling times?
As the sampling interval is (should be) exactly 60 seconds, there was no need earlier for a more precise solution. (This is an old, third-party solution with a lot of historical samples, so it is not possible to fix the design.)
For processing the samples, I need to reconstruct the correct time instants for the samples. There is no problem with shifting the time of the whole sequence (that is, it does not matter if the start time is somewhat off; it need not be absolute). On the other hand, the sampling interval should be detected as precisely as possible. I also cannot be sure that the sampling interval was exactly 60 seconds (as mentioned above), nor that it was really constant (say, there may be slight differences based on the temperature of the device).
When processing the samples, I want to get:
start time
the sampling interval
the sequence of the sample values
When reconstructing the samples, I need to convert them back to tuples:
time of the sample
value of the sample
Because of that, for the sequence with n samples, the time of the last sample should be equal to start_time + sampling_interval * (n - 1), and it should be reasonably close to the original end time stored in the database.
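As a minimal sketch of that reconstruction step (in Java, with illustrative names, nothing tied to the actual schema):

import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Rebuild (time, value) tuples from the three stored items: start time,
// sampling interval and the ordered sample values.
public class ReconstructSketch {
    static void reconstruct(Instant startTime, Duration samplingInterval, List<Double> values) {
        for (int i = 0; i < values.size(); i++) {
            // time of sample i = start_time + sampling_interval * i
            Instant sampleTime = startTime.plus(samplingInterval.multipliedBy(i));
            System.out.println(sampleTime + " -> " + values.get(i));
        }
    }
}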
Think of it as the stored sample times slightly oscillating around the real sample times (a constant delay between the sampling and the insertion into the database is not a problem here).
I was thinking about calculating the mean value and the corrected standard deviation of the intervals computed from consecutive sample times.
Discontinuity detection: if a calculated interval is more than 3 sigma away from the mean value, I would consider it a discontinuity of the sampled curve (say, the machine was switched off, or some external event led to missing samples). In that case, I want to start processing a new sequence. (The sampling frequency could also have changed.)
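A rough sketch of what I have in mind, assuming the stored insertion times are already converted to seconds (the names and the 3-sigma threshold are just my working choices):

// Estimate the sampling interval as the mean of consecutive differences and
// flag differences more than 3 sigma away from the mean as discontinuities.
public class IntervalEstimateSketch {
    static void analyze(double[] storedTimes) {
        int n = storedTimes.length - 1;          // number of intervals
        double[] diffs = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            diffs[i] = storedTimes[i + 1] - storedTimes[i];
            sum += diffs[i];
        }
        double mean = sum / n;                   // estimated sampling interval

        double sq = 0;
        for (double d : diffs) {
            sq += (d - mean) * (d - mean);
        }
        double sigma = Math.sqrt(sq / (n - 1));  // corrected standard deviation

        StringBuilder discontinuities = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (Math.abs(diffs[i] - mean) > 3 * sigma) {
                discontinuities.append(i + 1).append(' ');   // a new sequence starts here
            }
        }
        System.out.println("interval ~ " + mean + " s, sigma = " + sigma
                + ", discontinuities start at indices: " + discontinuities);
    }
}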
Is there any well-known approach to this problem? If yes, can you point me to the article(s)? Or can you give me the name or acronym of the algorithm?
+1 to looking at the difference sequence. We can model the difference sequence as the sum of a low frequency truth (the true rate of the samples, slowly varying over time) and high frequency noise (the random delay to get the sample into the database). You want a low-pass filter to remove the latter.
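As a very simple stand-in for such a low-pass filter, a centered moving average over the difference sequence recovers the slowly varying interval; the window length below is an arbitrary, illustrative choice.

// Centered moving average over the difference sequence: the output follows
// the slowly varying "true" interval while averaging out the insertion jitter.
public class MovingAverageSketch {
    static double[] smooth(double[] diffs, int window) {
        double[] out = new double[diffs.length];
        int half = window / 2;
        for (int i = 0; i < diffs.length; i++) {
            int from = Math.max(0, i - half);
            int to = Math.min(diffs.length - 1, i + half);
            double sum = 0;
            for (int j = from; j <= to; j++) {
                sum += diffs[j];
            }
            out[i] = sum / (to - from + 1);      // local estimate of the true interval
        }
        return out;
    }
}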

JMeter: Gaussian random timer vs Poisson random timer

I am trying to figure out which timer to use for my loadtests in order to simulate a gradual growth in traffic towards the website.
I had a look at the Gaussian Random Timer:
To delay every user request for random amount of time use Gaussian
Random Timer with most of the time intervals happening near a specific
value.
and the Poisson random timer:
To pause each and every thread request for random amount of time use
Poisson Random Timer with most of the time intervals occurring close a
specific value.
taken from this source.
Now I don't really understand the difference between the two. They both apply a random delay that is more likely to be close to a specific value. So what am I missing? How do they differ in practice?
The difference is in the algorithm used to generate random values:
Poisson is based on this:
http://en.wikipedia.org/wiki/Poisson_distribution
http://www.johndcook.com/blog/2010/06/14/generating-poisson-random-values/
Gaussian uses :
java.util.Random#nextGaussian()
Both add to the Constant Delay Offset a random value generated from either a Poisson or a Gaussian distribution.
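As a rough Java sketch of the two ideas (the Gaussian component via java.util.Random#nextGaussian(), the Poisson component via Knuth's algorithm from the linked johndcook.com post; this mirrors the description above, not JMeter's exact source, and the parameter names are illustrative):

import java.util.Random;

// Both delays are "constant offset plus a random component"; only the
// distribution of the random component differs.
public class TimerComparisonSketch {
    private static final Random RANDOM = new Random();

    // Gaussian-based delay: a normally distributed value scaled by a deviation.
    static long gaussianDelay(double deviation, long constantOffset) {
        return constantOffset + (long) Math.abs(RANDOM.nextGaussian() * deviation);
    }

    // Poisson-based delay: Knuth's algorithm for a Poisson(lambda) value.
    static long poissonDelay(double lambda, long constantOffset) {
        double limit = Math.exp(-lambda);
        double p = 1.0;
        long k = 0;
        do {
            k++;
            p *= RANDOM.nextDouble();
        } while (p > limit);
        return constantOffset + (k - 1);
    }
}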
The difference is in the underlying algorithm; check the following links for details:
Normal (Gaussian) distribution
Poisson Distribution
I would also recommend reading the A Comprehensive Guide to Using JMeter Timers article for exhaustive information on JMeter timers.

Generate on/off signals of random duration SIMULINK

For my SIMULINK model I need to generate a signal that takes the values 1 or 0. To generate it I need to draw a number from an exponential distribution and use this number as the time the signal stays at 0. Once this time has passed, I have to draw a new number from the exponential distribution and use this number as the time the signal stays at 1, and then repeat the process until the end of the simulation. As a SIMULINK newbie I'm quite puzzled by this problem and would appreciate any suggestions on how to solve it.
You've got a couple of choices.
In MATLAB, you can generate all samples in advance (i.e. before running the simulation) and use them to create a suitable signal, then use that as an input into the model (using the From Workspace block).
Or, if you need to do the sampling at each time step, then you have to write an S-Function, using the random number in the mdlGetTimeOfNextVarHit method. There is an example of doing something very similar on the Goddard Consulting web site called Square Wave with Jitter.
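The pre-generation step itself is language-agnostic. As a rough sketch of that logic (written here in Java rather than MATLAB, with illustrative parameter values), you alternate the level between 0 and 1 and hold each level for an exponentially distributed duration:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Pre-generate an on/off signal as (time, value) pairs: each level is held
// for an exponentially distributed duration, then the level is toggled.
public class OnOffSignalSketch {
    public static void main(String[] args) {
        Random random = new Random();
        double meanDuration = 2.0;       // illustrative mean of the exponential distribution (s)
        double simulationEnd = 60.0;     // illustrative total simulated time (s)

        List<double[]> signal = new ArrayList<>();
        double t = 0.0;
        int level = 0;                   // start in the "off" state
        while (t < simulationEnd) {
            signal.add(new double[] {t, level});
            // inverse-transform sampling of an exponentially distributed duration
            double duration = -meanDuration * Math.log(1 - random.nextDouble());
            t += duration;
            level = 1 - level;           // toggle between 0 and 1
        }
        for (double[] point : signal) {
            System.out.printf("%.3f -> %d%n", point[0], (int) point[1]);
        }
    }
}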

About random number sequence generation

I am new to randomized algorithms, and I am learning the topic myself by reading books. I am currently reading Data Structures and Algorithm Analysis by Mark Allen Weiss.
Suppose we only need to flip a coin; thus, we must generate a 0 or 1
randomly. One way to do this is to examine the system clock. The clock
might record time as an integer that counts the number of seconds
since January 1, 1970 (at least on Unix systems). We could then use the
lowest bit. The problem is that this does not work well if a sequence
of random numbers is needed. One second is a long time, and the clock
might not change at all while the program is running. Even if the time
were recorded in units of microseconds, if the program were running by
itself the sequence of numbers that would be generated would be far
from random, since the time between calls to the generator would be
essentially identical on every program invocation. We see, then, that
what is really needed is a sequence of random numbers. These numbers
should appear independent. If a coin is flipped and heads appears,
the next coin flip should still be equally likely to come up heads or
tails.
Following are my questions on the above text snippet.
In the above text snippet, where the author says we could count the number of seconds and use the lowest bit, he mentions that this does not work well because one second is a long time and the clock might not change at all. My question is: why is one second considered a long time, given that the clock changes every second, and in what context does the author mean that the clock does not change? Please help me understand with a simple example.
Also, how can the author say that even with microseconds we don't get a sequence of random numbers?
Thanks!
Programs using random (or in this case pseudo-random) numbers usually need plenty of them in a short time. That's one reason why simply using the clock doesn't really work: the system clock doesn't update as fast as your code requests new numbers, so you're quite likely to get the same results over and over again until the clock changes. It's probably more noticeable on Unix systems, where the usual method of getting the time only gives you second accuracy. And not even microseconds really help, as computers are way faster than that by now.
The second problem you want to avoid is linear dependency of pseudo-random values. Imagine you want to place a number of dots in a square, randomly. You'll pick an x and a y coordinate. If your pseudo-random values are a simple linear sequence (like what you'd obtain naïvely from a clock) you'd get a diagonal line with many points clumped together in the same place. That doesn't really work.
One of the simplest types of pseudo-random number generators, the Linear Congruential Generator, has a similar problem, even though it's not so readily apparent at first sight. Due to the very simple formula

X_(n+1) = (a * X_n + c) mod m

you'll still get quite predictable results, albeit only visible if you pick points in 3D space, as all numbers lie on a limited number of distinct planes (a problem all pseudo-random generators exhibit at some dimension).
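A minimal sketch of such a generator follows; the constants are illustrative, not a recommendation.

// Linear Congruential Generator implementing X_(n+1) = (a * X_n + c) mod m.
public class LcgSketch {
    private static final long A = 1664525L;
    private static final long C = 1013904223L;
    private static final long M = 1L << 32;
    private long state;

    LcgSketch(long seed) {
        this.state = seed % M;
    }

    long next() {
        state = (A * state + C) % M;     // the LCG recurrence
        return state;
    }

    public static void main(String[] args) {
        LcgSketch lcg = new LcgSketch(42);
        // Individual outputs look random, but tuples of consecutive values
        // fall onto a limited number of planes, as described above.
        for (int i = 0; i < 5; i++) {
            System.out.println(lcg.next());
        }
    }
}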
Computers are fast. I'm oversimplifying, but if your clock speed is measured in GHz, the computer can do billions of operations in one second. Relatively speaking, one second is an eternity, so it is quite possible that the clock does not change between two reads.
If your program is performing its regular operations, it is not guaranteed to sample the clock at a random time. Therefore, you don't get a random number.
Don't forget that for a computer, a single second can be 'an eternity'. Programs / algorithms are often executed in a matter of milliseconds (thousandths of a second).
The following pseudocode:
for(int i = 0; i < 1000; i++)
    n = rand(0, 1000)
fills n a thousand times with a random number between 0 and 1000. On a typical machine, this script executes almost immediately.
While you typically only initialize the seed at the beginning:
The following pseudocode:
srand(time());
for(int i = 0; i < 1000; i++)
    n = rand(0, 1000)
initializes the seed once and then executes the code, generating a seemingly random set of numbers. The problem arises when you execute the code multiple times. Let's say the code executes in 3 milliseconds, and then executes again 3 milliseconds later, both within the same second. The result is then the same set of numbers.
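A quick way to see this effect in Java, seeding java.util.Random with second-resolution time (analogous to srand(time())):

import java.util.Random;

// Two generators seeded within the same clock second receive the same seed
// and therefore produce exactly the same "random" sequence.
public class SameSeedDemo {
    public static void main(String[] args) {
        long secondResolutionTime = System.currentTimeMillis() / 1000;   // like time()

        Random firstRun = new Random(secondResolutionTime);
        Random secondRun = new Random(secondResolutionTime);             // "re-run" in the same second

        for (int i = 0; i < 5; i++) {
            System.out.println(firstRun.nextInt(1000) + " == " + secondRun.nextInt(1000));
        }
    }
}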
For the second point: the author probably assumes a FAST computer. The above problem still holds...
What the author means is that you cannot control how fast your computer, or any other computer, runs your code. Assuming one second per execution is far from reality: if you run the code yourself you will see that it executes in milliseconds, so even microsecond resolution is not enough to ensure you get random numbers!
