Jmeter aggregate report total throughput - how is calculated - jmeter

I have problem to find out how total value in aggregate report is calculated.
Do anybody know algorithm for this value ?
Basing on Jmeter documentation for single call is calculate as: total execution/ time of execution.
Problem is that total value for throughput isn't number of total executions divided by total time of test. It is calculated in more smart way and I looking for algorithm of this smart way :).

As per The Load Reports guide:
Throughput is measured in requests per second/minute/hour. The time unit is chosen so that the displayed rate is at least 1.0. When the throughput is saved to a CSV file, it is expressed in requests/second, i.e. 30.0 requests/minute is saved as 0.5.
As per JMeter Glossary
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time).
As per Calculator class from JMeter's source
/**
* Throughput in bytes / second
*
* #return throughput in bytes/second
*/
public double getBytesPerSecond() {
if (elapsedTime > 0) {
return bytes / ((double) elapsedTime / 1000); // 1000 = millisecs/sec
}
return 0.0;
}
/**
* Throughput in kilobytes / second
*
* #return Throughput in kilobytes / second
*/
public double getKBPerSecond() {
return getBytesPerSecond() / 1024; // 1024=bytes per kb
}

Related

How does Jmeter calculate throughput for multiple samples?

Multiple samples
I have been able to calculate the throughput with single sample accurately but with multiple samples i am not able to get the value that is in the picture.
According to JMeter Glossary:
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time).
So you need to measure your test duration and then divide the number of the requests by the test duration.
The exact implementation of all the calculated metrics lives in Calculator class, the function which returns the throughput looks like:
private double getRatePerSecond(long value) {
if (elapsedTime > 0) {
return value / ((double) elapsedTime / 1000); // 1000 = millisecs/sec
}
return 0.0;
}
You might also be interested in What is the Relationship Between Users and Hits Per Second? article

Calculate or Update Average without iteration over time

Consider a small monitoring device which displays temperature average every 10 seconds.
timestamp value
20190304000000 62.7
20190304000010 62.5
20190304000020 62.8
....
....
how to calculate and update average without increasing memory footprint. that is, whole data storage (persistent or memory) in not possible
In addition to other answers, you might want to use IIR filter to get Exponential moving average: filter that applies weighting factors which decrease exponentially, so last values have more impact than older ones
newAverage = OldAverage * (1-alpha) + NewValue * alpha
where alpha is small value like 0.1 that has relation to decreasing time/constant
Keep a total sum, and a count of the number of temperatures recorded. Then divide the sum by the count every time you report the answer, to avoid compounding floating point errors.
from itertools import count
temperature_sum = 0
for temperature_count in count(1):
temperature_sum += read_from_sensor()
print("Average: {}".format(temperature_sum / temperature_count))
we will need two variables
int count;
float average;
void main() {
do while(true) {
float temperature = ReadFromSensor(); //not included
average = ((average*count) + temperature)/ ++count;
cout << "average: " << average << endl;
}
}

Calculate execution time of a program based on CPI, instructions, etc

I'm trying to calculate the execution time of an application. Assuming the only stall penalty occurs on memory access instructions (100 cycles being the penalty).
How am I supposed to find out execution time in seconds with this info?
CPI (CPUCycles?) = 1.0
ClockRate = 1GHZ
TotalInstructions = 59880
MemoryAccessInstructions = 8467
CacheMissRate = 62% (0.62) (5290/8467)
CacheHits = 3117
CacheMisses = 5290
CacheMissPenalty = 100 (cycles)
Assuming no other penalties.
totalCycles = TotalInstructions + CacheMisses * CacheMissPenalty ?
I assume that cache hits cost same as other opcodes, so those are included in TotalInstructions.
That's then 588880 cycles, 1GHz is 1000000000 cycles per second.
So that code will take 0.58888ms to execute (5.8888e-7 second).
This value is of course purely theoretical estimate, as modern CPU doesn't work like that (1 instruction = 1 cycle). If you are interested in real world values, just profile it.

Number of passes in external merge

There doesn't seem to be any preexisting questions on this, at least from a title search. I am seeking to find the optimal amount of passes for an external merge. So, if we have 1000 chunks of data, one pass would be a 1000 way merge. Two pass could be 5 groups of 200 chunks, then a final merge of 1 group of 5 chunks. And so on. I've done some math, which must have a flaw, because it looks like two passes never beats one pass. It could very well be a misunderstanding in how data is read, though.
First, a numerical example:
Data: 100 GB
Ram: 1 GB
Since we have 1GB memory, we can load in 1GB at a time to sort using quicksort or mergesort. Now we have 100 chunks to sort. We can do a 100 way merge. This is done by making RAM/(chunks+1) size buckets = 1024MB/101 = 10.14MB. There are 100 10.14MB buckets for each of the 100 chunks, and one output bucket also of size 10.14MB. As we merge, if any input buckets empty, we do a disk seek to refill that bucket. Likewise, when the output bucket gets full, we write to the disk and empty it. I claim that the number of "times the disk needs to read" is (data/ram)*(chunks+1). I get this from the fact that we have ram/(chunks+1) sized input buckets, and we must read in the entire data for a given pass, so we read (data/bucket_size) times. In other words, every time an input bucket empties we must refill it. We do this over 100 chunks here, so numChunks*(chunk_size/bucket_size) = datasize/bucket_size or 100*(1024MB/10.14MB). BucketSize = ram/(chunks+1) so 100*(1024/10.14) = (data/ram) * (chunks+1) = 1024*100MB/1024MB * 101 = 10100 reads.
For a two pass system, we do A groups of B #chunks, then a final merge of 1 group of A #chunks. Using previous logic, we have numReads = A*( (data/ram)*(B+1)) + 1*( (data/ram)*(A+1)). We also have A*B = Data/Ram. For instance, 10 groups of 10 chunks, where each chunk is a GB. Here, A = 10 B = 10. 10*10 = 100/1 = 100, which is Data/Ram. This is because Data/Ram was the original number of chunks. For 2 pass, we want to break Data/Ram into A groups of B #chunks.
I'll try to break down the formula here, let D = data, A = #groups, B = #chunks/group, R = RAM
A*(D/R)*(B+1) + 1*(D/R)*(A+1) - This is A times the number of reads of an external merge on B #chunks plus the final merge on A #chunks.
A = D/(R*B) => D^2/(B*R^2) * (B+1) + D/R * (D/(R*B)+1)
(D^2/R^2)*[1 + 2/B] + D/R is number of reads for a 2 pass external merge. For 1 pass, we have (data/ram)*(chunks+1) where chunks = data/ram for 1 pass. Thus, for one pass we have D^2/R^2 + D/R. We see that a 2 pass only reaches that as the chunk size B goes to infinity, and even then the additional final merge gives us D^2/R^2 + D/R. So there must be something about the reads I'm missing, or my math is flawed. Thanks to anyone who takes the time to help me!
You ignore the fact that the total time it takes to read a block of data from disk is the sum of
The access time which is roughly constant and on the order of several milliseconds for rotating hard disk drives.
The transfer time which depends on the size of the data block and the transfer rate.
As the number of chunks increases, the size of the input buffers (you call them buckets) decreases. The smaller the input buffers get, the more pronounced is the effect of the constant access time on the total time is takes to fill a buffer. At a certain point, the time to fill a buffer will be almost completely dominated by the access time. So the total time for a merge pass begins to scale with the number of buffers and not the amount of data read.
That's where additional merge passes can speed up the process. It allows to use fewer and larger input buffers and mitigates the effect of access time.
Edit: Here's a quick back-of-the-envelope calculation to give an idea about where the break-even point is.
The total transfer time can be calculated easily. All the data has to read and written once per pass:
total_transfer_time = num_passes * 2 * data / transfer_rate
The total access time for buffer reads is:
total_access_time = num_passes * num_buffer_reads * access_time
Since there's only a single output buffer, it can be made larger than the input buffers without wasting too much memory, so I'll ignore the access time for writes. The number of buffer reads is data / buffer_size, buffer size is about ram / num_chunks for the one-pass approach, and the number of chunks is data / ram. So we have:
total_access_time1 = num_chunks^2 * access_time
For the two-pass solution, it makes sense to use sqrt(num_chunks) buffers to minimize access time. So buffer size is ram / sqrt(num_chunks) and we have:
total_access_time2 = 2 * (data / (ram / sqrt(num_chunks))) * acccess_time
= 2 * num_chunks^1.5 * access_time
So if we use transfer_rate = 100 MB/s, access_time = 10 ms, data = 100 GB, ram = 1 GB, the total time is:
total_time1 = (2 * 100 GB / 100 MB/s) + 100^2 * 10 ms
= 2000 s + 100 s = 2100 s
total_time2 = (2 * 2 * 100 GB / 100 MB/s) + 2 * 100^1.5 * 10 ms
= 4000 s + 20 s = 4020 s
The effect of access time is still very small. So let's change data to 1000 GB:
total_time1 = (2 * 1000 GB / 100 MB/s) + 1000^2 * 10 ms
= 20000 s + 10000 s = 30000 s
total_time2 = (2 * 2 * 1000 GB / 100 MB/s) + 2 * 1000^1.5 * 10 ms
= 40000 s + 632 s = 40632 s
Now half the time in the one-pass version is spent with disk seeks. Let's try with 5000 GB:
total_time1 = (2 * 5000 GB / 100 MB/s) + 5000^2 * 10 ms
= 100000 s + 250000 s = 350000 s
total_time2 = (2 * 2 * 5000 GB / 100 MB/s) + 2 * 5000^1.5 * 10 ms
= 200000 s + 7071 s = 207071 s
Now the two-pass version is faster.
To get an optimum you need a more sophisticated model of the disk. Let time to fill a block of size S be rS + k where k is seek time and r is read rate.
If you divide RAM of size M into C+1 buffers of size M/(C+1), then the time to load RAM once is (C+1) (r M/(C+1) + k) = rM + k(C+1). So as you'd expect, making C smaller speeds up read time by eliminating seeks. It's fastest to read all of memory in one sequential block, but merging doesn't allow it. We must make a tradeoff. That's where we need to look for the optimum.
With total data size of c times RAM size, there are c chunks to be merged.
In the one pass scheme, C=c, and the total read time must be just the time to fill RAM c times over: c (rM + k(c+1)) = c(rM + kc + k).
In the two pass scheme with an N-way division of data for the first pass, that pass has C=c/N and in the second pass, C=N. So total cost is
c ( rM + k(c/N+1) ) + c ( rM + k(N+1) ) = c ( 2rM + k(c/N + N) + 2k )
Note this model omits write time. You should fill that in eventually unless you're assuming it's overlapped I/O on a different device and thus can be ignored.
It's not hard to see here that if c and k are suitably large, then the c/N+N term in the 2-pass model can be so small compared to the c in the one-pass that the 2-pass model will be faster.
I'm going to stop now, but you can carry this logic on to (probably) get a closed approximation formula for an arbitrary number of passes. THis will require solving an infinite series. Then you can set the derivative to zero and solve for an estimate of the optimal pass number. If life is good you'll also learn the optimal value of N by setting the gradient of a 2d function in pass number and N to zero. My intuition says N ~ sqrt(c).
If the math gets intractable, you could still simulate a reasonable range of numbers of passes with the kind of simple algebra above at the start and pick an optimum that way.
This is an interesting problem and I'm sorry I don't have more time to work on it at the moment. I hope the analysis framework is enough to let you punch through to a nice result.

Generating random numbers based on an expected value

I am programming in java and I have come across a problem I could use some help with. Basically I need the user to enter how many times they expect a certain event to happen in a certain amount of times. The event takes a certain amount of time to complete as well. With all that said I need to use a random number generator to decide whether or not the event should happen based on the expected value.
Here's an example. Say the event takes 2 seconds to complete. The user says they want 100 seconds total and they expect the event to happen 25 times. Right now this is what I have. Units is the units of time and expectedLanding is how many times they would like the event to take place.
double isLandingProb = units/expectedLanding;
double isLanding = isLandingProb * random.nextDouble();
if(isLanding >= isLandingProb/2){
//do event here
}
This solution isn't working, and I'm having trouble thinking of something that would work.
Try this:
double isLandingProb = someProbability;
double isLanding = random.nextDouble();
if(isLanding <= isLandingProb){
//do event here
}
For example, if your probability is .25 (1 out of 4), and nextDouble returns a random number between 0 and 1, then your nextDouble needs to be less than (or equal to) .25 to achieve a landing.
Given an event that takes x seconds to run, but you want it to run on average once every y seconds, then it needs to execute with probability x/y. Then the expectation of the number of seconds the event is running over y seconds is x = one event.
int totalSeconds;
int totalTimes;
double eventTime;
double secondsPerEvent = 1.0d * totalSeconds / totalTimes;
if( eventTime > secondsPerEvent ) throw new Exception("Impossible to satisfy");
double eventProbability = eventTime / secondsPerEvent;
if( eventProbability < random.nextDouble() )
// do event

Resources