JMeter - Jenkins Performance plugin report exponent numbers

I am able to generate a performance trend report using the Jenkins Performance plugin. In the performance report, what is the significance/meaning of the exponent numbers highlighted in red in the image below? For example: -2, +5698, -6657, etc. I am not able to find any information related to this.

The + value indicates that this build increased the response time from the previous build by that value.
For example, in your case, for the first URI, the average response time increased by 5698 ms; similarly, in case of a negative value, it means the current build is faster than the previous one by that amount. I was able to verify this on my performance trend report.
However, I am not sure what the + and - equate to in the "samples" column. Let me know if this works!

Related

Spring Boot - observability on *_max *_count *_sum metrics

A small question regarding Spring Boot, some of the useful default metrics, and how to properly use them in Grafana, please.
Currently with Spring Boot 2.5.1+ (the question applies to 2.x.x) with the Actuator + Micrometer + Prometheus dependencies, there are lots of very handy default metrics that come out of the box.
I am seeing many of them following the pattern _max, _count, _sum.
Example, just to take a few:
spring_data_repository_invocations_seconds_max
spring_data_repository_invocations_seconds_count
spring_data_repository_invocations_seconds_sum
reactor_netty_http_client_data_received_bytes_max
reactor_netty_http_client_data_received_bytes_count
reactor_netty_http_client_data_received_bytes_sum
http_server_requests_seconds_max
http_server_requests_seconds_count
http_server_requests_seconds_sum
Unfortunately, I am not sure what to do with them or how to correctly use them, and I feel like my ignorance makes me miss out on some great application insights.
Searching the web, I see some people using queries like this to compute what seems to be an average in Grafana:
irate(http_server_requests_seconds_sum{exception="None", uri!~".*actuator.*"}[5m]) / irate(http_server_requests_seconds_count{exception="None", uri!~".*actuator.*"}[5m])
But I am not sure if that is the correct way to use them.
May I ask what sort of queries are possible and commonly used when dealing with metrics of type _max, _count, _sum, please?
Thank you
UPD 2022/11: Recently I've had a chance to work with these metrics myself and I made a dashboard with everything I say in this answer and more. It's available on Github or Grafana.com. I hope this will be a good example of how you can use these metrics.
Original answer:
count and sum are generally used to calculate an average. count accumulates the number of times sum was increased, while sum holds the total value of something. Let's take http_server_requests_seconds for example:
http_server_requests_seconds_sum 10
http_server_requests_seconds_count 5
With the example above one can say that there were 5 HTTP requests and their combined duration was 10 seconds. If you divide sum by count you'll get the average request duration of 2 seconds.
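As a minimal sketch of how such a count/sum pair accumulates (illustrative only, not Micrometer's actual implementation; the durations below are made up):

request_durations = [1.5, 2.0, 2.5, 1.0, 3.0]  # hypothetical request durations in seconds

http_server_requests_seconds_count = 0
http_server_requests_seconds_sum = 0.0

for duration in request_durations:
    http_server_requests_seconds_count += 1       # one more observation
    http_server_requests_seconds_sum += duration  # total time spent serving requests

# sum = 10, count = 5, so the average request duration is 2 seconds
print(http_server_requests_seconds_sum / http_server_requests_seconds_count)  # 2.0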
Having these you can create at least two useful panels: average request duration (=average latency) and request rate.
Request rate
Using the rate() or irate() function you can get how many requests there were per second:
rate(http_server_requests_seconds_count[5m])
rate() works in the following way:
Prometheus takes samples from the given interval ([5m] in this example) and calculates the difference between the current time point (not necessarily now) and the one [5m] ago.
The obtained value is then divided by the number of seconds in the interval.
A short interval will make the graph look like a saw (every fluctuation will be noticeable); a long interval will make the line smoother but slower to display changes.
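Roughly speaking (a simplified sketch with made-up sample values; the real implementation also handles counter resets and extrapolation), the calculation looks like this:

window_seconds = 5 * 60  # the [5m] range

# hypothetical counter samples at the edges of the window:
count_5m_ago = 1200.0  # http_server_requests_seconds_count five minutes ago
count_now = 1350.0     # http_server_requests_seconds_count at the evaluation time

per_second_rate = (count_now - count_5m_ago) / window_seconds
print(per_second_rate)  # 0.5 requests per second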
Average Request Duration
You can proceed with
http_server_requests_seconds_sum / http_server_requests_seconds_count
but it is highly likely that you will only see a straight line on the graph. This is because the values of those metrics grow too big over time and a really drastic change must occur for this query to show any difference. Because of this, it is better to calculate the average over interval samples of the data. Using the increase() function you can get an approximate value of how much the metric changed during the interval. Thus:
increase(http_server_requests_seconds_sum[5m]) / increase(http_server_requests_seconds_count[5m])
The value is approximate because under the hood increase() is rate() multiplied by [interval]. The error is insignificant for fast-moving counters (such as the request rate); just be prepared that increase() can report non-integer values, such as an increase of 2.5 requests.
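A tiny sketch of why non-integer values can appear (illustrative numbers only, not Prometheus internals):

window_seconds = 5 * 60
per_second_rate = 1 / 120  # hypothetical rate() result over the window
approx_increase = per_second_rate * window_seconds
print(approx_increase)     # 2.5 "requests" - not a whole number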
Aggregation and filtering
If you have already run one of the queries above, you may have noticed that there is not one line, but many. This is due to labels; each unique set of labels that the metric has is considered a separate time series. This can be fixed by using an aggregation function (like sum()). For example, you can aggregate the request rate by instance:
sum by(instance) (rate(http_server_requests_seconds_count[5m]))
This will show you a line for each unique instance label. Now if you want to see only some and not all instances, you can do that with a filter. For example, to calculate a value just for nodeA instance:
sum by(instance) (rate(http_server_requests_seconds_count{instance="nodeA"}[5m]))
Read more about selectors here. With labels you can create any number of useful panels. Perhaps you'd like to calculate the percentage of exceptions, or their rate of occurrence, or perhaps a request rate by status code, you name it.
Note on max
From what I found on the web, max shows the maximum recorded value during some interval set in settings (the default is 2 minutes, if the source is to be trusted). This is a somewhat uncommon metric, and whether it is useful is up to you. Since it is a Gauge (unlike sum and count it can go both up and down) you don't need extra functions (such as rate()) to see its dynamics. Thus
http_server_requests_seconds_max
... will show you the maximum request duration. You can augment this with aggregation functions (avg(), sum(), etc) and label filters to make it more useful.

JMeter Max value decreases over time

I'm using JMeter 5 to launch a simple load test. Now I want to understand the console output, but I have difficulty with the max value.
I expected max to be the maximum elapsed time of all the requests, but during the load test its value decreases and increases.
Load test parameters:
loops: 1000
concurrent threads: 5
ramp-up: 1s
The image below shows my console output, and you can see the max value decrease and increase. I don't know why.
Can someone please explain this to me? I have some problems understanding the variations of the other values.
It's simple.
There, on that picture, you've got two types of reporting records:
1) Ones with "summary =" are overall, for the whole test duration.
As you can see, there the Max values gradually, but slowly, change towards an increase (the Mins do the opposite, as expected).
Which is expected. I shouldn't have to go into the whys here, right?
2) Ones with "summary + " are deltas.
That's what was added for a certain time period (30 sec here), and all the values you observe there are calculated for that time span ONLY.
Again, obviously - they are different, and independent of each other.
So, concluding: nothing actually "jumps" up there, everything works as expected, you've just misinterpreted it.
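To make the difference concrete, here is a hypothetical sketch (made-up elapsed times, not JMeter code): the overall max can only grow, while the per-window ("delta") max is recomputed from scratch each period, so it can go down:

windows = [
    [120, 340, 980],   # window 1 elapsed times (ms)
    [150, 200, 450],   # window 2 - no request as slow as 980 ms
    [130, 1200, 300],  # window 3 - a new overall slowest request
]

seen = []
for i, window in enumerate(windows, start=1):
    seen.extend(window)
    print(f"window {i}: delta max = {max(window)} ms, overall max = {max(seen)} ms")

# window 1: delta max = 980 ms,  overall max = 980 ms
# window 2: delta max = 450 ms,  overall max = 980 ms   <- the delta max dropped
# window 3: delta max = 1200 ms, overall max = 1200 ms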
Hope that soothes your concerns.
P.S. You cleared any mentions of InfluxDB & Grafana out of the question, but I have to add that it works in a similar way for that bundle: these values depend on the timeframe & the grouping by time (smaller time chunks) within that timeframe.

SonarQube: calculate code coverage for delta only

Is it possible in SonarQube to calculate code coverage for a delta only?
For instance: a project had 1000 lines yesterday and its unit test coverage results are already in SonarQube. A new commit was pushed today with an extra 100 lines of code and additional test cases. These additional test cases cover 70 of the 100 new lines. Is there a way, possibly using TimeMachine, to retrieve/calculate the code coverage for the delta only? (in this case 70%)
You're looking for "Coverage on New Code", which is calculated against the "Leak period", i.e. the first listing in Administration > General > Differential Views.
Your problem is that differential values are calculated during analysis, so you can't update the leak period value and retroactively get exactly what you described. But narrow the leak period value down from the default 30 days (maybe previous_version?) and you'll get close going forward.

Interpreting basic output from Vowpal Wabbit

I had a couple questions about the output from a simple run of VW. I have read around the internet and the wiki sites but am still unsure about a couple of basic things.
I ran the following on the boston housing data:
vw -d housing.vm --progress 1
where the housing.vm file and the (partial) output are as shown in the question.
Question 1:
1) Is it correct to think about the average loss column as the following steps:
a) predict zero, so the first average loss is the squared error of the first example (with the prediction as zero)
b) build a model on example 1 and predict example 2. Average the now 2 squared losses
c) build a model on example 1-2 and predict example 3. Average the now 3 squared losses
d) ...
Do this until you hit the end of the data (assuming a single pass)
2) What is the current features column? It appears to be the number of non-zero features plus an intercept. What is shown in the example suggests that a feature is not counted if it is zero - is that true? For instance, the second record has a value of zero for 'ZN'. Does VW really treat that numeric feature as missing?
Your statements are basically correct. By default, VW does online learning, so in step c it takes the current model (weights) and updates it with the current example (rather than learning from all the previous examples again).
As you supposed, the current features column is the number of (non-zero) features for the current example. The intercept feature is included automatically, unless you specify --noconstant.
There is no difference between a missing feature and a feature with a zero value. Both mean that the corresponding weight won't be updated.
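A rough sketch of the progressive ("online") average loss described in steps a) to d) above (illustrative only; the stand-in "model" here is just a running mean of the targets, not VW's learner):

targets = [24.0, 21.6, 34.7, 33.4]  # hypothetical label values

total_squared_loss = 0.0
prediction = 0.0  # before seeing any data the model predicts zero

for i, y in enumerate(targets, start=1):
    loss = (y - prediction) ** 2       # loss on the example before learning from it
    total_squared_loss += loss
    print(f"example {i}: average loss = {total_squared_loss / i:.4f}")
    prediction += (y - prediction) / i  # "update" the model (here: a running mean)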

Comparison of Sorting Algorithms using running time in terms of seconds

I have devised a test in order to compare the running times of my sorting algorithm with insertion sort, bubble sort, quick sort, selection sort, and shell sort. I have based my test on the one done on this website http://warp.povusers.org/SortComparison/index.html, but I modified the test a bit.
I set up a test manager program server which generates the data, and the test manager sends it to the clients that run the different algorithms, therefore they are sorting the same data to have no bias.
I noticed that the insertion sort, bubble sort, and selection sort algorithms really did run for a very long time (some more than 15 minutes) just to sort one given data set for sizes of 100,000 and 1,000,000.
So I changed the number of runs per test case for those two data sizes. My original number of runs for 100,000 was 500, which I reduced to 15; for 1,000,000 it was 100, which I reduced to 3.
Now my professor doubts the credibility of the results because I've reduced the number of runs that much, but as I've observed, the running time for sorting a specific data distribution varies only by a small percentage, which is why I believe that even with far fewer runs I can still approximate the average runtime for that specific test case of that algorithm.
My question now is: is my assumption wrong? Does the machine at times make significant running time changes (>50%)? For example, if sorting the same data over and over gives 0.3 milliseconds on the first run, could the second run differ by as much as taking 1.5 seconds? Because from my observation, the running times don't vary much given the same type of test distribution (e.g. completely random, completely sorted, completely reversed).
What you are looking for is a way to measure the error in your experiments. My favorite book on the subject is Error Analysis by Taylor; Chapter 4 has what you need, which I'll summarize here.
You need to calculate the standard error of the mean, or SDOM. First calculate the mean and standard deviation (the formulas are on Wikipedia and quite simple). The SDOM is the standard deviation divided by the square root of the number of measurements. Assuming your timings have a normal distribution (which they should), twice the value of the SDOM is a very common way to specify the +/- error.
For example, let's say you run a sorting algorithm 5 times and get the following numbers: 5, 6, 7, 4, 5. Then the mean is 5.4 and the standard deviation is 1.1. Therefore the SDOM is 1.1/sqrt(5) = 0.5, so 2*SDOM = 1. Now you can say that the algorithm's run time was 5.4 ± 1. Your professor can determine whether this is an acceptable measurement error. Notice that as you take more readings, your SDOM, i.e. the plus or minus error, goes down inversely proportional to the square root of N. The 2*SDOM interval gives a 95% probability (confidence) that the true value lies within it, which is the accepted standard.
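A minimal sketch of that calculation (using the sample standard deviation, i.e. dividing by N-1, which reproduces the numbers above):

import math
import statistics

timings = [5, 6, 7, 4, 5]  # the five hypothetical run times from the example

mean = statistics.mean(timings)         # 5.4
stdev = statistics.stdev(timings)       # sample standard deviation, ~1.14
sdom = stdev / math.sqrt(len(timings))  # standard error of the mean, ~0.51

print(f"run time = {mean:.1f} +/- {2 * sdom:.1f}")  # run time = 5.4 +/- 1.0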
Also, you most likely want to measure performance using CPU time instead of a simple wall-clock timer. Modern CPUs are complex, with various cache levels and pipeline optimizations, and you might end up with less accurate measurements if you use a timer. More about CPU time is in this answer: How can I measure CPU time and wall clock time on both Linux/Windows?
It absolutely does. You need a variety of "random" samples in order to be able to draw proper conclusions about the population.
Look at it this way. It takes a long time to poll 100,000 people in the U.S. about their political stance. If we reduce the sample size to 100 people in order to complete it faster, we not only reduce the precision of our final result (2 decimal places rather than 5), we also introduce a larger chance that the members of the sample have a specific bias (there is a greater chance that 100 people out of 3xx,000,000 think the same way than 100,000 out of those same 3xx,000,000).
Your professor is right; however, he hasn't provided the details, some of which I mention here:
Sampling issue: It's right that you generate some random numbers and feed them to your sorting methods, but with only a few test cases you are indeed biased, because almost all random functions are biased to some extent (especially depending on the state of the machine or the time at that moment), so you should use more test cases to be more confident about the randomness.
Machine state: Even supposing you've provided perfect data (fully representative of a uniform distribution), the performance of electro-mechanical devices like computers may vary in different situations, so you should run a considerable number of times to smooth out the effects of these phenomena.
Note: In advanced technical reports, you should provide a confidence coefficient for the answers you give, derived from statistical analysis and proven step by step, but if you don't need to be that exact, simply increase these:
The size of the data
The number of tests
