Confusing time values in New Relic graphs - Heroku

In this graph, the left-hand chart, labelled web transaction time, shows seconds on the Y-axis, and the maximum time approaches 1750 ms, which is about 1.7 seconds, while the browser time shows 1.22 s. Why is that? The same goes for the Apdex score graph: the browser Apdex score is 1.0 out of 7.0, which is great, but the drawn graph clearly shows that most requests lie in the 0.5 - 0.75 second region, so shouldn't the Apdex score be greater than 1.0 instead of exactly 1.0 seconds? I want to understand the graphs and the times mentioned. Thanks in advance.

For:
it shows seconds on the Y-axis and the maximum time approaches 1750 ms (about 1.7 seconds), while the browser time shows 1.22 s. Why is that?
At the point in time where the Web Transaction Time measures 1.7 seconds, your throughput is 5+ individual transactions. In addition, this is at the Overview (application) level, which can span multiple different transaction types. For these reasons, you cannot compare this number to an individual browser transaction's response time.
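As a toy illustration (all numbers assumed), an application-level data point averages several different transaction types together, which is why it cannot be matched against any one browser page load:

# At a single sample point, several transaction types are averaged together.
transactions_ms = [1750, 300, 220, 95, 60]  # assumed mix at one point in time
print(sum(transactions_ms) / len(transactions_ms))  # 485.0 ms on average
# One slow transaction type can dominate the peak even though the average,
# and any single browser page load, looks quite different.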
For:
shouldn't the Apdex score be greater than 1.0 instead of exactly 1.0 seconds?
The unit of measurement of Apdex is not seconds. See here for how an Apdex score is calculated: https://docs.newrelic.com/docs/apm/new-relic-apm/apdex/apdex-measuring-user-satisfaction#score
Apdex score varies from 0 to 1, with 0 as the worst possible score (100% of response times were Frustrated), and 1 as the best possible score (100% of response times were Satisfied).

In this graph, the left-hand chart, labelled web transaction time, shows seconds on the Y-axis, and the maximum time approaches 1750 ms, which is about 1.7 seconds, while the browser time shows 1.22 s. Why is that?
You are in APM (which stands for Application Performance Monitoring). This is your server-side code; from the looks of things, I believe this is a Ruby on Rails application. The average time for the server-side code is the 414 ms. However, since you also have the Browser agent installed, you also see the average client-side time, which is 1.22 seconds. If you click on the APM bar in the upper left, you can click over to Browser.
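As a rough, purely illustrative model, the client-side time reported by Browser wraps the server time plus network transit and front-end work; the split below is assumed, and only the 414 ms and 1.22 s figures come from the answer:

# Illustrative decomposition of the end-user (browser) response time.
server_app_time_s = 0.414  # the APM figure from the answer
network_time_s = 0.350     # assumed: request and response transit
frontend_time_s = 0.456    # assumed: DOM processing plus page rendering
print(server_app_time_s + network_time_s + frontend_time_s)  # 1.22 seconds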
The same goes for the Apdex score graph: the browser Apdex score is 1.0 out of 7.0, which is great, but the drawn graph clearly shows that most requests lie in the 0.5 - 0.75 second region, so shouldn't the Apdex score be greater than 1.0 instead of exactly 1.0 seconds? I want to understand the graphs and times mentioned.
Again, you are seeing the Apdex score for both APM AND Browser. Your APM threshold is set at 0.5 seconds; this is the [0.5] above the graph (the Apdex threshold is shown in brackets).
Any transaction that comes in under 0.5 seconds receives a score of 1.
Any transaction that comes in over 4x that threshold (4 x 0.5 = 2 seconds) receives a 0.
Any transaction that comes in between 0.5 and 2.0 seconds receives a 0.5.
Add all of those per-transaction scores up and divide by the total number of transactions (requests) for that time period. So for this overall time period you have an Apdex score of 0.78 (this will always be between 0 and 1); see the sketch after this explanation.
The Browser threshold is [7.0], i.e. 7 seconds. This is the default and can be configured to be lower. For this time frame you have an Apdex score of 1.0, meaning that all transactions came in under that 7-second threshold.
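To make the arithmetic concrete, here is a minimal sketch of the scoring rules just described; the 0.5 s and 7 s thresholds come from the discussion, while the sample response times are purely illustrative:

# Minimal Apdex sketch: <= T scores 1, between T and 4T scores 0.5, > 4T scores 0.
def apdex(response_times_s, t):
    satisfied = sum(1 for r in response_times_s if r <= t)
    tolerating = sum(1 for r in response_times_s if t < r <= 4 * t)
    # Frustrated samples (> 4 * t) contribute nothing.
    return (satisfied + 0.5 * tolerating) / len(response_times_s)

samples = [0.4, 0.6, 0.7, 0.75, 2.5]  # illustrative response times in seconds
print(apdex(samples, t=0.5))  # 0.5 -> (1 + 0.5 * 3 + 0) / 5 against the APM threshold
print(apdex(samples, t=7.0))  # 1.0 -> every sample is under the Browser threshold

This also shows why a score of exactly 1.0 against the 7-second Browser threshold is entirely consistent with most requests landing in the 0.5 - 0.75 second band.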

Related

Why is the average response time reducing when we are increasing the number of users?

I am using JMeter to run a performance test with different numbers of users. With 1 user, the average response time is 1.4 seconds, but with more users it would be logical for the average response time to go up; instead it is reducing. Can anyone explain why? The test scenario is that I interact a few times (2-3 interactions) with a chat bot.
Please help me understand these confusing results below:
1 user - 30 seconds - 1.3 seconds (average response time)
5 users - 60 seconds - 0.92 seconds (average response time)
10 users - 60 seconds - 0.93 seconds (average response time)
20 users - 120 seconds - 0.92 seconds (average response time)
The first iteration of the first user often involves some overhead on the client side (most commonly DNS resolution) and can involve some overhead on the server side (server "warm-up"). That overhead is not incurred in the following iterations or by subsequent users.
Thus what you see as a reduction in average time is actually a reduction of the impact of the slower "first user, first iteration" execution time on the overall outcome. This is why it's important to collect a sufficient sample, so that such a local spike no longer matters much. My rule of thumb is at least 10,000 iterations before looking at any averages, although the comfort level is up to every tester to set.
Also, when increasing the number of users, you should not expect the average to get worse unless you have reached a saturation point; it should be stable. So if you expect your app to support no more than 20 users, then your result is surprising, but if you expect the application to support 20,000 users, you should not see any degradation of the average at 20 users.
To test whether this is what happens, try running 1 user for much longer, so that the total number of iterations is similar to running 20 users. Roughly, you need to increase the duration of the 1-user test to 20 minutes to get to a similar number of iterations (i.e. the same test length would be 120 seconds, but with 20x the iterations at 20 users, giving a rough total of 20 minutes for 1 user); the sketch below illustrates the dilution effect.
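To see the dilution effect numerically, here is a small sketch with assumed timings (the 5 s first iteration and 0.9 s steady-state iterations are illustrative, not measurements from the question):

# How one slow first iteration drags the average less as the sample grows.
first_iteration_s = 5.0  # assumed: DNS resolution plus server warm-up
steady_state_s = 0.9     # assumed: every subsequent iteration

for n in (20, 60, 400):
    avg = (first_iteration_s + steady_state_s * (n - 1)) / n
    print(n, "iterations -> average", round(avg, 3), "s")
# 20 iterations -> average 1.105 s
# 60 iterations -> average 0.968 s
# 400 iterations -> average 0.91 s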

How are server hits/second more than the active thread count? | JMeter

I'm running a load test to test the throughput of a server by making HTTP requests through JMeter.
I'm using the Thread Stepper plugin that allows me to increase the number of threads I'm using to make the requests after a particular time period.
The following graphs show the number of active threads over time and the corresponding hits per second I was able to make.
The third graph shows the latencies of the requests, and the fourth one shows the responses per second.
I'm not able to correlate the four graphs together.
In the server hits per second, I'm able to make a maximum of around 240 requests per second with only 50 active threads. However, the latency of the request is around 1 second.
My understanding is that a single thread would make a request, and then wait for the response to return before making the second request.
Since the minimum latency in my case is around 1 second, how is JMeter able to hit 240 requests per second with only 50 threads?
Server hits per second (max of 240 with only 50 threads; how?)
Response latencies (minimum latency of 1 sec)
Active threads over time (50 threads when server hits are 240/sec)
Responses per second (max of 300/sec; how?)
My expectation is that the reasons could be:
Response time is less than 1 second, therefore JMeter is able to send more than one request per second with every thread.
It might also be connected with HTTP redirections and/or embedded resources processing, as per the plugin's documentation:
Hits includes child samples from transactions and embedded resources hits.
For example, a single HTTP Request with 1 user can result in 20 sub-samples, all of which are counted by the "Server Hits Per Second" plugin.
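A quick back-of-the-envelope check with the numbers from the question shows why sub-sample counting resolves the apparent contradiction:

# With strictly sequential requests, throughput is capped at threads / latency.
threads = 50         # active threads (from the question)
latency_s = 1.0      # approximate minimum response time (from the question)
observed_hits = 240  # hits per second reported by the plugin

max_samples_per_s = threads / latency_s   # at best 50 samples per second
print(observed_hits / max_samples_per_s)  # 4.8 -> roughly 5 "hits" per sample
# Consistent with redirects and embedded resources being counted as extra hits.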
I took some time analyzing the four graphs you provided, and to me the JMeter graphs seem to be plotted reasonably well (since you feel JMeter is plotting them incorrectly, I will try to explain why they look normal to me). Taking a clue from point 1 of the answer that Dmitri T provided, I start the analysis below:
1. As Dmitri T pointed out, responses are coming back faster than hits (requests) are being sent to the server. This can be seen in the responses-per-second graph: the first batch of hits goes out at between 50 and 70 per second over the first five minutes, while the responses to this set of requests arrive at a much faster rate, i.e. 60 to 90 per second over the same interval. The same trend is observed for the hits fired from five to ten minutes (responses arrive faster than hits: 100 to 150 responses compared to 85 to 130 hits). Because of this continuing trend, the load generator is able to keep sending more and more hits with the same 50 active threads, which, coupled with the Thread Stepper plugin's stepping, gives the upward slope.
Hence the hits and responses graphs move in lock step (marching in unison), with the responses graph having a steeper slope than the hits-per-second graph.
This upward trend continues until a queuing effect sets in at around the 23rd minute, once the entire processing capacity is in use. At that point all the graphs reverse the behaviour they had shown up to 22:59.
The response latency (the time taken to get a response) increases from the 23rd minute on. At the same time there is a drop in hits per second (perhaps because not enough threads are available to the load generator to fire the next request, since the threads, a.k.a. users, are stuck in the queue and have not yet exited to make their next request). This drop in requests lowers the rate at which responses are received, as seen in the responses graph. But you can still see the "service center" processing requests efficiently, i.e. sending responses back faster than they arrive; in queuing-theory terms the service rate is still higher than the arrival rate, which reinforces point 1 of this analysis.
At a load of 60 users, something happens: queuing! (Confirm this by checking whether the rise in the response-time graph coincides with the drop in the throughput graph. If yes, requests were piling up at the server, i.e. queued.) This is the point where all the service centers are busy, and the resulting rise in response time prevents the user threads from generating new hits, causing the low hits per second.
The error codes observed in the responses-per-second graph, namely 400, 403, 500 and 504, all seem to appear from the 10-user load onwards, which may indicate a time-bound or data issue (e.g. the first 10 users of your CSV have proper data in the database and the rest don't).
Or it could be related to the "credit" or "debit" transactions, since chances are the two may conflict, or be deadlocked, on the same bank account, etc.
If you look at the nature of the error codes, they are most numerous where the volume of responses is highest, i.e. up to the 23rd minute, and their volume reduces from the 23rd minute onwards as the response rate falls due to queuing; they are directly proportional to the response volume. The 504 (gateway timeout) errors are a sure sign that processing takes a long time and the web server times out, meaning the load is high. So we can consider the load up to 80 users, i.e. at the 40th minute, as a reasonable load-bearing capacity of the system. (Obviously, if more 504 errors are observed earlier, we can fix that point as the unstressed load the system can handle.)
Important: check your hits-per-second graph configuration. Another observation is that the metering interval used to plot the graph may not be in sync with the expected scale, i.e. per second. You are expecting hits per second, but the plot granularity in your hits-per-second graph could be configured as 500 ms, i.e. half a second, and this could cause the plotted values to go higher than 50 hits for 50 users.

How is throughput calculated and displayed in seconds, minutes and hours in JMeter?

I have an observation and want to understand the throughput calculation. Sometimes throughput is displayed per second, sometimes per minute and sometimes per hour. Can anyone explain exactly how throughput is calculated, and when it is displayed per second, per minute or per hour in the JMeter Summary Report?
From the JMeter docs:
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server. The formula is: Throughput = (number of requests) / (total time).
The unit of time varies based on the throughput value.
Examples:
In 10 seconds, 10 requests are sent, so the throughput is 10/10 = 1/sec.
In 10 seconds, 1 request is sent, so the throughput is 1/10 = 0.1/sec = 6/min (a value like 0.1/sec is automatically shown in the next higher time unit).
As you can see, this avoids small values (like 0.1, 0.001, etc.). In such cases a higher time unit is friendlier to read, while all the unit representations are equally correct; it is a matter of usability.
So:
1/sec = 60/min = 3600/hour = the SAME rate
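Here is a minimal sketch of that display logic; the exact cut-offs JMeter uses internally may differ, so treat the unit-bumping rule below as an assumption for illustration:

# Compute throughput and bump to a higher time unit while the value is below 1.
def format_throughput(num_requests, total_time_s):
    rate = num_requests / total_time_s  # requests per second
    for unit, factor in (("sec", 1), ("min", 60), ("hour", 3600)):
        if rate * factor >= 1.0:
            return str(round(rate * factor, 1)) + "/" + unit
    return str(round(rate * 3600, 4)) + "/hour"  # extremely low rates

print(format_throughput(10, 10))  # 1.0/sec
print(format_throughput(1, 10))   # 6.0/min (0.1/sec shown in the next unit up)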

What does this mean in a JMeter load test? 100 in 13.2s = 7.4/s

I ran a test with a load of 100 and got the result 100 in 13.2/s = 7.4/s.
So what is the meaning of 100 in 13.2/s = 7.4/s?
It means the number of executed samples (requests) is 100, the test duration is 13.2 seconds, and the throughput is 7.4/s. So your application handled on average 7.4 requests per second during those 13.2 seconds, and the total number of requests in the test was 100.
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time).
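Plugging the question's numbers into that formula gives a figure close to the reported one (the rounding note below is an assumption about why they differ slightly):

# Throughput = (number of requests) / (total time)
num_requests = 100
total_time_s = 13.2  # start of first sample to end of last sample
print(num_requests / total_time_s)  # ~7.58 requests per second
# The summary line prints rounded figures, so the displayed rate (7.4/s)
# need not exactly match a recomputation from the displayed duration.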
In fact, there's a mistake in the question: it should be "100 in 13.2s", not "100 in 13.2/s"!
For further detail, go through the Apache JMeter User Manual: Glossary & Elements of a Test Plan.

How do I weight my rate by sample size (in Datadog)?

So I have an ongoing metric of events. They are either tagged as success or fail, so I have three numbers: failed, completed, and total. This is easily illustrated (in Datadog) using a stacked bar graph like so:
The dark part is the failures. By looking at the y scale and the dashed red line for scale, a human can easily tell whether the rate is a problem and significant. Which, to me, means that I have a failure rate in excess of 60%, sustained over at least some period (10 minutes?), with enough events in that period to consider the rate exceptional.
So I am looking for some sort of formula that starts with failures divided by total (giving me a score between 0 and 1) and then somehow weights this by the total, using thresholds that I decide mean the total is high enough for me to get an automated alert.
For extra credit, here is the actual Datadog metric that I am trying to get to work:
(sum:event{status:fail}.rollup(sum, 300) / sum:event{}.rollup(sum, 300))
And I am watching over 15 minutes and alerting on a score above 0.75. But I am not sure about sum, count, avg, or rollup. And of course this alert will email me during the night, when the total number of events goes low enough that a high failure rate isn't proof of any problem.
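One generic way to express the "weight the rate by volume" idea is to multiply the raw failure rate by a volume factor that saturates at 1 once the sample is large enough. The sketch below is plain pseudologic with an assumed min_events threshold, not a ready-made Datadog query:

# Scale the failure rate by sample size, saturating once volume is trusted.
def weighted_failure_score(failures, total, min_events=100):
    if total == 0:
        return 0.0                                # no events, nothing to alert on
    rate = failures / total                       # raw score between 0 and 1
    volume_weight = min(1.0, total / min_events)  # discounts tiny samples
    return rate * volume_weight

print(weighted_failure_score(70, 100))  # 0.7  -> high rate with enough volume
print(weighted_failure_score(7, 10))    # 0.07 -> same 70% rate, too few events

Within Datadog itself a similar effect is often achieved with a composite monitor (failure rate above X AND total event count above Y), which avoids exactly the night-time false alarms described above.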