When does NewRelic start to collect metrics - performance

This might be an unusual question, but I have to be sure whether my suspicions are correct. In our company we use NewRelic to monitor our applications. From time to time I check what NewRelic says about an app I developed, and I always wonder why the average response time is much lower than what I measure manually or with some external tools, e.g.:
the average response time of one endpoint is consistently around 130 ms in NewRelic metrics
when I test it manually it is about 230-250 ms
a tool used in our company that can issue many requests over a period of time also reports an average response time of about 200 ms
(a similar difference of ~100 ms is visible on other endpoints)
Those tests are made from a location in Eastern Europe and our app is hosted in the UK, so we can assume the request needs about 40 ms to reach the servers and the same amount to come back. Another thing is that we have some infrastructure overhead such as load balancing and URL resolution, so that can add another "few" milliseconds. As you can see, when we add everything up we get the difference.
The question is: am I right? These are only my speculations, and I wasn't able to find a clear answer about where and when NewRelic starts to collect its data when we look at the whole request path:
client ---request---> web-app ---response---> client
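One way to check this (a minimal sketch of a client-side measurement, not how NewRelic instruments anything; the URL is a placeholder) is to time the same endpoint from your test location and compare the result with the server-side figure:

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    func main() {
        // Placeholder endpoint - replace with the one you monitor in NewRelic.
        const url = "https://example.com/api/endpoint"
        const runs = 20

        var total time.Duration
        for i := 0; i < runs; i++ {
            start := time.Now()
            resp, err := http.Get(url)
            if err != nil {
                panic(err)
            }
            resp.Body.Close()
            total += time.Since(start)
        }
        // The client sees network round trip + load balancer/proxy overhead + app time,
        // while a server-side agent only sees the last part.
        fmt.Printf("average client-observed time: %v\n", total/runs)
    }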

Related

Huge differences in JMeter start time and web service entry point start time on OpenShift

I am doing a load test to check performance using JMeter. My web services are deployed on OpenShift.
I was comparing the start time recorded by JMeter with the start time at the web service entry point for the same transaction, and there is a difference of some milliseconds (sometimes it is more than 1 s).
Can you please let me know what the reason could be?
I don't know what you mean by "starttime".
JMeter acts as follows:
Sends requests
Waits for the response
Measures the time in-between
If your JMeter instance is in a different geographic location than the OpenShift instance, it may take some time for the request to travel around the globe (see the Connect Time and Latency metrics): if you're in Japan and the server is in France, the request has to pass through hundreds of routers.
So if you want closer results you need to deploy JMeter in a closer geographic location, preferably on OpenShift as well.
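Outside of JMeter you can get a similar per-request breakdown yourself, e.g. with Go's httptrace (a rough sketch; the URL is a placeholder and the mapping to JMeter's metrics is approximate):

    package main

    import (
        "fmt"
        "net/http"
        "net/http/httptrace"
        "time"
    )

    func main() {
        req, err := http.NewRequest("GET", "https://example.com/", nil)
        if err != nil {
            panic(err)
        }

        var connectStart, connectDone, firstByte time.Time
        trace := &httptrace.ClientTrace{
            ConnectStart:         func(network, addr string) { connectStart = time.Now() },
            ConnectDone:          func(network, addr string, err error) { connectDone = time.Now() },
            GotFirstResponseByte: func() { firstByte = time.Now() },
        }
        req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

        begin := time.Now()
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        resp.Body.Close()

        fmt.Println("connect time:      ", connectDone.Sub(connectStart)) // roughly JMeter's Connect Time
        fmt.Println("time to first byte:", firstByte.Sub(begin))          // roughly JMeter's Latency
        fmt.Println("total elapsed:     ", time.Since(begin))             // roughly JMeter's Elapsed time
    }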
More information:
JMeter Glossary
Understanding Your Reports: Part 3 - Key Statistics Performance Testers Need to Understand

Difference in load test results

What can be the reason for the difference in the results of a load test run at different times with the SAME bandwidth?
If I run the load test at midnight the response times are better, and during the day they are really bad. Thanks for your help.
Maybe during the day the application is being used by real users and your artificial load is being added on top of the natural load?
Another option is that the network is busier during the day, so the channel bandwidth is fully utilized.
The load testing tool's own metrics don't tell the full story; you can only make assumptions by looking at the TCP connect time metric.
If you have an APM system in place you can assess what's going on with the system during the daytime and at night and detect the factors that are impacting the response time. If you don't, you can set up your own using, for example, the JMeter PerfMon Plugin.
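For the "set up your own" option, here is a very small stand-in for something like a PerfMon agent (my own sketch, not the plugin itself; it assumes Linux and simply samples the load average while the test runs):

    package main

    import (
        "fmt"
        "os"
        "time"
    )

    func main() {
        // Sample the system load average every 5 seconds during the load test.
        ticker := time.NewTicker(5 * time.Second)
        defer ticker.Stop()
        for range ticker.C {
            data, err := os.ReadFile("/proc/loadavg") // Linux-specific
            if err != nil {
                fmt.Fprintln(os.Stderr, "cannot read /proc/loadavg:", err)
                return
            }
            fmt.Printf("%s %s", time.Now().Format(time.RFC3339), data)
        }
    }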
Adding to Dmitri's note, there could be multiple reasons/causes for the difference in results.
As Dmitri pointed out, check your APM tool to see the server health while the test is executing.
Do you integrate with any downstream applications? Do these applications reside in a stable, dedicated performance testing environment, or are they live production environments? If it is the latter, you should expect extra latency in responses during the daytime.
Authentication / token validation - gateways are usually configured to validate the incoming bearer token. When you run your test during the morning, the gateway could be busy serving other real users' requests (assuming these are production AD / Okta / PingID servers).

Time reported in WILY is much lower than the LoadRunner time. Why?

I am trying to monitor the time spent on the server using WILY Introscope, but I observe that the time reported in WILY for each of the servers is in the range of 100 to 1000 ms, while the time taken for a page to load in the browser is almost 5 seconds.
Why is the tool reporting an incorrect value? How do I get the complete time in WILY?
the time reported in WILY for each of the servers is in the range of 100 to 1000 ms, while the time taken for a page to load in the browser is almost 5 seconds
The reason is: in the browser you see all the outgoing traffic from the browser. Typically a web page contains one POST request followed by multiple GET requests. The POST could be your text/html data, while the GETs could be images, CSS, JavaScript, etc.
Mostly these GET requests are answered by the web server, while the POST request is served by involving the app server.
The time reported in WILY is only the time spent on the server to serve the POST request. Your GET requests will not be captured by WILY.
Why is the tool reporting an incorrect value? How do I get the complete time in WILY?
The tool is not reporting an incorrect value. The tool sits on a JVM, so it monitors the activity of that JVM and provides its metrics. That is the expected behavior.
A page is a complex item, requiring parsing of the page contents and then requests to multiple servers/sources. So your page load time is made up of the request time for each individual component, the processing time for page parsing and JavaScript (depending on the virtual user type), requests for the page components, where they are served from, etc. Compare this to your Wily monitoring, which may only cover one of the tiers involved.
For instance, you may have static components being served from a CDN, which has zero visibility in your Wily model. You might also be looking at your app server when the majority of the time is spent serving static components off a web server, which is often ignored from a monitoring perspective. Your page could also have third-party components which get counted in the LoadRunner time but do not get counted in the Wily time.
It all comes down to a question of sampling. It is very common for what you see in your deep-diagnostics tool to be one piece of the total page load, or an individual request that makes up a page with many more components to be loaded. If you want an even more interesting look, enable the W3C time-taken field in your web server's HTTP request logs and look at the cost of every individual request. You can do this in the web layer of your app servers as well. Wily will then provide an internal breakdown for the items that are "slow".
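As a simple example of using that time-taken field, a small script can flag slow individual requests in the access log (a sketch; it assumes time-taken is the last field on each line, which depends on your log definition, and the file path is a placeholder):

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
    )

    func main() {
        f, err := os.Open("access.log") // placeholder path
        if err != nil {
            panic(err)
        }
        defer f.Close()

        scanner := bufio.NewScanner(f)
        for scanner.Scan() {
            line := scanner.Text()
            if strings.HasPrefix(line, "#") { // skip W3C header directives
                continue
            }
            fields := strings.Fields(line)
            if len(fields) < 2 {
                continue
            }
            ms, err := strconv.Atoi(fields[len(fields)-1]) // assumed: time-taken is the last field
            if err != nil {
                continue
            }
            if ms > 500 { // flag individual requests slower than 500 ms
                fmt.Println(line)
            }
        }
    }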

Performance of NewRelic Real User Monitoring

We've been using NewRelic Real User Monitoring to track performance and activity.
We've noticed that the browser metrics show the majority of the time as just network time.
Even extremely small and simple server pages show average times of 3-5 seconds, even though they are just a few KB in size and their web application and rendering times are mere milliseconds.
The site is hosted in the UK and when I run Chrome's Network Developer Tools I can see the page loading in around 50ms and then the hit to beacon-1.newrelic.com (in the USA) taking a further 500ms.
The majority of our clients do not have the luxury of high bandwidth or modern browsers, and I believe that NewRelic itself is causing them a particularly poor user experience.
Are there any ways of making the New Relic calls perform better? Can I make the New Relic call go to a local (UK or Europe) based beacon?
I don't want to turn off New Relic, but at the moment it is causing more performance issues than it is alerting us to.
New Relic real user monitoring (RUM) does not affect the page load time for your users. The 500 ms that you are seeing refers to the amount of time it takes for the RUM data we collected from your app to reach our servers here in the U.S. The data is transferred after the pages are loaded, so it doesn't affect the page load at all for your users. This 500 ms of data travel time, therefore, is not part of any of our measurements of the networking, page rendering or DOM processing time.
New Relic calculates network time by first finding the total amount of time your application takes from request to page load, and then subtracting any application server time from that total. It is assumed that the resulting amount of time is "network" time. As such, it doesn't include the amount of time it takes to send that data to New Relic's servers. See this page for more info on how RUM works:
https://newrelic.com/docs/features/how-does-real-user-monitoring-work
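In other words, the "network" figure is derived by subtraction; a toy illustration with made-up numbers (not New Relic's actual code):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Made-up numbers purely to illustrate the subtraction described above.
        total := 2300 * time.Millisecond    // request start -> page load, measured in the browser
        appServer := 180 * time.Millisecond // time reported by the server-side agent
        network := total - appServer        // everything left over is attributed to "network"
        fmt.Println("network time:", network)
    }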
If you're worried that there might be a bug or that your numbers don't look accurate, you can always file a support ticket with New Relic so we can look at your account in more detail.

Why is the latency of my GAE app serving static files so high?

I was checking the performance of my Go application on GAE, and I thought that the response time for a static file was quite high (183ms). Is it? Why is it? What can I do about it?
64.103.25.105 - - [07/Feb/2013:04:10:03 -0800] "GET /css/bootstrap-responsive.css HTTP/1.1" 200 21752 - "Go http package" "example.com" ms=183 cpu_ms=0
"Regular" 200 ms seems on the high side of things for static files. I serve a static version of the same "bootstrap-responsive.css" from my application and I can see two types of answer times:
50-100ms (most of the time)
150-500ms (sometimes)
Since I have a ping round trip of more or less 50 ms to Google App Engine, it seems the file is usually served within 50 ms or so.
I would guess the 150-300 ms response times are related to the Google App Engine frontend server being "cold cached". I presume that retrieving the file from some persistent storage involves higher latency than serving it from the frontend server's cache.
I also assume that you can hit various frontend servers and get sporadically higher latencies.
Lastly, the overall latency perceived from a browser should be closely approximated by:
(tc)ping round trip + TCP/HTTP queuing/buffering at the frontend server + file-serving application time (as seen in your Google App Engine logs) + time to transfer the file.
If the frontend server is not overloaded and the file is small, the latency should be close to ping + serving time.
In my case, 50 ms (ping) + 35 ms (serving) = 85 ms, which is quite close to what I see in my browser (95 ms).
Finally, if your app is serving a lot of requests, they may get queued, introducing a delay that is not "visible" in the application logs.
For a comparison I tested a site using tools.pingdom.com
Pingdom reported a Load time of 218ms
Here was the result from the logs:
2013-02-11 22:28:26.773 /stylesheets/bootstrap.min.css 200 35ms 45kb
Another test resulted in 238 ms from Pingdom and 2 ms in the logs.
Therefore, I would say that your 183 ms seems relatively good. There are many factors at play:
Your location relative to the server
Is the server that is serving the resource overloaded?
You could try serving the files from a Go instance instead of App Engine's static file server. I tested this some time ago; the results were occasionally faster, but the speeds were less consistent. Response times also increased under load, because an App Engine instance is limited to 10 concurrent requests. Not to mention you will be billed for the instance time.
Edit:
For a comparison to other Cloud / CDN providers see Cedexis's - Free Country Reports
You should also try setting caching headers on your static files.
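If you go the Go-instance route, a minimal sketch of serving the static files yourself with explicit caching headers could look like this (paths and max-age are just examples):

    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        // Serve files from ./static under /css/, with an explicit Cache-Control header.
        fs := http.FileServer(http.Dir("static"))
        http.Handle("/css/", http.StripPrefix("/css/", cacheFor(fs, "public, max-age=86400")))
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

    // cacheFor wraps a handler and sets the given Cache-Control value on every response.
    func cacheFor(h http.Handler, cacheControl string) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Cache-Control", cacheControl)
            h.ServeHTTP(w, r)
        })
    }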
