Ajax calls: big differences between server runtime and client waiting time

I have two REST endpoints driving some navigation in a web site. Both create nearly the same response, but one gets its data straight from the DB, whereas the other has to query a search engine (Solr) first to get some data and then make the DB calls.
If I profile both endpoints via JProfiler, I get a higher runtime (approx. 60% more) for the second one (about 31 ms vs. 53 ms). That's as expected.
Profile result:
If I view the same Ajax calls from the client side, I get a very different picture.
The faster of the two calls takes about 146 ms of waiting and network time.
The slower of the two calls takes about 1.4 seconds of waiting and network time.
Frontend timing is measured via Chrome's developer tools. The server is Tomcat 7.0.30 running in STS 3.2. Client and server live on the same system, so there should be no network latency between Tomcat and the browser; DB and Solr are external. As a side note: the faster response has the bigger payload (2.6 vs. 4.5 kB).
I have no idea why the slower of the two calls takes about 60% more server time but nearly 1000% more "frontend time" in total.
The question is: is there any way I can figure out where these timing differences originate?

By default, the CPU views in JProfiler show times in the "Runnable" thread state. If a thread reads data from a socket connection or waits for some condition, that time is not included in the "Runnable" thread state.
In the upper right corner of the CPU views there is a thread state selector. If you change that to "All states", you will get times that you can compare with the wall clock times from the browser.
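To see why this matters, here is a minimal, self-contained sketch (plain JDK; the sleep is an assumed stand-in for the Solr/database round-trip) comparing a thread's CPU time, roughly what the "Runnable" state captures, with its wall-clock time:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Demonstrates why "Runnable"-only times can look much smaller than what the
// browser sees: time spent blocked on I/O (simulated here with a sleep standing
// in for the Solr/DB round-trip) burns wall-clock time but almost no CPU time.
public class RunnableVsWallClock {
    public static void main(String[] args) throws Exception {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        long wallStart = System.nanoTime();
        long cpuStart = mx.getCurrentThreadCpuTime();

        handleRequest();

        long cpuMs = (mx.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;
        long wallMs = (System.nanoTime() - wallStart) / 1_000_000;
        System.out.printf("CPU (~ Runnable) time: %d ms, wall-clock time: %d ms%n", cpuMs, wallMs);
    }

    private static void handleRequest() throws InterruptedException {
        // Some actual computation (shows up as "Runnable" time in a profiler)...
        long dummy = 0;
        for (int i = 0; i < 20_000_000; i++) {
            dummy += i;
        }
        if (dummy < 0) System.out.println(dummy); // keep the loop from being optimized away
        // ...and a blocking external call (does NOT show up as "Runnable" time).
        Thread.sleep(400); // assumed stand-in for waiting on Solr / the database
    }
}
```

With "All states" selected, a profiler would attribute the full wall-clock figure to the method, which is the number to compare against the browser's waiting time.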

Related

Sitecore page load slowness

I'm using Sitecore instance 9.1, Solr 7.2.1, and SXA 1.8.
I have deployed the environment on Azure, and while monitoring incoming requests (to the CD instance) I've noticed slowness in loading some pages at specific times.
I've explored App Insights and found unexplainable behavior: the request is taking 28.7 seconds, while its breakdown shows executions of milliseconds. How is that possible, and how can I explain what's happening during the extra 28 seconds on the App Service?
I've checked the profiler and it shows that the thread is taking only 1042.48 ms. How is that possible?
This is an intermittent issue that happens during the day; regular requests are served within 3 to 4 seconds.
I noticed that Azure often shows a profile trace for a "similar", but completely different request when clicking from the End-to-end transaction view. You can check this by comparing the timestamp and URL of the profile trace and the transaction you clicked from.
For example, I see a transaction logged at 8:58:39 PM, 2021-09-25 with 9.1 s response time:
However, when I click the profile trace icon, Azure takes me to a trace that was captured 10 minutes earlier, at 08:49:20 PM, 2021-09-25 and took only 121.64 ms:
So, if the issue you experience is intermittent and you cannot replicate it easily, try looking at the profile traces with the Slowest wall clock time by going to Application Insights → Performance → Drill into profile traces:
This will show you the worst-performing requests captured by the profiler at the top of the list:
In order to figure out why it is slow, you'll need to understand what happens internally, e.g.:
How is the wall clock time spent while processing your request?
Are there any locks internally?
The source of that data is dynamic profiling, which Azure can do on demand.
The IIS stats report would show you the slowest requests, so you could look into the thread time distribution to see where those 28 seconds are spent:
In Sitecore, when the application starts, the initial prefetch configuration is used to pre-populate the prefetch caches. Pre-heated prefetch caches help reduce the processing time of incoming requests, but loading them takes time during the initial stage.
A Sitecore XP instance can also take too long to load because of a performance issue in the CatalogRepository.GetCatalogItems method; it will be fixed in upcoming updates.
See the Sitecore knowledge base.
In Sitecore XP 9.0 the initial prefetch configuration was revised. The prefetch cache for the Core database was configured to include items that are used to render the Sitecore Client interface.
The Sitecore Client interface is not used on Content Delivery instances. Disabling the initial prefetch configuration for the Core database helps avoid excessive resource consumption on the SQL Server hosting the Core database.
Change the configuration of the Core database in the \App_Config\Sitecore.config file:
Refer to the Sitecore knowledge base.

Web Server Performance Degradation

The web application is built with Spring Boot and deployed on WebLogic.
We have configured 400 max threads and a JDBC pool of 100 connections.
When we perform load testing on the web application, the performance is optimal when the load is low (the response time is less than 200 ms for most of the HTTP requests that we call).
When we increase the load, we can see that the thread count and the JDBC connection count increase gradually, but nowhere near the maximum. However, the response time gets much longer and can exceed 5 seconds.
CPU usage, thread count, memory and JDBC connections all seem normal during this period.
Another observation: while the performance was degrading during testing, we used another machine to make an HTTP call to the server that only retrieves text, without any DB access or logic, and even this simple HTTP call took 10 s to respond. (And the server resources were still not maxed out!)
So we are wondering: what keeps them waiting?
Are there any other possible bottlenecks?
If the server doesn't lack resources like CPU/RAM, only a profiler can tell you where your application spends most of its time, which might be:
Waiting in a queue for the next thread/DB connection from the pool to become available
Slow database queries
Inefficient functions/algorithms which are subject to optimization
WebLogic configuration not suitable for high loads
JVM configuration not suitable for high loads (e.g. the system is doing garbage collection too often or for too long)
So I would recommend re-running your test with profiler telemetry enabled, while at the same time monitoring essential JVM metrics using, for example, the JMXMon Sample Collector, which can be used for monitoring your application-specific metrics as well. It's a plugin that can be installed using the JMeter Plugins Manager.
For a detailed approach on how to go about identifying poor thread performance, I suggest you take a look at the TSA Method by Brendan Gregg; a minimal sampler in that spirit is sketched below.
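As a starting point for that kind of thread state analysis, here is a minimal sketch using only the JDK (no profiler attached; the 5-second sampling interval is an arbitrary assumption). It periodically counts how many JVM threads are RUNNABLE versus WAITING/BLOCKED, which quickly shows whether requests are doing work or queuing for something:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Minimal TSA-style sampler: counts threads per state so you can see whether
// request threads are actually running or mostly waiting/blocked on something.
public class ThreadStateSampler {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (true) {
            Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                if (info != null) {
                    counts.merge(info.getThreadState(), 1, Integer::sum);
                }
            }
            // e.g. {RUNNABLE=12, BLOCKED=8, WAITING=350, TIMED_WAITING=30}
            System.out.println(counts);
            Thread.sleep(5_000); // sampling interval (arbitrary)
        }
    }
}
```

Run inside the application (or adapt it to a remote JMX connection); a large and growing WAITING/BLOCKED count while RUNNABLE stays low points to pool exhaustion or lock contention rather than CPU work.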

Performance of NewRelic Real User Monitoring

We've been using New Relic Real User Monitoring to track performance and activity.
We've noticed that the browser metrics are showing the majority of time is just Network times.
Even extremely small and simple server pages are showing average times of 3-5 seconds, even though they are just a few k in size and their Web application times and rendering times are mere milliseconds.
The site is hosted in the UK and when I run Chrome's Network Developer Tools I can see the page loading in around 50ms and then the hit to beacon-1.newrelic.com (in the USA) taking a further 500ms.
The majority of our clients do not have the luxury of high bandwidth or modern browsers and I believe that NewRelic itself is causing them a particularly poor user experience.
Are there any ways of making the New Relic calls perform better? Can I make the New Relic call go to a local (UK or Europe based) beacon?
I don't want to turn off New Relic, but at the moment it is causing more performance issues than it is alerting us to.
New Relic real user monitoring (RUM) does not affect the page load time for your users. The 500 ms that you are seeing refers to the amount of time it takes for the RUM data we collected from your app to reach our servers here in the U.S. The data is transferred after the pages are loaded, so it doesn't affect the page load at all for your users. This 500 ms of data travel time, therefore, is not part of any of our measurements of the networking, page rendering or DOM processing time.
New Relic calculates network time by first finding the total amount of time your application takes from request to page load, and then subtracting any application server time from that total. It is assumed that the resulting amount of time is "network" time. As such, it doesn't include the amount of time it takes to send that data to New Relic's servers. See this page for more info on how RUM works:
https://newrelic.com/docs/features/how-does-real-user-monitoring-work
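To make that arithmetic concrete, here is a minimal sketch; both numbers are invented assumptions, not values from this question:

```java
// Illustration of the RUM arithmetic described above: total page-load time
// minus reported application server time is attributed to "network".
public class RumNetworkTime {
    public static void main(String[] args) {
        long totalPageLoadMs = 3200; // request start -> page load, as measured in the browser (assumed)
        long appServerMs = 40;       // application server time reported by the agent (assumed)
        long networkMs = totalPageLoadMs - appServerMs; // remainder attributed to "network"
        System.out.println("network time ~ " + networkMs + " ms");
    }
}
```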
If you're worried that there might be a bug or that your numbers don't look accurate, you can always file a support ticket with New Relic so we can look at your account in more detail.

wcf operation times out without error

I have a .NET 3.5 BasicHttpBinding no security WCF service hosted on IIS 6.0.
I have service throttling bumped up as per MS recommendations.
My service operation is getting called a few hundred times concurrently, and at some point the client gets a timeout exception (59:00, which is what's set in the server and client timeouts).
If I raise the timeout it just hits the new limit.
It seems like the application just "freezes" somewhere and we have not been able to figure out how this happens.
WCF tracing on the server side doesn't come up with anything.
Any ideas regarding what could be the issue?
Thanks
I assume your web service is not using the new async/await, especially with regard to the database calls. In that case it's because you are blocking your limited threads.
In more detail: IIS/ASP.NET only creates a limited number of threads to handle requests. The first, say, 8 requests spin up threads and start working. At some point they will hit the database (I am assuming a traditional n-tier app) and those threads sleep. The next, say, 992 requests hit IIS and are held in a queue.
At some point the database calls return, the threads process the results and send data to the client. Another 8 requests are dequeued, hit the database, and so on.
However, each set of 8 requests takes a finite time to complete. With over 900 requests ahead of them and only 8 being processed at a time, the last requests will wait at the very least on the order of 100 * latency * number of roundtrips before they can even start. If your latency * number of roundtrips is high, your last request will take a long time before it even gets dequeued, hence the timeout.
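A back-of-envelope sketch of how that wait grows; all numbers here are illustrative assumptions, not values from the question:

```java
// Rough estimate of how long the last queued request waits when a small pool
// of worker threads processes blocking requests batch by batch.
public class QueueWaitEstimate {
    public static void main(String[] args) {
        int workerThreads = 8;        // requests processed concurrently (assumed)
        int queuedRequests = 992;     // requests waiting behind them (assumed)
        double dbRoundtripMs = 50;    // latency of one database roundtrip (assumed)
        int roundtripsPerRequest = 4; // DB calls per request (assumed)

        // Each "batch" of workerThreads requests blocks for roughly
        // roundtripsPerRequest * dbRoundtripMs before the next batch starts.
        double batchTimeMs = roundtripsPerRequest * dbRoundtripMs;
        double batchesAhead = Math.ceil(queuedRequests / (double) workerThreads);
        double waitForLastRequestMs = batchesAhead * batchTimeMs;

        System.out.printf("Last queued request waits ~%.0f ms (~%.0f s)%n",
                waitForLastRequestMs, waitForLastRequestMs / 1000);
    }
}
```

Scale the latency or the number of roundtrips up and the wait for the tail of the queue quickly reaches whatever timeout you configure.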
Two remedies exist. The first, creating more threads, will use up all your memory and crash IIS. The second is to use .NET 4.5 and async/await.
See here for more information

What do the times mean in Google Chrome's timeline in the network panel?

Often when troubleshooting performance using Google Chrome's network panel, I see different times and wonder what they mean.
Can someone validate that I understand these properly:
Blocking: Time blocked by the browser's limit on multiple requests to the same domain (???)
Waiting: Waiting for a connection from the server (???)
Sending: Time spent transferring the file from the server to the browser (???)
Receiving: Time spent by the browser analyzing and decoding the file (???)
DNS Lookup: Time spent resolving the hostname.
Connecting: Time spent establishing a socket connection.
Now how would someone fix long blocking times?
Now how would someone fix long waiting times?
Sending is time spent uploading the data/request to the server. It occurs between blocking and waiting. For example, if I post back an ASPX page this would indicate the amount of time it took to upload the request (including the values of the forms and the session state) back to the ASP server.
Waiting is the time after the request has been sent, but before a response from the server has been received. Basically this is the time spent waiting for a response from the server.
Receiving is the time spent downloading the response from the server.
Blocking is the amount of time between the UI thread starting the request and the HTTP GET request getting onto the wire.
The order these occur in is:
Blocking*
DNS Lookup
Connecting
Sending
Waiting
Receiving
*Blocking and DNS Lookup might be swapped.
The network tab does not indicate time spent processing.
If you have long blocking times then the machine running the browser is running slowly. You can fix this by upgrading the machine (more RAM, faster processor, etc.) or by reducing its workload (turn off services you do not need, closing programs, etc.).
Long wait times indicate that your server is taking a long time to respond to requests. This either means:
The request takes a long time to process (like if you are pulling a large amount of data from the database, large amounts of data need to be sorted, or a file has to be found on an HDD which needs to spin up).
Your server is receiving too many requests to handle all requests in a reasonable amount of time (it might take .02 seconds to process a request, but when you have 1000 requests there will be a noticeable delay).
The two issues (long waiting + long blocking) are related. If you can reduce the workload on the server by caching, adding new servers, and reducing the work required for active pages, then you should see improvements in both areas.
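If you want to measure these phases yourself outside the browser, here is a rough, self-contained sketch using plain JDK sockets against an assumed host. It approximates DNS lookup, connecting, sending, waiting (time to first byte) and receiving; it cannot reproduce the browser-internal blocking/queueing time:

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

// Rough client-side breakdown of the phases Chrome shows for a plain HTTP request.
public class TimingPhases {
    public static void main(String[] args) throws Exception {
        String host = "example.com"; // assumption: any reachable HTTP host
        int port = 80;

        long t0 = System.nanoTime();
        InetAddress addr = InetAddress.getByName(host);        // DNS lookup
        long t1 = System.nanoTime();

        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(addr, port)); // connecting
            long t2 = System.nanoTime();

            OutputStream out = socket.getOutputStream();
            out.write(("GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));     // sending
            out.flush();
            long t3 = System.nanoTime();

            InputStream in = socket.getInputStream();
            int firstByte = in.read();                         // waiting (time to first byte)
            long t4 = System.nanoTime();

            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) { /* drain the response */ } // receiving
            long t5 = System.nanoTime();

            System.out.printf("DNS %d ms, connect %d ms, send %d ms, wait %d ms, receive %d ms%n",
                    ms(t1 - t0), ms(t2 - t1), ms(t3 - t2), ms(t4 - t3), ms(t5 - t4));
        }
    }

    private static long ms(long nanos) { return nanos / 1_000_000; }
}
```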
You can read a detailed official explanation from the Google team here. It is a really helpful resource, and the information you want is under the Timeline view section.
Resource network timing shows the same information as the resource bar in the timeline view. Answering your question:
DNS lookup: Time spent performing the DNS lookup (you need to find out the IP address of site.com, and this takes time).
Blocking: Time the request spent waiting for an already established connection to become available for re-use. As was said in another answer, it does not depend on your server; this is the client's problem.
Connecting: Time it took to establish a connection, including TCP handshakes/retries, a DNS lookup, and time connecting to a proxy or negotiating a secure socket layer (SSL). Depends on network congestion.
Sending: Time spent sending the request. Depends on the size of the sent data (which is mostly small, because your request is almost always a few bytes unless you submit a big image or a huge amount of text), network congestion, and the proximity of the client to the server.
Waiting: Time spent waiting for the initial response. This is mostly the time your server needs to process and respond to your request: how fast your server is at calculating things, fetching records from the database, and so on.
Receiving: Time spent receiving the response data. Similar to sending, but now you are getting data from the server (the response is usually bigger than the request), so it also depends on the size, connection quality and so on.
"Blocking: Time the request spent waiting for an already established connection to become available for re-use. As was said in another answer it does not depend on your server - this is client's problem."
I do not agree with the statement above. All else being the same [my machine's workload], my browser shows very little "Blocking" time for one website and a long "Blocking" time for some other website.
So if waiting for one of the six threads + proxy negotiation** is high, it is mostly because of the cascading effect of the server's slowness OR bad page design [too much being sent across the wire, too many times].
** Whatever "Proxy Negotiation" means! Nobody explains this very well, particularly where no local/CDN proxy is actually involved.
