I have a web role with a co-located cache. There are two instances of this role.
Even on a cache hit, the turnaround time for our request comes to a few seconds. Upon analysis we found that the cache takes 1 second on average to return the data. However, IIS logs suggest that servicing the whole request takes about 4 seconds. There is no intermediate operation before or after data retrieval from the cache.
What could be wrong here? What would be a good way to analyse the problem?
For what it's worth we were having a similar problem with caching in Redis in Azure and a RESTful API.
The problem turned out to be the serialization of data.
Some ways to debug the problem:
Download the ANTS profiler (it has a free trial) and profile your worker role locally.
Enable profiling for your worker role, deploy it, run it for a bit, then download the profile file in Visual Studio. (You can use Server Explorer to find your instance and download the log).
Download the Azure tool kit (http://blogs.msdn.com/b/kwill/archive/2013/08/26/azuretools-the-diagnostic-utility-used-by-the-windows-azure-developer-support-team.aspx) on your instance. It has things like Process Explorer that can tell you how much memory your role is taking, how much CPU, what it's doing on the network etc.
You can contact Azure support and have them help you profile your application. We did that and got absolutely amazing support. They talked with us on the phone for hours and helped us profile our code.
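To check whether serialization, rather than the cache round trip itself, accounts for the time, you can time a serialize/deserialize cycle of the cached object outside the cache. The following is only a minimal sketch: CachedItem and SerializationTimer are hypothetical names, and DataContractSerializer stands in for whatever serializer your cache client actually uses, so the absolute numbers will differ from what the cache does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical payload type; substitute the object you actually cache.
[DataContract]
public class CachedItem
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

public static class SerializationTimer
{
    // Serializes and deserializes the value once, in memory.
    // Returns the elapsed time; also reports the serialized size in bytes
    // and the deserialized copy so the caller can sanity-check the result.
    public static TimeSpan MeasureRoundTrip<T>(T value, out long sizeInBytes, out T copy)
    {
        // Assumption: DataContractSerializer approximates the real serializer.
        var serializer = new DataContractSerializer(typeof(T));
        var stopwatch = Stopwatch.StartNew();
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            sizeInBytes = stream.Length;
            stream.Position = 0;
            copy = (T)serializer.ReadObject(stream);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

If the measured time or the serialized size is large for the objects you cache, caching smaller objects or a more compact representation is worth trying before looking at the cache cluster itself.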
You really should increase the log level for both client and server (refer to In-Role Cache Troubleshooting and Diagnostics (Windows Azure Cache)) and take a look at the performance counters. If read operations (GET) are taking a long time, there may be paging on one of the instances, or the server may be overloaded. If you see any performance issue on the cache instances, you should reassess the capacity using Capacity Planning Considerations for In-Role Cache (Windows Azure Cache).
If this doesn't help then please open a support ticket.
Related
I have an Azure App Service running an MVC Web API. It connects to a DB and Redis cache. The calls to the API are taking a massive amount of time or timing out. I have stripped back the methods to be doing pretty much nothing.
public HttpResponseMessage GetData()
{
    // Stripped-down action: returns a constant payload without touching the DB or cache.
    return Request.CreateResponse(HttpStatusCode.OK, "abc");
}
I still have the same issue. Neither the web server nor the DB is under any pressure; both are below 30%. I'm at a loss to know where to even start.
This method is called quite a lot, so there may be a lot of concurrent requests. I'm running an S3 App Service plan, which should more than suffice.
Any suggestions on how to troubleshoot this would be greatly welcome. I can only think it is down to the number of simultaneous requests.
As far as I know, this issue is often caused by:
requests taking a long time
application using high memory/CPU
application crashing due to an exception.
I suggest you first enable diagnostics logging for your web app.
This gives you detailed error messages, failed request tracing, and web server logging for your web application.
For more details, refer to the link below:
Enable diagnostics logging for web apps in Azure App Service
You could also use the Azure App Service Support Portal.
It lets you troubleshoot issues related to your web app by looking at HTTP logs, event logs, process dumps, and more.
You can access all this information using our Support portal at http://.scm.azurewebsites.net/Support
For more details, refer to the link below:
New Updates to Support Site Extension for Azure Websites
Besides, you could also refer to this article: Troubleshoot slow web app performance issues in Azure App Service
I have a rather strange scenario. We have a range of Web APIs hosted in the cloud. We consume those services in our Windows 8 application. The problem is that when the services run locally a request takes less than 400 ms, but when hosted on Windows Azure some requests take up to 20 seconds. I have checked the indexes of our database tables and they are fine. I have no clue what to profile or how to improve the performance.
Thanks!
Thanks a lot, everyone!
But I found a way to use dotTrace (an excellent profiling tool) on the Azure deployment. Here is the link:
http://blog.maartenballiauw.be/post/2013/03/13/Remote-profiling-Windows-Azure-Cloud-Services-with-dotTrace.aspx
You can also use Windows Azure Diagnostics and the Stopwatch class to log all timings to the WAD tables.
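The Stopwatch approach can be sketched roughly as follows. TimedOperation is a hypothetical helper, not part of any SDK, and it only writes a trace message; whether that message ends up in the WAD tables depends on the diagnostics trace listener and transfer settings configured for your role, which this sketch assumes are already in place.

```csharp
using System;
using System.Diagnostics;

public static class TimedOperation
{
    // Runs the action, writes the elapsed time as a trace message,
    // and returns the elapsed time to the caller.
    // The message is written even if the action throws.
    public static TimeSpan Run(string name, Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            action();
        }
        finally
        {
            stopwatch.Stop();
            Trace.TraceInformation("{0} took {1} ms", name, stopwatch.ElapsedMilliseconds);
        }
        return stopwatch.Elapsed;
    }
}
```

Wrapping each suspect step (database query, cache lookup, external call) in its own TimedOperation.Run call makes it easier to see which step accounts for the difference between local and hosted timings.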
I also found out from another thread that the first request to the Azure service is always slow. I have copied the answer here:
Serkan, you would first need to make clear in your post whether you have published a Cloud Service or a Website to Windows Azure. The answer to your question differs depending on whether it is a Cloud Service (a web role) or a Website. As you want to learn more, I will explain what goes on behind the scenes.
As you suggested that your first connection is slow, I can see that happening with Windows Azure Websites. Windows Azure Websites run in a shared pool of resources and use the concept of hot (active) and cold (inactive) sites: if a website has no active connection for x amount of time, the site goes into a cold state, meaning the host IIS process exits. When a new connection is made to that website, it takes a few seconds to get the site ready and working. Depending on how your first page is coded, the time to load the site for the first time varies. A similar discussion is logged here: Very slow opening MySQL connection using MySQL Connector for .net
With Windows Azure Cloud Services the overall application model is different. Your web role has its own IIS server, fully dedicated to your application, so the Websites limitation above does not apply; however, there could be other reasons for slow page loads. If you are using a web role, you could run a page load profiler first and connect to your Azure instance over Remote Desktop (RD) to collect the page load data and see what else you could do to boost performance.
You'll obviously need to profile your app to find the real cause. Check out these two articles which should get you started:
http://msdn.microsoft.com/en-us/library/windowsazure/hh369930.aspx
http://www.windowsazure.com/en-us/develop/net/common-tasks/profiling-in-visual-studio/
Sometimes when I access my Windows Azure website, the initial response time is very slow. After the first page load the website is fast. Some background: the website is not visited very often at the moment. Further, I am using a keepalivecontroller to keep the website running, and the website is running in Shared mode. I am wondering: are websites that are not very active removed from memory in Windows Azure? Or is it just that background tasks at the operational level of Windows Azure sometimes interfere? It is not transparent to me what is happening, so is there some SLA or something for Windows Azure websites?
There is now a new feature available for Windows Azure Websites in 'Reserved' mode that will keep your website warm. You can now turn on "Always-on" under the "Configuration"-tab on your Azure Website. As explained in this blog post:
When the new “Always On” feature is enabled on a site, “Windows
Azure will automatically ping your website regularly to ensure that
the website is always active and in a warm/running state,” Guthrie
writes. “This is useful to ensure that a site is always responsive
(and that the app domain or worker process has not paged out due to
lack of external HTTP requests).”
The easiest way to keep a website warm is to call it regularly using the Scheduler feature in Windows Azure Mobile Services.
You simply write a script in the Scheduler that pings your website every x minutes.
Here's a post covering how to do that: http://fabriccontroller.net/blog/posts/job-scheduling-in-windows-azure/
The Windows Azure Web Sites are still in preview, so there is currently no SLA with that service.
The Web Sites do idle out when in free or in Shared mode, which is likely what you are seeing. When the site idles out it actually is removed from memory, and indeed the IIS process host running the site is shut down. This is how they can get the density of hosting 100 sites on the same VM.
You can find a lot of info on the Channel9 site about why this is the case, or, as a shameless plug, here is an article that talks about how the process is handled.
Now, you mentioned that you were using a keepalivecontroller, but what exactly do you mean by that? I use pingdom.com to constantly request data from one of my websites, and that seems to do pretty well. It is still possible that no request comes in before the idle time is reached, which then cycles the site. It is also possible that, even if you always have the site running, the VM the site sits on needs to have its underlying OS updated, in which case Azure would move the site process to another VM, which could also cause a slow start-up on the next request.
I'd start logging your application start ups and then look through your logs to see how often that is happening.
If you only need to warm it up once (vs. keeping it warm) and are mostly trying to keep your customers from experiencing cold page starts, I believe the correct tool is IIS Application Initialization. You can configure it with a list of URLs to hit before it deems the app ready for action.
My site suffers from cold page starts, and that is severely magnified in Azure Websites (even on an S3), but it is absolutely speedy after a page has been served that first time, thanks to several layers of caching. (Our inefficient use of Umbraco's dynamic nodes query language creates a lot of database churn, which we're cleaning up opportunistically.)
From what I've read, and from my own web.config attempts, this is still not available in Azure Websites. I've asked Microsoft for it here: MS IDEA: Application Initialization to warm up specific pages when app pool starts. Please consider voting for it.
For each service/site you need to go to "Configure", then switch "Always On" to ON. Also make sure you click Save; it took my website about 2 minutes before noticing the change.
Why this is not the default is kind of mind boggling, because my setup on HostGator was running much faster than Azure. I guess Microsoft is figuring if nobody is accessing your site, it's okay if it has a long load time.
I'm using the role-based caching feature for a Windows Azure web role, configured as co-located.
I've followed the steps given in the Windows Azure docs for caching (preview). I get the following error:
ErrorCode <ERRCA0017>:SubStatus<ES0006>:There is a temporary failure.
Please retry later. (One or more specified cache servers are
unavailable, which could be caused by busy network or servers. For
on-premises cache clusters, also verify the following conditions.
Ensure that security permission has been granted for this client
account, and check that the AppFabric Caching Service is allowed
through the firewall on all cache hosts. Also the MaxBufferSize on the
server must be greater than or equal to the serialized object size
sent from the client.). Additional Information : The client was trying
to communicate with the server: net.tcp://127.255.0.4:20010/.
I'm running everything on localhost, using the local development storage, and my cache client is in the same role as the server. I have changed many configuration attributes, but I always get that exception or a similar one such as "cannot connect to tcp....".
I'd appreciate some help. Thanks.
There are a couple of things that could be going wrong with your application.
The very first thing is to make sure that you have SDK 1.7 on your machine, even with Windows Azure Caching Services installed, and then verify that your reference is set to Windows Azure Cache (not to the Windows Server AppFabric SDK). I have seen this misconfiguration lead to such errors in the past.
Next, have you changed the identifier in your dataCacheClient section to your role name, as described in the documentation link here? If you follow the documentation as described, you should not hit any error. To check what could be wrong, you can create exactly the same application as described in this link and see whether that works.
To get more detailed errors, increase the DataCacheFactoryConfiguration.ChannelOpenTimeout value to something longer, e.g. 2 minutes, than the default 20 seconds, as described here. This will help you get details about the inner exception, which may lead to the actual root cause of your problem.
We use Azure co-located caching (not in preview anymore) as our session backer and have fairly regular outages. About once a month.
We tried using the Enterprise Library Transient Fault Handling block, but our instances still hang when caching experiences problems. I think that the transient fault code would work for data caching, but for session backing there is some activity closer to the metal that we can't seem to code against.
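For explicit data cache calls (as opposed to session backing, where the calls are made for us), the retry idea looks roughly like the sketch below. This is a hand-rolled illustration, not the Enterprise Library API: Retry.Execute, its parameters, and the isTransient predicate are all hypothetical, and deciding which exceptions count as transient is left to the caller.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs the operation, retrying with a doubling delay when it fails
    // with an exception the caller classifies as transient.
    // Rethrows the last exception once maxAttempts is reached,
    // or immediately if the exception is not transient.
    public static T Execute<T>(
        Func<T> operation,
        int maxAttempts,
        TimeSpan initialDelay,
        Func<Exception, bool> isTransient)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException("maxAttempts");
        }

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                if (attempt >= maxAttempts || !isTransient(ex))
                {
                    throw;
                }
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

A call site would wrap a single cache read, for example Retry.Execute(() => cache.Get(key), 3, TimeSpan.FromMilliseconds(200), ex => ex is TimeoutException), where cache, key, and the choice of TimeoutException are placeholders for your own client and error classification.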
The error codes have become more informative over the last year and go something like...
ErrorCode:SubStatus:The request timed out..
Additional Information : The client was trying to communicate with the
server: net.tcp://10.xx.xxx.xx:xxxxx/.
Our best guess so far, from experimenting and from MS support, is that each co-located cache role instance (or at least one of them) needs to know the IPs of all the other instances. Since Azure can destroy and re-create instances whenever it wants, the dependent instances sometimes fail to get updated. This is secret sauce for Azure, but it is not a secret when our site goes down. I'm looking for more information on this and would like to see how others are working around the issue.
One possible workaround: one of our talented platform administrators found that resetting IIS on the instances and adding two more instances seems to help. This makes sense to me because it gives caching another chance to gather all the required info about the other instances. This is NOT CONFIRMED to solve the problem, but if it works again during the next outage it could be a valid workaround.
So say I've got an MVC app hosted in the cloud somewhere, meaning I don't have access to IIS or any infrastructure.
All I have control over is the App code itself, and what comes down to the client.
My goal
To collect data over time on how well the MVC app is performing in terms of response times.
Current Problems
I can get a lot of data from Google Analytics and other client-side tricks, but that won't tell me if, say, the App Pool is recycling too often.
Similarly, if I put stopwatches in the actions, that won't tell me about any delays in the app startup (if it has to start up again).
Also, if I do put a stopwatch in the action, it doesn't take into account any delays in rendering the View. For example, even though it's bad practice, there might be a DB call being made from the View, and my action metrics won't take that into account.
My Question
So, if I want to get true metrics of how long requests are taking overall from multiple clients and users, where are the best places to put stopwatches in the app? Or is it impossible to get true metrics from the app itself, so that I have to place counters outside of the app (like in IIS)?
Add New Relic, it's available for free as part of the AppHarbor service - https://appharbor.com/addons/newrelic
Since you mention "in the cloud somewhere", are you using Microsoft Azure for hosting? If so, there are some great diagnostics you can log to your Azure storage with DiagnosticMonitorConfiguration.
Here's a tutorial on how to add diagnostics to your web and worker roles. You can find a full list of performance counters on MSDN.
You can get everything from application requests/second, memory and CPU utilization, network adapter statistics, output cache hits/misses, request execution time, etc.