I have an IIS server running on Windows Server 2003, hosting multiple websites.
Occasionally the CPU load peaks for long periods, to the point where the server stops responding or responds with noticeable lag.
The problem is that we don't know which of the websites is creating the high load. I have looked through Performance Monitor for usable counters, but I don't see anything about CPU load for specific IIS instances.
This makes it quite hard to find the root of the problem.
For each application pool there is a separate w3wp.exe process, so assign each web application to its own application pool (this is good practice anyway).
Then map each running w3wp.exe process to its application pool.
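On Windows Server 2003 / IIS 6.0 the bundled iisapp.vbs script does exactly that (a sketch; run it in a command prompt on the server):

cscript %systemroot%\system32\iisapp.vbs

It prints each w3wp.exe PID together with its AppPoolId.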
Then you can see which web application is creating the high CPU load via Task Manager (add the PID column) or Performance Monitor.
Have you tried checking the performance counters for Gen 2 garbage collection spikes (# Gen 2 Collections)?
Periods of very high CPU load are often attributable to excessive garbage collection.
EDIT: This SO answer might be useful: What are the best ASP.NET performance counters to monitor?
This blog post describes collecting and interpreting GC Performance Counters: http://blogs.msdn.com/maoni/archive/2004/06/03/148029.aspx
Using WMI, try the following.
To get process usage (W2K3/2K8):
"SELECT IDProcess, PercentPrivilegedTime, PercentProcessorTime, PercentUserTime FROM Win32_PerfFormattedData_PerfProc_Process where Name='w3wp'"
To identify your site, use this:
"SELECT ProcessId, CommandLine, WorkingSetSize, ThreadCount, PrivatePageCount, PageFileUsage, PageFaults, HandleCount, CreationDate, Caption FROM Win32_Process where Caption='w3wp.exe'"
Use this tool to test the WQL queries: http://code.msdn.microsoft.com/NitoWMI
Good luck.
Related
In our application there are more than 2,000 pages deployed on the production server. Sometimes, when users browse certain URLs, the CPU spikes above 70%. I cannot find out when this occurs or which URL causes it. Can anyone recommend a good open-source tool to monitor the w3wp.exe process and log its CPU utilization and the requested URLs whenever the CPU spikes above 50%?
procdump + windbg
There is a Sysinternals tool called procdump which can automatically create a memory dump of your process for analysis when CPU usage exceeds a threshold.
From the command line usage:
-c CPU threshold at which to create a dump of the process.
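A concrete invocation might look like this (a sketch; the threshold and dump count are assumptions, and if several w3wp.exe instances are running you should target a PID instead of the image name):

procdump -ma -c 80 -s 10 -n 3 w3wp.exe

This writes up to three full dumps (-ma) once CPU usage stays above 80% (-c 80) for 10 consecutive seconds (-s 10).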
Once you have a process dump, you will need to load it into windbg to analyze what's taking up all the CPU cycles. Covering windbg fully is a big topic, but here's briefly what you need to do:
load the SOS dll (the managed debugging extension)
call the !runaway command to get a list of long-running threads
dive into a long-running thread by selecting it and calling the !clrstack command
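Inside windbg, with the dump loaded, that sequence looks roughly like this (a sketch; thread 5 is just an example, and on .NET 4+ you would load SOS with .loadby sos clr instead):

.loadby sos mscorwks
!runaway
~5s
!clrstack

!runaway lists user-mode CPU time per thread; ~5s switches to the thread with debugger id 5, and !clrstack shows its managed call stack.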
There are many blogs on using windbg. Here is one example. A great resource on analyzing these types of issues is Tess Ferrandez's blog.
perfmon + procdump + windbg
Perfmon can help you see whether the issue is related to high rates of memory allocation driving garbage collection. You can look at CPU for w3wp as well as the allocation rate for the process and the number of Gen 2 collections occurring. A Gen 2 collection also collects Gen 1 and Gen 0, so it can be an expensive operation. Counters to look at:
# Gen 2 Collections
% Time in GC
Allocated Bytes/sec
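You could watch these from the command line with typeperf (a sketch; the instance name assumes a single worker process, otherwise instances show up as w3wp#1, w3wp#2, and so on):

typeperf "\.NET CLR Memory(w3wp)\# Gen 2 Collections" "\.NET CLR Memory(w3wp)\% Time in GC" "\.NET CLR Memory(w3wp)\Allocated Bytes/sec"

typeperf prints one sample per second until you press Ctrl+C.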
If you see very high allocation rates, you will still need a memory dump (procdump) and windbg to analyse the root cause.
Again, Tess Ferrandez has a blog post on this flavor of high CPU; in that post the issue is allocating large objects onto the heap.
perfmon + appcmd
I haven't tried this myself, but in theory it should work, and it is simpler than the other options, though it will not produce the same level of detail. You can configure a perfmon alert on CPU for w3wp.exe, and the alert can be configured to run a task. Create a batch file which runs the appcmd IIS tool and tells it to dump all the running requests:
appcmd list requests > c:\temp\high-cpu-requests.txt
This way you will get a list of long-running requests whenever the CPU is high, and hopefully you will be able to work out the offending page from there.
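A minimal batch file for the alert task might look like this (a sketch; the output path is an assumption):

@echo off
rem Appends a timestamped snapshot of running requests; wired up as the perfmon alert task
echo %date% %time% >> c:\temp\high-cpu-requests.txt
%windir%\system32\inetsrv\appcmd.exe list requests >> c:\temp\high-cpu-requests.txt

Appending (>>) rather than overwriting keeps one snapshot per alert, which makes recurring offenders easier to spot.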
IIS Advanced Logging may help you here.
Whilst it will not give you CPU utilisation per request, it can log CPU utilisation in general. What you could do is try to match those spikes to the requests that come just before them.
I have a self-hosted SignalR application. Everything is fine, but when there are more than 5000 users, they start reconnecting rapidly. I know that the default value of appConcurrentRequestLimit is 5000, so I ran this:
cd %windir%\system32\inetsrv
appcmd.exe set config /section:system.webServer/serverRuntime /appConcurrentRequestLimit:100000
but nothing changed. I also increased maxConcurrentRequestsPerCPU and requestQueueLimit according to this,
but I still have the problem.
I'm using Windows Server 2012 and IIS 8.
You are shooting in the dark here: you have no data about the actual performance or about what is happening. The users could be reconnecting for different reasons (server timeouts, regular-interval reconnects, server errors); there are countless possibilities.
The correct way to find out what is happening and to measure performance is to run a baseline load test against the default configuration and collect the relevant performance counters: current requests, queued requests, current connections, maximum connections, etc.
You should also collect any relevant error logs on the server that could help you figure out what is happening.
You can find the full list of performance counters you need below:
Memory
.NET CLR Memory\# Bytes in all Heaps (for w3wp)
ASP.NET
ASP.NET\Requests Current
ASP.NET\Requests Queued
ASP.NET\Requests Rejected
CPU
Processor Information\% Processor Time
TCP/IP
TCPv6\Connections Established
TCPv4\Connections Established
Web Service
Web Service\Current Connections
Web Service\Maximum Connections
Threading
.NET CLR LocksAndThreads\# of current logical Threads
.NET CLR LocksAndThreads\# of current physical Threads
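One way to collect these for the duration of a test run is a logman data collector (a sketch; the collector name, interval, and paths are assumptions, and counters.txt is a text file listing the counter paths above, one per line):

logman create counter SignalRBaseline -cf counters.txt -si 5 -o c:\perflogs\signalr-baseline
logman start SignalRBaseline
rem ... run the load test ...
logman stop SignalRBaseline

The resulting log opens directly in Performance Monitor for graphing.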
Once you have your baseline performance results on a graph, you can modify the configuration (e.g. the number of concurrent requests, as you tried above), re-run the test, and collect the same performance counters again.
The performance counter results will speak for themselves and will lead you to a solution.
You can generate the load with a tool like Crank:
https://github.com/SignalR/SignalR/tree/dev/src/Microsoft.AspNet.SignalR.Crank
In addition you can also check the SignalR troubleshooting guide:
http://www.asp.net/signalr/overview/testing-and-debugging/troubleshooting
I have created a test plan for creating a user profile.
I want to run my test plan for 100 users. When I run it for 10 users with a ramp-up time of 2 seconds, it runs successfully; but when I try it with 100 users or more, it fails, even though I give a ramp-up time of 40 seconds for the 100 users.
I am not able to understand what the problem might be.
In my test plan the thread users are differentiated by id.
Thanks in advance.
It's a broad question; this behavior can be caused by any of the following:
Your application under test can't handle the load of 100 threads. Check the logs for errors, and make sure the application/web server and/or database configuration allows 100+ concurrent connections. You can also look at the "Latency" metric to see whether the problem is in the infrastructure or in the application itself.
Your load generator machine can't create 100 concurrent threads. If so, you'll need to consider JMeter Distributed Testing (see the example after this list).
Your script isn't optimized, e.g. it uses memory-consuming listeners such as "View Results Tree", graph listeners, or regular expression extractors. Try following the JMeter Performance and Tuning Tips guide and see whether that resolves your issue.
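If the load generator is the bottleneck, a distributed run is started like this (a sketch; the host addresses and file names are assumptions, and each remote host must be running jmeter-server):

jmeter -n -t testplan.jmx -R 192.168.0.10,192.168.0.11 -l results.jtl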
Agree with Dmitri; the reason could be one of the above three.
One more thing you can try:
run JMeter in GUI mode to validate your script, and after validation run it in non-GUI mode, which saves a lot of memory and CPU processing (the GUI is the heaviest part of JMeter).
You can run your JMeter script in non-GUI mode like this (-H and -P are needed only if you go through a proxy):
jmeter -n -t testplan.jmx -l results.jtl [-H proxyhost -P proxyport]
Generally, on a single dual-core machine with 2 GB RAM (the load generator in your case), a 100-user test can be carried out successfully.
Some more things you can look at to find the actual bottleneck:
1. Check the application server logs (the server on which your application is hosted).
If there are failures in them, look at the performance counters on that server (CPU, memory, network, etc.) to see whether anything is overloaded
(if the server is Windows, check with perfmon; if Linux, try sar).
If something is overloaded, the reason is that your app server can't take the load of 100 users;
try tuning it further.
2. Check the load generator's performance counters (JVM heap usage, CPU, memory, etc.).
If the JVM heap size is too small, try increasing it (see the snippet after this list); if other counters are overloaded, try distributed load testing.
3. Remove unwanted/heavy listeners and assertions from the script.
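The JMeter heap is set in the startup script; in jmeter.bat the line looks roughly like this (a sketch; the exact line varies between JMeter versions, and the size must fit your machine's RAM):

set HEAP=-Xms1g -Xmx1g

Edit that line, save, and restart JMeter for the new heap size to take effect.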
maybe this will help :)
I have a Windows Server 2008 machine with Plesk running two web sites.
Sometimes the server slows down, and there is a named.exe process making the CPU peak at 100%.
It lasts a short period of time, and after a while it happens again.
I would like to know what this process is for and how to configure it so that it stops consuming all the CPU and slowing my sites down.
This must be the DNS service, also known as BIND. High CPU usage may indicate one of the following:
DNS is re-reading its configuration. In this case the high CPU usage will coincide with your activities in Plesk, i.e. adding and removing domains.
Someone (normally another DNS server) is pulling data from your DNS server. That is a normal process, and since you say it only lasts a short time, it does not look like a DNS DDoS.
AFAIK there is no built-in way in Windows to stop software from taking 100% CPU when no other application needs the CPU at that moment.
See "DNS Treewalk Suite" system, off the process, and uses the antivirus.
Check the error "log" in the system.
I'm using a kind of load balancer over a small cluster that is able to achieve >2000 rps on zero-duration requests (i.e. ones that are satisfied immediately by the worker nodes).
But as soon as the requests stop being zero-duration and start taking even 1 ms, performance immediately drops by more than 10x. The data being transferred in both directions is identical and is about 2 KB in size.
This is definitely not related to saturation of the cluster or of network throughput: 200 rps of 1 ms requests is a very light load, and the network is 10 Gbit. Besides, the CPU load is only some 2-5%, both on the load balancer and on the worker nodes.
I wonder whether this might be related to some pathological behavior of the OS scheduler or the OS network stack (i.e. some special-case behavior for very short interactions).
How might I diagnose the reason? Which perf counters should I watch? What tools or methodologies should I use?
(Just in case someone simply knows the answer to my particular problem: I'm talking about MS HPC Server 2008 R2's "WCF Broker", running on Windows Server 2008 R2 over Hyper-V.)
One thing you can do is use ETW tracing to try to understand what the nodes are doing while your WCF job is running. On HPC Server, I sometimes clusrun xperf to collect traces on all or specific nodes. There are a number of tools you can use for analyzing ETW traces, including xperf itself. I haven't done any serious work using HPC SOA (WCF), but I did write a simple WCF raytracer app and then used xperf to profile it on several of the nodes.
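A minimal capture pass might look like this (a sketch; DiagEasy is one of the stock kernel provider groups, and the output path is an assumption):

clusrun xperf -on DiagEasy
rem ... reproduce the slow 1 ms requests ...
clusrun xperf -d c:\temp\trace.etl

Each node writes its own trace locally; the .etl files can then be opened in xperf to look at context switching, disk I/O, and DPC/interrupt activity around the short requests.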
It turned out to be a completely network-unrelated issue, having to do with peculiarities of the scheduling mechanism of HPC Server. I resolved it by setting the configuration option "serviceRequestPrefetchCount" to 0 in the loadBalancing section of the WCF service config file.
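For reference, the fragment in question would look something like this (a sketch based on the description above; the surrounding elements of the service registration file are omitted):

<loadBalancing serviceRequestPrefetchCount="0" />

The idea is that with prefetching disabled, requests no longer sit queued on a node behind an in-flight request, so short requests stop being serialized.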
I'm assuming that there are some shared resources with some kind of locking system in place? Is locking a bottleneck? It's hard to guess without seeing the system.
Do you have a way to profile the workers? What are they spending most of their time on, especially in the fast vs slow scenarios?