High CPU load but low CPU usage and RAM usage - performance

I am running a mobile website that shows the live running status of any train in India: http://www.spoturtrain.com. The code is written entirely in PHP; Nginx is the web server and php-fpm is the application server, and all PHP requests are proxied to the app server. During peak traffic hours in the morning the system load shoots up to 4, but CPU% and memory usage stay low. Please take a look at the snapshot of the top command of the server.

The %CPU displayed in the bottom section is per-thread: it is the percentage of one CPU core used by the indicated thread. The CPU(s) section indicates how much of the total available CPU is being utilized, so one thread can report that it is using 100% CPU while only 25% (on 4 cores) or 12.5% (on 8 cores) of the overall CPU cycles are being consumed.
Analyzing thread CPU usage on Linux
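The arithmetic behind those two figures can be written out as a small Python sketch. The function name is made up for illustration, and the core counts are just the two examples used above.

```python
def overall_cpu_percent(thread_percent: float, cores: int) -> float:
    """Convert a per-thread %CPU (relative to one core) into a share of the whole machine."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return thread_percent / cores


# One thread pegged at 100% of a single core:
print(overall_cpu_percent(100.0, 4))  # 25.0
print(overall_cpu_percent(100.0, 8))  # 12.5
```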
You don't really ask a question, so it's hard to tell whether you want advice or just want the numbers explained. As @Charles states, a typical "acceptable" load is 1 per CPU core before noticeable performance degradation occurs, but in the case of PHP running on most web servers, you may (but probably won't in most cases) start noticing problems at anything above 1. Whether or not you do will largely depend on your disk and network I/O.
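As a rough sketch of the "1 per CPU core" rule of thumb, the load average can be divided by the number of cores. The core counts below are hypothetical, since the question does not say how many cores the server has; `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Load average normalised by core count; about 1.0 means every core has one runnable task."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


def current_load_per_core() -> float:
    """1-minute load average of this machine divided by its logical CPU count (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)


# The load of 4 from the question, on hypothetical core counts:
for cores in (1, 2, 4):
    print(cores, load_per_core(4.0, cores))
```

On Linux the load average also counts tasks blocked in uninterruptible I/O wait, so a value above 1 per core alongside low %CPU usually points at tasks waiting on I/O rather than computing.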
Whether or not the performance is acceptable for your application isn't something I can answer, but you can take a look at this thread for more places to jump into the options for getting your web server to thread requests.
What is thread safe or non thread safe in PHP
Whether or not you can do anything about it depends on your hosting situation.

Related

Monitor and create logs of W3WP.exe process CPU utilization and request URL's when CPU spikes more than 50%

Our application has more than 2000 pages deployed on the production server. Sometimes, when users browse certain URLs, CPU usage spikes above 70%. I cannot find out when this occurs or which URL causes it. Can anyone recommend the best open source tool to monitor and log W3WP.exe process CPU utilization and the request URLs when the CPU spikes above 50%?
procdump + windbg
There is a Sysinternals tool called procdump which can automatically create a memory dump of your process for analysis when CPU usage exceeds a threshold.
From the command line usage:
-c CPU threshold at which to create a dump of the process.
Once you have a process dump you will need to load it into windbg to analyze what is taking up all the CPU cycles. Covering windbg in full is a big topic, but briefly, here is what you need to do:
load the SOS dll (managed debug extension)
call the !runaway command to get list of long running threads
dive into a long running thread by selecting it and calling !clrstack command
There are many blogs on using windbg. Here is one example. A great resource on analyzing these types of issues is Tess Ferrandez's blog.
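For the dump-capture step, a small Python sketch can assemble the procdump command line. It only builds the argument list; the process ID, the 50% threshold, the 5-second duration and the dump count are placeholder values, and the flags used (`-ma` for a full dump, `-c` for the CPU threshold, `-s` for consecutive seconds, `-n` for the number of dumps) should be checked against the usage text of your procdump version.

```python
from typing import List


def procdump_command(pid: int, cpu_threshold: int = 50,
                     seconds: int = 5, dumps: int = 3) -> List[str]:
    """Build a procdump invocation that writes full dumps while CPU stays above the threshold."""
    return [
        "procdump",
        "-ma",                      # full memory dump, needed for managed (SOS) analysis
        "-c", str(cpu_threshold),   # CPU threshold in percent
        "-s", str(seconds),         # consecutive seconds above the threshold
        "-n", str(dumps),           # number of dumps to write before exiting
        str(pid),                   # hypothetical process ID of the w3wp.exe worker
    ]


print(" ".join(procdump_command(1234)))
```

The resulting list can be passed to `subprocess.run` on the server once the worker process ID is known.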
perfmon + procdump + windbg
Perfmon can help you see if the issue is related to high rates of memory allocation which is causing garbage collection. You can look at CPU for w3wp as well as allocation rates for the process and the number of Gen 2 collections occurring. Gen 2 collections mean Gen 1 and 0 are also collected, meaning it can be an expensive operation. Counters to look at:
# Gen 2 Collections
% Time in GC
Allocated Bytes/sec
If you see some very high allocation rates, you will still need a memory dump (procdump) and windbg to analyse what the root cause is.
Again - Tess Ferrandez has a blog post on this flavor of high cpu. In this post the issue is allocating large objects onto the heap.
perfmon + appcmd
I haven't tried this myself, but in theory it should work, and it is simpler than the other options, though it will not produce the same level of detail. You can configure perfmon alerts on CPU for w3wp.exe. The alerts can be configured to run a task. You can create a batch file which runs the appcmd IIS tool and tells it to dump all the running requests:
appcmd list requests > c:\temp\high-cpu-requests.txt
This way you will get a list of long-running requests when the CPU is high, and hopefully be able to work out the offending page from there.
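The batch file above overwrites the same file every time the alert fires. A possible variation, sketched in Python and equally untested against a live IIS server, writes each snapshot to a timestamped file instead. The appcmd location is the usual default install path and is an assumption, as are the function names.

```python
import subprocess
from datetime import datetime
from pathlib import PureWindowsPath

APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"  # assumed default install location
OUTPUT_DIR = PureWindowsPath(r"c:\temp")            # same folder as the batch file example


def snapshot_path(now: datetime) -> PureWindowsPath:
    """Timestamped file name so that successive alerts do not overwrite each other."""
    return OUTPUT_DIR / f"high-cpu-requests-{now:%Y%m%d-%H%M%S}.txt"


def dump_running_requests(now: datetime) -> PureWindowsPath:
    """Run `appcmd list requests` and save its output (Windows with IIS only)."""
    target = snapshot_path(now)
    result = subprocess.run([APPCMD, "list", "requests"],
                            capture_output=True, text=True, check=True)
    with open(str(target), "w", encoding="utf-8") as handle:
        handle.write(result.stdout)
    return target
```

The perfmon alert task would then call `dump_running_requests(datetime.now())`.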
IIS Advanced Logging may help you here.
Whilst it will not give you CPU utilisation per request, it can log CPU utilisation in general. What you could do is try to match these spikes to the requests that come before them.
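Matching spikes to preceding requests could look roughly like the Python sketch below. The log entries, the spike time and the 15-second window are invented for illustration; real data would have to be parsed out of the IIS log files first.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Request = Tuple[datetime, str]  # (time the request was logged, URL)


def requests_before_spike(requests: List[Request], spike: datetime,
                          window: timedelta) -> List[str]:
    """Return the URLs logged within `window` before (and up to) the spike time."""
    start = spike - window
    return [url for logged_at, url in requests if start <= logged_at <= spike]


# Hypothetical log entries and spike time
log = [
    (datetime(2024, 1, 1, 9, 0, 0), "/home"),
    (datetime(2024, 1, 1, 9, 0, 50), "/reports/all"),
    (datetime(2024, 1, 1, 9, 0, 58), "/search"),
    (datetime(2024, 1, 1, 9, 2, 0), "/home"),
]
spike = datetime(2024, 1, 1, 9, 1, 0)
print(requests_before_spike(log, spike, timedelta(seconds=15)))
# ['/reports/all', '/search']
```

URLs that keep showing up in the window before several different spikes are the first candidates to investigate.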

Node.js CPU load balancing

I created a JMeter test to measure the performance of the Ghost blogging platform. Ghost is written in Node.js and was installed on a cloud server with 1 GB RAM and 1 CPU.
Up to 400 concurrent users the load is normal; beyond that, JMeter starts reporting errors. I decided to increase capacity and added 1 CPU.
The errors persisted, so I added 2 more CPUs, for 4 CPUs in total. The problem still occurs after 400 concurrent users.
I don't understand why 4 CPUs give the same result as 1 CPU.
During monitoring I noticed that only one CPU is busy while the other 3 are idle. The JMeter summary in the console shows errors on about 5% of requests. See screenshot.
Is it possible to balance the load between CPUs?
Are you using the cluster module to load-balance, on Node 0.10.x?
If so, please update Node.js to 0.11.x.
Node 0.10.x relied on the balancing algorithm provided by the operating system. In 0.11.x the algorithm was changed, so the load will be distributed more evenly from now on.
Node.js is famously single-threaded (see this answer): a single node process will only use one core (see this answer for a more in-depth look), which is why you see that your program fully uses one core, and that all other cores are idle.
The usual solution is to use the cluster core module of Node, which helps you launch a cluster of Node processes to handle the load, by allowing you to create child processes that all share the same server ports.
However, you can't really use this without fixing Ghost's code. An option is to use pm2, which can wrap a node program, by using the cluster module for you. For instance, with four cores:
$ pm2 start app.js -i 4
In theory this should work, except if Ghost relies on some global variables (that can't be shared by every process).
Use the cluster core module, with nginx for load balancing. That's the bad part about Node.js: it is a fantastic framework, but the developer has to get into the load-balancing mess, while Java and other runtimes make it seamless. Anyway, nothing is perfect.

What can interfere with testing a server's performance?

My HTTP server can't take load tests... It gives really high latency when multiple connections are made.
Server Configuration:
5 instances of (CPU 0.5vCore, Memory 512MB, Disk 20GB)
A load balancer
10G shared bandwidth
When I transfer a 3.5 MB zip, it takes about 1 second with a single connection. However, with more than 30 connections it goes up to 20~50 seconds.
I am testing with JMeter on my laptop. Is there a possibility that my testing environment interferes with the load-testing?
If so, what would be a solution to improve my testing environment?
First of all you need to monitor and pin down the problem(s).
Start off by picking up information on these four layers:
CPU Usage
Memory Usage
Network Usage
I/O Usage
All of them on the OS layer. (Monitoring tools will vary depending on your OS).
Once you have this data and can narrow down the problem (CPU bound, network latency, I/O latency or whatever), an answer will emerge. Doing this (if it is the first time you are testing your app) will also give you scaling information about your environment and your application in general.
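Before collecting those metrics, a quick back-of-the-envelope calculation on the numbers in the question can hint at where to look. The Python sketch below assumes the file is 3.5 megabytes, that exactly 30 connections download it at once, and that protocol overhead can be ignored; none of that is confirmed by the question.

```python
def aggregate_mbit_per_s(connections: int, size_mb: float, seconds: float) -> float:
    """Total throughput in Mbit/s when `connections` transfers of `size_mb` each finish in `seconds`."""
    return connections * size_mb * 8 / seconds


single = aggregate_mbit_per_s(1, 3.5, 1)          # one connection, about 1 second
loaded_fast = aggregate_mbit_per_s(30, 3.5, 20)   # 30 connections, 20 seconds each
loaded_slow = aggregate_mbit_per_s(30, 3.5, 50)   # 30 connections, 50 seconds each

print(round(single, 1), round(loaded_fast, 1), round(loaded_slow, 1))
# 28.0 42.0 16.8
```

Under these assumptions the aggregate throughput stays in the same range (roughly 17 to 42 Mbit/s) whether there is 1 connection or 30, which would be consistent with a fixed-capacity bottleneck somewhere on the path. That could be the laptop's own network link just as well as the load balancer or the instances, so the network usage of the test machine is worth measuring alongside the server.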

What is named.exe process and how to avoid consuming high CPU rates

I have a Windows Server 2008 with Plesk running two web sites.
Sometimes the server slows down and a named.exe process pushes the CPU to 100%.
It lasts a short period of time and, after a while, happens again.
I would like to know what this process is for and how to configure it so that it does not consume this much CPU and slow my sites down.
This must be the DNS service, also known as BIND. High CPU usage may indicate one of the following:
DNS is re-reading its configuration. In this case the high CPU usage should line up with your activity in Plesk, i.e. adding and removing domains.
Someone (normally another DNS server) is pulling data from your DNS server. This is a normal process. Since you say it lasts only a short time, it doesn't look like a DNS DDoS.
AFAIK there is no default way in Windows to restrict software from taking 100% CPU if no other apps require CPU at the moment.
See "DNS Treewalk Suite" system, off the process, and uses the antivirus.
Check the error log in the system.

monitoring application (CPU and cache usage) on single Linux box with 80 cores

I am looking for a performance monitoring tool for my application which will collect and visualize, in real time, the CPU and cache usage on a single Linux box such as an IBM System or HP ProLiant with a typical configuration of 8 processors / 80 cores.
The application is home-grown multithreaded C++ code which uses OpenMP.
This monitoring tool should not run 24 hours per day; it should not do e-mail notification.
I will run this tool just before sending commands to my apps; the apps will then execute the command (which takes a few minutes at most). During this time interval I need to analyze:
- usage of cores
- data movement between processors
- usage of L1, L2, L3 caches
- some other metrics (help me here) which can help to find bottlenecks in application performance and resource utilization
I guess that tools like Nagios / Zabbix are too heavy for this task.
On the other hand, using command-line tools like "top" and "sar" for 80 cores is not very convenient, and plotting (not necessarily real-time) would be nice to have...
While getting the per-core usage is rather easy, the other values might prove impractical to obtain, at least not without running the application within a profiler of some sort.
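For the easy part, per-core usage on Linux can be derived from two samples of /proc/stat. The Python sketch below is one way to do it, assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the sample values are made up for illustration.

```python
import time
from typing import Dict, Tuple


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each per-core 'cpuN' line to (idle_ticks, total_ticks)."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like 'cpu0 ...'; the bare 'cpu' line is the aggregate.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(value) for value in fields[1:9]]  # user .. steal
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        cores[fields[0]] = (idle, sum(ticks))
    return cores


def busy_percent(before: str, after: str) -> Dict[str, float]:
    """Percentage of non-idle time per core between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    usage = {}
    for name, (idle_after, total_after) in second.items():
        idle_before, total_before = first[name]
        elapsed = total_after - total_before
        busy = elapsed - (idle_after - idle_before)
        usage[name] = 100.0 * busy / elapsed if elapsed else 0.0
    return usage


def sample(interval: float = 1.0) -> Dict[str, float]:
    """Take two live snapshots `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as handle:
        before = handle.read()
    time.sleep(interval)
    with open("/proc/stat") as handle:
        after = handle.read()
    return busy_percent(before, after)


# Made-up snapshots for a 2-core machine
BEFORE = "cpu  200 0 100 1700 0 0 0 0\ncpu0 100 0 50 850 0 0 0 0\ncpu1 100 0 50 850 0 0 0 0\n"
AFTER = "cpu  280 0 105 1815 0 0 0 0\ncpu0 170 0 55 875 0 0 0 0\ncpu1 110 0 50 940 0 0 0 0\n"
print(busy_percent(BEFORE, AFTER))
# {'cpu0': 75.0, 'cpu1': 10.0}
```

Calling `sample()` in a loop while the application executes a command yields one value per core per interval, which can be written to a file and plotted afterwards.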
Measuring QPI utilization is highly non-trivial, if possible at all. Intel's VTune might be able to acquire such data, but only when running an instrumented version of your binaries.
Also, on x86 there is no way to figure out L1, L2 or L3 usage of any kind. You can grab the low-level CPU counters to measure cache misses, though (but you would probably need instrumented/profiled binaries, and always together with something like VTune or PAPI).
You could "easily" set up something to pull all the lower-level performance counters into SNMP and grab the SNMP values via standard SNMP-capable monitoring tools, but be aware that SNMP polling is something that you don't want to occur more than 1-2 times per second at most. Or pull that info into something like collectd.
I also have the impression that you don't understand the problem domain of monitoring tools. They are not meant to be used as low-level analysis probes for finding application-level or system bottlenecks; at best you could get some hints about which resource (from a 10K feet view) is running at full utilization. Monitoring and alerting tools are something that operations staff use to understand which part of their IT system is currently used and how, to gather historical data and predict future resource utilization, and to be alerted when something breaks.
SiteScope, Hyperic or any combination of shell scripts, native OS utilities and a DB to store the results may do the job.

Resources