Bad health due to com.cloudera.cmon.agent.DnsTest timeout - hadoop

Problems:
More and more data nodes become bad health in Cloudera Manager.
Clue1:
no any task or job, just an idle data node here,
top
-bash-4.1$ top
top - 18:27:22 up 4:59, 3 users, load average: 4.55, 3.52, 3.18
Tasks: 139 total, 1 running, 137 sleeping, 1 stopped, 0 zombie
Cpu(s): 14.8%us, 85.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7932720k total, 1243372k used, 6689348k free, 52244k buffers
Swap: 6160376k total, 0k used, 6160376k free, 267228k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13766 root 20 0 2664m 21m 7048 S 85.4 0.3 190:34.75 java
17688 root 20 0 2664m 19m 7048 S 75.5 0.3 1:05.97 java
12765 root 20 0 2859m 21m 7140 S 36.9 0.3 133:25.46 java
2909 mapred 20 0 1894m 113m 14m S 1.0 1.5 2:55.26 java
1850 root 20 0 1469m 62m 4436 S 0.7 0.8 2:54.53 python
1332 root 20 0 50000 3000 2424 S 0.3 0.0 0:12.04 vmtoolsd
2683 hbase 20 0 1927m 152m 18m S 0.3 2.0 0:36.64 java
Clue2:
-bash-4.1$ ps -ef|grep 13766
root 13766 1850 99 16:01 ? 03:12:54 java -classpath /usr/share/cmf/lib/agent-4.6.3.jar com.cloudera.cmon.agent.DnsTest
Clue3:
in cloudera-scm-agent.log,
[30/Aug/2013 16:01:58 +0000] 1850 Monitor-HostMonitor throttling_logger ERROR Timeout with args ['java', '-classpath', '/usr/share/cmf/lib/agent-4.6.3.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[30/Aug/2013 16:01:58 +0000] 1850 Monitor-HostMonitor throttling_logger ERROR Failed to collect java-based DNS names
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 53, in collect
result, stdout, stderr = self._subprocess_with_timeout(args, self._poll_timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 42, in _subprocess_with_timeout
return SubprocessTimeout().subprocess_with_timeout(args, timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/subprocess_timeout.py", line 70, in subprocess_with_timeout
raise Exception("timeout with args %s" % args)
Exception: timeout with args ['java', '-classpath', '/usr/share/cmf/lib/agent-4.6.3.jar', 'com.cloudera.cmon.agent.DnsTest']
"cloudera-scm-agent.log" line 30357 of 30357 --100%-- col 1
Backgrouds:
if I restart all nodes, then everythings are OK, but after half and hour or more, bad health is coming one by one.
Version: Cloudera Standard 4.6.3 (#192 built by jenkins on 20130812-1221 git: fa61cf8559fbefeb5af7f223fd02164d1a0adfdb)
I added all nodes in /etc/hosts
the installed CDH is 4.3.1.
in fact, these nodes are VMs with fixed IP address.
Any suggestions?
BTW, where can I download source code of com.cloudera.cmon.agent.DnsTest?

Related

Aggregate Top CPU Using Processes

When I run top -n 1 -d 2 | head -n 12; it returns processor usage for some processes sorted by %cpu desc as desired, but I'm not convinced that the results are aggregated as they should be. I'm wanting to put these results in a file maybe like
while true; do
top -n 1 -d 2 | head -n 12;
done > top_cpu_users;
When I run top -d 2; interactively, I first see some results, then two seconds later I see the results updated and they appear to be aggregated over the last two seconds. The first results do not appear to be aggregated in the same way.
How do I get top cpu users every two seconds aggregated over the previous two seconds?
top will always capture a first full scan of process info for use as a baseline. It uses that to initialize the utility's database of values used for later comparative reporting. That is the basis of the first report presented to the screen.
The follow-on reports are the true measures for the specified evaluation intervals.
Your code snippet will therefore never provide what you are really looking for.
You need to skip the results from the first scan and only use the follow on reports, but the only way to do that is to generate them from a single command by specifying the count of scans desired, then parse the resulting combined report.
To that end, here is a proposed solution:
#!/bin/bash
output="top_cpu_users"
rm -f ${output} ${output}.tmp
snapshots=5
interval=2
process_count=6 ### Number of heavy hitter processes being monitored
top_head=7 ### Number of header lines in top report
lines=$(( ${process_count} + ${top_head} )) ### total lines saved from each report run
echo -e "\n Collecting process snapshots every ${interval} seconds ..."
top -b -n $(( ${snapshots} + 1 )) -d ${interval} > ${output}.tmp
echo -e "\n Parsing snapshots ..."
awk -v max="${lines}" 'BEGIN{
doprint=0 ;
first=1 ;
}
{
if( $1 == "top" ){
if( first == 1 ){
first=0 ;
}else{
print NR | "cat >&2" ;
print "" ;
doprint=1 ;
entry=0 ;
} ;
} ;
if( doprint == 1 ){
entry++ ;
print $0 ;
if( entry == max ){
doprint=0 ;
} ;
} ;
}' ${output}.tmp >${output}
more ${output}
The session output for that will look like this:
Collecting process snapshots every 2 seconds ...
Parsing snapshots ...
266
531
796
1061
1326
top - 20:14:02 up 8:37, 1 user, load average: 0.15, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 1.0 sy, 0.0 ni, 98.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.6 free, 2753.7 used, 767.6 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31773 root 20 0 0 0 0 I 1.5 0.0 0:09.08 kworker/0:3-events
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.04 top
1503 mysql 20 0 2387360 20664 2988 S 0.5 0.5 3:10.11 mysqld
2250 ericthe+ 20 0 1949412 130004 20272 S 0.5 3.5 0:46.16 caja
3104 ericthe+ 20 0 4837044 461944 127416 S 0.5 12.3 81:26.50 firefox
29998 ericthe+ 20 0 2636764 165632 54700 S 0.5 4.4 0:36.97 Isolated Web Co
top - 20:14:04 up 8:37, 1 user, load average: 0.14, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 0.7 sy, 0.0 ni, 97.4 id, 0.4 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.7 used, 767.6 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3104 ericthe+ 20 0 4837044 462208 127416 S 3.0 12.3 81:26.56 firefox
1503 mysql 20 0 2387360 20664 2988 S 1.0 0.5 3:10.13 mysqld
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.06 top
1489 root 20 0 546692 61584 48956 S 0.5 1.6 17:23.78 Xorg
2233 ericthe+ 20 0 303744 11036 7500 S 0.5 0.3 4:46.84 compton
7239 ericthe+ 20 0 2617520 127452 44768 S 0.5 3.4 1:41.13 Isolated Web Co
top - 20:14:06 up 8:37, 1 user, load average: 0.14, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.4 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.7 used, 767.6 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1489 root 20 0 546700 61584 48956 S 1.5 1.6 17:23.81 Xorg
3104 ericthe+ 20 0 4837044 462208 127416 S 1.5 12.3 81:26.59 firefox
1503 mysql 20 0 2387360 20664 2988 S 0.5 0.5 3:10.14 mysqld
2233 ericthe+ 20 0 303744 11036 7500 S 0.5 0.3 4:46.85 compton
2478 ericthe+ 20 0 346156 10368 8792 S 0.5 0.3 0:22.97 mate-cpufreq-ap
2481 ericthe+ 20 0 346540 11148 9168 S 0.5 0.3 0:41.73 mate-sensors-ap
top - 20:14:08 up 8:37, 1 user, load average: 0.14, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.5 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.6 used, 767.7 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.08 top
3104 ericthe+ 20 0 4837044 462208 127416 S 0.5 12.3 81:26.60 firefox
18370 ericthe+ 20 0 2682392 97268 45144 S 0.5 2.6 0:55.36 Isolated Web Co
19436 ericthe+ 20 0 2618496 123608 52540 S 0.5 3.3 1:55.08 Isolated Web Co
26630 ericthe+ 20 0 2690464 179020 56060 S 0.5 4.8 1:45.57 Isolated Web Co
29998 ericthe+ 20 0 2636764 165632 54700 S 0.5 4.4 0:36.98 Isolated Web Co
top - 20:14:10 up 8:37, 1 user, load average: 0.13, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.5 us, 0.9 sy, 0.0 ni, 96.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.6 used, 767.7 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3104 ericthe+ 20 0 4837076 463000 127416 S 7.5 12.3 81:26.75 firefox
1489 root 20 0 546716 61584 48956 S 1.5 1.6 17:23.84 Xorg
1503 mysql 20 0 2387360 20664 2988 S 1.0 0.5 3:10.16 mysqld
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.10 top
2233 ericthe+ 20 0 303744 11036 7500 S 0.5 0.3 4:46.86 compton
2481 ericthe+ 20 0 346540 11148 9168 S 0.5 0.3 0:41.74 mate-sensors-ap

apache running slow without using all ram

I have a centos server running apache with 8GB ram. It is running very slowly to load simple php pages. I have set the following on my config file. I cannot see any httpd process more than 100m. It usually slows down after 5mins from a restart.
<IfModule prefork.c>
StartServers 12
MinSpareServers 12
MaxSpareServers 12
ServerLimit 150
MaxClients 150
MaxRequestsPerChild 1000
</IfModule>
$ ps -ylC httpd | awk '{x += $8;y += 1} END {print "Apache Memory Usage (MB): "x/1024; print "Average Proccess Size (MB): "x/((y-1)*1024)}'
Apache Memory Usage (MB): 1896.09
Average Proccess Size (MB): 36.4633
What else can I do to make the pages load faster.
$ free -m
total used free shared buffers cached
Mem: 7872 847 7024 0 29 328
-/+ buffers/cache: 489 7382
Swap: 7999 934 7065
top - 15:42:17 up 545 days, 16:46, 2 users, load average: 0.05, 0.06, 0.
Tasks: 251 total, 1 running, 250 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 2.3%sy, 0.0%ni, 97.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.
Mem: 8060928k total, 909112k used, 7151816k free, 30216k buffers
Swap: 8191992k total, 956880k used, 7235112k free, 336612k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16544 apache 20 0 734m 47m 10m S 0.0 0.6 0:00.21 httpd
16334 apache 20 0 731m 45m 10m S 0.0 0.6 0:00.41 httpd
16212 apache 20 0 723m 37m 10m S 0.0 0.5 0:00.72 httpd
16555 apache 20 0 724m 37m 10m S 0.0 0.5 0:00.25 httpd
16347 apache 20 0 724m 36m 10m S 0.0 0.5 0:00.42 httpd
16608 apache 20 0 721m 34m 10m S 0.0 0.4 0:00.16 httpd
16088 apache 20 0 717m 31m 10m S 0.0 0.4 0:00.35 httpd
16012 apache 20 0 717m 30m 10m S 0.0 0.4 0:00.78 httpd
16338 apache 20 0 716m 30m 10m S 0.0 0.4 0:00.36 httpd
16336 apache 20 0 715m 29m 10m S 0.0 0.4 0:00.42 httpd
16560 apache 20 0 716m 29m 9.9m S 0.0 0.4 0:00.06 httpd
16346 apache 20 0 715m 28m 10m S 0.0 0.4 0:00.28 httpd
16016 apache 20 0 714m 28m 10m S 0.0 0.4 0:00.74 httpd
16497 apache 20 0 715m 28m 10m S 0.0 0.4 0:00.18 httpd
16607 apache 20 0 714m 27m 9m S 0.0 0.4 0:00.17 httpd
16007 root 20 0 597m 27m 15m S 0.0 0.3 0:00.13 httpd
16694 apache 20 0 713m 26m 10m S 0.0 0.3 0:00.10 httpd
16695 apache 20 0 712m 25m 9.9m S 0.0 0.3 0:00.04 httpd
16554 apache 20 0 712m 25m 10m S 0.0 0.3 0:00.15 httpd
16691 apache 20 0 598m 14m 2752 S 0.0 0.2 0:00.00 httpd
22613 root 20 0 884m 12m 6664 S 0.0 0.2 132:10.11 agtrep
16700 apache 20 0 597m 12m 712 S 0.0 0.2 0:00.00 httpd
16750 apache 20 0 597m 12m 712 S 0.0 0.2 0:00.00 httpd
16751 apache 20 0 597m 12m 712 S 0.0 0.2 0:00.00 httpd
2374 root 20 0 2616m 8032 1024 S 0.0 0.1 171:31.74 python
9699 root 0 -20 50304 6488 1168 S 0.0 0.1 1467:01 scopeux
9535 root 20 0 644m 6304 2700 S 0.0 0.1 21:01.24 coda
14976 root 20 0 246m 5800 2452 S 0.0 0.1 42:44.70 sssd_be
22563 root 20 0 825m 4704 2636 S 0.0 0.1 44:07.68 opcmona
22496 root 20 0 880m 4540 3304 S 0.0 0.1 13:54.78 opcmsga
22469 root 20 0 856m 4428 2804 S 0.0 0.1 1:18.45 ovconfd
22433 root 20 0 654m 4144 2752 S 0.0 0.1 10:45.71 ovbbccb
22552 root 20 0 253m 2936 1168 S 0.0 0.0 50:35.27 opcle
22521 root 20 0 152m 1820 1044 S 0.0 0.0 0:53.57 opcmsgi
14977 root 20 0 215m 1736 1020 S 0.0 0.0 15:53.13 sssd_nss
16255 root 20 0 254m 1704 1152 S 0.0 0.0 92:07.63 vmtoolsd
24180 root -51 -20 14788 1668 1080 S 0.0 0.0 9:48.57 midaemon
I do not have access to root
I have updated it to the following which seems to be better but i see occasional 7GB httpd process
<IfModule prefork.c>
StartServers 12
MinSpareServers 12
MaxSpareServers 12
ServerLimit 150
MaxClients 150
MaxRequestsPerChild 0
</IfModule>
top - 09:13:42 up 546 days, 10:18, 2 users, load average: 1.86, 1.51, 0.78
Tasks: 246 total, 2 running, 244 sleeping, 0 stopped, 0 zombie
Cpu(s): 28.6%us, 9.5%sy, 0.0%ni, 45.2%id, 16.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8060928k total, 7903004k used, 157924k free, 2540k buffers
Swap: 8191992k total, 8023596k used, 168396k free, 31348k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2466 apache 20 0 14.4g 7.1g 240 R 100.0 92.1 4:58.95 httpd
2285 apache 20 0 730m 31m 7644 S 0.0 0.4 0:02.37 httpd
2524 apache 20 0 723m 23m 7488 S 0.0 0.3 0:01.75 httpd
3770 apache 20 0 716m 21m 10m S 0.0 0.3 0:00.29 httpd
3435 apache 20 0 716m 20m 9496 S 0.0 0.3 0:00.60 httpd
3715 apache 20 0 713m 19m 10m S 0.0 0.2 0:00.35 httpd
3780 apache 20 0 713m 19m 10m S 0.0 0.2 0:00.22 httpd
3778 apache 20 0 713m 19m 10m S 0.0 0.2 0:00.28 httpd
3720 apache 20 0 712m 18m 10m S 0.0 0.2 0:00.21 httpd
3767 apache 20 0 712m 18m 10m S 0.0 0.2 0:00.21 httpd
3925 apache 20 0 712m 17m 10m S 0.0 0.2 0:00.11 httpd
2727 apache 20 0 716m 17m 7576 S 0.0 0.2 0:01.66 httpd
2374 root 20 0 2680m 14m 2344 S 0.0 0.2 173:44.40 python
9699 root 0 -20 50140 5556 624 S 0.0 0.1 1475:46 scopeux
3924 apache 20 0 598m 5016 2872 S 0.0 0.1 0:00.00 httpd
3926 apache 20 0 598m 5000 2872 S 0.0 0.1 0:00.00 httpd
14976 root 20 0 246m 2400 1280 S 0.0 0.0 42:51.54 sssd_be
9535 root 20 0 644m 2392 752 S 0.0 0.0 21:07.36 coda
22563 root 20 0 825m 2000 952 S 0.0 0.0 44:16.37 opcmona
22552 root 20 0 254m 1820 868 S 0.0 0.0 50:48.12 opcle
16255 root 20 0 254m 1688 1144 S 0.0 0.0 92:53.74 vmtoolsd
22536 root 20 0 282m 1268 892 S 0.0 0.0 24:21.73 opcacta
16784 root 20 0 597m 1236 180 S 0.0 0.0 0:02.16 httpd
14977 root 20 0 215m 1092 864 S 0.0 0.0 15:57.32 sssd_nss
22496 root 20 0 880m 1076 864 S 0.0 0.0 13:57.86 opcmsga
22425 root 20 0 1834m 944 460 S 0.0 0.0 74:12.96 ovcd
22433 root 20 0 654m 896 524 S 0.0 0.0 10:48.00 ovbbccb
2634 oiadmin 20 0 15172 876 516 R 9.1 0.0 0:14.78 top
2888 root 20 0 103m 808 776 S 0.0 0.0 0:00.19 sshd
16397 root 20 0 207m 748 420 S 0.0 0.0 32:52.23 ManagementAgent
2898 oiadmin 20 0 103m 696 556 S 0.0 0.0 0:00.08 sshd
22613 root 20 0 884m 580 300 S 0.0 0.0 132:34.94 agtrep
20886 root 20 0 245m 552 332 S 0.0 0.0 79:09.05 rsyslogd
2899 oiadmin 20 0 105m 496 496 S 0.0 0.0 0:00.03 bash
24180 root -51 -20 14788 456 408 S 0.0 0.0 9:50.43 midaemon
14978 root 20 0 203m 440 308 S 0.0 0.0 9:28.87 sssd_pam
14975 root 20 0 203m 432 288 S 0.0 0.0 21:45.01 sssd
8215 root 20 0 88840 420 256 S 0.0 0.0 3:28.13 sendmail
18909 oiadmin 20 0 103m 408 256 S 0.0 0.0 0:02.83 sshd
1896 root 20 0 9140 332 232 S 0.0 0.0 50:39.87 irqbalance
2990 oiadmin 20 0 98.6m 320 276 S 0.0 0.0 0:00.04 tail
4427 root 20 0 114m 288 196 S 0.0 0.0 8:58.77 crond
25628 root 20 0 4516 280 176 S 0.0 0.0 11:15.24 ndtask
4382 ntp 20 0 28456 276 176 S 0.0 0.0 0:28.61 ntpd
8227 smmsp 20 0 78220 232 232 S 0.0 0.0 0:05.09 sendmail
25634 root 20 0 6564 200 68 S 0.0 0.0 4:50.30 mgsusageag
4926 root 20 0 110m 188 124 S 0.0 0.0 3:23.79 abrt-dump-oops
9744 root 20 0 197m 180 136 S 0.0 0.0 1:46.59 perfalarm
22469 root 20 0 856m 128 128 S 0.0 0.0 1:18.65 ovconfd
4506 rpc 20 0 19036 84 40 S 0.0 0.0 1:44.05 rpcbind
32193 root 20 0 66216 68 60 S 0.0 0.0 4:54.51 sshd
18910 oiadmin 20 0 105m 52 52 S 0.0 0.0 0:00.11 bash
22521 root 20 0 152m 44 44 S 0.0 0.0 0:53.71 opcmsgi
18903 root 20 0 103m 12 12 S 0.0 0.0 0:00.22 sshd
1 root 20 0 19356 4 4 S 0.0 0.0 3:57.84 init
1731 root 20 0 105m 4 4 S 0.0 0.0 0:01.91 rhsmcertd
1983 dbus 20 0 97304 4 4 S 0.0 0.0 0:16.92 dbus-daemon
2225 root 20 0 4056 4 4 S 0.0 0.0 0:00.01 mingetty
Your server is slow because you are condeming it to a non-threaded, ever-reclaiming childs scenario.
That is you use more than 12 processes but your maxspareservers is 12 so HTTPD is constantly spawning and despawning processes and precisely that's the biggest weakness of a non-threaded mpm. And also such a low MaxRequestsPerChild won't help either if you have a decent amount of requests per second, although understable since you are using mod_php, that value will increase the constant re-spawning.
In any OS, spawning processes uses much more cpu than spawning threads inside a process.
So either you set MaxSpareServers to a very high number so your server have lots of them ready to serve your requests, or you STOP using mod_php+prefork and probably .htaccess (like everyone in here who seems to believe that's needed for apache httpd to work). to a more reliable: HTTPD with mpm_event + mod_proxy_fcgi + php-fpm, where you can configure dozens of hundreds of threads and apache will spawn and use them in less than a blink and you leave all your php load under php's own daemon, php-fpm.
So, it's not apache, it is your ever-respawning processes setup in a non-threaded mpm what's giving you troubles.

My VPS server is slow. Much slower than my share hosting

I have recently upgrade from shared hosting to VPS hosting.
But the VPS is much slow than the shared hosting. Any advice?
my website is www.sgyuan.com
top - 08:59:55 up 2 days, 15:10, 3 users, load average: 0.52, 0.40, 0.36
Tasks: 28 total, 1 running, 27 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1048576k total, 499848k used, 548728k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 0k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13782 mysql 15 0 157m 44m 6420 S 22 4.3 107:26.02 mysqld
19902 www-data 15 0 66396 12m 6916 S 1 1.2 0:06.69 apache2
19924 www-data 15 0 65928 12m 7120 S 1 1.2 0:07.39 apache2
1 root 18 0 2604 1464 1212 S 0 0.1 0:02.46 init
1155 root 15 0 2216 884 704 S 0 0.1 0:00.34 cron
1203 syslog 15 0 2020 724 572 S 0 0.1 0:02.38 syslogd
1264 root 15 0 5600 2156 1736 S 0 0.2 0:03.12 sshd
7555 root 15 0 8536 2884 2276 S 0 0.3 0:01.83 sshd
7567 root 15 0 3104 1760 1412 S 0 0.2 0:00.02 bash
7735 root 15 0 8548 2888 2268 S 0 0.3 0:01.86 sshd
7751 root 18 0 3176 1848 1428 S 0 0.2 0:00.21 bash
18341 memcache 18 0 43924 1104 808 S 0 0.1 0:00.02 memcached
19549 root 18 0 63972 8824 4960 S 0 0.8 0:00.13 apache2
19897 www-data 16 0 65652 12m 7008 S 0 1.2 0:06.78 apache2
19898 www-data 15 0 65896 12m 7328 S 0 1.2 0:07.16 apache2
19899 www-data 16 0 65932 12m 7328 S 0 1.2 0:07.29 apache2
19900 www-data 15 0 65640 12m 7320 S 0 1.2 0:07.60 apache2
19901 www-data 15 0 65676 12m 7048 S 0 1.2 0:10.32 apache2
19903 www-data 15 0 65672 11m 6568 S 0 1.2 0:06.38 apache2
19904 www-data 15 0 65640 12m 6876 S 0 1.2 0:06.32 apache2
19905 www-data 15 0 65928 12m 6800 S 0 1.2 0:06.66 apache2
20452 bind 18 0 105m 16m 2304 S 0 1.7 0:00.10 named
21720 root 15 0 17592 13m 1712 S 0 1.3 0:12.25 miniserv.pl
21991 root 18 0 2180 996 832 S 0 0.1 0:00.00 xinetd
22378 root 15 0 2452 1128 920 R 0 0.1 0:00.06 top
23834 root 15 0 8536 2920 2272 S 0 0.3 0:23.63 sshd
23850 root 15 0 3184 1868 1436 S 0 0.2 0:00.44 bash
29812 root 15 0 3820 1064 836 S 0 0.1 0:00.24 vsftpd
Is the web server config identical for the VPS and shared hosting configuration? That would be the first place I look because it's not trivial to tune apache to perform well. I'm assuming that with the VPS it is 100% your responsibility to configure the web server and you have to make the decisions about the number of clients, the process model, opcode caches, etc.

Passenger/Ruby memory usage goes out of control on Ubuntu

The last few days Passenger has been eating up loads of memory on my Slicehost VPS, and I can't seem to get it under control. It runs fine for a few hours, and then all of a sudden spawns tons of rubies. I thought Apache was the problem, so I switched to Nginx, but the problem persists. Here's a dump of top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5048 avishai 20 0 160m 43m 1192 S 0 10.9 0:00.77 ruby1.8
5102 avishai 20 0 151m 41m 1392 S 0 10.6 0:00.07 ruby1.8
5091 avishai 20 0 153m 30m 1400 D 0 7.6 0:00.27 ruby1.8
5059 avishai 20 0 158m 27m 1344 D 0 7.0 0:00.64 ruby1.8
4809 avishai 20 0 161m 27m 1208 D 0 6.9 0:06.65 ruby1.8
4179 avishai 20 0 162m 23m 1140 D 0 5.9 0:25.25 ruby1.8
5063 avishai 20 0 159m 23m 1200 D 0 5.9 0:00.65 ruby1.8
5044 avishai 20 0 159m 12m 1172 S 0 3.3 0:00.79 ruby1.8
5113 avishai 20 0 149m 9.8m 1576 D 0 2.5 0:00.00 ruby1.8
5076 avishai 20 0 155m 9.8m 1128 S 0 2.5 0:00.33 ruby1.8
3269 mysql 20 0 239m 5356 2156 S 0 1.3 0:00.35 mysqld
3510 root 20 0 49948 3580 736 S 0 0.9 1:01.86 ruby1.8
4792 root 20 0 98688 3560 644 S 0 0.9 0:00.84 ruby1.8
4799 avishai 20 0 148m 2204 600 S 0 0.5 0:01.64 ruby1.8
3508 root 20 0 295m 1972 1044 S 0 0.5 0:35.77 PassengerHelper
3562 nobody 20 0 39776 964 524 D 0 0.2 0:00.82 nginx
3561 nobody 20 0 39992 948 496 D 0 0.2 0:00.72 nginx
4238 avishai 20 0 19144 668 456 R 0 0.2 0:00.06 top
3293 syslog 20 0 123m 636 420 S 0 0.2 0:00.06 rsyslogd
3350 nobody 20 0 139m 432 220 S 0 0.1 0:00.05 memcached
3364 redis 20 0 50368 412 300 S 0 0.1 0:00.33 redis-server
1575 avishai 20 0 51912 324 216 S 0 0.1 0:00.00 sshd
3513 nobody 20 0 72272 192 160 S 0 0.0 0:00.02 PassengerLoggin
3330 root 20 0 21012 180 124 S 0 0.0 0:00.00 cron
3335 root 20 0 49184 152 144 S 0 0.0 0:00.01 sshd
1 root 20 0 23500 92 88 S 0 0.0 0:00.08 init
1573 root 20 0 51764 88 80 S 0 0.0 0:00.00 sshd
3505 root 20 0 89044 84 80 S 0 0.0 0:00.00 PassengerWatchd
3319 root 20 0 5996 68 64 S 0 0.0 0:00.00 getty
3323 root 20 0 6000 68 64 S 0 0.0 0:00.00 getty
3325 root 20 0 5996 68 64 S 0 0.0 0:00.00 getty
3326 root 20 0 6000 68 64 S 0 0.0 0:00.00 getty
3328 root 20 0 5996 68 64 S 0 0.0 0:00.00 getty
3383 root 20 0 5996 68 64 S 0 0.0 0:00.01 getty
Here's my environment:
RubyGems Environment:
- RUBYGEMS VERSION: 1.6.2
- RUBY VERSION: 1.8.7 (2011-02-18 patchlevel 334) [x86_64-linux]
- INSTALLATION DIRECTORY: /home/avishai/.rvm/gems/ruby-1.8.7-p334
- RUBY EXECUTABLE: /home/avishai/.rvm/rubies/ruby-1.8.7-p334/bin/ruby
- EXECUTABLE DIRECTORY: /home/avishai/.rvm/gems/ruby-1.8.7-p334/bin
- RUBYGEMS PLATFORMS:
- ruby
- x86_64-linux
- GEM PATHS:
- /home/avishai/.rvm/gems/ruby-1.8.7-p334
- /home/avishai/.rvm/gems/ruby-1.8.7-p334#global
- GEM CONFIGURATION:
- :update_sources => true
- :verbose => true
- :benchmark => false
- :backtrace => false
- :bulk_threshold => 1000
- "gem" => "--no-ri --no-rdoc"
- :sources => ["http://gems.rubyforge.org", "http://gems.github.com"]
- REMOTE SOURCES:
- http://gems.rubyforge.org
- http://gems.github.com
It appears you have a lot of instances running. Try limiting this as is appropriate for your system.
passenger_max_pool_size 2
I tend to go for one instance per 128MB of RAM you have.
Full documentation: http://www.modrails.com/documentation/Users%20guide%20Nginx.html#PassengerMaxPoolSize

percentage of memory used used by a process

percentage of memory used used by a process.
normally prstat -J will give the memory of process image and RSS(resident set size) etc.
how do i knowlist of processes with percentage of memory is used by a each process.
i am working on solaris unix.
addintionally ,what are the regular commands that you use for monitoring processes,performences of processes that might be very useful to all!
The top command will give you several memory-consumption numbers. htop is much nicer, and will give you percentages, but it isn't installed by default on most systems.
run
top and then Shift+O this will bring you to the options, press n (this maybe different on your machine) for memory and then hit enter
Example of memory sort.
top - 08:17:29 up 3 days, 8:54, 6 users, load average: 13.98, 14.01, 11.60
Tasks: 654 total, 2 running, 652 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.7%us, 1.5%sy, 0.0%ni, 59.5%id, 23.5%wa, 0.1%hi, 0.8%si, 0.0%st
Mem: 65851896k total, 49049196k used, 16802700k free, 1074664k buffers
Swap: 50331640k total, 0k used, 50331640k free, 32776940k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21635 oracle 15 0 6750m 636m 51m S 1.6 1.0 62:34.53 oracle
21623 oracle 15 0 6686m 572m 53m S 1.1 0.9 61:16.95 oracle
21633 oracle 16 0 6566m 445m 235m S 3.7 0.7 30:22.60 oracle
21615 oracle 16 0 6550m 428m 220m S 3.7 0.7 29:36.74 oracle
16349 oracle RT 0 431m 284m 41m S 0.5 0.4 2:41.08 ocssd.bin
17891 root RT 0 139m 118m 40m S 0.5 0.2 41:08.19 osysmond
18154 root RT 0 182m 98m 43m S 0.0 0.2 10:02.40 ologgerd
12211 root 15 0 1432m 84m 14m S 0.0 0.1 17:57.80 java
Another method on Solaris is to do the following
prstat -s size 1 1
Example prstat output
www004:/# prstat -s size 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
420 nobody 139M 60M sleep 29 10 1:46:56 0.1% webservd/76
603 nobody 135M 59M sleep 29 10 5:33:18 0.1% webservd/96
339 root 134M 70M sleep 59 0 0:35:38 0.0% java/24
435 iplanet 132M 55M sleep 29 10 1:10:39 0.1% webservd/76
573 nobody 131M 53M sleep 29 10 0:24:32 0.0% webservd/76
588 nobody 130M 53M sleep 29 10 2:40:55 0.1% webservd/86
454 nobody 128M 51M sleep 29 10 0:09:01 0.0% webservd/76
489 iplanet 126M 49M sleep 29 10 0:00:13 0.0% webservd/74
405 root 119M 45M sleep 29 10 0:00:13 0.0% webservd/31
717 root 54M 46M sleep 59 0 2:31:27 0.2% agent/7
Keep in mind this is sorted by Size not RSS, if you need it by RSS use the rss key
www004:/# prstat -s rss 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
339 root 134M 70M sleep 59 0 0:35:39 0.1% java/24
420 nobody 139M 60M sleep 29 10 1:46:57 0.4% webservd/76
603 nobody 135M 59M sleep 29 10 5:33:19 0.5% webservd/96
435 iplanet 132M 55M sleep 29 10 1:10:39 0.0% webservd/76
573 nobody 131M 53M sleep 29 10 0:24:32 0.0% webservd/76
588 nobody 130M 53M sleep 29 10 2:40:55 0.0% webservd/86
454 nobody 128M 51M sleep 29 10 0:09:01 0.0% webservd/76
489 iplanet 126M 49M sleep 29 10 0:00:13 0.0% webservd/74
I'm not sure if ps is standardized but at least on linux, ps -o %mem gives the percentage of memory used (you would obviously want to add some other columns as well)

Resources