When I run top -n 1 -d 2 | head -n 12; it returns processor usage for some processes sorted by %cpu desc as desired, but I'm not convinced that the results are aggregated as they should be. I'm wanting to put these results in a file maybe like
while true; do
top -n 1 -d 2 | head -n 12;
done > top_cpu_users;
When I run top -d 2; interactively, I first see some results, then two seconds later I see the results updated and they appear to be aggregated over the last two seconds. The first results do not appear to be aggregated in the same way.
How do I get top cpu users every two seconds aggregated over the previous two seconds?
top will always capture a first full scan of process info for use as a baseline. It uses that to initialize the utility's database of values used for later comparative reporting. That is the basis of the first report presented to the screen.
The follow-on reports are the true measures for the specified evaluation intervals.
Your code snippet will therefore never provide what you are really looking for.
You need to skip the results from the first scan and only use the follow on reports, but the only way to do that is to generate them from a single command by specifying the count of scans desired, then parse the resulting combined report.
To that end, here is a proposed solution:
#!/bin/bash
output="top_cpu_users"
rm -f ${output} ${output}.tmp
snapshots=5
interval=2
process_count=6 ### Number of heavy hitter processes being monitored
top_head=7 ### Number of header lines in top report
lines=$(( ${process_count} + ${top_head} )) ### total lines saved from each report run
echo -e "\n Collecting process snapshots every ${interval} seconds ..."
top -b -n $(( ${snapshots} + 1 )) -d ${interval} > ${output}.tmp
echo -e "\n Parsing snapshots ..."
awk -v max="${lines}" 'BEGIN{
doprint=0 ;
first=1 ;
}
{
if( $1 == "top" ){
if( first == 1 ){
first=0 ;
}else{
print NR | "cat >&2" ;
print "" ;
doprint=1 ;
entry=0 ;
} ;
} ;
if( doprint == 1 ){
entry++ ;
print $0 ;
if( entry == max ){
doprint=0 ;
} ;
} ;
}' ${output}.tmp >${output}
more ${output}
The session output for that will look like this:
Collecting process snapshots every 2 seconds ...
Parsing snapshots ...
266
531
796
1061
1326
top - 20:14:02 up 8:37, 1 user, load average: 0.15, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 1.0 sy, 0.0 ni, 98.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.6 free, 2753.7 used, 767.6 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31773 root 20 0 0 0 0 I 1.5 0.0 0:09.08 kworker/0:3-events
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.04 top
1503 mysql 20 0 2387360 20664 2988 S 0.5 0.5 3:10.11 mysqld
2250 ericthe+ 20 0 1949412 130004 20272 S 0.5 3.5 0:46.16 caja
3104 ericthe+ 20 0 4837044 461944 127416 S 0.5 12.3 81:26.50 firefox
29998 ericthe+ 20 0 2636764 165632 54700 S 0.5 4.4 0:36.97 Isolated Web Co
top - 20:14:04 up 8:37, 1 user, load average: 0.14, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 0.7 sy, 0.0 ni, 97.4 id, 0.4 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.7 used, 767.6 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3104 ericthe+ 20 0 4837044 462208 127416 S 3.0 12.3 81:26.56 firefox
1503 mysql 20 0 2387360 20664 2988 S 1.0 0.5 3:10.13 mysqld
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.06 top
1489 root 20 0 546692 61584 48956 S 0.5 1.6 17:23.78 Xorg
2233 ericthe+ 20 0 303744 11036 7500 S 0.5 0.3 4:46.84 compton
7239 ericthe+ 20 0 2617520 127452 44768 S 0.5 3.4 1:41.13 Isolated Web Co
top - 20:14:06 up 8:37, 1 user, load average: 0.14, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.4 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.7 used, 767.6 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1489 root 20 0 546700 61584 48956 S 1.5 1.6 17:23.81 Xorg
3104 ericthe+ 20 0 4837044 462208 127416 S 1.5 12.3 81:26.59 firefox
1503 mysql 20 0 2387360 20664 2988 S 0.5 0.5 3:10.14 mysqld
2233 ericthe+ 20 0 303744 11036 7500 S 0.5 0.3 4:46.85 compton
2478 ericthe+ 20 0 346156 10368 8792 S 0.5 0.3 0:22.97 mate-cpufreq-ap
2481 ericthe+ 20 0 346540 11148 9168 S 0.5 0.3 0:41.73 mate-sensors-ap
top - 20:14:08 up 8:37, 1 user, load average: 0.14, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.5 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.6 used, 767.7 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.08 top
3104 ericthe+ 20 0 4837044 462208 127416 S 0.5 12.3 81:26.60 firefox
18370 ericthe+ 20 0 2682392 97268 45144 S 0.5 2.6 0:55.36 Isolated Web Co
19436 ericthe+ 20 0 2618496 123608 52540 S 0.5 3.3 1:55.08 Isolated Web Co
26630 ericthe+ 20 0 2690464 179020 56060 S 0.5 4.8 1:45.57 Isolated Web Co
29998 ericthe+ 20 0 2636764 165632 54700 S 0.5 4.4 0:36.98 Isolated Web Co
top - 20:14:10 up 8:37, 1 user, load average: 0.13, 0.13, 0.15
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.5 us, 0.9 sy, 0.0 ni, 96.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3678.9 total, 157.5 free, 2753.6 used, 767.7 buff/cache
MiB Swap: 2048.0 total, 1116.4 free, 931.6 used. 629.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3104 ericthe+ 20 0 4837076 463000 127416 S 7.5 12.3 81:26.75 firefox
1489 root 20 0 546716 61584 48956 S 1.5 1.6 17:23.84 Xorg
1503 mysql 20 0 2387360 20664 2988 S 1.0 0.5 3:10.16 mysqld
32254 ericthe+ 20 0 14500 3876 3092 R 1.0 0.1 0:00.10 top
2233 ericthe+ 20 0 303744 11036 7500 S 0.5 0.3 4:46.86 compton
2481 ericthe+ 20 0 346540 11148 9168 S 0.5 0.3 0:41.74 mate-sensors-ap
Related
The command '(sleep 4 ; echo q) | topas -P |tee /tmp/top' on AIX produces this output in a command text file and I am not able to remove the blank spaces from it. I have tried, sed/perl and awk commands to print only the lines containing characters however nothing has helped. Do I need to convert this file to a ASCII text format to use sed/per/grep or awk to remove the empty lines?.
$ file /tmp/top
/tmp/top: commands text
$ head -n33 /tmp/top
DATA TEXT PAGE PGFAULTS
USER PID PPID PRI NI RES RES SPACE TIME CPU% I/O OTH COMMAND
root 9044256 20447712 60 20 2.43M 304K 2.43M 0:00 2.1 0 253 topas
root 14942646 8913178 60 20 72.0M 37.8M 72.0M 0:42 0.2 0 1 TaniumCl
root 20447712 21889434 60 20 148K 312K 508K 0:00 0.2 0 0 ksh
root 21955056 20447712 60 20 216K 36.0K 216K 0:00 0.1 0 3 sed
root 24838602 20447712 60 20 120K 8.00K 120K 0:00 0.1 0 1 tee
root 9830690 10355194 60 20 120K 4.00K 120K 0:00 0.1 0 0 sleep
root 12255642 13893896 60 41 57.5M 39.8M 57.5M 33:42 0.1 0 0 mmfsd
root 10355194 20447712 60 20 148K 312K 508K 0:00 0.1 0 0 ksh
root 9109790 4063622 39 20 12.9M 3.68M 12.9M 5:19 0.1 0 0 rmcd
root 13697394 4063622 60 20 8.27M 55.9M 8.27M 17:18 0.1 0 0 backup_a
root 20906328 1 60 20 1.81M 0 1.81M 3:15 0.0 0 0 nfsd
root 4260244 1 60 20 620K 88.0K 620K 41:23 0.0 0 0 getty
root 1573172 0 37 41 960K 0 960K 15:17 0.0 0 0 gil
nagios 9240876 4063622 60 20 23.7M 736K 23.7M 9:43 0.0 0 0 ncpa_pas
root 4391332 1 60 20 12.5M 252K 12.5M 4:43 0.0 0 0 secldapc
a_RTHOMA 8323456 12059082 60 20 636K 3.06M 1016K 0:00 0.0 0 0 sshd
root 8388902 4063622 60 20 1.76M 1.05M 1.76M 7:03 0.0 0 0 clcomd
root 3539312 1 60 20 448K 0 448K 5:07 0.0 0 0 lock_rcv
root 3670388 1 60 20 448K 0 448K 4:18 0.0 0 0 sec_rcv
root 5767652 4063622 48 8 392K 324K 392K 2:49 0.0 0 0 xntpd
root 6816242 1 60 20 1.19M 0 1.19M 1:05 0.0 0 0 rpc.lock
root 459026 0 16 41 640K 0 640K 2:19 0.0 0 0 reaffin
root 23921008 1 60 20 1.00M 0 1.00M 4:36 0.0 0 0 n4cb
lpar2rrd 23200112 25625020 64 22 868K 120K 868K 0:00 0.0 0 0 vmstat
root 7143896 1 40 41 448K 0 448K 0:48 0.0 0 0 nfsWatch
root 6160840 1 60 20 448K 0 448K 0:09 0.0 0 0 j2gt
Looks like your output file has some ASCII characters so it's better to print only those lines which starting from letters do like:
awk '/^[a-zA-Z]/' Input_file
I'm using a web host with an Apache terminal, using it to host a NodeJS application. For the most part everything runs smooth, however when I open the terminal I often get bash: fork: retry: no child processes and bash: fork: retry: resource temporarily unavailable.
I've narrowed down the cause of the problem to my .bashrc file, as when using top I could see that the many excess processes being created were bash instances:
top - 13:41:13 up 71 days, 20:57, 0 users, load average: 1.82, 1.81, 1.72
Tasks: 14 total, 1 running, 2 sleeping, 11 stopped, 0 zombie
%Cpu(s): 11.7 us, 2.7 sy, 0.1 ni, 85.5 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 41034544 total, 2903992 free, 6525792 used, 31604760 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 28583704 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1001511 xxxxxxxx 20 0 11880 3692 1384 S 0.0 0.0 0:00.02 bash
1001578 xxxxxxxx 20 0 11880 2840 524 T 0.0 0.0 0:00.00 bash
1001598 xxxxxxxx 20 0 11880 2672 348 T 0.0 0.0 0:00.00 bash
1001599 xxxxxxxx 20 0 11880 2896 524 T 0.0 0.0 0:00.00 bash
1001600 xxxxxxxx 20 0 11880 2720 396 T 0.0 0.0 0:00.00 bash
1001607 xxxxxxxx 20 0 11880 2928 532 T 0.0 0.0 0:00.00 bash
1001613 xxxxxxxx 20 0 11880 2964 532 T 0.0 0.0 0:00.00 bash
1001618 xxxxxxxx 20 0 11880 2780 348 T 0.0 0.0 0:00.00 bash
1001619 xxxxxxxx 20 0 12012 3024 544 T 0.0 0.0 0:00.00 bash
1001620 xxxxxxxx 20 0 11880 2804 372 T 0.0 0.0 0:00.00 bash
1001651 xxxxxxxx 20 0 12012 2836 352 T 0.0 0.0 0:00.00 bash
1001653 xxxxxxxx 20 0 12016 3392 896 T 0.0 0.0 0:00.00 bash
1004463 xxxxxxxx 20 0 9904 1840 1444 S 0.0 0.0 0:00.00 bash
1005200 xxxxxxxx 20 0 56364 1928 1412 R 0.0 0.0 0:00.00 top
~/.bashrc consists of only:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
If I comment out the last 3 lines like so:
#export NVM_DIR="$HOME/.nvm"
#[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
#[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
Then the terminal functions as expected and no excess processes are created. However I obviously can't use nvm/npm commands while it's disabled as nvm isn't started.
I'm relatively inexperienced with bash and can't seem to figure out why this is happening. It seems that bash is somehow calling itself every time it opens, which creates the loop/fork bomb once the terminal is opened.
How can I prevent this while still being able to use nvm/npm?
I have a centos server running apache with 8GB ram. It is running very slowly to load simple php pages. I have set the following on my config file. I cannot see any httpd process more than 100m. It usually slows down after 5mins from a restart.
<IfModule prefork.c>
StartServers 12
MinSpareServers 12
MaxSpareServers 12
ServerLimit 150
MaxClients 150
MaxRequestsPerChild 1000
</IfModule>
$ ps -ylC httpd | awk '{x += $8;y += 1} END {print "Apache Memory Usage (MB): "x/1024; print "Average Proccess Size (MB): "x/((y-1)*1024)}'
Apache Memory Usage (MB): 1896.09
Average Proccess Size (MB): 36.4633
What else can I do to make the pages load faster.
$ free -m
total used free shared buffers cached
Mem: 7872 847 7024 0 29 328
-/+ buffers/cache: 489 7382
Swap: 7999 934 7065
top - 15:42:17 up 545 days, 16:46, 2 users, load average: 0.05, 0.06, 0.
Tasks: 251 total, 1 running, 250 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 2.3%sy, 0.0%ni, 97.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.
Mem: 8060928k total, 909112k used, 7151816k free, 30216k buffers
Swap: 8191992k total, 956880k used, 7235112k free, 336612k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16544 apache 20 0 734m 47m 10m S 0.0 0.6 0:00.21 httpd
16334 apache 20 0 731m 45m 10m S 0.0 0.6 0:00.41 httpd
16212 apache 20 0 723m 37m 10m S 0.0 0.5 0:00.72 httpd
16555 apache 20 0 724m 37m 10m S 0.0 0.5 0:00.25 httpd
16347 apache 20 0 724m 36m 10m S 0.0 0.5 0:00.42 httpd
16608 apache 20 0 721m 34m 10m S 0.0 0.4 0:00.16 httpd
16088 apache 20 0 717m 31m 10m S 0.0 0.4 0:00.35 httpd
16012 apache 20 0 717m 30m 10m S 0.0 0.4 0:00.78 httpd
16338 apache 20 0 716m 30m 10m S 0.0 0.4 0:00.36 httpd
16336 apache 20 0 715m 29m 10m S 0.0 0.4 0:00.42 httpd
16560 apache 20 0 716m 29m 9.9m S 0.0 0.4 0:00.06 httpd
16346 apache 20 0 715m 28m 10m S 0.0 0.4 0:00.28 httpd
16016 apache 20 0 714m 28m 10m S 0.0 0.4 0:00.74 httpd
16497 apache 20 0 715m 28m 10m S 0.0 0.4 0:00.18 httpd
16607 apache 20 0 714m 27m 9m S 0.0 0.4 0:00.17 httpd
16007 root 20 0 597m 27m 15m S 0.0 0.3 0:00.13 httpd
16694 apache 20 0 713m 26m 10m S 0.0 0.3 0:00.10 httpd
16695 apache 20 0 712m 25m 9.9m S 0.0 0.3 0:00.04 httpd
16554 apache 20 0 712m 25m 10m S 0.0 0.3 0:00.15 httpd
16691 apache 20 0 598m 14m 2752 S 0.0 0.2 0:00.00 httpd
22613 root 20 0 884m 12m 6664 S 0.0 0.2 132:10.11 agtrep
16700 apache 20 0 597m 12m 712 S 0.0 0.2 0:00.00 httpd
16750 apache 20 0 597m 12m 712 S 0.0 0.2 0:00.00 httpd
16751 apache 20 0 597m 12m 712 S 0.0 0.2 0:00.00 httpd
2374 root 20 0 2616m 8032 1024 S 0.0 0.1 171:31.74 python
9699 root 0 -20 50304 6488 1168 S 0.0 0.1 1467:01 scopeux
9535 root 20 0 644m 6304 2700 S 0.0 0.1 21:01.24 coda
14976 root 20 0 246m 5800 2452 S 0.0 0.1 42:44.70 sssd_be
22563 root 20 0 825m 4704 2636 S 0.0 0.1 44:07.68 opcmona
22496 root 20 0 880m 4540 3304 S 0.0 0.1 13:54.78 opcmsga
22469 root 20 0 856m 4428 2804 S 0.0 0.1 1:18.45 ovconfd
22433 root 20 0 654m 4144 2752 S 0.0 0.1 10:45.71 ovbbccb
22552 root 20 0 253m 2936 1168 S 0.0 0.0 50:35.27 opcle
22521 root 20 0 152m 1820 1044 S 0.0 0.0 0:53.57 opcmsgi
14977 root 20 0 215m 1736 1020 S 0.0 0.0 15:53.13 sssd_nss
16255 root 20 0 254m 1704 1152 S 0.0 0.0 92:07.63 vmtoolsd
24180 root -51 -20 14788 1668 1080 S 0.0 0.0 9:48.57 midaemon
I do not have access to root
I have updated it to the following which seems to be better but i see occasional 7GB httpd process
<IfModule prefork.c>
StartServers 12
MinSpareServers 12
MaxSpareServers 12
ServerLimit 150
MaxClients 150
MaxRequestsPerChild 0
</IfModule>
top - 09:13:42 up 546 days, 10:18, 2 users, load average: 1.86, 1.51, 0.78
Tasks: 246 total, 2 running, 244 sleeping, 0 stopped, 0 zombie
Cpu(s): 28.6%us, 9.5%sy, 0.0%ni, 45.2%id, 16.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8060928k total, 7903004k used, 157924k free, 2540k buffers
Swap: 8191992k total, 8023596k used, 168396k free, 31348k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2466 apache 20 0 14.4g 7.1g 240 R 100.0 92.1 4:58.95 httpd
2285 apache 20 0 730m 31m 7644 S 0.0 0.4 0:02.37 httpd
2524 apache 20 0 723m 23m 7488 S 0.0 0.3 0:01.75 httpd
3770 apache 20 0 716m 21m 10m S 0.0 0.3 0:00.29 httpd
3435 apache 20 0 716m 20m 9496 S 0.0 0.3 0:00.60 httpd
3715 apache 20 0 713m 19m 10m S 0.0 0.2 0:00.35 httpd
3780 apache 20 0 713m 19m 10m S 0.0 0.2 0:00.22 httpd
3778 apache 20 0 713m 19m 10m S 0.0 0.2 0:00.28 httpd
3720 apache 20 0 712m 18m 10m S 0.0 0.2 0:00.21 httpd
3767 apache 20 0 712m 18m 10m S 0.0 0.2 0:00.21 httpd
3925 apache 20 0 712m 17m 10m S 0.0 0.2 0:00.11 httpd
2727 apache 20 0 716m 17m 7576 S 0.0 0.2 0:01.66 httpd
2374 root 20 0 2680m 14m 2344 S 0.0 0.2 173:44.40 python
9699 root 0 -20 50140 5556 624 S 0.0 0.1 1475:46 scopeux
3924 apache 20 0 598m 5016 2872 S 0.0 0.1 0:00.00 httpd
3926 apache 20 0 598m 5000 2872 S 0.0 0.1 0:00.00 httpd
14976 root 20 0 246m 2400 1280 S 0.0 0.0 42:51.54 sssd_be
9535 root 20 0 644m 2392 752 S 0.0 0.0 21:07.36 coda
22563 root 20 0 825m 2000 952 S 0.0 0.0 44:16.37 opcmona
22552 root 20 0 254m 1820 868 S 0.0 0.0 50:48.12 opcle
16255 root 20 0 254m 1688 1144 S 0.0 0.0 92:53.74 vmtoolsd
22536 root 20 0 282m 1268 892 S 0.0 0.0 24:21.73 opcacta
16784 root 20 0 597m 1236 180 S 0.0 0.0 0:02.16 httpd
14977 root 20 0 215m 1092 864 S 0.0 0.0 15:57.32 sssd_nss
22496 root 20 0 880m 1076 864 S 0.0 0.0 13:57.86 opcmsga
22425 root 20 0 1834m 944 460 S 0.0 0.0 74:12.96 ovcd
22433 root 20 0 654m 896 524 S 0.0 0.0 10:48.00 ovbbccb
2634 oiadmin 20 0 15172 876 516 R 9.1 0.0 0:14.78 top
2888 root 20 0 103m 808 776 S 0.0 0.0 0:00.19 sshd
16397 root 20 0 207m 748 420 S 0.0 0.0 32:52.23 ManagementAgent
2898 oiadmin 20 0 103m 696 556 S 0.0 0.0 0:00.08 sshd
22613 root 20 0 884m 580 300 S 0.0 0.0 132:34.94 agtrep
20886 root 20 0 245m 552 332 S 0.0 0.0 79:09.05 rsyslogd
2899 oiadmin 20 0 105m 496 496 S 0.0 0.0 0:00.03 bash
24180 root -51 -20 14788 456 408 S 0.0 0.0 9:50.43 midaemon
14978 root 20 0 203m 440 308 S 0.0 0.0 9:28.87 sssd_pam
14975 root 20 0 203m 432 288 S 0.0 0.0 21:45.01 sssd
8215 root 20 0 88840 420 256 S 0.0 0.0 3:28.13 sendmail
18909 oiadmin 20 0 103m 408 256 S 0.0 0.0 0:02.83 sshd
1896 root 20 0 9140 332 232 S 0.0 0.0 50:39.87 irqbalance
2990 oiadmin 20 0 98.6m 320 276 S 0.0 0.0 0:00.04 tail
4427 root 20 0 114m 288 196 S 0.0 0.0 8:58.77 crond
25628 root 20 0 4516 280 176 S 0.0 0.0 11:15.24 ndtask
4382 ntp 20 0 28456 276 176 S 0.0 0.0 0:28.61 ntpd
8227 smmsp 20 0 78220 232 232 S 0.0 0.0 0:05.09 sendmail
25634 root 20 0 6564 200 68 S 0.0 0.0 4:50.30 mgsusageag
4926 root 20 0 110m 188 124 S 0.0 0.0 3:23.79 abrt-dump-oops
9744 root 20 0 197m 180 136 S 0.0 0.0 1:46.59 perfalarm
22469 root 20 0 856m 128 128 S 0.0 0.0 1:18.65 ovconfd
4506 rpc 20 0 19036 84 40 S 0.0 0.0 1:44.05 rpcbind
32193 root 20 0 66216 68 60 S 0.0 0.0 4:54.51 sshd
18910 oiadmin 20 0 105m 52 52 S 0.0 0.0 0:00.11 bash
22521 root 20 0 152m 44 44 S 0.0 0.0 0:53.71 opcmsgi
18903 root 20 0 103m 12 12 S 0.0 0.0 0:00.22 sshd
1 root 20 0 19356 4 4 S 0.0 0.0 3:57.84 init
1731 root 20 0 105m 4 4 S 0.0 0.0 0:01.91 rhsmcertd
1983 dbus 20 0 97304 4 4 S 0.0 0.0 0:16.92 dbus-daemon
2225 root 20 0 4056 4 4 S 0.0 0.0 0:00.01 mingetty
Your server is slow because you are condeming it to a non-threaded, ever-reclaiming childs scenario.
That is you use more than 12 processes but your maxspareservers is 12 so HTTPD is constantly spawning and despawning processes and precisely that's the biggest weakness of a non-threaded mpm. And also such a low MaxRequestsPerChild won't help either if you have a decent amount of requests per second, although understable since you are using mod_php, that value will increase the constant re-spawning.
In any OS, spawning processes uses much more cpu than spawning threads inside a process.
So either you set MaxSpareServers to a very high number so your server have lots of them ready to serve your requests, or you STOP using mod_php+prefork and probably .htaccess (like everyone in here who seems to believe that's needed for apache httpd to work). to a more reliable: HTTPD with mpm_event + mod_proxy_fcgi + php-fpm, where you can configure dozens of hundreds of threads and apache will spawn and use them in less than a blink and you leave all your php load under php's own daemon, php-fpm.
So, it's not apache, it is your ever-respawning processes setup in a non-threaded mpm what's giving you troubles.
Problems:
More and more data nodes become bad health in Cloudera Manager.
Clue1:
no any task or job, just an idle data node here,
top
-bash-4.1$ top
top - 18:27:22 up 4:59, 3 users, load average: 4.55, 3.52, 3.18
Tasks: 139 total, 1 running, 137 sleeping, 1 stopped, 0 zombie
Cpu(s): 14.8%us, 85.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7932720k total, 1243372k used, 6689348k free, 52244k buffers
Swap: 6160376k total, 0k used, 6160376k free, 267228k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13766 root 20 0 2664m 21m 7048 S 85.4 0.3 190:34.75 java
17688 root 20 0 2664m 19m 7048 S 75.5 0.3 1:05.97 java
12765 root 20 0 2859m 21m 7140 S 36.9 0.3 133:25.46 java
2909 mapred 20 0 1894m 113m 14m S 1.0 1.5 2:55.26 java
1850 root 20 0 1469m 62m 4436 S 0.7 0.8 2:54.53 python
1332 root 20 0 50000 3000 2424 S 0.3 0.0 0:12.04 vmtoolsd
2683 hbase 20 0 1927m 152m 18m S 0.3 2.0 0:36.64 java
Clue2:
-bash-4.1$ ps -ef|grep 13766
root 13766 1850 99 16:01 ? 03:12:54 java -classpath /usr/share/cmf/lib/agent-4.6.3.jar com.cloudera.cmon.agent.DnsTest
Clue3:
in cloudera-scm-agent.log,
[30/Aug/2013 16:01:58 +0000] 1850 Monitor-HostMonitor throttling_logger ERROR Timeout with args ['java', '-classpath', '/usr/share/cmf/lib/agent-4.6.3.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[30/Aug/2013 16:01:58 +0000] 1850 Monitor-HostMonitor throttling_logger ERROR Failed to collect java-based DNS names
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 53, in collect
result, stdout, stderr = self._subprocess_with_timeout(args, self._poll_timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 42, in _subprocess_with_timeout
return SubprocessTimeout().subprocess_with_timeout(args, timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/subprocess_timeout.py", line 70, in subprocess_with_timeout
raise Exception("timeout with args %s" % args)
Exception: timeout with args ['java', '-classpath', '/usr/share/cmf/lib/agent-4.6.3.jar', 'com.cloudera.cmon.agent.DnsTest']
"cloudera-scm-agent.log" line 30357 of 30357 --100%-- col 1
Backgrouds:
if I restart all nodes, then everythings are OK, but after half and hour or more, bad health is coming one by one.
Version: Cloudera Standard 4.6.3 (#192 built by jenkins on 20130812-1221 git: fa61cf8559fbefeb5af7f223fd02164d1a0adfdb)
I added all nodes in /etc/hosts
the installed CDH is 4.3.1.
in fact, these nodes are VMs with fixed IP address.
Any suggestions?
BTW, where can I download source code of com.cloudera.cmon.agent.DnsTest?
percentage of memory used used by a process.
normally prstat -J will give the memory of process image and RSS(resident set size) etc.
how do i knowlist of processes with percentage of memory is used by a each process.
i am working on solaris unix.
addintionally ,what are the regular commands that you use for monitoring processes,performences of processes that might be very useful to all!
The top command will give you several memory-consumption numbers. htop is much nicer, and will give you percentages, but it isn't installed by default on most systems.
run
top and then Shift+O this will bring you to the options, press n (this maybe different on your machine) for memory and then hit enter
Example of memory sort.
top - 08:17:29 up 3 days, 8:54, 6 users, load average: 13.98, 14.01, 11.60
Tasks: 654 total, 2 running, 652 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.7%us, 1.5%sy, 0.0%ni, 59.5%id, 23.5%wa, 0.1%hi, 0.8%si, 0.0%st
Mem: 65851896k total, 49049196k used, 16802700k free, 1074664k buffers
Swap: 50331640k total, 0k used, 50331640k free, 32776940k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21635 oracle 15 0 6750m 636m 51m S 1.6 1.0 62:34.53 oracle
21623 oracle 15 0 6686m 572m 53m S 1.1 0.9 61:16.95 oracle
21633 oracle 16 0 6566m 445m 235m S 3.7 0.7 30:22.60 oracle
21615 oracle 16 0 6550m 428m 220m S 3.7 0.7 29:36.74 oracle
16349 oracle RT 0 431m 284m 41m S 0.5 0.4 2:41.08 ocssd.bin
17891 root RT 0 139m 118m 40m S 0.5 0.2 41:08.19 osysmond
18154 root RT 0 182m 98m 43m S 0.0 0.2 10:02.40 ologgerd
12211 root 15 0 1432m 84m 14m S 0.0 0.1 17:57.80 java
Another method on Solaris is to do the following
prstat -s size 1 1
Example prstat output
www004:/# prstat -s size 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
420 nobody 139M 60M sleep 29 10 1:46:56 0.1% webservd/76
603 nobody 135M 59M sleep 29 10 5:33:18 0.1% webservd/96
339 root 134M 70M sleep 59 0 0:35:38 0.0% java/24
435 iplanet 132M 55M sleep 29 10 1:10:39 0.1% webservd/76
573 nobody 131M 53M sleep 29 10 0:24:32 0.0% webservd/76
588 nobody 130M 53M sleep 29 10 2:40:55 0.1% webservd/86
454 nobody 128M 51M sleep 29 10 0:09:01 0.0% webservd/76
489 iplanet 126M 49M sleep 29 10 0:00:13 0.0% webservd/74
405 root 119M 45M sleep 29 10 0:00:13 0.0% webservd/31
717 root 54M 46M sleep 59 0 2:31:27 0.2% agent/7
Keep in mind this is sorted by Size not RSS, if you need it by RSS use the rss key
www004:/# prstat -s rss 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
339 root 134M 70M sleep 59 0 0:35:39 0.1% java/24
420 nobody 139M 60M sleep 29 10 1:46:57 0.4% webservd/76
603 nobody 135M 59M sleep 29 10 5:33:19 0.5% webservd/96
435 iplanet 132M 55M sleep 29 10 1:10:39 0.0% webservd/76
573 nobody 131M 53M sleep 29 10 0:24:32 0.0% webservd/76
588 nobody 130M 53M sleep 29 10 2:40:55 0.0% webservd/86
454 nobody 128M 51M sleep 29 10 0:09:01 0.0% webservd/76
489 iplanet 126M 49M sleep 29 10 0:00:13 0.0% webservd/74
I'm not sure if ps is standardized but at least on linux, ps -o %mem gives the percentage of memory used (you would obviously want to add some other columns as well)