I am trying to check the status of my nodes using nodetool.
When I run nodetool status, I get the following output:
-- Address Load Tokens Owns Host ID Rack
UN 192.168.1.12 702.25 MB 256 23.5% 3ef5a6e7-123b-48cd-b486-8b6f61121a0c RAC2
UN 192.168.1.31 884.69 MB 256 25.6% ff0f1746-0379-4928-84b0-11efabbeea13 RAC1
UN 192.168.1.8 2.84 MB 1 0.1% 5fb9e1fa-c181-43a9-ac77-5578a1ee2086 RAC1
UN 192.168.1.27 692.18 MB 256 24.1% 95659096-97ef-419f-bd82-693f19ad7679 RAC2
UN 192.168.1.32 1.02 GB 256 26.7% 25a0c51a-9ffd-40f2-9e20-6899f36e8f3c RAC1
But when I check the status for a particular keyspace, e.g. nodetool status keyspacetest, I get:
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.1.12 702.16 MB 256 49.2% 3ef5a6e7-123b-48cd-b486-8b6f61121a0c RAC2
UN 192.168.1.31 884.69 MB 256 48.0% ff0f1746-0379-4928-84b0-11efabbeea13 RAC1
UN 192.168.1.8 2.84 MB 1 0.1% 5fb9e1fa-c181-43a9-ac77-5578a1ee2086 RAC1
UN 192.168.1.27 692.18 MB 256 50.8% 95659096-97ef-419f-bd82-693f19ad7679 RAC2
UN 192.168.1.32 1.02 GB 256 51.9% 25a0c51a-9ffd-40f2-9e20-6899f36e8f3c RAC1
From the Owns % I understand how much of the data each node holds, and the values add up to 100%. But the Owns (effective) values are very different, and they add up to around 200%.
I am confused by these stats. Can anyone help me out with this?
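Owns (effective) shows ownership with the replication factor taken into account.
Thus nodetool status keyspacetest shows how much of that keyspace's data, including replicas, each node holds.
In other words, your cluster has 5 nodes (with unevenly spread vnodes: 192.168.1.8 has only 1 token), and 4 of the 5 nodes each effectively hold ~50% of the data in that keyspace. Because the effective ownership adds up to ~200%, each piece of data is stored on two nodes, which suggests keyspacetest has a replication factor of 2.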
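To check that arithmetic, a small sketch can sum the ownership column and divide by 100; for a single data center, the total effective ownership should be roughly the replication factor times 100%. It assumes the nodetool status layout shown in the question (status code such as UN in the first column, first percentage on the line is the ownership); the function name estimate_rf is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `nodetool status [keyspace]` output on stdin,
# sums the first percentage found on every node line (status codes UN, DN, UL, ...)
# and prints the total plus total/100 as an estimate of the replication factor.
estimate_rf() {
  awk '/^[UD][NLJM] / {
         for (i = 1; i <= NF; i++)
           if ($i ~ /%$/) { sub(/%$/, "", $i); total += $i; break }
       }
       END { printf "total=%.1f%% rf~%.1f\n", total, total / 100 }'
}
```

For example, nodetool status keyspacetest | estimate_rf reports a total of 200.0% for the output in the question. With plain nodetool status the Owns column ignores replication, so the total is always about 100%.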
A relatively old Dell R620 server (32 cores / 128GB RAM) worked perfectly for years with Ubuntu. Plain OS install, no virtualization.
2 system disks in a mirror (XFS)
6 disks in RAID 5 for /var (XFS)
The server is used for a nightly check of a MySQL Xtrabackup file.
Before the reinstall and move to CentOS 7, the process would finish by 08:00; now it is still running at noon.
99% of the job is extracting a large tar.gz file.
htop: only two processes are doing anything:
1. gzip -d : about 20% CPU
2. tar zxf Xtrabackup.tar.gz : about 4-7% CPU
iotop: steady at around 3 MB/s read / 20-25 MB/s write, which is about 25% of what I would expect at a minimum.
Memory: used 1GB of 128GB
The server is fully updated (OS / hardware / firmware, including the disk firmware).
iDRAC shows no problems.
Bottom line: the server is not working hard (to say the least), but performance is way off.
Any ideas would be appreciated.
vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 2 0 469072 0 130362040 0 0 57 341 0 0 0 0 98 2 0
0 2 0 456916 0 130374568 0 0 3328 24576 1176 3241 2 1 94 4 0
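You have blocked processes (the b column in vmstat) and also I/O of around 20 MB/s. To me this means a few processes are accessing the disk concurrently. To improve performance, instead of
tar zxf Xtrabackup.tar.gz
use
gzip -dc Xtrabackup.tar.gz | tar xvf -
The second form runs decompression and extraction as separate processes connected by a pipe, so they can run on different processors (note that GNU tar with z already spawns gzip as a child process, as the htop output in the question shows, so the gain may be small). The -c flag matters: without it, gzip -d decompresses the file in place and sends nothing down the pipe. You can also benefit from increasing the pipe (FIFO) buffer. Check this answer for some ideas.
Also consider tuning the filesystem where tar writes its output files.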
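The piped extraction can be wrapped in a reusable function. This is a sketch that assumes GNU tar and optionally pigz (a parallel gzip implementation). pigz's decompression itself is single-threaded, but it offloads reading, writing and checksumming to extra threads, so it may help a little. The function name extract_tgz is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decompress a .tar.gz in one process and unpack it with
# tar in another, connected by a pipe. Uses pigz when installed, else gzip.
extract_tgz() {
  local archive=$1 dest=${2:-.}
  local decomp=gzip
  if command -v pigz >/dev/null 2>&1; then
    decomp=pigz
  fi
  mkdir -p "$dest" || return 1
  # -d decompress, -c write to stdout (without -c the file is decompressed in place)
  "$decomp" -dc "$archive" | tar -xf - -C "$dest"
  # Fail if either side of the pipe failed, not just tar
  local status=("${PIPESTATUS[@]}")
  [ "${status[0]}" -eq 0 ] && [ "${status[1]}" -eq 0 ]
}
```

Usage would be something like extract_tgz Xtrabackup.tar.gz /var/restore, where the target directory is only an example.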
I have a question about my disk partitions.
Here is the output of the fdisk -l command:
Disk /dev/loop0: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/xvda: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00050d75
Device Boot Start End Blocks Id System
/dev/xvda1 * 1 26109 209714176 83 Linux
As you can see, I have 500GB of space (/dev/xvda), but cPanel is using only 200GB (/dev/xvda1).
Here is the output of the lsblk command:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 4G 0 loop /tmp
xvda 202:0 0 500G 0 disk
└─xvda1 202:1 0 200G 0 part /
Here you can see I have a 500GB disk.
My question is: how can I resize xvda1 so that it uses the available space, OR how can I create new disk space for cPanel to use?
My aim is to increase the disk space in cPanel, but I don't know how to do this.
Thanks for your help!
You can use growpart to resize the partition and then resize the filesystem.
Install cloud-guest-utils if it is not installed already:
apt install cloud-guest-utils
Resize the partition:
growpart /dev/xvda 1
Check the result:
lsblk
Resize the filesystem:
resize2fs /dev/xvda1
Check after resizing:
df -h
Take a snapshot of your volume before trying this.
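resize2fs only handles ext filesystems. If the root filesystem is XFS (the default on CentOS 7, for example), the equivalent step is xfs_growfs, which takes the mount point rather than the device. Here is a minimal sketch that picks the right command. The function name grow_fs_cmd is hypothetical, and it only prints the command so you can review it before running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command that grows a mounted filesystem
# after its partition has been enlarged (e.g. with growpart).
#   $1 = filesystem type, $2 = block device, $3 = mount point
grow_fs_cmd() {
  local fstype=$1 device=$2 mountpoint=$3
  case "$fstype" in
    ext3|ext4) printf 'resize2fs %s\n' "$device" ;;      # ext3/ext4: grow by device
    xfs)       printf 'xfs_growfs %s\n' "$mountpoint" ;;  # XFS: grow by mount point
    *)         printf 'unsupported filesystem: %s\n' "$fstype" >&2; return 1 ;;
  esac
}
```

For the root filesystem you could feed it from findmnt (part of util-linux): read -r fstype device < <(findmnt -no FSTYPE,SOURCE /) followed by grow_fs_cmd "$fstype" "$device" /.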
Run the following (on CentOS/RHEL, growpart is packaged as cloud-utils-growpart):
sudo yum install cloud-utils-growpart
sudo growpart /dev/xvda 1
Then reboot.
My server at Hetzner has crashed. Not all data was backed up. Please help me mount the crashed disk and recover the data.
Any help would be much appreciated.
Here is the mdstat information:
root#rescue ~ # cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[1]
1927689152 blocks super 1.2 [2/1] [_U]
md0 : active raid1 sda1[0] sdb1[1]
25149312 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda2[0]
523968 blocks super 1.2 [2/1] [U_]
unused devices: <none>
Fdisk information:
root#rescue ~ # fdisk -l
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006cff0
Device Boot Start End Blocks Id System
/dev/sda1 2048 50333696 25165824+ fd Linux raid autodetect
/dev/sda2 50335744 51384320 524288+ fd Linux raid autodetect
/dev/sda3 51386368 3907027120 1927820376+ fd Linux raid autodetect
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000e2728
Device Boot Start End Blocks Id System
/dev/sdb1 2048 50333696 25165824+ fd Linux raid autodetect
/dev/sdb2 50335744 51384320 524288+ fd Linux raid autodetect
/dev/sdb3 51386368 3907027120 1927820376+ fd Linux raid autodetect
Disk /dev/md1: 536 MB, 536543232 bytes
2 heads, 4 sectors/track, 130992 cylinders, total 1047936 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/md1 doesn't contain a valid partition table
Disk /dev/md0: 25.8 GB, 25752895488 bytes
2 heads, 4 sectors/track, 6287328 cylinders, total 50298624 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/md0 doesn't contain a valid partition table
Disk /dev/md2: 1974.0 GB, 1973953691648 bytes
2 heads, 4 sectors/track, 481922288 cylinders, total 3855378304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/md2 doesn't contain a valid partition table
/dev/md0 is swap:
root#rescue ~ # mount /dev/md0 ./mnt2
/dev/md0 looks like swapspace - not mounted
mount: you must specify the filesystem type
Successfully mounted /dev/md1:
mount /dev/md1 ./mnt/
But these files are useless (it is only the /boot filesystem):
root#rescue ~/mnt # ls
abi-3.13.0-37-generic boot config-3.13.0-66-generic initrd.img-3.13.0-49-generic System.map-3.13.0-37-generic vmlinuz-3.13.0-37-generic
abi-3.13.0-49-generic config-3.13.0-37-generic grub initrd.img-3.13.0-66-generic System.map-3.13.0-49-generic vmlinuz-3.13.0-49-generic
abi-3.13.0-66-generic config-3.13.0-49-generic initrd.img-3.13.0-37-generic lost+found System.map-3.13.0-66-generic vmlinuz-3.13.0-66-generic
Next one:
root#rescue ~ # mount /dev/md2 ./mnt2
mount: Stale NFS file handle
mdadm says:
root#rescue ~ # mdadm -E -s
ARRAY /dev/md/1 metadata=1.2 UUID=81310c9d:bffcc81e:a2da516c:17950be1 name=rescue:1
ARRAY /dev/md/0 metadata=1.2 UUID=b8c0db78:11d75c0c:ee97c975:7f6dcfd5 name=rescue:0
ARRAY /dev/md/2 metadata=1.2 UUID=df88a4f2:bc751f21:609301cf:7336becf name=rescue:2
Next:
root#rescue ~ # fsck /dev/md2
fsck from util-linux 2.20.1
e2fsck 1.42.5 (29-Jul-2012)
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: Attempt to read block from filesystem resulted in short read while using the backup blocks
fsck.ext4: going back to original superblock
/dev/md2 contains a file system with errors, check forced.
fsck.ext4: A block group is missing an inode table while reading bad blocks inode
This doesn't bode well, but we'll try to go on...
Pass 1: Checking inodes, blocks, and sizes
Group 2's inode table at 33554432 conflicts with some other fs block.
Relocate<y>? yes
Group 2's block bitmap at 33554432 conflicts with some other fs block.
Relocate<y>? yes
Restarting with -y:
root#rescue ~ # fsck /dev/md2 -y
...
Error reading block 24117281 (Attempt to read block from filesystem resulted in short read) while getting next inode from scan. Ignore error? yes
Force rewrite? yes
... (many of them)
...
fsck.ext4: e2fsck_read_bitmaps: illegal bitmap block(s) for /dev/md2
/dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
e2fsck: aborted
/dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
Please give me a direction or any advice on how to proceed.
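It looks like I found a solution: assemble the other RAID1 member (sda3, which was not part of the running md2) as a separate degraded array and mount it: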
root#rescue ~ # mdadm -A -R /dev/md9 /dev/sda3
mdadm: /dev/md9 has been started with 1 drive (out of 2).
root#rescue ~ # mount /dev/md9 ./mnt
Thanks to https://blog.sleeplessbeastie.eu/2012/05/08/how-to-mount-software-raid1-member-using-mdadm/
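The key observation is that the running md2 contained only sdb3 ([_U]), while sda3 still held its own copy of the mirror. A small awk sketch lists degraded arrays and their current members, which makes a left-out partition easier to spot. It assumes the /proc/mdstat layout shown in the question, and the function name degraded_md is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<array> <status> <members...>" for every md array
# whose status string (e.g. [_U]) shows a missing member.
# Reads the file given as $1, or stdin when no argument is given.
degraded_md() {
  awk '
    /^md[0-9]+ :/ {
      name = $1; members = ""
      for (i = 3; i <= NF; i++)
        if ($i ~ /\[[0-9]+\]/) {          # member such as sdb3[1] or sda1[0](F)
          m = $i; sub(/\[.*/, "", m)
          members = members (members == "" ? "" : " ") m
        }
      next
    }
    name != "" && match($0, /\[[U_]+\]/) {
      st = substr($0, RSTART, RLENGTH)
      if (st ~ /_/) print name, st, members
      name = ""
    }
  ' "${1:--}"
}
```

Running degraded_md /proc/mdstat on the mdstat above prints md2 [_U] sdb3 and md1 [U_] sda2, showing that sda3 and sdb2 are the partitions missing from their arrays.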
I created my EC2 machine using a community image of CentOS 6.3 x64 and added a 35 GB disk. Now when I run df -h:
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 7.9G 1.2G 6.4G 16% /
tmpfs 7.3G 0 7.3G 0% /dev/shm
My disk is 35GB, but it shows only 8 GB for root and 7 GB as tmpfs.
I tried to use resize2fs, but it didn't work on CentOS. The disk has an ext4 partition.
# resize2fs /dev/xvda
resize2fs 1.41.12 (17-May-2010)
resize2fs: Device or resource busy while trying to open /dev/xvda
Couldn't find valid filesystem superblock.
And if I try resize2fs /dev/xvda1, it says there is nothing to do.
Any ideas or other ways? It's my root disk (/), so I can't unmount it.
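I found a way to do it. resize2fs /dev/xvda cannot work because /dev/xvda is the whole disk (it holds the partition table, not the filesystem), and resize2fs /dev/xvda1 has nothing to do because the partition itself is still the old size. I found a very good article on resizing the disk: with fdisk you can increase the partition size by deleting the partition and recreating it with the same start sector and a larger end, then making it bootable again. All it requires is a reboot. It won't affect your data as long as you use the same start sector.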
# df -h <<1>>
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 6.0G 2.0G 3.7G 35% /
tmpfs 15G 0 15G 0% /dev/shm
# fdisk -l <<2>>
Disk /dev/xvda: 21.5 GB, 21474836480 bytes
97 heads, 17 sectors/track, 25435 cylinders
Units = cylinders of 1649 * 512 = 844288 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003b587
Device Boot Start End Blocks Id System
/dev/xvda1 * 2 7632 6291456 83 Linux
# fdisk /dev/xvda <<3>>
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): u <<4>>
Changing display/entry units to sectors
Command (m for help): p <<5>>
Disk /dev/xvda: 21.5 GB, 21474836480 bytes
97 heads, 17 sectors/track, 25435 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003b587
Device Boot Start End Blocks Id System
/dev/xvda1 * 2048 12584959 6291456 83 Linux
Command (m for help): d <<6>>
Selected partition 1
Command (m for help): n <<7>>
Command action
e extended
p primary partition (1-4)
p <<8>>
Partition number (1-4): 1 <<9>>
First sector (17-41943039, default 17): 2048 <<10>>
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039): <<11>>
Using default value 41943039
Command (m for help): p <<12>>
Disk /dev/xvda: 21.5 GB, 21474836480 bytes
97 heads, 17 sectors/track, 25435 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003b587
Device Boot Start End Blocks Id System
/dev/xvda1 2048 41943039 20970496 83 Linux
Command (m for help): a <<13>>
Partition number (1-4): 1 <<14>>
Command (m for help): w <<15>>
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
# reboot <<16>>
<wait>
# df -h <<17>>
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 20G 2.0G 17G 11% /
tmpfs 15G 0 15G 0% /dev/shm
# resize2fs /dev/xvda1 <<18>>
resize2fs 1.41.12 (17-May-2010)
The filesystem is already 5242624 blocks long. Nothing to do!
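Because the procedure only preserves the data if the recreated partition starts exactly where the old one did (sector 2048 above), it is worth recording the start sector before deleting anything. Here is a sketch with the hypothetical function partition_start. It reads fdisk -lu style output (start and end in sectors) and allows for the "*" in the Boot column.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the start sector of partition $1 from
# `fdisk -lu` style output on stdin. The Boot column contains "*" only for
# the bootable partition, which shifts the remaining fields by one.
partition_start() {
  local part=$1
  awk -v p="$part" '$1 == p { print ($2 == "*" ? $3 : $2); exit }'
}
```

For example, fdisk -lu /dev/xvda | partition_start /dev/xvda1 prints 2048 for the partition table shown in step <<5>>.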
The following simple steps work very well for me:
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 30G 0 disk
└─xvda1 202:1 0 8G 0 part /
Run the following commands as root:
# yum install cloud-utils-growpart
# growpart /dev/xvda 1
# reboot
After the reboot:
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 30G 0 disk
└─xvda1 202:1 0 30G 0 part /
I had the same problem. All I needed to do was
reboot the instance
run the command
sudo resize2fs -f /dev/xxxx
and it worked well for me.
An addition to Adeel Ahmad's answer:
If you are starting an instance from an AMI that has a swap partition, additional steps are needed.
For example, suppose the AMI's disk looks like this:
# fdisk -l
Disk /dev/xvde: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe211223f
Device Boot Start End Blocks Id System
/dev/xvde1 * 1 1291 10369926 83 Linux
/dev/xvde2 1292 1305 112455 82 Linux swap / Solaris
If I need to increase capacity to 20GB, I create an AMI and launch another instance from it with 20GB of space. If I then try the steps above, the disk space won't increase, because the xvde2 partition sits between xvde1 and the new space.
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvde1 9.8G 7.5G 1.8G 81% /
$ fdisk -l
Disk /dev/xvde: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe211223f
Device Boot Start End Blocks Id System
/dev/xvde1 * 1 1291 10369926 83 Linux
/dev/xvde2 1292 1305 112455 82 Linux swap / Solaris
$ resize2fs /dev/xvde1
resize2fs 1.41.12 (17-May-2010)
The filesystem is already 2592481 blocks long. Nothing to do!
In this case, do the following:
Delete both partitions
Create a new primary partition with the new required size minus the size of the swap space
Add the bootable flag to this partition
Create a second partition
Mark it as swap
Write the changes and reboot
Extend the filesystem on partition 1
Set up swap
In detail, the fdisk session looks like this:
Deleting partition 1
Command (m for help): d <<6>>
Partition number (1-4): 1 <<6.0.1>>
Deleting partition 2
Command (m for help): d <<6.2>>
Selected partition 2
Creating resized primary partition 1
Command (m for help): n <<7>>
Command action
e extended
p primary partition (1-4)
p <<8>>
Partition number (1-4): 1 <<9>>
First sector (17-41943039, default 17): 2048 <<10>>
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039):<<NEW_UPPER_LIMIT>> <<11>>
TAKE CARE: 2048 must be replaced by your original starting sector,
or the system won't boot. NEW_UPPER_LIMIT is the new last sector
of partition 1; the rest of the disk is left for swap. To keep
the same swap size, compute the original swap size (end sector -
start sector + 1) and subtract it from 41943039 (or your disk's
last sector).
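The calculation from the note above can be written out as a small function, with the hypothetical name new_upper_limit. It keeps the swap partition exactly the same size (end - start + 1 sectors). In the example above, cylinders 1292 to 1305 at 16065 sectors per cylinder work out to sectors 20739915 to 20964824 (112455 KB), which you can confirm with fdisk -lu before deleting anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: last sector for the resized partition 1 so that a swap
# partition placed after it keeps its original size.
#   $1 = last sector of the disk, $2/$3 = original swap start/end sectors
new_upper_limit() {
  local disk_last=$1 swap_start=$2 swap_end=$3
  local swap_sectors=$(( swap_end - swap_start + 1 ))   # original swap size in sectors
  echo $(( disk_last - swap_sectors ))
}
```

With new_upper_limit 41943039 20739915 20964824 the result is 41718129, so the new swap partition runs from 41718130 to 41943039, which is 224910 sectors, the same as before.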
Creating swap partition
Command (m for help): n <<12>>
Command action
e extended
p primary partition (1-4)
p <<13>>
Partition number (1-4): 2 <<14>>
First sector (<<NEW_UPPER_LIMIT+1>>-41943039, default <<NEW_UPPER_LIMIT+1>>): <<USE_DEFAULT>> <<15>>
Last sector, +sectors or +size{K,M,G}(<<NEW_UPPER_LIMIT+1>>-41943039,default 41943039):<<USE_DEFAULT>> <<16>>
Using default value 41943039
Adding bootable bit for partition 1
Command (m for help): a <<17>>
Partition number (1-4): 1 <<18>>
Marking partition 2 as swap
Command (m for help): l <<19>>
You will see a list of partition types. Note the code for Linux swap (82).
Command (m for help): t <<20>>
Partition number (1-4): 2 <<21>>
Hex Code (type l to list codes) : 82 <<22>>
Write changes and reboot
Command (m for help): w <<23>>
The partition table has been altered!
....
$ sudo reboot
After the reboot, run
resize2fs /dev/xvde1
This will resize your filesystem.
Now, to use the second partition as swap:
$ mkswap /dev/<<SECOND SWAP PARTITION(run fdisk -l to get the name)>>
$ swapon /dev/<<SECOND SWAP PARTITION(run fdisk -l to get the name)>>
You can check the /proc/swaps file to verify:
$ cat /proc/swaps
To make these changes persistent, add the following line to the end of /etc/fstab (open it with nano, vi, etc.):
/dev/<<SECOND SWAP PARTITION>> swap swap defaults 0 0
Save and exit.
Reboot and check.
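If you script the fstab step, it is worth making it idempotent so that running it twice does not add a duplicate entry. Here is a sketch with the hypothetical function add_swap_to_fstab. The fstab path is a parameter so you can try it on a copy first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "<device> swap swap defaults 0 0" to an fstab
# file unless a line for that device already exists.
#   $1 = swap device (e.g. the second partition), $2 = fstab path
add_swap_to_fstab() {
  local device=$1 fstab=${2:-/etc/fstab}
  if grep -q "^${device}[[:space:]]" "$fstab" 2>/dev/null; then
    echo "$device is already listed in $fstab"
    return 0
  fi
  printf '%s swap swap defaults 0 0\n' "$device" >> "$fstab"
}
```

For example, cp /etc/fstab /tmp/fstab.test && add_swap_to_fstab /dev/xvde2 /tmp/fstab.test lets you inspect the result before touching the real file.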
I faced the same issue on my Debian 8 EC2 instance, where growpart failed with the error below:
FAILED: failed to get CHS from /dev/xvda
Solution:
$ sudo parted /dev/xvda resizepart 1
Warning: Partition /dev/xvda1 is being used. Are you sure you want to continue?
Yes/No? yes
End? [8588MB]? 100%
$ sudo resize2fs /dev/xvda1
$ lsblk
$ df -h
You will see that the partition and filesystem now use the whole enlarged EBS volume.
I am running CentOS 5.5 with 768 MB of RAM. I keep getting "server reached MaxClients setting, consider raising the MaxClients setting" in the logs, and Apache also runs really slowly. When I look at the Cacti graphs, the server is not even using all of its resources. Here is the current configuration:
<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 10
ServerLimit 1024
MaxClients 768
MaxRequestsPerChild 4000
</IfModule>
<IfModule worker.c>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
free -m
total used free shared buffers cached
Mem: 768 352 415 0 0 37
-/+ buffers/cache: 315 452
Swap: 0 0 0
top - 11:03:54 up 41 days, 11:53, 1 user, load average: 0.05, 0.03, 0.00
Tasks: 35 total, 1 running, 34 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 786432k total, 389744k used, 396688k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 38284k cached
I have tried the following, but the server responds very slowly:
<IfModule worker.c>
#StartServers 2
#MaxClients 150
#MinSpareThreads 25
#MaxSpareThreads 75
#ThreadsPerChild 25
#MaxRequestsPerChild 0
StartServers 20
MaxClients 1024
ServerLimit 1024
MinSpareThreads 128
MaxSpareThreads 768
ThreadsPerChild 64
MaxRequestsPerChild 0
</IfModule>
free -m
total used free shared buffers cached
Mem: 768 324 443 0 0 37
-/+ buffers/cache: 286 481
Swap: 0 0 0
@regilero
I have updated to:
<IfModule prefork.c>
StartServers 12
MinSpareServers 12
MaxSpareServers 12
MaxClients 50
MaxRequestsPerChild 300
</IfModule>
Using top I see:
Tasks: 36 total, 1 running, 35 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 786432k total, 613180k used, 173252k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 76488k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 10364 92 60 S 0.0 0.0 1:09.53 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd/808
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper/808
124 root 16 -4 12620 8 4 S 0.0 0.0 0:00.00 udevd
533 root 20 0 95504 5692 228 S 0.0 0.7 4:02.94 memcached
546 root 20 0 5924 332 276 S 0.0 0.0 6:54.51 syslogd
557 root 20 0 101m 1456 868 S 0.0 0.2 13:18.64 snmpd
570 root 20 0 62640 316 208 S 0.0 0.0 2:39.56 sshd
579 root 20 0 21656 24 20 S 0.0 0.0 0:00.00 xinetd
589 root 20 0 12072 12 8 S 0.0 0.0 0:00.05 mysqld_safe
940 mysql 20 0 559m 164m 3832 S 0.3 21.5 209:33.88 mysqld
1015 root 20 0 20880 200 132 S 0.0 0.0 0:10.48 crond
1023 root 20 0 46748 4 0 S 0.0 0.0 0:00.00 saslauthd
1024 root 20 0 46748 4 0 S 0.0 0.0 0:00.00 saslauthd
3605 root 20 0 62832 2168 636 S 0.0 0.3 0:02.58 sendmail
3613 smmsp 20 0 57712 1648 504 S 0.0 0.2 0:00.01 sendmail
17610 root 20 0 85932 3312 2600 S 0.0 0.4 0:00.02 sshd
17612 mcmap 20 0 86072 1760 1012 S 0.0 0.2 0:00.17 sshd
17613 mcmap 20 0 12076 1656 1292 S 0.0 0.2 0:00.01 bash
17637 root 20 0 45052 1432 1120 S 0.0 0.2 0:00.00 su
17638 root 20 0 12180 1800 1324 S 0.0 0.2 0:00.08 bash
17740 root 20 0 246m 9264 4516 S 0.0 1.2 0:00.19 httpd
18264 apache 20 0 282m 43m 4940 S 0.0 5.7 0:00.56 httpd
18514 apache 20 0 279m 40m 4832 S 0.0 5.3 0:01.47 httpd
18518 apache 20 0 273m 36m 4396 S 0.0 4.7 0:00.45 httpd
18528 apache 20 0 251m 13m 3660 S 0.0 1.8 0:00.41 httpd
18529 apache 20 0 278m 40m 4340 S 0.0 5.3 0:00.99 httpd
18530 apache 20 0 278m 40m 4268 S 0.0 5.3 0:00.67 httpd
18548 apache 20 0 272m 33m 3516 S 0.0 4.4 0:00.28 httpd
18552 apache 20 0 280m 42m 3684 S 0.0 5.5 0:00.48 httpd
18553 apache 20 0 271m 33m 3768 S 0.0 4.3 0:00.45 httpd
18555 apache 20 0 274m 36m 3672 S 0.0 4.7 0:00.58 httpd
18572 apache 20 0 247m 9020 2856 S 0.0 1.1 0:00.01 httpd
18578 apache 20 0 280m 42m 3684 S 0.0 5.6 0:00.76 httpd
18589 apache 20 0 246m 5452 676 S 0.0 0.7 0:00.00 httpd
18588 root 20 0 12624 1216 932 R 0.0 0.2 0:00.06
free -m
total used free shared buffers cached
Mem: 768 578 189 0 0 74
-/+ buffers/cache: 504 263
Swap: 0 0 0
I just added the current Cacti graph for the last 4 hours. The busy periods are Monday and Tuesday, so I will wait until next week to see further results of the config change, but it looks like an improvement, as before I only had a maximum of 10 threads available. Looking at this, do you think I can improve it further?
free -m
total used free shared buffers cached
Mem: 768 619 148 0 0 49
-/+ buffers/cache: 570 197
Swap: 0 0 0
NEW TEST
On a 2GB RAM VPS box I have now set prefork to:
StartServers 20
MinSpareServers 20
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
This morning my memcached server was killed by the kernel's out-of-memory killer:
Nov 20 09:28:40 vps22899094 kernel: Out of memory: Kill process 12517 (memcached) score 81 or sacrifice child
Nov 20 09:28:40 vps22899094 kernel: Killed process 12517, UID 497, (memcached) total-vm:565252kB, anon-rss:42940kB, file-rss:44kB
What would be the optimal values to set in Apache?
#/etc/sysconfig/memcached
PORT="11211"
USER="memcached"
MAXCONN="1024"
CACHESIZE="1024"
OPTIONS="-l 127.0.0.1"
/etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
bind-address=127.0.0.1
#script
thread_concurrency=2
query_cache_size = 16M
query_cache_type=1
query_cache_limit=5M
# MyISAM #
#key-buffer-size = 32M
#myisam-recover = FORCE,BACKUP
# SAFETY #
#max-allowed-packet = 16M
#max-connect-errors = 1000000
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
#query-cache-type = 0
#query-cache-size = 0
max-connections = 50
thread-cache-size = 16
#open-files-limit = 65535
#table-definition-cache = 1024
#table-open-cache = 2048
# INNODB #
#innodb-flush-method = O_DIRECT
#innodb-log-files-in-group = 2
#innodb-log-file-size = 5M
#innodb-flush-log-at-trx-commit = 1
#innodb-file-per-table = 1
#innodb-buffer-pool-size = 921M
# LOGGING #
log-error = /var/log/mysqld.log
log-queries-not-using-indexes = 1
slow-query-log = 1
slow-query-log-file = /var/log/mysqld-slow.log
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
When you use Apache with mod_php, Apache is forced into prefork mode rather than worker. Even though PHP 5 is known to support multithreading, some PHP 5 libraries are also known to misbehave in multithreaded environments (for example, a locale call in one thread can alter the locale in other PHP threads).
So, unless PHP runs as a separate process (CGI/FastCGI, like php-fpm), you have mod_php inside Apache and Apache in prefork mode.
In your tests you simply commented out the prefork settings and increased the worker settings, so what you now have is the default values for the prefork settings plus some altered values for the shared ones:
StartServers 20
MinSpareServers 5
MaxSpareServers 10
MaxClients 1024
MaxRequestsPerChild 0
This means you ask Apache to start with 20 processes, but you also tell it that if more than 10 processes are doing nothing, it should reduce the number of children to stay between 5 and 10 available processes. Apache only changes the number of children gradually (when it needs more, it spawns one per second at first, ramping up exponentially to at most 32 per second). So you will soon fall back to the classical situation where you have a fairly low number of free Apache processes (on average 2). The average is low because you usually have something like 5 available processes, but as soon as traffic grows they are all used, and no process is available because Apache is slow to create new forks. This is made worse by the fact that your PHP requests seem to be quite long: they do not finish early, so the Apache forks are not released soon enough to handle other requests.
See the small amount of green before the red peak on the last graph? If you could graph this at 1-minute resolution instead of 5 minutes, you would see that this green amount was not big enough to absorb the incoming traffic without error messages.
Now you set MaxClients to 1024. I guess the Cacti graphs were not taken after this configuration change, because with it, when no more processes are available, Apache keeps forking new children, up to a limit of 1024 busy children. Take something like 20MB of RAM per child (or more if you have a big memory_limit in PHP, like 64MB or 256MB, and these PHP requests really use that much RAM), add maybe a DB server... your server is now slowing down because you have only 768MB of RAM. Maybe Apache already hits the available RAM limit while starting the first 20 children.
So, a classical way of handling this is to check the amount of memory used by an Apache fork (run top a few times while it is running), then work out how many parallel requests you can handle with that amount of RAM (that means parallel Apache children in prefork mode). Let's say it's 12, for example. Put this number in the Apache MPM settings this way:
<IfModule prefork.c>
StartServers 12
MinSpareServers 12
MaxSpareServers 12
MaxClients 12
MaxRequestsPerChild 300
</IfModule>
That means the number of forks does not change as traffic increases or decreases, because you always want to use all the RAM and be ready for traffic peaks. The 300 means each fork is recycled after 300 requests; that's better than 0, because it limits the impact of potential memory leaks. MaxClients is set to 12: if more than 12 requests come in, the extra ones wait in the listen backlog queue (ListenBacklog), so MaxClients should simply be your target number of processes.
And yes, that means you cannot handle more than 12 parallel requests.
If you want to handle more requests:
Buy more RAM.
Try running Apache in worker mode, but remove mod_php and run PHP as a separate daemon with its own pool settings (this is called php-fpm), connected through FastCGI. Note that you will certainly need more RAM to allow a large number of parallel php-fpm processes, but maybe less than with mod_php.
Reduce the time spent in your PHP processes. From your Cacti graphs you have two potential problems: a real traffic peak around 11:25-11:30, or some PHP code getting very slow. Faster requests reduce the number of parallel requests.
If your problem really is traffic peaks, caches such as a proxy-cache server could help. If the problem is random slowness in PHP then... it's an application problem. Do you make HTTP queries to another site from PHP, for example?
And finally, as stated by @Jan Vlcinsky, you could try nginx, where PHP is only available through php-fpm. If you cannot buy RAM and must handle heavy traffic, it definitely deserves a test.
Update: about internal dummy connections (if that is your problem, but maybe not).
Check this link and this previous answer. These requests are 'normal', but if you do not have a simple default virtual host, they may be hitting your main, heavy application, generating slow HTTP queries and preventing regular users from reaching your Apache processes. They are generated on graceful reloads and during child process management.
If you do not have a simple, basic "It works" default virtual host, keep these requests away from your application with some rewrite rules:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*internal\ dummy\ connection.*$ [NC]
RewriteRule .* - [F,L]
Update:
Having only one virtual host does not protect you from internal dummy connections; it is worse, since you can now be sure these connections hit your only virtual host. So you should really avoid side effects on your application by using the rewrite rules.
Reading your Cacti graphs, it seems your Apache is not in prefork mode but in worker mode. Run httpd -l (apache2 -l on Debian) and check whether you have worker.c or prefork.c. If you are in worker mode you may encounter some PHP problems in your application, but you should check the worker settings; here is an example:
<IfModule worker.c>
StartServers 3
MaxClients 500
MinSpareThreads 75
MaxSpareThreads 250
ThreadsPerChild 25
MaxRequestsPerChild 300
</IfModule>
You start 3 processes, each containing 25 threads (so 3*25=75 parallel requests available by default), and you require at least 75 idle threads: as soon as one thread is used, a new process is forked, adding 25 more threads. When more than 250 threads are idle (10 processes), some processes are killed. You must adjust these settings to your memory. Here you allow 500 parallel requests (that's 20 processes of 25 threads). Your usage may fit something more like:
<IfModule worker.c>
StartServers 2
MaxClients 250
MinSpareThreads 50
MaxSpareThreads 150
ThreadsPerChild 25
MaxRequestsPerChild 300
</IfModule>
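The process counts in that explanation are just MaxClients / ThreadsPerChild rounded up. A small sketch, with the hypothetical function name worker_processes, can check a worker configuration. As far as I know the worker MPM's default ServerLimit is 16, so a setting that needs more processes (like MaxClients 500 with ThreadsPerChild 25) also needs ServerLimit raised.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of worker-MPM child processes needed to provide
# MaxClients threads, i.e. ceil(MaxClients / ThreadsPerChild).
#   $1 = MaxClients, $2 = ThreadsPerChild
worker_processes() {
  local max_clients=$1 threads_per_child=$2
  echo $(( (max_clients + threads_per_child - 1) / threads_per_child ))
}
```

worker_processes 500 25 gives 20, worker_processes 250 25 gives 10, and the question's MaxClients 1024 with ThreadsPerChild 64 gives 16.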
Did you consider using nginx (or another event-based web server) instead of Apache?
nginx should allow a higher number of connections and consume far fewer resources (as it is event-based and does not create a separate process per connection). However, you will still need some processes doing the real work (such as WSGI servers), and if they stay on the same server as the front-end web server, you only shift the performance problem to a slightly different place.
Recent Apache versions should allow a similar setup (the event MPM), but this is not my area of expertise.
Here's an approach that could resolve your problem, and if not would help with troubleshooting.
Create a second Apache virtual server identical to the current one
Send all "normal" user traffic to the original virtual server
Send special or long-running traffic to the new virtual server
Special or long-running traffic could be report generation, maintenance operations, or anything else you don't expect to complete in well under 1 second. This can happen when serving APIs, not just web pages.
If your resource utilization is low but you still exceed MaxClients, the most likely answer is you have new connections arriving faster than they can be serviced. Putting any slow operations on a second virtual server will help prove if this is the case. Use the Apache access logs to quantify the effect.
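One way to quantify it is to count requests above a time threshold. This assumes you add %D (time taken to serve the request, in microseconds) as the last field of your LogFormat, for example with a hypothetical format LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed. Here is a sketch with the hypothetical function slow_requests:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads an access log on stdin whose last field is %D
# (microseconds) and reports how many requests took at least $1 microseconds.
slow_requests() {
  local threshold_us=${1:-1000000}
  awk -v t="$threshold_us" '
    ($NF + 0) >= (t + 0) { n++ }
    END { printf "%d of %d requests took >= %d us\n", n, NR, t }'
}
```

Run it against the log of each virtual server, e.g. slow_requests 1000000 < /var/log/httpd/access_log (the path depends on your setup), and compare the results before and after moving the slow operations.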
I recommend using the formula below, which follows the approach suggested in the Apache performance tuning documentation:
MaxClients = (total RAM - RAM for OS - RAM for external programs) / (RAM per httpd process)
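Here is a worked example of the formula using rough numbers from the top output in the question: 768 MB of RAM, httpd children around 40 MB resident each, and an assumed 250 MB for the OS, mysqld (about 164 MB resident) and everything else. The function name max_clients is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: MaxClients = (total RAM - RAM reserved for the OS and
# other programs) / RAM per httpd process, all in MB, rounded down.
max_clients() {
  local total_mb=$1 reserved_mb=$2 per_child_mb=$3
  if (( per_child_mb <= 0 || reserved_mb >= total_mb )); then
    echo "invalid input" >&2
    return 1
  fi
  echo $(( (total_mb - reserved_mb) / per_child_mb ))
}
```

max_clients 768 250 40 gives 12, since 518 / 40 rounds down to 12.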
Here is my script, which runs on RHEL 6.7; you can adapt it to your OS.
#!/bin/bash
echo "HostName=$(hostname)"
# Formula:
# MaxClients = (RAM - size_all_other_processes) / (size_apache_process)
# Process name: httpd on RHEL/CentOS; run with PROC=apache2 on Debian/Ubuntu.
PROC=${PROC:-httpd}
# "-o rss=" prints each process's resident size in KB without a header line.
# RSS counts shared pages in every process, so this estimate errs on the high side.
total_httpd_processes_size=$(ps -C "$PROC" -o rss= | awk '{ sum += $1 } END { print sum + 0 }')
total_httpd_processes_count=$(ps -C "$PROC" -o rss= | wc -l)
echo "total_httpd_processes_count=$total_httpd_processes_count"
if [ "$total_httpd_processes_count" -eq 0 ]; then
    echo "no $PROC processes found" >&2
    exit 1
fi
AVG_httpd_process_size=$((total_httpd_processes_size / total_httpd_processes_count))
echo "AVG_httpd_process_size=$AVG_httpd_process_size"
AVG_httpd_process_size_MB=$((AVG_httpd_process_size / 1024))
# Avoid dividing by zero if the average process is smaller than 1 MB
[ "$AVG_httpd_process_size_MB" -lt 1 ] && AVG_httpd_process_size_MB=1
echo "AVG_httpd_process_size_MB=$AVG_httpd_process_size_MB"
total_httpd_used_size=$((total_httpd_processes_size / 1024))
echo "total_httpd_used_size=$total_httpd_used_size"
total_RAM_size=$(free -m | awk '/^Mem:/ {print $2}')
echo "total_RAM_size=$total_RAM_size"
total_used_size=$(free -m | awk '/^Mem:/ {print $3}')
echo "total_used_size=$total_used_size"
size_all_other_processes=$((total_used_size - total_httpd_used_size))
echo "size_all_other_processes=$size_all_other_processes"
remaining_memory=$((total_RAM_size - size_all_other_processes))
echo "remaining_memory=$remaining_memory"
MaxClients=$((remaining_memory / AVG_httpd_process_size_MB))
echo "MaxClients=$MaxClients"
exit 0