How to do server monitoring? [closed] - performance

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Having no experience as devops, I've just been given a project where I have to do the whole thing.
So, how do I keep an eye on disk usage, memory, database space and access time, API response times, etc.?

It's practically impossible for any admin to keep an eye on running processes at all times; this is where server monitoring comes in handy.
Try Monit. On Debian or Ubuntu it can be installed with:
apt-get install monit -y
To configure what is monitored, edit the configuration file:
nano /etc/monit/monitrc
Use the example config to set up what you would like to monitor. The status is also accessible over HTTP or HTTPS, but you don't really need to check it, because Monit will alert you if anything goes wrong on your server. For example, you will get an email if memory consumption rises above the threshold you specified in the config file above, if the CPU is overloaded, or if a certain website is down.
Let's dig into it a little bit.
Type monit status to get a status report like the following:
The Monit daemon 5.3.2 uptime: 1h 32m
System 'myhost.mydomain.tld'
status Running
monitoring status Monitored
load average [0.03] [0.14] [0.20]
cpu 3.5%us 5.9%sy 0.0%wa
memory usage 26100 kB [10.4%]
swap usage 0 kB [0.0%]
data collected Thu, 30 Aug 2017 18:35:00
You can monitor virtually anything: Apache, nginx, MySQL, disks, processes, etc.
Sample monit status:
File 'mysql_bin'
status Accessible
monitoring status Monitored
permission 755
uid 0
gid 0
timestamp Fri, 05 May 2017 22:33:39
size 16097088 B
checksum 6d7b5ffd8563f8ad44dde35ae4b8bd52 (MD5)
data collected Mon, 28 Aug 2017 06:21:02
File 'apache_rc'
status Accessible
monitoring status Monitored
permission 755
uid 0
gid 0
timestamp Fri, 05 May 2017 11:21:22
size 9974 B
checksum 55b2bc7ce5e4a0835877dbfd98c2646b (MD5)
data collected Mon, 28 Aug 2017 06:21:02
Filesystem 'Server01'
status Accessible
monitoring status Monitored
permission 660
uid 0
gid 6
filesystem flags 0x1000
block size 4096 B
blocks total 5006559 [19556.9 MB]
blocks free for non superuser 2615570 [10217.1 MB] [52.2%]
blocks free total 2875653 [11233.0 MB] [57.4%]
inodes total 1281120
inodes free 1085516 [84.7%]
data collected Mon, 28 Aug 2017 06:23:02
Filesystem 'Media'
status Accessible
monitoring status Monitored
permission 660
uid 0
gid 6
filesystem flags 0x1000
block size 4096 B
blocks total 4414923 [17245.8 MB]
blocks free for non superuser 3454811 [13495.4 MB] [78.3%]
blocks free total 3684839 [14393.9 MB] [83.5%]
inodes total 1130496
inodes free 1130384 [100.0%]
data collected Mon, 28 Aug 2017 06:23:02
System 'mywebsite.com'
status Resource limit matched
monitoring status Monitored
load average [0.01] [0.10] [0.61]
cpu 2.7%us 0.2%sy 0.0%wa
memory usage 1150372 kB [28.5%]
swap usage 184356 kB [35.2%]
data collected Mon, 28 Aug 2017 06:21:02
Set it up with alerts!
Don't forget that you will receive an email alert for every rule you configured, e.g. when your website "mywebsite" is down, when free disk space drops below 20%, on disk failure, when CPU usage exceeds x%, etc.
Install Monit and check its manual with man monit.

You can use Windows Performance Analyzer. Xperf is also helpful.
Here is the link:
https://msdn.microsoft.com/en-us/library/windows/hardware/hh162945.aspx

#!/bin/sh
# Regenerate a static status page every 100 seconds.
file="/var/www/html/index.html"
linebreak="--------------------------------------------------------------------------------------------"
while true
do
    # Everything printed inside the braces is written to the page in one go.
    {
        echo "<html>"
        echo "<head>"
        # Single quotes keep the inner double quotes in the generated HTML.
        echo '<meta http-equiv="refresh" content="100">'
        echo "</head>"
        echo "<body>"
        echo "<pre>"
        date
        echo "$linebreak"
        uptime
        echo "$linebreak"
        # Line 3 of top's batch output is the CPU summary.
        top -b -n1 -u nobody | sed -n '3p'
        echo "$linebreak"
        free -m
        echo "$linebreak"
        df -h
        echo "$linebreak"
        iptables -nL
        echo "$linebreak"
        echo "</pre>"
        echo "</body>"
        echo "</html>"
    } > "$file"
    sleep 100
done
I use this script to monitor information such as temperature, disk usage, RAM, firewall rules and so on.
I write the results to the index page of an Apache server, so I can open the server's homepage and see everything.
The script refreshes the results every 100 seconds. The web page also reloads every 100 seconds.
With this script and Apache you can monitor the server from anywhere in the world, on a mobile device or a PC.
Mo 28. Aug 14:36:03 CEST 2017
--------------------------------------------------------------------------------------------
14:36:03 up 1:34, 4 users, load average: 0,10, 0,09, 0,11
--------------------------------------------------------------------------------------------
%Cpu(s): 14,8 us, 1,6 sy, 0,7 ni, 82,2 id, 0,5 wa, 0,0 hi, 0,1 si, 0,0 st
--------------------------------------------------------------------------------------------
total used free shared buff/cache available
Mem: 3949 1027 756 74 2165 2542
Swap: 4093 0 4093
--------------------------------------------------------------------------------------------
Filesystem Size Used Avail Use% Mounted on
udev 2,0G 0 2,0G 0% /dev
tmpfs 395M 6,0M 389M 2% /run
/dev/sda1 21G 6,2G 14G 32% /
tmpfs 2,0G 43M 1,9G 3% /dev/shm
tmpfs 5,0M 4,0K 5,0M 1% /run/lock
tmpfs 2,0G 0 2,0G 0% /sys/fs/cgroup
Sharepoint 476G 300G 176G 64% /media/sf_Sharepoint
tmpfs 395M 92K 395M 1% /run/user/1000
--------------------------------------------------------------------------------------------
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
--------------------------------------------------------------------------------------------
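One possible refinement of the script above: because it rewrites index.html in place, a browser that loads the page during a refresh could see a half-written file. A minimal sketch of one way around that, assuming the web server only needs read access to the file, is to write to a temporary file and rename it over the target. The helper name publish_atomically is hypothetical and not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: run a command, capture its output in a temporary
# file next to the target, then rename it over the target in one step.
publish_atomically() {
    target=$1
    shift
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        # mktemp creates the file with mode 600; make it world-readable
        # so the web server can serve it (an assumption about the setup).
        chmod 644 "$tmp"
        mv "$tmp" "$target"
    else
        # Keep the previous page if the command failed.
        rm -f "$tmp"
        return 1
    fi
}
```

It could be called as publish_atomically /var/www/html/index.html some_command, where some_command is whatever prints the page.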

Related

Bad Disk performance after moving from Ubuntu to Centos 7

A relatively old Dell R620 server (32 cores / 128GB RAM) worked perfectly for years with Ubuntu. Plain OS install, no virtualization.
2 system disks in mirror (XFS)
6 RAID 5 disks for /var (XFS)
The server is used for a nightly check of a MySQL Xtrabackup file.
Before the reformat and move to CentOS 7 the process would finish by 08:00; now it runs as late as noon.
99% of the job is unpacking a large tar.gz file.
htop: only two processes are doing anything:
1. gzip -d : about 20% CPU
2. tar zxf Xtrabackup.tar.gz : about 4-7% CPU
iotop: steady at around 3 M/s (read) / 20-25 M/s (write), which is about 25% of the minimum I would expect.
Memory : Used : 1GB of 128GB
Server is fully updated both OS / HW / Firmware including the disks firmware.
IDRAC shows no problems.
Bottom line : Server is not working hard (to say the least) but performance is way off.
Any ideas would be appreciated.
vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 2 0 469072 0 130362040 0 0 57 341 0 0 0 0 98 2 0
0 2 0 456916 0 130374568 0 0 3328 24576 1176 3241 2 1 94 4 0
The vmstat output shows blocked processes (the b column) together with I/O of around 20 MB/s. To me this means a few processes are accessing the disk concurrently. To improve performance, instead of
tar zxf Xtrabackup.tar.gz
use
gzip -dc Xtrabackup.tar.gz | tar xvf -
Note the -c flag: without it, gzip decompresses the file in place and writes nothing to the pipe. The second form adds parallelism and can benefit from multiple processors. You can also benefit from a larger pipe (FIFO) buffer. Check this answer for some ideas.
Also consider tuning the filesystem where tar stores its output files.
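The piped form suggested above can be wrapped in a small function. This is only a sketch: the name extract_piped and the destination-directory argument are assumptions made for illustration. The point it shows is that gzip needs -c to write the decompressed stream to standard output, so that tar can read it from the pipe while the archive itself stays untouched.

```shell
#!/bin/sh
# Hypothetical helper: decompress in one process and unpack in another,
# connected by a pipe.
extract_piped() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -d: decompress, -c: write to stdout instead of replacing the file.
    gzip -dc "$archive" | tar -xf - -C "$dest"
}
```

Usage would look like extract_piped Xtrabackup.tar.gz /var/restore, where /var/restore is a placeholder path.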

EC2 error: cannot create temp file for here-document: Read-only file system

Looks like my Ubuntu 14.04 EC2 made the fs read-only.
cd /var/ (pressing tab for autocomplete)
cannot create temp file for here-document: Read-only file system
But I have plenty of free space and memory is not full either:
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Wed Feb 3 14:40:58 UTC 2016
System load: 0.0 Processes: 126
Usage of /: 14.9% of 11.67GB Users logged in: 0
Memory usage: 19% IP address for eth0: 172.31.15.38
Swap usage: 0%
df -hi:
/dev/xvda1 768K 85K 684K 12% /
none 251K 2 251K 1% /sys/fs/cgroup
udev 249K 387 249K 1% /dev
tmpfs 251K 309 250K 1% /run
none 251K 1 251K 1% /run/lock
none 251K 1 251K 1% /run/shm
none 251K 2 251K 1% /run/user
free:
total used free shared buffers cached
Mem: 2048484 1199420 849064 6248 180300 635596
-/+ buffers/cache: 383524 1664960
du -sch /tmp*
9.9M /tmp
9.9M total
What's the solution here? How can I fix the fs without losing my data?
Should I run:
mount -o remount,rw /
or should I reboot?
Thanks in advance!
Do you have btrfs filesystems? If so, when there's not enough space for more snapshots, the OS switches them to read-only (including /tmp). For me, the solution was to delete snapshots and disable snapper.
Use the following commands:
snapper list #shows a numbered list of snapshots
snapper delete nmbr #deletes snapshot number nmbr (retry after reboot if doesn't work at first)
Also, disable automatic snapshots by deleting corresponding files under /etc/cron.hourly, /etc/cron.daily, and so on.
I had the same issue and fixed it as follows.
Someone had put a bad option in /etc/fstab. The misspelled noation (instead of noatime) is not a valid mount option, which is presumably why the root filesystem stayed read-only.
Wrong entry in /etc/fstab:
[ec2-user@XXXXXXXXXXX ~]$ cat /etc/fstab
#
UUID=XXXXXX / xfs defaults,noation 1 1
Corrected entry:
[ec2-user@XXXXXXXXXXX ~]$ cat /etc/fstab
#
UUID=XXXXXX / xfs defaults,noatime 1 1
Maybe you don't have write permission on the /tmp directory.
Check the permissions; they should look like this:
ls -ld /tmp
drwxrwxrwt 10 root root 4096 Jun 5 11:32 /tmp
You can fix the permissions with
chmod a+rwxt /tmp
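Before trying any of the fixes in this thread, it may help to confirm how the filesystem is actually mounted. The sketch below assumes a Linux system where /proc/mounts lists one mount per line as device, mount point, type and options; the function name mount_mode is made up for this example. It prints ro or rw for a given mount point.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mounts-style lines on stdin and print
# "ro" or "rw" for the requested mount point (nothing if it is not listed).
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 is the comma-separated option list.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw")
                    mode = opts[i]
        }
        END { if (mode != "") print mode }
    '
}
```

For the root filesystem this would be run as mount_mode / < /proc/mounts. If it prints ro, the kernel log (dmesg) is usually the next place to look for the reason.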

Puppet agent hangs and eventually gives a memory allocation error

I'm using puppet as a provisioner for Vagrant, and am coming across an issue where Puppet will hang for an extremely long time when I do a "vagrant provision". Building the box from scratch using "vagrant up" doesn't seem to be a problem, only subsequent provisions.
If I turn puppet debug on and watch where it hangs, it seems to stop at various, seemingly arbitrary, points the first of which is:
Info: Applying configuration version '1401868442'
Debug: Prefetching yum resources for package
Debug: Executing '/bin/rpm --version'
Debug: Executing '/bin/rpm -qa --nosignature --nodigest --qf '%{NAME} %|EPOCH?{% {EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n''
Executing this command on the server myself returns immediately.
Eventually, it gets past this and continues. Using the summary option, I get the following, after waiting for a very long time for it to complete:
Debug: Finishing transaction 70191217833880
Debug: Storing state
Debug: Stored state in 9.39 seconds
Notice: Finished catalog run in 1493.99 seconds
Changes:
Total: 2
Events:
Failure: 2
Success: 2
Total: 4
Resources:
Total: 18375
Changed: 2
Failed: 2
Skipped: 35
Out of sync: 4
Time:
User: 0.00
Anchor: 0.01
Schedule: 0.01
Yumrepo: 0.07
Augeas: 0.12
Package: 0.18
Exec: 0.96
Service: 1.07
Total: 108.93
Last run: 1401869964
Config retrieval: 16.49
Mongodb database: 3.99
File: 76.60
Mongodb user: 9.43
Version:
Config: 1401868442
Puppet: 3.4.3
This doesn't seem very helpful to me, as the time totals 108 seconds, so where have the other 1385 seconds gone?
Throughout, Puppet seems to be hammering the box, using up a lot of CPU, but still doesn't seem to advance. The memory it uses seems to continually increase. When I kick off the command, top looks like this:
Cpu(s): 10.2%us, 2.2%sy, 0.0%ni, 85.5%id, 2.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4956928k total, 2849296k used, 2107632k free, 63464k buffers
Swap: 950264k total, 26688k used, 923576k free, 445692k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28486 root 20 0 439m 334m 3808 R 97.5 6.9 2:02.92 puppet
22 root 20 0 0 0 0 S 1.3 0.0 0:07.55 kblockd/0
18276 mongod 20 0 788m 31m 3040 S 1.3 0.6 2:31.82 mongod
20756 jboss-as 20 0 3081m 1.5g 21m S 1.3 31.4 7:13.15 java
20930 elastics 20 0 2340m 236m 6580 S 1.0 4.9 1:44.80 java
266 root 20 0 0 0 0 S 0.3 0.0 0:03.85 jbd2/dm-0-8
22717 vagrant 20 0 98.0m 2252 1276 S 0.3 0.0 0:01.81 sshd
28762 vagrant 20 0 15036 1228 932 R 0.3 0.0 0:00.10 top
1 root 20 0 19364 1180 964 S 0.0 0.0 0:00.86 init
To me, this seems fine, there's over 2GB of available memory and plenty of available swap. I have a max open files limit of 1024.
About 10-15 minutes later, still no advance in the console output, but top looks like this:
Cpu(s): 11.2%us, 1.6%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%s
Mem: 4956928k total, 3834376k used, 1122552k free, 64248k buffers
Swap: 950264k total, 24408k used, 925856k free, 445728k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28486 root 20 0 1397m 1.3g 3808 R 99.6 26.7 15:16.19 puppet
18276 mongod 20 0 788m 31m 3040 R 1.7 0.6 2:45.03 mongod
20756 jboss-as 20 0 3081m 1.5g 21m S 1.3 31.4 7:25.93 java
20930 elastics 20 0 2340m 238m 6580 S 0.7 4.9 1:52.03 java
8486 root 20 0 308m 952 764 S 0.3 0.0 0:06.03 VBoxService
As you can see, puppet is now using a lot more of the memory, and it seems to continue in this fashion. The box it's building has 5GB of RAM, so I wouldn't have expected it to have memory issues. However, further down the line, after a long wait, I do get "Cannot allocate memory - fork(2)"
Running ulimit -a, I get:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 38566
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Which, again looks fine to me...
To be honest, I'm completely at a loss as to how to go about solving this, or what is causing it.
Any help or insight would be greatly appreciated!
EDIT:
So I managed to fix this eventually... It came down to using recurse with a file directive for a large directory. The target directory in question contained around 2GB worth of files, and Puppet took a huge amount of time loading this into memory and doing its hashes and comparisons. The first time I stood the server up, the directory was relatively empty so the check was quick, but then other resources were placed in it that increased its size massively, meaning subsequent runs took much longer.
The memory error that eventually was thrown was because, I can only assume, Puppet was loading the whole thing into memory in order to do its stuff...
I found a way around using the recurse function, and am now trying to avoid it like the plague...
Yeah, the problem with the recurse parameter on the file type is that it checks every single file's checksum, which on a massive directory adds up real quick.
As Felix suggests, using checksum => none is one way to fix it, another is to accomplish the task you're trying to do (say chmod or chown a whole directory) with an exec performing the native task, with an unless to check if it's already been done.
Something like:
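define check_mode($mode) {
  # Only run chmod when the current mode differs from the wanted one.
  # POSIX test uses a single = for string comparison; == fails under dash.
  exec { "/bin/chmod $mode $name":
    unless => "/bin/sh -c '[ $(/usr/bin/stat -c %a $name) = $mode ]'",
  }
}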
Taken from http://projects.puppetlabs.com/projects/1/wiki/File_Permission_Check_Patterns
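The check that the exec/unless pattern above delegates to the shell can be tried on its own. This is a rough shell sketch of the same idea, assuming GNU stat (the -c %a format is not available on BSD or macOS); ensure_mode is a hypothetical name, not something Puppet provides.

```shell
#!/bin/sh
# Hypothetical helper: change a file's mode only if it differs from the
# wanted one, mirroring the "unless" guard in the Puppet snippet.
ensure_mode() {
    path=$1
    mode=$2
    # GNU stat prints the octal permission bits, e.g. 644.
    current=$(stat -c %a "$path") || return 1
    if [ "$current" != "$mode" ]; then
        chmod "$mode" "$path"
    fi
}
```

Because it compares a single stat result instead of checksumming file contents, the cost stays small however large the files are.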

How do I get the percentage of used storage in the UNIX server [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I want to get only the used percentage of disk space on the UNIX server:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 457G 90G 344G 21% /
udev 2.0G 4.0K 2.0G 1% /dev
tmpfs 798M 1.1M 797M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 2.0G 23M 2.0G 2% /run/shm
cgroup 2.0G 0 2.0G 0% /sys/fs/cgroup
I am using the following commands to get the percentage:
df -h > space.txt
space=`head -2 space.txt | tail -1 | cut -d' ' -f15 | sed 's/%.*$//'`
Is there a command to get the "Used percentage" directly?
You could use this:
used=$(df / | awk 'END{print $5}')
echo $used
56%
Rather than running df without specifying the filesystem you actually mean and then looking for it in a mass of lines, I specify / as the filesystem up front, then I know the result will be on the last line. I take advantage of that by using END in awk to get the 5th field in the last line only.
Using GNU coreutils df:
$ df -P | awk '/\/dev\/sd/ { print $1, $5, $6 } '
/dev/sda3 54% /
/dev/sda2 95% /media/data
Note the use of the -P flag to force each mount to print on exactly one line.
This is a requirement for predictable results when scripting.
This example includes both device $1 and mountpoint $6. You can drop either or both as you wish.
‘-P’
‘--portability’
Use the POSIX output format. This is like the default format except for the following:
The information about each file system is always printed on exactly one line; a mount device is never
put on a line by itself. This means that if the mount device name is more than 20 characters long
(e.g., for some network mounts), the columns are misaligned.
The labels in the header output line are changed to conform to POSIX.
The default block size and output format are unaffected by the DF_BLOCK_SIZE, BLOCK_SIZE
and BLOCKSIZE environment variables. However, the default block size is still affected by
POSIXLY_CORRECT: it is 512 if POSIXLY_CORRECT is set, 1024 otherwise. See Block size.
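Putting the two answers together, one way to get a bare number suitable for a threshold check is sketched below. It assumes df -P output for a single path arrives on standard input, so the last line is the filesystem row and the fifth field is the capacity column; the function name used_pct is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P <path>` output on stdin and print the
# used percentage of the last filesystem row without the % sign.
used_pct() {
    awk 'NR > 1 { pct = $5 } END { sub(/%/, "", pct); print pct }'
}
```

A typical call would be used=$(df -P / | used_pct), after which a test such as [ "$used" -gt 80 ] can drive an alert.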

How to obtain the virtual private memory of a process from the command line under OSX?

I would like to obtain the virtual private memory consumed by a process under OSX from the command line. This is the value that Activity Monitor reports in the "Virtual Mem" column. ps -o vsz reports the total address space available to the process and is therefore not useful.
You can obtain the virtual private memory use of a single process by running
top -l 1 -s 0 -i 1 -stats vprvt -pid PID
where PID is the process ID of the process you are interested in. This results in about a dozen lines of output ending with
VPRVT
55M+
So by parsing the last line of output, one can at least obtain the memory footprint in MB. I tested this on OSX 10.6.8.
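As a sketch of that parsing step, the function below reads top's output on standard input and converts the last non-empty line (for example 55M+) to bytes. It assumes the K, M and G suffixes are binary multiples of 1024, which is a guess about how top rounds its figures, and it drops the trailing + or - change marker. The name vprvt_to_bytes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert the last value printed by
# `top -l 1 -s 0 -stats vprvt -pid PID` (e.g. "55M+") to bytes.
vprvt_to_bytes() {
    awk '
        NF { last = $1 }                # remember the last non-empty line
        END {
            sub(/[+-]$/, "", last)      # strip the change marker
            unit = substr(last, length(last))
            n = last + 0                # leading digits only
            # Assumption: suffixes are powers of 1024.
            if (unit == "K") n *= 1024
            else if (unit == "M") n *= 1024 * 1024
            else if (unit == "G") n *= 1024 * 1024 * 1024
            printf "%.0f\n", n
        }
    '
}
```

On a Mac it would be used as top -l 1 -s 0 -stats vprvt -pid PID | vprvt_to_bytes.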
update
I realized (after I got downvoted) that @user1389686 gave an answer in the comment section of the OP that was better than my paltry first attempt. What follows is based on user1389686's own answer. I cannot take credit for it; I've just cleaned it up a bit.
original, edited with -stats vprvt
As Mahmoud Al-Qudsi mentioned, top does what you want. If PID 8631 is the process you want to examine:
$ top -l 1 -s 0 -stats vprvt -pid 8631
Processes: 84 total, 2 running, 82 sleeping, 378 threads
2012/07/14 02:42:05
Load Avg: 0.34, 0.15, 0.04
CPU usage: 15.38% user, 30.76% sys, 53.84% idle
SharedLibs: 4668K resident, 4220K data, 0B linkedit.
MemRegions: 15160 total, 961M resident, 25M private, 520M shared.
PhysMem: 917M wired, 1207M active, 276M inactive, 2400M used, 5790M free.
VM: 171G vsize, 1039M framework vsize, 1523860(0) pageins, 811163(0) pageouts.
Networks: packets: 431147/140M in, 261381/59M out.
Disks: 487900/8547M read, 2784975/40G written.
VPRVT
8631
Here's how I get at this value using a bit of Ruby code:
# Return the virtual private memory of the current process in bytes,
# or nil if the last line of top's output cannot be parsed.
def virtual_private_memory
  s = `top -l 1 -s 0 -stats vprvt -pid #{Process.pid}`.split($/).last
  return nil unless s =~ /\A(\d*)([KMG])/
  $1.to_i * case $2
            when "K"
              1000
            when "M"
              1000000
            when "G"
              1000000000
            else
              # The original interpolated an undefined variable f here.
              raise ArgumentError.new("unrecognized multiplier in #{s}")
            end
end
Updated answer that works under Yosemite, from user1389686:
top -l 1 -s 0 -stats mem -pid PID
