ubuntu$ echo $GODEBUG
memprofilerate=1024
I set this variable in my .bashrc, but when I do heap profiling, only allocations larger than 512 KB show up in pprof's top command. What am I missing?
More output:
ubuntu$ env | grep '^GODEBUG='
GODEBUG=memprofilerate=1024
ubuntu$ env | grep '^GODEBUG=' | xxd
00000000: 474f 4445 4255 473d 6d65 6d70 726f 6669 GODEBUG=memprofi
00000010: 6c65 7261 7465 3d31 3032 340a lerate=1024.
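Note that the shell's environment doesn't necessarily match the process's: a process started before the variable was set (or started by something other than this shell) won't have inherited it. One way to check the environment the running process was actually started with (replace <pid> with its process ID):
# /proc/<pid>/environ is NUL-separated; tr makes it line-oriented.
tr '\0' '\n' < /proc/<pid>/environ | grep GODEBUG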
Edit: This is a long-running process that has been up for over 4 days. The Linux top -p command shows a steady increase in the memory used by the process, but pprof is not catching the newly allocated space.
Last 2 profile collections:
Jul 26, 2pm: top shows 3145692 KiB mem used; pprof profile heap1.out
Jul 28, 7pm: top shows 3915992 KiB mem used; pprof profile heap2.out
Running go tool pprof on heap1.out and heap2.out shows the same top 27 inuse_space numbers.
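As a sketch, pprof's built-in diffing may help isolate what changed between the two snapshots:

# Diff heap2.out against heap1.out: top then shows only the delta,
# i.e. whatever grew between the two collections.
go tool pprof -base heap1.out heap2.out

# inuse_space only counts objects still live at snapshot time; the
# alloc_space sample index shows cumulative allocations instead.
go tool pprof -sample_index=alloc_space heap2.out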
Related
We are running a kubernetes environment and we have a pod that is encountering memory issues. The pod runs only a single container, and this container is responsible for running various utility jobs throughout the day.
The issue is that this pod's memory usage grows slowly over time. There is a 6 GB memory limit for this pod, and eventually, the memory consumption grows very close to 6GB.
A lot of our utility jobs are written in Java, and when the JVM spins up for them, they require -Xms256m in order to start. Yet, since the pod's memory is growing over time, eventually it gets to the point where there isn't 256MB free to start the JVM, and the Linux oom-killer kills the java process. Here is what I see from dmesg when this occurs:
[Thu Feb 18 17:43:13 2021] Memory cgroup stats for /kubepods/burstable/pod4f5d9d31-71c5-11eb-a98c-023a5ae8b224/921550be41cd797d9a32ed7673fb29ea8c48dc002a4df63638520fd7df7cf3f9: cache:8KB rss:119180KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:119132KB inactive_file:8KB active_file:0KB unevictable:4KB
[Thu Feb 18 17:43:13 2021] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Thu Feb 18 17:43:13 2021] [ 5579] 0 5579 253 1 4 0 -998 pause
[Thu Feb 18 17:43:13 2021] [ 5737] 0 5737 3815 439 12 0 907 entrypoint.sh
[Thu Feb 18 17:43:13 2021] [13411] 0 13411 1952 155 9 0 907 tail
[Thu Feb 18 17:43:13 2021] [28363] 0 28363 3814 431 13 0 907 dataextract.sh
[Thu Feb 18 17:43:14 2021] [28401] 0 28401 768177 32228 152 0 907 java
[Thu Feb 18 17:43:14 2021] Memory cgroup out of memory: Kill process 28471 (Finalizer threa) score 928 or sacrifice child
[Thu Feb 18 17:43:14 2021] Killed process 28401 (java), UID 0, total-vm:3072708kB, anon-rss:116856kB, file-rss:12056kB, shmem-rss:0kB
Based on research I've been doing (here, for example), it seems normal on Linux for memory consumption to grow over time as various caches grow. From what I understand, cached memory should also be freed when new processes (such as my java process) begin to run.
My main question is: should this pod's memory be getting freed in order for these java processes to run? If so, are there any steps I can take to begin to debug why this may not be happening correctly?
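One way to start digging, as a sketch (assuming cgroup v1, which matches the paths in the dmesg output above), is to split the cgroup's usage into reclaimable page cache versus anonymous memory:

# Inside the container: "cache" is reclaimable page cache, while "rss"
# is anonymous memory the kernel cannot simply drop under pressure.
grep -E '^(cache|rss|rss_huge|mapped_file) ' /sys/fs/cgroup/memory/memory.stat

If rss stays flat while cache grows, the kernel should be able to reclaim the cache when the JVM needs its 256MB.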
Aside from this concern, I've also been trying to track down what is responsible for the growing memory in the first place. I was able to narrow it down to a certain job that runs every 15 minutes: after every run, the pod's used memory grew by ~0.1 GB.
I was able to figure this out by running this command (inside the container) before and after each execution of the job:
cat /sys/fs/cgroup/memory/memory.usage_in_bytes | numfmt --to si
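A sketch of that before/after measurement (run_the_job.sh is a hypothetical stand-in for the 15-minute job):

# Snapshot cgroup usage around the job to measure how much it grows.
before=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
./run_the_job.sh   # hypothetical placeholder for the utility job
after=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
echo $((after - before)) | numfmt --to si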
From there I narrowed down the piece of bash code from which the memory seems to consistently grow. That code looks like this:
while [ "z${_STATUS}" != "z0" ]
do
RES=`$CURL -X GET "${TS_URL}/wcs/resources/admin/index/dataImport/status?jobStatusId=${JOB_ID}"`
_STATUS=`echo $RES | jq -r '.status.status' || exit 1`
PROGRES=`echo $RES | jq -r '.status.progress' || exit 1`
[ "x$_STATUS" == "x1" ] && exit 1
[ "x$_STATUS" == "x3" ] && exit 3
[ $CNT -gt 10 ] && PrintLog "WC Job ($JOB_ID) Progress: $PROGRES Status: $_STATUS " && CNT=0
sleep 10
((CNT++))
done
[ "z${_STATUS}" == "z0" ] && STATUS=Success || STATUS=Failed
This piece of code seems innocuous to me at first glance, so I do not know where to go from here.
I would really appreciate any help, I've been trying to get to the bottom of this issue for days now.
I did eventually get to the bottom of this so I figured I'd post my solution here. I mentioned in my original post that I narrowed down my issue to the while loop that I posted above in my question. Each time the job in question ran, that while loop would iterate maybe 10 times. After the while loop completed, I noticed that utilized memory increased by 100MB each time pretty consistently.
On a hunch, I had a feeling the CURL command within the loop could be the culprit. And in fact, it did turn out that CURL was eating up my memory and not releasing it for whatever reason. Instead of looping and running the following CURL command:
RES=`$CURL -X GET "${TS_URL}/wcs/resources/admin/index/dataImport/status?jobStatusId=${JOB_ID}"`
I replaced this command with a simple python script that utilized the requests module to check our job statuses instead.
I am still not sure why curl was the culprit in this case. After running curl --version, it appears the underlying library is libcurl/7.29.0. Maybe there is a bug in that library version causing issues with memory management, but that is just a guess.
In any case, switching from curl to Python's requests module resolved my issue.
Having no experience in devops, I've just been given a project where I have to do the whole thing.
So, how do I keep an eye on disk usage, memory, database space and access time, API reply times, etc.?
It's practically impossible for any admin to keep an eye on running processes at all times; this is where server monitoring comes in handy.
Try Monit; it can be installed easily with:
apt-get install monit -y
Monitoring:
nano /etc/monit/monitrc
Use the example config to choose what you would like to monitor. The status page is accessible over HTTP or HTTPS as well, but you don't really need to look at it, because Monit will alert you if anything goes wrong on your server. For example, you will get an email if memory consumption climbs above the threshold you specified in the config file, if the CPU is overloaded, or if a certain website is down.
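As a sketch (hostnames, addresses, and thresholds below are made-up examples, not values from any real config), a minimal monitrc might look like this:

set daemon 120                      # poll every 2 minutes
set mailserver smtp.example.com     # relay for alert mail
set alert admin@example.com         # recipient of all alerts

check system $HOST
    if loadavg (5min) > 4 then alert
    if memory usage > 80% then alert
    if cpu usage (user) > 70% for 2 cycles then alert

check filesystem rootfs with path /
    if space usage > 85% then alert

After editing, monit -t checks the control file syntax and monit reload applies the changes.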
Let's dig into it a little bit.
Type monit status to get output like the following:
The Monit daemon 5.3.2 uptime: 1h 32m
System 'myhost.mydomain.tld'
status Running
monitoring status Monitored
load average [0.03] [0.14] [0.20]
cpu 3.5%us 5.9%sy 0.0%wa
memory usage 26100 kB [10.4%]
swap usage 0 kB [0.0%]
data collected Thu, 30 Aug 2017 18:35:00
You can monitor virtually anything: Apache, Nginx, MySQL, disks, processes, etc.
Sample monit status:
File 'mysql_bin'
status Accessible
monitoring status Monitored
permission 755
uid 0
gid 0
timestamp Fri, 05 May 2017 22:33:39
size 16097088 B
checksum 6d7b5ffd8563f8ad44dde35ae4b8bd52 (MD5)
data collected Mon, 28 Aug 2017 06:21:02
File 'apache_rc'
status Accessible
monitoring status Monitored
permission 755
uid 0
gid 0
timestamp Fri, 05 May 2017 11:21:22
size 9974 B
checksum 55b2bc7ce5e4a0835877dbfd98c2646b (MD5)
data collected Mon, 28 Aug 2017 06:21:02
Filesystem 'Server01'
status Accessible
monitoring status Monitored
permission 660
uid 0
gid 6
filesystem flags 0x1000
block size 4096 B
blocks total 5006559 [19556.9 MB]
blocks free for non superuser 2615570 [10217.1 MB] [52.2%]
blocks free total 2875653 [11233.0 MB] [57.4%]
inodes total 1281120
inodes free 1085516 [84.7%]
data collected Mon, 28 Aug 2017 06:23:02
Filesystem 'Media'
status Accessible
monitoring status Monitored
permission 660
uid 0
gid 6
filesystem flags 0x1000
block size 4096 B
blocks total 4414923 [17245.8 MB]
blocks free for non superuser 3454811 [13495.4 MB] [78.3%]
blocks free total 3684839 [14393.9 MB] [83.5%]
inodes total 1130496
inodes free 1130384 [100.0%]
data collected Mon, 28 Aug 2017 06:23:02
System 'mywebsite.com'
status Resource limit matched
monitoring status Monitored
load average [0.01] [0.10] [0.61]
cpu 2.7%us 0.2%sy 0.0%wa
memory usage 1150372 kB [28.5%]
swap usage 184356 kB [35.2%]
data collected Mon, 28 Aug 2017 06:21:02
Set it up with alerts!
Don't forget that you will receive an email alert for every rule you specify, e.g. when your website "mywebsite" is down, when disk space drops below 20%, on disk failure, when CPU usage exceeds x%, etc.
Install Monit and check its manual with man monit.
You can use Windows Performance Analyzer. Xperf is also helpful.
Here is the link:
https://msdn.microsoft.com/en-us/library/windows/hardware/hh162945.aspx
#!/bin/sh
# Write a simple status page into the Apache docroot every 100 seconds.
file="/var/www/html/index.html"
linebreak="--------------------------------------------------------------------------------------------"
while true
do
echo "<html>" > $file
echo "<head>" >> $file
echo "<meta http-equiv="refresh" content="100">" >> $file
echo "</head>" >> $file
echo "<body>" >> $file
echo "<pre>" >> $file
date >> $file
echo $linebreak >> $file
uptime >> $file
echo $linebreak >> $file
top -b -n1 -u nobody | sed -n '3p' >> $file
echo $linebreak >> $file
free -m >> $file
echo $linebreak >> $file
df -h >> $file
echo $linebreak >> $file
iptables -nL >> $file
echo $linebreak >> $file
echo "</pre>" >> $file
echo "</body>" >> $file
echo "</html>" >> $file
sleep 100
done
I use this script to monitor information like temperature, disk usage, RAM, firewall rules, and so on.
I write the results to the index page of an Apache server, so I can open the server's homepage and see everything.
The script refreshes the results every 100 seconds, and the meta tag makes the webpage reload every 100 seconds as well.
With this script and Apache you can monitor the server from anywhere in the world, on mobile devices or a PC.
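Note that the script needs root, both for iptables -nL and to write into /var/www/html; a hedged launch sketch (the script path is an example):

# Run detached as root so iptables works and the docroot is writable.
sudo nohup /usr/local/bin/status_page.sh >/dev/null 2>&1 &

The generated page looks like this: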
Mo 28. Aug 14:36:03 CEST 2017
--------------------------------------------------------------------------------------------
14:36:03 up 1:34, 4 users, load average: 0,10, 0,09, 0,11
--------------------------------------------------------------------------------------------
%Cpu(s): 14,8 us, 1,6 sy, 0,7 ni, 82,2 id, 0,5 wa, 0,0 hi, 0,1 si, 0,0 st
--------------------------------------------------------------------------------------------
total used free shared buff/cache available
Mem: 3949 1027 756 74 2165 2542
Swap: 4093 0 4093
--------------------------------------------------------------------------------------------
Filesystem Size Used Avail Use% Mounted on
udev 2,0G 0 2,0G 0% /dev
tmpfs 395M 6,0M 389M 2% /run
/dev/sda1 21G 6,2G 14G 32% /
tmpfs 2,0G 43M 1,9G 3% /dev/shm
tmpfs 5,0M 4,0K 5,0M 1% /run/lock
tmpfs 2,0G 0 2,0G 0% /sys/fs/cgroup
Sharepoint 476G 300G 176G 64% /media/sf_Sharepoint
tmpfs 395M 92K 395M 1% /run/user/1000
--------------------------------------------------------------------------------------------
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
--------------------------------------------------------------------------------------------
I would like to know how the OS prioritises the execution of background processes in Linux.
Suppose I have the command below; would it be executed right away, or would the OS prioritise its execution?
nohup /bin/bash /tmp/kill_loop.sh &
Thanks
All processes running at the same nice value will get an equal cpu-timeslice.
Here is a simple test that launches 2 processes, both performing the exact same operations. One is launched in the background and the other in the foreground.
dd if=/dev/zero of=/dev/null bs=1 &
dd if=/dev/zero of=/dev/null bs=1
The relevant extract from subsequently running the top command
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1366 root 20 0 1576 532 436 R 100 0.0 0:30.79 dd
1365 root 20 0 1576 532 436 R 100 0.0 0:30.79 dd
Next, if both the processes are restricted to the same CPU,
taskset -c 0 dd if=/dev/zero of=/dev/null bs=1 &
taskset -c 0 dd if=/dev/zero of=/dev/null bs=1
Again the relevant extract from subsequently running the top command shows
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1357 root 20 0 1576 532 436 R 50 0.0 0:38.74 dd
1358 root 20 0 1576 532 436 R 50 0.0 0:38.74 dd
both the processes compete for CPU-timeslice and are equally prioritised.
Finally,
kill -SIGINT 1357 &
kill -SIGINT 1358 &
kill -SIGINT 1365 &
kill -SIGINT 1366 &
results in similar amounts of data copied and throughput.
25129255+0 records in
25129255+0 records out
25129255 bytes (25 MB) copied, 34.883 s, 720 kB/s
Slight discrepancies in throughput may occur due to differences in the exact moment each process responds to the break signal and stops running.
However, also note that sched_autogroup_enabled exists.
If enabled, sched_autogroup_enabled makes the scheduler distribute CPU timeslices fairly between individual shells (sessions) rather than between individual processes, sharing the CPU equally amongst the various active shells.
Thus if a shell launches 1 process A,
and another shell launches 2 processes B and C,
then the CPU execution timeslice will typically be distributed as
A <-- 50% <---- shell1 50%
B <-- 25% <-.
C <-- 25% <--`- shell2 50%
(though all 3 processes A, B & C are running at the same nice level.)
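You can check and toggle this behaviour at runtime; a quick sketch:

# 1 = autogrouping on (fairness per shell/session), 0 = off (per process).
cat /proc/sys/kernel/sched_autogroup_enabled

# Temporarily disable it to restore plain per-process fairness (needs root).
echo 0 | sudo tee /proc/sys/kernel/sched_autogroup_enabled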
Process priorities in the Linux kernel are given by nice values.
Refer to the link
http://en.wikipedia.org/wiki/Nice_(Unix)
The nice values (ranging from -20 to +19) define process priorities, with -20 being the highest priority. User-space processes are usually given a default nice value of 0. You can check the nice values of the running processes in your shell using the command below.
ps -al
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1039 1268 16889 0 80 0 - 11656 poll_s pts/8 00:00:08 vim
0 S 1047 1566 17683 0 80 0 - 2027 wait pts/18 00:00:00 arm-linux-andro
0 R 1047 1567 1566 21 80 0 - 9143 ? pts/18 00:00:00 cc1
0 R 1031 1570 15865 0 80 0 - 2176 - pts/24 00:00:00 ps
0 R 1031 17357 15865 99 80 0 - 2597 - pts/24 00:03:29 top
In the output above, the 'NI' column shows the nice values. When I ran a process in the background, it too got a nice value of 0 (top is that process, with PID 17357). That means a background process is queued up just like a foreground process and is scheduled likewise.
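If you want a background job to actually run at lower priority, set its nice value explicitly; a sketch reusing the command from the question:

# Start the background job with a higher nice value (lower priority).
nice -n 10 nohup /bin/bash /tmp/kill_loop.sh &

# Or lower the priority of a process that is already running, by PID.
renice -n 10 -p 17357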
A program I compiled and executed (by a shell script, as another user) sometimes crashes:
./run.sh: line 19: 7964 Segmentation fault (core dumped) ./Program ARG1 ARG2 ARG3 2>&1
I wanted to take a look at the core file to figure out where the crash might have happened. Unfortunately, there's no standard core file to be found; apparently Ubuntu invoked its default crash handler apport instead, which says in its log:
ERROR: apport (pid 8841) Mon Jun 2 17:59:04 2014: called for pid 7964, signal 11, core limit 0
ERROR: apport (pid 8841) Mon Jun 2 17:59:04 2014: executable: /path/to/Program (command line "./Program ARG1 ARG2 ARG3")
ERROR: apport (pid 8841) Mon Jun 2 17:59:04 2014: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 8841) Mon Jun 2 17:59:16 2014: wrote report /var/crash/_path_to_Program.1001.crash
I've been trying to process the crash dump file with apport-retrace, but apport doesn't handle the file very well because it apparently expects Ubuntu-specific packages:
ERROR: report file does not contain one of the required fields: CoreDump DistroRelease Package ExecutablePath
Looking at the crash dump file, I think there's important debugging information inside, so my question: is there another way to process this file, either with gdb, or extract a core dump file from it if the core dump is indeed stored inside?
For reference, here's (partially) the .crash file:
ProblemType: Crash
Architecture: amd64
Date: Mon Jun 2 17:59:04 2014
DistroRelease: Ubuntu 13.04
ExecutablePath: /path/to/Program
ExecutableTimestamp: 1401723071
ProcCmdline: ./Program ARG1 ARG2 ARG3
ProcCwd: /path/to
ProcEnviron: PATH=(custom, no user)
ProcMaps:
... (memory map left out)
ProcStatus:
Name: Program
State: S (sleeping)
Tgid: 7964
Pid: 7964
PPid: 7963
TracerPid: 0
Uid: 1001 1001 1001 1001
Gid: 1001 1001 1001 1001
FDSize: 64
Groups: 4 27 1001
VmPeak: 1009888 kB
VmSize: 1009884 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 205400 kB
VmRSS: 205400 kB
VmData: 762620 kB
VmStk: 136 kB
VmExe: 3312 kB
VmLib: 64144 kB
VmPTE: 852 kB
VmSwap: 0 kB
Threads: 9
SigQ: 0/127009
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001206
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: ff
Cpus_allowed_list: 0-7
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 3669360
nonvoluntary_ctxt_switches: 85456
Signal: 11
Uname: Linux 3.8.0-35-generic x86_64
UserGroups: adm sudo
CoreDump: base64
H4sICAAAAAAC/0NvcmVEdW1wAA==
... (huge base64 encoded string left out)
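For what it's worth, one approach that should work here (a sketch; the target directory is an arbitrary example) is apport-unpack, which splits a .crash report into one file per field, decoding the base64/gzip-compressed CoreDump into a regular core file:

# Unpack the report; the CoreDump field becomes a plain core file.
apport-unpack /var/crash/_path_to_Program.1001.crash /tmp/crash-unpacked

# Load the extracted core into gdb together with the binary.
gdb /path/to/Program /tmp/crash-unpacked/CoreDump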
I would like to obtain the virtual private memory consumed by a process under OSX from the command line. This is the value that Activity Monitor reports in the "Virtual Mem" column. ps -o vsz reports the total address space available to the process and is therefore not useful.
You can obtain the virtual private memory use of a single process by running
top -l 1 -s 0 -i 1 -stats vprvt -pid PID
where PID is the process ID of the process you are interested in. This results in about a dozen lines of output ending with
VPRVT
55M+
So by parsing the last line of output, one can at least obtain the memory footprint in MB. I tested this on OSX 10.6.8.
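From a shell script, the same last-line parsing can be done directly, e.g.:

# Print just the VPRVT value line (e.g. "55M+") for a given PID.
top -l 1 -s 0 -i 1 -stats vprvt -pid "$PID" | tail -n 1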
update
I realized (after I got downvoted) that user1389686 gave an answer in the comments on the OP that was better than my paltry first attempt. What follows is based on user1389686's own answer. I cannot take credit for it; I've just cleaned it up a bit.
original, edited with -stats vprvt
As Mahmoud Al-Qudsi mentioned, top does what you want. If PID 8631 is the process you want to examine:
$ top -l 1 -s 0 -stats vprvt -pid 8631
Processes: 84 total, 2 running, 82 sleeping, 378 threads
2012/07/14 02:42:05
Load Avg: 0.34, 0.15, 0.04
CPU usage: 15.38% user, 30.76% sys, 53.84% idle
SharedLibs: 4668K resident, 4220K data, 0B linkedit.
MemRegions: 15160 total, 961M resident, 25M private, 520M shared.
PhysMem: 917M wired, 1207M active, 276M inactive, 2400M used, 5790M free.
VM: 171G vsize, 1039M framework vsize, 1523860(0) pageins, 811163(0) pageouts.
Networks: packets: 431147/140M in, 261381/59M out.
Disks: 487900/8547M read, 2784975/40G written.
VPRVT
8631
Here's how I get at this value using a bit of Ruby code:
# Return the virtual private memory size of the current process, in bytes
def virtual_private_memory
  # The value we want is on the last line of top's output, e.g. "55M+"
  s = `top -l 1 -s 0 -stats vprvt -pid #{Process.pid}`.split($/).last
  return nil unless s =~ /\A(\d*)([KMG])/
  $1.to_i * case $2
            when "K"
              1000
            when "M"
              1000000
            when "G"
              1000000000
            else
              raise ArgumentError.new("unrecognized multiplier in #{s}")
            end
end
Updated answer that works under Yosemite, from user1389686:
top -l 1 -s 0 -stats mem -pid PID