What is the Faults column in 'top'? - bash

I'm trying to download Xcode (on El Capitan) and it seems to be stuck. When I run 'top', I see a process called 'storedownloadd' whose "STATE" column alternates between sleeping, stuck, and running. Its 'FAULTS' column shows a quickly increasing number with a plus sign after it; it is now over 400,000 and still climbing. Other than in 'top', I see no sign of download activity. Does this indicate that something is amiss? Here's a screenshot:
Processes: 203 total, 2 running, 10 stuck, 191 sleeping, 795 threads 11:48:14
Load Avg: 4.72, 3.24, 1.69 CPU usage: 56.54% user, 6.41% sys, 37.3% idle
SharedLibs: 139M resident, 19M data, 20M linkedit.
MemRegions: 18620 total, 880M resident, 92M private, 255M shared.
PhysMem: 7812M used (922M wired), 376M unused.
VM: 564G vsize, 528M framework vsize, 0(0) swapins, 512(0) swapouts.
Networks: packets: 122536/172M in, 27316/2246K out.
Disks: 78844/6532M read, 240500/6746M written.
PID COMMAND %CPU TIME #TH #WQ #PORT MEM PURG CMPRS PGRP PPID STATE BOOSTS %CPU_ME %CPU_OTHRS UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH
354 storedownloadd 0.3 00:47.58 16 5 200 255M 0B 0B 354 1 sleeping *3[1] 155.53838 0.00000 501 412506+ 54329 359852+ 6620+ 2400843+ 1186426+
57 UserEventAgent 0.0 00:00.35 22 17 378 4524K+ 0B 0B 57 1 sleeping *0[1] 0.23093 0.00000 0 7359+ 235 15403+ 7655+ 24224+ 17770
384 Terminal 3.3 00:12.02 10 4 213 34M+ 12K 0B 384 1 sleeping *0[42] 0.11292 0.04335 501 73189+ 482 31076+ 9091+ 1138809+ 72076+

The FAULTS column in top refers to "page faults", described more specifically as:
The number of major page faults that have occurred for a task. A page
fault occurs when a process attempts to read from or write to a
virtual page that is not currently present in its address space. A
major page fault is when disk access is involved in making that page
available.
If an application tries to access an address on a memory page that is not currently in physical RAM, a page fault occurs. When that happens, the virtual memory system invokes a special page-fault handler to respond to the fault immediately. The page-fault handler stops the code from executing, locates a free page of physical memory, loads the page containing the data needed from disk, updates the page table, and finally returns control to the program — which can then access the memory address normally. This process is known as paging.
Minor page faults can be common, depending on the code being executed and how much memory is currently available on the system. There are also different kinds of fault to be aware of (minor, major, invalid), which are described in more detail at the links below.
↳ Apple : About The Virtual Memory System
↳ Wikipedia : Page Fault
↳ Stackoverflow.com : page-fault
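To watch page faults accumulate for a single process, here is a minimal sketch using Python's standard `resource` module (Unix only). It reads the minor and major fault counters that the kernel keeps for the current process, touches some freshly allocated memory, and reads them again. The helper names `fault_counts` and `touch_pages` are made up for illustration, and these counters are not guaranteed to match what top shows in its FAULTS column.

```python
import resource

PAGE = resource.getpagesize()

def fault_counts():
    """Return (minor, major) page-fault counts for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt

def touch_pages(n_pages):
    """Allocate n_pages worth of memory and write one byte into each page."""
    buf = bytearray(n_pages * PAGE)
    for offset in range(0, len(buf), PAGE):
        buf[offset] = 1  # first write to a page forces it to be mapped in
    return buf

before = fault_counts()
buf = touch_pages(4096)
after = fault_counts()

print("minor faults added:", after[0] - before[0])
print("major faults added:", after[1] - before[1])
```

On most systems the minor count should rise by roughly the number of pages touched, while the major count stays flat unless the system is short on memory.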

Related

Why is pprof heap inuse_space less than container_working_set_size?

I found in Grafana that my pod <***-qkcdl> occupied about 1.0G of container_memory_working_set_bytes and 1.4G of container_memory_rss:
Pod memory usage in Grafana
container_memory_rss of pod (max, avg, current)
My queries for container_memory_working_set_bytes and container_memory_rss are:
container_memory_working_set_bytes{k8s_cluster="$cluster", namespace="$dept", pod=~'$pod', container=~"$container"}
container_memory_cache{k8s_cluster="$cluster", namespace="$dept", pod=~'$pod', container=~"$container"}
Then, when I check the pprof heap inuse_space, it shows:
go tool pprof --inuse_space pprof http://{pod_ip}:8899/debug/pprof/heap
Fetching profile over HTTP from http://{pod_ip}:8899/debug/pprof/heap
pprof: read pprof: is a directory
Fetched 1 source profiles out of 2
Saved profile in {local_path}
File: {app}
Type: inuse_space
Time: Oct 15, 2021 at 6:38pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
(pprof) top10
Showing nodes accounting for 335.36MB, 91.58% of 366.19MB total
Dropped 195 nodes (cum <= 1.83MB)
Showing top 10 nodes out of 77
...
So why does my Go application use only 335.36MB of heap space while Grafana shows about 1.0G of working set and 1.4G of RSS? What do "335.36MB", "1.0G", and "1.4G" each mean?
PS: I know what the metrics mean, but that doesn't help me:
container_memory_rss: The amount of anonymous and swap cache memory (includes transparent hugepages).
container_memory_working_set_bytes: The amount of working set memory, this includes recently accessed memory, dirty memory, and kernel memory. Working set is <= "usage".

Why do my various user programs terminate abruptly without an error message?

I do a variety of different kinds of data analysis and numerical simulation on my custom-built Ubuntu machine using custom-written programs that sometimes must run for days or even weeks. Some of those programs have been in Fortran, some in Python, some in C; there is literally zero commonality between these programs except that they run a long time and do a lot of disk i/o. Most are single-thread.
The typical execution command line looks like
./myprog &> myprog.log &
If an ordinary runtime error occurs, any buffered program output and the error message both faithfully appear in myprog.log and the logfile is cleanly closed. But what's been happening instead in many cases is that the program simply quits in mid-stream -- usually after half a day to a day or so, without any further output to the log file. It's like the program had been randomly hit with a 'kill -9'.
I don't know why this is happening, and it seems to be specific to this particular machine (I have been doing similar work for 30 years and never experienced this before). The operating system itself seems rock-stable; it has been rebooted only rarely over the past couple years for specific reasons like updates. It's only my longer-running user processes that seem to die abruptly like this with no accompanying diagnostic.
Not being a system-level expert, I'm at a loss for how to diagnose what's going on. Right now, my only option is to regularly check whether my program is still running and restart it if necessary.
System details:
Ubuntu 18.04.4 LTS
Linux kernel: 4.15.0-39-generic
CPU: AMD Ryzen Threadripper 1950x
UPDATE: Since dmesg was mentioned, here are some representative messages, which I have no idea how to interpret. The UFW BLOCK messages are by far the most numerous, but there are also a fair number of the ata6 messages, which seem to have something to do with the SATA hard drive. Could this be relevant?
[5301325.692596] audit: type=1400 audit(1594876149.572:218): apparmor="DENIED" operation="open" profile="/usr/sbin/cups-browsed" name="/usr/share/locale/" pid=19663 comm="cups-browsed" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[5352288.689739] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[5352288.689753] ata6.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 14 pio 16392 in
Get event status notification 4a 01 00 00 10 00 00 00 08 00res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[5352288.689756] ata6.00: status: { DRDY }
[5352288.689760] ata6: hard resetting link
[5352289.161877] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[5352289.166076] ata6.00: configured for PIO0
[5352289.166635] ata6: EH complete
[5353558.066052] [UFW BLOCK] IN=enp5s0 OUT= MAC=10:7b:44:93:2f:58:b4:0c:25:e0:40:12:08:00 SRC=172.105.89.161 DST=144.92.130.162 LEN=40 TOS=0x00 PREC=0x00 TTL=243 ID=50780 PROTO=TCP SPT=58944 DPT=68 WINDOW=1024 RES=0x00 SYN URGP=0
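To check the 'kill -9' suspicion directly, one option is to launch the job through a small wrapper that records how it ended. The following is a minimal sketch, not a diagnosis: `describe_exit`, `run_and_report`, and the `[wrapper]` log prefix are hypothetical names, and it relies on Python's `subprocess` module reporting death by signal as a negative return code on POSIX systems.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable note."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"

def run_and_report(argv, log_path):
    """Run argv with stdout and stderr appended to log_path, then log how it ended."""
    with open(log_path, "ab") as log:
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
        note = describe_exit(result.returncode)
        log.write(f"\n[wrapper] {note}\n".encode())
    return note

if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python wrapper.py myprog.log ./myprog
    print(run_and_report(sys.argv[2:], sys.argv[1]))
```

If the last line of the log names SIGKILL, something outside the program ended it; a normal exit status would point back at the program itself.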

Coredump size different than process virtual memory space

I'm working on OS X 10.11 and generated a dump file as follows:
1. ulimit -c unlimited
2. kill -10 5228 (process pid)
This produced a dump file with the following attributes: 642M Jun 26 15:00 core.5228
Right before that, I checked the process's total memory space with the vmmap command to estimate the expected dump size.
However, the estimate (238.7 MB) was much smaller than the actual size (642 MB).
Can this gap be explained?
                                VIRTUAL   REGION
REGION TYPE                        SIZE    COUNT (non-coalesced)
===========                     =======  =======
Activity Tracing                  2048K        2
Kernel Alloc Once                    4K        2
MALLOC guard page                   16K        4
MALLOC metadata                    180K        6
MALLOC_SMALL                      56.0M        4         see MALLOC ZONE table below
MALLOC_SMALL (empty)              8192K        2         see MALLOC ZONE table below
MALLOC_TINY                       8192K        3         see MALLOC ZONE table below
STACK GUARD                       56.0M        2
Stack                             8192K        2
__DATA                            1512K       44
__LINKEDIT                        90.9M        4
__TEXT                            8336K       44
shared memory                       12K        4
===========                     =======  =======
TOTAL                            238.7M      110

                                     VIRTUAL  ALLOCATION      BYTES          REGION
MALLOC ZONE                             SIZE       COUNT  ALLOCATED  % FULL   COUNT
===========                          =======   =========  =========  ======  ======
DefaultMallocZone_0x100e42000          72.0M        7096       427K      0%       6
A core dump can, and does, filter the process memory. See the core man page:
Controlling which mappings are written to the core dump
Since kernel 2.6.23, the Linux-specific /proc/PID/coredump_filter file can be used to control which memory segments are written to the core dump file in the event that a core dump is performed for the process with the corresponding process ID.
The value in the file is a bit mask of memory mapping types (see mmap(2)). If a bit is set in the mask, then memory mappings of the corresponding type are dumped; otherwise they are not dumped. The bits in this file have the following meanings:
bit 0 Dump anonymous private mappings.
bit 1 Dump anonymous shared mappings.
bit 2 Dump file-backed private mappings.
bit 3 Dump file-backed shared mappings.
bit 4 (since Linux 2.6.24)
Dump ELF headers.
bit 5 (since Linux 2.6.28)
Dump private huge pages.
bit 6 (since Linux 2.6.28)
Dump shared huge pages.
bit 7 (since Linux 4.4)
Dump private DAX pages.
bit 8 (since Linux 4.4)
Dump shared DAX pages.
By default, the following bits are set: 0, 1, 4 (if the CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS kernel configuration option is enabled), and 5. This default can be modified at boot time using the coredump_filter boot option.
I assume OS X behaves similarly.
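To make the bit mask in the man page excerpt concrete, here is a small sketch that builds and decodes a coredump_filter value. The bit descriptions are copied from the excerpt; `mask_from_bits` and `describe_mask` are illustrative names only, and this applies to the Linux file described above, not to anything known about OS X.

```python
# Bit meanings as listed in the core man page excerpt.
FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}

def mask_from_bits(bits):
    """Combine bit positions into a single integer mask."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask

def describe_mask(mask):
    """List the mapping types that a given mask would dump."""
    return [name for bit, name in FILTER_BITS.items() if mask & (1 << bit)]

# Default per the excerpt: bits 0, 1, 4 (if the kernel option is enabled), and 5.
default_mask = mask_from_bits([0, 1, 4, 5])
print(hex(default_mask))  # 0x33
for name in describe_mask(default_mask):
    print("dumps", name)
```

With the default mask, file-backed mappings (bits 2 and 3) are left out of the dump, which is one way a Linux core file can end up a different size from the process's total virtual memory.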

How to calculate Virtual Memory Size in Mavericks

I would like to know if there is a command or API call (or a set of them) that calculates each of the parameters (Virtual Memory, File Cache, and App Memory) listed in the screenshot above.
You can use the vm_stat and sysctl terminal commands. There was no straightforward method or documentation for extracting the new attributes from these commands, so we had to use trial and error until we worked out how the parameters they report relate to the attributes we needed to calculate.
The steps are as follows:
Run vm_stat
Run "sysctl hw.memsize" and "sysctl vm.swapusage".
The relationship between the memory usage shown in Activity Monitor and the output of the previous commands is described in How to calc Memory usage in Mavericks programmatically.
Sample output from vm_stat:
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free: 24428.
Pages active: 1039653.
Pages inactive: 626002.
Pages speculative: 184530.
Pages throttled: 0.
Pages wired down: 156244.
Pages purgeable: 9429.
"Translation faults": 14335334.
Pages copy-on-write: 557301.
Pages zero filled: 5682527.
Pages reactivated: 74.
Pages purged: 52633.
File-backed pages: 660167.
Anonymous pages: 1190018.
Pages stored in compressor: 644.
Pages occupied by compressor: 603.
Decompressions: 18.
Compressions: 859.
Pageins: 253589.
Pageouts: 0.
Swapins: 0.
Swapouts: 0.
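The counters above are in pages, so the first step in any calculation is converting them to bytes using the page size from the header line. Here is a minimal sketch of that conversion; `parse_vm_stat` and `to_mib` are hypothetical helper names, the sample text is a few lines copied from the output above, and which counters map onto Activity Monitor's File Cache or App Memory is deliberately left open because the answer does not spell it out.

```python
import re

# A few lines copied from the vm_stat sample output.
SAMPLE = """\
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free: 24428.
Pages wired down: 156244.
"Translation faults": 14335334.
File-backed pages: 660167.
Anonymous pages: 1190018.
"""

def parse_vm_stat(text):
    """Return (page_size, {label: count}) parsed from vm_stat output."""
    page_size = int(re.search(r"page size of (\d+) bytes", text).group(1))
    counts = {}
    for line in text.splitlines()[1:]:
        # Labels may be quoted; values end with a trailing period.
        match = re.match(r'\s*"?([^":]+)"?:\s+(\d+)\.', line)
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return page_size, counts

def to_mib(pages, page_size):
    """Convert a page count to mebibytes."""
    return pages * page_size / 2**20

page_size, counts = parse_vm_stat(SAMPLE)
for label in ("Pages free", "File-backed pages", "Anonymous pages"):
    print(f"{label}: {to_mib(counts[label], page_size):.1f} MiB")
```

The same function should work on live output captured from vm_stat, since the regular expression tolerates the extra column padding that the real command prints.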

How is the Page File available calculated in Windows Task Manager?

In Vista Task Manager, I understand the available page file is listed like this:
Page File inUse M / available M
In XP it's listed as the Commit Charge Limit.
I had thought that:
Available Virtual Memory = Physical Memory Total + Sum of Page Files
But on my machine I've got Physical Memory = 2038M, Page Files = 4096M, Page File Available = 6051M. That leaves 83M unaccounted for. What is that used for? I thought it might have something to do with kernel memory, but the numbers don't seem to match up.
Info I've found so far:
See http://msdn.microsoft.com/en-us/library/aa965225(VS.85).aspx for more info.
Page file size can be found here: Computer Properties, advanced, performance settings, advanced.
I think you are correct in guessing that it has something to do with the kernel: kernel memory needs some physical backing as well.
However, I have to admit that when I tried to verify this, the numbers still did not match well, and a significant amount of memory remains unaccounted for.
I have:
Available Virtual Memory = 4 033 552 KB
Physical Memory Total = 2 096 148 KB
Sum of Page Files = 2048 MB
Kernel Non-Paged Memory = 28 264 KB
Kernel Paged Memory = 63 668 KB
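Working through both sets of figures makes the size of the gap explicit. This is a sketch of the arithmetic only, assuming the formula proposed in the question (physical memory plus page files) is the right starting point; the function and variable names are made up for illustration.

```python
KB_PER_MB = 1024

def unaccounted(physical, page_files, available):
    """Gap between (physical + page files) and the reported available figure."""
    return physical + page_files - available

# Figures from the question, in MB.
question_gap_mb = unaccounted(2038, 4096, 6051)

# Figures from this answer, in KB (the page file size is given in MB).
answer_gap_kb = unaccounted(2_096_148, 2048 * KB_PER_MB, 4_033_552)
kernel_kb = 28_264 + 63_668          # non-paged + paged kernel memory
still_missing_kb = answer_gap_kb - kernel_kb

print("question gap:", question_gap_mb, "MB")
print("answer gap:", answer_gap_kb, "KB")
print("kernel memory:", kernel_kb, "KB")
print("still unexplained:", still_missing_kb, "KB")
```

Under that assumption, kernel memory covers 91,932 KB of a 159,748 KB gap, leaving 67,816 KB unexplained.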

Resources