Core dump size different from process virtual memory space - macOS

I'm working on OS X 10.11 and generated a dump file in the following manner:
1. ulimit -c unlimited
2. kill -10 5228 (process pid)
and got a dump file with the following attributes: 642M Jun 26 15:00 core.5228
Right before that, I checked the process's total memory space using the vmmap command, to try to estimate the expected dump size.
However, the estimate (238.7 MB) was much smaller than the actual size (642 MB).
Can this gap be explained?
                                VIRTUAL
REGION TYPE                        SIZE    COUNT (non-coalesced)
===========                     =======    =======
Activity Tracing                  2048K    2
Kernel Alloc Once                    4K    2
MALLOC guard page                   16K    4
MALLOC metadata                    180K    6
MALLOC_SMALL                      56.0M    4    see MALLOC ZONE table below
MALLOC_SMALL (empty)              8192K    2    see MALLOC ZONE table below
MALLOC_TINY                       8192K    3    see MALLOC ZONE table below
STACK GUARD                       56.0M    2
Stack                             8192K    2
__DATA                            1512K    44
__LINKEDIT                        90.9M    4
__TEXT                            8336K    44
shared memory                       12K    4
===========                     =======    =======
TOTAL                            238.7M    110

                                VIRTUAL  ALLOCATION      BYTES          REGION
MALLOC ZONE                        SIZE       COUNT  ALLOCATED  % FULL   COUNT
===========                     =======   =========  =========  ======  ======
DefaultMallocZone_0x100e42000     72.0M        7096       427K      0%       6
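For reference, the comparison can be reproduced roughly like this (a sketch of the steps above; pid 5228 and the /cores path are specific to this machine, and signal 10 is SIGBUS on OS X):

ulimit -c unlimited        # allow core files of any size
vmmap 5228 | grep TOTAL    # note the TOTAL virtual size
kill -10 5228              # force a core dump of the running process
ls -lh /cores/core.5228    # compare the file size with the vmmap total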

A core dump can, and does, filter the process memory. See the core(5) man page:
Controlling which mappings are written to the core dump
Since kernel 2.6.23, the Linux-specific /proc/PID/coredump_filter file can be used to control which memory segments are written to the core dump file in the event that a core dump is performed for the process with the corresponding process ID.
The value in the file is a bit mask of memory mapping types (see mmap(2)). If a bit is set in the mask, then memory mappings of the corresponding type are dumped; otherwise they are not dumped. The bits in this file have the following meanings:
bit 0  Dump anonymous private mappings.
bit 1  Dump anonymous shared mappings.
bit 2  Dump file-backed private mappings.
bit 3  Dump file-backed shared mappings.
bit 4  (since Linux 2.6.24) Dump ELF headers.
bit 5  (since Linux 2.6.28) Dump private huge pages.
bit 6  (since Linux 2.6.28) Dump shared huge pages.
bit 7  (since Linux 4.4) Dump private DAX pages.
bit 8  (since Linux 4.4) Dump shared DAX pages.
By default, the following bits are set: 0, 1, 4 (if the CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS kernel configuration option is enabled), and 5. This default can be modified at boot time using the coredump_filter boot option.
I assume OS X behaves similarly.
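On Linux, the filter can be read and changed per process; a minimal sketch (1234 stands for the target PID, and the masks are the bits listed above):

cat /proc/1234/coredump_filter           # prints the current mask in hex, e.g. 00000033
echo 0x01 > /proc/1234/coredump_filter   # dump only anonymous private mappings

OS X has no /proc filesystem, so if it filters similarly, the mechanism is not exposed this way.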

Related

Difference between aliased (ALI) and shared (SHM) memory on macOS

I am using vmmap on macOS. For one region, it shows that the sharing mode is aliased (ALI):
REGION TYPE    START - END           [ VSIZE  RSDNT  DIRTY   SWAP] PRT/MAX SHRMOD  PURGE  REGION DETAIL
mapped file    1008dc000-1008e0000   [   16K    16K    16K     0K] rw-/rwx SM=ALI         /Users/USER/*/data
I wasn't able to find any information about what that means. This page states that:
Aliased (ALI) and shared (SHM) memory are shared between processes.
There is no further information about the difference between ALI and SHM. Can you help me understand what the difference is?
When the memory is shared (SHM), both processes can access it simultaneously.
However, when the memory is aliased (ALI), only one process at a time has the virtual address mapped to physical memory. When the second process tries to access the memory, these steps happen:
1. Process 2 gets a page fault.
2. The kernel unmaps the memory from Process 1.
3. The kernel maps the memory into Process 2.
4. Now Process 2 can read from and write to the memory.
This is different from how memory works on Linux, where there is no aliased (ALI) mode, only shared.
Source.
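To survey which sharing modes a process actually uses, one quick approach is to count the SM= values in vmmap's output (a sketch; 5228 stands for the target PID):

vmmap 5228 | grep -o 'SM=[A-Z]*' | sort | uniq -c

This prints one line per sharing mode (e.g. SM=ALI, SM=COW, SM=PRV, SM=SHM) together with the number of regions using it.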

Query failing with "ERROR: Canceling query because of high VMEM usage"

We have a small Greenplum (gpdb) cluster in which a few queries are failing.
System-related information:
TOTAL RAM = 30G
SWAP = 15G
gp_vmem_protect_limit = 2700MB
TOTAL segments = 8 primary + 8 mirror = 16
SEGMENT HOSTS = 2
VM OVERCOMMIT RATIO = 72
Used this calculator: http://greenplum.org/calc/#
SYMPTOM
The query failed with the error message shown below:
ERROR: XX000: Canceling query because of high VMEM usage. Used: 2433MB, available 266MB, red zone: 2430MB (runaway_cleaner.c:135) (seg2 slice74 DATANODE01:40002 pid=11294) (cdbdisp.c:1320)
We tried changing the following parameters:
statement_mem from 125 MB to 8 GB
max_statement_mem from 200 MB to 16 GB
Not sure what exactly needs to change here; we are still trying to understand the root cause of the error.
Any help would be much appreciated.
gp_vmem_protect_limit is per segment. You have 16 segments, so based on your segment count and gp_vmem_protect_limit, you need 2700 MB × 16 of total memory.
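For reference, the calculator linked in the question derives that per-segment figure roughly as follows (a sketch of the published gp_vmem formula; verify it against your Greenplum version's documentation):

gp_vmem = ((SWAP + RAM) - (7.5GB + 0.05 * RAM)) / 1.7
        = ((15 + 30) - (7.5 + 0.05 * 30)) / 1.7 ≈ 21.2 GB per host
gp_vmem_protect_limit = gp_vmem / maximum acting primaries per host
                      = 21.2 GB / 8 ≈ 2.65 GB ≈ 2700 MB

With two hosts holding 4 primaries and 4 mirrors each, a full failover can leave one host running 8 acting primaries, which is where the divisor of 8 comes from.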

What is the Faults column in 'top'?

I'm trying to download Xcode (on El Capitan) and it seems to be stuck. When I run 'top', I see a process called 'storedownloadd' whose "STATE" column alternates between sleeping, stuck, and running. The 'FAULTS' column shows a quickly increasing number with a plus sign after it; it is now over 400,000 and still climbing. Other than 'top', I see no sign of download activity. Does this indicate that something is amiss? Here's a screen shot:
Processes: 203 total, 2 running, 10 stuck, 191 sleeping, 795 threads 11:48:14
Load Avg: 4.72, 3.24, 1.69 CPU usage: 56.54% user, 6.41% sys, 37.3% idle SharedLibs: 139M resident, 19M data, 20M linkedit. MemRegions: 18620 total, 880M resident, 92M private, 255M shared. PhysMem: 7812M used (922M wired), 376M unused.
VM: 564G vsize, 528M framework vsize, 0(0) swapins, 512(0) swapouts. Networks: packets: 122536/172M in, 27316/2246K out. Disks: 78844/6532M read, 240500/6746M written.
PID COMMAND %CPU TIME #TH #WQ #PORT MEM PURG CMPRS PGRP PPID STATE BOOSTS %CPU_ME %CPU_OTHRS UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH
354 storedownloadd 0.3 00:47.58 16 5 200 255M 0B 0B 354 1 sleeping *3[1] 155.53838 0.00000 501 412506+ 54329 359852+ 6620+ 2400843+ 1186426+
57 UserEventAgent 0.0 00:00.35 22 17 378 4524K+ 0B 0B 57 1 sleeping *0[1] 0.23093 0.00000 0 7359+ 235 15403+ 7655+ 24224+ 17770
384 Terminal 3.3 00:12.02 10 4 213 34M+ 12K 0B 384 1 sleeping *0[42] 0.11292 0.04335 501 73189+ 482 31076+ 9091+ 1138809+ 72076+
When top reports FAULTS, it's referring to "page faults", which are more specifically:
The number of major page faults that have occurred for a task. A page fault occurs when a process attempts to read from or write to a virtual page that is not currently present in its address space. A major page fault is when disk access is involved in making that page available.
If an application tries to access an address on a memory page that is not currently in physical RAM, a page fault occurs. When that happens, the virtual memory system invokes a special page-fault handler to respond to the fault immediately. The page-fault handler stops the executing code, locates a free page of physical memory, loads the page containing the needed data from disk, updates the page table, and finally returns control to the program, which can then access the memory address normally. This process is known as paging.
Minor page faults can be common, depending on the code being executed and the current memory availability on the system. However, there are also different kinds of fault to be aware of (minor, major, invalid), which are described in more detail at the links below.
↳ Apple : About The Virtual Memory System
↳ Wikipedia : Page Fault
↳ Stackoverflow.com : page-fault
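To watch just that counter for a single process, something like the following should work (a sketch; 354 is the storedownloadd PID from the screenshot, and the exact -stats keys can vary by macOS version):

top -l 0 -pid 354 -stats pid,command,state,faults

Each sample prints the PID, command name, state, and cumulative fault count; a steadily climbing number while the state flips between running and sleeping generally just means the process is actively touching new pages.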

Why is the core file larger than virtual memory?

I have a multithreaded program running that crashes after a day or two. Moreover, the gdb backtrace of the core dump does not lead anywhere: there are no symbols at the point where it crashes.
Now, the machine that generates the core file has 3 GB of physical memory and 5 GB of swap space, yet the core dump we get is around 25 GB. Isn't the core dump actually a memory dump? Why is it so large?
And can anyone give me more leads on how to debug such a situation?
If you are running a 64-bit OS, then you can have file-backed mappings that exceed the amount of available physical memory plus swap space many times over.
Since kernel version 2.6.23, Linux provides a mechanism to control what gets included in the core dump file, called the core dump filter. The value of the filter is a bit field manipulated via the /proc/<pid>/coredump_filter file (see the core(5) man page):
bit 0 (0x01) - anonymous private mappings (e.g. dynamically allocated memory)
bit 1 (0x02) - anonymous shared mappings
bit 2 (0x04) - file-backed private mappings
bit 3 (0x08) - file-backed shared mappings (e.g. shared libraries)
bit 4 (0x10) - ELF headers
bit 5 (0x20) - private huge pages
bit 6 (0x40) - shared huge pages
The default value is 0x33, which corresponds to dumping all anonymous mappings as well as the ELF headers (but only if the kernel is compiled with CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS) and the private huge pages. Reading from this file returns the hexadecimal value of the filter. Writing a new hexadecimal value to coredump_filter changes the filter for that particular process; e.g., to enable dumping of all possible mappings one would run:
echo 0x7f > /proc/<pid>/coredump_filter
(where <pid> is the PID of the process)
The value of the core dump filter is inherited by child processes created by fork().
Some Linux distributions might change the filter value for the init process early in the OS boot stage, e.g. to enable dumping the file-backed mappings. This would then affect any process started later.
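Because the filter is inherited, one common pattern is to set it on the shell itself via /proc/self and then launch the program (a sketch; ./myprogram is a placeholder):

echo 0x31 > /proc/self/coredump_filter   # default 0x33 minus bit 1 (anonymous shared)
ulimit -c unlimited                      # allow core files of any size
./myprogram &                            # the child inherits the 0x31 filter
cat /proc/$!/coredump_filter             # verify: prints 00000031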
A core dump contains more than just the state of the memory of the process. See the answer at https://stackoverflow.com/a/5321564/91757 for examples of other information included in the core dump (on Linux).
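On Linux you can list that extra, non-memory payload directly, because it is stored in the core file's ELF notes (a sketch; core stands for your core file name):

readelf -n core

Typical notes include NT_PRSTATUS (per-thread register state), NT_PRPSINFO (process info), NT_SIGINFO (the fatal signal), NT_AUXV, and NT_FILE (the table of file-backed mappings).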

How is the Page File available calculated in Windows Task Manager?

In the Vista Task Manager, I understand the available page file is listed like this:
Page File: inUse M / available M
In XP it's listed as the Commit Charge Limit.
I had thought that:
Available Virtual Memory = Physical Memory Total + Sum of Page Files
But on my machine I've got Physical Memory = 2038M, Page Files = 4096M, Page File Available = 6051M. There's 83M unaccounted for here. What's that used for? I thought it might have something to do with kernel memory, but the numbers don't seem to match up.
Info I've found so far:
See http://msdn.microsoft.com/en-us/library/aa965225(VS.85).aspx for more info.
Page file size can be found here: Computer Properties, Advanced, Performance Settings, Advanced.
I think you are correct in your guess that it has something to do with the kernel; kernel memory needs some physical backing as well.
However, I have to admit that when trying to verify this, the numbers still do not match well, and a significant amount of memory remains unaccounted for.
I have:
Available Virtual Memory = 4 033 552 KB
Physical Memory Total = 2 096 148 KB
Sum of Page Files = 2048 MB
Kernel Non-Paged Memory = 28 264 KB
Kernel Paged Memory = 63 668 KB
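Working the quoted figures through the formula (plain arithmetic on the numbers above, converted to KB):

Physical Memory Total + Sum of Page Files = 2,096,148 + 2,097,152 = 4,193,300 KB
Available Virtual Memory (reported)                               = 4,033,552 KB
Unaccounted                                                       =   159,748 KB (~156 MB)

So on this machine the commit limit falls short of physical memory plus page file by roughly 156 MB, consistent with the question's observation that some amount goes unaccounted for.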
