Java 8: -XX:+UseCompressedOops with -XX:ObjectAlignmentInBytes=16

I'm running some simple code on JDK 8 and printing the output via JOL:
System.out.println(VMSupport.vmDetails());
Integer i = new Integer(23);
System.out.println(ClassLayout.parseInstance(i)
.toPrintable());
The first attempt runs it on a 64-bit JVM with both compressed oops and compressed class pointers disabled:
-XX:-UseCompressedOops -XX:-UseCompressedClassPointers
The output is pretty much as expected:
Running 64-bit HotSpot VM.
Objects are 8 bytes aligned.
java.lang.Integer object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 48 33 36 97 (01001000 00110011 00110110 10010111) (-1758055608)
12 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
16 4 int Integer.value 23
20 4 (loss due to the next object alignment)
Instance size: 24 bytes (reported by Instrumentation API)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
That makes sense: 8 bytes klass word + 8 bytes mark word + 4 bytes for the actual value + 4 bytes of padding (to align on 8 bytes) = 24 bytes.
The second attempt is to run it on a 64-bit JVM with both compressed oops and compressed class pointers enabled.
Again, the output is pretty much understandable:
Running 64-bit HotSpot VM.
Using compressed oop with 3-bit shift.
Using compressed klass with 3-bit shift.
Objects are 8 bytes aligned.
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) f9 33 01 f8 (11111001 00110011 00000001 11111000) (-134138887)
12 4 int Dummy.i 42
Instance size: 16 bytes (reported by Instrumentation API).
4 bytes compressed klass pointer (klass word) + 8 bytes mark word + 4 bytes for the value + no space loss = 16 bytes.
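The two size calculations above can be written down as a small sketch. The helper names (align_up, instance_size) are made up for illustration and are not part of JOL or HotSpot, and the model deliberately ignores field ordering and internal gaps; it only adds up the header and field bytes and pads to the object alignment.

```cpp
#include <cstdint>

// Size of the mark word on a 64-bit HotSpot VM, as shown in the JOL output above.
constexpr std::uint64_t kMarkWordBytes = 8;

// Round size up to the next multiple of alignment (alignment must be a power of two).
constexpr std::uint64_t align_up(std::uint64_t size, std::uint64_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: mark word + klass word + fields, padded to the object alignment.
constexpr std::uint64_t instance_size(std::uint64_t klass_word_bytes,
                                      std::uint64_t field_bytes,
                                      std::uint64_t alignment) {
    return align_up(kMarkWordBytes + klass_word_bytes + field_bytes, alignment);
}

// The two cases measured above, plus the 16-byte alignment case by the same arithmetic.
static_assert(instance_size(8, 4, 8) == 24, "uncompressed klass word, 8-byte alignment");
static_assert(instance_size(4, 4, 8) == 16, "compressed klass word, 8-byte alignment");
static_assert(instance_size(4, 4, 16) == 16, "compressed klass word, 16-byte alignment");
```

Under this model an object with a single int field would still occupy 16 bytes with -XX:ObjectAlignmentInBytes=16, since 8 + 4 + 4 is already a multiple of 16.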
The thing that does NOT make sense to me is this use-case:
-XX:+UseCompressedOops -XX:+UseCompressedClassPointers -XX:ObjectAlignmentInBytes=16
The output is this:
Running 64-bit HotSpot VM.
Using compressed oop with 4-bit shift.
Using compressed klass with 0x0000001000000000 base address and 0-bit shift.
I was really expecting both to be "4-bit shift". Why aren't they?
EDIT
The second example is run with :
-XX:+UseCompressedOops -XX:+UseCompressedClassPointers
And the third one with :
-XX:+UseCompressedOops -XX:+UseCompressedClassPointers -XX:ObjectAlignmentInBytes=16

Answers to these questions are mostly easy to figure out when looking into OpenJDK code.
For example, grep for "UseCompressedClassPointers", this will get you to arguments.cpp:
// Check the CompressedClassSpaceSize to make sure we use compressed klass ptrs.
if (UseCompressedClassPointers) {
if (CompressedClassSpaceSize > KlassEncodingMetaspaceMax) {
warning("CompressedClassSpaceSize is too large for UseCompressedClassPointers");
FLAG_SET_DEFAULT(UseCompressedClassPointers, false);
}
}
Okay, interesting, there is "CompressedClassSpaceSize"? Grep for its definition, it's in globals.hpp:
product(size_t, CompressedClassSpaceSize, 1*G, \
"Maximum size of class area in Metaspace when compressed " \
"class pointers are used") \
range(1*M, 3*G) \
Aha, so the class area is in Metaspace, and it takes somewhere between 1 MB and 3 GB of space. Let's grep for "CompressedClassSpaceSize" usages, because that will take us to the actual code that handles it, say in metaspace.cpp:
// For UseCompressedClassPointers the class space is reserved above
// the top of the Java heap. The argument passed in is at the base of
// the compressed space.
void Metaspace::initialize_class_space(ReservedSpace rs) {
So, classes referenced through compressed class pointers are allocated in a separate, smaller class space outside the Java heap, which does not require shifting: even 3 gigabytes is small enough to be addressed with only the lowest 32 bits.

I will try to expand a little on the answer provided by Alexey, as some things might not be obvious.
Following Alexey's suggestion, if we search the OpenJDK source code for where the compressed klass shift value is assigned, we find the following code in metaspace.cpp:
void Metaspace::set_narrow_klass_base_and_shift(address metaspace_base, address cds_base) {
// some code removed
if ((uint64_t)(higher_address - lower_base) <= UnscaledClassSpaceMax) {
Universe::set_narrow_klass_shift(0);
} else {
assert(!UseSharedSpaces, "Cannot shift with UseSharedSpaces");
Universe::set_narrow_klass_shift(LogKlassAlignmentInBytes);
}
As we can see, the class shift can be either 0 (basically no shifting) or 3 bits, because LogKlassAlignmentInBytes is a constant defined in globalDefinitions.hpp:
const int LogKlassAlignmentInBytes = 3;
So, the answer to your question:
I was really expecting both to be "4-bit shift". Why aren't they?
is that ObjectAlignmentInBytes has no effect on the alignment of classes in the metaspace, which is always 8 bytes.
Of course this conclusion does not answer the question:
"Why when using -XX:ObjectAlignmentInBytes=16 with -XX:+UseCompressedClassPointers the narrow klass shift becomes zero? Also, without shifting how can the JVM reference the class space with 32-bit references, if the heap is 4GBytes or more?"
We already know that the class space is allocated above the top of the Java heap and can be up to 3 GB in size. With that in mind, let's run a few tests. -XX:+UseCompressedOops and -XX:+UseCompressedClassPointers are enabled by default, so we can omit them for conciseness.
Test 1: Defaults - 8 Bytes aligned
$ java -XX:ObjectAlignmentInBytes=8 -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version
heap address: 0x00000006c0000000, size: 4096 MB, zero based Compressed Oops
Narrow klass base: 0x0000000000000000, Narrow klass shift: 3
Compressed class space size: 1073741824 Address: 0x00000007c0000000 Req Addr: 0x00000007c0000000
Notice that the heap starts at address 0x00000006c0000000 in the virtual space and has a size of 4GBytes. Let's jump by 4Gbytes from where the heap starts and we land just where class space begins.
0x00000006c0000000 + 0x0000000100000000 = 0x00000007c0000000
The class space size is 1Gbyte, so let's jump by another 1Gbyte:
0x00000007c0000000 + 0x0000000040000000 = 0x0000000800000000
and we land exactly on the 32 GB boundary. With a 3-bit class space shift the JVM is able to reference the entire class space, although it's at the limit (intentionally).
Test 2: 16 bytes aligned
$ java -XX:ObjectAlignmentInBytes=16 -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version
heap address: 0x0000000f00000000, size: 4096 MB, zero based Compressed Oops
Narrow klass base: 0x0000001000000000, Narrow klass shift: 0
Compressed class space size: 1073741824 Address: 0x0000001000000000 Req Addr: 0x0000001000000000
This time we can observe that the heap address is different, but let's try the same steps:
0x0000000f00000000 + 0x0000000100000000 = 0x0000001000000000
This time the heap ends right at the 64 GB boundary of the virtual address space, and the class space is allocated from that boundary upwards. Since the class space can use at most a 3-bit shift, how can the JVM reference a class space located above 64 GB? The key is:
Narrow klass base: 0x0000001000000000
The JVM still uses 32 bit compressed pointers for the class space, but when encoding and decoding these, it will always add 0x0000001000000000 base to the compressed reference instead of using shifting. Note, that this approach works as long as the referenced chunk of memory is lower than 4Gbytes (the limit for 32 bits references). Considering that the class space can have a maximum of 3Gbytes we are comfortably within the limits.
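To make the base-plus-shift idea concrete, here is a minimal sketch of how a 32-bit narrow klass value could be decoded and encoded. The struct and function names are hypothetical and are not HotSpot's; only the base and shift values are taken from the test output above.

```cpp
#include <cstdint>

// Hypothetical model of a narrow klass encoding (illustrative names, not HotSpot's).
struct NarrowKlassMode {
    std::uint64_t base;   // "Narrow klass base" from the VM output
    unsigned shift;       // "Narrow klass shift" from the VM output
};

// Full address = base + (narrow value << shift).
constexpr std::uint64_t decode(NarrowKlassMode m, std::uint32_t narrow) {
    return m.base + (static_cast<std::uint64_t>(narrow) << m.shift);
}

// Inverse of decode for addresses inside the encodable range.
constexpr std::uint32_t encode(NarrowKlassMode m, std::uint64_t addr) {
    return static_cast<std::uint32_t>((addr - m.base) >> m.shift);
}

// First address that can no longer be encoded in 32 bits.
constexpr std::uint64_t reach_end(NarrowKlassMode m) {
    return m.base + (std::uint64_t{1} << (32 + m.shift));
}

constexpr NarrowKlassMode kTest1{0x0000000000000000ULL, 3};  // Test 1: zero base, 3-bit shift
constexpr NarrowKlassMode kTest2{0x0000001000000000ULL, 0};  // Test 2: non-zero base, no shift

// Test 1: the 1 GB class space at 0x7c0000000 ends exactly at the 32 GB reach.
static_assert(reach_end(kTest1) == 0x7c0000000ULL + 0x40000000ULL, "at the limit");
// Test 2: the 1 GB class space at the base is well inside the 4 GB reach.
static_assert(kTest2.base + 0x40000000ULL <= reach_end(kTest2), "within 4 GB of the base");
static_assert(decode(kTest2, encode(kTest2, 0x1000000040ULL)) == 0x1000000040ULL, "round trip");
```

As an illustration only: if the klass word f9 33 01 f8 from the question's second run (0xf80133f9 as a little-endian 32-bit value) was encoded with a zero base and a 3-bit shift, decode(kTest1, 0xf80133f9) gives 0x7c0099fc8, which would fall inside a class space starting at 0x7c0000000.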
Test 3: 16 bytes aligned, pin heap base at 8g
$ java -XX:ObjectAlignmentInBytes=16 -XX:HeapBaseMinAddress=8g -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version
heap address: 0x0000000200000000, size: 4096 MB, zero based Compressed Oops
Narrow klass base: 0x0000000000000000, Narrow klass shift: 3
Compressed class space size: 1073741824 Address: 0x0000000300000000 Req Addr: 0x0000000300000000
In this test we are still keeping the -XX:ObjectAlignmentInBytes=16, but also asking the JVM to allocate the heap at the 8th GByte in the virtual address space using -XX:HeapBaseMinAddress=8g JVM argument. The class space will begin at 12th GByte in the virtual address space and 3 bits shifting is more than enough to reference it.
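The three results can be reproduced with a simplified model of the decision. This is only a sketch: the shift rule mirrors the metaspace.cpp snippet quoted earlier, but the rule for the base (zero base whenever the class space ends within the first 32 GB) is an assumption inferred from the outputs above, since that part of the function was elided. The 4 GB limit for unscaled encoding and all names below are likewise assumptions of the sketch, and CDS is ignored.

```cpp
#include <cstdint>

constexpr std::uint64_t kUnscaledMax = std::uint64_t{1} << 32;           // 4 GB, reachable with shift 0
constexpr unsigned kLogKlassAlignment = 3;                                // 8-byte klass alignment
constexpr std::uint64_t kScaledMax = kUnscaledMax << kLogKlassAlignment;  // 32 GB, reachable with shift 3

// Assumption: a zero base is used when the whole class space lies below 32 GB.
constexpr std::uint64_t choose_base(std::uint64_t start, std::uint64_t size) {
    return (start + size <= kScaledMax) ? std::uint64_t{0} : start;
}

// As in the quoted code: no shift if the encoded range fits in 4 GB, else 3 bits.
constexpr unsigned choose_shift(std::uint64_t start, std::uint64_t size) {
    return (start + size - choose_base(start, size) <= kUnscaledMax) ? 0u : kLogKlassAlignment;
}

constexpr std::uint64_t kOneGB = 0x40000000ULL;  // class space size in all three tests

// Test 1: class space at 0x7c0000000 -> base 0, shift 3.
static_assert(choose_base(0x7c0000000ULL, kOneGB) == 0 && choose_shift(0x7c0000000ULL, kOneGB) == 3, "test 1");
// Test 2: class space at 0x1000000000 -> base 0x1000000000, shift 0.
static_assert(choose_base(0x1000000000ULL, kOneGB) == 0x1000000000ULL && choose_shift(0x1000000000ULL, kOneGB) == 0, "test 2");
// Test 3: class space at 0x300000000 -> base 0, shift 3.
static_assert(choose_base(0x300000000ULL, kOneGB) == 0 && choose_shift(0x300000000ULL, kOneGB) == 3, "test 3");
```

In this model ObjectAlignmentInBytes never appears: it only influences where the heap, and therefore the class space, ends up in the address space.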
Hopefully, these tests and their results answer the question:
"Why when using -XX:ObjectAlignmentInBytes=16 with -XX:+UseCompressedClassPointers the narrow klass shift becomes zero? Also, without shifting how can the JVM reference the class space with 32-bit references, if the heap is 4GBytes or more?"


Windows 10 x64: Unable to get PXE on Windbg

Can't understand how Windows Memory Manager works.
I look at the attached user process (dbgview.exe).
It is a WOW64 process. At the specified address (0x76560000) there is the .text section of the kernel32.dll module (also WOW64).
Why are there no PTE or other page table entries in the process page tables pointing to that virtual address?
kd> db 76560000
00000000`76560000 8b ff 55 8b ec 51 56 57-33 f6 89 55 fc 56 68 80 ..U..QVW3..U.Vh.
<...>
kd> !pte 76560000
VA 0000000076560000
PXE at FFFFF6FB7DBED000 PPE at FFFFF6FB7DA00008 PDE at FFFFF6FB40001D90 PTE at FFFFF680003B2B00
Unable to get PXE FFFFF6FB7DBED000
kd> db FFFFF680003B2B00
fffff680`003b2b00 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????
<...>
I know that pages are allocated only after the first access (via a page fault) has occurred, but why is there no prototype PTE either?
First, translate an arbitrary virtual address to a physical one using !vtop, which shows the dirbase of the process as part of the translation, or use !process to find the dirbase of the process:
lkd> .process /p fffffa8046a2e5f0
Implicit process is now fffffa80`46a2e5f0
lkd> .context 77fa90000
lkd> !vtop 0 13fe60000
Amd64VtoP: Virt 00000001`3fe60000, pagedir 7`7fa90000
Amd64VtoP: PML4E 7`7fa90000
Amd64VtoP: PDPE 1`c2e83020
Amd64VtoP: PDE 7`84e04ff8
Amd64VtoP: PTE 4`be585300
Amd64VtoP: Mapped phys 6`3efae000
Virtual address 13fe60000 translates to physical address 63efae000.
Then find that physical frame in the PFN database (in this case the physical page for the PML4 (the cr3 page, a.k.a. dirbase) is 77fa90, with full physical address 77fa90000):
lkd> !pfn 77fa90
PFN 0077FA90 at address FFFFFA80167EFB00
flink FFFFFA8046A2E5F0 blink / share count 00000005 pteaddress FFFFF6FB7DBEDF68
reference count 0001 used entry count 0000 Cached color 0 Priority 0
restore pte 00000080 containing page 77FA90 Active M
Modified
The address FFFFF6FB7DBED000 is therefore the virtual address of the PML4 page and FFFFF6FB7DBEDF68 is the virtual address of the PML4E self reference entry (1ed*8 = f68).
FFFFF6FB7DBED000 = 1111111111111111111101101111101101111101101111101101000000000000
1111111111111111 111101101 111101101 111101101 111101101 000000000000
The PML4 can only be at a virtual address where the PML4E, PDPTE, PDE and PTE indexes are all the same, so there are nominally 2^9 different combinations, and Windows 7 always selects 0x1ed, i.e. 111101101. The reason is that the PML4 contains a PML4E that points to itself, i.e. to the physical frame of the PML4, so the translation has to keep indexing to that same entry at every level of the hierarchy.
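The way the self-reference index determines the PML4 virtual address can be sketched as follows. The function names are made up for illustration; the only inputs are the index 0x1ed and the x86-64 rule that bits 63..48 of an address repeat bit 47.

```cpp
#include <cstdint>

// Build a virtual address from four 9-bit table indexes and a 12-bit page offset,
// sign-extending bit 47 into bits 63..48 (canonical form).
constexpr std::uint64_t make_va(std::uint64_t pml4, std::uint64_t pdpt,
                                std::uint64_t pd, std::uint64_t pt,
                                std::uint64_t offset) {
    return ((pml4 & 0x100) ? 0xFFFF000000000000ULL : 0ULL)
         | (pml4 << 39) | (pdpt << 30) | (pd << 21) | (pt << 12) | offset;
}

// The PML4 page is reached by using the self-reference index at every level.
constexpr std::uint64_t pml4_page_va(std::uint64_t self_index) {
    return make_va(self_index, self_index, self_index, self_index, 0);
}

// The self-reference entry is entry number self_index inside that page (8 bytes each).
constexpr std::uint64_t self_ref_entry_va(std::uint64_t self_index) {
    return pml4_page_va(self_index) + self_index * 8;
}

// Values from the !pte and !pfn output above, for index 0x1ed.
static_assert(pml4_page_va(0x1ed) == 0xFFFFF6FB7DBED000ULL, "PXE page");
static_assert(self_ref_entry_va(0x1ed) == 0xFFFFF6FB7DBEDF68ULL, "self-reference PML4E");
```

Both results match the debugger output quoted above: the PXE page at FFFFF6FB7DBED000 and the pteaddress FFFFF6FB7DBEDF68 reported by !pfn.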
The PML4, being a page table page, must reside in the kernel, and kernel addresses are high-canonical, i.e. prefixed with 1111111111111111, and kernel addresses begin with 00001 through 11111 i.e. from 08 to ff
The range of possible addresses that a 64 bit OS that uses 8TiB for user address space can place it at is therefore 31*(2^4) = 496 different possible locations and not actually 2^9:
1111111111111111 000010000 000010000 000010000 000010000 000000000000
1111111111111111 111111111 111111111 111111111 111111111 000000000000
I.e. the first is FFFF080402010000, the second is FFFF088442211000, the last is FFFFFFFFFFFFF000.
Note:
Up until Windows 10 TH2, the magic index for the Self-Reference PML4 entry was 0x1ed as mentioned above. But what about Windows 10 from 1607? Well Microsoft uped their game, as a constant battle for improving Windows security the index is randomized at boot-time, so 0x1ed is now one of the 512 [sic. (496)] possible values (i.e. 9-bit index) that the Self-Reference entry index can have. And side effect, it also broke some of their own tools, like the !pte2va WinDbg command.
0xFFFFF68000000000 is the address of the first PTE in the first page table page, so basically MmPteBase. However, because on Windows 10 1607 the PML4E index can be something other than 0x1ed, the base is not always 0xFFFFF68000000000, and the kernel uses a variable nt!MmPteBase to know instantly where the page table page allocations begin. Previously, this symbol did not exist in ntoskrnl.exe, because the base was hardcoded to 0xFFFFF68000000000. The addresses of the first and last page table pages are going to be:
                 first   last
* pml4e_offset : 0x1ed   0x1ed
* pdpe_offset  : 0x000   0x1ff
* pde_offset   : 0x000   0x1ff
* pte_offset   : 0x000   0x1ff
* offset       : 0x000   0x000
This gives 0xFFFFF68000000000 for the first and 0xFFFFF6FFFFFFF000 for the last page table page when the PML4E index is 0x1ed. PDEs + PDPTEs + PML4Es + PTEs are assigned in this range.
Therefore, to translate a virtual address to its PTE virtual address (!pte2va is the reverse of this), you prefix the virtual address with 111101101, truncate the last 12 bits (the page offset, which is no longer useful), and then multiply the result by 8 bytes (the PTE size), i.e. append 3 zero bits; this creates a new page offset equal to the last-level index times the size of a PTE structure. Prefixing the PML4E index simply causes the translation to loop back one time, so that you actually get the PTE rather than what the PTE points to. Prefixing it is the same thing as adding the shifted address to MmPteBase.
Here is simple C++ code to do it:
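// pte.cpp
#include <cstdint>
#include <iostream>
#include <string>
int main(int argc, char *argv[]) {
    if (argc < 2) {
        std::cerr << "usage: pte <hex virtual address>\n";
        return 1;
    }
    // Parse the virtual address as hexadecimal.
    std::uint64_t input = std::stoull(argv[1], nullptr, 16);
    // PTE base for the fixed self-reference index 0x1ed.
    const std::uint64_t ptebase = 0xFFFFF68000000000ULL;
    // Keep the 36 index bits (47..12), then scale by the 8-byte PTE size.
    std::uint64_t pteaddress = ptebase + (((input >> 12) & 0xFFFFFFFFFULL) << 3);
    std::cout << "0x" << std::hex << pteaddress << '\n';
}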
C:\> pte 13fe60000
0xfffff680009ff300
To get the PDE virtual address you prefix the index twice, truncate the last 21 bits, and then multiply by 8. This is how !pte is supposed to work, and it is the opposite of !pte2va.
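The same recipe extends to every level. The following sketch assumes the fixed self-reference index 0x1ed (so it does not apply as-is to Windows 10 1607 and later, where the index is randomized); the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Assumption: fixed self-reference index, as used before Windows 10 1607.
constexpr std::uint64_t kSelfRefIndex = 0x1ed;

// Each base prefixes the index one more time.
constexpr std::uint64_t kPteBase = 0xFFFF000000000000ULL | (kSelfRefIndex << 39);
constexpr std::uint64_t kPdeBase = kPteBase | (kSelfRefIndex << 30);
constexpr std::uint64_t kPpeBase = kPdeBase | (kSelfRefIndex << 21);
constexpr std::uint64_t kPxeBase = kPpeBase | (kSelfRefIndex << 12);

// Drop `shift` low bits of the address, keep `bits` index bits,
// then scale by the 8-byte entry size.
constexpr std::uint64_t entry_va(std::uint64_t base, std::uint64_t va,
                                 unsigned shift, unsigned bits) {
    return base + (((va >> shift) & ((std::uint64_t{1} << bits) - 1)) << 3);
}

constexpr std::uint64_t pte_va(std::uint64_t va) { return entry_va(kPteBase, va, 12, 36); }
constexpr std::uint64_t pde_va(std::uint64_t va) { return entry_va(kPdeBase, va, 21, 27); }
constexpr std::uint64_t ppe_va(std::uint64_t va) { return entry_va(kPpeBase, va, 30, 18); }
constexpr std::uint64_t pxe_va(std::uint64_t va) { return entry_va(kPxeBase, va, 39, 9); }

// The four addresses printed by "!pte 76560000" in the question.
static_assert(pte_va(0x76560000ULL) == 0xFFFFF680003B2B00ULL, "PTE");
static_assert(pde_va(0x76560000ULL) == 0xFFFFF6FB40001D90ULL, "PDE");
static_assert(ppe_va(0x76560000ULL) == 0xFFFFF6FB7DA00008ULL, "PPE");
static_assert(pxe_va(0x76560000ULL) == 0xFFFFF6FB7DBED000ULL, "PXE");
```

For the address used with pte.cpp, pte_va(0x13fe60000) gives 0xfffff680009ff300, the same value the program prints.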
Similarly, PDEs + PDPTEs + PML4Es are assigned in the range:
                 first   last
* pml4e_offset : 0x1ed   0x1ed
* pdpe_offset  : 0x1ed   0x1ed
* pde_offset   : 0x000   0x1ff
* pte_offset   : 0x000   0x1ff
* offset       : 0x000   0x000
Because when you get to 0x1ed for the PDPTE offset within the page table page range, you loop back through the PML4 once more, so you get the PDE.
If the debugger says there is no PTE for an address within a virtual page whose physical frame VMMap shows as part of the working set, then you might be experiencing my issue: when doing live kernel debugging (local or remote), you need to use .process /P to explicitly tell the debugger to translate user and kernel addresses in the context of the process and not in the context of the debugger.
I have found that since the Windows 10 Anniversary Update (1607, 10.0.14393) the location of the PML4 table has been randomized to mitigate kernel heap spraying.
This means that the page tables are probably not placed at 0xFFFFF68000000000.

how is thread local storage via gcc __thread keyword implemented in x86_64?

I'm digging around in libc and found an interesting asm sequence that I'm trying to understand. glibc-2.27/malloc/malloc.c has:
static __thread tcache_perthread_struct *tcache = NULL;
...
# define MAYBE_INIT_TCACHE() \
if (__glibc_unlikely (tcache == NULL)) \
....
void *
__libc_malloc (size_t bytes) {
...
MAYBE_INIT_TCACHE()
gcc translates it to:
96a97: 48 8b 2d da 42 35 00 mov 0x3542da(%rip),%rbp # 3ead78 <.got+0x18>
...
96aa6: 64 48 8b 4d 00 mov %fs:0x0(%rbp),%rcx
At runtime, mov 0x3542da(%rip),%rbp yields a negative value, i.e.:
(gdb) p $rbp
$1 = (void *) 0xfffffffffffffec0
The %fs segment base is set up in __libc_setup_tls via the arch_prctl syscall (as I learned in another thread), and there seems to be a loop over the program headers of type PT_TLS that probably determines the aggregate size of the TLS variables marked with gcc's __thread keyword. The __thread variables seem to be accessed below the struct pthread TCB using negative offsets.
The negative offsets of the TLS variables seem to be stored in the global offset table, in the above example:
0x3542da(%rip) ... # 3ead78 <.got+0x18>
Question:
Is there a description of which components (libc, ld, gcc) are involved in calculating the GOT TLS offsets, and how it is done in detail? I guess there may be a pre-calculated layout, but how are libraries handled that are loaded via libdl? etc...

Mistake in Virtual Hard Disk Image Format Specification?

I want to calculate the end offset of a parent locator in a VHD. Here is a part of the VHD header:
Cookie: cxsparse
Data offset: 0xffffffffffffffff
Table offset: 0x2000
Header version: 0x00010000
Max table entries: 10240
Block size: 0x200000
Checksum: 4294956454
Parent Unique Id: 0x9678bf077e719640b55e40826ce5d178
Parent time stamp: 525527478
Reserved: 0
Parent Unicode name:
Parent locator 1:
- platform code: 0x57326b75
- platform_data_space: 4096
- platform_data_length: 86
- reserved: 0
- platform_data_offset: 0x1000
Parent locator 2:
- platform code: 0x57327275
- platform_data_space: 65536
- platform_data_length: 34
- reserved: 0
- platform_data_offset: 0xc000
Some definitions from the Virtual Hard Disk Image Format Specification:
"Table Offset: This field stores the absolute byte offset of the Block Allocation Table (BAT) in the file.
Platform Data Space: This field stores the number of 512-byte sectors needed to store the parent hard disk locator.
Platform Data Offset: This field stores the absolute file offset in bytes where the platform specific file locator data is stored.
Platform Data Length. This field stores the actual length of the parent hard disk locator in bytes."
Based on this the end offset of the two parent locators should be:
data offset + 512 * data space:
0x1000 + 512 * 4096 = 0x201000
0xc000 + 512 * 65536 = 0x200c000
But if one uses only data offset + data space:
0x1000 + 4096 = 0x2000 //end of parent locator 1, begin of BAT
0xc000 + 65536 = 0x1c000
This latter calculation makes much more sense: the end of the first parent locator is the beginning of the BAT (see header data above); and since the first BAT entry is 0xe7 (sector offset), this corresponds to file offset 0x1ce00 (sector offset * 512), which is OK, if the second parent locator ends at 0x1c000.
But if one uses the formula data offset + 512 * data space, the parent locator area ends up overlapping other data. (In this example there would be no data corruption, though, since Platform Data Length is very small.)
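The two interpretations can be compared side by side with a few lines of arithmetic. This is only a sketch using the header values listed above; the function names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t kSectorSize = 512;

// End offset if Platform Data Space counts 512-byte sectors (as the specification states).
constexpr std::uint64_t end_if_sectors(std::uint64_t data_offset, std::uint64_t data_space) {
    return data_offset + kSectorSize * data_space;
}

// End offset if Platform Data Space counts bytes.
constexpr std::uint64_t end_if_bytes(std::uint64_t data_offset, std::uint64_t data_space) {
    return data_offset + data_space;
}

// File offset of a BAT entry given as a sector offset.
constexpr std::uint64_t sector_to_file_offset(std::uint64_t sector) {
    return sector * kSectorSize;
}

constexpr std::uint64_t kTableOffset = 0x2000;  // "Table offset" from the header above

// Read as sectors, locator 1 would run far past the start of the BAT.
static_assert(end_if_sectors(0x1000, 4096) == 0x201000, "locator 1, sectors");
static_assert(end_if_sectors(0xc000, 65536) == 0x200c000, "locator 2, sectors");
// Read as bytes, locator 1 ends exactly where the BAT begins ...
static_assert(end_if_bytes(0x1000, 4096) == kTableOffset, "locator 1, bytes");
// ... and locator 2 ends before the first data block (BAT entry 0xe7).
static_assert(end_if_bytes(0xc000, 65536) == 0x1c000, "locator 2, bytes");
static_assert(end_if_bytes(0xc000, 65536) <= sector_to_file_offset(0xe7), "no overlap");
```

Only the byte interpretation yields a layout without overlaps for this file.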
So is this a mistake in the specification, and the sentence
"Platform Data Space: This field stores the number of 512-byte sectors needed to store the parent hard disk locator."
should be
"Platform Data Space: This field stores the number of bytes needed to store the parent hard disk locator."?
Apparently Microsoft does not care about correcting this mistake, which was already discovered by the VirtualBox developers. VHD.cpp contains the following comment:
/*
* The VHD spec states that the DataSpace field holds the number of sectors
* required to store the parent locator path.
* As it turned out VPC and Hyper-V store the amount of bytes reserved for the
* path and not the number of sectors.
*/

Win dbg Dump OOM exception in IIS

Occasionally, we get an OutOfMemoryException in one of our IIS processes. I tried to analyze the dump but wasn't able to reach any concrete conclusions. I also looked into MS hotfixes and found similar problems and resolutions, but I'm not sure whether they are related: link
Below is the output of the !analyze -v command in WinDbg:
!analyze -v
[...]
CoInitialize failed 80010106
CoInitialize failed 80010106
CoInitialize failed 80010106
GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/StageOne/w3wp_exe/7_5_7601_17514/4ce7a5f8/unknown/0_0_0_0/bbbbbbb4/80000007/00000000.htm?Retriage=1
FAULTING_IP:
+75d2faf02afdbf0
00000000 ?? ???
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00000000
ExceptionCode: 80000007 (Wake debugger)
ExceptionFlags: 00000000
NumberParameters: 0
BUGCHECK_STR: 80000007
PROCESS_NAME: w3wp.exe
ERROR_CODE: (NTSTATUS) 0x80000007 - {Kernel Debugger Awakened} the system debugger was awakened by an interrupt.
EXCEPTION_CODE: (HRESULT) 0x80000007 (2147483655) - Operation aborted
MOD_LIST: *** ERROR: Could not build analysis XML
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0x2364 (0)
Current frame:
ChildEBP RetAddr Caller, Callee
DERIVED_WAIT_CHAIN:
Dl Eid Cid WaitType
-- --- ------- --------------------------
0 370.2364 Event
WAIT_CHAIN_COMMAND: ~0s;k;;
BLOCKING_THREAD: 00002364
DEFAULT_BUCKET_ID: APPLICATION_HANG_BlockedOn_EventHandle
PRIMARY_PROBLEM_CLASS: APPLICATION_HANG_BlockedOn_EventHandle
LAST_CONTROL_TRANSFER: from 758e149d to 778df8c1
FAULTING_THREAD: 00000000
STACK_TEXT:
002efb8c 758e149d 000001d4 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
002efbf8 75c71194 000001d4 ffffffff 00000000 KERNELBASE!WaitForSingleObjectEx+0x98
002efc10 75c71148 000001d4 ffffffff 00000000 kernel32!WaitForSingleObjectExImplementation+0x75
002efc24 7470765a 000001d4 ffffffff 747057c1 kernel32!WaitForSingleObject+0x12
002efc30 747057c1 00000000 74706f84 00a21320 w3wphost!WP_IPM::WaitForShutdown+0xb
002efc38 74706f84 00a21320 00a215d0 002efd58 w3wphost!W3WP_HOST::WaitForShutdown+0x11
002efc48 00a22bdb 002efc68 00a25708 00000001 w3wphost!AppHostInitialize+0x11e
002efd58 00a23584 0000000f 00702828 00703b48 w3wp!wmain+0x373
002efd9c 75c733aa fffde000 002efde8 778f9ed2 w3wp!_initterm_e+0x163
002efda8 778f9ed2 fffde000 71b16c75 00000000 kernel32!BaseThreadInitThunk+0xe
002efde8 778f9ea5 00a236b5 fffde000 ffffffff ntdll!__RtlUserThreadStart+0x70
002efe00 00000000 00a236b5 fffde000 00000000 ntdll!_RtlUserThreadStart+0x1b
FOLLOWUP_IP:
w3wphost!WP_IPM::WaitForShutdown+b
7470765a f60520d0707403 test byte ptr [w3wphost!g_dwDebugFlags (7470d020)],3
SYMBOL_STACK_INDEX: 4
SYMBOL_NAME: w3wphost!WP_IPM::WaitForShutdown+b
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: w3wphost
IMAGE_NAME: w3wphost.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 4ce7a5d0
STACK_COMMAND: ~0s ; kb
BUCKET_ID: 80000007_w3wphost!WP_IPM::WaitForShutdown+b
FAILURE_BUCKET_ID: APPLICATION_HANG_BlockedOn_EventHandle_80000007_w3wphost.dll!WP_IPM::WaitForShutdown
WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/w3wp_exe/7_5_7601_17514/4ce7a5f8/unknown/0_0_0_0/bbbbbbb4/80000007/00000000.htm?Retriage=1
Followup: MachineOwner
Additional information as requested from comments:
0:000> !AnalyzeOOM
---------Heap 11---------
Managed OOM occured after GC #15967 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 20---------
Managed OOM occured after GC #15977 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 21---------
Managed OOM occured after GC #15979 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 22---------
Managed OOM occured after GC #15529 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 23---------
Managed OOM occured after GC #15975 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 25---------
Managed OOM occured after GC #15985 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 27---------
Managed OOM occured after GC #40008 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
---------Heap 30---------
Managed OOM occured after GC #40006 (Requested to allocate 0 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
0:000> !vmstat
TYPE MINIMUM MAXIMUM AVERAGE BLK COUNT TOTAL
~~~~ ~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~~~~~~~ ~~~~~
Free:
Small 4K 64K 57K 4,651 266,932K
Medium 68K 1,024K 288K 97 27,967K
Large 1,088K 6,080K 2,305K 27 62,247K
Summary 4K 6,080K 74K 4,775 357,150K
Reserve:
Small 4K 64K 12K 926 11,567K
Medium 68K 1,020K 277K 390 108,263K
Large 1,148K 16,376K 12,201K 190 2,318,211K
Summary 4K 16,376K 1,618K 1,506 2,438,043K
Commit:
Small 4K 64K 10K 8,169 85,567K
Medium 68K 1,024K 322K 552 178,023K
Large 1,028K 23,300K 5,137K 221 1,135,447K
Summary 4K 23,300K 156K 8,942 1,399,038K
Private:
Small 4K 64K 11K 5,939 65,578K
Medium 68K 1,024K 311K 472 146,891K
Large 1,028K 23,300K 9,725K 316 3,073,339K
Summary 4K 23,300K 488K 6,727 3,285,811K
Mapped:
Small 4K 64K 11K 85 979K
Medium 68K 1,004K 366K 12 4,399K
Large 1,520K 2,888K 2,206K 4 8,824K
Summary 4K 2,888K 140K 101 14,203K
Image:
Small 4K 64K 9K 3,071 30,575K
Medium 68K 1,024K 294K 458 134,995K
Large 1,032K 15,480K 4,082K 91 371,495K
Summary 4K 15,480K 148K 3,620 537,064K
#############################
0:000> !eeheap -gc
Number of GC Heaps: 32
------------------------------
Heap 0 (1a616d08)
generation 0 starts at 0xa062179c
generation 1 starts at 0xa0621000
generation 2 starts at 0x1ab91000
ephemeral segment allocation context: none
segment begin allocated size
1ab90000 1ab91000 1adce1c8 0x23d1c8(2347464)
a0620000 a0621000 a0867db8 0x246db8(2387384)
Large object heap starts at 0x3ab91000
segment begin allocated size
3ab90000 3ab91000 3b343490 0x7b2490(8070288)
Heap Size: Size: 0xc36410 (12805136) bytes.
------------------------------
Heap 1 (1a619970)
generation 0 starts at 0xa965da00
generation 1 starts at 0xa9621000
generation 2 starts at 0x1bb91000
ephemeral segment allocation context: none
segment begin allocated size
1bb90000 1bb91000 1be9bbd0 0x30abd0(3189712)
a9620000 a9621000 a982dd14 0x20cd14(2149652)
Large object heap starts at 0x3b391000
segment begin allocated size
3b390000 3b391000 3bae09f0 0x74f9f0(7666160)
Heap Size: Size: 0xc672d4 (13005524) bytes.
------------------------------
Heap 2 (1a6215d8)
generation 0 starts at 0xa762370c
generation 1 starts at 0xa7621000
generation 2 starts at 0x1cb91000
ephemeral segment allocation context: none
segment begin allocated size
1cb90000 1cb91000 1d0a4604 0x513604(5322244)
a7620000 a7621000 a78a3a20 0x282a20(2632224)
Large object heap starts at 0x3bb91000
segment begin allocated size
3bb90000 3bb91000 3c384cf8 0x7f3cf8(8338680)
736b0000 736b1000 73769790 0xb8790(755600)
Heap Size: Size: 0x10424ac (17048748) bytes.
------------------------------
Heap 3 (1a624240)
generation 0 starts at 0xb56226d0
generation 1 starts at 0xb5621000
generation 2 starts at 0x1db91000
ephemeral segment allocation context: none
segment begin allocated size
1db90000 1db91000 1debd778 0x32c778(3327864)
b5620000 b5621000 b56346dc 0x136dc(79580)
Large object heap starts at 0x3c391000
segment begin allocated size
3c390000 3c391000 3c88b720 0x4fa720(5220128)
Heap Size: Size: 0x83a574 (8627572) bytes.
------------------------------
Heap 4 (1a626ea8)
generation 0 starts at 0x9762eb1c
generation 1 starts at 0x97621000
generation 2 starts at 0x1eb91000
ephemeral segment allocation context: none
segment begin allocated size
1eb90000 1eb91000 1ee6ae1c 0x2d9e1c(2989596)
97620000 97621000 97a87308 0x466308(4612872)
Large object heap starts at 0x3cb91000
segment begin allocated size
3cb90000 3cb91000 3d36c7b8 0x7db7b8(8239032)
f9e70000 f9e71000 f9e975a0 0x265a0(157088)
Heap Size: Size: 0xf41e7c (15998588) bytes.
------------------------------
Heap 5 (1a639b10)
generation 0 starts at 0x8f62107c
generation 1 starts at 0x8f621000
generation 2 starts at 0x1fb91000
ephemeral segment allocation context: none
segment begin allocated size
1fb90000 1fb91000 20b8500c 0xff400c(16728076)
8f620000 8f621000 8f777088 0x156088(1400968)
Large object heap starts at 0x3d391000
segment begin allocated size
3d390000 3d391000 3d903cb0 0x572cb0(5713072)
Heap Size: Size: 0x16bcd44 (23842116) bytes.
------------------------------
Heap 6 (1a63c778)
generation 0 starts at 0xba6611e8
generation 1 starts at 0xba621000
generation 2 starts at 0x20b91000
ephemeral segment allocation context: none
segment begin allocated size
20b90000 20b91000 20e66118 0x2d5118(2969880)
ba620000 ba621000 ba7051f4 0xe41f4(934388)
Large object heap starts at 0x3db91000
segment begin allocated size
3db90000 3db91000 3e348dd8 0x7b7dd8(8093144)
Heap Size: Size: 0xb710e4 (11997412) bytes.
------------------------------
Heap 7 (1a63f3e0)
generation 0 starts at 0xad621918
generation 1 starts at 0xad621000
generation 2 starts at 0x21b91000
ephemeral segment allocation context: none
segment begin allocated size
21b90000 21b91000 21fe7dd0 0x456dd0(4550096)
ad620000 ad621000 adad37e8 0x4b27e8(4925416)
Large object heap starts at 0x3e391000
segment begin allocated size
3e390000 3e391000 3eaea868 0x759868(7706728)
Heap Size: Size: 0x1062e20 (17182240) bytes.
------------------------------
Heap 8 (1a642048)
generation 0 starts at 0xf5e724e0
generation 1 starts at 0xf5e71000
generation 2 starts at 0x22b91000
ephemeral segment allocation context: none
segment begin allocated size
22b90000 22b91000 22ee2cc8 0x351cc8(3480776)
f5e70000 f5e71000 f5eb04ec 0x3f4ec(259308)
Large object heap starts at 0x3eb91000
segment begin allocated size
3eb90000 3eb91000 3f03b3c0 0x4aa3c0(4891584)
Heap Size: Size: 0x83b574 (8631668) bytes.
------------------------------
Heap 9 (1a648cb0)
generation 0 starts at 0x8d630bc4
generation 1 starts at 0x8d621000
generation 2 starts at 0x23b91000
ephemeral segment allocation context: none
segment begin allocated size
23b90000 23b91000 23e4d69c 0x2bc69c(2868892)
8d620000 8d621000 8daf7fb4 0x4d6fb4(5074868)
Large object heap starts at 0x3f391000
segment begin allocated size
3f390000 3f391000 3f991138 0x600138(6291768)
Heap Size: Size: 0xd93788 (14235528) bytes.
------------------------------
Heap 10 (1a64b918)
generation 0 starts at 0xa86261d0
generation 1 starts at 0xa8621000
generation 2 starts at 0x24b91000
ephemeral segment allocation context: none
segment begin allocated size
24b90000 24b91000 250b5b3c 0x524b3c(5393212)
a8620000 a8621000 a891ad34 0x2f9d34(3120436)
Large object heap starts at 0x3fb91000
segment begin allocated size
3fb90000 3fb91000 3ff89810 0x3f8810(4163600)
Heap Size: Size: 0xc17080 (12677248) bytes.
------------------------------
Heap 11 (1a64e580)
generation 0 starts at 0x916238ec
generation 1 starts at 0x91621000
generation 2 starts at 0x25b91000
ephemeral segment allocation context: none
segment begin allocated size
25b90000 25b91000 25ea5d64 0x314d64(3231076)
91620000 91621000 91930198 0x30f198(3207576)
Large object heap starts at 0x40391000
segment begin allocated size
40390000 40391000 40ac8f50 0x737f50(7569232)
Heap Size: Size: 0xd5be4c (14007884) bytes.
------------------------------
Heap 12 (1a65b850)
generation 0 starts at 0x7c52281c
generation 1 starts at 0x7c521000
generation 2 starts at 0x26b91000
ephemeral segment allocation context: none
segment begin allocated size
26b90000 26b91000 2702cad8 0x49bad8(4831960)
7c520000 7c521000 7c7b662c 0x29562c(2709036)
Large object heap starts at 0x40b91000
segment begin allocated size
40b90000 40b91000 41378c38 0x7e7c38(8289336)
e73d0000 e73d1000 e78cce00 0x4fbe00(5225984)
Heap Size: Size: 0x1414b3c (21056316) bytes.
------------------------------
Heap 13 (1a65ef20)
generation 0 starts at 0xf7e77370
generation 1 starts at 0xf7e71000
generation 2 starts at 0x27b91000
ephemeral segment allocation context: none
segment begin allocated size
27b90000 27b91000 27ee43d4 0x3533d4(3486676)
f7e70000 f7e71000 f828f6fc 0x41e6fc(4318972)
Large object heap starts at 0x41391000
segment begin allocated size
41390000 41391000 41b8edf0 0x7fddf0(8379888)
ebc80000 ebc81000 ec460740 0x7df740(8255296)
7e520000 7e521000 7e56dba8 0x4cba8(314280)
Heap Size: Size: 0x179bba8 (24755112) bytes.
------------------------------
Heap 14 (1a661458)
generation 0 starts at 0x9e65f268
generation 1 starts at 0x9e621000
generation 2 starts at 0x28b91000
ephemeral segment allocation context: none
segment begin allocated size
28b90000 28b91000 28f1aacc 0x389acc(3709644)
9e620000 9e621000 9e96f57c 0x34e57c(3466620)
Large object heap starts at 0x41b91000
segment begin allocated size
41b90000 41b91000 42268f58 0x6d7f58(7176024)
Heap Size: Size: 0xdaffa0 (14352288) bytes.
------------------------------
Heap 15 (1a663990)
generation 0 starts at 0x9faacc7c
generation 1 starts at 0x9faa8ac4
generation 2 starts at 0x29b91000
ephemeral segment allocation context: none
segment begin allocated size
29b90000 29b91000 29cde0e8 0x14d0e8(1364200)
9f620000 9f621000 9fd16c88 0x6f5c88(7298184)
Large object heap starts at 0x42391000
segment begin allocated size
42390000 42391000 42adf6a0 0x74e6a0(7661216)
Heap Size: Size: 0xf91410 (16323600) bytes.
------------------------------
Heap 16 (1a665ec8)
generation 0 starts at 0xc362a47c
generation 1 starts at 0xc3621000
generation 2 starts at 0x2ab91000
ephemeral segment allocation context: none
segment begin allocated size
2ab90000 2ab91000 2afbc464 0x42b464(4371556)
c3620000 c3621000 c3854488 0x233488(2307208)
Large object heap starts at 0x42b91000
segment begin allocated size
42b90000 42b91000 42f635f8 0x3d25f8(4007416)
Heap Size: Size: 0xa30ee4 (10686180) bytes.
------------------------------
Heap 17 (1a668418)
generation 0 starts at 0x94622638
generation 1 starts at 0x94621000
generation 2 starts at 0x2bb91000
ephemeral segment allocation context: none
segment begin allocated size
2bb90000 2bb91000 2bfd1374 0x440374(4457332)
94620000 94621000 948da24c 0x2b924c(2855500)
Large object heap starts at 0x43391000
segment begin allocated size
43390000 43391000 43b7a280 0x7e9280(8295040)
67350000 67351000 6739db20 0x4cb20(314144)
Heap Size: Size: 0xf2f360 (15922016) bytes.
------------------------------
Heap 18 (1a669d20)
generation 0 starts at 0x9a621f68
generation 1 starts at 0x9a621000
generation 2 starts at 0x2cb91000
ephemeral segment allocation context: none
segment begin allocated size
2cb90000 2cb91000 2ce5c30c 0x2cb30c(2929420)
9a620000 9a621000 9a6e597c 0xc497c(805244)
Large object heap starts at 0x43b91000
segment begin allocated size
43b90000 43b91000 43f1f520 0x38e520(3728672)
Heap Size: Size: 0x71e1a8 (7463336) bytes.
------------------------------
Heap 19 (1a66b628)
generation 0 starts at 0x83641300
generation 1 starts at 0x83621000
generation 2 starts at 0x2db91000
ephemeral segment allocation context: none
segment begin allocated size
2db90000 2db91000 2dfaecb8 0x41dcb8(4316344)
83620000 83621000 83855614 0x234614(2311700)
Large object heap starts at 0x44391000
segment begin allocated size
44390000 44391000 44a37488 0x6a6488(6972552)
Heap Size: Size: 0xcf8754 (13600596) bytes.
------------------------------
Heap 20 (1a66cf30)
generation 0 starts at 0x8b621738
generation 1 starts at 0x8b621000
generation 2 starts at 0x2eb91000
ephemeral segment allocation context: none
segment begin allocated size
2eb90000 2eb91000 2ef0c5e4 0x37b5e4(3651044)
8b620000 8b621000 8b94d484 0x32c484(3327108)
Large object heap starts at 0x44b91000
segment begin allocated size
44b90000 44b91000 450100c0 0x47f0c0(4714688)
Heap Size: Size: 0xb26b28 (11692840) bytes.
------------------------------
Heap 21 (1a66e838)
generation 0 starts at 0xf31d3830
generation 1 starts at 0xf31d1000
generation 2 starts at 0x2fb91000
ephemeral segment allocation context: none
segment begin allocated size
2fb90000 2fb91000 2fe8b854 0x2fa854(3123284)
f31d0000 f31d1000 f35a9948 0x3d8948(4032840)
Large object heap starts at 0x45391000
segment begin allocated size
45390000 45391000 458c3008 0x532008(5447688)
Heap Size: Size: 0xc051a4 (12603812) bytes.
------------------------------
Heap 22 (1a670140)
generation 0 starts at 0x9867de74
generation 1 starts at 0x98621000
generation 2 starts at 0x30b91000
ephemeral segment allocation context: none
segment begin allocated size
30b90000 30b91000 3102bbdc 0x49abdc(4828124)
98620000 98621000 988edc84 0x2ccc84(2935940)
Large object heap starts at 0x45b91000
segment begin allocated size
45b90000 45b91000 462adab8 0x71cab8(7457464)
Heap Size: Size: 0xe84318 (15221528) bytes.
------------------------------
Heap 23 (1a671a48)
generation 0 starts at 0xe8c810dc
generation 1 starts at 0xe8c81000
generation 2 starts at 0x31b91000
ephemeral segment allocation context: none
segment begin allocated size
31b90000 31b91000 31de8af0 0x257af0(2456304)
e8c80000 e8c81000 e8f756f8 0x2f46f8(3098360)
Large object heap starts at 0x46391000
segment begin allocated size
46390000 46391000 467d71b0 0x4461b0(4481456)
Heap Size: Size: 0x992398 (10036120) bytes.
------------------------------
Heap 24 (1a673350)
generation 0 starts at 0xa1621544
generation 1 starts at 0xa1621000
generation 2 starts at 0x32b91000
ephemeral segment allocation context: none
segment begin allocated size
32b90000 32b91000 32f74f04 0x3e3f04(4079364)
a1620000 a1621000 a1803858 0x1e2858(1976408)
Large object heap starts at 0x46b91000
segment begin allocated size
46b90000 46b91000 4737fc08 0x7eec08(8317960)
67b90000 67b91000 67d11100 0x180100(1573120)
Heap Size: Size: 0xf35464 (15946852) bytes.
------------------------------
Heap 25 (1a674c58)
generation 0 starts at 0x8c6222b8
generation 1 starts at 0x8c621000
generation 2 starts at 0x33b91000
ephemeral segment allocation context: none
segment begin allocated size
33b90000 33b91000 33edff20 0x34ef20(3469088)
8c620000 8c621000 8ca2c690 0x40b690(4241040)
Large object heap starts at 0x47391000
segment begin allocated size
47390000 47391000 47a011a0 0x6701a0(6750624)
Heap Size: Size: 0xdca750 (14460752) bytes.
------------------------------
Heap 26 (1a676560)
generation 0 starts at 0x9b62150c
generation 1 starts at 0x9b621000
generation 2 starts at 0x34b91000
ephemeral segment allocation context: none
segment begin allocated size
34b90000 34b91000 34fa6200 0x415200(4280832)
9b620000 9b621000 9b8b531c 0x29431c(2704156)
Large object heap starts at 0x47b91000
segment begin allocated size
47b90000 47b91000 48373ec0 0x7e2ec0(8269504)
7aa10000 7aa11000 7ab44168 0x133168(1257832)
Heap Size: Size: 0xfbf544 (16512324) bytes.
------------------------------
Heap 27 (1a677e68)
generation 0 starts at 0x92630b90
generation 1 starts at 0x92621000
generation 2 starts at 0x35b91000
ephemeral segment allocation context: none
segment begin allocated size
35b90000 35b91000 361323f0 0x5a13f0(5903344)
92620000 92621000 929fcd4c 0x3dbd4c(4046156)
Large object heap starts at 0x48391000
segment begin allocated size
48390000 48391000 48b76c48 0x7e5c48(8281160)
f0680000 f0681000 f06f4570 0x73570(472432)
Heap Size: Size: 0x11d62f4 (18703092) bytes.
------------------------------
Heap 28 (1a679770)
generation 0 starts at 0xe1c610dc
generation 1 starts at 0xe1c61000
generation 2 starts at 0x36b91000
ephemeral segment allocation context: none
segment begin allocated size
36b90000 36b91000 37076c64 0x4e5c64(5135460)
e1c60000 e1c61000 e1ed5044 0x274044(2572356)
Large object heap starts at 0x48b91000
segment begin allocated size
48b90000 48b91000 4937c3a8 0x7eb3a8(8303528)
f51d0000 f51d1000 f56afdf8 0x4dedf8(5107192)
Heap Size: Size: 0x1423e48 (21118536) bytes.
------------------------------
Heap 29 (1a67b078)
generation 0 starts at 0xa6621380
generation 1 starts at 0xa6621000
generation 2 starts at 0x37b91000
ephemeral segment allocation context: none
segment begin allocated size
37b90000 37b91000 37ecffc0 0x33efc0(3403712)
a6620000 a6621000 a6873190 0x252190(2433424)
Large object heap starts at 0x49391000
segment begin allocated size
49390000 49391000 49a365c8 0x6a55c8(6968776)
Heap Size: Size: 0xc36718 (12805912) bytes.
------------------------------
Heap 30 (1a67c980)
generation 0 starts at 0xb36238ac
generation 1 starts at 0xb3621000
generation 2 starts at 0x38b91000
ephemeral segment allocation context: none
segment begin allocated size
38b90000 38b91000 38eda4b8 0x3494b8(3445944)
b3620000 b3621000 b36978b8 0x768b8(485560)
Large object heap starts at 0x49b91000
segment begin allocated size
49b90000 49b91000 49ffd360 0x46c360(4637536)
Heap Size: Size: 0x82c0d0 (8569040) bytes.
------------------------------
Heap 31 (1a67e288)
generation 0 starts at 0x79a11784
generation 1 starts at 0x79a11000
generation 2 starts at 0x39b91000
ephemeral segment allocation context: none
segment begin allocated size
39b90000 39b91000 3a35caf0 0x7cbaf0(8174320)
79a10000 79a11000 79ec789c 0x4b689c(4941980)
Large object heap starts at 0x4a391000
segment begin allocated size
4a390000 4a391000 4a94e330 0x5bd330(6017840)
Heap Size: Size: 0x123f6bc (19134140) bytes.
------------------------------
GC Heap Size: Size: 0x1c1341b8 (471024056) bytes.
Based on the output from !vmstat, you are out of memory. There's some mild address space fragmentation, but you only have a total of ~350MB of free memory, so you're really running close to the address space limit. The largest free block is just 6MB, and the CLR allocates virtual memory segments that are at least 16MB in size.
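To illustrate why roughly 350MB of free address space can still fail an allocation, here is a minimal sketch. The 6MB largest block and the 16MB minimum segment size come from the paragraph above; the rest of the free list is a made-up example, and `can_reserve` is a hypothetical helper, not a CLR or debugger API.

```python
MB = 1024 * 1024
MIN_SEGMENT = 16 * MB  # minimum CLR segment size quoted in the answer


def can_reserve(free_blocks, size):
    """Return True if any single contiguous free block can hold `size` bytes."""
    return any(block >= size for block in free_blocks)


# Hypothetical free list: about 350 MB in total, but the largest
# contiguous block is only 6 MB, as reported in the answer.
free_blocks = [6 * MB] + [4 * MB] * 86

print("total free MB:", sum(free_blocks) // MB)
print("can reserve a 16 MB segment:", can_reserve(free_blocks, MIN_SEGMENT))
```

The point is that a reservation needs one contiguous block, so the total amount of free space does not help once every block is smaller than the requested segment.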
Your total GC heap size is just 470MB (see the last line from the !eeheap -gc output), which means you have other stuff in your process using up address space. Namely, you have >500MB of images (DLLs) and >3GB of memory classified as "Private". This can be a bunch of different things; for example, it can be unmanaged heap allocations.
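If you want to cross-check the per-heap figures against that last line, a rough sketch like the following can add them up from a saved copy of the !eeheap -gc output. It assumes the line format shown in the dump above; `sum_heap_sizes` is a hypothetical helper name, and the sample text only contains the summary lines for heaps 30 and 31.

```python
import re

# Matches per-heap summary lines such as
#   "Heap Size: Size: 0x82c0d0 (8569040) bytes."
# Anchoring at the start of the line skips the final "GC Heap Size:" total.
HEAP_LINE = re.compile(
    r"^Heap Size:\s+Size:\s+0x([0-9a-fA-F]+)\s+\((\d+)\)\s+bytes",
    re.MULTILINE,
)


def sum_heap_sizes(eeheap_text):
    """Sum the per-heap sizes (in bytes) found in !eeheap -gc output."""
    total = 0
    for hex_size, dec_size in HEAP_LINE.findall(eeheap_text):
        size = int(dec_size)
        # The hex and decimal values on a line should agree.
        if int(hex_size, 16) != size:
            raise ValueError("hex/decimal mismatch: 0x%s vs %s" % (hex_size, dec_size))
        total += size
    return total


sample = """\
Heap Size: Size: 0x82c0d0 (8569040) bytes.
------------------------------
Heap Size: Size: 0x123f6bc (19134140) bytes.
------------------------------
GC Heap Size: Size: 0x1c1341b8 (471024056) bytes.
"""

total = sum_heap_sizes(sample)
print(total, "bytes,", round(total / 2**20, 1), "MiB")
```

Run over the full output, the sum should be comparable to the GC Heap Size line, which makes it easier to see that the managed heap accounts for only a small part of the address space.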
You can try to zoom in further on the space hog by running !heap -s -h 0 to see if you have large unmanaged heaps in your process. Once you have a direction (is it an unmanaged heap leak? something else?), I suggest asking another question with your findings. From the information you have posted so far, we can conclude it's likely unrelated to what the managed part of your application is doing. Do you have large unmanaged components in your application? There are techniques for analyzing unmanaged memory leaks, such as UMDH or ETW heap allocation tracing.
One final comment: why are you running a 32-bit app on a system with 32 processors? Looks like a server system, and I bet you have more than 4GB of physical memory. If it's at all under your control, try making the move to 64-bit.

ARM linux kernel head-common.S

I was looking at __mmap_switched in head-common.S:
.long init_thread_union + THREAD_START_SP # sp //for stack pointer
THREAD_START_SP is defined as THREAD_SIZE (8192) - 8 in "thread_info.h".
So the initial stack pointer is set to the 8KB stack size (8192) minus 8 bytes.
Why minus 8 bytes?
I suspect it is related to DA (decrement after) addressing. Is that right?
8-byte alignment of the stack is a requirement of the APCS.
In the APCS, section 5.2.1, "The Stack", states:
The stack must also conform to the following constraint at a public interface:
SP mod 8 = 0. The stack must be double-word aligned.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.swdev.abi/index.html
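The alignment arithmetic can be checked with a small sketch. The THREAD_SIZE and THREAD_START_SP values are the ones quoted in the question; the base address is a made-up example, and the sketch assumes that init_thread_union is aligned to THREAD_SIZE.

```python
THREAD_SIZE = 8192                  # 8 KB kernel stack, as quoted in the question
THREAD_START_SP = THREAD_SIZE - 8   # definition quoted from thread_info.h


def initial_sp(thread_union_base):
    """Initial stack pointer for a thread union starting at the given base."""
    return thread_union_base + THREAD_START_SP


# Hypothetical address for init_thread_union, assumed THREAD_SIZE-aligned.
base = 0xC0600000
assert base % THREAD_SIZE == 0

sp = initial_sp(base)
print(hex(sp), "SP mod 8 =", sp % 8)
```

Subtracting 8 rather than, say, 4 keeps the starting SP a multiple of 8, which is what the constraint quoted above asks for.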

Resources