I profiled Internet Explorer 11 to find out why it consumes so much CPU time in kernel mode.
The hottest path (present in 85% of samples) was the following stack:
NtAllocateVirtualMemory
whNtAllocateVirtualMemory
Wow64SystemServiceEx
ServiceNoTurbo
RunCpuSimulation
Wow64LdrpInitialize
_LdrInitialize
LdrInitializeThunk
_NtAllocateVirtualMemory@24
RtlIpv6AddressToStringW (85% of samples!)
_RtlpLfhBucketIndexMap
RtlpLowFragHeapAllocFromContext
RtlpAllocateUserBlockFromHeap
@RtlpLowFragHeapAllocateFromZone@8
_RtlpLfhBucketIndexMap
RtlAllocateHeap
How can a call to RtlAllocateHeap result in a call to RtlIpv6AddressToStringW?
I am reading this article about SEH on Windows,
and here is the source code of myseh.cpp.
I debugged myseh.cpp with 2 breakpoints: one at printf("Hello from an exception handler\n"); on line 24, and one at DWORD handler = (DWORD)_except_handler; on line 36.
Then I ran it and it broke at line 36. I saw the stack trace as follows.
When I continued, an access violation occurred because of mov [eax], 1.
Then it broke at line 24. I saw the stack trace as follows.
It was the same thread, but the frame of main was gone, replaced by _except_handler. ESP also jumped from 0018f6c8 to 0018ef34, which is a big gap.
After the exception was handled:
I know that _except_handler must run in user mode rather than kernel mode.
After _except_handler returned, the thread switched to ring 0, the Windows kernel set EAX in the CONTEXT to &scratch, and then returned to ring 3, so the thread continued running.
I am curious about how Windows handles exceptions:
Why was the frame of main gone?
Why did ESP jump from 0018f6c8 to 0018ef34 (such a big step)? Do those ESP addresses belong to the same thread's stack? Did the kernel play some tricks with ESP in ring 3? If so, why did it choose 0018ef34 as the address of the handler callback's frame? Many thanks!
You are using the default debugger settings, which are not good enough to see all the details. They were chosen to help you focus on your own code and get the debug session started as quickly as possible.
The [External Code] block tells you that there are parts of the stack frame that do not belong to code that you have written. They don't, they belong to the operating system. Use Tools > Options > Debugging > General and untick the "Enable Just My Code" option.
The [Frames below might be incorrect...] warning tells you that the debugger doesn't have accurate PDBs to correctly walk the stack. Use Tools > Options > Debugging > Symbols and tick the "Microsoft Symbol Servers" option and choose a cache location. The debugger will now download the PDBs you need to debug through the operating system DLLs. Might take a while, it is only done once.
You can reason out the big ESP change: the CONTEXT structure is quite large and takes up space on the stack.
After these changes you ought to now see something resembling:
ConsoleApplication1942.exe!_except_handler(_EXCEPTION_RECORD * ExceptionRecord, void * EstablisherFrame, _CONTEXT * ContextRecord, void * DispatcherContext) Line 22 C++
ntdll.dll!ExecuteHandler2@20() Unknown
ntdll.dll!ExecuteHandler@20() Unknown
ntdll.dll!_KiUserExceptionDispatcher@8() Unknown
ConsoleApplication1942.exe!main() Line 46 C++
ConsoleApplication1942.exe!invoke_main() Line 64 C++
ConsoleApplication1942.exe!__scrt_common_main_seh() Line 255 C++
ConsoleApplication1942.exe!__scrt_common_main() Line 300 C++
ConsoleApplication1942.exe!mainCRTStartup() Line 17 C++
kernel32.dll!@BaseThreadInitThunk@12() Unknown
ntdll.dll!__RtlUserThreadStart() Unknown
ntdll.dll!__RtlUserThreadStart@8() Unknown
Recorded on Win10 version 1607 and VS2015 Update 2. This isn't the correct way to write SEH handlers, find a better example in this post.
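The point about the CONTEXT structure can be checked with quick arithmetic. The sketch below sums the field sizes of the 32-bit CONTEXT and EXCEPTION_RECORD layouts and compares them with the ESP gap from the question. What fills the remainder of the gap is an assumption here (most likely the frames and locals of the ntdll dispatch functions), not something measured.

```python
# x86 field sizes in bytes, following the 32-bit winnt.h layouts.
DWORD = 4

# CONTEXT (x86): ContextFlags, 6 debug registers, FLOATING_SAVE_AREA (112 bytes),
# 4 segment registers, 6 integer registers, 6 control registers,
# and 512 bytes of ExtendedRegisters.
CONTEXT_SIZE = DWORD + 6 * DWORD + 112 + 4 * DWORD + 6 * DWORD + 6 * DWORD + 512

# EXCEPTION_RECORD (x86): 5 DWORD-sized fields plus ExceptionInformation[15].
EXCEPTION_RECORD_SIZE = 5 * DWORD + 15 * DWORD

esp_in_main = 0x0018F6C8     # ESP at the breakpoint in main (from the question)
esp_in_handler = 0x0018EF34  # ESP at the breakpoint in _except_handler

gap = esp_in_main - esp_in_handler
accounted = CONTEXT_SIZE + EXCEPTION_RECORD_SIZE

print(f"ESP gap:          {gap} bytes (0x{gap:x})")
print(f"CONTEXT:          {CONTEXT_SIZE} bytes")
print(f"EXCEPTION_RECORD: {EXCEPTION_RECORD_SIZE} bytes")
# Assumption: the rest is used by the dispatcher's own frames and locals.
print(f"Remainder:        {gap - accounted} bytes")
```

So the two structures alone explain a large share of the gap, and both ESP values lie within the same thread's stack.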
I have a .NET application running in a production environment (Windows XP + .NET 3.5 SP1) with a stable handle count of around 2000. In some unknown situation, however, its handle count increases extremely fast and the application finally crashes (at over 10,000 handles, as monitored with PerfMon).
I took a memory dump during the period of increase (before the crash) and loaded it into WinDbg, where I can see the overall handle summary:
0:000> !handle 0 0
7229 Handles
Type Count
None 19
Event 504
Section 6108
File 262
Port 15
Directory 3
Mutant 56
WindowStation 2
Semaphore 70
Key 97
Token 2
Process 3
Thread 75
Desktop 1
IoCompletion 9
Timer 2
KeyedEvent 1
So, no surprise, the leaking type is Section. Digging further:
0:000> !handle 0 ff Section
Handle 00007114
Type Section
Attributes 0
GrantedAccess 0xf0007:
Delete,ReadControl,WriteDac,WriteOwner
Query,MapWrite,MapRead
HandleCount 2
PointerCount 4
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.AKCHAC.CGOOBGKD
No object specific information available
Handle 00007134
Type Section
Attributes 0
GrantedAccess 0xf0007:
Delete,ReadControl,WriteDac,WriteOwner
Query,MapWrite,MapRead
HandleCount 2
PointerCount 4
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.GKCHAC.KCLBDGKD
No object specific information available
...
...
...
...
6108 handles of type Section
I can see that the names under BaseNamedObjects all follow the pattern MSCTF.MarshalInterface.FileMap.IBC.***.*****.
This is where I got stuck: I could not go any further and link this information to my application.
Can anyone help?
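Checking the naming convention by eye across 6108 entries is tedious, so a small script can group the Name lines by prefix. This is a minimal sketch: the sample text holds only the two entries shown above, and the function name and the choice of keeping four dot-separated components are illustrative assumptions.

```python
import re
from collections import Counter

# Two entries copied from the !handle output above; a real run would feed the
# full debugger log here instead.
SAMPLE = r"""
Handle 00007114
Type Section
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.AKCHAC.CGOOBGKD
Handle 00007134
Type Section
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.GKCHAC.KCLBDGKD
"""

def name_prefixes(log, keep_parts=4):
    """Count object names by their first `keep_parts` dot-separated components."""
    counts = Counter()
    for line in log.splitlines():
        match = re.match(r"\s*Name\s+(\S+)", line)
        if not match:
            continue
        # Drop the object directory and keep only the leaf name.
        leaf = match.group(1).rsplit("\\", 1)[-1]
        prefix = ".".join(leaf.split(".")[:keep_parts])
        counts[prefix] += 1
    return counts

for prefix, count in name_prefixes(SAMPLE).most_common():
    print(f"{count:6d}  {prefix}")
```

If one prefix accounts for nearly all Section handles, that confirms the leak comes from a single component.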
[Edit0]
I tried several combinations of GFlags settings (+ust on the command line or via the UI) with no luck: dumps opened in WinDbg never show anything for !htrace. So I had to attach to the live process instead, which finally gave me the stack for one of the leaking handles above:
0:033> !htrace 1758
--------------------------------------
Handle = 0x00001758 - OPEN
Thread ID = 0x00000768, Process ID = 0x00001784
0x7c809543: KERNEL32!CreateFileMappingA+0x0000006e
0x74723917: MSCTF!CCicFileMappingStatic::Create+0x00000022
0x7473fc0f: MSCTF!CicCoMarshalInterface+0x000000f8
0x747408e9: MSCTF!CStub::stub_OutParam+0x00000110
0x74742b05: MSCTF!CStubIUnknown::stub_QueryInterface+0x0000009e
0x74743e75: MSCTF!CStubITfLangBarItem::Invoke+0x00000014
0x7473fdb9: MSCTF!HandleSendReceiveMsg+0x00000171
0x7474037f: MSCTF!CicMarshalWndProc+0x00000161
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Windows\system32\USER32.dll -
0x7e418734: USER32!GetDC+0x0000006d
0x7e418816: USER32!GetDC+0x0000014f
0x7e4189cd: USER32!GetWindowLongW+0x00000127
--------------------------------------
And then I got stuck again: the stack does not seem to contain any of our own code. What would you suggest as the next step?
WinDbg isn't the ideal tool for memory leaks, especially not without preparation in advance.
There's a GFlags option (+ust) which can be enabled for a process to record the stack trace for handle allocations. If you don't have this flag enabled, you'll probably not get more info out of your dump. If you have it, use !htrace to see the stack.
You can also try UMDH (user mode dump heap), which is a free tool. Or get something like Memory Validator, which certainly has better usability, so it might pay off in the long run.
I have been reading up on the call stack lately. However, all the examples and articles I have read are single-threaded. I am interested in what the call stack looks like in memory and how we can analyse it.
Sorry for including so many questions in one post, but it seems messy to create one post for each question when they are all related.
My questions here are for Windows x86.
The questions I am having difficulty with are:
1) Is there always one call stack for each thread in a process? I.e., threads do not share call stacks?
2) Is the size of each call stack fixed? Or can it be different for each thread?
3) Let's pretend that we are doing everything ourselves and write our program in assembly. Is the call stack magically given to us? Or do we have to implement it ourselves?
4) If we write our program in assembly, do we then reserve some memory and set ESP to the start address of that memory in order to set up the call stack?
-Michael
1) Each thread has its own stack - almost by definition.
2) Maximum stack size is a process limit, specified in the executable's (PE) header. Initial thread stack size is a thread creation parameter - see CreateThread() API.
3) The OS manages all memory. The stack for new threads is dynamically allocated by the kernel upon thread creation and the top of the stack filled in with a stack frame that, amongst other stuff, allows the thread to begin execution by popping the frame in a similar manner to an interrupt-return. Don't try to do this at home.
4) NO! Import and call the CreateThread() API.
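Point 1 can be illustrated without touching native code. The sketch below is a Python-level analogue, not the native x86 stack itself: two threads each record their own call stack, and neither sees the other's frames. The function and thread names are hypothetical.

```python
import threading
import traceback

stacks = {}  # thread name -> function names on that thread's call stack

def record():
    # extract_stack() walks only the frames of the calling thread.
    name = threading.current_thread().name
    stacks[name] = [frame.name for frame in traceback.extract_stack()]

def worker_a():
    record()

def helper_b():
    record()

def worker_b():
    helper_b()

threads = [
    threading.Thread(target=worker_a, name="A"),
    threading.Thread(target=worker_b, name="B"),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name in sorted(stacks):
    print(name, "->", " > ".join(stacks[name]))
```

Each printed chain ends in that thread's own functions only, which is the per-thread stack the answer describes.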
We have a slow memory leak in our application, and I have already gone through the following steps trying to analyze its cause:
Enabling the user mode stack trace database in GFlags.
In WinDbg, typing the following command: !heap -stat -h 1250000 (where 1250000 is the address of the heap that has the leak).
After comparing multiple dumps, I see that memory blocks of size 0xC are increasing over time and are probably the leaked memory.
Typing the following command: !heap -flt s c
gives the UserPtr of those allocations, and finally:
typing !heap -p -a address on some of those addresses always shows the following allocation call stack:
0:000> !heap -p -a 10576ef8
address 10576ef8 found in
_HEAP @ 1250000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
10576ed0 000a 0000 [03] 10576ef8 0000c - (busy)
mscoreei!CLRRuntimeInfoImpl::`vftable'
7c94b244 ntdll!RtlAllocateHeapSlowly+0x00000044
7c919c0c ntdll!RtlAllocateHeap+0x00000e64
603b14a4 mscoreei!UtilExecutionEngine::ClrHeapAlloc+0x00000014
603b14cb mscoreei!ClrHeapAlloc+0x00000023
603b14f7 mscoreei!ClrAllocInProcessHeapBootstrap+0x0000002e
603b1614 mscoreei!operator new[]+0x0000002b
603d402b +0x0000005f
603d5142 mscoreei!GetThunkUseState+0x00000025
603d6fe8 mscoreei!_CorDllMain+0x00000056
79015012 mscoree!ShellShim__CorDllMain+0x000000ad
7c90118a ntdll!LdrpCallInitRoutine+0x00000014
7c919a6d ntdll!LdrpInitializeThread+0x000000c0
7c9198e6 ntdll!_LdrpInitialize+0x00000219
7c90e457 ntdll!KiUserApcDispatcher+0x00000007
This looks like a thread initialization call stack, but I need to know more than this.
What next step would you recommend to pinpoint the exact cause of the leak?
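The dump-comparison step described above can be automated once the per-size block counts have been extracted from each dump. This is a minimal sketch: the counts below are made-up placeholders, not values from the real dumps, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

def growing_sizes(before: Dict[int, int], after: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return (block_size, extra_blocks) pairs, largest growth first."""
    growth = []
    for size, count in after.items():
        delta = count - before.get(size, 0)
        if delta > 0:
            growth.append((size, delta))
    return sorted(growth, key=lambda pair: pair[1], reverse=True)

# Hypothetical block counts per allocation size, as they might be copied from
# two !heap -stat runs taken some time apart.
dump_1 = {0xC: 1_000, 0x20: 450, 0x400: 12}
dump_2 = {0xC: 9_500, 0x20: 452, 0x400: 12}

for size, extra in growing_sizes(dump_1, dump_2):
    print(f"size 0x{size:x}: +{extra} blocks")
```

The size at the top of the list is the first candidate to pass to !heap -flt s.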
The stack recorded when using GFlags is captured without using .pdb files and is often not correct.
Since you have traced the leak down to a specific size on a given heap, you can try setting a live breakpoint on RtlAllocateHeap and inspecting the stack in WinDbg with proper symbols. I have used the following with some success. You must edit it to suit your heap and size.
$$ Display stack if heap handle eq 0x00310000 and size is 0x1303
$$ ====================================================================
bp ntdll!RtlAllocateHeap "j ((poi(@esp+4) = 0x00310000) & (poi(@esp+c) = 0x1303) )'k';'gc'"
Maybe you then get another stack and other ideas for the offender.
The first thing to note is that the allocation uses operator new[], so is there a corresponding delete[] call rather than a plain old delete?
If you suspect this code, I would put a test harness around it: for instance, put it in a loop and execute it 100 or 1000 times. Does it still leak, and does the leak grow proportionally?
You can also measure the memory increase using Process Explorer or programmatically using GetProcessInformation.
The other obvious thing is to see what happens when you comment out this function call: does the memory leak go away? You may need to do a binary chop of the code, if possible, halving (roughly) the suspect code each time by commenting parts out. However, changing the behaviour of the code may cause more problems, or dependent code paths may change in ways that cause memory leaks or strange behaviour of their own.
EDIT
Ignore the following seeing as you are working in a managed environment.
You may also consider using the STL or better yet boost reference counted pointers like shared_ptr or scoped_array for array structures to manage the lifetime of the objects.
I am trying to compile Ruby 1.9.1-p0 on HP-UX. After a small change to ext/pty.c it compiles successfully, albeit with a lot of warning messages (about 5K). When I run the self-tests using "make test" it crashes and core-dumps with the following error:
sendsig: useracc failed. 0x9fffffffbf7dae00 0x00000000005000
Pid 3044 was killed due to failure in writing the signal context - possible stack overflow.
Illegal instruction
From googling this problem, the "Illegal instruction" is just the signal that the system uses to kill the process, and is not related to the problem. It would seem that there is a problem with re-establishing the context when calling the signal handler. Bringing the core up in gdb doesn't show a particularly deep stack, so I don't think the "possible stack overflow" is right either.
The gdb stack backtrace output looks like this:
#0 0xc00000000033a990:0 in __ksleep+0x30 () from /usr/lib/hpux64/libc.so.1
#1 0xc0000000001280a0:0 in __mxn_sleep+0xae0 ()
from /usr/lib/hpux64/libpthread.so.1
#2 0xc0000000000c0f90:0 in <unknown_procedure> + 0xc50 ()
from /usr/lib/hpux64/libpthread.so.1
#3 0xc0000000000c1e30:0 in pthread_cond_timedwait+0x1d0 ()
from /usr/lib/hpux64/libpthread.so.1
Answering my own question:
The problem was that the stack being allocated was too small. So it really was a stack overflow. The sendsig() function was preparing a context structure to be copied from kernel space to user space. The useracc() function checks that there's enough space at the address specified to do so.
The Ruby 1.9.1-p0 code was using PTHREAD_STACK_MIN to allocate the stack for any threads created. According to HP-UX documentation, on Itanium this is 256KB, but when I checked the header files, it was only 4KB. The error message from useracc() indicated that it was trying to copy 20KB.
So if a thread received a signal, it wouldn't have enough space to receive the signal context on its stack.
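The numbers above are easy to check. The sketch below compares the 0x5000-byte context from the error message with the two PTHREAD_STACK_MIN values, then shows a Python analogue of asking for a larger thread stack. In C the corresponding POSIX call is pthread_attr_setstacksize; using 256 KB here is an assumption taken from the documented value, not the size Ruby necessarily ended up with.

```python
import threading

KB = 1024
signal_context = 0x5000          # size reported in the "useracc failed" message
header_stack_min = 4 * KB        # PTHREAD_STACK_MIN found in the header files
documented_stack_min = 256 * KB  # value given in the HP-UX documentation

def fits(stack_bytes, needed_bytes):
    """True if the stack could hold the context (ignoring frames already on it)."""
    return needed_bytes <= stack_bytes

print("context size:", signal_context // KB, "KB")
print("fits in 4 KB stack:  ", fits(header_stack_min, signal_context))
print("fits in 256 KB stack:", fits(documented_stack_min, signal_context))

# Analogue of the fix: request a larger stack for threads created from now on.
threading.stack_size(documented_stack_min)
result = []
worker = threading.Thread(target=lambda: result.append("ran"))
worker.start()
worker.join()
threading.stack_size(0)  # restore the platform default
```

The first check fails and the second passes, which matches the diagnosis that the 4 KB stack was simply too small for the signal context.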