I am trying to use !htrace to detect some handle leaks (I enabled user-mode call stacks in gflags beforehand).
The problem is that, although it shows me the call stacks of handle allocations, their depth is limited to 14 frames. The WinDbg command ".kframes biggerLimit" does not help.
What do you mean by only 14 frames?
Do you perform enough operations after enabling !htrace for it to collect traces?
As far as I can tell, there is no 14-frame limit.
Just to confirm, I attached cdb to a running instance of Notepad++ and logged the traces:
cdb -pn notepad++
!htrace -enable
.logopen d:\htrace.txt
g
Opened and closed several tabs, About, plugins etc. to generate some handle activity.
Broke back in with Ctrl+C.
Ran !htrace, quit, and then searched htrace.txt with awk and grep.
I can see a lot of traces, some with more than 14 frames, in a log that grew to 1.61 MB after a few minutes:
:\>ls -lag *.txt
-rw-r--r-- 1 197121 1691957 Oct 19 23:01 htrace.txt
:\>awk "/Handle = /" htrace.txt | tail
Handle = 0x0000000000000484 - CLOSE
Handle = 0x0000000000000484 - OPEN
Handle = 0x0000000000000480 - CLOSE
Handle = 0x0000000000000480 - OPEN
Handle = 0x000000000000047c - CLOSE
Handle = 0x000000000000047c - OPEN
Handle = 0x0000000000000478 - CLOSE
Handle = 0x0000000000000478 - OPEN
Handle = 0x0000000000000474 - CLOSE
Handle = 0x0000000000000474 - OPEN
:\>grep -iE "parse|dump" htrace.txt
Parsed 0x56C stack traces.
Dumped 0x56C stack traces.
:\>awk "/Handle =/{print NR-1;NR=0}" htrace.txt | sort | uniq
13
14
15
16
17
18 <<<<<<<<<<<<<
3
:\>
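The awk one-liner above counts every line between two "Handle =" lines, so, going by the trace layout in this log, each printed value also includes the "Thread ID" line and the dashed separator (a printed 18 would then correspond to 16 frames). A minimal Python sketch that counts only the frame lines is shown here; the regular expressions and the function names are assumptions made for illustration, based solely on the output format visible in this thread.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the !htrace output shown in this thread.
HANDLE_RE = re.compile(r"^\s*Handle = 0x[0-9a-fA-F]+ - \w+")
FRAME_RE = re.compile(r"^\s*0x[0-9a-fA-F]+:\s+\S+")


def frames_per_trace(lines):
    """Return the number of stack-frame lines in each trace, in log order."""
    counts = []
    current = None  # None until the first "Handle = ..." line is seen
    for line in lines:
        if HANDLE_RE.match(line):
            if current is not None:
                counts.append(current)
            current = 0
        elif current is not None and FRAME_RE.match(line):
            current += 1
    if current is not None:
        counts.append(current)
    return counts


def depth_histogram(lines):
    """Map frame count -> number of traces with that many frames."""
    return Counter(frames_per_trace(lines))


# Shortened sample in the same format as the log above.
sample = """\
Handle = 0x00000000000004a0 - CLOSE
Thread ID = 0x0000000000002488, Process ID = 0x0000000000000644
0x00007ffa5769d084: ntdll!NtClose+0x0000000000000014
0x00007ffa54fe3c56: KERNELBASE!RegCloseKey+0x00000000000000b6
--------------------------------------
Handle = 0x00000000000004a0 - OPEN
""".splitlines()

print(frames_per_trace(sample))  # [2, 0]
```

To run it over a real log, pass an open file instead of the sample, for example `depth_histogram(open("htrace.txt"))`.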
One such trace, containing 15 frames, is shown below:
Handle = 0x00000000000004a0 - CLOSE
Thread ID = 0x0000000000002488, Process ID = 0x0000000000000644
0x00007ffa5769d084: ntdll!NtClose+0x0000000000000014
0x00007ffa54fe3c56: KERNELBASE!RegCloseKey+0x00000000000000b6
0x00007ffa566e48d3: shcore!CRegistrySource::Release+0x0000000000000043
0x00007ffa54962773: windows_storage!CProgidArray::EnumerateCapableFileHandlers+0x00000000000001d3
0x00007ffa54961ce6: windows_storage!CAssocProgidElement::_GetUserChoice+0x0000000000000082
0x00007ffa549629ac: windows_storage!CAssocProgidElement::_MapExtensionToUserDefault+0x0000000000000204
0x00007ffa54962206: windows_storage!CAssocProgidElement::_InitSource+0x0000000000000066
0x00007ffa5496d6ac: windows_storage!CAssocShellElement::SetKey+0x000000000000005c
0x00007ffa5495a957: windows_storage!AssocElemCreateForClass2+0x0000000000000083
0x00007ffa5495a6d1: windows_storage!CFileAssocList::CreateAssoc+0x00000000000000d1
0x00007ffa5493e17f: windows_storage!CAssocListBase::_GetOrCreate+0x000000000000006f
0x00007ffa5495e773: windows_storage!CAssocListBase::GetAssoc+0x00000000000000a7
0x00007ffa5495e657: windows_storage!CFileAssocList::_IsLink+0x0000000000000043
0x00007ffa5495e5bb: windows_storage!CFileAssocList::GetAssocTable+0x000000000000001b
0x00007ffa5495ebde: windows_storage!CAssocListBase::EnumerateElements+0x00000000000000de
--------------------------------------
Handle = 0x00000000000004a0 - OPEN
It appears to be a hardcoded maximum of 16 frames, according to a now-vanished post from the erstwhile WinDbg MSDN newsgroup.
Quoting from a copy of the quote:
"Dan Mihai [MSFT]" dmihai#xxxxxxxxxxxxxxxxxxxx wrote in message
news:%23NGSSnSnGHA.4604#xxxxxxxxxxxxxxxxxxxxxxx
Small correction: the !htrace stack traces are captured by the OS kernel
(not ntdll). The greatest thing about that is that the order of these
traces is "fully accurate". For example, if process A is closing a handle
inside process B (using DuplicateHandle), with handle tracing enabled for
B you will get a log entry for the cross-process CLOSE operation. If stack
tracing would have been implemented in user-mode (e.g. inside ntdll),
process B's ntdll would not get "notified" about the cross process CLOSE
and B's handle would go away without any trace in the !htrace log. That
would reduce the value of !htrace.
The maximum depth of the stack trace is currently hardcoded to 16
(although it's possible it will change in the future). Also, that includes
a few entries for the kernel-mode portion of the stack trace. Those stack
trace entries can be displayed by kernel or driver developers by using
!htrace in a kernel debugger. So getting around 11 user-mode entries for
each of your traces sounds accurate.
The kernel doesn't currently allow very deep stack traces because the
array of traces is stored in non-paged pool, a very expensive system
resource.
Dan
I have an embedded system running Linux 2.6.30 on a PowerPC CPU (Freescale MPC5125). After writing new code for this device, I suddenly observed user space hangs of about 5-6 minutes.
It turned out that the new code calls clock_gettime(CLOCK_MONOTONIC_RAW,...) and the system freezes after one of these clock_gettime() calls returns an implausible value.
Here is the recorded output of strace:
14:39:48.769746 clock_gettime(0x4 /* CLOCK_??? */, {14496, 285316209}) = 0
14:39:48.782047 clock_gettime(0x4 /* CLOCK_??? */, {14496, 285627946}) = 0
14:39:48.782354 select(14, [4 5 6 7 9 10 12 13], NULL, NULL, {19, 999689}) = 1 (in [13], left {19, 853317})
14:39:48.928554 read(13, "\0\0\0\257\0\0\0\1", 8) = 8
14:39:48.928917 clock_gettime(0x4 /* CLOCK_??? */, {1266889381, 847702609}) = 0
14:45:15.612681 time(NULL) = 1646750715
14:45:15.613026 ...
14:45:15.818364 clock_gettime(0x4 /* CLOCK_??? */, {14819, 27615047}) = 0
The system continues to respond to ICMP echo requests, accepts new TCP connections and even ACKs incoming data on these new TCP connections. But the entire user space seems to hang; there is not even an echo on the serial console.
After 5-6 minutes, the system recovers and continues to work: TCP sessions do not time out, all ssh sessions continue to work and input on the serial console is echoed again.
The problem seems to go away when I use CLOCK_MONOTONIC instead of CLOCK_MONOTONIC_RAW. I could just make this change to my software and treat the problem as a known kernel bug on my system that is easy to avoid, but other programs on that system also use CLOCK_MONOTONIC_RAW and I cannot change them. I should at least understand what is going wrong inside the kernel.
Unfortunately, the system does not have JTAG pads on its PCB, so I cannot debug the kernel while the user space is blocked.
So, here are my questions:
Has anybody ever observed problems like this?
What might be the cause of clock_gettime() calls blocking all user space processes?
What can I do to continue hunting this problem in the kernel?
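One way to catch the bad reading as it happens is to check each clock_gettime() result against the previous plausible one. The sketch below does this for a list of (seconds, nanoseconds) pairs, using the four values from the strace output above. The function name and the 600-second threshold are assumptions chosen for illustration; this only detects the symptom and says nothing about the cause inside the kernel.

```python
NS_PER_SEC = 1_000_000_000


def find_implausible(readings, max_step_s):
    """Return indices of readings that go backwards, or that jump forward
    by more than max_step_s seconds, relative to the last plausible reading."""
    bad = []
    prev = None
    for i, (sec, nsec) in enumerate(readings):
        now = sec * NS_PER_SEC + nsec
        if prev is not None:
            step = now - prev
            if step < 0 or step > max_step_s * NS_PER_SEC:
                bad.append(i)
                continue  # keep the last plausible reading as the reference
        prev = now
    return bad


# The four CLOCK_MONOTONIC_RAW values from the strace output, in call order.
readings = [
    (14496, 285316209),
    (14496, 285627946),
    (1266889381, 847702609),
    (14819, 27615047),
]

# Hypothetical threshold: allow gaps of up to 10 minutes between calls.
print(find_implausible(readings, max_step_s=600))  # [2]
```

Only the third reading is flagged; the fourth is about 323 seconds after the second, which is in the same range as the observed 5-6 minute hang.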
I am reading this article about SEH on Windows,
and here is the source code of myseh.cpp.
I debugged myseh.cpp. I set 2 breakpoints at printf("Hello from an exception handler\n"); at line:24 and DWORD handler = (DWORD)_except_handler; at line: 36 respectively.
Then I ran it and it broke at line:36. I saw the stack trace as follows.
As execution continued, an access violation occurred because of mov [eax], 1
Then it broke at line:24. I saw the stack trace as follows.
It was the same thread, but the frame of main was gone, replaced by _except_handler. ESP had also jumped from 0018f6c8 to 0018ef34, which is a big gap.
After the exception was handled:
I know that _except_handler must run in user mode rather than kernel mode.
After _except_handler returned, the thread switched to ring 0, the Windows kernel set EAX in the CONTEXT to &scratch, and then it returned to ring 3, so the thread continued running.
I am curious about the mechanism Windows uses to deal with exceptions:
Why was the frame calling main gone?
Why did ESP jump from 0018f6c8 to 0018ef34 (such a big gap)? Do those ESP addresses belong to the same thread's stack? Did the kernel play some tricks with ESP in ring 3? If so, why did it choose the address 0018ef34 for the handler callback's frame? Many thanks!
You are using the default debugger settings, not good enough to see all the details. They were chosen to help you focus on your own code and get the debug session started as quickly as possible.
The [External Code] block tells you that there are parts of the call stack that do not belong to code you have written; they belong to the operating system. Use Tools > Options > Debugging > General and untick the "Enable Just My Code" option.
The [Frames below might be incorrect...] warning tells you that the debugger doesn't have accurate PDBs to correctly walk the stack. Use Tools > Options > Debugging > Symbols and tick the "Microsoft Symbol Servers" option and choose a cache location. The debugger will now download the PDBs you need to debug through the operating system DLLs. Might take a while, it is only done once.
You can reason out the big ESP change: the CONTEXT structure is quite large and takes up space on the stack.
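To put rough numbers on that, the short calculation below takes the two ESP values from the question and subtracts the structure sizes. The sizes used (716 bytes for a 32-bit CONTEXT, 80 bytes for a 32-bit EXCEPTION_RECORD) are assumptions about the x86 headers, not something measured in this debug session, and attributing the remainder to the ntdll dispatcher frames is likewise only a plausible reading.

```python
# ESP values reported in the question (32-bit process).
esp_in_main = 0x0018F6C8      # at the breakpoint in main
esp_in_handler = 0x0018EF34   # at the breakpoint in _except_handler

gap = esp_in_main - esp_in_handler

# Assumed x86 structure sizes (not taken from the original post).
CONTEXT_SIZE = 0x2CC           # 716 bytes, including the extended registers area
EXCEPTION_RECORD_SIZE = 0x50   # 80 bytes

# Whatever is left would be the frames and locals of the dispatcher functions
# between KiUserExceptionDispatcher and the handler (assumption).
remaining = gap - CONTEXT_SIZE - EXCEPTION_RECORD_SIZE

print(hex(gap), gap)   # 0x794 1940
print(remaining)       # 1144
```

A gap of under 2 KB that stays in the same 0018xxxx range is consistent with both addresses lying on the same thread stack.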
After these changes you ought to now see something resembling:
ConsoleApplication1942.exe!_except_handler(_EXCEPTION_RECORD * ExceptionRecord, void * EstablisherFrame, _CONTEXT * ContextRecord, void * DispatcherContext) Line 22 C++
ntdll.dll!ExecuteHandler2@20() Unknown
ntdll.dll!ExecuteHandler@20() Unknown
ntdll.dll!_KiUserExceptionDispatcher@8() Unknown
ConsoleApplication1942.exe!main() Line 46 C++
ConsoleApplication1942.exe!invoke_main() Line 64 C++
ConsoleApplication1942.exe!__scrt_common_main_seh() Line 255 C++
ConsoleApplication1942.exe!__scrt_common_main() Line 300 C++
ConsoleApplication1942.exe!mainCRTStartup() Line 17 C++
kernel32.dll!@BaseThreadInitThunk@12() Unknown
ntdll.dll!__RtlUserThreadStart() Unknown
ntdll.dll!__RtlUserThreadStart@8() Unknown
Recorded on Win10 version 1607 and VS2015 Update 2. This isn't the correct way to write SEH handlers; find a better example in this post.
I have a .NET application running in a production environment (Windows XP + .NET 3.5 SP1) with a stable handle count of around 2000, but in some unknown situation its handle count increases extremely fast and the application finally crashes (at over 10,000 handles, as monitored by PerfMon).
I took a memory dump there while the count was increasing (before the crash) and loaded it into WinDbg, where I can see the overall handle summary:
0:000> !handle 0 0
7229 Handles
Type Count
None 19
Event 504
Section 6108
File 262
Port 15
Directory 3
Mutant 56
WindowStation 2
Semaphore 70
Key 97
Token 2
Process 3
Thread 75
Desktop 1
IoCompletion 9
Timer 2
KeyedEvent 1
So, no surprise, the leaking type is Section. Digging further:
0:000> !handle 0 ff Section
Handle 00007114
Type Section
Attributes 0
GrantedAccess 0xf0007:
Delete,ReadControl,WriteDac,WriteOwner
Query,MapWrite,MapRead
HandleCount 2
PointerCount 4
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.AKCHAC.CGOOBGKD
No object specific information available
Handle 00007134
Type Section
Attributes 0
GrantedAccess 0xf0007:
Delete,ReadControl,WriteDac,WriteOwner
Query,MapWrite,MapRead
HandleCount 2
PointerCount 4
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.GKCHAC.KCLBDGKD
No object specific information available
...
...
...
...
6108 handles of type Section
The names under BaseNamedObjects all follow the pattern MSCTF.MarshalInterface.FileMap.IBC.***.*****.
This is basically where I got stuck; I could not go any further in linking this information to my application.
Could anyone help?
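With thousands of handles, the naming pattern noted above is easier to confirm by grouping the "Name" lines of a saved `!handle 0 ff Section` listing by prefix. The following is a minimal sketch; the function name, the choice of four dot-separated components as the prefix, and the assumption that the output was saved to a text log are all illustrative.

```python
import re
from collections import Counter

# Matches the "Name <object name>" lines in the !handle output shown above.
NAME_RE = re.compile(r"^\s*Name\s+(\S+)")


def name_prefixes(lines, parts=4):
    """Count object names by their first `parts` dot-separated components."""
    counts = Counter()
    for line in lines:
        m = NAME_RE.match(line)
        if m:
            prefix = ".".join(m.group(1).split(".")[:parts])
            counts[prefix] += 1
    return counts


# Two entries copied from the listing above (raw string keeps the backslashes).
sample = r"""
Handle 00007114
Type Section
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.AKCHAC.CGOOBGKD
Handle 00007134
Type Section
Name \BaseNamedObjects\MSCTF.MarshalInterface.FileMap.IBC.GKCHAC.KCLBDGKD
""".splitlines()

for prefix, count in name_prefixes(sample).most_common():
    print(count, prefix)
```

If a single prefix accounts for nearly all 6108 Section handles, that points at one creator (here the MSCTF marshalling code) rather than at many unrelated leaks.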
[Edit0]
I tried several combinations of GFlags settings (+ust or via the UI) with no luck: dumps opened in WinDbg never show anything via !htrace. So I had to attach to the live process, which finally gave me the stack for a leaking handle of the kind shown above:
0:033> !htrace 1758
--------------------------------------
Handle = 0x00001758 - OPEN
Thread ID = 0x00000768, Process ID = 0x00001784
0x7c809543: KERNEL32!CreateFileMappingA+0x0000006e
0x74723917: MSCTF!CCicFileMappingStatic::Create+0x00000022
0x7473fc0f: MSCTF!CicCoMarshalInterface+0x000000f8
0x747408e9: MSCTF!CStub::stub_OutParam+0x00000110
0x74742b05: MSCTF!CStubIUnknown::stub_QueryInterface+0x0000009e
0x74743e75: MSCTF!CStubITfLangBarItem::Invoke+0x00000014
0x7473fdb9: MSCTF!HandleSendReceiveMsg+0x00000171
0x7474037f: MSCTF!CicMarshalWndProc+0x00000161
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Windows\system32\USER32.dll -
0x7e418734: USER32!GetDC+0x0000006d
0x7e418816: USER32!GetDC+0x0000014f
0x7e4189cd: USER32!GetWindowLongW+0x00000127
--------------------------------------
And then I got stuck again: the stack does not seem to contain any of our own code. Any suggestions for how to move forward?
WinDbg isn't the ideal tool for memory leaks, especially not without preparation in advance.
There's a GFlags option (+ust) which can be enabled for a process to record the stack trace for handle allocations. If you don't have this flag enabled, you'll probably not get more info out of your dump. If you have it, use !htrace to see the stack.
You can also try UMDH (User-Mode Dump Heap), which is a free tool, or get something like Memory Validator, which certainly has better usability and so might pay off in the long run.
I have a Core Data iOS app that uses private queue concurrency in a background process. I'm getting a deadlock that makes the UI freeze up from time to time (fairly regularly, to be honest) - but all the info I get from the debugger (LLDB) is that it is stuck on pthread_mutex_lock. The stack trace is no longer than that, which makes debugging near on impossible:
thread #1: tid = 0x2503, 0x3b5060fc libsystem_kernel.dylib`__psynch_mutexwait + 24, stop reason = signal SIGSTOP
frame #0: 0x3b5060fc libsystem_kernel.dylib`__psynch_mutexwait + 24
frame #1: 0x3b44f128 libsystem_c.dylib`pthread_mutex_lock + 392
The Xcode process pane similarly shows only those two entries on the stack.
I'm quite new to this multithreading stuff so am at a total loss where to begin with fixing the issue. Any suggestions for how to go about debugging this?
Your stack is obviously longer than two frames: you can't start a thread with pthread_mutex_lock. So the truncation of the stack is pretty clearly just a bug in the lldb unwinder. If you have an ADC account, please file a bug about this at bugreporter.apple.com. Also, if you're not using the most recent version of lldb you can get your hands on, you might want to try that; maybe it fixes whatever bug you are seeing. You can install multiple versions of Xcode side by side, so you don't have to remove the one you are currently using to try a newer one.
You might also try another tool that will give you a backtrace (e.g. the Instruments time profiler) when your app gets into this state, since it uses a different unwinder. That will at least let you see what the full backtrace is.
Our product consumes a lot of Windows resources, such as socket handles, memory, threads and so on. Usually there are 700-900 active threads, but in some cases the product can rapidly create new threads, do some work on them, and close them.
I came across a crash memory dump of our product. With the ~* WinDbg command I can see 817 active threads, but when I run the !handle command it prints this summary:
Type Count
None 15
Event 2603
Section 13
File 705
Directory 4
Mutant 32
WindowStation 2
Semaphore 789
Key 208
Process 1
Thread 5766
Desktop 1
IoCompletion 308
Timer 276
KeyedEvent 1
TpWorkerFactory 48
So, the process actually holds 5766 threads. My question: when does Windows actually free handles for a process? Could there be some kind of delay, or caching? Can someone explain this behavior?
I don't think we have handle leaks, but we do have weird behavior in a legacy part of the system that rapidly creates and closes threads for small tasks. I would also like to point out that we are unlikely to run more than 1000 threads simultaneously; I am pretty sure about this.
Thanks.
When you say that the process holds 5766 threads, what you really mean is that the process holds 5766 thread handles.
Even though a thread may no longer be running, whether that is the result of a call to ExitThread()/TerminateThread() or returning from the ThreadProc, any handles to that thread will remain valid. This makes it possible to do things like call GetExitCodeThread() on the handle of a thread that has finished its work.
Unfortunately, that means that you have to remember to call CloseHandle() instead of just letting it leak. The MSDN example on Creating Threads covers this to some extent.
Another thing I will note is that somewhere not too far above 1000 running threads you are likely to exhaust the virtual address space available to a 32-bit process, since each thread by default reserves 1 MB of address space for its stack.
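The address-space point can be checked with a quick back-of-the-envelope calculation. The sketch below assumes the default 2 GiB of user-mode address space for a 32-bit process and the default 1 MiB stack reservation; the 1 GiB figure for heaps, DLLs and other mappings is purely hypothetical and only there to show why the practical limit is well below the theoretical one.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

USER_ADDRESS_SPACE = 2 * GIB   # assumed default for a 32-bit process
STACK_RESERVE = 1 * MIB        # default per-thread stack reservation


def max_threads_by_stack(address_space, stack_reserve, other_usage=0):
    """Upper bound on thread count if only stack reservations (plus a fixed
    amount of other usage) had to fit into the address space."""
    return (address_space - other_usage) // stack_reserve


# Theoretical ceiling with nothing else mapped into the process.
print(max_threads_by_stack(USER_ADDRESS_SPACE, STACK_RESERVE))  # 2048

# Hypothetical: 1 GiB already taken by heaps, DLLs, mapped files, etc.
print(max_threads_by_stack(USER_ADDRESS_SPACE, STACK_RESERVE, other_usage=1 * GIB))  # 1024
```

Fragmentation is ignored here, so real limits would tend to be lower still.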