Our product consumes a lot of Windows resources: socket handles, memory, threads, and so on. Usually there are 700-900 active threads, but in some cases the product can rapidly create new threads, do some work, and close them.
I came across a crash memory dump of our product. With the ~* WinDbg command I can see 817 active threads, but when I run the !handle command it prints this summary:
Type Count
None 15
Event 2603
Section 13
File 705
Directory 4
Mutant 32
WindowStation 2
Semaphore 789
Key 208
Process 1
Thread 5766
Desktop 1
IoCompletion 308
Timer 276
KeyedEvent 1
TpWorkerFactory 48
So the process actually holds 5766 thread handles. My question: when does Windows actually free handles for a process? Is some kind of delay or caching possible? Can someone explain this behavior?
I don't think we have handle leaks, but we do have odd behavior in a legacy part of the system that rapidly creates and closes threads for small tasks. I would also point out that we are unlikely to run more than 1000 threads simultaneously; I am pretty sure about this.
Thanks.
When you say So, actually process holds 5766 threads., what you really mean is that the process holds 5766 thread handles.
Even though a thread may no longer be running, whether that is the result of a call to ExitThread()/TerminateThread() or returning from the ThreadProc, any handles to that thread will remain valid. This makes it possible to do things like call GetExitCodeThread() on the handle of a thread that has finished its work.
Unfortunately, that means that you have to remember to call CloseHandle() instead of just letting it leak. The MSDN example on Creating Threads covers this to some extent.
Another thing I will note is that somewhere not too far above 1000 running threads, you are likely to exhaust the virtual address space available to a 32-bit process, since each thread by default reserves 1 MB of address space for its stack: 1000 threads already reserve roughly 1 GB of the 2 GB user-mode address space.
In this post, https://blog.packagecloud.io/eng/2017/02/06/monitoring-tuning-linux-networking-stack-sending-data/#queuing-disciplines, it is written:
As you’ll see from the previous post, the NET_TX_SOFTIRQ softirq has
the function net_tx_action registered to it. This means that there is
a kernel thread executing net_tx_action. That thread is occasionally
paused and raise_softirq_irqoff resumes it. Let’s take a look at what
net_tx_action does so we can understand how the kernel processes
transmit requests.
It says that the kthread is occasionally paused. When is a kthread paused, and why?
How does the kthread know there is work to execute? Does it poll a queue?
I think what's said there about pausing the thread is more of a figure of speech. The kthread is not really paused; the thread works just fine.
The body of work related to softirqs is done in the __do_softirq() function.
There are a number of softirq types, and each softirq type is represented by a bit in a bitmask. Whenever there's work for a specific type of softirq, the corresponding bit is raised in the bitmask. __do_softirq() processes this bitmask bit by bit, starting with the least significant bit, and does the work for each softirq type that has its bit set. Thus softirq types are processed in priority order, with bit 0 representing the highest priority. In fact, if you look at the code you'll see that the bitmask is saved (copied) and then cleared before the processing starts, and it's the copy that is processed.
The bit for NET_TX_SOFTIRQ is raised each time a new skb is submitted to the kernel networking stack to send data out. That causes __do_softirq() to call net_tx_action() for outgoing data. If there's no data to send out, then the bit is not raised. Essentially, that's what causes the kernel softirq thread to "pause": it's just a layman's way of saying that there's no work for it, so net_tx_action() is not called. As soon as there's more data, the bit is raised again as data is submitted to the kernel networking stack. __do_softirq() sees that and calls net_tx_action() again.
There's a softirq thread (ksoftirqd) on each CPU. A thread is run when there's at least one pending softirq type. The threads are defined in the softirq_threads structure and started in the spawn_ksoftirqd() function.
I am attempting a relatively simple task using a Boost interprocess semaphore and shared memory. I want a fixed buffer of data shared between two processes, where the first process is a producer and the second is a consumer. The buffer will consist of 3 parts. The first part will be a boost::interprocess::semaphore, used to coordinate the producer/consumer. The second part will just be an integer, so the consumer knows how many items are in the buffer. The third part will be the actual array of items. I have a very basic implementation started, but the processes hang when attempting to open the shared memory, and I'm not certain why. I am doing this on 64-bit CentOS 6.5 with gcc/g++ 4.8.2. I should also note that the machine has two CPUs, and using process affinity I am ensuring that the producer and the consumer run on separate CPUs.
The code is at http://pastie.org/9693362. I am experiencing the following issues: with the code as is, the consumer and producer both hang at line 3 (fixed_managed_shared_memory shm(open_only, "SharedMem" );). If I comment that line out, then both end up terminating (no error is caught, however) at line 26 (the post/wait on the semaphore). This makes me think that somehow the memory isn't being shared, because when I print out the addresses, they seem to be properly formed (as in, the offsets seem to be correct), and they are properly passed between processes. Is there something I'm missing in how to properly set this up?
Today I found a very strange problem.
I was running Red Hat Enterprise Linux 6, and the CPU was an Intel E31275 (4 cores, 8 threads). I found that one kernel thread (I'll call it my_thread) didn't work correctly.
With "ps" command, I found the status of my_thread was always running:
ps ax
5545 ? R 3:14 [my_thread]
15774 ttyS0 Ss 0:00 -bash
...
But its running time was always 3:14. Since it was running, why didn't the total time increase?
From the proc file /proc/5545/sched, I found that all the statistics, including the wakeup count (se.nr_wakeups) for this thread, stayed the same, too.
From /proc/5545/stack, I found this thread called this function and never returned:
interruptible_sleep_on_timeout(&q, 3*HZ);
In theory this function would return every 3 seconds if no other threads woke up the thread. Each time after the function returned, se.nr_wakeups in /proc/5545/sched would be increased by 1. But this never happened after I found the thread had some problems.
Does any one have some ideas? Is it possible that interruptible_sleep_on_timeout() never returns?
Update:
I find the problem won't occur if I set CPU affinity for this thread. If I pin it to a dedicated core, then everything is OK. Are there any problems with SMP scheduling?
Update again:
After I disabled hyper-threading in the BIOS, I have not seen this problem since.
First off, R indicates the thread is not in the running state but runnable. That is, it does not mean it is running; it means it is in a state where the scheduler is allowed to pick it for running. There is a big difference between the two.
In a similar sense, interruptible_sleep_on_timeout(&q, 3*HZ); will not run the thread after 3 seconds (3*HZ jiffies), but rather make it available for running after that time - and indeed you see it in "ps" as available for running, so possibly the timeout has indeed occurred.
Since you did not say anything about the kernel thread in question I don't even know if it is in your own code or standard kernel code so I cannot really answer in detail.
One possible reason for the situation you describe is that some other thread (user or kernel) has a higher priority than your thread, so the scheduler never picks yours to run. If so, that other thread is probably running at a real-time priority (SCHED_FIFO or SCHED_RR).
Hi, I've looked around for an answer to this question, and I am wondering if anyone with experience in Windows internals knows whether the kernel will ever assign a process id that is the same as a thread id. What I mean is, say there is a process a.exe that I have started, and it has a thread with id 123. If another process is started, for example b.exe, can its process id be 123? In other words, do process and thread identifiers ever collide? Thanks
EDIT: It appears that process and thread ids come from the same pool, called the PspCidTable. A hacker named Polynomial who reviewed the Windows NT source says the following:
The kernel needs to be able to generate a sequence of process and
thread IDs that are unique across the whole system. To efficiently and
safely do this, the kernel creates a pool of IDs that can be used for
both processes and threads. This pool is exported in the kernel as a
HANDLE_TABLE object called PspCidTable. During Phase0 startup of the
system, the PspInitPhase0 function is called. This function creates a
HANDLE_TABLE object using ExCreateHandleTable, which automatically
populates the table with 65536 entries. Each entry is a 16-bit
unsigned integer (at least it is on a 32-bit OS) stored inside a list
item object that is part of a doubly linked list. Both process and
thread IDs come from the PspCidTable pool.
Source for above: Stuff you (probably) didn't know about Windows
The PspCidTable still exists in Windows XP and empirical observations in Windows 7 lead me to believe the above is still true.
Thread and process ids come from the same pool in all versions of Windows AFAIK, but that does not mean this will be true forever. In practice it should not matter at all, since you should only pass things you know are thread ids to OpenThread and vice versa.
Don't assume other things about these ids either: they are not 16-bit. They might seem like they are on NT, but it is possible to get ids > 0xffff (and on Win9x they are XOR'ed with a secret and often use the full 32 bits).
The only weird thing you should keep in the back of your mind is that on 64-bit systems they are 32-bit in user mode and pointer-sized in kernel mode (use HandleToUlong/UlongToHandle).
I hit a bug in my code which uses WSARecv and WSAGetOverlappedResult on an overlapped socket. Under heavy load, WSAGetOverlappedResult fails with WSASYSCALLFAILURE ('A system call that should never fail has failed') and my TCP stream is out of sync afterwards, causing mayhem in the upper levels of my program.
So far I have not been able to isolate it to a given set of hardware or drivers. Has somebody hit this issue as well, and found a solution or workaround?
How many connections, how many pending recvs, how many outstanding sends? What does Perfmon or Task Manager say about the amount of non-paged pool used? How much memory is in the box? Does it go away if you run the program on Vista or above? Do you have any LSPs installed?
You could be exhausting non-paged pool and causing a badly written driver to misbehave when it fails to allocate memory. This issue is less likely to bite on Vista or later as the amount of non-paged pool available has increased dramatically (see http://www.lenholgate.com/blog/2009/03/excellent-article-on-non-paged-pool.html for details). Alternatively you might be hitting the "locked pages" limit (you can only lock a fixed number of pages in memory on the OS and each pending I/O operation locks one or more pages depending on buffer size and allocation alignment).
It seems I have solved this issue by sleeping 1 ms and retrying WSAGetOverlappedResult when it reports WSASYSCALLFAILURE.
I had another issue, related to overlapped events firing even though there is no data, which I also had to solve first. The test has now been running for over an hour, with a few WSASYSCALLFAILUREs handled correctly. Hopefully the overnight test will succeed as well.
@Len: thanks again for your help.
EDIT: The overnight test was successful. My bug was caused by two interdependent issues:
Issue 1: WaitForMultipleObjects in ConnectionSet::select occasionally
signals data on an empty socket, causing SocketConnection::readSync to
deadlock.
Fix: Do a non-blocking read on the first byte of each packet. Reset
ConnectionSet if socket was empty
Issue 2: WSAGetOverlappedResult returns occasionally WSASYSCALLFAILURE,
causing out-of-sync on the TCP stream.
Fix: Retry WSAGetOverlappedResult after a small sleep period.
http://equalizer.svn.sourceforge.net/viewvc/equalizer?view=revision&revision=4649