How can I tell Windows XP/7 not to switch threads during a certain segment of my code? - windows

I want to prevent a thread switch by Windows XP/7 in a time critical part of my code that runs in a background thread. I'm pretty sure I can't create a situation where I can guarantee that won't happen, because of higher priority interrupts from system drivers, etc. However, I'd like to decrease the probability of a thread switch during that part of my code to the minimum that I can. Are there any create-thread flags or Window API calls that can assist me? General technique tips are appreciated too. If there is a way to get this done without having to raise the threads priority to real-time-critical that would be great, since I worry about creating system performance issues for the user if I do that.
UPDATE: I am adding this update after seeing the first responses to my original post. The concrete application that motivated the question has to do with real-time audio streaming. I want to eliminate every bit of delay I can. I found after coding up my original design that a thread switch can cause a 70ms or more delay at times. Since my app is between two sockets acting as a middleman for delivering audio, the instant I receive an audio buffer I want to immediately turn around and push it out the the destination socket. My original design used two cooperating threads and a semaphore since the there was one thread managing the source socket, and another thread for the destination socket. This architecture evolved from the fact the two devices behind the sockets are disparate entities.
I realized that if I combined the two sockets onto the same thread I could write a code block that reacted immediately to the socket-data-received message and turned it around to the destination socket in one shot. Now if I can do my best to avoid an intervening thread switch, that would be the optimal coding architecture for minimizing delay. To repeat, I know I can't guarantee this situation, but I am looking for tips/suggestions on how to write a block of code that does this and minimizes as best as I can the chance of an intervening thread switch.
Note, I am aware that O/S code behind the sockets introduces (potential) delays of its own.

AFAIK there are no such flags in CreateThread or etc (This also doesn't make sense IMHO). You may snooze other threads in your process from execution during in critical situations (by enumerating them and using SuspendThread), as well as you theoretically may enumerate & suspend threads in other processes.
OTOH snoozing threads is generally not a good idea, eventually you may call some 3rd-party code that would implicitly wait for something that should be accomplished in another threads, which you suspended.
IMHO - you should use what's suggested for the case - playing with thread/process priorities (also you may consider SetThreadPriorityBoost). Also the OS tends to raise the priority to threads that usually don't use CPU aggressively. That is, threads that work often but for short durations (before calling one of the waiting functions that suspend them until some condition) are considered to behave "nicely", and they get prioritized.

Related

How to set up a thread pool for an IOCP server with connections that may perform blocking operations

I'm currently working on a server application that's written in the proactor style, using select() + a dynamically sized thread pool (there's a simple mechanism based on keeping track of idle worker threads).
I need to modify it to use IOCP instead of select() on windows, and I'm wondering what the best way to utilize threads is.
For background information, the server has stateful, long-lived connections, and any request may require significant processing, and block. In fact, most requests call into customer-written code, which may block at will.
I've read that the OS can tell when an IOCP thread blocks, and unblock another one, but it doesn't look like there's any support for creating additional threads under heavy load, or if many of the threads are blocked.
I've read one site which suggested that you have a small, fixed-size thread pool which uses IOCP to deal with I/O only, which sends requests which can block to another, dynamically-sized thread pool. This seems non-optimal due to the additional thread synchronization required (although you can use IOCP as well for the tasks for the second thread pool), and the larger number of threads needed (extra context switching).
Is that the best way?
It sounds like what you've read is one of my articles on IOCP (most probably this one). That's likely a bit out of date now as the whole problem that it sort to avoid (that of I/O being cancelled if the thread that issued it exits before the I/O completes) is no longer a problem with any of Microsoft's currently supported OS's (it's only an issue on XP and before).
You're correct in noticing that my design from 2000/2002 was sub optimal from a context switching point of view; but it worked pretty well at the time, given the constraints of the underlying API.
On a modern OS there's no real advantage in having separate thread pools for I/O and blocking work. A more modern solution would probably involve dynamically expanding and reducing the number of I/O threads servicing the IOCP as required.
You'd need to track the number of IOCP threads that are active (i.e. not waiting on GetQueuedCompletionStatus() ) and spawn more when there are "too few". Likewise just as a thread is about to go back and wait on GQCS you could check to see if you have "too many" and if so, let it die instead.
I should probably update those articles.

Handling user interface in a multi-threaded application (or being forced to have a UI-only main thread)

In my application, I have a 'logging' window, which shows all the logging, warnings, errors of the application.
Last year my application was still single-threaded so this worked [quite] good.
Now I am introducing multithreading. I quickly noticed that it's not a good idea to update the logging window from different threads. Reading some articles on keeping the UI in the main thread, I made a communication buffer, in which the other threads are adding their logging messages, and from which the main thread takes the messages and shows them in the logging window (this is done in the message loop).
Now, in a part of my application, the memory usage increases dramatically, because the separate threads are generating lots of logging messages, and the main thread cannot empty the communication buffer quickly enough. After the while the memory decreases again (if the other threads have finished their work and the main thread gradually empties the communication buffer).
I solved this problem by having a maximum size on the communication buffer, but then I run into a problem in the following situation:
the main thread has to perform a complex action
the main thread takes some parts of the action and let's separate threads execute this
while the seperate threads are executing their logic, the main thread processes the results from the other threads and continues with its work if the other threads are finished
Problem is that in this situation, if the other threads perform logging, there is no UI-message loop, and so the communication buffer is filled, but not emptied.
I see two solutions in solving this problem:
require the main thread to do regular polling of the communication buffer
only performing user interface logic in the main thread (no other logic)
I think the second solution seems the best, but this may not that easy to introduce in a big application (in my case it performs mathematical simulations).
Are there any other solutions or tips?
Or is one of the two proposed the best, easiest, most-pragmatic solution?
Thanks,
Patrick
Let's make some order first.
you may not hold UI processing for any time U would have noticed, or he will be frustrated
you may still perform long operations in the UI thread. this is done by means of PeakMessage loop. If you design one or more proper peakmessage loops, you do not need multithreading, unless for performance optimization.
you may consider MsgWaitForSingleObject() loop instead of GetMessage if you want to communicate with threads efficiently (always better than polling)
Therefore, if you do not redesign your message loop
There's no way you can perform syncronous requests from other threads
You may design a separate thread for the logging
All non-UI logic will have to be elsewhere.
About the memory problem:
it is a bad design to have one thread able to allocate all memory if another thread is stuck. Such dependency is a clear recipe for a disaster.
If the buffer is limited, you need to decide what happens when it's overrun. You have two options - suspend the thread or discard the message.
UI code:
It is possible to design logger code that would display messages with incredible speed. Such designs are complicated, rely on sophisticated caching and arranging data for fast access, viewport management and rendering only the part that corresponds to actual pixels that the user is looking at.
For most applications it is just a gimmick, because users do not read very fast. Most of the time it is better to design a different approach to showing logs, perhaps a stateful UI to let user choose what is interesting to him at the moment. Spy++ for example, some sysinternals tools like regmon, filemon are incredibly fast in showing their own logs. You can have a look at their source code.

Is it better to poll or wait?

I have seen a question on why "polling is bad". In terms of minimizing the amount of processor time used by one thread, would it be better to do a spin wait (i.e. poll for a required change in a while loop) or wait on a kernel object (e.g. a kernel event object in windows)?
For context, assume that the code would be required to run on any type of processor, single core, hyperthreaded, multicore, etc. Also assume that a thread that would poll or wait can't continue until the polling result is satisfactory if it polled instead of waiting. Finally, the time between when a thread starts waiting (or polling) and when the condition is satisfied can potentially vary from a very short time to a long time.
Since the OS is likely to more efficiently "poll" in the case of "waiting", I don't want to see the "waiting just means someone else does the polling" argument, that's old news, and is not necessarily 100% accurate.
Provided the OS has reasonable implementations of these type of concurrency primitives, it's definitely better to wait on a kernel object.
Among other reasons, this lets the OS know not to schedule the thread in question for additional timeslices until the object being waited-for is in the appropriate state. Otherwise, you have a thread which is constantly getting rescheduled, context-switched-to, and then running for a time.
You specifically asked about minimizing the processor time for a thread: in this example the thread blocking on a kernel object would use ZERO time; the polling thread would use all sorts of time.
Furthermore, the "someone else is polling" argument needn't be true. When a kernel object enters the appropriate state, the kernel can look to see at that instant which threads are waiting for that object...and then schedule one or more of them for execution. There's no need for the kernel (or anybody else) to poll anything in this case.
Waiting is the "nicer" way to behave. When you are waiting on a kernel object your thread won't be granted any CPU time as it is known by the scheduler that there is no work ready. Your thread is only going to be given CPU time when it's wait condition is satisfied. Which means you won't be hogging CPU resources needlessly.
I think a point that hasn't been raised yet is that if your OS has a lot of work to do, blocking yeilds your thread to another process. If all processes use the blocking primitives where they should (such as kernel waits, file/network IO etc.) you're giving the kernel more information to choose which threads should run. As such, it will do more work in the same amount of time. If your application could be doing something useful while waiting for that file to open or the packet to arrive then yeilding will even help you're own app.
Waiting does involve more resources and means an additional context switch. Indeed, some synchronization primitives like CLR Monitors and Win32 critical sections use a two-phase locking protocol - some spin waiting is done fore actually doing a true wait.
I imagine doing the two-phase thing would be very difficult, and would involve lots of testing and research. So, unless you have the time and resources, stick to the windows primitives...they already did the research for you.
There are only few places, usually within the OS low-level things (interrupt handlers/device drivers) where spin-waiting makes sense/is required. General purpose applications are always better off waiting on some synchronization primitives like mutexes/conditional variables/semaphores.
I agree with Darksquid, if your OS has decent concurrency primitives then you shouldn't need to poll. polling usually comes into it's own on realtime systems or restricted hardware that doesn't have an OS, then you need to poll, because you might not have the option to wait(), but also because it gives you finegrain control over exactly how long you want to wait in a particular state, as opposed to being at the mercy of the scheduler.
Waiting (blocking) is almost always the best choice ("best" in the sense of making efficient use of processing resources and minimizing the impact to other code running on the same system). The main exceptions are:
When the expected polling duration is small (similar in magnitude to the cost of the blocking syscall).
Mostly in embedded systems, when the CPU is dedicated to performing a specific task and there is no benefit to having the CPU idle (e.g. some software routers built in the late '90s used this approach.)
Polling is generally not used within OS kernels to implement blocking system calls - instead, events (interrupts, timers, actions on mutexes) result in a blocked process or thread being made runnable.
There are four basic approaches one might take:
Use some OS waiting primitive to wait until the event occurs
Use some OS timer primitive to check at some defined rate whether the event has occurred yet
Repeatedly check whether the event has occurred, but use an OS primitive to yield a time slice for an arbitrary and unknown duration any time it hasn't.
Repeatedly check whether the event has occurred, without yielding the CPU if it hasn't.
When #1 is practical, it is often the best approach unless delaying one's response to the event might be beneficial. For example, if one is expecting to receive a large amount of serial port data over the course of several seconds, and if processing data 100ms after it is sent will be just as good as processing it instantly, periodic polling using one of the latter two approaches might be better than setting up a "data received" event.
Approach #3 is rather crude, but may in many cases be a good one. It will often waste more CPU time and resources than would approach #1, but it will in many cases be simpler to implement and the resource waste will in many cases be small enough not to matter.
Approach #2 is often more complicated than #3, but has the advantage of being able to handle many resources with a single timer and no dedicated thread.
Approach #4 is sometimes necessary in embedded systems, but is generally very bad unless one is directly polling hardware and the won't have anything useful to do until the event in question occurs. In many circumstances, it won't be possible for the condition being waited upon to occur until the thread waiting for it yields the CPU. Yielding the CPU as in approach #3 will in fact allow the waiting thread to see the event sooner than would hogging it.

Win32 Overlapped I/O - Completion routines or WaitForMultipleObjects?

I'm wondering which approach is faster and why ?
While writing a Win32 server I have read a lot about the Completion Ports and the Overlapped I/O, but I have not read anything to suggest which set of API's yields the best results in the server.
Should I use completion routines, or should I use the WaitForMultipleObjects API and why ?
You suggest two methods of doing overlapped I/O and ignore the third (or I'm misunderstanding your question).
When you issue an overlapped operation, a WSARecv() for example, you can specify an OVERLAPPED structure which contains an event and you can wait for that event to be signalled to indicate the overlapped I/O has completed. This, I assume, is your WaitForMultipleObjects() approach and, as previously mentioned, this doesn't scale well as you're limited to the number of handles that you can pass to WaitForMultipleObjects().
Alternatively you can pass a completion routine which is called when completion occurs. This is known as 'alertable I/O' and requires that the thread that issued the WSARecv() call is in an 'alertable' state for the completion routine to be called. Threads can put themselves in an alertable state in several ways (calling SleepEx() or the various EX versions of the Wait functions, etc). The Richter book that I have open in front of me says "I have worked with alertable I/O quite a bit, and I'll be the first to tell you that alertable I/O is horrible and should be avoided". Enough said IMHO.
There's a third way, before issuing the call you should associate the handle that you want to do overlapped I/O on with a completion port. You then create a pool of threads which service this completion port by calling GetQueuedCompletionStatus() and looping. You issue your WSARecv() with an OVERLAPPED structure WITHOUT an event in it and when the I/O completes the completion pops out of GetQueuedCompletionStatus() on one of your I/O pool threads and can be handled there.
As previously mentioned, Vista/Server 2008 have cleaned up how IOCPs work a little and removed the problem whereby you had to make sure that the thread that issued the overlapped request continued to run until the request completed. Link to a reference to that can be found here. But this problem is easy to work around anyway; you simply marshal the WSARecv over to one of your I/O pool threads using the same IOCP that you use for completions...
Anyway, IMHO using IOCPs is the best way to do overlapped I/O. Yes, getting your head around the overlapped/async nature of the calls can take a little time at the start but it's well worth it as the system scales very well and offers a simple "fire and forget" method of dealing with overlapped operations.
If you need some sample code to get you going then I have several articles on writing IO completion port systems and a heap of free code that provides a real-world framework for high performance servers; see here.
As an aside; IMHO, you really should read "Windows Via C/C++ (PRO-Developer)" by Jeffrey Richter and Christophe Nasarre as it deals will all you need to know about overlapped I/O and most other advanced windows platform techniques and APIs.
WaitForMultipleObjects is limited to 64 handles; in a highly concurrent application this could become a limitation.
Completion ports fit better with a model of having a pool of threads all of which are capable of handling any event, and you can queue your own (non-IO based) events into the port, whereas with waits you would need to code your own mechanism.
However completion ports, and the event based programming model, are a more difficult concept to really work against.
I would not expect any significant performance difference, but in the end you can only make your own measurements to reflect your usage. Note that Vista/Server2008 made a change with completion ports that the originating thread is not now needed to complete IO operations, this may make a bigger difference (see this article by Mark Russinovich).
Table 6-3 in the book Network Programming for Microsoft Windows, 2nd Edition compares the scalability of overlapped I/O via completion ports vs. other techniques. Completion ports blow all the other I/O models out of the water when it comes to throughput, while using far fewer threads.
The difference between WaitForMultipleObjects() and I/O completion ports is that IOCP scales to thousands of objects, whereas WFMO() does not and should not be used for anything more than 64 objects (even though you could).
You can't really compare them for performance, because in the domain of < 64 objects, they will be essentially identical.
WFMO() however does a round-robin on its objects, so busy objects with low index numbers can starve objects with high index numbers. (E.g. if object 0 is going off constantly, it will starve objects 1, 2, 3, etc). This is obviously undesireable.
I wrote an IOCP library (for sockets) to solve the C10K problem and put it in the public domain. I was able on a 512mb W2K machine to get 4,000 sockets concurrently transferring data. (You can get a lot more sockets, if they're idle - a busy socket consumes more non-paged pool and that's the ultimate limit on how many sockets you can have).
http://www.45mercystreet.com/computing/libiocp/index.html
The API should give you exactly what you need.
Not sure. But I use WaitForMultipleObjects and/or WaitFoSingleObjects. It's very convenient.
Either routine works and I don't really think one is any significant faster then another.
These two approaches exists to satisfy different programming models.
WaitForMultipleObjects is there to facilitate async completion pattern (like UNIX select() function) while completion ports is more towards event driven model.
I personally think WaitForMultipleObjects() approach result in cleaner code and more thread safe.

Best practice regarding number of threads in GUI applications

In the past I've worked with a number of programmers who have worked exclusively writing GUI applications.
And I've been given the impression that they have almost universally minimised the use of multiple threads in their applications. In some cases they seem to have gone to extreme lengths to ensure that they use a single thread.
Is this common? Is this the generally accepted philosophy for gui application design?
And if so, why?
[edit]
There are a number of answers saying that thread usage should be minimised to reduce complexity. Reducing complexity in general is a good thing.
But if you look at any number of applications where response to external events is of paramount importance (eg. web servers, any number of embedded applications) there seems to be a world of difference in the attitude toward thread usage.
Generally speaking, GUI frameworks aren't thread safe. For things like Swing(Java's GUI API), only one thread can be updating the UI (or bad things can happen). Only one thread handles dispatching events. If you have multiple threads updating the screen, you can get some ugly flicker and incorrect drawing.
That doesn't mean the application needs to be single threaded, however. There are certainly circumstances when you don't want this to be the case. If you click on a button that calculates pi to 1000 digits, you don't want the UI to be locked up and the button to be depressed for the next couple of days. This is when things like SwingWorker come in handy. It has two parts a doInBackground() which runs in a seperate thread and a done() that gets called by the thread that handles updating the UI sometime after the doInBackground thread has finished. This allows events to be handled quickly, or events that would take a long time to process in the background, while still having the single thread updating the screen.
I think in terms of windows you are limited to all GUI operations happening on a single thread - because of the way the windows message pump works, to increase responsivness most apps add at least one additional worker thread for longer running tasks that would otherwise block and make the ui unresponsive.
Threading is fundamentally hard and so thinking in terms or more than a couple threads can often result in a lot of debugging effort - there is a quote that escapes me right now that goes something like - "if you think you understand threading then you really dont"
I've seen the same thing. Ideally you should perform any operation that is going to take longer then a few hundred ms in a background thread. Anything sorter than 100ms and a human probably wont notice the difference.
A lot of GUI programmers I've worked with in the past are scared of threads because they are "hard". In some GUI frameworks such as the Delphi VCL there are warnings about using the VCL from multiple threads, and this tends to scare some people (others take it as a challenge ;) )
One interesting example of multi-threaded GUI coding is the BeOS API. Every window in an application gets its own thread. From my experience this made BeOS apps feel more responsive, but it did make programming things a little more tricky. Fortunately since BeOS was designed to be multi-threaded by default there was a lot of stuff in the API to make things easier than on some other OSs I've used.
Most GUI frameworks are not thread safe, meaning that all controls have to me accessed from the same thread that created them. Still, it's a good practice to create worker threads to have responsive applications, but you need to be careful to delegate GUI updates to the GUI thread.
Yes.
GUI applications should minimize the the number of threads that they use for the following reasons:
Thread programming is very hard and complicated
In general, GUI applications do at most 2 things at once : a) Respond to User Input, and b) Perform a background task (such as load in data) in response to a user action or an anticipated user action
In general therefore, the added complexity of using multiple threads is not justified by the needs of the application.
There are of course exceptions to the rule.
GUIs generally don't use a whole lot of threads, but they often do throw off another thread for interacting with certain sub-systems especially if those systems take awhile or are very shared resources.
For example, if you're going to print, you'll often want to throw off another thread to interact with the printer pool as it may be very busy for awhile and there's no reason not to keep working.
Another example would be database loads where you're interacting with SQL server or something like that and because of the latency involved you may want to create another thread so your main UI processing thread can continue to respond to commands.
The more threads you have in an application, (generally) the more complex the solution is. By attempting to minimise the number of threads being utilised within a GUI, there are less potential areas for problems.
The other issue is the biggest problem in GUI design: the human. Humans are notorious in their inability to want to do multiple things at the same time. Users have a habit of clicking multiple butons/controls in quick sucession in order to attempt to get something done quicker. Computers cannot generally keep up with this (this is only componded by the GUIs apparent ability to keep up by using multiple threads), so to minimise this effect GUIs will respond to input on a first come first serve basis on a single thread. By doing this, the GUI is forced to wait until system resorces are free untill it can move on. Therefore elimating all the nasty deadlock situations that can arise. Obviously if the program logic and the GUI are on different threads, then this goes out the window.
From a personal preference, I prefer to keep things simple on one thread but not to the detriment of the responsivness of the GUI. If a task is taking too long, then Ill use a different thread, otherwise Ill stick to just one.
As the prior comments said, GUI Frameworks (at least on Windows) are single threaded, thus the single thread. Another recommendation (that is difficult to code in practice) is to limit the number of the threads to the number of available cores on the machine. Your CPU can only do one operation at a time with one core. If there are two threads, a context switch has to happen at some point. If you've got too many threads, the computer can sometimes spend more time swapping between threads than letting threads work.
As Moore's Law changes with more cores, this will change and hopefully programming frameworks will evolve to help us use threads more effectively, depending on the number of cores available to the program, such as the TPL.
Generally all the windowing messages from the window manager / OS will go to a single queue so its natural to have all UI elements in a single thread. Some frameworks, such as .Net, actually throw exceptions if you attempt to directly access UI elements from a thread other than the thread that created it.

Resources