Increasing stack space with setrlimit() on a multi-threaded application with split stacks

Increasing stack space with setrlimit() on a multi-threaded application with split stacks - gcc

I'm thinking of developing my own work-stealing scheduler, and one of the issues that needs to be solved is the possibility of stack overflows. These occur only on infrequent cases where one worker continuously steals tasks from the others e.g.:
steal();
work();
steal();
work();
steal();
...
Several techniques can be used to avoid this pattern, however simply increasing stack space is probably the best option as it allows for other optimizations. On single threaded applications this can be done with a call to setrlimit() but with multiple threads it has no effect (unless called from the main thread).
This behavior is possibly related to stacks having a fixed size across multiple threads. However with split-stacks (implemented on GCC 4.6.0+) this restriction is no longer true.
My question is whether the call to setrlimit() simply works with split-stacks, or in the negative case if one can call the underlying brk()/mmap()/sbrk() and do it manually.

In a hackish way, I guess I could use pthread_attr_setstacksize()/pthread_create()/pthread_join() to create a new thread and do all the the work inside it, this however as the unnecessary overhead of thread creation/scheduling.

Related

How can I tell Windows XP/7 not to switch threads during a certain segment of my code?

I want to prevent a thread switch by Windows XP/7 in a time critical part of my code that runs in a background thread. I'm pretty sure I can't create a situation where I can guarantee that won't happen, because of higher priority interrupts from system drivers, etc. However, I'd like to decrease the probability of a thread switch during that part of my code to the minimum that I can. Are there any create-thread flags or Window API calls that can assist me? General technique tips are appreciated too. If there is a way to get this done without having to raise the threads priority to real-time-critical that would be great, since I worry about creating system performance issues for the user if I do that.
UPDATE: I am adding this update after seeing the first responses to my original post. The concrete application that motivated the question has to do with real-time audio streaming. I want to eliminate every bit of delay I can. I found after coding up my original design that a thread switch can cause a 70ms or more delay at times. Since my app is between two sockets acting as a middleman for delivering audio, the instant I receive an audio buffer I want to immediately turn around and push it out the the destination socket. My original design used two cooperating threads and a semaphore since the there was one thread managing the source socket, and another thread for the destination socket. This architecture evolved from the fact the two devices behind the sockets are disparate entities.
I realized that if I combined the two sockets onto the same thread I could write a code block that reacted immediately to the socket-data-received message and turned it around to the destination socket in one shot. Now if I can do my best to avoid an intervening thread switch, that would be the optimal coding architecture for minimizing delay. To repeat, I know I can't guarantee this situation, but I am looking for tips/suggestions on how to write a block of code that does this and minimizes as best as I can the chance of an intervening thread switch.
Note, I am aware that O/S code behind the sockets introduces (potential) delays of its own.

AFAIK there are no such flags in CreateThread or etc (This also doesn't make sense IMHO). You may snooze other threads in your process from execution during in critical situations (by enumerating them and using SuspendThread), as well as you theoretically may enumerate & suspend threads in other processes.
OTOH snoozing threads is generally not a good idea, eventually you may call some 3rd-party code that would implicitly wait for something that should be accomplished in another threads, which you suspended.
IMHO - you should use what's suggested for the case - playing with thread/process priorities (also you may consider SetThreadPriorityBoost). Also the OS tends to raise the priority to threads that usually don't use CPU aggressively. That is, threads that work often but for short durations (before calling one of the waiting functions that suspend them until some condition) are considered to behave "nicely", and they get prioritized.

Handling user interface in a multi-threaded application (or being forced to have a UI-only main thread)

In my application, I have a 'logging' window, which shows all the logging, warnings, errors of the application.
Last year my application was still single-threaded so this worked [quite] good.
Now I am introducing multithreading. I quickly noticed that it's not a good idea to update the logging window from different threads. Reading some articles on keeping the UI in the main thread, I made a communication buffer, in which the other threads are adding their logging messages, and from which the main thread takes the messages and shows them in the logging window (this is done in the message loop).
Now, in a part of my application, the memory usage increases dramatically, because the separate threads are generating lots of logging messages, and the main thread cannot empty the communication buffer quickly enough. After the while the memory decreases again (if the other threads have finished their work and the main thread gradually empties the communication buffer).
I solved this problem by having a maximum size on the communication buffer, but then I run into a problem in the following situation:
the main thread has to perform a complex action
the main thread takes some parts of the action and let's separate threads execute this
while the seperate threads are executing their logic, the main thread processes the results from the other threads and continues with its work if the other threads are finished
Problem is that in this situation, if the other threads perform logging, there is no UI-message loop, and so the communication buffer is filled, but not emptied.
I see two solutions in solving this problem:
require the main thread to do regular polling of the communication buffer
only performing user interface logic in the main thread (no other logic)
I think the second solution seems the best, but this may not that easy to introduce in a big application (in my case it performs mathematical simulations).
Are there any other solutions or tips?
Or is one of the two proposed the best, easiest, most-pragmatic solution?
Thanks,
Patrick

Let's make some order first.
you may not hold UI processing for any time U would have noticed, or he will be frustrated
you may still perform long operations in the UI thread. this is done by means of PeakMessage loop. If you design one or more proper peakmessage loops, you do not need multithreading, unless for performance optimization.
you may consider MsgWaitForSingleObject() loop instead of GetMessage if you want to communicate with threads efficiently (always better than polling)
Therefore, if you do not redesign your message loop
There's no way you can perform syncronous requests from other threads
You may design a separate thread for the logging
All non-UI logic will have to be elsewhere.
About the memory problem:
it is a bad design to have one thread able to allocate all memory if another thread is stuck. Such dependency is a clear recipe for a disaster.
If the buffer is limited, you need to decide what happens when it's overrun. You have two options - suspend the thread or discard the message.
UI code:
It is possible to design logger code that would display messages with incredible speed. Such designs are complicated, rely on sophisticated caching and arranging data for fast access, viewport management and rendering only the part that corresponds to actual pixels that the user is looking at.
For most applications it is just a gimmick, because users do not read very fast. Most of the time it is better to design a different approach to showing logs, perhaps a stateful UI to let user choose what is interesting to him at the moment. Spy++ for example, some sysinternals tools like regmon, filemon are incredibly fast in showing their own logs. You can have a look at their source code.

Are there any concurrent algorithms that in use that work correctly without any synchronization?

All of the concurrent programs I've seen or heard details of (admittedly a small set) at some point use hardware synchronization features, generally some form of compare-and-swap. The question is: are there any concurrent programs in the wild where the thread interact throughout there life and get away without any synchronization?
Example of what I'm thinking of include:
A program that amounts to a single thread running a yes/no test on a large set of cases and a big pile of threads tagging cases based on a maybe/no tests. This doesn't need synchronization because dirty data will only effect performance rather than correctness.
A program that has many threads updating a data structure where any state that is valid now, will always be valid, so dirty reads or writes don't invalidate anything. An example of this is (I think) path compression in the union-find algorithm.

If you can break work up into completely independent chunks, then yes there are concurrent algorithms whose only synchronisation point is the one at the end of the work where all threads join. Parallel speedup is then a factor of being able to break into tasks whose sizes are as similiar as possible.

Some indirect methods for solving systems of linear equations, like Successive over-relaxation ( http://en.wikipedia.org/wiki/Successive_over-relaxation ), don't really need the iterations to be synchronized.

I think it's a bit trick question because e.g. if you program in C, malloc() must be multi-thread safe and uses hardware synchronization, and in Java the garbage collector requires hardware synchronization anyway. All Java programs require the GC, and hardly any C program makes it without malloc() (or C++ program / new() operator).

There is a whole class of algorithms which are sometimes referred to as "embarallel" (contraction of "embarrassingly parallel"). Many image processing algorithms fall into this class, where each pixel may be processed independently (which makes implementation with e.g. SIMD or GPGPU very straightforward).

Well, without any synchronization at all (even at the end of the algorithm) you obviously can't do anything useful because you can't even transfer the results of concurrent computations to the main thread: suppose that they were on remote machines without any communication channels to the main machine.

The simplest example is inside java.lang.String which is immutable and lazily caches its hash code. This cache is written to without synchronization because (a) its cheaper, (b) the value is recomputable, and (c) JVM guarantees no tearing. The tolerance of data races in purely functional contexts allows tricks like this to be used safely without explicit synchronization.

I agree with Mitch's answer. I would like to add that the ray tracing algorithm can work without synchronization until the point where all threads join.

What to avoid for performance reasons in multithreaded code?

I'm currently reviewing/refactoring a multithreaded application which is supposed to be multithreaded in order to be able to use all the available cores and theoretically deliver a better / superior performance (superior is the commercial term for better :P)
What are the things I should be aware when programming multithreaded applications?
I mean things that will greatly impact performance, maybe even to the point where you don't gain anything with multithreading at all but lose a lot by design complexity. What are the big red flags for multithreading applications?
Should I start questioning the locks and looking to a lock-free strategy or are there other points more important that should light a warning light?
Edit: The kind of answers I'd like are similar to the answer by Janusz, I want red warnings to look up in code, I know the application doesn't perform as well as it should, I need to know where to start looking, what should worry me and where should I put my efforts. I know it's kind of a general question but I can't post the entire program and if I could choose one section of code then I wouldn't be needing to ask in the first place.
I'm using Delphi 7, although the application will be ported / remake in .NET (c#) for the next year so I'd rather hear comments that are applicable as a general practice, and if they must be specific to either one of those languages

One thing to definitely avoid is lots of write access to the same cache lines from threads.
For example: If you use a counter variable to count the number of items processed by all threads, this will really hurt performance because the CPU cache lines have to synchronize whenever the other CPU writes to the variable.

One thing that decreases performance is having two threads with much hard drive access. The hard drive would jump from providing data for one thread to the other and both threads would wait for the disk all the time.

Something to keep in mind when locking: lock for as short a time as possible. For example, instead of this:
lock(syncObject)
{
bool value = askSomeSharedResourceForSomeValue();
if (value)
DoSomethingIfTrue();
else
DoSomtehingIfFalse();
}
Do this (if possible):
bool value = false;
lock(syncObject)
{
value = askSomeSharedResourceForSomeValue();
}
if (value)
DoSomethingIfTrue();
else
DoSomtehingIfFalse();
Of course, this example only works if DoSomethingIfTrue() and DoSomethingIfFalse() don't require synchronization, but it illustrates this point: locking for as short a time as possible, while maybe not always improving your performance, will improve the safety of your code in that it reduces surface area for synchronization problems.
And in certain cases, it will improve performance. Staying locked for long lengths of time means that other threads waiting for access to some resource are going to be waiting longer.

More threads then there are cores, typically means that the program is not performing optimally.
So a program which spawns loads of threads usually is not designed in the best fashion. A good example of this practice are the classic Socket examples where every incoming connection got it's own thread to handle of the connection. It is a very non scalable way to do things. The more threads there are, the more time the OS will have to use for context switching between threads.

You should first be familiar with Amdahl's law.
If you are using Java, I recommend the book Java Concurrency in Practice; however, most of its help is specific to the Java language (Java 5 or later).
In general, reducing the amount of shared memory increases the amount of parallelism possible, and for performance that should be a major consideration.
Threading with GUI's is another thing to be aware of, but it looks like it is not relevant for this particular problem.

What kills performance is when two or more threads share the same resources. This could be an object that both use, or a file that both use, a network both use or a processor that both use. You cannot avoid these dependencies on shared resources but if possible, try to avoid sharing resources.

Run-time profilers may not work well with a multi-threaded application. Still, anything that makes a single-threaded application slow will also make a multi-threaded application slow. It may be an idea to run your application as a single-threaded application, and use a profiler, to find out where its performance hotspots (bottlenecks) are.
When it's running as a multi-threaded aplication, you can use the system's performance-monitoring tool to see whether locks are a problem. Assuming that your threads would lock instead of busy-wait, then having 100% CPU for several threads is a sign that locking isn't a problem. Conversely, something that looks like 50% total CPU utilitization on a dual-processor machine is a sign that only one thread is running, and so maybe your locking is a problem that's preventing more than one concurrent thread (when counting the number of CPUs in your machine, beware multi-core and hyperthreading).
Locks aren't only in your code but also in the APIs you use: e.g. the heap manager (whenever you allocate and delete memory), maybe in your logger implementation, maybe in some of the O/S APIs, etc.
Should I start questioning the locks and looking to a lock-free strategy
I always question the locks, but have never used a lock-free strategy; instead my ambition is to use locks where necessary, so that it's always threadsafe but will never deadlock, and to ensure that locks are acquired for a tiny amount of time (e.g. for no more than the amount of time it takes to push or pop a pointer on a thread-safe queue), so that the maximum amount of time that a thread may be blocked is insignificant compared to the time it spends doing useful work.

You don't mention the language you're using, so I'll make a general statement on locking. Locking is fairly expensive, especially the naive locking that is native to many languages. In many cases you are reading a shared variable (as opposed to writing). Reading is threadsafe as long as it is not taking place simultaneously with a write. However, you still have to lock it down. The most naive form of this locking is to treat the read and the write as the same type of operation, restricting access to the shared variable from other reads as well as writes. A read/writer lock can dramatically improve performance. One writer, infinite readers. On an app I've worked on, I saw a 35% performance improvement when switching to this construct. If you are working in .NET, the correct lock is the ReaderWriterLockSlim.

I recommend looking into running multiple processes rather than multiple threads within the same process, if it is a server application.
The benefit of dividing the work between several processes on one machine is that it is easy to increase the number of servers when more performance is needed than a single server can deliver.
You also reduce the risks involved with complex multithreaded applications where deadlocks, bottlenecks etc reduce the total performance.
There are commercial frameworks that simplifies server software development when it comes to load balancing and distributed queue processing, but developing your own load sharing infrastructure is not that complicated compared with what you will encounter in general in a multi-threaded application.

I'm using Delphi 7
You might be using COM objects, then, explicitly or implicitly; if you are, COM objects have their own complications and restrictions on threading: Processes, Threads, and Apartments.

You should first get a tool to monitor threads specific to your language, framework and IDE. Your own logger might do fine too (Resume Time, Sleep Time + Duration). From there you can check for bad performing threads that don't execute much or are waiting too long for something to happen, you might want to make the event they are waiting for to occur as early as possible.
As you want to use both cores you should check the usage of the cores with a tool that can graph the processor usage on both cores for your application only, or just make sure your computer is as idle as possible.
Besides that you should profile your application just to make sure that the things performed within the threads are efficient, but watch out for premature optimization. No sense to optimize your multiprocessing if the threads themselves are performing bad.
Looking for a lock-free strategy can help a lot, but it is not always possible to get your application to perform in a lock-free way.

Threads don't equal performance, always.
Things are a lot better in certain operating systems as opposed to others, but if you can have something sleep or relinquish its time until it's signaled...or not start a new process for virtually everything, you're saving yourself from bogging the application down in context switching.

Best practice regarding number of threads in GUI applications

In the past I've worked with a number of programmers who have worked exclusively writing GUI applications.
And I've been given the impression that they have almost universally minimised the use of multiple threads in their applications. In some cases they seem to have gone to extreme lengths to ensure that they use a single thread.
Is this common? Is this the generally accepted philosophy for gui application design?
And if so, why?
[edit]
There are a number of answers saying that thread usage should be minimised to reduce complexity. Reducing complexity in general is a good thing.
But if you look at any number of applications where response to external events is of paramount importance (eg. web servers, any number of embedded applications) there seems to be a world of difference in the attitude toward thread usage.

Generally speaking, GUI frameworks aren't thread safe. For things like Swing(Java's GUI API), only one thread can be updating the UI (or bad things can happen). Only one thread handles dispatching events. If you have multiple threads updating the screen, you can get some ugly flicker and incorrect drawing.
That doesn't mean the application needs to be single threaded, however. There are certainly circumstances when you don't want this to be the case. If you click on a button that calculates pi to 1000 digits, you don't want the UI to be locked up and the button to be depressed for the next couple of days. This is when things like SwingWorker come in handy. It has two parts a doInBackground() which runs in a seperate thread and a done() that gets called by the thread that handles updating the UI sometime after the doInBackground thread has finished. This allows events to be handled quickly, or events that would take a long time to process in the background, while still having the single thread updating the screen.

I think in terms of windows you are limited to all GUI operations happening on a single thread - because of the way the windows message pump works, to increase responsivness most apps add at least one additional worker thread for longer running tasks that would otherwise block and make the ui unresponsive.
Threading is fundamentally hard and so thinking in terms or more than a couple threads can often result in a lot of debugging effort - there is a quote that escapes me right now that goes something like - "if you think you understand threading then you really dont"

I've seen the same thing. Ideally you should perform any operation that is going to take longer then a few hundred ms in a background thread. Anything sorter than 100ms and a human probably wont notice the difference.
A lot of GUI programmers I've worked with in the past are scared of threads because they are "hard". In some GUI frameworks such as the Delphi VCL there are warnings about using the VCL from multiple threads, and this tends to scare some people (others take it as a challenge ;) )
One interesting example of multi-threaded GUI coding is the BeOS API. Every window in an application gets its own thread. From my experience this made BeOS apps feel more responsive, but it did make programming things a little more tricky. Fortunately since BeOS was designed to be multi-threaded by default there was a lot of stuff in the API to make things easier than on some other OSs I've used.

Most GUI frameworks are not thread safe, meaning that all controls have to me accessed from the same thread that created them. Still, it's a good practice to create worker threads to have responsive applications, but you need to be careful to delegate GUI updates to the GUI thread.

Yes.
GUI applications should minimize the the number of threads that they use for the following reasons:
Thread programming is very hard and complicated
In general, GUI applications do at most 2 things at once : a) Respond to User Input, and b) Perform a background task (such as load in data) in response to a user action or an anticipated user action
In general therefore, the added complexity of using multiple threads is not justified by the needs of the application.
There are of course exceptions to the rule.

GUIs generally don't use a whole lot of threads, but they often do throw off another thread for interacting with certain sub-systems especially if those systems take awhile or are very shared resources.
For example, if you're going to print, you'll often want to throw off another thread to interact with the printer pool as it may be very busy for awhile and there's no reason not to keep working.
Another example would be database loads where you're interacting with SQL server or something like that and because of the latency involved you may want to create another thread so your main UI processing thread can continue to respond to commands.

The more threads you have in an application, (generally) the more complex the solution is. By attempting to minimise the number of threads being utilised within a GUI, there are less potential areas for problems.
The other issue is the biggest problem in GUI design: the human. Humans are notorious in their inability to want to do multiple things at the same time. Users have a habit of clicking multiple butons/controls in quick sucession in order to attempt to get something done quicker. Computers cannot generally keep up with this (this is only componded by the GUIs apparent ability to keep up by using multiple threads), so to minimise this effect GUIs will respond to input on a first come first serve basis on a single thread. By doing this, the GUI is forced to wait until system resorces are free untill it can move on. Therefore elimating all the nasty deadlock situations that can arise. Obviously if the program logic and the GUI are on different threads, then this goes out the window.
From a personal preference, I prefer to keep things simple on one thread but not to the detriment of the responsivness of the GUI. If a task is taking too long, then Ill use a different thread, otherwise Ill stick to just one.

As the prior comments said, GUI Frameworks (at least on Windows) are single threaded, thus the single thread. Another recommendation (that is difficult to code in practice) is to limit the number of the threads to the number of available cores on the machine. Your CPU can only do one operation at a time with one core. If there are two threads, a context switch has to happen at some point. If you've got too many threads, the computer can sometimes spend more time swapping between threads than letting threads work.
As Moore's Law changes with more cores, this will change and hopefully programming frameworks will evolve to help us use threads more effectively, depending on the number of cores available to the program, such as the TPL.

Generally all the windowing messages from the window manager / OS will go to a single queue so its natural to have all UI elements in a single thread. Some frameworks, such as .Net, actually throw exceptions if you attempt to directly access UI elements from a thread other than the thread that created it.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio