Is the following algorithm lock-free?

In my understanding, an algorithm is lock-free if some thread always makes progress.
Now, isn't this condition met by the following example (which uses locks)?
Thread 1: using lock1 for critical section
Thread 2: using lock1 for critical section
Thread 3: some call independent of lock1 and the other two threads in general

What you describe certainly isn't "lock-free" in any meaningful sense. Threads 1 and 2 depend on the same shared resource (a critical section, guarded by "lock1"), so if one of them acquires that lock and then gets suspended, the other thread is stuck waiting on it and makes no progress.
Sure, Thread 3 will continue on unabated, but so will Threads A, B, and C (i.e., threads that have nothing to do with your program). That seems irrelevant when looking at the algorithm as a whole, which presumably consists of all three threads.
Essentially, the problem with your definition of "lock-free" is that it isn't precise enough. You say that an algorithm is lock-free "if some thread always makes progress", but you fail to look at the algorithm as a whole.
Thread 3 is trivially lock-free (because it makes progress on its own, without the assistance of any other threads), but your overall algorithm is not.
A better way to assess whether an algorithm is lock-free is to ask yourself: if a single thread gets suspended, will that prevent the other threads from making progress as a group? In this case, yes. If Thread 1 or Thread 2 acquires the lock and then gets suspended, the other thread cannot execute, and presumably that thread is doing work that matters to your algorithm.
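To make the distinction concrete, here is a minimal sketch (my own illustration, not code from the question) contrasting a lock-based counter with a lock-free one in C++. If a thread is suspended inside increment_locked() while holding the lock, no other caller can proceed; if a thread is suspended anywhere in increment_lockfree(), the remaining threads still complete their increments.

    #include <atomic>
    #include <mutex>

    std::mutex lock1;
    long counter_locked = 0;

    // Lock-based: if a thread is suspended while holding lock1,
    // every other caller stops making progress until it resumes.
    void increment_locked() {
        std::lock_guard<std::mutex> guard(lock1);
        ++counter_locked;
    }

    std::atomic<long> counter_lockfree{0};

    // Lock-free: no matter which thread is suspended, some other
    // thread's compare_exchange_weak will succeed, so the system
    // as a whole always makes progress.
    void increment_lockfree() {
        long old = counter_lockfree.load();
        while (!counter_lockfree.compare_exchange_weak(old, old + 1)) {
            // on failure, 'old' has been reloaded; just retry
        }
    }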
I recommend Jeff Preshing's blog post, "An Introduction to Lock-Free Programming", which includes a handy flow chart for deciding whether a piece of code qualifies as lock-free.

Related

Go goroutine under the hood

I'm trying to understand Go's architecture and what "lightweight thread" means. I've already read some material, but want to ask a question to clarify it.
Am I right in saying that, under the hood, the "go" keyword just puts the following function into the queue of an internal thread pool, while to the user it looks like creating a thread?
This is copied from the Go FAQ:
Why goroutines instead of threads?
Goroutines are part of making concurrency easy to use. The idea, which has been around for a while, is to multiplex independently executing functions—coroutines—onto a set of threads. When a coroutine blocks, such as by calling a blocking system call, the run-time automatically moves other coroutines on the same operating system thread to a different, runnable thread so they won't be blocked. The programmer sees none of this, which is the point. The result, which we call goroutines, can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes.
What's lacking here is the definition of thread. If we resort to Wikipedia, we find:
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, ...
but that's just a description of, well, the same thing that a goroutine is. The problem here is that the word thread tends to refer to kernel thread and/or user thread (both defined on that same Wikipedia page) and these threads are heavier-weight than the goroutine threads. Which brings us right back to this:
I'm trying to understand Go's architecture and what "lightweight thread" means ...
To cut to the chase, this means "lighter than the OS-provided ones". That's really all it means. There are OS-provided threads (on the various OSes on which Go runs), but they generally do too much and cost too much to switch between, so Go provides its own language-level ones, called "goroutines", that are much lighter.
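The asker's mental model ("go just puts the function in a queue of an inner thread pool") is roughly right for the CPU-bound case. As a deliberately simplified illustration (not Go's actual implementation), here is what multiplexing many cheap tasks onto a few OS threads can look like, sketched in C++:

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // A toy M:N scheduler: many queued tasks ("goroutines") multiplexed
    // onto a fixed set of OS threads. Go's real scheduler is far more
    // sophisticated (work stealing, per-P run queues, syscall handoff).
    class ToyScheduler {
    public:
        explicit ToyScheduler(unsigned nthreads) {
            for (unsigned i = 0; i < nthreads; ++i)
                workers_.emplace_back([this] { run(); });
        }
        // The moral equivalent of the `go` keyword: enqueue and return.
        void spawn(std::function<void()> task) {
            { std::lock_guard<std::mutex> g(mu_); tasks_.push(std::move(task)); }
            cv_.notify_one();
        }
        ~ToyScheduler() {
            { std::lock_guard<std::mutex> g(mu_); done_ = true; }
            cv_.notify_all();
            for (auto& w : workers_) w.join();
        }
    private:
        void run() {
            for (;;) {
                std::unique_lock<std::mutex> lk(mu_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;   // done_ set and queue drained
                auto task = std::move(tasks_.front());
                tasks_.pop();
                lk.unlock();
                task();                       // run one task to completion
            }
        }
        std::mutex mu_;
        std::condition_variable cv_;
        std::queue<std::function<void()>> tasks_;
        std::vector<std::thread> workers_;
        bool done_ = false;
    };

The crucial difference from real goroutines: these toy tasks run to completion, while a goroutine can block mid-function (on a channel, or in a system call) and be resumed later, which is what the runtime's stack management and the syscall handoff described below make possible.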
From comments:
Why do tasks need to be moved from one thread to another by some scheduler ...
This is an implementation detail, which involves another aspect of the OS-provided kernel threads:
I can't understand how [a goroutine] can be preempted if single thread process [is] blocked by [a] system call to read [a] long file
The current Go runtime goroutine / thread / processor scheduler (see What is relationship between goroutine and thread in kernel and user state and note that there have been more than just the current implementation) predicts that some system call will block, and makes sure to assign that system call its own OS-level kernel thread (see also JimB's comment). These threads do not count against the GOMAXPROCS setting. This is in fact sometimes a problem, as it's possible for the Go runtime to try to spin off more threads than the OS allows: it might be nice if there were a system-call-thread-pool here (though there are also obvious problems with this).
So, the current runtime creates up to GOMAXPROCS kernel-style OS-level threads and uses those to multiplex up to that many goroutines onto the CPUs, but creates extra kernel-style OS-level threads whenever it wants to. As the blog post linked in the question above notes, the P entities act as queues to hold goroutines (Gs) on a per-processor basis for localized cache lookup (remember that on some systems, especially NUMA ones, it's expensive to reach out "across" CPUs: the scheduler is still willing to do this, but won't do it too often, for some definition of "too often").
Earlier versions of the current scheduler required explicit yields (runtime.Gosched() calls) or various other runtime operations to cause a switch from the current goroutine to some other goroutine. See What exactly does runtime.Gosched do? for example. As of Go 1.14, the runtime provides automatic goroutine preemption on some OSes; see Will Go's scheduler yield control from one goroutine to another for CPU-intensive work?

Why are goroutines much cheaper than threads in other languages?

In his talk "Concurrency is not parallelism" (https://blog.golang.org/concurrency-is-not-parallelism), Rob Pike says that goroutines are similar to threads but much, much cheaper. Can someone explain why?
See "How goroutines work".
They are cheaper in:
memory consumption:
An OS thread starts with a large stack (often a megabyte or more), whereas a goroutine starts with just a few KB.
setup and teardown costs:
(That is why you have to maintain a pool of threads.)
switching costs:
OS threads are scheduled preemptively, and during a thread switch the scheduler needs to save/restore ALL registers.
In Go, by contrast, the runtime manages goroutines throughout their lifetime, from creation through scheduling to teardown, and the number of registers to save is smaller.
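To put a rough number on the memory point, you can query the default stack reservation for an OS thread; this POSIX sketch (my own illustration) typically prints a value in the megabytes, versus the few kilobytes a goroutine starts with:

    #include <cstdio>
    #include <pthread.h>

    int main() {
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        size_t stack_size = 0;
        pthread_attr_getstacksize(&attr, &stack_size);
        // Commonly 8 MB on Linux (see ulimit -s), versus a goroutine's
        // initial stack of a few KB that grows and shrinks on demand.
        std::printf("default thread stack: %zu bytes\n", stack_size);
        pthread_attr_destroy(&attr);
        return 0;
    }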
Plus, as mentioned in "Go’s march to low-latency GC", a GC is easier to implement when the runtime is in charge of managing goroutines:
Since the introduction of its concurrent GC in Go 1.5, the runtime has kept track of whether a goroutine has executed since its stack was last scanned. The mark termination phase would check each goroutine to see whether it had recently run, and would rescan the few that had.
In Go 1.7, the runtime maintains a separate short list of such goroutines. This removes the need to look through the entire list of goroutines while user code is paused, and greatly reduces the number of memory accesses that can trigger the kernel’s NUMA-related memory migration code.

Could process running only on one processor have threads running on other processors?

Is it possible, in a multiprocessor environment (PC), that one Windows process is configured to run only on one processor (affinity mask = 1, i.e. SetProcessAffinityMask(GetCurrentProcess(), 1)), but its threads are spawned on other processors?
(The question came from a discussion started in one company regarding the use of synchronization objects (Events, Mutexes, Semaphores) and WinAPIs like WaitForSingleObject, etc., and especially SignalObjectAndWait, for which MSDN states:
"Note that the "signal" and "wait" are not guaranteed to be performed
as an atomic operation. Threads executing on other processors can
observe the signaled state of the first object before the thread
calling SignalObjectAndWait begins its wait on the second object"
Does this mean that for a single processor it's guaranteed to be atomic?
P.S. Are there any differences in Windows context switching between multiple processors and a single processor with multiple real cores?
P.P.S. Please be patient with this question if I didn't use exact and concrete terms; this area is still not well known to me.
No.
The set of processor cores a thread can run on is the intersection of the process affinity mask and the thread affinity mask.
To get the behavior you describe, one would set the thread affinity mask for the main thread, and not mess with the process mask.
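As a sketch (error handling omitted), that looks like this in Win32 terms:

    #include <windows.h>

    int main() {
        // Pin only the calling (main) thread to CPU 0. The process
        // affinity mask is left untouched, so threads created later
        // can run on any CPU: each thread's eligible set is the
        // intersection of the process mask and its own thread mask.
        SetThreadAffinityMask(GetCurrentThread(), 1);

        // By contrast, this would confine EVERY thread of the process
        // to CPU 0, which is the situation the question describes:
        // SetProcessAffinityMask(GetCurrentProcess(), 1);

        // ... create worker threads here; they carry no thread-level
        // restriction and may be scheduled on other processors.
        return 0;
    }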
For your follow-up questions: if it isn't atomic, it isn't atomic. There are additional guarantees for threads sharing a core, because preemption follows certain rules, but those rules are very complex, since relative priority and dynamic priority are important factors in thread scheduling. Because of that complexity, it is best to use proper synchronization.
Notably, race conditions between threads of equal priority certainly still exist on a single-core (or single-core-restricted) system, but they are far less frequent and therefore far more difficult to find and debug.
Is it possible, in a multiprocessor environment (PC), that one Windows process is configured to run only on one processor, but its threads are spawned on other processors?
If you do not set the CPU affinity to a single core, a process can run on multiple cores.
What's the difference between processes and threads? Can a thread contain processes, or can a process contain threads? Can a process be seen from a thread's point of view, or vice versa?
What is the notion of "atomic"? When can the number 1 be seen as a multidimensional unit? Can we divide 1/0 (by zero)? When can we, and when can't we?
Does it mean that for a single processor it's guaranteed to be atomic?
One CPU: do you remember "terminate and stay resident"? The good old days! Then Unix: multiprocessing, multithreading, and so on. :)
Note:
You can't really ask a question without already knowing part of the answer. Look inside your question; the answer is evident. :)

Handling user interface in a multi-threaded application (or being forced to have a UI-only main thread)

In my application, I have a 'logging' window, which shows all the logging, warnings, errors of the application.
Last year my application was still single-threaded, so this worked quite well.
Now I am introducing multithreading. I quickly noticed that it's not a good idea to update the logging window from different threads. After reading some articles on keeping the UI in the main thread, I made a communication buffer into which the other threads add their logging messages, and from which the main thread takes the messages and shows them in the logging window (this is done in the message loop).
Now, in one part of my application, the memory usage increases dramatically, because the separate threads generate lots of logging messages and the main thread cannot empty the communication buffer quickly enough. After a while the memory usage decreases again (once the other threads have finished their work and the main thread gradually empties the communication buffer).
I solved this problem by having a maximum size on the communication buffer, but then I run into a problem in the following situation:
the main thread has to perform a complex action
the main thread splits off parts of the action and lets separate threads execute them
while the separate threads are executing their logic, the main thread processes their results and continues with its own work once the other threads are finished
Problem is that in this situation, if the other threads perform logging, there is no UI-message loop, and so the communication buffer is filled, but not emptied.
I see two solutions in solving this problem:
require the main thread to do regular polling of the communication buffer
only performing user interface logic in the main thread (no other logic)
I think the second solution seems best, but it may not be that easy to introduce in a big application (in my case, one that performs mathematical simulations).
Are there any other solutions or tips?
Or is one of the two proposed the best, easiest, most-pragmatic solution?
Thanks,
Patrick
Let's put things in order first.
you must not hold up UI processing for any length of time the user would notice, or they will be frustrated
you may still perform long operations in the UI thread. This is done by means of a PeekMessage loop. If you design one or more proper PeekMessage loops, you do not need multithreading, except for performance optimization.
you may consider a MsgWaitForMultipleObjects() loop instead of GetMessage if you want to communicate with threads efficiently (always better than polling); see the sketch after this list
Therefore, if you do not redesign your message loop:
there is no way you can serve synchronous requests from other threads
you may design a separate thread for the logging
all non-UI logic will have to be elsewhere
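Here is a sketch of such a loop (DrainLogQueue is a hypothetical helper standing in for "empty the communication buffer"; workers signal g_logEvent after pushing a message). The loop wakes either for window messages or for the event, so the buffer is drained promptly without polling:

    #include <windows.h>

    // Auto-reset event: CreateEvent(nullptr, FALSE, FALSE, nullptr).
    // Workers call SetEvent(g_logEvent) after adding to the buffer.
    HANDLE g_logEvent;

    void DrainLogQueue(); // hypothetical: move buffered messages to the log window

    void MessageLoop() {
        for (;;) {
            DWORD r = MsgWaitForMultipleObjects(1, &g_logEvent, FALSE,
                                                INFINITE, QS_ALLINPUT);
            if (r == WAIT_OBJECT_0) {
                DrainLogQueue();                 // a worker signaled: empty the buffer
            } else if (r == WAIT_OBJECT_0 + 1) { // one or more window messages arrived
                MSG msg;
                while (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE)) {
                    if (msg.message == WM_QUIT) return;
                    TranslateMessage(&msg);
                    DispatchMessage(&msg);
                }
            }
        }
    }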
About the memory problem:
it is bad design to let one thread allocate unbounded memory while another thread is stuck; such a dependency is a clear recipe for disaster.
If the buffer is limited, you need to decide what happens when it is overrun. You have two options: suspend the producing thread, or discard the message.
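Here is a minimal sketch of the discard option (names are illustrative): a bounded buffer whose producers never block, so memory stays capped even while the UI thread is busy:

    #include <deque>
    #include <mutex>
    #include <string>

    // Bounded log buffer: producers never block and memory is capped.
    // When full, the oldest message is discarded (and counted), trading
    // completeness of the log for a hard memory bound.
    class BoundedLogBuffer {
    public:
        explicit BoundedLogBuffer(size_t capacity) : capacity_(capacity) {}

        void push(std::string msg) {
            std::lock_guard<std::mutex> g(mu_);
            if (queue_.size() == capacity_) {
                queue_.pop_front();   // discard oldest
                ++dropped_;
            }
            queue_.push_back(std::move(msg));
        }

        // Called from the UI thread (e.g., from DrainLogQueue above).
        bool pop(std::string& out) {
            std::lock_guard<std::mutex> g(mu_);
            if (queue_.empty()) return false;
            out = std::move(queue_.front());
            queue_.pop_front();
            return true;
        }

    private:
        std::mutex mu_;
        std::deque<std::string> queue_;
        size_t capacity_;
        size_t dropped_ = 0; // could be surfaced in the UI as "N messages dropped"
    };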
UI code:
It is possible to design logger code that displays messages with incredible speed. Such designs are complicated, relying on sophisticated caching, data arranged for fast access, viewport management, and rendering only the part that corresponds to the actual pixels the user is looking at.
For most applications this is just a gimmick, because users do not read very fast. Most of the time it is better to design a different approach to showing logs, perhaps a stateful UI that lets the user choose what is interesting at the moment. Spy++, for example, and some Sysinternals tools such as Regmon and Filemon, are incredibly fast at showing their own logs. You can have a look at their source code.

Win32 Thread scheduling

As I understand it, the Windows thread scheduler does not discriminate between threads belonging to two different processes, provided all of them have the same base priority. My question is: if I have two applications, one with only one thread and the other with, say, 50 threads, all with the same base priority, does that mean the second process enjoys more CPU time than the first one?
Scheduling in Windows is at the thread granularity. The basic idea behind this approach is that processes don't run but only provide resources and a context in which their threads run. Coming back to your question, because scheduling decisions are made strictly on a thread basis, no consideration is given to what process the thread belongs to. In your example, if process A has 1 runnable thread and process B has 50 runnable threads, and all 51 threads are at the same priority, each thread would receive 1/51 of the CPU time—Windows wouldn't give 50 percent of the CPU to process A and 50 percent to process B.
To understand the thread-scheduling algorithms, you must first understand the priority levels that Windows uses.
Try reading Windows Internals for an in-depth understanding.
All of the above is accurate, but if you're worried about the 50-thread process hogging the CPU, there ARE techniques you can use to ensure that no single process overwhelms it.
IMHO the best way to do this is to use job objects to manage the usage of a process. First call CreateJobObject, then SetInformationJobObject to limit the max CPU usage of the processes in the job object, and finally AssignProcessToJobObject to assign the 50-thread process to the job object. You can then let the OS ensure that it doesn't consume too much CPU time.
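As a sketch of that sequence (error handling omitted; CPU rate control requires Windows 8 or later), capping a process at 25% of total CPU time:

    #include <windows.h>

    // Cap a process (and any children assigned to the job) at 25% of
    // total CPU time across all cores.
    void CapProcessCpu(HANDLE process) {
        HANDLE job = CreateJobObjectW(nullptr, nullptr);

        JOBOBJECT_CPU_RATE_CONTROL_INFORMATION rate = {};
        rate.ControlFlags = JOB_OBJECT_CPU_RATE_CONTROL_ENABLE |
                            JOB_OBJECT_CPU_RATE_CONTROL_HARD_CAP;
        rate.CpuRate = 2500;  // in 1/100ths of a percent: 2500 == 25%

        SetInformationJobObject(job, JobObjectCpuRateControlInformation,
                                &rate, sizeof(rate));
        AssignProcessToJobObject(job, process);
        // Keep 'job' open for as long as the limit should apply.
    }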
The unit of scheduling is a thread, not a process, so a process with 50 threads, all in a tight loop, will get much more of the CPU than a process with only a single thread, provided all are running at the same priority. This is normally not a concern, since most threads in the system are not in a runnable state and are not candidates for scheduling; they are waiting on I/O, waiting for input from the user, and so on.
Windows Internals is a great book for learning more about the Windows thread scheduler.
That depends on the behavior of the threads. In general, with a 50:1 difference in thread count, yes, the application with more threads is going to get a lot more CPU time. However, Windows also uses dynamic thread prioritization, which can change this somewhat. Dynamic thread prioritization is described here:
https://web.archive.org/web/20130312225716/http://support.microsoft.com/kb/109228
Relevant excerpt:
The base priority of a thread is the base level from which these upward adjustments are made. The current priority of a thread is called its dynamic priority. Interactive threads that yield before their time slice is up will tend to be adjusted upward in priority from their base priority. Compute-bound threads that do not yield, consuming their entire time slice, will tend to have their priority decreased, but not below the base level. This arrangement is often called heuristic scheduling. It provides better interactive performance and tends to lessen the system impact of "CPU hog" threads.
There is a local 'advanced' setting that purportedly can be used to shade scheduling slightly in favor of the app with focus. With the 'services' setting, there is no preference. In previous versions of Windows, this setting used to be somewhat more granular than just 'applications with focus' (slight preference to the app with focus) and 'services' (all equal weighting).
Since this can be set by the user on the target machine, it seems like asking for grief to depend on this setting...
