I have several processes that need to communicate, and I want only one of them to be talking to another at any given time. For example, with processes A, B, C and D: if B is sending a message to A, then C cannot send a message, not even to D.
I already have the necessary pipes for the communication, and I was looking for an object in the WinAPI to implement the exclusivity behaviour.
I'd also need to set a priority: say C is sending a message to D while both A and B want to transmit; when C is done, I need to be able to ensure that A always goes first.
I know this sounds like a situation that will lead to starvation, and in fact it will, but I am trying to simulate a real setup that suffers from starvation (CAN bus).
Any idea what kind of object I could use?
Windows, like all other modern OSes, will schedule threads according to priority. If you have threads of different priorities, and they're all waiting on the same mutex, the highest-priority thread will run when that mutex next becomes available.
So you should be able to set your thread priorities such that A is higher than B, and have a single mutex that they all acquire before communicating and release once they've finished. When C has finished communicating with D it will release the mutex, and of the two ready threads (A and B), Windows will schedule A ahead of B. Since A will then be holding the mutex, B will not become ready until A has finished and released it.
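If it helps to see the shape of that pattern, here is a minimal sketch in Go. It is a sketch only: it assumes the CreateMutex / WaitForSingleObject / ReleaseMutex wrappers in golang.org/x/sys/windows, the mutex name and the sendMessage helper are made up for illustration, and setting the thread priorities themselves is not shown.

package bus

import (
	"golang.org/x/sys/windows"
)

// busMutexName is a hypothetical name; every process that wants to "talk"
// opens the same named mutex, so ownership is exclusive machine-wide.
var busMutexName = windows.StringToUTF16Ptr("Local\\SimulatedCANBus")

// sendMessage serialises access to the bus: acquire the named mutex, write
// to the pipe while holding it, then release it so the next waiter can go.
func sendMessage(writeToPipe func() error) error {
	mu, err := windows.CreateMutex(nil, false, busMutexName) // creates or opens the shared mutex
	if err != nil {
		return err
	}
	defer windows.CloseHandle(mu)

	// Block until this process owns the bus.
	if _, err := windows.WaitForSingleObject(mu, windows.INFINITE); err != nil {
		return err
	}
	defer windows.ReleaseMutex(mu)

	return writeToPipe() // exclusive: no other process can be sending right now
}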
I'm trying to understand golang architecture and what "lightweight thread" means. I've already read some material, but I want to ask a question to clarify it.
Am I right in saying that the "go" keyword, under the hood, just puts the following function into the queue of an internal thread pool, but to the user it looks like a thread was created?
This is copied from the Go FAQ:
Why goroutines instead of threads?
Goroutines are part of making concurrency easy to use. The idea, which has been around for a while, is to multiplex independently executing functions—coroutines—onto a set of threads. When a coroutine blocks, such as by calling a blocking system call, the run-time automatically moves other coroutines on the same operating system thread to a different, runnable thread so they won't be blocked. The programmer sees none of this, which is the point. The result, which we call goroutines, can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes.
What's lacking here is the definition of thread. If we resort to Wikipedia, we find:
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, ...
but that's just a description of, well, the same thing that a goroutine is. The problem here is that the word thread tends to refer to kernel thread and/or user thread (both defined on that same Wikipedia page) and these threads are heavier-weight than the goroutine threads. Which brings us right back to this:
I'm trying to understand golang architecture and what "lightweight thread" means ...
To cut to the chase, this means "lighter than the OS-provided ones". That's really all it means. There are OS-provided threads (on multiple OSes on which Go runs), but they generally do too much and cost too much to switch between so Go provides its own language-level ones that it calls "goroutines" that are much lighter.
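As a rough illustration of "cheap" (my own example, not from the FAQ), the following program starts 100,000 goroutines; trying the same with OS threads would exhaust most systems:

package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 100000 // far more goroutines than you would ever want OS threads
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// each goroutine needs only a few kilobytes of stack to start
		}()
	}
	wg.Wait()
	fmt.Println(n, "goroutines started and finished")
}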
From comments:
Why need to move tasks from one thread to another by some planner ...
This is an implementation detail, which involves another aspect of the OS-provided kernel threads:
I can't understand how [a goroutine] can be preempted if single thread process [is] blocked by [a] system call to read [a] long file
The current Go runtime goroutine / thread / processor scheduler (see What is relationship between goroutine and thread in kernel and user state and note that there have been more than just the current implementation) predicts that some system call will block, and makes sure to assign that system call its own OS-level kernel thread (see also JimB's comment). These threads do not count against the GOMAXPROCS setting. This is in fact sometimes a problem, as it's possible for the Go runtime to try to spin off more threads than the OS allows: it might be nice if there were a system-call-thread-pool here (though there are also obvious problems with this).
So, the current runtime creates up to GOMAXPROCS kernel-style OS-level threads and uses those to multiplex up to that many goroutines onto the CPUs, but creates extra kernel-style OS-level threads whenever it wants to. As the blog post linked in the question above notes, the P entities act as queues to hold goroutines (Gs) on a per-processor basis for localized cache lookup (remember that on some systems, especially NUMA ones, it's expensive to reach out "across" CPUs: the scheduler is still willing to do this, but won't do it too often, for some definition of "too often").
Earlier versions of the current scheduler required explicit yield (runtime.Gosched()) calls or various other runtime operations to cause a switch from the current goroutine to some other goroutine. See What exactly does runtime.Gosched do? for example. As of Go 1.14, the runtime provides automatic goroutine preemption on some OSes; see Will Go's scheduler yield control from one goroutine to another for CPU-intensive work?
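For a concrete feel of the explicit-yield style described above, here is a small example of my own (not from the linked posts): with the runtime pinned to one OS thread, each goroutine hands control over with runtime.Gosched():

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	runtime.GOMAXPROCS(1) // one OS thread executing all goroutines

	var wg sync.WaitGroup
	for _, name := range []string{"A", "B"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for i := 0; i < 3; i++ {
				fmt.Println(name, i)
				runtime.Gosched() // explicitly yield so the other goroutine can run
			}
		}(name)
	}
	wg.Wait()
}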
Is MPI_Bcast() blocking or non-blocking? In other words, when the root sends the data, do all processes block until every process has received it? If not, how can I synchronize (block) all of them so that none proceeds until all have received the same data?
You need to be a bit careful about terminology here as what MPI means by "blocking" may not be how you have seen it used in other contexts.
In MPI terms, Bcast is blocking. Blocking means that, when the function returns, it has completed the operation it was meant to do. In this case, it means that on return from Bcast it is guaranteed that the receive buffer in every process contains the data you want to broadcast. The non-blocking version is Ibcast.
In MPI terms, what you are asking is whether the operation is synchronous, i.e. implies synchronisation amongst processes. For a point-to-point operation such as Send, this refers to whether or not the sender waits for the receive to be posted before returning from the send call. For collective operations, the question is whether there is a barrier (as pointed out by @Vladimir). Bcast does not necessarily imply a barrier.
However, the reason I am posting is that, in almost all MPI programs written using the standard Send/Recv calls (as opposed to single-sided Put/Get), you do not care whether or not there is synchronisation. All each process cares about is that it has received the data it needs - why would it matter what the other processes are doing? If you subsequently want to communicate with any other process then the MPI routines are designed so that the required synchronisation happens automatically. If you issue a receive and another process is slow, you wait; if you issue a send and the other process has not issued a receive, everything will still work correctly (this is assuming you don't call Rsend - you should never call Rsend!). Whether or not there is synchronisation affects performance, but it rarely affects whether a program is correct.
Unless processes are interacting via some other mechanism (e.g. all accessing the same file) then it is hard to come up with a real example where you care whether or not the Bcast synchronises. Of course you can always construct some edge case, but in real practical applications of MPI it almost never matters.
Many MPI programs are littered with barriers and in my experience they are almost never required for correctness; the only common use case is to ensure meaningful timings for performance measurements.
No, this kind of blocking (waiting for the other processes to finish their part of the job) would be very bad for performance. Every process continues as soon as it has all it needs -- that means the data it was to receive are there, or the data to be sent are at least copied to some buffer.
You can use an MPI_Barrier to synchronize processes if you need to be sure all processes have finished. As already said, it can slow down the program significantly; I use it only for certain diagnostic logging when initializing my code, not during the actual integration.
When implementing a mutex there are several architectural choices, such as:
Spinning mutex (spinlock)
Sleeping mutex (a FIFO sleep queue is maintained while WAITING)
Yielding mutex (call the scheduler to run another process when WAITING)
Why is the yielding mutex the least preferred? And how severe would the consequences of using it be?
The sleeping mutex has more fairness; a yielding mutex can cause starvation.
The problem with the yielding model is that a process may be asked to yield over and over, while other processes scoop the mutex (see also barging), or only have to wait a much shorter time.
Depending on how new processes are added to the queue, it could even happen that every time a certain process reaches its turn, it is forced to yield because the mutex is already taken, starving the process.
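As a minimal sketch of that yielding model (my own, written in Go for illustration): the waiter simply gives up its time slice and retries, so nothing stops a later arrival from taking the lock first.

package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// YieldMutex: when the lock is taken, the caller yields its time slice and
// retries. There is no wait queue, so a waiter can lose the race every time
// it is rescheduled, which is the starvation problem described above.
type YieldMutex struct{ state int32 }

func (m *YieldMutex) Lock() {
	for !atomic.CompareAndSwapInt32(&m.state, 0, 1) {
		runtime.Gosched() // give the scheduler a chance to run someone else
	}
}

func (m *YieldMutex) Unlock() { atomic.StoreInt32(&m.state, 0) }

func main() {
	var m YieldMutex
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				m.Lock()
				counter++
				m.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("counter:", counter) // 4000, but with no fairness between the workers
}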
The FIFO model ensures that waiting processes are served on a first-come/first-served basis. It is harder to implement in the OS, but more fair.
The models can get more complicated. For instance, the OS may also have priorities for processes, and the priorities can change over time. Then the queue system can get even more tricky to implement.
What's the difference between using I/O completion ports, versus just using RegisterWaitForSingleObject to have a thread pool thread wait for I/O to complete?
Is one of them faster, and if so, why?
IOCPs are generally the fastest-performing IO turnaround mechanism you will find, for one reason above all else: blocking detection.
The simple example of this is a server that is responsible for serving up files from a disk. An IOCP is generally made up of three primary things:
The pool of N threads for servicing the IOCP requests.
A limit of M threads (M is always < N) that tells the IOCP how many concurrent, non-blocked threads to allow.
A completion-status loop that all threads run on.
The difference between N and M in this is very important. The general philosophy is to configure M to be the number of cores on the machine, and N to be larger. How much larger depends on the amount of time your worker threads spend in a blocked-state. If you're reading disk files, your threads will be bound to the speed of the disk IO channel. When you make that call to ReadFile() you've just introduced a blocking call. If M == N, then as soon as you hit all threads reading disk files, you're utterly stalled, with all threads on the disk IO channel.
But what if there was a way for some fancy scheduler to "know" that this thread is (a) participating in an IOCP thread pool, and (b) just stalled because it issued an API call that will be time consuming? What if, when that happens, that fancy scheduler could temporarily "move" that thread into a special "running-but-stalled" group, and then "release" an extra thread that has volunteered to work while there are threads stalled?
That is exactly what IOCP brings. When N is greater than M, the IOCP will put the thread that just issued the stall into a special running-but-stalled state, and then temporarily "borrow" an additional thread from your pool of N. It will continue to do this until the N pool is exhausted, or until threads that were stalled begin returning from their blocking requests.
So in that light, an IOCP configured to have, say, 8 threads concurrently running on an 8-core machine could actually have a few hundred threads in the real pool. Only 8 will ever be "allowed" to run concurrently in a non-blocked state, though you may pop over that number temporarily when blocked threads return from their blocks while you still have borrowed threads servicing additional requests.
Finally, though not as important for your case, it is still important: an IOCP thread will NOT block, nor context switch, if there is pending work on the queue when it finishes its current work and issues its next GetQueuedCompletionStatus() call. If there is work waiting, it will pick it up and continue executing with no mandated preemption. Of course the OS scheduler may preempt anyway, but only as part of general scheduling; not because of the specific call to GetQueuedCompletionStatus(). The lone exception is if there are already more than M threads running and non-blocked. In that case, GetQueuedCompletionStatus() will block the calling thread until it is needed again for slack work when enough threads once again become blocked.
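As a rough, pure-Go analogy of the N-versus-M idea (my own sketch, not the Windows API): N workers exist, but only M "run permits" are available, and a worker hands its permit back around a blocking call so another worker can make progress in the meantime. An IOCP does the equivalent swap for you automatically when a thread actually blocks in the kernel.

package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		N = 16 // total workers in the pool
		M = 4  // how many are allowed to be actively running at once
	)
	permits := make(chan struct{}, M) // M run permits
	done := make(chan int)

	for w := 0; w < N; w++ {
		go func(id int) {
			permits <- struct{}{} // take a permit: now one of the M "runnable" workers
			// ... CPU-bound work would happen here while holding the permit ...

			<-permits                         // hand the permit back before blocking
			time.Sleep(10 * time.Millisecond) // stand-in for a blocking disk read
			permits <- struct{}{}             // reacquire a permit to finish the CPU work

			<-permits // release the permit
			done <- id
		}(w)
	}

	for i := 0; i < N; i++ {
		fmt.Println("worker done:", <-done)
	}
}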
The description you gave indicates you will be heavily disk-IO-bound. For absolutely performance-critical IO-server architectures, it is near impossible to beat the benefits of IOCP, especially the OS-level blocking detection that allows the scheduler to know it can temporarily release extra threads from your master pool to keep things pumping while other threads are stalled.
You simply cannot replicate that specific feature of IOCPs using Windows thread pools. If all of your threads were number crunchers with little or no IO, I would say thread-pools would be a better fit, but your specificity of disk-IO tells me you should be using an IOCP instead.
In order to assess whether go is a possible option for an audio/video application, I would like to know whether message passing in go satisfies any non-blocking progress guarantees (being obstruction-free, lock-free or wait-free). In particular, the following scenarios are relevant:
Single producer single consumer:
Two threads communicate using a shared channel. Thread A only does asynchronous sends, thread B only does asynchronous receives. Suppose the OS scheduler decides to interrupt thread A at the "worst possible moment" for an indefinite amount of time. Is thread B guaranteed to finish a receive operation in a bounded number of cpu cycles or is there a (theoretical) possibility that thread A can put the channel into a state where thread B needs to wait for the OS to resume thread A?
Multiple producers:
Several threads A1, A2, A3, ... communicate with one or more other threads using a shared channel. The threads Ai only do asynchronous sends. Suppose A2, A3, ... are suspended by the OS scheduler at the "worst possible moment" for an indefinite amount of time. Is thread A1 still guaranteed to finish a send operation in a bounded number of cpu cycles? Suppose further that each thread only wants to do one send. If the program is run sufficiently long (with a "malicious" scheduler which potentially starves some threads or interrupts and resumes threads at the "worst possible moment"), is at least one send guaranteed to succeed?
I am not so much interested in typical scenarios here, but rather worst-case guarantees.
See Non-blocking algorithm (Wikipedia) for more details on obstruction-, lock- and wait-free algorithms.
Normal sends and receives are blocking operations by definition. You can do a non-blocking send or receive by using a select statement:
select {
case ch <- msg:
	// the send succeeded without blocking
default:
	// the channel was not ready; fall through instead of blocking
}
(Receiving is very similar; just replace the case statement.)
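For reference, the non-blocking receive looks like this:

select {
case msg := <-ch:
	// a value was ready without blocking; use msg here
	_ = msg
default:
	// the channel had nothing ready; carry on
}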
The send only takes place when there is room in the channel's buffer. Otherwise the default case runs. Note that internally a mutex is still used (if I'm reading the code correctly).
The Go Memory Model doesn't require sends and receives to be non-blocking, and the current runtime implementation locks the channel for send and recv. This means, for instance, that it is possible to starve a sending or receiving goroutine if the OS scheduler interrupts another thread running another goroutine that is sending or receiving on the same channel while that goroutine has already acquired the channel's lock.
So the answer is unfortunately no :(
(unless someone reimplements parts of the runtime using non-blocking algorithms).
You're asking whether an operation is guaranteed to complete within a bounded number of cycles, which of course is not a design consideration for this language (or for most underlying OSes).
If run in a single thread, then Go uses cooperative multitasking between goroutines. So if one routine never yields, then the other will never run. If your program runs on multiple threads (as set by GOMAXPROCS), then you can run several goroutines simultaneously, in which case the OS controls scheduling between the threads. However, in neither case is there a guaranteed upper bound on the time to completion for a function call.
Note that the cooperative nature of goroutines gives you some amount of control over scheduling execution -- that is, routines are never preempted. Until you yield, you retain control of the thread.
As for blocking behavior, see The language specification:
The capacity, in number of elements, sets the size of the buffer in the channel. If the capacity is greater than zero, the channel is asynchronous: communication operations succeed without blocking if the buffer is not full (sends) or not empty (receives), and elements are received in the order they are sent. If the capacity is zero or absent, the communication succeeds only when both a sender and receiver are ready.
Note that non-blocking sends and receives on channels can be accomplished using the select syntax already mentioned.
Goroutines do not own channels or the values sent on them. So, the execution status of a goroutine that has sent / is sending values on a channel has no impact on the ability of other goroutines to send or receive values on that channel, unless the channel's buffer is full, in which case all sends will block until a corresponding receive occurs, or the buffer is empty, in which case all receives will block until there is a corresponding send.
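A small, self-contained illustration of those buffer rules:

package main

import "fmt"

func main() {
	ch := make(chan int, 2) // buffer capacity 2

	ch <- 1 // does not block: buffer has room
	ch <- 2 // does not block: buffer is now full
	// ch <- 3 would block here until another goroutine received from ch

	fmt.Println(<-ch, <-ch) // receives do not block while the buffer is non-empty
}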
Because goroutines use cooperative scheduling (they have to yield to the scheduler, either through a channel operation, a syscall, or an explicit call to runtime.Gosched()), it is impossible for a goroutine to be interrupted at the "worst possible time". It is possible for a goroutine to never yield, in which case, it could tie up a thread indefinitely. If you have only one thread of execution, then your other goroutines will never be scheduled. It is possible, but statistically improbable, for a goroutine to just never be scheduled. However, if all goroutines but one are blocked on sends or receives, then the remaining goroutine must be scheduled.
It is possible to get a deadlock. If I have two goroutines executing:
func Goroutine(ch1, ch2 chan int) {
	i := <-ch1 // blocks until a value arrives on ch1
	ch2 <- i   // then tries to forward it on ch2
}
...
ch1, ch2 := make(chan int), make(chan int)
go Goroutine(ch1, ch2)
go Goroutine(ch2, ch1)
Then, as should be apparent, both goroutines are waiting for the other to send a value, which will never happen.