Communication in Go via channels

Consider the following situation. There is one main goroutine and ten subsidiary
goroutines, all of which have access to the same channel. The main goroutine sends 1000 numbers into this channel and the subsidiary goroutines read from it.
Is there any guarantee that each subsidiary goroutine will read exactly 100 numbers, or may this amount vary, with one goroutine reading 99 numbers and another 101?

No, there is no guarantee. How many values each goroutine receives depends on the runtime behaviour of every goroutine, and that in turn depends on how the goroutines happen to be distributed across the CPUs.

With goroutine scheduling via an unbuffered channel, the interesting thing to note is that the channel serves purely as a blocking mechanism - the value is never really "sent into the channel" and nothing actually "reads from" it.
An unbuffered channel works purely as a synchronization mechanism: for the goroutine sending on a channel, it works more along the lines of "sleep until some goroutine is ready to receive", and for the receiving goroutine it's "sleep until some goroutine is ready to send".
This should make it clear that sends and receives have no kind of fairness or distribution system built in - they're purely first come, first served, or possibly even more arbitrary depending on the scheduler's current load.
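To see this in practice, here is a minimal sketch (my own illustration, not from the original question): ten receivers drain one channel while main sends 1000 numbers, and the per-receiver counts typically differ from run to run.

package main

import (
    "fmt"
    "sync"
)

func main() {
    ch := make(chan int)
    counts := make([]int, 10) // one counter per receiver; each goroutine touches only its own slot
    var wg sync.WaitGroup

    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for range ch { // receive until the channel is closed
                counts[id]++
            }
        }(i)
    }

    for n := 0; n < 1000; n++ {
        ch <- n
    }
    close(ch)
    wg.Wait()

    fmt.Println(counts) // e.g. [104 96 101 ...] - rarely an even 100 each
}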

Related

Forcing a goroutine to run X times a second

Is there a way to force that a goroutine will run X times a second, no matter if there are other goroutines which may be doing a CPU intensive operation?
A little background on why: I am working on a game server written in Go. I have a goroutine that handles the game loop, and the game is updated at X ticks per second. Of course, some of the operations the server does are expensive (for example, terrain generation). Currently I just spawn a goroutine and let it handle the generation so that it does not block the game-loop goroutine, but after testing on a server with a single vCore, I saw that it still blocks the game loop while doing CPU-intensive operations.
After searching online I found out that Go would not reschedule a goroutine while it is not in a blocking syscall. Now, I could do as suggested and just manually yield from the goroutine, but that has two problems: it makes the CPU-intensive code messier, since I need to handle yielding at specific points, and even after a manual yield the scheduler could just schedule another CPU-intensive goroutine instead of the game loop...
Is there a way to force that a goroutine will run X times a second, no matter if there are other goroutines which may be doing a CPU intensive operation?
No
after testing on a server with a single vcore, I saw that it still blocks the gameloop while doing CPU intensive operations.
What else do you expect to happen? You have one core and two operations to be performed.
After searching online I found out that go would not reschedule a goroutine while it is not in a blocking syscall
Not true.
From the Go 1.14 release notes:
Goroutines are now asynchronously preemptible. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection. This is supported on all platforms except windows/arm, darwin/arm, js/wasm, and plan9/*.
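To illustrate the quoted behaviour, here is a minimal sketch (it assumes Go 1.14+ so asynchronous preemption applies; on older versions the busy loop would starve the ticker). It shows that a call-free busy loop no longer blocks a ticker-driven loop on a single core, though the tick timing is still best-effort, not guaranteed:

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1) // simulate a single-vCore server

    go func() {
        x := 0
        for { // CPU-intensive loop with no function calls and no yields
            x++
        }
    }()

    tick := time.NewTicker(time.Second / 20) // aim for 20 ticks per second
    defer tick.Stop()
    for i := 0; i < 5; i++ {
        <-tick.C // still fires, because the runtime preempts the busy loop
        fmt.Println("game tick", i)
    }
}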

Advantages of epoll integration

In a job interview I had to answer this question: which advantages do you get owing to the epoll integration/implementation in Go?
I just know what epoll can do and that its complexity for any number of descriptors is O(1), but I have no idea why Go is better than other languages in this respect.
I found this thread https://news.ycombinator.com/item?id=15624586 where someone says the reason may be that Go doesn't use stack switching. That is hard for me to understand. Which part of the program doesn't use stack switching? Every goroutine has its own stack.
That's not the netpoller integration per se which makes Go strong in its field—it's rather the way that integration is done: instead of being bolted-on as a library, in Go, the netpoller is tightly integrated right into the runtime and the scheduler (which decides which goroutine to run, and when).
The coupling of super-light-weight threads of execution—goroutines—with the netpoller allows for callback-free programming. That is, once your service gets another client connected, you just hand this connection to a goroutine which merely reads the data from it (and writes its response stream to it). As soon as there's no data available when the goroutine wants to read it, the scheduler suspends the goroutine and unblocks it once the netpoller reports there's data available; the same happens when the goroutine wants to write data but the sending buffer is full.
To recap, the netpoller in Go is intertwined with the goroutine scheduler, which allows goroutines to transparently wait for data availability without requiring the programmer to explicitly code the event loop and callbacks or deal with "futures" and "promises", which are mere callbacks wrapped in pretty objects.
I invite you to read this classic essay which explains this stuff with much more beautiful words.
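As a minimal sketch of that callback-free style (my own illustration; the address and echo behaviour are arbitrary choices), each connection is handled by a goroutine written as straight-line blocking code; when a read would block, the scheduler parks the goroutine and the netpoller wakes it once data arrives:

package main

import (
    "bufio"
    "log"
    "net"
)

func handle(conn net.Conn) {
    defer conn.Close()
    r := bufio.NewReader(conn)
    for {
        line, err := r.ReadString('\n') // "blocks", i.e. parks this goroutine
        if err != nil {
            return
        }
        if _, err := conn.Write([]byte(line)); err != nil { // echo it back
            return
        }
    }
}

func main() {
    ln, err := net.Listen("tcp", "localhost:8080") // example address
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            log.Fatal(err)
        }
        go handle(conn) // one goroutine per connection, no event loop in user code
    }
}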

Is MPI_Bcast() blocking?

Is MPI_Bcast() blocking or non-blocking? In other words, when the root sends data, do all processors block until every processor has received the data? If not, how can I synchronize (block) all of them so that none proceeds until all have received the same data?
You need to be a bit careful about terminology here as what MPI means by "blocking" may not be how you have seen it used in other contexts.
In MPI terms, Bcast is blocking. Blocking means that, when the function returns, it has completed the operation it was meant to do. In this case, it means that on return from Bcast in a given process, it is guaranteed that the receive buffer in that process contains the data you want to broadcast. The non-blocking version is Ibcast.
In MPI terms, what you are asking is whether the operation is synchronous, i.e. implies synchronisation amongst processes. For a point-to-point operation such as Send, this refers to whether or not the sender waits for the receive to be posted before returning from the send call. For collective operations, the question is whether there is a barrier (as pointed out by @Vladimir). Bcast does not necessarily imply a barrier.
However, the reason I am posting is that, in almost all MPI programs written using the standard Send/Recv calls (as opposed to single-sided Put/Get), you do not care whether or not there is a synchronisation. All each process cares about is that it has received the data it needs - why would it matter what the other processes are doing? If you subsequently want to communicate with any other process, then the MPI routines are designed so that the required synchronisation happens automatically. If you issue a receive and another process is slow, you wait; if you issue a send and the other process has not issued a receive, everything will still work correctly (this is assuming you don't call Rsend - you should never call Rsend!). Whether or not there is synchronisation affects performance, but rarely affects whether a program is correct or not.
Unless processes are interacting via some other mechanism (e.g. all accessing the same file) then it is hard to come up with a real example where you care whether or not the Bcast synchronises. Of course you can always construct some edge case, but in real practical applications of MPI it almost never matters.
Many MPI programs are littered with barriers and in my experience they are almost never required for correctness; the only common use case is to ensure meaningful timings for performance measurements.
No, this kind of blocking (waiting for the other processes to finish their part of the job) would be very bad for performance. Every process continues as soon as it has all it needs -- that means that the data it was to receive are there, or the data to be sent are at least copied to some buffer.
You can use an MPI_Barrier to synchronize processes if you need to be sure all processes have finished. As already said, it can slow down the program significantly. I use it only for certain diagnostic logging when initializing my code, not during the actual integration.

When to use a buffered channel?

What are the use cases for buffered channels? If I want multiple parallel actions I could just use the default, synchronous channel, e.g.:
package main

import (
    "fmt"
    "time"
)

func longLastingProcess(c chan string) {
    time.Sleep(2000 * time.Millisecond)
    c <- "tadaa"
}

func main() {
    c := make(chan string)
    go longLastingProcess(c)
    go longLastingProcess(c)
    go longLastingProcess(c)
    fmt.Println(<-c)
}
What would be the practical cases for increasing the buffer size ?
To give a single, slightly-more-concrete use case:
Suppose you want your channel to represent a task queue, so that a task scheduler can send jobs into the queue, and a worker thread can consume a job by receiving it in the channel.
Suppose further that, though in general you expect each job to be handled in a timely fashion, it takes longer for a worker to complete a task than it does for the scheduler to schedule it.
Having a buffer allows the scheduler to deposit jobs in the queue and still remain responsive to user input (or network traffic, or whatever) because it does not have to sleep until the worker is ready each time it schedules a task. Instead, it goes about its business, and trusts the workers to catch up during a quieter period.
If you want an EVEN MORE CONCRETE example dealing with a specific piece of software then I'll see what I can do, but I hope this meets your needs.
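For instance, here is a minimal sketch of such a task queue (the names, sizes, and delays are made-up assumptions): the scheduler's sends succeed immediately while the buffer has room, so it stays responsive while the worker catches up.

package main

import (
    "fmt"
    "time"
)

func main() {
    jobs := make(chan string, 8) // buffered task queue

    go func() { // worker: slower than the scheduler
        for job := range jobs {
            time.Sleep(100 * time.Millisecond) // simulate a slow task
            fmt.Println("done:", job)
        }
    }()

    for i := 0; i < 8; i++ {
        jobs <- fmt.Sprintf("job-%d", i) // does not block while the buffer has room
        fmt.Println("scheduled job", i)  // the scheduler stays responsive
    }
    close(jobs)
    time.Sleep(time.Second) // crude wait so the worker can finish
}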
Generally, buffering in channels is beneficial for performance reasons.
If a program is designed using an event-flow or data-flow approach, channels provide the means for the events to pass between one process and another (I use the term process in the same sense as in Tony Hoare's Communicating Sequential Processes (CSP), i.e. effectively synonymous with the goroutine).
There are times when a program needs its components to remain in lock-step synchrony. In this case, unbuffered channels are required.
Otherwise, it is typically beneficial to add buffering to the channels. This should be seen as an optimisation step (deadlock may still be possible if not designed out).
There are novel throttle structures made possible by using channels with small buffers (example).
There are special overwriting or lossy forms of channels used in occam and jcsp for fixing the special case of a cycle (or loop) of processes that would otherwise probably deadlock. This is also possible in Go by writing an overwriting goroutine buffer (example).
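Here is a sketch of such an overwriting buffer in Go (my own illustration; the shape of the code is an assumption, not taken from the linked examples): a goroutine holds at most one value, and a newly arriving value replaces an unconsumed one.

package main

import (
    "fmt"
    "time"
)

// overwritingBuffer forwards values from in to out, but keeps at most one
// pending value: if a new value arrives before the consumer takes the old
// one, the old value is overwritten (lost).
func overwritingBuffer(in <-chan int, out chan<- int) {
    defer close(out)
    var latest int
    have := false
    for {
        if have {
            select {
            case v, ok := <-in:
                if !ok {
                    out <- latest // flush the final value before quitting
                    return
                }
                latest = v // overwrite: the previous value is dropped
            case out <- latest:
                have = false
            }
        } else {
            v, ok := <-in
            if !ok {
                return
            }
            latest = v
            have = true
        }
    }
}

func main() {
    in, out := make(chan int), make(chan int)
    go overwritingBuffer(in, out)

    go func() { // fast producer
        for i := 0; i < 100; i++ {
            in <- i
        }
        close(in)
    }()

    for v := range out { // slow consumer: sees only some of the 100 values
        fmt.Println("got", v)
        time.Sleep(10 * time.Millisecond)
    }
}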
You should never add buffering merely to fix a deadlock. If your program deadlocks, it's far easier to fix by starting with zero buffering and thinking through the dependencies. Then add buffering when you know it won't deadlock.
You can construct goroutines compositionally - that is, a goroutine may itself contain goroutines. This is a feature of CSP and benefits scalability greatly. The internal channels between a group of goroutines are not of interest when designing the external use of the group as a self-contained component. This principle can be applied repeatedly at increasingly-larger scales.
If the receiver of the channel is always slower than the sender, a buffer of any size will eventually be consumed. That will leave you with a channel that pauses your goroutine as often as an unbuffered channel, so you might as well use an unbuffered channel.
If the receiver is typically faster than the sender, except for an occasional burst, a buffered channel may be helpful, and the buffer should be set to the size of the typical burst, which you can arrive at by measurement at runtime.
As an alternative to a buffered channel, it may be better to just send an array, or a struct containing an array, over the channel to deal with bursts/batches.
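A minimal sketch of that batching alternative (the details are my own): one channel send carries a whole slice instead of many single values.

package main

import "fmt"

func main() {
    ch := make(chan []int)

    go func() {
        batch := make([]int, 0, 64)
        for i := 0; i < 64; i++ {
            batch = append(batch, i) // collect a burst
        }
        ch <- batch // one channel send for the whole burst
    }()

    for _, v := range <-ch {
        fmt.Print(v, " ") // process each item from the batch
    }
    fmt.Println()
}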
Buffered channels are non-blocking for the sender as long as there's still room. This can increase responsiveness and throughput.
Sending several items on one buffered channel makes sure they are processed in the order in which they are sent.
From Effective Go (with example): "A buffered channel can be used like a semaphore, for instance to limit throughput."
In general, there are many use cases and patterns of channel usage, so this is not an exhaustive answer.
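As a concrete sketch of that semaphore idiom (the numbers and names here are my own, not from Effective Go): the buffered channel's capacity bounds how many goroutines do work at once.

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    sem := make(chan struct{}, 3) // at most 3 goroutines work concurrently
    var wg sync.WaitGroup

    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            sem <- struct{}{}        // acquire a slot (blocks when 3 are busy)
            defer func() { <-sem }() // release the slot when done
            fmt.Println("working:", id)
            time.Sleep(200 * time.Millisecond) // simulate work
        }(i)
    }
    wg.Wait()
}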
It's a hard question because the program is incorrect: it exits after receiving a signal from one goroutine, but three were started. Buffering the channel would make no difference.
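Under that observation, a corrected sketch of the question's program, with one receive per goroutine started, might be:

package main

import (
    "fmt"
    "time"
)

func longLastingProcess(c chan string) {
    time.Sleep(2000 * time.Millisecond)
    c <- "tadaa"
}

func main() {
    c := make(chan string)
    go longLastingProcess(c)
    go longLastingProcess(c)
    go longLastingProcess(c)
    for i := 0; i < 3; i++ {
        fmt.Println(<-c) // receive once per goroutine started
    }
}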
EDIT: For example, here is a bit of general discussion about channel buffers. And some exercise. And a book chapter about the same.

Is message passing via channels in Go guaranteed to be non-blocking?

In order to assess whether go is a possible option for an audio/video application, I would like to know whether message passing in go satisfies any non-blocking progress guarantees (being obstruction-free, lock-free or wait-free). In particular, the following scenarios are relevant:
Single producer single consumer:
Two threads communicate using a shared channel. Thread A only does asynchronous sends, thread B only does asynchronous receives. Suppose the OS scheduler decides to interrupt thread A at the "worst possible moment" for an indefinite amount of time. Is thread B guaranteed to finish a receive operation in a bounded number of CPU cycles, or is there a (theoretical) possibility that thread A can put the channel into a state where thread B needs to wait for the OS to resume thread A?
Multiple producers:
Several threads A1, A2, A3, ... communicate with one or more other threads using a shared channel. The threads Ai only do asynchronous sends. Suppose A2, A3, ... are suspended by the OS scheduler at the "worst possible moment" for an indefinite amount of time. Is thread A1 still guaranteed to finish a send operation in a bounded number of CPU cycles? Suppose further that each thread only wants to do one send. If the program is run sufficiently long (with a "malicious" scheduler which potentially starves some threads or interrupts and resumes threads at the "worst possible moment"), is at least one send guaranteed to succeed?
I am not so much interested in typical scenarios here, but rather worst-case guarantees.
See Non-blocking algorithm (Wikipedia) for more details on obstruction-, lock- and wait-free algorithms.
Normal sends and receives are blocking operations by definition. You can do a non-blocking send or receive by using a select statement:
select {
case ch <- msg:
default:
}
(Receiving is very similar; just replace the case statement.)
The send only takes place when there is room in the channel's buffer. Otherwise the default case runs. Note that internally a mutex is still used (if I'm reading the code correctly).
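For the receive direction, the same pattern might look like this (a minimal, self-contained sketch):

package main

import "fmt"

func main() {
    ch := make(chan string, 1)

    select {
    case msg := <-ch:
        fmt.Println("received:", msg)
    default:
        fmt.Println("nothing ready; not blocking") // runs here: channel is empty
    }

    ch <- "hello"
    select {
    case msg := <-ch:
        fmt.Println("received:", msg) // runs here: a value is buffered
    default:
        fmt.Println("nothing ready")
    }
}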
The Go Memory Model doesn't require sends and receives to be non-blocking, and the current runtime implementation locks the channel for send and recv. This means, for instance, that it is possible to starve a sending or receiving go-routine if the OS-scheduler interrupts another thread running another go-routine which tries to send or receive on the same channel while it has already acquired the channel's lock.
So the answer is unfortunately no :(
(unless someone reimplements parts of the runtime using non-blocking algorithms).
You're asking whether an operation is guaranteed to complete within a bounded number of cycles, which of course is not a design consideration for this language (or most underlying OSes).
If run in a single thread, then Go uses cooperative multitasking between goroutines. So if one routine never yields, then the other will never run. If your program runs on multiple threads (as set by GOMAXPROCS), then you can run several goroutines simultaneously, in which case the OS controls scheduling between the threads. However, in neither case is there a guaranteed upper bound on the time to completion for a function call.
Note that the cooperative nature of goroutines gives you some amount of control over scheduling execution -- that is, routines are never preempted; until you yield, you retain control of the thread. (This describes older versions of the runtime; as quoted earlier, goroutines have been asynchronously preemptible since Go 1.14.)
As for blocking behavior, see The language specification:
The capacity, in number of elements, sets the size of the buffer in the channel. If the capacity is greater than zero, the channel is asynchronous: communication operations succeed without blocking if the buffer is not full (sends) or not empty (receives), and elements are received in the order they are sent. If the capacity is zero or absent, the communication succeeds only when both a sender and receiver are ready.
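A tiny self-contained illustration of that paragraph:

package main

import "fmt"

func main() {
    ch := make(chan int, 2) // capacity 2: sends succeed while the buffer has room
    ch <- 1                 // does not block: buffer now holds one element
    ch <- 2                 // does not block: buffer is now full
    fmt.Println(<-ch, <-ch) // received in the order sent: 1 2
    // A third send with no receiver ready would block here.
}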
Note that non-blocking sends and receives on channels can be accomplished using the select syntax already mentioned.
Goroutines do not own channels or the values sent on them. So, the execution status of a goroutine that has sent / is sending values on a channel has no impact on the ability of other goroutines to send or receive values on that channel, unless the channel's buffer is full, in which case all sends will block until a corresponding receive occurs, or the buffer is empty, in which case all receives will block until there is a corresponding send.
Because goroutines use cooperative scheduling (they have to yield to the scheduler, either through a channel operation, a syscall, or an explicit call to runtime.Gosched()), it is impossible for a goroutine to be interrupted at the "worst possible time". It is possible for a goroutine to never yield, in which case it could tie up a thread indefinitely. If you have only one thread of execution, then your other goroutines will never be scheduled. It is possible, but statistically improbable, for a goroutine to just never be scheduled. However, if all goroutines but one are blocked on sends or receives, then the remaining goroutine must be scheduled. (As with the previous answer, this predates Go 1.14's asynchronous preemption.)
It is possible to get a deadlock. If I have two goroutines executing:
func Goroutine(ch1, ch2 chan int) {
    i := <-ch1 // wait for a value on the first channel
    ch2 <- i   // then pass it along on the second
}
...
ch1, ch2 := make(chan int), make(chan int)
go Goroutine(ch1, ch2)
go Goroutine(ch2, ch1)
Then, as should be apparent, both goroutines are waiting for the other to send a value, which will never happen.
