If I am using channels properly, should I need to use mutexes to protect against concurrent access?
You don't need mutex if you use channels correctly. In some cases a solution with mutex might be simpler though.
Just make sure the variable(s) holding the channel values are properly initialized before multiple goroutines try to access the channel variables. Once this is done, accessing the channels (e.g. sending values to or receiving values from them) is safe by design.
Supporting documents with references (emphases added by me):
Spec: Channel types:
A single channel may be used in send statements, receive operations, and calls to the built-in functions cap and len by any number of goroutines without further synchronization. Channels act as first-in-first-out queues. For example, if one goroutine sends values on a channel and a second goroutine receives them, the values are received in the order sent.
Effective Go: Concurrency: Share by communicating
Concurrent programming in many environments is made difficult by the subtleties required to implement correct access to shared variables. Go encourages a different approach in which shared values are passed around on channels and, in fact, never actively shared by separate threads of execution. Only one goroutine has access to the value at any given time. Data races cannot occur, by design. To encourage this way of thinking we have reduced it to a slogan:
Do not communicate by sharing memory; instead, share memory by communicating.
This approach can be taken too far. Reference counts may be best done by putting a mutex around an integer variable, for instance. But as a high-level approach, using channels to control access makes it easier to write clear, correct programs.
This article is also very helpful: The Go Memory Model
Also quoting from the package doc of sync:
Package sync provides basic synchronization primitives such as mutual exclusion locks. Other than the Once and WaitGroup types, most are intended for use by low-level library routines. Higher-level synchronization is better done via channels and communication.
Related
There is one go routine generating data. Also are many go routines that handles http response. I want generated data to be passed to all http handler routines. All dispatched data are same.
I thought two solutions. Using channel pipeline to fan-out or using mutex and condition variable.
I concern if former way needs memory allocation to put data in channel.
What should I choose?
Your use case sounds like it benefits from channels. In general channels are preferred when communication between go routines is needed. Sounds like a classic example of a worker pool.
Mutexes are used to protect a piece of memory, so only 1 goroutine can access/modify it at a time. Often times this is the opposite of what people want, which is to parallelize execution.
A good rule of thumb is to not worry about optimization(memory allocation or not) until it actually becomes an issue. Premature optimization is a common anti-pattern.
I have a big and deeply nested shared struct. Each goroutine of my program may use a different part of the struct, a slice, a map, etc. To make things worse, all these goroutines do long operations, which means it may not be a good idea to use a big lock for that shared struct.
Therefore, I have come up with an idea which is to lock the struct before accessing one part of the struct and then encode it, as soon as the encoding is done the goroutine can release the lock and decodes the data. In this way, one single goroutine won't hold the lock for a long time.
The question is: I'm not sure if this is a good practice, is there a better way to solve this kind of problem? Or is there any better ideology to such kinds of problems?
You may use following techniques instead of sync.Mutex:
sync.RWMutex - read-write lock allows to read the structure by multiple goroutine at the same time and only when write lock is not acquired. If your problem is that multiple goroutines cannot code different parts of struct in parallel, this may be the best choice.
You can granulate locks as you mentioned in the question, but you must be careful to properly acquire all involved locks to preserve consistency when you modify different parts of the struct.
You can take a snapshot of the interesting part of struct under lock and outside lock do some heavy read operation. This will of course put extra pressure on the GC, but it may be worth it.
You could also mix them to achieve better results.
I am contemplating inter-process sharing of custom objects. My current implementation uses ZeroMQ where the objects are packed into a message and sent from process A to process B.
I am wondering whether it would be faster instead to have a concurrent container implemented using boost::interprocess (where process A will insert into the container and process B will retrieve from it). Not sure if this will be faster than having to serialise the object in process A and then de-serialising it in process B.
Just wondering if anyone has done benchmarking? Is it conceptually right to compare the two?
In principle, ZeroMq should be slower, because the metaphor it's using is the passing of messages. These kinds of libraries are not intended for sharing regions of memory, in place, and for different processes to be able to modify them concurrently.
Specifically, you mentioned "packing". When sharing memory regions, you can - ideally - avoid any packing and just work on data as-is (of course, with the care necessary in concurrent use of the same data structures, using offsets instead of pointers etc.)
Also note that even when sharing is a one-directional back-and-forth (i.e. only one process at a time accesses any of the data), ZeroMq can only match the use of IPC shared memory if it supports zero-copying all the way down. This is not clear to me from the FAQ page on zero-copying (but may be the case anyway).
I agree with Nim, they're too different for easy comparison.
ZeroMQ has inproc which uses shared memory as a byte transport.
Boost.Interprocess seems to be mostly about having objects constructed in shared memory, accessible to multiple processes / threads. However it does have message queues, but they too are just byte transports requiring objects to be serialised, just like you have to with ZeroMQ. They're not object containers, so are more comparable to ZeroMQ but is quite a long way from what Boost.Interprocess seems to represent.
I have done a ZeroMQ / STL container hybrid. Yeurk. I used a C++ STL queue to store objects, but then used a ZeroMQ PUSH/PULL socket to govern which thread could read from that queue. Reading threads were blocked on a ZeroMQ poll, and when they received a message they'd lock the queue and read an object out from it. This avoided having to serialise objects, which was handy, so it was pretty fast. This doesn't work for PUB/SUB which implies copying objects between recipients, which would need object serialisation.
ZMQ IPC is effective only in linux(using UNIX domain socket)
The performance is slower than boost::interprocess shared_memory
I would like to know which is the best way to ensure an exclusive access to a shared resource (such as memory window) among n processes in MPI. I've tried MPI_Win_lock & MPI_Win_fence but they don't seem to work as expected, i.e: I can see that multiple processes enter a critical region (code between MPI_Win_lock & MPI_Win_unlock that contains MPI_Get and/or MPI_Put) at the same time.
I would appreciate your suggestions. Thanks.
In MPI 2 you cannot truly do atomic operations. This is introduced in MPI 3 using MPI_Fetch_and_op. This is why your critical data is modified.
Furthermore, take care with `MPI_Win_lock'. As described here:
The name of this routine is misleading. In particular, this routine need not block, except when the target process is the calling process.
The actual blocking process is MPI_Win_unlock, meaning that only after returning from this procedure you can be sure that the values from put and get are correct. Perhaps this is better described here:
MPI passive target operations are organized into access epochs that are bracketed by MPI Win lock and MPI Win unlock calls. Clever MPI implementations [10] will combine all the data movement operations (puts, gets, and accumulates) into one network transaction that occurs at the unlock.
This same document can also provide a solution to your problem, which is that critical data is not written atomically. It does this through the use of a mutex, which is a mechanism that ensures only one process can access data at the time.
I recommend you read this document: The solution they propose is not difficult to implement.
What are the uses cases for buffered channels ? If i want multiple parallel actions i could just use the default, synchronous channel eq.
package main
import "fmt"
import "time"
func longLastingProcess(c chan string) {
time.Sleep(2000 * time.Millisecond)
c <- "tadaa"
}
func main() {
c := make(chan string)
go longLastingProcess(c)
go longLastingProcess(c)
go longLastingProcess(c)
fmt.Println(<- c)
}
What would be the practical cases for increasing the buffer size ?
To give a single, slightly-more-concrete use case:
Suppose you want your channel to represent a task queue, so that a task scheduler can send jobs into the queue, and a worker thread can consume a job by receiving it in the channel.
Suppose further that, though in general you expect each job to be handled in a timely fashion, it takes longer for a worker to complete a task than it does for the scheduler to schedule it.
Having a buffer allows the scheduler to deposit jobs in the queue and still remain responsive to user input (or network traffic, or whatever) because it does not have to sleep until the worker is ready each time it schedules a task. Instead, it goes about its business, and trusts the workers to catch up during a quieter period.
If you want an EVEN MORE CONCRETE example dealing with a specific piece of software then I'll see what I can do, but I hope this meets your needs.
Generally, buffering in channels is beneficial for performance reasons.
If a program is designed using an event-flow or data-flow approach, channels provide the means for the events to pass between one process and another (I use the term process in the same sense as in Tony Hoare's Communicating Sequential Processes (CSP), ie. effectively synonymous with the goroutine).
There are times when a program needs its components to remain in lock-step synchrony. In this case, unbuffered channels are required.
Otherwise, it is typically beneficial to add buffering to the channels. This should be seen as an optimisation step (deadlock may still be possible if not designed out).
There are novel throttle structures made possible by using channels with small buffers (example).
There are special overwriting or lossy forms of channels used in occam and jcsp for fixing the special case of a cycle (or loop) of processes that would otherwise probably deadlock. This is also possible in Go by writing an overwriting goroutine buffer (example).
You should never add buffering merely to fix a deadlock. If your program deadlocks, it's far easier to fix by starting with zero buffering and think through the dependencies. Then add buffering when you know it won't deadlock.
You can construct goroutines compositionally - that is, a goroutine may itself contain goroutines. This is a feature of CSP and benefits scalability greatly. The internal channels between a group of goroutines are not of interest when designing the external use of the group as a self-contained component. This principle can be applied repeatedly at increasingly-larger scales.
If the receiver of the channel is always slower than sender a buffer of any size will eventually be consumed. That will leave you with a channel that pauses your go routine as often as a unbuffered channel so you might as well use an unbuffered channel.
If the receiver is typically faster than the sender except for an occasional burst a buffered channel may be helpful and the buffer should be set to the size of the typical burst which you can arrive at by measurement at runtime.
As an alternative to a buffered channel it may better to just send an array or a struct containing an array over the channel to deal with bursts/batches.
Buffered channels are non-blocking for the sender as long as there's still room. This can increase responsiveness and throughput.
Sending several items on one buffered channel makes sure they are processed in the order in which they are sent.
From Effective Go (with example): "A buffered channel can be used like a semaphore, for instance to limit throughput."
In general, there are many use-cases and patterns of channel usage, so this is not an exhausting answer.
It's a hard question b/c the program is incorrect: It exits after receiving a signal from one goroutine, but three were started. Buffering the channel makes it no different.
EDIT: For example, here is a bit of general discussion about channel buffers. And some exercise. And a book chapter about the same.