I have studied the Godoc of the gorilla/websocket package.
In the Godoc it is clearly stated that
Concurrency
Connections support one concurrent reader and one concurrent writer.
Applications are responsible for ensuring that no more than one goroutine calls the write methods (NextWriter, SetWriteDeadline, WriteMessage, WriteJSON, EnableWriteCompression, SetCompressionLevel) concurrently and that no more than one goroutine calls the read methods (NextReader, SetReadDeadline, ReadMessage, ReadJSON, SetPongHandler, SetPingHandler) concurrently.
The Close and WriteControl methods can be called concurrently with all other methods.
However, in one of the examples provided by the package:
func (c *Conn) readPump() {
	defer func() {
		hub.unregister <- c
		c.ws.Close()
	}()
	c.ws.SetReadLimit(maxMessageSize)
	c.ws.SetReadDeadline(time.Now().Add(pongWait))
	c.ws.SetPongHandler(func(string) error {
		c.ws.SetReadDeadline(time.Now().Add(pongWait)); return nil
	})
	for {
		_, message, err := c.ws.ReadMessage()
		if err != nil {
			if websocket.IsUnexpectedCloseError(err, websocket.CloseGoingAway) {
				log.Printf("error: %v", err)
			}
			break
		}
		message = bytes.TrimSpace(bytes.Replace(message, newline, space, -1))
		hub.broadcast <- message
	}
}
Source: https://github.com/gorilla/websocket/blob/a68708917c6a4f06314ab4e52493cc61359c9d42/examples/chat/conn.go#L50
This call
c.ws.SetPongHandler(func(string) error {
	c.ws.SetReadDeadline(time.Now().Add(pongWait)); return nil
})
and this line
_, message, err := c.ws.ReadMessage()
do not seem to be synchronized, because the first registers a callback, which I assume is invoked in a goroutine created inside the package, while the second executes in the goroutine that invokes serveWs.
More importantly, how should I ensure that no more than one goroutine calls the SetReadDeadline, ReadMessage, SetPongHandler, SetPingHandler concurrently?
I tried using a Mutex, locking it before each call to the above functions and unlocking it afterwards, but I quickly ran into a problem. ReadMessage is usually called in a for loop (as in the example). If the Mutex is locked before ReadMessage, no other read function can acquire the lock and run until the next message is received.
Is there any better way in handling this concurrency issue? Thanks in advance.
The best way to ensure that there are no concurrent calls to the read methods is to execute all of the read methods from a single goroutine.
All of the Gorilla websocket examples use this approach including the example pasted in the question. In the example, all calls to the read methods are from the readPump method. The readPump method is called once for a connection on a single goroutine. It follows that the connection read methods are not called concurrently.
The section of the documentation on control messages says that the application must read the connection to process control messages. Based on this and Gorilla's own examples, I think it's safe to assume that the ping, pong and close handlers will be called from the application's reading goroutine as it is in the current implementation. It would be nice if the documentation could be more explicit about this. Maybe file an issue?
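As an illustration of why handlers invoked from inside the read call need no extra locking, here is a minimal sketch. The toyConn type and its frames are hypothetical stand-ins written for this example; they are not gorilla/websocket's Conn and only model the calling pattern described above, where control frames are dispatched while the application is reading.

```go
package main

import (
	"fmt"
	"io"
)

// frame is a hypothetical stand-in for a websocket frame.
type frame struct {
	control bool   // true for a pong, false for a data message
	payload string
}

// toyConn is NOT gorilla/websocket's Conn; it only models the calling pattern.
type toyConn struct {
	frames      []frame
	pongHandler func(string) error
}

func (c *toyConn) SetPongHandler(h func(string) error) { c.pongHandler = h }

// ReadMessage returns the next data message. Control frames met on the way
// are passed to the handler inline, on the caller's goroutine.
func (c *toyConn) ReadMessage() (string, error) {
	for len(c.frames) > 0 {
		f := c.frames[0]
		c.frames = c.frames[1:]
		if !f.control {
			return f.payload, nil
		}
		if c.pongHandler != nil {
			if err := c.pongHandler(f.payload); err != nil {
				return "", err
			}
		}
	}
	return "", io.EOF
}

// readPump is the only function that touches the read side of the connection.
func readPump(c *toyConn) []string {
	var events []string
	c.SetPongHandler(func(p string) error {
		events = append(events, "pong:"+p) // runs inside ReadMessage below
		return nil
	})
	for {
		msg, err := c.ReadMessage()
		if err != nil {
			return events
		}
		events = append(events, "msg:"+msg)
	}
}

func main() {
	c := &toyConn{frames: []frame{{false, "a"}, {true, "p1"}, {false, "b"}}}
	fmt.Println(readPump(c))
}
```

Because the handler only ever runs while readPump is inside ReadMessage, the handler and the loop never execute at the same time, and the events slice needs no mutex.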
Related
I'm trying to understand best practices for Golang concurrency. I read O'Reilly's book on Go's concurrency and then came back to the Golang Codewalks, specifically this example:
https://golang.org/doc/codewalk/sharemem/
This is the code I was hoping to review with you in order to learn a little bit more about Go. My first impression is that this code is breaking some best practices. This is of course my (very) inexperienced opinion and I wanted to discuss and gain some insight on the process. This isn't about who's right or wrong, please be nice, I just want to share my views and get some feedback on them. Maybe this discussion will help other people see why I'm wrong and teach them something.
I'm fully aware that the purpose of this code is to teach beginners, not to be perfect code.
Issue 1 - No Goroutine cleanup logic
func main() {
	// Create our input and output channels.
	pending, complete := make(chan *Resource), make(chan *Resource)
	// Launch the StateMonitor.
	status := StateMonitor(statusInterval)
	// Launch some Poller goroutines.
	for i := 0; i < numPollers; i++ {
		go Poller(pending, complete, status)
	}
	// Send some Resources to the pending queue.
	go func() {
		for _, url := range urls {
			pending <- &Resource{url: url}
		}
	}()
	for r := range complete {
		go r.Sleep(pending)
	}
}
The main function has no way to clean up the goroutines, which means that if this were part of a library, they would be leaked.
Issue 2 - Writers aren't spawning the channels
I read that as a best practice, the logic to create, write and cleanup a channel should be controlled by a single entity (or group of entities). The reason behind this is that writers will panic when writing to a closed channel. So, it is best for the writer(s) to create the channel, write to it and control when it should be closed. If there are multiple writers, they can be synced with a WaitGroup.
func StateMonitor(updateInterval time.Duration) chan<- State {
	updates := make(chan State)
	urlStatus := make(map[string]string)
	ticker := time.NewTicker(updateInterval)
	go func() {
		for {
			select {
			case <-ticker.C:
				logState(urlStatus)
			case s := <-updates:
				urlStatus[s.url] = s.status
			}
		}
	}()
	return updates
}
This function shouldn't be in charge of creating the updates channel because it is the reader of the channel, not the writer. The writer of this channel should create it and pass it to this function, basically saying to the function "I will pass updates to you via this channel". Instead, this function creates the channel itself, and it isn't clear who is responsible for cleaning it up.
Issue 3 - Writing to a channel asynchronously
This function:
func (r *Resource) Sleep(done chan<- *Resource) {
	time.Sleep(pollInterval + errTimeout*time.Duration(r.errCount))
	done <- r
}
is called here:
for r := range complete {
	go r.Sleep(pending)
}
And it seems like an awful idea. When this channel is closed, we'll have a goroutine sleeping somewhere out of our reach, waiting to write to that channel. Let's say this goroutine sleeps for 1h; when it wakes up, it will try to write to a channel that was closed in the cleanup process. This is another example of why the writers of a channel should be in charge of the cleanup process. Here we have a writer that is completely free and unaware of when the channel was closed.
Please
If I missed any issues from that code (related to concurrency), please list them. It doesn't have to be an objective issue, if you'd have designed the code in a different way for any reason, I'm also interested in learning about it.
Biggest lesson from this code
For me the biggest lesson I take from reviewing this code is that the cleanup of channels and the writing to them has to be synchronized. They have to be in the same for{} or at least communicate somehow (maybe via other channels or primitives) to avoid writing to a closed channel.
It is the main function, so there is no need to clean up. When main returns, the program exits. If this wasn't main, then you would be correct.
There is no best practice that fits all use cases. The code you show here is a very common pattern. The function creates a goroutine, and returns a channel so that others can communicate with that goroutine. There is no rule that governs how channels must be created. There is no way to terminate that goroutine though. One use case this pattern fits well is reading a large resultset from a database. The channel allows streaming data as it is read from the database. In that case usually there are other means of terminating the goroutine though, like passing a context.
Again, there are no hard rules on how channels should be created/closed. A channel can be left open, and it will be garbage collected when it is no longer used. If the use case demands so, the channel can be left open indefinitely, and the scenario you worry about will never happen.
Since you ask what would happen if this code were part of a library: yes, it would be poor practice to spawn goroutines with no cleanup inside a library function. If those goroutines carry out documented behaviour of the library, it's problematic that the caller doesn't know when that behaviour is going to happen. If you have any behaviour that is typically "fire and forget", it should be the caller who chooses when to forget about it. For example:
func doAfter5Minutes(f func()) {
	go func() {
		time.Sleep(5 * time.Minute)
		f()
		log.Println("done!")
	}()
}
Makes sense, right? When you call the function, it does something 5 minutes later. The problem is that it's easy to misuse this function like this:
// do the important task every 5 minutes
for {
	doAfter5Minutes(importantTaskFunction)
}
At first glance, this might seem fine. We're doing the important task every 5 minutes, right? In reality, we're spawning many goroutines very quickly, probably consuming all available memory before they start dropping off.
We could implement some kind of callback or channel to signal when the task is done, but really, the function should be simplified like so:
func doAfter5Minutes(f func()) {
	time.Sleep(5 * time.Minute)
	f()
	log.Println("done!")
}
Now the caller has the choice of how to use it:
// call synchronously
doAfter5Minutes(importantTaskFunction)
// fire and forget
go doAfter5Minutes(importantTaskFunction)
This function arguably should also be changed. As you say, the writer should effectively own the channel, as they should be the one closing it. The fact that this channel-reading function insists on creating the channel it reads from actually coerces itself into this poor "fire and forget" pattern mentioned above. Notice how the function needs to read from the channel, but it also needs to return the channel before reading. It therefore had to put the reading behaviour in a new, un-managed goroutine to allow itself to return the channel right away.
func StateMonitor(updates chan State, updateInterval time.Duration) {
	urlStatus := make(map[string]string)
	ticker := time.NewTicker(updateInterval)
	defer ticker.Stop() // not stopping the ticker is also a resource leak
	for {
		select {
		case <-ticker.C:
			logState(urlStatus)
		case s := <-updates:
			urlStatus[s.url] = s.status
		}
	}
}
Notice that the function is now simpler, more flexible and synchronous. The only thing that the previous version really accomplishes, is that it (mostly) guarantees that each instance of StateMonitor will have a channel all to itself, and you won't have a situation where multiple monitors are competing for reads on the same channel. While this may help you avoid a certain class of bugs, it also makes the function a lot less flexible and more likely to have resource leaks.
I'm not sure I really understand this example, but the golden rule for channel closing is that the writer should always be responsible for closing the channel. Keep this rule in mind, and notice a few points about this code:
The Sleep method writes r to the done channel (pending in main)
The Sleep method is executed concurrently, with no method of tracking how many instances are running, what state they are in, etc.
Based on these points alone, we can say that there probably isn't anywhere in the program where it would be safe to close that channel, because there's seemingly no way of knowing if it will be used again.
ORIGINAL 09/11/2019
conn := createConnection() // or a file handle
go getData(conn)
Is it possible that the goroutine running getData ends up on a different thread from the one that created the conn handle, and could that result in a connection error?
---- UPDATED 11/11/2019 09am ----
Scenario 1
func createConnection() handler {
	// ... create a socket connection (tcp://.....) or open a file handle
	return conn
}
func sendData(conn handler, data string) {
	conn.send(data)
}
conn := createConnection() // or a file handle
go sendData(conn, "test data")
Scenario 2
func createConnection() handler {
	// ... create a socket connection (tcp://.....) or open a file handle
	return conn
}
func sendData(ch chan handler, data string) {
	conn := <-ch
	conn.send(data)
}
ch := make(chan handler, 10) // the element type is handler, not conn
ch <- createConnection() // or a file handle
go sendData(ch, "test data")
Story behind:
I was working on a task to proxy data to a socket server. My solution to the challenge used the idea of [Scenario 2].
A few of my colleagues are C programmers who work on system-level code. They pointed out that a Go channel should only carry data, and that putting a file handle into a channel can cause unknown problems. For example, the thread that receives from the channel may differ from the thread that sent to it, so the file handle could be lost.
To my understanding, Go should already handle this itself, which is why I asked the question above.
After looking into the source code of some socket-related projects, I think [Scenario 1] is fine. However, [Scenario 2] is still an open question to me.
Again, my question is not "can I pass a file handle to a function"; everyone knows the answer is yes. The question is whether, in Go's CSP model, using go and chan together and passing a file handle through the channel can be a problem. Or, more interestingly: can sending a pointer into a channel and receiving it on the other side be a problem? By the book, that is a big "no no" in C. If it is fine in Go, how does Go achieve it?
---- UPDATED 11/11/2019 10am ----
The question only applies to Go. The problem does not arise in Node.js, since it is single-threaded. The question focuses on threads and file handles. I have limited knowledge of this area, so I apologise if the question is poorly asked or contains misleading information.
---- UPDATED 11/11/2019 10:40am ----
I re-confirmed with my colleague. The concern is: "Every time code asks for a file handle, the system returns a number. However, the number is only unique within one process, which means the same file handle number may point to a different resource in a different process. I am not sure whether goroutines take care of this or not."
There is nothing wrong with passing a connection handle to a separate goroutine as long as you are careful about the following:
Do not close the handle while the goroutine is working, or write the goroutine to deal with it.
If you are using the handle from multiple goroutines, make sure the connection you're dealing with is thread-safe, or put a lock around it.
Be clear and explicit about who's going to close it. The goroutine may close it when it is done, or another goroutine closes it when all work using the handle is done.
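For illustration, here is a small sketch of Scenario 2 using real standard-library types. All goroutines of a program live in the same OS process, so a handle value received from a channel refers to the same resource it did on the sending side. The sketch uses net.Pipe as an in-memory stand-in for a socket, and the sendData and roundTrip names are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// sendData receives a connection from ch, writes data to it and closes it.
// In this sketch the receiving goroutine is the one responsible for closing.
func sendData(ch <-chan net.Conn, data string, errc chan<- error) {
	conn := <-ch
	defer conn.Close()
	_, err := conn.Write([]byte(data))
	errc <- err
}

// roundTrip hands one end of an in-memory connection to a goroutine through
// a channel and reads what that goroutine wrote from the other end.
func roundTrip(data string) (string, error) {
	client, server := net.Pipe()
	defer server.Close()

	ch := make(chan net.Conn, 1)
	errc := make(chan error, 1)
	ch <- client // the handle travels through the channel like any other value
	go sendData(ch, data, errc)

	got, err := io.ReadAll(server) // returns once the sender closes its end
	if err != nil {
		return "", err
	}
	return string(got), <-errc
}

func main() {
	got, err := roundTrip("test data")
	fmt.Println(got, err)
}
```

Only one goroutine uses the client end after the hand-off, which matches the points above: a single user at a time, and an explicit owner for Close.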
I've wrapped a queue to implement the Writer and Reader interfaces (for pushing and popping, respectively).
I need to continuously listen to the queue, and handle every message that comes through. This is simple when the queue is represented as a channel, but more difficult otherwise:
loop:
	for {
		var data []byte
		select {
		case <-done:
			break loop
		case _, err := queue.Read(data):
			fmt.Println(string(data))
		}
	}
What's the proper way to do this? Read here is blocking - it waits until the queue has a message.
Is there a better, more idiomatic way to achieve this?
It’s harder to take a synchronous API (like queue.Read as you described above) and make it asynchronous than it is to do the opposite.
The idea would be to create a new goroutine (using, for example go func() {...}) and have that goroutine execute the read and write the output to a channel.
Then the first goroutine would block on that channel and the one it’s already blocking on.
This has the potential to leave orphaned resources around for a little while if the read takes too long, but if you have a synchronous API, it's the best you can do.
I currently have MQTT code that can subscribe to a topic, print out the messages received, then publish further instructions to a new topic. The subscribing/printing is done in one goroutine, and the publishing is done in another goroutine. Here is my code:
var wg, pg sync.WaitGroup
// All messages are handled here - printing published messages and publishing new messages
var f MQTT.MessageHandler = func(client MQTT.Client, msg MQTT.Message) {
	wg.Add(1)
	pg.Add(1)
	go func() {
		defer wg.Done()
		fmt.Printf("%s\n", msg.Payload())
		//fmt.Println(os.Getpid())
	}()
	go func() {
		defer pg.Done()
		message := ""
		//Changing configurations
		if strings.Contains(string(msg.Payload()), "arduinoLED") == true {
			message = fmt.Sprintf("change configuration")
		}
		if strings.Contains(string(msg.Payload()), "NAME CHANGED") == true {
			message = fmt.Sprintf("change back")
		}
		// Publish further instructions to "sensor/instruction"
		token := client.Publish("sensor/instruction", 0, false, message)
		//fmt.Println(os.Getpid())
		token.Wait()
	}()
}
func main() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, os.Interrupt, syscall.SIGTERM)
	opts := MQTT.NewClientOptions().AddBroker("tcp://test.mosquitto.org:1883")
	opts.SetDefaultPublishHandler(f)
	// Topic to subscribe to for sensor data
	topic := "sensor/data"
	opts.OnConnect = func(c MQTT.Client) {
		if token := c.Subscribe(topic, 0, f); token.Wait() && token.Error() != nil {
			panic(token.Error())
		}
	}
	// Creating new client
	client := MQTT.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		panic(token.Error())
	} else {
		fmt.Printf("Connected to server\n")
	}
	wg.Wait()
	pg.Wait()
	<-c
}
The commented out os.Getpid() line is to check which process I am running that Goroutine on. Right now they both display the same number (which means both are running on the same process?).
My question is: How can I run the two Goroutines on separate processes? Is there a way?
Edit: If this cannot be done, I want to write this code using channels. Here is the code I have for that:
var f MQTT.MessageHandler = func(client MQTT.Client, msg MQTT.Message) {
	sensorData := make(chan []byte)
	wg.Add(1)
	pg.Add(1)
	go func() {
		defer wg.Done()
		//fmt.Printf("%s\n", msg.Payload())
		sensorData <- string(msg.Payload())
		fmt.Println(<-sensorData) //currently not printing anything
	}()
	go func() {
		defer pg.Done()
		message := ""
		//Changing configurations
		if strings.Contains(<-sensorData, "arduinoLED") == true {
			message = fmt.Sprintf("change configuration")
		}
		if strings.Contains(<-sensorData, "NAME CHANGED") == true {
			message = fmt.Sprintf("change back")
		}
		// Publish further instructions to "sensor/instruction"
		token := client.Publish("sensor/instruction", 0, false, message)
		token.Wait()
	}()
}
However, I am not able to print out any data using channels. What am I doing wrong?
You might be coming from Python, right? ;-)
It has the module named multiprocessing in its stdlib, and this might well explain why you have used this name in the title of your question and why you apparently are having trouble interpreting what @JimB meant by saying
If you need a separate process, you need to exec it yourself
"Multiprocessing" in Python
The thing is, Python's multiprocessing is quite a high-level thing which hides a whole lot of stuff under its hood. When you spawn a multiprocessing.Process and make it run a function, what really happens is this:
The Python interpreter creates another operating system process (using fork(2) on Unix-like systems or CreateProcess on Windows) and arranges for it to execute a Python interpreter, too. The crucial point is that you will now have two processes running two Python interpreters.
It is arranged for the Python interpreter in the child process to have a way to communicate with the Python interpreter in the parent process. This "communication link" necessarily involves some form of the IPC @JimB referred to. There is simply no other way to communicate data and actions between separate processes, exactly because a commodity contemporary OS provides strict process separation.
When you exchange Python objects between the processes, the two communicating Python interpreters serialize and deserialize them behind your back before sending them over their IPC link and after receiving them from there, correspondingly. This is implemented using the pickle module.
Back to Go
Go does not have any direct solution which would closely match Python's multiprocessing, and I really doubt it could have been sensibly implemented.
The chief reason for this is that Go is considerably lower-level than Python: it does not have Python's luxury of making sweeping assumptions about the types of values it manages, and it also strives to have as few hidden costs in its constructs as possible.
Go also strives to steer clear of "framework-style" approaches to solving problems, and to use "library-style" solutions when possible. (A good rundown of "framework vs library" is given, for instance, here.)
Go has everything in its standard library to implement something akin to Python's multiprocessing, but there is no ready-made framework-y solution for this.
So what you could do is roll along these lines:
Use os/exec to run another copy of your own process. Make sure the spawned process "knows" it's started in the special "slave" mode, so it can act accordingly.
Use any form of IPC to communicate with the new process. Exchanging data via the standard I/O streams of the child process is probably the simplest way to roll (except when you need to exchange opened files, but this is a harder topic, so let's not digress).
Use any suitable package in the encoding/ hierarchy, such as binary, gob or xml, to serialize and deserialize data when exchanging. The "go-to" solution is probably encoding/gob, but encoding/json will also do just fine.
Invent and implement a simple protocol to tell the child process what to do, and with which data, and how to communicate the results back to master.
Is it really worth the trouble?
I would say that no, it isn't, for a number of reasons:
Go has nothing like the dreaded GIL, so there's no need to sidestep it to achieve real parallelism when it is naturally possible.
Memory safety is all in your hands, and achieving it is not really that hard when you dutifully obey the principle that what is sent over a channel is now owned by the receiver. In other words, sending values over a channel is also the transfer of ownership of those values.
The Go toolchain has an integrated race detector, so you may run your test suite with the -race flag and create evaluation builds of your program using go build -race for the same purpose: when a program instrumented in such a way runs, the race detector reports any unsynchronized read/write memory access it observes. By default the program keeps running after a report; it can be made to stop at the first one by setting GORACE="halt_on_error=1". The report includes explanatory messages on what went wrong and where, with stack traces.
IPC is slow, so the gains may well be offset by the losses.
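The ownership principle from the list above can be illustrated with a short sketch. The batch type and function names are hypothetical. The point is only that the sending goroutine stops using the value once it has been sent.

```go
package main

import "fmt"

// batch is a hypothetical payload type.
type batch struct{ values []int }

// buildAndHandOff fills a batch on its own goroutine and then gives it away.
func buildAndHandOff(n int) <-chan *batch {
	out := make(chan *batch)
	go func() {
		b := &batch{}
		for i := 1; i <= n; i++ {
			b.values = append(b.values, i)
		}
		out <- b // ownership moves to the receiver here
		// The producer must not read or write b past this point.
	}()
	return out
}

// sum adds up the values of a batch.
func sum(b *batch) int {
	total := 0
	for _, v := range b.values {
		total += v
	}
	return total
}

func main() {
	b := <-buildAndHandOff(4)
	b.values = append(b.values, 10) // safe: only the receiver touches b now
	fmt.Println(sum(b))
}
```

Since exactly one goroutine uses the pointer at any time, no mutex is needed, and a run under -race should have nothing to report for this hand-off.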
All in all, I see no real reason to separate processes unless you're writing something like an e-mail processing server where this concept comes naturally.
Channels are meant for communication between goroutines. You shouldn't send and receive on an unbuffered channel in the same goroutine, as this code does, because the send blocks until some other goroutine receives:
sensorData <- string(msg.Payload())
fmt.Println(<-sensorData) //currently not printing anything
If you would like to test printing through a channel, you can use a buffered channel in the same goroutine to avoid blocking, like this:
sensorData := make(chan []byte, 1)
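A runnable sketch of the buffered variant follows. The echoThroughChannel name is made up, and the channel's element type is string here (an adjustment for this example) so that it matches the string(msg.Payload()) value being sent.

```go
package main

import "fmt"

// echoThroughChannel sends and receives in the same goroutine. This works
// only because the channel has room for one value, so the send returns
// immediately instead of waiting for a receiver.
func echoThroughChannel(payload []byte) string {
	sensorData := make(chan string, 1)
	sensorData <- string(payload)
	return <-sensorData
}

func main() {
	fmt.Println(echoThroughChannel([]byte("arduinoLED")))
}
```

Note that each value sent on a channel is received exactly once. Code that calls <-sensorData twice is waiting for two separate messages, so a received value that needs to be checked more than once should be stored in a variable first.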
Cheers
I'm learning Go and so far very impressed with it. I've read all the online docs at golang.org and am halfway through Chisnall's "The Go Programming Language Phrasebook". I get the concept of channels and think that they will be extremely useful. However, I must have missed something important along the way, as I can't see the point of one-way channels.
If I'm interpreting them correctly, a read-only channel can only be received on and a write-only channel can only be transmitted on, so why have a channel that you can send to and never receive on? Can they be cast from one "direction" to the other? If so, again, what's the point if there's no actual constraint? Are they nothing more than a hint to client code of the channel's purpose?
A channel can be made read-only to whoever receives it, while the sender still has a two-way channel to which they can write. For example:
func F() <-chan int {
	// Create a regular, two-way channel.
	c := make(chan int)
	go func() {
		defer close(c)
		// Do stuff
		c <- 123
	}()
	// Returning it, implicitly converts it to read-only,
	// as per the function return type.
	return c
}
Whoever calls F() receives a channel from which they can only read. This is mostly useful to avoid potential misuse of a channel at compile time. Because read/write-only channels are distinct types, the compiler can use its existing type-checking mechanisms to ensure the caller does not try to write stuff into a channel it has no business writing to.
I think the main motivation for read-only channels is to prevent corruption and panics of the channel. Imagine if you could write to the channel returned by time.After. This could mess up a lot of code.
Also, panics can occur if you:
close a channel more than once
write to a closed channel
These operations are compile-time errors for read-only channels, but they can cause nasty race conditions when multiple go-routines can write/close a channel.
One way of getting around this is to never close channels and let them be garbage collected. However, close is not just for cleanup, but it actually has use when the channel is ranged over:
func consumeAll(c <-chan bool) {
	for b := range c {
		...
	}
}
If the channel is never closed, this loop will never end. If multiple go-routines are writing to a channel, then there's a lot of book-keeping that has to go on with deciding which one will close the channel.
Since you cannot close a read-only channel, this makes it easier to write correct code. As @jimt pointed out in his comment, you cannot convert a read-only channel to a writeable channel, so you're guaranteed that only parts of the code with access to the writable version of a channel can close/write to it.
Edit:
As for having multiple readers, this is completely fine, as long as you account for it. This is especially useful when used in a producer/consumer model. For example, say you have a TCP server that just accepts connections and writes them to a queue for worker threads:
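func produce(l *net.TCPListener, c chan<- net.Conn) {
	for {
		conn, err := l.Accept()
		if err != nil {
			close(c) // the single producer owns the channel and ends the consumers' loops
			return
		}
		c <- conn
	}
}
func consume(c <-chan net.Conn) {
	for conn := range c {
		// do something with conn
	}
}
func main() {
	c := make(chan net.Conn, 10)
	for i := 0; i < 10; i++ {
		go consume(c)
	}
	// Keyed fields: TCPAddr has more than two fields in current Go.
	addr := net.TCPAddr{IP: net.ParseIP("127.0.0.1"), Port: 3000}
	l, err := net.ListenTCP("tcp", &addr)
	if err != nil {
		panic(err)
	}
	produce(l, c)
}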
Likely your connection handling will take longer than accepting a new connection, so you want to have lots of consumers with a single producer. Multiple producers is more difficult (because you need to coordinate who closes the channel) but you can add some kind of a semaphore-style channel to the channel send.
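For the multiple-producer case, one common way to coordinate the close is sketched below. It uses a sync.WaitGroup rather than the semaphore-style channel mentioned above, and the function names and the int payload are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// produce sends n values tagged with the producer's id and reports
// completion through the WaitGroup. It never closes out itself.
func produce(id, n int, out chan<- int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < n; i++ {
		out <- id*100 + i
	}
}

// runProducers starts several producers and returns how many values the
// single consumer loop received before the channel was closed.
func runProducers(producers, n int) int {
	out := make(chan int)
	var wg sync.WaitGroup
	for id := 0; id < producers; id++ {
		wg.Add(1)
		go produce(id, n, out, &wg)
	}
	// Exactly one goroutine closes the channel, and only after every
	// producer has finished, so no send can hit a closed channel.
	go func() {
		wg.Wait()
		close(out)
	}()
	count := 0
	for range out {
		count++
	}
	return count
}

func main() {
	fmt.Println(runProducers(3, 4))
}
```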
Go channels are modelled on Hoare's Communicating Sequential Processes, a process algebra for concurrency that is oriented around event flows between communicating actors (small 'a'). As such, channels have a direction because they have a send end and a receive end, i.e. a producer of events and a consumer of events. A similar model is used in Occam and Limbo also.
This is important - it would be hard to reason about deadlock issues if a channel-end could arbitrarily be re-used as both sender and receiver at different times.