I have a job dispatcher that collates a large amount of data from many TCP sockets. The code is the result of the approach discussed in "Large number of transient objects - avoiding contention", and it largely works: CPU usage is down by a huge amount and locking is no longer an issue.
From time to time my application locks up and the "Channel length" log is the only thing that keeps repeating as data is still coming in from my sockets. However the count remains at 5000 and no downstream processing is taking place.
I think the issue might be a race condition and the line it is possibly getting hung up on is channel <- msg within the select of the jobDispatcher. Trouble is I can't work out how to verify this.
I suspect that, because select picks among its ready cases at random, the worker goroutine is returning while the shutdownChan case doesn't get a chance to run. Then data hits inboundFromTCP and it blocks!
Someone might spot something really obviously wrong here. And offer a solution hopefully!?
var MessageQueue = make(chan *trackingPacket_v1, 5000)
func init() {
go jobDispatcher(MessageQueue)
}
func addMessage(trackingPacket *trackingPacket_v1) {
// Send the packet to the buffered queue!
log.Println("Channel length:", len(MessageQueue))
MessageQueue <- trackingPacket
}
func jobDispatcher(inboundFromTCP chan *trackingPacket_v1) {
var channelMap = make(map[string]chan *trackingPacket_v1)
// Channel that listens for the strings that want to exit
shutdownChan := make(chan string)
for {
select {
case msg := <-inboundFromTCP:
log.Println("Got packet", msg.Avr)
channel, ok := channelMap[msg.Avr]
if !ok {
packetChan := make(chan *trackingPacket_v1)
channelMap[msg.Avr] = packetChan
go processPackets(packetChan, shutdownChan, msg.Avr)
packetChan <- msg
continue
}
channel <- msg
case shutdownString := <-shutdownChan:
log.Println("Shutting down:", shutdownString)
channel, ok := channelMap[shutdownString]
if ok {
delete(channelMap, shutdownString)
close(channel)
}
}
}
}
func processPackets(ch chan *trackingPacket_v1, shutdown chan string, id string) {
var messages = []*trackingPacket_v1{}
tickChan := time.NewTicker(time.Second * 1)
defer tickChan.Stop()
hasCheckedData := false
for {
select {
case msg := <-ch:
log.Println("Got a messages for", id)
messages = append(messages, msg)
hasCheckedData = false
case <-tickChan.C:
messages = cullChanMessages(messages)
if len(messages) == 0 {
messages = nil
shutdown <- id
return
}
// No point running checking when packets have not changed!!
if hasCheckedData == false {
processMLATCandidatesFromChan(messages)
hasCheckedData = true
}
case <-time.After(time.Duration(time.Second * 60)):
log.Println("This channel has been around for 60 seconds which is too much, kill it")
messages = nil
shutdown <- id
return
}
}
}
Update 01/20/16
I tried to rework with the channelMap as a global with some mutex locking but it ended up deadlocking still.
Slightly tweaked the code, still locks but I don't see how this one does!!
https://play.golang.org/p/PGpISU4XBJ
Update 01/21/17
After some recommendations I put this into a standalone working example so people can see. https://play.golang.org/p/88zT7hBLeD
It is a long running process so will need running locally on a machine as the playground kills it. Hopefully this will help get to the bottom of it!
I'm guessing that your problem is getting stuck doing this channel <- msg at the same time as the other goroutine is doing shutdown <- id.
Since neither the channel nor the shutdown channels are buffered, they block waiting for a receiver. And they can deadlock waiting for the other side to become available.
There are a couple of ways to fix it. You could declare both of those channels with a buffer of 1.
Or instead of signalling by sending a shutdown message, you could do what Google's context package does and send a shutdown signal by closing the shutdown channel. Look at https://golang.org/pkg/context/ especially WithCancel, WithDeadline and the Done functions.
You might be able to use context to remove your own shutdown channel and timeout code.
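As a rough sketch of the context idea (the names collect and runWorker are made up for illustration and are not from the question's code), a worker can watch ctx.Done() instead of a hand-rolled shutdown channel. Cancelling closes the Done channel, so the side that signals shutdown never blocks:

```go
package main

import (
	"context"
	"fmt"
)

// collect is a hypothetical worker: it gathers values until ctx is cancelled.
func collect(ctx context.Context, ch <-chan int) []int {
	var got []int
	for {
		select {
		case <-ctx.Done(): // closed by cancel(); every waiting goroutine sees it
			return got
		case v := <-ch:
			got = append(got, v)
		}
	}
}

// runWorker feeds values to one worker, cancels it, and returns what it collected.
func runWorker(values []int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	ch := make(chan int) // unbuffered: each send returns only after the worker received
	out := make(chan []int)
	go func() { out <- collect(ctx, ch) }()
	for _, v := range values {
		ch <- v
	}
	cancel() // never blocks, unlike a send on an unbuffered shutdown channel
	return <-out
}

func main() {
	fmt.Println(runWorker([]int{1, 2, 3}))
}
```

context.WithTimeout or context.WithDeadline could replace the 60 second time.After case in the same way, assuming one context per worker.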
And JimB has a point about shutting down the goroutine while it might still be receiving on the channel. What you should do is send the shutdown message (or close, or cancel the context) and continue to process messages until your ch channel is closed (detect that with case msg, ok := <-ch:), which would happen after the shutdown is received by the sender.
That way you get all of the messages that were incoming until the shutdown actually happened, and should avoid a second deadlock.
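Here is a minimal sketch of that "ask to shut down, but keep receiving until ch is closed" idea. It is not the asker's code: worker and dispatch are simplified stand-ins with int messages, and the limit parameter is an invented trigger for the shutdown request. The shutdown request is sent from inside the same select that receives from ch (using a nil channel to disable the send until it is needed), so the worker is never stuck on shutdown <- id while the dispatcher is stuck on ch <- msg:

```go
package main

import "fmt"

// worker is a hypothetical stand-in for processPackets. Once it has seen
// limit messages it asks to be shut down, but it keeps receiving until the
// dispatcher closes ch.
func worker(id string, ch <-chan int, shutdown chan<- string, limit int, done chan<- []int) {
	var got []int
	var req chan<- string // nil for now; a send on a nil channel is never selected
	asked := false
	for {
		select {
		case msg, ok := <-ch:
			if !ok { // dispatcher closed ch: now it is safe to exit
				done <- got
				return
			}
			got = append(got, msg)
			if !asked && len(got) >= limit {
				req = shutdown // arm the shutdown request
			}
		case req <- id:
			req = nil // request delivered once; keep draining ch
			asked = true
		}
	}
}

// dispatch is a hypothetical, single-worker version of jobDispatcher.
func dispatch(in <-chan int, limit int) []int {
	ch := make(chan int)
	shutdown := make(chan string)
	done := make(chan []int, 1)
	go worker("w1", ch, shutdown, limit, done)
	for {
		select {
		case msg, ok := <-in:
			if !ok {
				close(ch)
				return <-done
			}
			ch <- msg // the worker is still receiving, so this send completes
		case <-shutdown:
			close(ch) // only the sender closes the channel
			return <-done
		}
	}
}

func main() {
	in := make(chan int, 10)
	for i := 0; i < 10; i++ {
		in <- i
	}
	close(in)
	got := dispatch(in, 3)
	fmt.Println("worker received at least 3 messages:", len(got) >= 3)
}
```

How many messages arrive after the request is armed depends on which ready case select happens to pick, so only the lower bound is predictable.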
I'm new to Go but in this code here
case msg := <-inboundFromTCP:
log.Println("Got packet", msg.Avr)
channel, ok := channelMap[msg.Avr]
if !ok {
packetChan := make(chan *trackingPacket_v1)
channelMap[msg.Avr] = packetChan
go processPackets(packetChan, shutdownChan, msg.Avr)
packetChan <- msg
continue
}
channel <- msg
Aren't you putting something in channel (unbuffered?) here
channel, ok := channelMap[msg.Avr]
So wouldn't you need to empty out that channel before you can add the msg here?
channel <- msg
Like I said, I'm new to Go so I hope I'm not being goofy. :)
In a project the program receives data via websocket. This data needs to be processed by n algorithms. The amount of algorithms can change dynamically.
My attempt is to create some pub/sub pattern where subscriptions can be started and canceled on the fly. Turns out that this is a bit more challenging than expected.
Here's what I came up with (which is based on https://eli.thegreenplace.net/2020/pubsub-using-channels-in-go/):
package pubsub
import (
"context"
"sync"
"time"
)
type Pubsub struct {
sync.RWMutex
subs []*Subsciption
closed bool
}
func New() *Pubsub {
ps := &Pubsub{}
ps.subs = []*Subsciption{}
return ps
}
func (ps *Pubsub) Publish(msg interface{}) {
ps.RLock()
defer ps.RUnlock()
if ps.closed {
return
}
for _, sub := range ps.subs {
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
}
}
func (ps *Pubsub) Subscribe() (context.Context, *Subsciption, error) {
ps.Lock()
defer ps.Unlock()
// prep channel
ctx, cancel := context.WithCancel(context.Background())
sub := &Subsciption{
Data: make(chan interface{}, 1),
cancel: cancel,
ps: ps,
}
// prep subsciption
ps.subs = append(ps.subs, sub)
return ctx, sub, nil
}
func (ps *Pubsub) unsubscribe(s *Subsciption) bool {
ps.Lock()
defer ps.Unlock()
found := false
index := 0
for i, sub := range ps.subs {
if sub == s {
index = i
found = true
}
}
if found {
s.cancel()
ps.subs[index] = ps.subs[len(ps.subs)-1]
ps.subs = ps.subs[:len(ps.subs)-1]
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
}
return found
}
func (ps *Pubsub) Close() {
ps.Lock()
defer ps.Unlock()
if !ps.closed {
ps.closed = true
for _, sub := range ps.subs {
sub.cancel()
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(sub.Data)
}
}
}
type Subsciption struct {
Data chan interface{}
cancel func()
ps *Pubsub
}
func (s *Subsciption) Unsubscribe() {
s.ps.unsubscribe(s)
}
As mentioned in the comments there are (at least) two issues with this:
ISSUE1:
After running an implementation of this for a while, I get a few errors of this kind:
goroutine 120624 [runnable]:
bm/internal/pubsub.(*Pubsub).Publish.func1(0x8586c0, 0xc00b44e880, 0xc008617740)
/home/X/Projects/bm/internal/pubsub/pubsub.go:30
created by bookmaker/internal/pubsub.(*Pubsub).Publish
/home/X/Projects/bm/internal/pubsub/pubsub.go:30 +0xbb
Without really understanding this it appears to me that the goroutines created in Publish() do accumulate/leak. Is this correct and what am I doing wrong here?
ISSUE2:
When I end a subscription via Unsubscribe(), it can happen that Publish() tries to write to a closed channel and panics. To mitigate this I have created a goroutine that closes the channel with a delay. This feels really far from best practice, but I was not able to find a proper solution. What would be a deterministic way to do this?
Here's a little playground for you to test with: https://play.golang.org/p/K-L8vLjt7_9
Before diving into your solution and its issues, let me recommend again another Broker approach presented in this answer: How to broadcast message using channel
Now on to your solution.
Whenever you launch a goroutine, always think about how it will end, and make sure it does end if the goroutine is not meant to run for the lifetime of your app.
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
This goroutine tries to send a value on ch. This may be a blocking operation: it will block if ch's buffer is full and there is no ready receiver on ch. This is out of the control of the launched goroutine, and also out of the control of the pubsub package. This may be fine in some cases, but this already places a burden on the users of the package. Try to avoid these. Try to create APIs that are easy to use and hard to misuse.
Also, launching a goroutine just to send a value on a channel is a waste of resources (goroutines are cheap and light, but you shouldn't spam them just because you can).
You do it because you don't want to get blocked. To avoid blocking, you may use a buffered channel with a "reasonably" high buffer. Yes, this doesn't solve the blocking issue; it only helps with "slow" clients receiving from the channel.
To "truly" avoid blocking without launching a goroutine, you may use non-blocking send:
select {
case ch <- msg:
default:
// ch's buffer is full, we cannot deliver now
}
If send on ch can proceed, it will happen. If not, the default branch is chosen immediately. You have to decide what to do then. Is it acceptable to "lose" a message? Is it acceptable to wait for some time until "giving up"? Or is it acceptable to launch a goroutine to do this (but then you'll be back at what we're trying to fix here)? Or is it acceptable to get blocked until the client can receive from the channel...
If you choose a reasonably high buffer and still encounter a situation where it gets full, it may be acceptable to block until the client can advance and receive from the channel. If it can't, then your whole app might be in an unacceptable state, and it might be acceptable to "hang" or "crash".
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
Closing a channel is a signal to the receiver(s) that no more values will be sent on the channel. So it should always be the sender's job (and responsibility) to close the channel. By launching a goroutine to close the channel, you "hand" that job and responsibility to another "entity" (a goroutine) that is not synchronized with the sender. This may easily end up in a panic (sending on a closed channel is a runtime panic, for other axioms see How does a non initialized channel behave?). Don't do that.
Yes, this was necessary because you launched goroutines to send. If you don't do that, then you may close "in-place", without launching a goroutine, because then the sender and closer will be the same entity: the Pubsub itself, whose sending and closing operations are protected by a mutex. So solving the first issue solves the second naturally.
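To make that concrete, here is a small sketch (not a drop-in replacement for your package: the Broker type, string messages and the returned delivery count are simplifications I made up for the example). Publish does a non-blocking send while holding the mutex, and Unsubscribe closes the channel in place under the same mutex, so a send can never race with the close:

```go
package main

import (
	"fmt"
	"sync"
)

// Broker is a hypothetical, stripped-down pubsub.
type Broker struct {
	mu   sync.Mutex
	subs map[chan string]struct{}
}

func NewBroker() *Broker {
	return &Broker{subs: make(map[chan string]struct{})}
}

// Subscribe registers a new channel with the given buffer size.
func (b *Broker) Subscribe(buf int) chan string {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan string, buf)
	b.subs[ch] = struct{}{}
	return ch
}

// Publish never blocks and never spawns goroutines. It returns how many
// subscribers accepted the message; the rest had a full buffer.
func (b *Broker) Publish(msg string) (delivered int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.subs {
		select {
		case ch <- msg:
			delivered++
		default:
			// Buffer full: this sketch drops the message for that subscriber.
		}
	}
	return delivered
}

// Unsubscribe removes and closes the channel. Publish holds the same mutex,
// so no send can be in flight when close runs.
func (b *Broker) Unsubscribe(ch chan string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.subs[ch]; ok {
		delete(b.subs, ch)
		close(ch)
	}
}

func main() {
	b := NewBroker()
	ch := b.Subscribe(1)
	fmt.Println(b.Publish("a")) // 1: fits in the buffer
	fmt.Println(b.Publish("b")) // 0: buffer full, dropped
	fmt.Println(<-ch)           // a
	b.Unsubscribe(ch)
	fmt.Println(b.Publish("c")) // 0: no subscribers left, no panic
}
```

Dropping on a full buffer is only one of the policies discussed above; blocking or waiting with a timeout would replace the default branch.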
In general if there are multiple senders for a channel, then closing the channel must be coordinated. There must be a single entity (often not any of the senders) that waits for all senders to finish, practically using a sync.WaitGroup, and then that single entity can close the channel, safely. See Closing channel of unknown length.
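A minimal illustration of the multiple-senders case, with an invented fanIn helper: each sender calls Done on a sync.WaitGroup when it finishes, and one separate goroutine waits for all of them before closing the channel.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fanIn starts several senders on one channel and returns everything received.
// The channel is closed by a single goroutine, and only after all senders are done.
func fanIn(senders, perSender int) []int {
	ch := make(chan int)
	var wg sync.WaitGroup
	for s := 0; s < senders; s++ {
		wg.Add(1)
		go func(s int) {
			defer wg.Done()
			for i := 0; i < perSender; i++ {
				ch <- s*perSender + i
			}
		}(s)
	}
	// The closer is not one of the senders; it only waits and closes.
	go func() {
		wg.Wait()
		close(ch)
	}()
	var out []int
	for v := range ch { // ends when the closer closes ch
		out = append(out, v)
	}
	sort.Ints(out) // arrival order is not deterministic
	return out
}

func main() {
	fmt.Println(fanIn(3, 2)) // [0 1 2 3 4 5]
}
```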
I was following the chat client/server example for the gorilla websocket library.
https://github.com/gorilla/websocket/blob/master/examples/chat/hub.go#L36
I tried modifying the code to notify other clients when a new client connects, like so:
for {
select {
case client := <-h.register:
h.clients[client] = true
// My addition. Hangs after this (no further register/unregister events are processed):
h.broadcast <- []byte("Another client connected!")
case client := <-h.unregister:
if _, ok := h.clients[client]; ok {
delete(h.clients, client)
close(client.send)
}
case message := <-h.broadcast:
for client := range h.clients {
select {
case client.send <- message:
default:
close(client.send)
delete(h.clients, client)
}
}
}
}
}
My understanding is on the next iteration of the outer for loop, the broadcast channel should receive that data and follow the logic in the message case, but it just hangs.
Why? I can't spot any reason. No further channel events are processed (nothing on register/unregister or broadcast), which makes me think it's some kind of unbuffered channel mechanism it's stuck on, but I don't see how?
The hub's broadcast channel is unbuffered. Communication on an unbuffered channel waits for a ready sender and a ready receiver. The hub goroutine blocks because the goroutine cannot be ready to send and receive at the same time.
Changing the channel from an unbuffered channel to a buffered channel does not fix the problem. Consider the case where the buffer capacity is one:
return &Hub{
broadcast: make(chan []byte, 1),
...
}
with this timeline:
1 clientA: client.hub.register <- client
2 clientB: c.hub.broadcast <- message
3 hub: case client := <-h.register:
4 hub: h.broadcast <- []byte("Another client connected!")
The hub blocks at #4 because the channel was filled to capacity at #2. Increasing the channel capacity to two or more does not fix the problem because any number of clients can broadcast a message while another client is registering.
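The timeline can be reproduced in isolation without hanging the program by probing the channel with a non-blocking send. This is only an illustration: trySend is a helper invented here, not part of the gorilla example.

```go
package main

import "fmt"

// trySend reports whether a send on ch could proceed right now.
func trySend(ch chan []byte, msg string) bool {
	select {
	case ch <- []byte(msg):
		return true
	default:
		return false // a plain `ch <- msg` would block here
	}
}

func main() {
	// Unbuffered: with no receiver waiting, the send cannot proceed.
	unbuffered := make(chan []byte)
	fmt.Println(trySend(unbuffered, "hello")) // false

	// Capacity one: step 2 fills the buffer, so step 4 would block the hub.
	broadcast := make(chan []byte, 1)
	fmt.Println(trySend(broadcast, "from clientB"))              // true
	fmt.Println(trySend(broadcast, "Another client connected!")) // false
}
```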
To fix the problem, move the broadcast code to a function and call that function from both cases in the select:
// sendAll sends message to all registered clients.
// This method must only be called by Hub.run.
func (h *Hub) sendAll(message []byte) {
for client := range h.clients {
select {
case client.send <- message:
default:
close(client.send)
delete(h.clients, client)
}
}
}
func (h *Hub) run() {
for {
select {
case client := <-h.register:
h.clients[client] = true
h.sendAll([]byte("Another client connected!"))
case client := <-h.unregister:
if _, ok := h.clients[client]; ok {
delete(h.clients, client)
close(client.send)
}
case message := <-h.broadcast:
h.sendAll(message)
}
}
}
Your channels are unbuffered, which means that each read/write blocks until another goroutine performs the opposite operation on the same channel.
When you try to write to h.broadcast the goroutine stops, waiting for a reader. But the same goroutine is supposed to act as the reader of this channel, which never happens because the goroutine is blocked by the write. Thus the program deadlocks.
Yeah, this won't work. You cannot send and receive on the same unbuffered channel in the same goroutine.
The line h.broadcast <- []byte("Another client connected!") blocks until another goroutine receives from the channel. An easy solution is to give the broadcast channel a buffer of length 1: broadcast := make(chan []byte, 1)
You can see it in this playground example
// c := make(chan int) <- This will hang
c := make(chan int, 1)
c <- 1
fmt.Println(<-c)
Remove the length 1 buffer and the whole system deadlocks. One issue you can run into is that if 2 clients register at the same time, 2 items can be waiting to be stuffed into the broadcast channel, and we're back to the same problem as with the unbuffered channel. You can avoid this and keep 1 goroutine like so:
for {
select {
case message := <-h.broadcast:
// ...
default:
}
select { // This select statement can only add 1 item to broadcast at most
case client := <-h.register:
// ...
h.broadcast <- []byte("Another client connected!")
}
}
}
However, this will still break if another goroutine is also adding to the broadcast channel. So I'd go with Cerise Limon's solution, or buffer the channel enough that other goroutines won't ever fill the buffer.
I have code that looks like this, where I'm listening to a channel up until a timeout interval. Let's call this goroutine 1:
select {
case <-time.After(TimeoutInterval):
mu.Lock()
defer mu.Unlock()
delete(msgChMap, index)
return ""
case msg := <-msgCh:
return msg
}
Elsewhere, I have a goroutine 2 that runs something like this where it grabs the appropriate msgCh from a Map, deletes the entry in the map and then sends a message through the channel.
mu.Lock()
msgCh, ok := msgChMap[index]
delete(msgChMap, index)
mu.Unlock()
if ok {
msgCh <- "yay"
}
It seems like it is possible for me to grab the message channel msgCh from the Map, try to send a message but because TimeoutInterval has already passed, there will be nothing listening to the channel, and my code will get stuck waiting for a listener. If I put the lock after sending yay to the msgCh, it seems possible that I could deadlock as 2 will be waiting for a listener to the channel and is not releasing the lock, but 1 is no longer listening but requires the lock.
What is a general pattern to avoid getting stuck waiting for a listener? Perhaps go is smart enough to not get stuck here.
You can prevent getting stuck when waiting for a listener by using select for the sender.
With select, the sender can have more than one case to handle this situation:
mu.Lock()
msgCh, ok := msgChMap[index]
delete(msgChMap, index)
mu.Unlock()
if ok {
select {
// listener is available
case msgCh <- "yay":
fmt.Println("sent")
// if not available (execute immediately)
default:
fmt.Println("no available listener")
// ...just ignore or do something else
}
}
Or waiting for a short time
mu.Lock()
msgCh, ok := msgChMap[index]
delete(msgChMap, index)
mu.Unlock()
if ok {
select {
// listener is available
case msgCh <- "yay":
fmt.Println("sent")
// if not available, waiting for listener
case <-time.After(30 * time.Second):
fmt.Println("after 30 seconds, still no available listener")
// ...just ignore or do something else
}
}
The problem here is that the channel reader may stop without the writer knowing it. It should be possible to structure this solution so that this situation never happens, but ignoring that for now, for this specific problem what you need is atomic access to the channel itself, along with a flag for channel status:
type channel struct {
sync.Mutex
msgCh chan Msg
active bool
}
Writing to the channel is now done by locking it:
ch.Lock()
if ch.active {
ch.msgCh<-data
}
ch.Unlock()
And when you "inactivate" the channel, reset the flag:
case <-time.After(TimeoutInterval):
mu.Lock()
defer mu.Unlock()
ch.Lock()
defer ch.Unlock()
delete(msgChMap, index)
ch.active=false
return ""
And of course, with this now you have to keep a *channel in your map.
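A different route, not taken from the answers above and offered only as a sketch: if each msgCh carries at most one message (which seems to be the case here, since the sender deletes the map entry before sending), giving the channel a capacity of 1 means the single send always has room and never needs a listener. The registry type and its methods are invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical wrapper around msgChMap and its mutex.
type registry struct {
	mu sync.Mutex
	m  map[int]chan string
}

func newRegistry() *registry {
	return &registry{m: make(map[int]chan string)}
}

// register creates the reply channel with room for exactly one message.
func (r *registry) register(index int) chan string {
	ch := make(chan string, 1)
	r.mu.Lock()
	r.m[index] = ch
	r.mu.Unlock()
	return ch
}

// deliver removes the entry and sends at most once, so the send cannot block
// even if the waiting goroutine has already timed out and gone away.
func (r *registry) deliver(index int, msg string) bool {
	r.mu.Lock()
	ch, ok := r.m[index]
	delete(r.m, index)
	r.mu.Unlock()
	if ok {
		ch <- msg // buffer has room: this is the first and only send
	}
	return ok
}

func main() {
	r := newRegistry()
	ch := r.register(7)
	fmt.Println(r.deliver(7, "yay"))   // true, although nobody is receiving yet
	fmt.Println(r.deliver(7, "again")) // false: the entry is already gone
	fmt.Println(<-ch)                  // yay
}
```

If the listener has timed out, the message simply stays in the buffer and is garbage collected together with the channel.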
I'm running single goroutine to handle messages channel related to some user. After processing messages the user state is updated and stored in database by the goroutine. While request to database is in progress a number of messages can be sent to the channel. I would like to process them all before sending another request to the database.
Currently I'm using len(ch) to check number of messages in the channel and reading them in a for loop.
func (c *consumer) handleUser(userID string, ch chan Message) {
user := c.db.LoadUser(userID)
for {
var msgs []Message
for n := len(ch); n > 0; n-- {
msgs = append(msgs, <-ch)
}
apply.Messages(user, msgs)
c.db.SaveUser(user)
}
}
ch := make(chan Message, 100)
go c.handleUser("user-1", ch)
I was searching in the internet if this is some common pattern but I couldn't find similar solutions and I'm wondering if my approach is valid/idiomatic for go programs.
Your solution makes the spawned goroutine spin: whenever the channel is empty, len(ch) is 0, the inner loop is skipped, and the outer loop runs again immediately with an empty batch. In other words, the goroutine never blocks at all.
Here you are trying to process multiple messages in one batch. There are different ways to implement that. But the main question to answer is: how do you know that the batch of messages is complete? The sender goroutine might have this knowledge and it can pack all the messages in one slice. On the other hand, you might not know when the batch is ready. In those cases, you need to use a timeout, like the following example.
func (c *consumer) handleUser(userID string, ch chan Message) {
user := c.db.LoadUser(userID)
//The batch is declared outside the loop so it survives between iterations
var msgs []Message
for {
select {
case msg := <-ch:
//Append the message in the current batch slice
msgs = append(msgs, msg)
//No new message for 5 seconds: process the batch
case <-time.After(time.Second * 5):
if len(msgs) > 0 {
apply.Messages(user, msgs)
c.db.SaveUser(user)
//Start a fresh batch
msgs = nil
}
}
}
}
}
Note that a possible goroutine executing this function runs only when there is actually something to do.
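If the goal is simply "take everything that is already queued, then save once", another common shape is to block for the first message and then drain the rest with a non-blocking receive. This is a sketch with int messages and an invented nextBatch helper, not the code from the question:

```go
package main

import "fmt"

// nextBatch blocks until at least one message is available, then drains
// whatever else is already queued, up to max messages. The second result is
// false once ch is closed and empty.
func nextBatch(ch <-chan int, max int) ([]int, bool) {
	first, ok := <-ch // blocks: no spinning while the channel is empty
	if !ok {
		return nil, false
	}
	batch := []int{first}
	for len(batch) < max {
		select {
		case m, ok := <-ch:
			if !ok {
				return batch, true // closed mid-drain: hand back what we have
			}
			batch = append(batch, m)
		default:
			return batch, true // nothing else queued right now
		}
	}
	return batch, true
}

func main() {
	ch := make(chan int, 10)
	ch <- 1
	ch <- 2
	ch <- 3
	fmt.Println(nextBatch(ch, 2)) // [1 2] true
	fmt.Println(nextBatch(ch, 2)) // [3] true
	close(ch)
	fmt.Println(nextBatch(ch, 2)) // [] false
}
```

In handleUser, each returned batch would be applied and saved before calling nextBatch again, so messages that arrive during the database write end up in the next batch.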
I'm new to goroutines and trying to work out the idiomatic way to organise this code. My program will generate async status events that I want to transmit to a server over a websocket. Right now I have a global channel messagesToServer to receive the status messages. The idea is that it will send the data if we currently have a websocket open, or quietly drop it if the connection to the server is currently closed or unavailable.
Relevant snippets are below. I don't really like the non-blocking send - if for some reason my writer goroutine took a while to process a message I think it could end up dropping a quick second message for no reason?
But if I use a blocking send, sendStatusToServer could block something that shouldn't be blocked if the connection is offline. I could try to track connected/disconnected state but if a message was sent at the same time as the disconnection occurred I think there would be a race condition.
Is there a tidy way I can write this?
var (
messagesToServer chan common.StationStatus
)
// ...
func sendStatusToServer(msg common.StationStatus) {
// Must be non-blocking in case we're not connected
select {
case messagesToServer <- msg:
break
default:
break
}
}
// ...
// after making websocket connection
log.Println("Connected to central server");
finished := make(chan struct{})
// Writer
go func() {
for {
select {
case msg := <-messagesToServer:
var buff bytes.Buffer
enc := gob.NewEncoder(&buff)
err = enc.Encode(msg)
conn.WriteMessage(websocket.BinaryMessage, buff.Bytes()); // ignore errors by design
case <-finished:
return;
}
}
}()
// Reader as busy loop on this goroutine
for {
messageType, p, err := conn.ReadMessage()