Concurrency Problems using unbuffered channel - go

I created a tool in Go that brute-forces subdomains using Go concurrency. The problem is that it only shows the first few results: if I specify 10 threads, it shows 10, and if I specify 100, it shows 100. Any solutions for this? I am following this example.
func CheckWildcardSubdomain(state *State, domain string, words <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for {
preparedSubdomain := <-words + "." + domain
ipAddress, err := net.LookupHost(preparedSubdomain)
if err == nil {
if !state.WildcardIPs.ContainsAny(ipAddress) {
if state.Verbose == true {
fmt.Printf("\n%s", preparedSubdomain)
}
state.FinalResults = append(state.FinalResults, preparedSubdomain)
}
}
}
}
func RemoveWildcardSubdomains(state *State, subdomains []string) []string {
var wg sync.WaitGroup
var channel = make(chan string)
wg.Add(state.Threads)
for i := 0; i < state.Threads; i++ {
go CheckWildcardSubdomain(state, state.Domain, channel, &wg)
}
for _, entry := range subdomains {
sub := strings.Join(strings.Split(entry, ".")[:2][:], ".")
channel <- sub
}
close(channel)
wg.Wait()
return state.FinalResults
}
Thanks in advance for your help.

Two mistakes immediately stand out.
First, in CheckWildcardSubdomain() you should range over the words channel like this:
for word := range words {
preparedSubdomain := word + "." + domain
// ...
}
A for ... range loop over a channel terminates once the channel has been closed and every value sent on it before the close has been received. Note that a plain receive on a closed channel neither blocks nor panics. Instead, it yields the zero value of the channel's element type. So your original loop never terminates: after the close it keeps receiving empty strings forever, and wg.Wait() never returns. Spec: Receive operator:
A receive operation on a closed channel can always proceed immediately, yielding the element type's zero value after any previously sent values have been received.
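Here is a tiny sketch of both rules, using only the standard library (rangeThenReceive is a made-up name for the demo). The range loop stops once the closed channel is drained, and a further plain receive returns the zero value immediately, with the second "ok" result reporting false.

```go
package main

import "fmt"

// rangeThenReceive sends two words on a buffered channel, closes it,
// collects them with for ... range, then performs one more plain receive.
func rangeThenReceive() (got []string, extra string, ok bool) {
	words := make(chan string, 3)
	words <- "www"
	words <- "mail"
	close(words)
	for w := range words { // stops after "www" and "mail" have been received
		got = append(got, w)
	}
	extra, ok = <-words // closed and drained: zero value "" and ok == false
	return got, extra, ok
}

func main() {
	got, extra, ok := rangeThenReceive()
	fmt.Println(got, extra == "", ok) // [www mail] true false
}
```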
Second, inside CheckWildcardSubdomain() the state.FinalResults field is read and modified concurrently without synchronization. This is a data race, and its behavior is undefined.
You must synchronize access to this field, e.g. using a mutex, or you should find other ways to communicate and collect results, e.g. using a channel.
See this related question for an elegant, efficient and scalable way to do it:
Is this an idiomatic worker thread pool in Go?
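Applied to the question, a minimal sketch of both fixes might look like the following. It is only an illustration: filterConcurrently and its keep callback are hypothetical names, with keep standing in for the net.LookupHost call plus the wildcard check so the example runs without network access. Each worker ranges over the channel, and the shared result slice is guarded by a sync.Mutex.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// filterConcurrently runs `workers` goroutines that range over a channel of
// candidate names and keep those accepted by keep. keep is a hypothetical
// stand-in for the DNS lookup plus wildcard check.
func filterConcurrently(candidates []string, workers int, keep func(string) bool) []string {
	var (
		mu      sync.Mutex // guards results
		results []string
		wg      sync.WaitGroup
	)
	ch := make(chan string)
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for c := range ch { // ends once ch is closed and drained
				if keep(c) {
					mu.Lock()
					results = append(results, c)
					mu.Unlock()
				}
			}
		}()
	}
	for _, c := range candidates {
		ch <- c
	}
	close(ch)
	wg.Wait() // returns because every worker's range loop terminates
	return results
}

func main() {
	names := []string{"www.example.com", "mail.example.com", "nope.example.com"}
	found := filterConcurrently(names, 3, func(s string) bool {
		return s != "nope.example.com"
	})
	sort.Strings(found) // worker order is nondeterministic
	fmt.Println(found)
}
```

The result order depends on scheduling, which is why main sorts before printing. Sending results on a channel instead of locking is an equally valid choice, as the linked question shows.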

Related

how to batch dealing with files using Goroutine?

Assuming I have a bunch of files to deal with (say 1000 or more), first they should be processed by function A(), function A() will generate a file, then this file will be processed by B().
Doing it one by one is too slow, so I'm thinking of processing 5 files at a time using goroutines (we cannot process too many at a time because the CPU cannot handle it).
I'm a newbie in Go and I'm not sure my thinking is correct. I think function A() is a producer and function B() is a consumer: B() will handle the file produced by A(). I wrote some code below; forgive me, I really don't know how to write it. Can anyone give me some help? Thank you in advance!
package main
import "fmt"
var Box = make(chan string, 1024)
func A(file string) {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1"
Box <- fileGenByA
}
func B(file string) {
fmt.Println(file, "is processing in func B()...")
}
func main() {
// assuming that this is the file list read from a directory
fileList := []string{
"/path/to/file1",
"/path/to/file2",
"/path/to/file3",
}
// it seems I can't do this, because fileList may have 1000 or more file
for _, v := range fileList {
go A(v)
}
// can I do this?
for file := range Box {
go B(file)
}
}
Update:
Sorry, maybe I haven't made myself clear. The file generated by function A() is actually stored on the hard disk (it is generated by a command-line tool that I simply execute using exec.Command()), not in a variable in memory, so it doesn't have to be passed to function B() immediately.
I think there are 2 approaches:
approach1
approach2
Actually I prefer approach2. As you can see, the first B() doesn't have to process file1GenByA; B() can process any file in the box, because file1GenByA may be generated after file2GenByA (maybe that file is larger, so it takes more time).
You could spawn 5 goroutines that read from a work channel. That way 5 goroutines are running at all times, and you don't need to batch them, which would force you to wait until all 5 are finished before starting the next 5.
func main() {
stack := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
work := make(chan string)
results := make(chan string)
// create 5 worker goroutines
wg := sync.WaitGroup{}
for i := 0; i < 5; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for s := range work {
results <- B(A(s))
}
}()
}
// send the work to the workers
// this happens in a goroutine in order
// to not block the main function, once
// all 5 workers are busy
go func() {
for _, s := range stack {
// could read the file from disk
// here and pass a pointer to the file
work <- s
}
// close the work channel after
// all the work has been sent
close(work)
// wait for the workers to finish
// then close the results channel
wg.Wait()
close(results)
}()
// collect the results
// the iteration stops if the results
// channel is closed and the last value
// has been received
for result := range results {
// could write the file to disk
fmt.Println(result)
}
}
https://play.golang.com/p/K-KVX4LEEoK
You're halfway there. There are a few things you need to fix:
Your program deadlocks because nothing closes Box, so the main function can never finish ranging over it.
You aren't waiting for your goroutines to finish, and you are starting more than 5 goroutines. (The solutions to these are too intertwined to describe them separately.)
1. Deadlock
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
When you range over a channel, you read each value from the channel until it is both closed and empty. Since you never close the channel, the range over that channel can never complete, and the program can never finish.
This is a fairly easy problem to solve in your case: we just need to close the channel when we know there will be no more writes to the channel.
for _, v := range fileList {
go A(v)
}
close(Box)
Keep in mind that closing a channel doesn't stop it from being read, only written. Now consumers can distinguish between an empty channel that may receive more data in the future and an empty channel that will never receive more data.
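Once you add the close(Box), the program doesn't deadlock anymore, but it still doesn't work: main closes Box right after starting the A() goroutines, so the range loop can end before they even run, and any A() that sends on Box after the close panics.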
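To see both halves of this in isolation, here is a small sketch (readAfterClose and sendAfterClose are illustrative names, not part of the question's code). A closed channel still hands out its buffered values and then zero values, while a send on it panics, which is what a late A() goroutine would run into.

```go
package main

import "fmt"

// readAfterClose shows that a closed channel can still be read: buffered
// values come out first, then receives yield the zero value with ok == false.
func readAfterClose() (first string, ok1 bool, second string, ok2 bool) {
	box := make(chan string, 1)
	box <- "/path/to/fileGenByA1"
	close(box)
	first, ok1 = <-box  // the buffered value, ok1 == true
	second, ok2 = <-box // zero value "", ok2 == false
	return
}

// sendAfterClose reports whether sending on a closed channel panicked.
func sendAfterClose() (panicked bool) {
	box := make(chan string, 1)
	close(box)
	defer func() { panicked = recover() != nil }()
	box <- "late" // panics: send on closed channel
	return false
}

func main() {
	fmt.Println(readAfterClose())
	fmt.Println(sendAfterClose())
}
```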
2. Too Many Goroutines and not waiting for them to complete
To run a certain maximum number of concurrent executions, instead of creating a goroutine for each input, create the goroutines in a "worker pool":
Create a channel to pass the workers their work
Create a channel for the goroutines to return their results, if any
Start the number of goroutines you want
Start at least one additional goroutine to either dispatch work or collect the result, so you don't have to try doing both from the main goroutine
Use a sync.WaitGroup to wait for all data to be processed
Close the channels to signal to the workers and the results collector that their channels are done being filled.
Before we get into the implementation, let's talk about how A and B interact.
first they should be processed by function A(), function A() will generate a file, then this file will be processed by B().
For each file, then, A() and B() must execute serially. They could still pass their data through a channel, but since their execution must be serial, that gains you nothing. It's simpler to run them sequentially in the workers. For that, we need to change A() either to call B() itself, or to return the path for B() so the worker can call it. I choose the latter.
func A(file string) string {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1"
return fileGenByA
}
Before we write our worker function, we must also consider the result of B. Currently, B returns nothing. In the real world, unless B() cannot fail, you would want to either return an error or, at the very least, panic. I'll skip over collecting results for now.
Now we can write our worker function.
func worker(wg *sync.WaitGroup, incoming <-chan string) {
defer wg.Done()
for file := range incoming {
B(A(file))
}
}
Now all we have to do is start 5 such workers, write the incoming files to the channel, close it, and wg.Wait() for the workers to complete.
incoming_work := make(chan string)
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
wg.Add(1)
go worker(&wg, incoming_work)
}
for _, v := range fileList {
incoming_work <- v
}
close(incoming_work)
wg.Wait()
Full example at https://go.dev/play/p/A1H4ArD2LD8
Returning Results.
It's all well and good to be able to kick off goroutines and wait for them to complete. But what if you need results back from your goroutines? In all but the simplest of cases, you would at least want to know if files failed to process so you could investigate the errors.
We have only 5 workers, but we have many files, so we have many results. Each worker will have to return several results. So, another channel. It's usually worth defining a struct for your return:
type result struct {
file string
err error
}
This tells us not just whether there was an error but also which file the error came from.
How will we test an error case in our current code? In your example, B always gets the same value from A. If we add A's incoming file name to the path it passes to B, we can mock an error based on a substring. My mocked error will be that file3 fails.
func A(file string) string {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1/" + file
return fileGenByA
}
func B(file string) (r result) {
r.file = file
fmt.Println(file, "is processing in func B()...")
if strings.Contains(file, "file3") {
r.err = fmt.Errorf("Test error")
}
return
}
Our workers will be sending results, but we need to collect them somewhere. main() is busy dispatching work to the workers, blocking on its write to incoming_work when the workers are all busy. So the simplest place to collect the results is another goroutine. Our results collector goroutine has to read from a results channel, print out errors for debugging, and then return the total number of failures so our program can return a final exit status indicating overall success or failure.
failures_chan := make(chan int)
go func() {
var failures int
for result := range results {
if result.err != nil {
failures++
fmt.Printf("File %s failed: %s\n", result.file, result.err.Error())
}
}
failures_chan <- failures
}()
Now we have another channel to close, and it's important we close it after all workers are done. So we close(results) after we wg.Wait() for the workers.
close(incoming_work)
wg.Wait()
close(results)
if failures := <-failures_chan; failures > 0 {
os.Exit(1)
}
Putting all that together, we end up with this code:
package main
import (
"fmt"
"os"
"strings"
"sync"
)
func A(file string) string {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1/" + file
return fileGenByA
}
func B(file string) (r result) {
r.file = file
fmt.Println(file, "is processing in func B()...")
if strings.Contains(file, "file3") {
r.err = fmt.Errorf("Test error")
}
return
}
func worker(wg *sync.WaitGroup, incoming <-chan string, results chan<- result) {
defer wg.Done()
for file := range incoming {
results <- B(A(file))
}
}
type result struct {
file string
err error
}
func main() {
// assuming that this is the file list read from a directory
fileList := []string{
"/path/to/file1",
"/path/to/file2",
"/path/to/file3",
}
incoming_work := make(chan string)
results := make(chan result)
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
wg.Add(1)
go worker(&wg, incoming_work, results)
}
failures_chan := make(chan int)
go func() {
var failures int
for result := range results {
if result.err != nil {
failures++
fmt.Printf("File %s failed: %s\n", result.file, result.err.Error())
}
}
failures_chan <- failures
}()
for _, v := range fileList {
incoming_work <- v
}
close(incoming_work)
wg.Wait()
close(results)
if failures := <-failures_chan; failures > 0 {
os.Exit(1)
}
}
And when we run it, we get:
/path/to/file1 is processing in func A()...
/path/to/fileGenByA1//path/to/file1 is processing in func B()...
/path/to/file2 is processing in func A()...
/path/to/fileGenByA1//path/to/file2 is processing in func B()...
/path/to/file3 is processing in func A()...
/path/to/fileGenByA1//path/to/file3 is processing in func B()...
File /path/to/fileGenByA1//path/to/file3 failed: Test error
Program exited.
A final thought: buffered channels.
There is nothing wrong with buffered channels. Especially if you know the overall size of incoming work and results, buffered channels can obviate the results collector goroutine because you can allocate a buffered channel big enough to hold all results. However, I think it's more straightforward to understand this pattern if the channels are unbuffered. The key takeaway is that you don't need to know the number of incoming or outgoing results, which could indeed be different numbers or based on something that can't be predetermined.
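As a sketch of that buffered alternative, assuming each input file produces exactly one result: process is a hypothetical stand-in for B(A(file)) that fails for file3, mirroring the mock above. Because results can hold every result, workers never block on sending, so main can dispatch the work itself and drain results after wg.Wait(), with no separate collector goroutine.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

type result struct {
	file string
	err  error
}

// process is a hypothetical stand-in for B(A(file)); it fails for "file3".
func process(file string) result {
	if strings.Contains(file, "file3") {
		return result{file: file, err: errors.New("test error")}
	}
	return result{file: file}
}

// runBuffered processes files with `workers` goroutines and returns the
// number of failures. results is buffered to len(files), so sends never block.
func runBuffered(files []string, workers int) (failures int) {
	work := make(chan string)
	results := make(chan result, len(files))
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range work {
				results <- process(f) // never blocks: room for every result
			}
		}()
	}
	for _, f := range files {
		work <- f
	}
	close(work)
	wg.Wait()
	close(results) // safe: all senders have finished
	for r := range results {
		if r.err != nil {
			failures++
			fmt.Printf("File %s failed: %s\n", r.file, r.err)
		}
	}
	return failures
}

func main() {
	files := []string{"/path/to/file1", "/path/to/file2", "/path/to/file3"}
	fmt.Println("failures:", runBuffered(files, 5))
}
```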

Can't get map of channels to work

This might be a rookie mistake. I have a slice of string values and a map of channels. For each string in the slice, a channel is created and a map entry is created for it, with the string as the key.
I watch the channels and send a value on one of them, but the value is never received.
package main
import (
"fmt"
"time"
)
type TestStruct struct {
Test string
}
var channelsMap map[string](chan *TestStruct)
func main() {
stringsSlice := []string{"value1"}
channelsMap := make(map[string](chan *TestStruct))
for _, value := range stringsSlice {
channelsMap[value] = make(chan *TestStruct, 1)
go watchChannel(value)
}
<-time.After(3 * time.Second)
testStruct := new(TestStruct)
testStruct.Test = "Hello!"
channelsMap["value1"] <- testStruct
<-time.After(3 * time.Second)
fmt.Println("Program ended")
}
func watchChannel(channelMapKey string) {
fmt.Println("Watching channel: " + channelMapKey)
for channelValue := range channelsMap[channelMapKey] {
fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
}
}
Playground link: https://play.golang.org/p/IbucTqMjdGO
Output:
Watching channel: value1
Program ended
How do I execute something when the message is fed into the channel?
There are many problems with your approach.
The first one is that you're redeclaring ("shadowing") the global
variable channelsMap in your main function.
(Had you completed even the most basic
introduction to Go, you would not have run into this problem.)
This means that your watchChannel (actually, all the goroutines which execute that function) read the global channelsMap while your main function writes to its local channelsMap.
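A minimal reproduction of the shadowing, assuming a simplified chan string element type: the := inside main declares a new local map, so a function reading the package-level variable still sees a nil map, and looking up any key in it yields a nil channel.

```go
package main

import "fmt"

var channelsMap map[string]chan string // package-level, never initialized

// lookupGlobal reads the package-level variable, like watchChannel does.
func lookupGlobal(key string) chan string {
	return channelsMap[key] // lookup in a nil map returns the zero value: nil
}

func main() {
	channelsMap := make(map[string]chan string) // := declares a NEW local variable
	channelsMap["value1"] = make(chan string, 1)
	fmt.Println(len(channelsMap), lookupGlobal("value1") == nil) // 1 true
}
```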
What happens next is as follows:
The range statement
in the watchChannel has a simple
map lookup expression as its source—channelsMap[channelMapKey].
In Go, this form of map lookup
never fails, but if the map has no such key (or if the map is not initialized, that is, it's nil), the so-called
"zero value"
of the appropriate type is returned.
Since the global channelsMap is always empty, any call to watchChannel performs a map lookup which always returns
the zero value of type chan *TestStruct.
The zero value for any channel is nil.
A range statement executed over a nil channel
blocks forever; it never produces a single iteration.
In other words, the for loop in watchChannel never executes
its body, and each watcher goroutine stays blocked until main exits.
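The blocking behavior can be observed directly with a small sketch like the one below (rangeOverNilFinishes is a made-up helper, and it intentionally leaks the blocked goroutine, which is acceptable only in a throwaway demo). It waits a bounded time for a goroutine that ranges over a nil channel and reports whether that goroutine ever finished.

```go
package main

import (
	"fmt"
	"time"
)

// rangeOverNilFinishes reports whether a goroutine ranging over a nil
// channel finishes within timeoutMs milliseconds. It never does.
func rangeOverNilFinishes(timeoutMs int) bool {
	var ch chan int // the zero value of a channel type is nil
	done := make(chan struct{})
	go func() {
		for range ch { // blocks forever on a nil channel
		}
		close(done) // never reached
	}()
	select {
	case <-done:
		return true
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println(rangeOverNilFinishes(100)) // false
}
```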
The more complex problem, still, is not the shadowing of the global variable but rather the complete absence of synchronization between the goroutines. You're using "sleeping" as a sort of band-aid in an attempt to perform implicit synchronization between goroutines,
but while this may appear to be okay judged by so-called
"common sense", it's not going to work in practice for two
reasons:
Sleeping is always a naïve approach to synchronization, as it relies solely on all the goroutines running relatively freely and uncontended. This is far from true in many (if not most) production settings, which makes it a frequent source of subtle bugs. Don't ever do that again, please.
Nothing in the Go memory model
says that waiting against wall-clock timing is considered by the runtime as establishing the order on how execution of different goroutines relate to each other.
There exist various ways to synchronize execution between goroutines. Basically they amount to sends and receives over channels and using the types provided by the sync package.
In your particular case the simplest approach is probably using the sync.WaitGroup type.
Here is what we would
have after fixing the problems explained above:
- Initialize the map variable right at the point of its
definition and not mess with it in main.
- Use sync.WaitGroup to make main properly wait for all
the goroutines it spawned to signal they're done:
package main
import (
"fmt"
"sync"
)
type TestStruct struct {
Test string
}
var channelsMap = make(map[string](chan *TestStruct))
func main() {
stringsSlice := []string{"value1"}
var wg sync.WaitGroup
wg.Add(len(stringsSlice))
for _, value := range stringsSlice {
channelsMap[value] = make(chan *TestStruct, 1)
go watchChannel(value, &wg)
}
testStruct := new(TestStruct)
testStruct.Test = "Hello!"
channelsMap["value1"] <- testStruct
wg.Wait()
fmt.Println("Program ended")
}
func watchChannel(channelMapKey string, wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Watching channel: " + channelMapKey)
for channelValue := range channelsMap[channelMapKey] {
fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
}
}
The next two problems with your code become apparent once we
have fixed the first two, that is, after you make the "watcher" goroutines
use the same map variable as the goroutine running main, and
make the latter properly wait for the watchers:
There is a data race
over the map variable between the code in main which writes
to the map inside the loop spawning the watcher goroutines and
the code in the already-started watcher goroutines which reads
this variable.
There is a deadlock
between the watcher goroutines and the main goroutine which waits for them to complete.
The reason for the deadlock is that the watcher goroutines
never receive any signal they have to quit processing and
hence are stuck forever trying to read from their respective
channels.
The ways to fix these two new problems are simple but they
might actually "break" your original idea of structuring
your code.
First, I'd remove the data race by simply making the watchers
not access the map variable. As you can see, each call to
watchChannel receives a single value to use as the key to
read a value off the shared map, and hence each watcher always
reads a single value exactly once during its run time.
The code would become much clearer if we remove this extra
map access altogether and instead pass the appropriate channel
value directly to each watcher.
A nice byproduct of this is that we do not need a global
map variable anymore.
Here's what we'll get:
package main
import (
"fmt"
"sync"
)
type TestStruct struct {
Test string
}
func main() {
stringsSlice := []string{"value1"}
channelsMap := make(map[string](chan *TestStruct))
var wg sync.WaitGroup
wg.Add(len(stringsSlice))
for _, value := range stringsSlice {
channelsMap[value] = make(chan *TestStruct, 1)
go watchChannel(value, channelsMap[value], &wg)
}
testStruct := new(TestStruct)
testStruct.Test = "Hello!"
channelsMap["value1"] <- testStruct
wg.Wait()
fmt.Println("Program ended")
}
func watchChannel(channelMapKey string, ch <-chan *TestStruct, wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Watching channel: " + channelMapKey)
for channelValue := range ch {
fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
}
}
Okay, we still have the deadlock.
There are multiple approaches to solving this but they depend
on the actual circumstances, and with this toy example, any
attempt to iterate over at least a subset of them would just
muddle the waters.
Instead, let's employ the simplest one for this case: closing
a channel makes any pending receive operation on it unblock
(once any values still buffered in it have been received) and produce the zero value for the channel's type.
For a channel being iterated over using the range statement
it simply means the statement terminates after delivering any
values still buffered in the channel.
In other words, let's just close all the channels to unblock
the range statements being run by the watcher goroutines
and then wait for these goroutines to report their completion via the wait group.
To keep the answer short, I also went straight to programmatic initialization of the string slice, which makes the example more interesting by having multiple watchers, not just a single one, actually do useful work:
package main
import (
"fmt"
"sync"
)
type TestStruct struct {
Test string
}
func main() {
var stringsSlice []string
channelsMap := make(map[string](chan *TestStruct))
for i := 1; i <= 10; i++ {
stringsSlice = append(stringsSlice, fmt.Sprintf("value%d", i))
}
var wg sync.WaitGroup
wg.Add(len(stringsSlice))
for _, value := range stringsSlice {
channelsMap[value] = make(chan *TestStruct, 1)
go watchChannel(value, channelsMap[value], &wg)
}
for _, value := range stringsSlice {
testStruct := new(TestStruct)
testStruct.Test = fmt.Sprint("Hello! ", value)
channelsMap[value] <- testStruct
}
for _, ch := range channelsMap {
close(ch)
}
wg.Wait()
fmt.Println("Program ended")
}
func watchChannel(channelMapKey string, ch <-chan *TestStruct, wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Watching channel: " + channelMapKey)
for channelValue := range ch {
fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
}
}
Playground link.
As you can see, there are things you should learn
about in much greater detail before embarking on working with
concurrency.
I'd recommend proceeding in the following order:
The Go tour will get you accustomed to the bare bones of concurrency.
The Go Programming Language has two chapters dedicated to giving readers a gentle introduction to tackling concurrency, using both channels and the types from the sync package.
Concurrency In Go goes on to present more in-depth details of how one deals with concurrency in Go, including advanced topics that approach the real-world problems concurrent programs face in production, such as ways to rate-limit incoming requests.
The shadowing in main of channelsMap mentioned above was a critical bug, but aside from that, the program was playing "Russian roulette" with the calls to time.After so that main wouldn't finish before the watcher goroutines did. This is unstable and unreliable, so I recommend the following approach using a channel to signal when all watcher goroutines are done:
package main
import (
"fmt"
)
type TestStruct struct {
Test string
}
var channelsMap map[string](chan *TestStruct)
func main() {
stringsSlice := []string{"value1", "value2", "value3"}
structsSlice := []TestStruct{
{"Hello1"},
{"Hello2"},
{"Hello3"},
}
channelsMap = make(map[string](chan *TestStruct))
// Signal channel to wait for watcher goroutines.
done := make(chan struct{})
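// Create every channel before starting any watcher, so the watchers
// only read the map after main has stopped writing to it.
for _, s := range stringsSlice {
channelsMap[s] = make(chan *TestStruct)
}
for _, s := range stringsSlice {
// Give watcher goroutines the signal channel.
go watchChannel(s, done)
}
for i := range structsSlice {
for _, s := range stringsSlice {
// Send a pointer to the slice element, not to a loop variable
// that is reused across iterations (before Go 1.22).
channelsMap[s] <- &structsSlice[i]
}
}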
// Close the channels so watcher goroutines can finish.
for _, s := range stringsSlice {
close(channelsMap[s])
}
// Wait for all watcher goroutines to finish.
for range stringsSlice {
<-done
}
// Now we're really done!
fmt.Println("Program ended")
}
func watchChannel(channelMapKey string, done chan<- struct{}) {
fmt.Println("Watching channel: " + channelMapKey)
for channelValue := range channelsMap[channelMapKey] {
fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
}
done <- struct{}{}
}
(Go Playground link: https://play.golang.org/p/eP57Ru44-NW)
Of importance is the use of the done channel to let watcher goroutines signal to main that they're finished. Another critical part is closing the channels once you're done with them. If you don't close them, the range loops in the watcher goroutines will never end, waiting forever. Once you close a channel, the range loop exits and the watcher goroutine can send on the done channel, signaling that it has finished working.
Finally, back in main, you have to receive on the done channel once for each watcher goroutine you created. Since the number of watcher goroutines is equal to the number of items in stringsSlice, you simply range over stringsSlice to receive the correct amount of times from the done channel. Once that's finished, the main function can exit with a guarantee that all watchers have finished.

Golang Concurrency Issue

I am learning Golang concurrency and have written a program to display URLs in order. I expect the code to return
http://bing.com/*****
http://google.com/*****
But it always returns http://google.com/*****, as if the variable is being overwritten. Since I am using goroutines, I would expect it to return both values at the same time.
func check(u string) string {
tmpres := u+"*****"
return tmpres
}
func IsReachable(url string) string {
ch := make(chan string, 1)
go func() {
ch <- check(url)
}()
select {
case reachable := <-ch:
// use err and reply
return reachable
case <-time.After(3* time.Second):
// call timed out
return "none"
}
}
func main() {
var urls = []string{
"http://bing.com/",
"http://google.com/",
}
for _, url := range urls {
go func() {
fmt.Println(IsReachable(url))
}()
}
time.Sleep(1 * time.Second)
}
Two problems. First, you've created a race condition. By closing over the loop variable, you're sharing it between the goroutine running the loop and the goroutines you start, which causes your described problem: by the time the goroutine started for the first URL runs, the value of the variable has changed. (Since Go 1.22, each loop iteration gets its own variable, so this particular bug no longer occurs in modules that declare go 1.22 or later; passing the value explicitly works with every version.) Otherwise, you need to either copy it to a local variable or pass it as an argument, e.g.:
for _, url := range urls {
go func(url string) {
fmt.Println(IsReachable(url))
}(url)
}
Second, you said you wanted to display them "in order", which is not a goal generally compatible with concurrency/parallelism, because you cannot control the order of parallel operations. If you want them in order, you should do them in order in a single goroutine. Otherwise, you'll have to collect the results, wait for all of them to come back, then sort the results back into the desired order before printing them.
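One way to keep the concurrency and still print in order, sketched under the assumption that you don't need the timeout from IsReachable: instead of sorting afterwards, each goroutine writes its result into the slot matching its input's index, and a sync.WaitGroup tells main when every slot is filled. checkAllInOrder is a hypothetical name; check is the function from the question.

```go
package main

import (
	"fmt"
	"sync"
)

// check is the function from the question.
func check(u string) string { return u + "*****" }

// checkAllInOrder runs check concurrently but stores each result at its
// input's index, so the returned slice preserves the order of urls.
func checkAllInOrder(urls []string) []string {
	results := make([]string, len(urls))
	var wg sync.WaitGroup
	for i, url := range urls {
		wg.Add(1)
		go func(i int, url string) { // pass copies explicitly (needed before Go 1.22)
			defer wg.Done()
			results[i] = check(url) // each goroutine writes a distinct element: no race
		}(i, url)
	}
	wg.Wait() // all writes happen before Wait returns
	return results
}

func main() {
	for _, r := range checkAllInOrder([]string{"http://bing.com/", "http://google.com/"}) {
		fmt.Println(r)
	}
}
```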

How to broadcast message using channel

I am new to go and I am trying to create a simple chat server where clients can broadcast messages to all connected clients.
In my server, I have a goroutine (an infinite for loop) that accepts connections, and every accepted connection is sent on a channel.
go func() {
for {
conn, _ := listener.Accept()
ch <- conn
}
}()
Then, I start a handler (goroutine) for every connected client. Inside the handler, I try to broadcast to all connections by iterating through the channel.
for c := range ch {
conn.Write(msg)
}
However, I cannot broadcast because (I think from reading the docs) the channel needs to be closed before iterating. I am not sure when I should close the channel because I want to continuously accept new connections and closing the channel won't let me do that. If anyone can help me, or provide a better way to broadcast messages to all connected clients, it would be appreciated.
What you are doing is a fan-out pattern, that is, multiple endpoints listening to a single input source. With this pattern, each message in the input source is received by only one of the listeners. The only exception is closing the channel: the close is observed by all of the listeners, and thus acts as a "broadcast".
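The "only one listener gets each message" behavior is easy to observe with a sketch like this (fanOut is an illustrative helper). Several goroutines range over one channel, each message increments exactly one counter, and the final close ends every listener's loop.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut starts n listeners on one channel, sends msgs, then closes it.
// It returns how many messages each listener received.
func fanOut(n int, msgs []string) []int {
	ch := make(chan string)
	counts := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for range ch { // every listener observes the close and exits
				counts[i]++ // only listener i writes counts[i]
			}
		}(i)
	}
	for _, m := range msgs {
		ch <- m // delivered to exactly one listener
	}
	close(ch)
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(fanOut(3, []string{"m1", "m2", "m3", "m4"}))
}
```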
But what you want is to broadcast a message read from a connection, so we could do something like this:
When the number of listeners is known
Let each worker listen to its own dedicated broadcast channel, and dispatch each message from the main channel to every dedicated broadcast channel.
type worker struct {
source chan interface{}
quit chan struct{}
}
func (w *worker) Start() {
w.source = make(chan interface{}, 10) // some buffer size to avoid blocking
go func() {
for {
select {
case msg := <-w.source:
_ = msg // do something with msg
case <-w.quit: // will explain this in the last section
return
}
}
}()
}
And then we could have a bunch of workers:
workers := []*worker{&worker{}, &worker{}}
for _, worker := range workers { worker.Start() }
Then start our listener:
go func() {
for {
conn, _ := listener.Accept()
ch <- conn
}
}()
And a dispatcher:
go func() {
for {
msg := <- ch
for _, worker := range workers {
worker.source <- msg
}
}
}()
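Putting the worker and dispatcher ideas together, a self-contained sketch might look like the following. It simplifies the answer's code under a few assumptions: messages are plain strings instead of connections, the dispatcher runs in the calling goroutine, and each worker stops when its own channel is closed rather than through a quit channel. broadcast is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// broadcast fans each message out to n workers through per-worker channels,
// so every worker receives every message. It returns what each worker got.
func broadcast(n int, msgs []string) [][]string {
	sources := make([]chan string, n)
	got := make([][]string, n)
	var wg sync.WaitGroup
	for i := range sources {
		sources[i] = make(chan string, 10) // dedicated buffered channel per worker
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for m := range sources[i] {
				got[i] = append(got[i], m) // only worker i touches got[i]
			}
		}(i)
	}
	// Dispatcher: copy every message to every worker's channel.
	for _, m := range msgs {
		for _, src := range sources {
			src <- m
		}
	}
	for _, src := range sources {
		close(src) // lets each worker's range loop finish
	}
	wg.Wait()
	return got
}

func main() {
	fmt.Println(broadcast(2, []string{"hi", "bye"}))
}
```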
When the number of listeners is not known
In this case, the solution given above still works. The only difference is that whenever you need a new worker, you have to create it, start it up, and then push it into the workers slice. But this method requires a thread-safe slice, which needs a lock around it. One implementation might look like this:
type threadSafeSlice struct {
sync.Mutex
workers []*worker
}
func (slice *threadSafeSlice) Push(w *worker) {
slice.Lock()
defer slice.Unlock()
slice.workers = append(slice.workers, w)
}
func (slice *threadSafeSlice) Iter(routine func(*worker)) {
slice.Lock()
defer slice.Unlock()
for _, worker := range slice.workers {
routine(worker)
}
}
Whenever you want to start a worker (here pool is a shared threadSafeSlice value, e.g. var pool threadSafeSlice; the name is just for illustration):
w := &worker{}
w.Start()
pool.Push(w)
And your dispatcher will be changed to:
go func() {
for {
msg := <-ch
pool.Iter(func(w *worker) { w.source <- msg })
}
}()
Last words: never leave a dangling goroutine
One good practice is: never leave a dangling goroutine. So when you have finished listening, you need to stop all of the goroutines you started. This is done via the quit channel in worker:
First we need to create a global quit signalling channel:
globalQuit := make(chan struct{})
And whenever we create a worker, we assign the globalQuit channel to it as its quit signal:
worker.quit = globalQuit
Then when we want to shutdown all workers, we simply do:
close(globalQuit)
Since a close is observed by all listening goroutines (this is the point you understood), all of the workers will return. Remember to stop your dispatcher goroutine as well, but I will leave that to you :)
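A runnable sketch of this shutdown, reusing the shape of the answer's worker with a string message type and an explicit sync.WaitGroup added so the caller can confirm that every worker returned (shutdownAll is a made-up helper):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// worker mirrors the answer's worker: a dedicated source channel plus a
// quit channel that is shared by all workers.
type worker struct {
	source chan string
	quit   <-chan struct{}
}

// start launches the worker's loop; wg lets the caller wait for it to return.
func (w *worker) start(wg *sync.WaitGroup) {
	w.source = make(chan string, 10)
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case msg := <-w.source:
				_ = msg // handle the message here
			case <-w.quit:
				return
			}
		}
	}()
}

// shutdownAll starts n workers sharing one quit channel, closes it once, and
// reports whether every worker returned within timeoutMs milliseconds.
func shutdownAll(n, timeoutMs int) bool {
	globalQuit := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		w := &worker{quit: globalQuit} // assign quit before starting the worker
		w.start(&wg)
		w.source <- fmt.Sprintf("msg for worker %d", i)
	}
	close(globalQuit) // a single close is observed by every worker
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println(shutdownAll(5, 1000))
}
```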
A more elegant solution is a "broker", where clients may subscribe and unsubscribe to messages.
To also handle subscribing and unsubscribing elegantly, we can use channels for them too, so the broker's main loop, which receives and distributes the messages, can handle everything in a single select statement, and synchronization follows naturally from the design.
Another trick is to store the subscribers in a map, mapping from the channel we use to distribute messages to them. So use the channel as the key in the map, and then adding and removing the clients is "dead" simple. This is made possible because channel values are comparable, and their comparison is very efficient as channel values are simple pointers to channel descriptors.
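A small sketch of the map-of-channels bookkeeping described above (subscriberCount is an illustrative name): adding the same channel twice creates no second entry, delete removes it by identity, and two channels from separate make calls compare unequal.

```go
package main

import "fmt"

// subscriberCount shows channels used as map keys: == on channels compares
// identity, so each make() call yields a distinct key.
func subscriberCount() (afterAdd, afterRemove int, distinct bool) {
	subs := map[chan string]struct{}{}
	a, b := make(chan string, 1), make(chan string, 1)
	subs[a] = struct{}{}
	subs[b] = struct{}{}
	subs[a] = struct{}{} // same channel again: no new entry
	afterAdd = len(subs)
	delete(subs, a) // "unsubscribe" a
	afterRemove = len(subs)
	return afterAdd, afterRemove, a != b
}

func main() {
	fmt.Println(subscriberCount()) // 2 1 true
}
```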
Without further ado, here's a simple broker implementation:
type Broker[T any] struct {
stopCh chan struct{}
publishCh chan T
subCh chan chan T
unsubCh chan chan T
}
func NewBroker[T any]() *Broker[T] {
return &Broker[T]{
stopCh: make(chan struct{}),
publishCh: make(chan T, 1),
subCh: make(chan chan T, 1),
unsubCh: make(chan chan T, 1),
}
}
func (b *Broker[T]) Start() {
subs := map[chan T]struct{}{}
for {
select {
case <-b.stopCh:
return
case msgCh := <-b.subCh:
subs[msgCh] = struct{}{}
case msgCh := <-b.unsubCh:
delete(subs, msgCh)
case msg := <-b.publishCh:
for msgCh := range subs {
// msgCh is buffered, use non-blocking send to protect the broker:
select {
case msgCh <- msg:
default:
}
}
}
}
}
func (b *Broker[T]) Stop() {
close(b.stopCh)
}
func (b *Broker[T]) Subscribe() chan T {
msgCh := make(chan T, 5)
b.subCh <- msgCh
return msgCh
}
func (b *Broker[T]) Unsubscribe(msgCh chan T) {
b.unsubCh <- msgCh
}
func (b *Broker[T]) Publish(msg T) {
b.publishCh <- msg
}
Example using it:
func main() {
// Create and start a broker:
b := NewBroker[string]()
go b.Start()
// Create and subscribe 3 clients:
clientFunc := func(id int) {
msgCh := b.Subscribe()
for {
fmt.Printf("Client %d got message: %v\n", id, <-msgCh)
}
}
for i := 0; i < 3; i++ {
go clientFunc(i)
}
// Start publishing messages:
go func() {
for msgId := 0; ; msgId++ {
b.Publish(fmt.Sprintf("msg#%d", msgId))
time.Sleep(300 * time.Millisecond)
}
}()
time.Sleep(time.Second)
}
Output of the above will be something like this (the order of clients within each message may vary; try it on the Go Playground):
Client 2 got message: msg#0
Client 0 got message: msg#0
Client 1 got message: msg#0
Client 2 got message: msg#1
Client 0 got message: msg#1
Client 1 got message: msg#1
Client 1 got message: msg#2
Client 2 got message: msg#2
Client 0 got message: msg#2
Client 2 got message: msg#3
Client 0 got message: msg#3
Client 1 got message: msg#3
Improvements
You may consider the following improvements. Whether they are useful depends on how, and for what, you use the broker.
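Broker.Unsubscribe() may close the message channel, signalling that no more messages will be sent on it. The close should happen in the broker's main loop, right after the subscriber is removed. Closing msgCh inside Unsubscribe() right after the (buffered) send on unsubCh could race with a publish that is still in progress, and sending on a closed channel panics. So in Start():
case msgCh := <-b.unsubCh:
	if _, ok := subs[msgCh]; ok { // guard against unsubscribing twice
		delete(subs, msgCh)
		close(msgCh)
	}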
This would allow clients to range over the message channel, like this:
msgCh := b.Subscribe()
for msg := range msgCh {
fmt.Printf("Client %d got message: %v\n", id, msg)
}
Then if someone unsubscribes this msgCh like this:
b.Unsubscribe(msgCh)
The above range loop will terminate after processing all messages that were sent before the call to Unsubscribe().
If you want your clients to rely on the message channel being closed, and the broker's lifetime is shorter than your app's lifetime, then you could also close all subscribed clients' channels when the broker is stopped, in the Start() method, like this:
case <-b.stopCh:
for msgCh := range subs {
close(msgCh)
}
return
Broadcasting to a slice of channels, with a sync.Mutex guarding additions and removals, may be the easiest way in your case.
Here are the ways you can broadcast in Go:
You can broadcast a shared status change with sync.Cond. This needs no allocations once set up, but you cannot add a timeout or combine the wait with other channels.
You can broadcast a shared status change by closing the old channel and creating a new one, guarded by a sync.Mutex. This costs one allocation per status change, but you can add a timeout and combine the wait with other channels.
You can broadcast to a slice of callback functions, using a sync.Mutex to manage them. The callers can do the channel work themselves. This costs more than one allocation per caller, and works with other channels.
You can broadcast to a slice of channels, using a sync.Mutex to manage them. This costs more than one allocation per caller, and works with other channels.
You can broadcast to a slice of sync.WaitGroup values, using a sync.Mutex to manage them.
This is a late answer, but I think it may satisfy some curious readers.
Go channels are widely favored when it comes to concurrency.
The Go community follows this saying rigidly:
Do not communicate by sharing memory; instead, share memory by communicating.
I am completely neutral toward this, and I think options other than channels should be considered when it comes to broadcasting.
Here is my take: sync.Cond from the sync package is widely overlooked. Implementing a broadcaster with it, as suggested by Bronze man in this very context, is worth noting.
I was delighted by icza's suggestion to use channels and broadcast messages over them. I follow the same approach, but use sync's condition variable:
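// Broadcaster is the struct which encompasses broadcasting
type Broadcaster struct {
	cond        *sync.Cond
	subscribers map[interface{}]func(interface{})
	messages    []interface{} // published but not yet delivered
	running     bool
}
This is the main struct that our whole broadcasting concept relies on. All of its fields are only accessed while holding cond.L. This avoids data races between publishers, subscribers and the broadcasting loop.
Below, I define some behaviours for this struct. In a nutshell, subscribers can be added and removed, and the whole process can be stopped.
// SetupBroadcaster gives the broadcaster object to be used further in messaging
func SetupBroadcaster() *Broadcaster {
	return &Broadcaster{
		cond:        sync.NewCond(&sync.Mutex{}),
		subscribers: map[interface{}]func(interface{}){},
		running:     true,
	}
}
// Subscribe lets others enroll in the broadcast event
func (b *Broadcaster) Subscribe(id interface{}, f func(input interface{})) {
	b.cond.L.Lock() // the map is read concurrently by Start
	b.subscribers[id] = f
	b.cond.L.Unlock()
}
// Unsubscribe stops receiving broadcasts
func (b *Broadcaster) Unsubscribe(id interface{}) {
	b.cond.L.Lock()
	delete(b.subscribers, id)
	b.cond.L.Unlock()
}
// Publish queues the message and wakes up the broadcasting loop.
// Queuing (instead of a single message field) means a message is not lost
// when nobody happens to be waiting on the condition at that moment.
func (b *Broadcaster) Publish(message interface{}) {
	b.cond.L.Lock()
	b.messages = append(b.messages, message)
	b.cond.L.Unlock()
	b.cond.Broadcast()
}
// Start runs the main broadcasting loop until Stop is called
func (b *Broadcaster) Start() {
	b.cond.L.Lock()
	defer b.cond.L.Unlock()
	for {
		// Wait must sit in a loop that re-checks the condition
		for len(b.messages) == 0 && b.running {
			b.cond.Wait()
		}
		if !b.running {
			return
		}
		message := b.messages[0]
		b.messages = b.messages[1:]
		// Copy the subscribers so the callbacks run without holding the lock
		fs := make([]func(interface{}), 0, len(b.subscribers))
		for _, f := range b.subscribers {
			fs = append(fs, f)
		}
		b.cond.L.Unlock()
		for _, f := range fs {
			f(message) // publishes the message
		}
		b.cond.L.Lock()
	}
}
// Stop broadcasting event
func (b *Broadcaster) Stop() {
	b.cond.L.Lock()
	b.running = false
	b.cond.L.Unlock()
	b.cond.Broadcast() // wake Start so it can observe running == false
}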
Next, I can use it quite easily:
messageToaster := func(message interface{}) {
fmt.Printf("[New Message]: %v\n", message)
}
unwillingReceiver := func(message interface{}) {
fmt.Println("Do not disturb!")
}
broadcaster := SetupBroadcaster()
broadcaster.Subscribe(1, messageToaster)
broadcaster.Subscribe(2, messageToaster)
broadcaster.Subscribe(3, unwillingReceiver)
go broadcaster.Start()
broadcaster.Publish("Hello!")
time.Sleep(time.Second)
broadcaster.Unsubscribe(3)
broadcaster.Publish("Goodbye!")
time.Sleep(time.Second) // give the broadcaster time to deliver before the program exits
It should print something like this in any order:
[New Message]: Hello!
Do not disturb!
[New Message]: Hello!
[New Message]: Goodbye!
[New Message]: Goodbye!
See this on go playground
Another simple example:
https://play.golang.org
package main
import (
	"fmt"
	"log"
	"sync"
	"time"
)
type Broadcaster struct {
mu sync.Mutex
clients map[int64]chan struct{}
}
func NewBroadcaster() *Broadcaster {
return &Broadcaster{
clients: make(map[int64]chan struct{}),
}
}
func (b *Broadcaster) Subscribe(id int64) (<-chan struct{}, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.clients[id]; ok {
		return nil, fmt.Errorf("signal %d already exists", id)
	}
	s := make(chan struct{}, 1) // buffer of 1: at most one pending signal, extra broadcasts coalesce
	b.clients[id] = s
	return s, nil
}
func (b *Broadcaster) Unsubscribe(id int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if ch, ok := b.clients[id]; ok {
		close(ch)
		delete(b.clients, id)
	}
}
func (b *Broadcaster) broadcast() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.clients {
		// Non-blocking by construction: only send if no signal is pending
		if len(ch) == 0 {
			ch <- struct{}{}
		}
	}
}
type testClient struct {
name string
signal <-chan struct{}
signalID int64
brd *Broadcaster
}
func (c *testClient) doWork() {
i := 0
for range c.signal {
fmt.Println(c.name, "do work", i)
if i > 2 {
c.brd.Unsubscribe(c.signalID)
fmt.Println(c.name, "unsubscribed")
}
i++
}
fmt.Println(c.name, "done")
}
func main() {
var err error
brd := NewBroadcaster()
clients := make([]*testClient, 0)
for i := 0; i < 3; i++ {
c := &testClient{
name: fmt.Sprint("client:", i),
signalID: time.Now().UnixNano()+int64(i), // +int64(i) for play.golang.org
brd: brd,
}
c.signal, err = brd.Subscribe(c.signalID)
if err != nil {
log.Fatal(err)
}
clients = append(clients, c)
}
for i := 0; i < len(clients); i++ {
go clients[i].doWork()
}
for i := 0; i < 6; i++ {
brd.broadcast()
time.Sleep(time.Second)
}
}
output:
client:0 do work 0
client:2 do work 0
client:1 do work 0
client:2 do work 1
client:0 do work 1
client:1 do work 1
client:2 do work 2
client:0 do work 2
client:1 do work 2
client:2 do work 3
client:2 unsubscribed
client:2 done
client:0 do work 3
client:0 unsubscribed
client:0 done
client:1 do work 3
client:1 unsubscribed
client:1 done
Because Go channels follow the Communicating Sequential Processes (CSP) pattern, each channel is a point-to-point communication entity. There is always one writer and one reader involved in each exchange.
However, each channel end can be shared amongst multiple goroutines. This is safe to do - there is no dangerous race condition.
So there can be multiple writers sharing the writing end. And/or there can be multiple readers sharing the reading end. I wrote more on this in a different answer, which includes examples.
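To make the point-to-point semantics concrete, here is a small sketch in which several goroutines share the receiving end of one channel. The function name consume is illustrative. Every value sent is received by exactly one reader, so the per-reader counts add up to the number of messages sent, not readers × messages. Sharing the end is safe, but it distributes the load rather than broadcasting.

```go
package main

import (
	"fmt"
	"sync"
)

// consume starts nReaders goroutines sharing the receiving end of one channel,
// sends nMsgs values, and reports how many values each reader got.
func consume(nReaders, nMsgs int) []int {
	ch := make(chan int)
	counts := make([]int, nReaders) // each reader writes only its own slot
	var wg sync.WaitGroup
	for r := 0; r < nReaders; r++ {
		wg.Add(1)
		go func(r int) {
			defer wg.Done()
			for range ch {
				counts[r]++
			}
		}(r)
	}
	for i := 0; i < nMsgs; i++ {
		ch <- i // delivered to exactly one of the readers
	}
	close(ch)
	wg.Wait()
	return counts
}

func main() {
	counts := consume(3, 30)
	total := 0
	for _, c := range counts {
		total += c
	}
	fmt.Println("per reader:", counts, "total:", total)
}
```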
If you really need a broadcast, you cannot do this directly, but it is not hard to implement an intermediate goroutine that copies a value out to each of a group of output channels.
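Here is a minimal sketch of such an intermediate goroutine. It assumes string messages and unbuffered output channels, and fanOut and broadcastAll are hypothetical names. Because a single goroutine performs the sends one after another, every output also sees the messages in the order they arrived.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut copies every value received on in to each channel in outs, in order,
// and closes all outs once in is closed. The sends block, so one slow reader
// holds up the others; buffer the outs (or drop on full) if that matters.
func fanOut(in <-chan string, outs []chan string) {
	for v := range in {
		for _, out := range outs {
			out <- v
		}
	}
	for _, out := range outs {
		close(out)
	}
}

// broadcastAll feeds msgs through fanOut to n readers and returns what each read.
func broadcastAll(msgs []string, n int) [][]string {
	in := make(chan string)
	outs := make([]chan string, n)
	for i := range outs {
		outs[i] = make(chan string)
	}
	go fanOut(in, outs)

	results := make([][]string, n)
	var wg sync.WaitGroup
	for i, out := range outs {
		wg.Add(1)
		go func(i int, out <-chan string) {
			defer wg.Done()
			for v := range out {
				results[i] = append(results[i], v) // each reader owns results[i]
			}
		}(i, out)
	}
	for _, m := range msgs {
		in <- m
	}
	close(in)
	wg.Wait()
	return results
}

func main() {
	fmt.Println(broadcastAll([]string{"a", "b", "c"}, 3)) // [[a b c] [a b c] [a b c]]
}
```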
The canonical (and idiomatic Go) way to do this is via a slice of channels, as recommended above by Nevets and icza.
You should specifically not use a slice of callbacks. In some languages you typically register observers by passing a callback, but in those cases you have to wrap their invocation in a fair amount of defensive code to protect the sender. Ideally, the generator of the message (the "Subject" in the classic Observer pattern discussion) should also be segregated from the observers by an intermediate message transport layer. This is where you typically use a pub-sub mesh (JMS brokers, NATS, MQ, whatever) when you're crossing process boundaries. You should adhere to the same pattern even if both subject and observers are internal to the same process (and most languages have implementations of such mechanisms available, so you shouldn't need to roll your own).
The reasons not to use callbacks include:
Unless you build in your own message transport layer, your subject is no longer both naive (it doesn't know the nature or cardinality of the observers) and disinterested (it doesn't care what they do with the message, only that it is made available to any interested parties);
If you want true broadcasting, then you need to act as if the order of receipt does not matter - ideally, everyone can see the message at the same time, even though in practice sending is iterative, even when using channels. But sending to recipient n+1 should absolutely not depend on confirmation of receipt by recipient n. That isn't broadcasting, it's serialized assignment. I say assignment because, if you are asking for a callback, then in executing the callback, you are enforcing (even if only minimally) some behavior to be taken by the recipient. You've basically turned your sender into an orchestrator, which is a very different sort of pattern with a different set of use cases.
Absent a defensive boundary (e.g. wrapping each callback invocation in a separate goroutine with a timeout context), you are vulnerable to being blocked by a recipient, which is antithetical to broadcasting. Receiving a broadcast message (and, optionally, taking any action at all based on it) must be entirely asynchronous with respect to the original send.
Is it doable to provide pseudo-broadcasting by using callbacks in Go? Sure, but you have to invest in so much additional complexity to keep things clean, and why would you do that when Go provides an easy and rather robust way to do it? The examples of channel-driven broadcasting above are good ones and show how you should do it pretty much every time.
The specific exception when you absolutely should use callbacks is when you are not disinterested - you really do care that, on the basis of the sent message, the recipients take some action (and usually something specified by contract). For example, "I am about to unmount this filesystem, so flush and close your filehandles, let me know once you're done." (I know that's a pretty old-fashioned example, but it's the first one that comes to mind.)

Do go channels preserve order when blocked?

I have a slice of channels that all receive the same message:
func broadcast(c <-chan string, chans []chan<- string) {
for msg := range c {
for _, ch := range chans {
ch <- msg
}
}
}
However, since each of the channels in chans are potentially being read at a different rate, I don't want to block the other channels when I get a slow consumer. I've solved this with goroutines:
func broadcast(c <-chan string, chans []chan<- string) {
for msg := range c {
for _, ch := range chans {
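go func(ch chan<- string, msg string) { ch <- msg }(ch, msg) // pass as arguments: before Go 1.22, loop variables were shared across iterations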
}
}
}
However, the order of the messages that get passed to each channel is important. I looked in the spec to see if channels preserve order when blocked, and all I found was this:
If the capacity is greater than zero, the channel is asynchronous: communication operations succeed without blocking if the buffer is not full (sends) or not empty (receives), and elements are received in the order they are sent.
To me, if a write is blocked, then it is not "sent", but waiting to be sent. With that assumption, the above says nothing about order of sending when multiple goroutines are blocked on writing.
Are there any guarantees about the order of sends after a channel becomes unblocked?
No, there are no guarantees.
Even when the channel is not full, if two goroutines are started at about the same time to send to it, I don't think there is any guarantee that the goroutine that was started first would actually execute first. So you can't count on the messages arriving in order.
You can drop the message if the channel is full (and then set a flag to pause the client and send them a message that they're dropping messages or whatever).
Something along the lines of (untested):
type Client struct {
	Name string
	ch   chan<- string
}
func broadcast(c <-chan string, clients []*Client) {
	for msg := range c {
		for _, client := range clients {
			select {
			case client.ch <- msg:
				// all okay
			default:
				// The client's channel is full: drop the message instead of blocking the others
				log.Printf("Channel was full sending '%s' to client %s", msg, client.Name)
			}
		}
	}
}
With this code, there are no guarantees.
The main problem with the given sample code lies not in the channel behavior, but in the numerous goroutines it creates. All the goroutines are "fired" inside the same nested loop without further synchronization, so even before they start sending messages, we simply don't know which ones will execute first.
However, this raises a legitimate general question: if we could somehow guarantee the order of several blocking send instructions, would we be guaranteed to receive the values in the same order?
A "happens-before" relationship between the sends is difficult to establish. I fear it is impossible because:
Anything can happen before the send instruction, for example other goroutines performing (or not performing) their own sends.
A goroutine that is blocked in a send cannot simultaneously take part in any other kind of synchronization.
For example, if I have 10 goroutines numbered 1 to 10, I have no way of letting them concurrently send their own numbers to the channel in the right order. All I can do is use various kinds of sequential tricks, like doing the sorting in a single goroutine.
This is an addition to the already posted answers.
As practically everyone stated, the problem is the order of execution of the goroutines. You can easily coordinate goroutine execution using channels, by passing around the number of the goroutine you want to run:
func coordinated(coord chan int, num, max int, work func()) {
for {
n := <-coord
if n == num {
work()
coord <- (n+1) % max
} else {
coord <- n
}
}
}
coord := make(chan int)
go coordinated(coord, 0, 3, func() { println("0"); time.Sleep(1 * time.Second) })
go coordinated(coord, 1, 3, func() { println("1"); time.Sleep(1 * time.Second) })
go coordinated(coord, 2, 3, func() { println("2"); time.Sleep(1 * time.Second) })
coord <- 0
or by using a central goroutine which executes the workers in an ordered manner:
func executor(funs chan func()) {
for {
worker := <-funs
worker()
funs <- worker
}
}
funs := make(chan func(), 3)
funs <- func() { println("0"); time.Sleep(1 * time.Second) }
funs <- func() { println("1"); time.Sleep(1 * time.Second) }
funs <- func() { println("2"); time.Sleep(1 * time.Second) }
go executor(funs)
These methods will, of course, remove all parallelism due to synchronization. However, the concurrent aspect of your program remains.

Resources