Slice, channel of empty interface as function argument - Go

I would like to implement a parallelization function, which in the absence of generics, looks like so:
func Parallelize(s []interface{}, c chan interface{}, f func(interface{}, chan interface{})) {
    var wg sync.WaitGroup
    for _, si := range s {
        wg.Add(1)
        go func(si interface{}) {
            defer wg.Done()
            f(si, c)
        }(si)
    }
    wg.Wait()
    close(c)
}
I'd like to enable passing objects of any type, but to ensure that the first argument is a slice of objects, the second a channel, and the third a function that accepts an object and a channel.
Apparently, the Go compiler doesn't like the arguments. It is not allowing me to call this function like so:
a := make([]*A, 0)
c := make(chan *A)
f := func(_a *A, _c chan *A) {
    // ...
}
Parallelize(a, c, f)
What is the right way of going about this?

There are several ways you can do this, though the best in my opinion is not to do it. This is one of those common patterns that is best left explicit because it is much easier to read and maintain. However, if you insist:
One way to do it is realizing that you don't really need to pass the slice elements:
func Parallelize(n int, c chan interface{}, f func(int, chan interface{})) {
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            f(i, c)
        }(i)
    }
    wg.Wait()
    close(c)
}
And call it using:
Parallelize(len(slice), ch, func(i int, ch chan interface{}) {
    // use slice[i]
})
You also don't need to pass the channel:
func Parallelize(n int, f func(int)) {
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            f(i)
        }(i)
    }
    wg.Wait()
}
And call it:
Parallelize(len(slice), func(i int) {
    // use slice[i] and the channel ch
})
close(ch)
Another way of doing this is using reflection. It'll be uglier, and you'll have to deal with runtime errors instead of compile time errors.
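For a rough sense of what that reflection-based version could look like, here is a minimal sketch (ParallelizeReflect is a made-up name, and it assumes the reflect and sync packages are imported; the argument kinds are not checked explicitly, so a mismatched type surfaces as a runtime panic):
func ParallelizeReflect(s, c, f interface{}) {
    sv := reflect.ValueOf(s) // expected to be a slice
    cv := reflect.ValueOf(c) // expected to be a channel
    fv := reflect.ValueOf(f) // expected to be func(elem, chan)
    var wg sync.WaitGroup
    for i := 0; i < sv.Len(); i++ {
        wg.Add(1)
        go func(v reflect.Value) {
            defer wg.Done()
            fv.Call([]reflect.Value{v, cv}) // panics at runtime on a type mismatch
        }(sv.Index(i))
    }
    wg.Wait()
    cv.Close()
}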

Related

Difference between passing a channel to a closure as a formal argument vs using the channel defined in the parent scope?

Take these two snippets for example
Using the out chan from the parent scope
func Worker() chan int {
    out := make(chan int)
    go func() {
        // write something to the channel
    }()
    return out
}
Passing the out chan as a formal argument to the closure
func Worker() chan int {
    out := make(chan int)
    go func(out chan int) {
        // write something to the channel
    }(out)
    return out
}
I know that passing an argument to a closure creates a copy of it, while using something from the parent scope captures it by reference, so I want to know how this works internally in the pass-by-copy case. Are there two channels, one in the parent scope and a copy passed to the closure, and when the copy in the closure is written to, is the value also copied into the channel in the parent scope? I ask because we return the out chan from the parent scope to the caller, and the values will be consumed from that channel only.
chan is a reference type, just like a slice or a map. Everything in Go is passed by value; when you pass a chan as an argument, it creates a copy of the reference, referencing the same underlying channel. The channel will be consumable from the parent scope in both cases. But there are a couple of differences. Consider the following code:
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(1)
go func() {
    ch <- 1
    ch = nil
    wg.Done()
}()
<-ch // we can read from the channel
wg.Wait()
// ch is nil here because we overwrote the captured reference with nil
vs
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(1)
go func(ch chan int) {
    ch <- 1
    ch = nil
    wg.Done()
}(ch)
<-ch // we can still read from the channel
wg.Wait()
// ch is not nil here because we overwrote the copied reference, not the original one;
// the original reference remained the same

Channels terminate prematurely

I am prototyping a series of goroutines for a pipeline, each of which performs a transformation. The routines are terminating before all the data has passed through.
I have checked the Donovan and Kernighan book and Googled for solutions.
Here is my code:
package main

import (
    "fmt"
    "sync"
)

func main() {
    a1 := []string{"apple", "apricot"}
    chan1 := make(chan string)
    chan2 := make(chan string)
    chan3 := make(chan string)
    var wg sync.WaitGroup
    go Pipe1(chan2, chan1, &wg)
    go Pipe2(chan3, chan2, &wg)
    go Pipe3(chan3, &wg)
    func(data []string) {
        defer wg.Done()
        for _, s := range data {
            wg.Add(1)
            chan1 <- s
        }
        go func() {
            wg.Wait()
            close(chan1)
        }()
    }(a1)
}

func Pipe1(out chan<- string, in <-chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    for s := range in {
        wg.Add(1)
        out <- s + "s are"
    }
}

func Pipe2(out chan<- string, in <-chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    for s := range in {
        wg.Add(1)
        out <- s + " good for you"
    }
}

func Pipe3(in <-chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    for s := range in {
        wg.Add(1)
        fmt.Println(s)
    }
}
My expected output is:
apples are good for you
apricots are good for you
The results of running main are inconsistent. Sometimes I get both lines. Sometimes I just get the apples. Sometimes nothing is output.
As Adrian already pointed out, your WaitGroup.Add and WaitGroup.Done calls are mismatched. However, in cases like this the "I am done" signal is typically given by closing the output channel. WaitGroups are only necessary if work is shared between several goroutines (i.e. several goroutines consume the same channel), which isn't the case here.
package main

import (
    "fmt"
)

func main() {
    a1 := []string{"apple", "apricot"}
    chan1 := make(chan string)
    chan2 := make(chan string)
    chan3 := make(chan string)
    go func() {
        for _, s := range a1 {
            chan1 <- s
        }
        close(chan1)
    }()
    go Pipe1(chan2, chan1)
    go Pipe2(chan3, chan2)
    // This range loop terminates when chan3 is closed, which Pipe2 does after
    // chan2 is closed, which Pipe1 does after chan1 is closed, which the
    // anonymous goroutine above does after it sent all values.
    for s := range chan3 {
        fmt.Println(s)
    }
}

func Pipe1(out chan<- string, in <-chan string) {
    for s := range in {
        out <- s + "s are"
    }
    close(out) // let caller know that we're done
}

func Pipe2(out chan<- string, in <-chan string) {
    for s := range in {
        out <- s + " good for you"
    }
    close(out) // let caller know that we're done
}
Try it on the playground: https://play.golang.org/p/d2J4APjs_lL
You're calling wg.Wait in a goroutine, so main is allowed to return (and therefore your program exits) before the other routines have finished. This would cause the behavior you see, but taking it out of a goroutine alone isn't enough.
You're also misusing the WaitGroup in general; your Add and Done calls don't relate to one another, and you don't have as many Done calls as Add calls, so the WaitGroup will never finish. If you're calling Add in a loop, then every loop iteration must also result in a Done call; as you have it now, you defer wg.Done() once before each of your loops and then call Add inside the loop, resulting in one Done and many Adds. This code would need to be significantly revised to work as intended.
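For illustration, here is a minimal, self-contained sketch of that pairing rule (the names are my own, not taken from the question): every Add(1) happens before the goroutine it counts is started, and that goroutine performs exactly one deferred Done.
package main

import (
    "fmt"
    "sync"
)

func main() {
    words := []string{"apple", "apricot"}
    ch := make(chan string)

    var wg sync.WaitGroup
    for _, s := range words {
        wg.Add(1) // one Add per goroutine, before it starts
        go func(s string) {
            defer wg.Done() // exactly one Done per Add
            ch <- s
        }(s)
    }

    // Close the channel once every sender has finished, so the range below ends.
    go func() {
        wg.Wait()
        close(ch)
    }()

    for s := range ch {
        fmt.Println(s)
    }
}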

Quit all recursively spawned goroutines at once

I have a function that recursively spawns goroutines to walk a DOM tree, putting the nodes they find into a channel shared between all of them.
import (
    "golang.org/x/net/html"
    "sync"
)

func walk(doc *html.Node, ch chan *html.Node) {
    var wg sync.WaitGroup
    defer close(ch)
    var f func(*html.Node)
    f = func(n *html.Node) {
        defer wg.Done()
        ch <- n
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            wg.Add(1)
            go f(c)
        }
    }
    wg.Add(1)
    go f(doc)
    wg.Wait()
}
Which I'd call like
// get the webpage using http
// parse the html into doc
ch := make(chan *html.Node)
go walk(doc, ch)
for c := range ch {
    if someCondition(c) {
        // do something with c
        // quit all goroutines spawned by walk
    }
}
I am wondering how I could quit all of these goroutines, i.e. close ch, once I have found a node of a certain type or some other condition has been fulfilled. I have tried using a quit channel that would be polled before spawning new goroutines, closing ch if a value was received, but that led to race conditions where some goroutines tried to send on the channel that had just been closed by another one. I was pondering using a mutex, but it seems inelegant and against the spirit of Go to protect a channel with a mutex. Is there an idiomatic way to do this using channels? If not, is there any way at all? Any input appreciated!
The context package provides exactly this kind of cancellation functionality. Using context.Context together with a few idiomatic Go patterns, you can achieve what you need.
To start you can check this article to get a better feel of cancellation with context: https://www.sohamkamani.com/blog/golang/2018-06-17-golang-using-context-cancellation/
Also make sure to check the official GoDoc: https://golang.org/pkg/context/
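The primitive itself is small: context.WithCancel returns a Context whose Done() channel is closed once the returned cancel function is called. A minimal, self-contained illustration (my own example, not from the linked article):
package main

import (
    "context"
    "fmt"
)

func main() {
    ctx, cancel := context.WithCancel(context.Background())

    done := make(chan struct{})
    go func() {
        <-ctx.Done()                         // unblocks once cancel() is called
        fmt.Println("cancelled:", ctx.Err()) // prints "cancelled: context canceled"
        close(done)
    }()

    cancel()
    <-done
}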
So to achieve this functionality your function should look more like:
func walk(ctx context.Context, doc *html.Node, ch chan *html.Node) {
    var wg sync.WaitGroup
    defer close(ch)
    var f func(*html.Node)
    f = func(n *html.Node) {
        defer wg.Done()
        ch <- n
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            select {
            case <-ctx.Done():
                return // quit the function, as it has been cancelled
            default:
                wg.Add(1)
                go f(c)
            }
        }
    }
    select {
    case <-ctx.Done():
        return // in case it was cancelled before we even started
    default:
        wg.Add(1)
        go f(doc)
        wg.Wait()
    }
}
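As a side note (my own addition, not part of the answer above): if you also want workers to stop blocking in case the consumer ever stops draining the channel, the plain ch <- n send inside f could go through a small cancellable helper along these lines (send is a made-up name; it assumes the context and golang.org/x/net/html imports from the snippets above):
// send delivers n on ch, but gives up as soon as ctx is cancelled, so a
// worker never blocks forever on a consumer that has stopped reading.
func send(ctx context.Context, ch chan<- *html.Node, n *html.Node) bool {
    select {
    case ch <- n:
        return true
    case <-ctx.Done():
        return false
    }
}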
And when calling the function, you will have something like:
// ...
ctx, cancelFunc := context.WithCancel(context.Background())
go walk(ctx, doc, ch)
for value := range ch {
    // ...
    if someCondition {
        cancelFunc()
        // keep ranging: the loop exits automatically once walk closes the
        // channel from the inside
    }
}

What's the idiomatic solution to embarrassingly parallel tasks in Go?

I'm currently staring at a beefed-up version of the following code:
func embarrassing(data []string) []string {
    resultChan := make(chan string)
    var waitGroup sync.WaitGroup
    for _, item := range data {
        waitGroup.Add(1)
        go func(item string) {
            defer waitGroup.Done()
            resultChan <- doWork(item)
        }(item)
    }
    go func() {
        waitGroup.Wait()
        close(resultChan)
    }()
    var results []string
    for result := range resultChan {
        results = append(results, result)
    }
    return results
}
This is just blowing my mind. All this does can be expressed in other languages as
results = parallelMap(data, doWork)
Even if it can't be done quite this easily in Go, isn't there still a better way than the above?
If you need all the results, you don't need the channel (and the extra goroutine to close it) to communicate the results; you can write directly into the results slice:
func cleaner(data []string) []string {
    results := make([]string, len(data))
    wg := &sync.WaitGroup{}
    wg.Add(len(data))
    for i, item := range data {
        go func(i int, item string) {
            defer wg.Done()
            results[i] = doWork(item)
        }(i, item)
    }
    wg.Wait()
    return results
}
This is possible because slice elements act as distinct variables, and thus can be written individually without synchronization. For details, see Can I concurrently write different slice elements. You also get the results in the same order as your input for free.
Another variation: if doWork() did not return the result but instead received the address where the result should be "placed", plus the sync.WaitGroup to signal completion, that doWork() function could be launched "directly" as a new goroutine.
We can create a reusable wrapper for doWork():
func doWork2(item string, result *string, wg *sync.WaitGroup) {
    defer wg.Done()
    *result = doWork(item)
}
If you have the processing logic in such format, this is how it can be executed concurrently:
func cleanest(data []string) []string {
    results := make([]string, len(data))
    wg := &sync.WaitGroup{}
    wg.Add(len(data))
    for i, item := range data {
        go doWork2(item, &results[i], wg)
    }
    wg.Wait()
    return results
}
Yet another variation could be to pass a channel to doWork() on which it is supposed to deliver the result. This solution doesn't even require a sync.WaitGroup, as we know how many elements we want to receive from the channel:
func cleanest2(data []string) []string {
    ch := make(chan string)
    for _, item := range data {
        go doWork3(item, ch)
    }
    results := make([]string, len(data))
    for i := range results {
        results[i] = <-ch
    }
    return results
}

func doWork3(item string, res chan<- string) {
    res <- "done:" + item
}
"Weakness" of this last solution is that it may collect the result "out-of-order" (which may or may not be a problem). This approach can be improved to retain order by letting doWork() receive and return the index of the item. For details and examples, see How to collect values from N goroutines executed in a specific order?
You can also use reflection to achieve something similar.
In this example it distributes the handler function over 4 goroutines and returns the results in a new instance of the given source slice type.
package main

import (
    "fmt"
    "reflect"
    "strings"
    "sync"
)

func parralelMap(some interface{}, handle interface{}) interface{} {
    rSlice := reflect.ValueOf(some)
    rFn := reflect.ValueOf(handle)
    dChan := make(chan reflect.Value, 4)
    rChan := make(chan []reflect.Value, 4)
    var waitGroup sync.WaitGroup
    for i := 0; i < 4; i++ {
        waitGroup.Add(1)
        go func() {
            defer waitGroup.Done()
            for v := range dChan {
                rChan <- rFn.Call([]reflect.Value{v})
            }
        }()
    }
    nSlice := reflect.MakeSlice(rSlice.Type(), rSlice.Len(), rSlice.Cap())
    for i := 0; i < rSlice.Len(); i++ {
        dChan <- rSlice.Index(i)
    }
    close(dChan)
    go func() {
        waitGroup.Wait()
        close(rChan)
    }()
    i := 0
    for v := range rChan {
        nSlice.Index(i).Set(v[0])
        i++
    }
    return nSlice.Interface()
}

func main() {
    fmt.Println(
        parralelMap([]string{"what", "ever"}, strings.ToUpper),
    )
}
Test here https://play.golang.org/p/iUPHqswx8iS

Calling each function by iterating over a function slice

I am trying to loop over a slice of functions and invoke every function in it. However, I am getting strange results. Here is my code:
package main

import (
    "fmt"
    "sync"
)

func A() {
    fmt.Println("A")
}

func B() {
    fmt.Println("B")
}

func C() {
    fmt.Println("C")
}

func main() {
    type fs func()
    var wg sync.WaitGroup
    f := []fs{A, B, C}
    for a, _ := range f {
        wg.Add(1)
        go func() {
            defer wg.Done()
            f[a]()
        }()
    }
    wg.Wait()
}
I was expecting it to invoke functions A, B and then C, but my output contains only Cs.
C
C
C
Please suggest what's wrong and the logic behind it. Also, how can I get the desired behavior?
Go Playground
Classic Go gotcha :)
Official Go FAQ
for a, _ := range f {
    wg.Add(1)
    a := a // this will make it work
    go func() {
        defer wg.Done()
        f[a]()
    }()
}
Your func() {}() is a closure that closes over a. And a is shared across all the go func goroutines because the for loop reuses the same variable (meaning the same address in memory, hence the same value), so naturally they all see the last value of a.
The solution is either to re-declare a := a before the closure (like above). This creates a new variable (a new address in memory) for each iteration, which is then fresh for each invocation of go func.
Or pass it in as a parameter to the goroutine, in which case you pass a copy of the value of a, like so:
go func(i int) {
    defer wg.Done()
    f[i]()
}(a)
You don't even need goroutines for this; https://play.golang.org/p/nkP9YfeOWF, for example, demonstrates the same gotcha. The key here is 'closure'.
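Here is a goroutine-free sketch of the same sharing (my own example, not the linked playground; the counter is deliberately declared outside the loop so that the single shared variable is explicit):
package main

import "fmt"

func main() {
    var fns []func()
    i := 0 // one variable, shared by every closure below
    for ; i < 3; i++ {
        fns = append(fns, func() { fmt.Println(i) }) // each closure captures the same i
    }
    for _, fn := range fns {
        fn() // prints 3 three times: all closures see the final value of i
    }
}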
The problem seems to be that you are not passing the desired value to the goroutine, and the variable's value is taken from the outer scope. That being said, the range iteration finishes even before the first goroutine is executed, and that is why you always get index a == 2, which is function C.
You can test this if you simply put time.Sleep(100) inside your range, just to allow the goroutine to catch up with the main thread before continuing to the next iteration --> Go Playground
for a, _ := range f {
    wg.Add(1)
    go func() {
        defer wg.Done()
        f[a]()
    }()
    time.Sleep(100)
}
Output
A
B
C
What you actually want, though, is simply to pass a parameter to the goroutine, which makes a copy for the function.
func main() {
    type fs func()
    var wg sync.WaitGroup
    f := []fs{A, B, C}
    for _, v := range f {
        wg.Add(1)
        go func(f fs) {
            defer wg.Done()
            f()
        }(v)
    }
    wg.Wait()
}
Go Playground
