What's the idiomatic solution to embarrassingly parallel tasks in Go?

I'm currently staring at a beefed up version of the following code:
func embarrassing(data []string) []string {
resultChan := make(chan string)
var waitGroup sync.WaitGroup
for _, item := range data {
waitGroup.Add(1)
go func(item string) {
defer waitGroup.Done()
resultChan <- doWork(item)
}(item)
}
go func() {
waitGroup.Wait()
close(resultChan)
}()
var results []string
for result := range resultChan {
results = append(results, result)
}
return results
}
This is just blowing my mind. All this is doing can be expressed in other languages as
results = parallelMap(data, doWork)
Even if it can't be done quite this easily in Go, isn't there still a better way than the above?

If you need all the results, you don't need the channel (and the extra goroutine to close it) to communicate the results, you can write directly into the results slice:
func cleaner(data []string) []string {
results := make([]string, len(data))
wg := &sync.WaitGroup{}
wg.Add(len(data))
for i, item := range data {
go func(i int, item string) {
defer wg.Done()
results[i] = doWork(item)
}(i, item)
}
wg.Wait()
return results
}
This is possible because slice elements act as distinct variables, and thus can be written individually without synchronization. For details, see Can I concurrently write different slice elements. You also get the results in the same order as your input for free.
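With type parameters (Go 1.18 and later), the index-writing pattern above could be wrapped once and reused, which gets close to the one-liner from the question. The following is only a sketch: ParallelMap is a hypothetical helper name, not something from the standard library, and it starts one goroutine per element, just like cleaner() does.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// ParallelMap is a hypothetical helper (not part of the standard library).
// It applies fn to every element of data in its own goroutine and returns
// the results in input order.
func ParallelMap[T, R any](data []T, fn func(T) R) []R {
	results := make([]R, len(data))
	var wg sync.WaitGroup
	wg.Add(len(data))
	for i, item := range data {
		go func(i int, item T) {
			defer wg.Done()
			// Each goroutine writes only its own slice element,
			// so no further synchronization is needed.
			results[i] = fn(item)
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(ParallelMap([]string{"what", "ever"}, strings.ToUpper)) // prints [WHAT EVER]
}
```

For very large inputs one goroutine per element may be too many; a bounded number of workers would be the usual refinement.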
Another variation: if doWork() did not return the result but instead received the address where the result should be stored, along with the sync.WaitGroup used to signal completion, it could be launched directly as a new goroutine.
We can create a reusable wrapper for doWork():
func doWork2(item string, result *string, wg *sync.WaitGroup) {
defer wg.Done()
*result = doWork(item)
}
If you have the processing logic in such format, this is how it can be executed concurrently:
func cleanest(data []string) []string {
results := make([]string, len(data))
wg := &sync.WaitGroup{}
wg.Add(len(data))
for i, item := range data {
go doWork2(item, &results[i], wg)
}
wg.Wait()
return results
}
Yet another variation could be to pass a channel to doWork() on which it is supposed to deliver the result. This solution doesn't even require a sync.WaitGroup, as we know how many elements we want to receive from the channel:
func cleanest2(data []string) []string {
ch := make(chan string)
for _, item := range data {
go doWork3(item, ch)
}
results := make([]string, len(data))
for i := range results {
results[i] = <-ch
}
return results
}
func doWork3(item string, res chan<- string) {
res <- "done:" + item
}
A weakness of this last solution is that it may collect the results out of order (which may or may not be a problem). This approach can be improved to retain order by letting doWork() receive and return the index of the item. For details and examples, see How to collect values from N goroutines executed in a specific order?

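The index idea from the channel-based answer above might look roughly like this. It is a sketch under assumptions: the indexed type and the names doWorkIndexed and orderedCollect are made up for illustration, and the "done:" prefix stands in for the real work, as in doWork3.

```go
package main

import "fmt"

// indexed pairs a result with the position of its input (hypothetical type).
type indexed struct {
	i   int
	val string
}

// doWorkIndexed stands in for doWork: it sends the result together with
// the index it was given.
func doWorkIndexed(i int, item string, res chan<- indexed) {
	res <- indexed{i: i, val: "done:" + item}
}

// orderedCollect receives exactly len(data) results and stores each one
// at the index it carries, so the output order matches the input order.
func orderedCollect(data []string) []string {
	ch := make(chan indexed)
	for i, item := range data {
		go doWorkIndexed(i, item, ch)
	}
	results := make([]string, len(data))
	for range data {
		r := <-ch
		results[r.i] = r.val
	}
	return results
}

func main() {
	fmt.Println(orderedCollect([]string{"a", "b", "c"})) // prints [done:a done:b done:c]
}
```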
You can also use reflection to achieve something similar.
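In this example it distributes the handler function over 4 goroutines and returns the results in a new slice of the same type as the given source slice. Results are stored in the order the workers deliver them, which is not necessarily the input order.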
package main
import (
"fmt"
"reflect"
"strings"
"sync"
)
func parralelMap(some interface{}, handle interface{}) interface{} {
rSlice := reflect.ValueOf(some)
rFn := reflect.ValueOf(handle)
dChan := make(chan reflect.Value, 4)
rChan := make(chan []reflect.Value, 4)
var waitGroup sync.WaitGroup
for i := 0; i < 4; i++ {
waitGroup.Add(1)
go func() {
defer waitGroup.Done()
for v := range dChan {
rChan <- rFn.Call([]reflect.Value{v})
}
}()
}
nSlice := reflect.MakeSlice(rSlice.Type(), rSlice.Len(), rSlice.Cap())
for i := 0; i < rSlice.Len(); i++ {
dChan <- rSlice.Index(i)
}
close(dChan)
go func() {
waitGroup.Wait()
close(rChan)
}()
i := 0
for v := range rChan {
nSlice.Index(i).Set(v[0])
i++
}
return nSlice.Interface()
}
func main() {
fmt.Println(
parralelMap([]string{"what", "ever"}, strings.ToUpper),
)
}
Test here https://play.golang.org/p/iUPHqswx8iS

Related

All Goroutines Are Asleep (The Go Programming Language)

I'm working through The Go Programming Language and learning about goroutines, and came across the following issue. In this example, the following function is meant to take a channel of files and process each of them:
func makeThumbnails5(filenames <-chan string) int64 {
sizes := make(chan int64)
var wg sync.WaitGroup
for f := range filenames {
wg.Add(1)
// worker
go func(f string) {
defer wg.Done()
thumb, err := thumbnail.ImageFile(f)
if err != nil {
log.Println(err)
return
}
info, _ := os.Stat(thumb)
sizes <- info.Size()
}(f)
}
// closer
go func() {
wg.Wait()
close(sizes)
}()
var total int64
for size := range sizes {
total += size
}
wg.Wait()
return total
}
I've tried to use this function the following way:
func main() {
thumbnails := os.Args[1:] /* Get a list of all the images from the CLI */
ch := make(chan string, len(thumbnails))
for _, val := range thumbnails {
ch <- val
}
makeThumbnails5(ch)
}
However, when I run this program, I get the following error:
fatal error: all goroutines are asleep - deadlock!
It doesn't appear that the closer goroutine is running. Could someone help me understand what is going wrong here, and what I can do to run this function correctly?
As I commented, it deadlocks because the filenames channel is never closed, so the for f := range filenames loop never completes. However, just closing the input channel means that all goroutines launched in the loop stay blocked at the line sizes <- info.Size() until the loop ends, because nothing receives from sizes before then. That is not a problem in this case, but it could be if the input is huge (in which case you'd probably want to limit the number of concurrent workers too). So it makes sense to run the main loop in a goroutine as well, so that the for size := range sizes loop can start consuming right away. The following should work:
func makeThumbnails5(filenames <-chan string) int64 {
sizes := make(chan int64)
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
for f := range filenames {
wg.Add(1)
// worker
go func(f string) {
defer wg.Done()
thumb, err := thumbnail.ImageFile(f)
if err != nil {
log.Println(err)
return
}
info, _ := os.Stat(thumb)
sizes <- info.Size()
}(f)
}
}()
// closer
go func() {
wg.Wait()
close(sizes)
}()
var total int64
for size := range sizes {
total += size
}
return total
}
The implementation of main has a similar problem: if the input is huge, you essentially load all of it into memory (the buffered channel) before passing it on to be processed. Perhaps something like the following is better:
func main() {
ch := make(chan string)
go func(thumbnails []string) {
defer close(ch)
for _, val := range thumbnails {
ch <- val
}
}(os.Args[1:])
makeThumbnails5(ch)
}
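The remark above about limiting the number of concurrent workers could be sketched with a fixed pool of goroutines that all read from the input channel. Everything named here is an assumption for illustration: sumSizes and maxWorkers are hypothetical, and len(f) is a placeholder for the real thumbnail.ImageFile and os.Stat calls.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSizes is a hypothetical stand-in for makeThumbnails5. At most
// maxWorkers goroutines process file names at any one time.
func sumSizes(filenames <-chan string, maxWorkers int) int64 {
	sizes := make(chan int64)
	var wg sync.WaitGroup
	for w := 0; w < maxWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Several workers share the same input channel;
			// each name is received by exactly one of them.
			for f := range filenames {
				sizes <- int64(len(f)) // placeholder for the real work
			}
		}()
	}
	// closer: runs once every worker has drained the input
	go func() {
		wg.Wait()
		close(sizes)
	}()
	var total int64
	for size := range sizes {
		total += size
	}
	return total
}

func main() {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, name := range []string{"a.png", "bb.png", "ccc.png"} {
			ch <- name
		}
	}()
	fmt.Println(sumSizes(ch, 2)) // prints 18
}
```

The number of goroutines no longer grows with the input, and the input channel still has to be closed by the producer for the workers to finish.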

Is creating goroutines asynchronous?

I'm trying to fetch the content of an API with numerous goroutines.
I'm using a for loop to iterate over different characters, but it seems like the for loop reaches its final value before the requests are sent off.
package main
import (
"encoding/json"
"fmt"
"net/http"
"sync"
)
type people struct {
Name string `json:"name"`
}
func main(){
names := make(chan string, 25)
var wg sync.WaitGroup
for i := 0; i < 25; i++ {
wg.Add(1)
go func() {
defer wg.Done()
var p people
url := fmt.Sprintf("https://swapi.dev/api/people/%d", i)
getJSON(url, &p)
names <- p.Name
}()
}
name := <-names
fmt.Println(name)
wg.Wait()
}
func getJSON(url string, target interface{}) error {
r, err := http.Get(url)
if err != nil {
return err
}
defer r.Body.Close()
json.NewDecoder(r.Body).Decode(target)
return nil
}
Also, if somebody could improve my code quality, I'd be very grateful, I'm very new to Golang and don't have anybody to learn from!
Your goroutines are all using the same variable i. On the first iteration you launch a goroutine that builds a URL from i, and on the next iteration i is incremented before that goroutine has had a chance to run.
It's a common mistake in Go. The solution is to create a new variable for each iteration and pass that one along. You can either do it with a closure like this (playground):
for i := 0; i < 25; i++ {
	wg.Add(1)
	localI := i // a fresh variable for every iteration
	go func() {
		defer wg.Done()
		var p people
		// Use localI here
		url := fmt.Sprintf("https://swapi.dev/api/people/%d", localI)
		getJSON(url, &p)
		names <- p.Name
	}()
}
Or as an argument to the function (playground)
for i := 0; i < 25; i++ {
	wg.Add(1)
	// Pass i as an argument. An int is passed by value, not by reference,
	// so each goroutine gets its own copy.
	go func(localI int) {
		defer wg.Done()
		var p people
		// Use localI here
		url := fmt.Sprintf("https://swapi.dev/api/people/%d", localI)
		getJSON(url, &p)
		names <- p.Name
	}(i)
}
Here is a good writeup on the mistake you made:
https://github.com/golang/go/wiki/CommonMistakes#using-goroutines-on-loop-iterator-variables
And the one above it is good to read too!
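Putting the argument-passing fix together with collecting every result (the code in the question only receives a single name) could look like the sketch below. fetchName and collectNames are hypothetical names, and fetchName is a stand-in for the getJSON call so the example runs without network access. As a side note, since Go 1.22 each loop iteration gets its own i in modules that declare go 1.22 or later, so the original closure would not share the variable there; passing the value explicitly works on every version.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchName is a hypothetical stand-in for the HTTP request in the question.
func fetchName(id int) string {
	return fmt.Sprintf("person-%d", id)
}

// collectNames starts one goroutine per id and returns all names, sorted.
func collectNames(n int) []string {
	names := make(chan string, n) // buffered, so no sender ever blocks
	var wg sync.WaitGroup
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(id int) { // id is this goroutine's own copy of i
			defer wg.Done()
			names <- fetchName(id)
		}(i)
	}
	wg.Wait()
	close(names) // safe: every sender has finished
	var out []string
	for name := range names {
		out = append(out, name)
	}
	sort.Strings(out) // arrival order is not deterministic
	return out
}

func main() {
	fmt.Println(collectNames(3)) // prints [person-1 person-2 person-3]
}
```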

Channels terminate prematurely

I am prototyping a series of go routines for a pipeline that each perform a transformation. The routines are terminating before all the data has passed through.
I have checked Donovan and Kernighan's book and Googled for solutions.
Here is my code:
package main
import (
"fmt"
"sync"
)
func main() {
a1 := []string{"apple", "apricot"}
chan1 := make(chan string)
chan2 := make(chan string)
chan3 := make(chan string)
var wg sync.WaitGroup
go Pipe1(chan2, chan1, &wg)
go Pipe2(chan3, chan2, &wg)
go Pipe3(chan3, &wg)
func (data []string) {
defer wg.Done()
for _, s := range data {
wg.Add(1)
chan1 <- s
}
go func() {
wg.Wait()
close(chan1)
}()
}(a1)
}
func Pipe1(out chan<- string, in <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for s := range in {
wg.Add(1)
out <- s + "s are"
}
}
func Pipe2(out chan<- string, in <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for s := range in {
wg.Add(1)
out <- s + " good for you"
}
}
func Pipe3(in <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for s := range in {
wg.Add(1)
fmt.Println(s)
}
}
My expected output is:
apples are good for you
apricots are good for you
The results of running main are inconsistent. Sometimes I get both lines. Sometimes I just get the apples. Sometimes nothing is output.
As Adrian already pointed out, your WaitGroup.Add and WaitGroup.Done calls are mismatched. However, in cases like this the "I am done" signal is typically given by closing the output channel. WaitGroups are only necessary if work is shared between several goroutines (i.e. several goroutines consume the same channel), which isn't the case here.
package main
import (
"fmt"
)
func main() {
a1 := []string{"apple", "apricot"}
chan1 := make(chan string)
chan2 := make(chan string)
chan3 := make(chan string)
go func() {
for _, s := range a1 {
chan1 <- s
}
close(chan1)
}()
go Pipe1(chan2, chan1)
go Pipe2(chan3, chan2)
// This range loop terminates when chan3 is closed, which Pipe2 does after
// chan2 is closed, which Pipe1 does after chan1 is closed, which the
// anonymous goroutine above does after it sent all values.
for s := range chan3 {
fmt.Println(s)
}
}
func Pipe1(out chan<- string, in <-chan string) {
for s := range in {
out <- s + "s are"
}
close(out) // let caller know that we're done
}
func Pipe2(out chan<- string, in <-chan string) {
for s := range in {
out <- s + " good for you"
}
close(out) // let caller know that we're done
}
Try it on the playground: https://play.golang.org/p/d2J4APjs_lL
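Since Pipe1 and Pipe2 differ only in the transformation they apply, the close-the-output convention could also be factored into one helper. This is a sketch, assuming hypothetical helpers named stage, source and runPipeline; with a single goroutine per stage the input order is preserved.

```go
package main

import "fmt"

// stage is a hypothetical helper: it starts a goroutine that applies f to
// every value from in and closes the returned channel once in is drained.
func stage(in <-chan string, f func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out) // tells the next stage that we're done
		for s := range in {
			out <- f(s)
		}
	}()
	return out
}

// source emits the items and then closes its channel.
func source(items []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, s := range items {
			out <- s
		}
	}()
	return out
}

// runPipeline wires the stages together and collects the final values.
func runPipeline(items []string) []string {
	c := source(items)
	c = stage(c, func(s string) string { return s + "s are" })
	c = stage(c, func(s string) string { return s + " good for you" })
	var lines []string
	for s := range c {
		lines = append(lines, s)
	}
	return lines
}

func main() {
	for _, line := range runPipeline([]string{"apple", "apricot"}) {
		fmt.Println(line)
	}
}
```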
You're calling wg.Wait in a goroutine, so main is allowed to return (and therefore your program exits) before the other goroutines have finished. This would cause the behavior you see, but taking the call out of the goroutine alone isn't enough.
You're also misusing the WaitGroup in general; your Add and Done calls don't relate to one another, and you don't have as many Dones as you have Adds, so the WaitGroup will never finish. If you're calling Add in a loop, then every loop iteration must also result in a Done call; as you have it now, you defer wg.Done() before each of your loops, then call Add inside the loop, resulting in one Done and many Adds. This code would need to be significantly revised to work as intended.

How to assign the values to struct while go routines are running?

I'm using goroutines in my project and I want to assign values to struct fields, but I don't know how to assign the values returned by my MongoDB queries to those fields. Here are my struct and the queries:
type AppLoadNew struct{
StripeTestKey string `json:"stripe_test_key" bson:"stripe_test_key,omitempty"`
Locations []Locations `json:"location" bson:"location,omitempty"`
}
type Locations struct{
Id int `json:"_id" bson:"_id"`
Location string `json:"location" bson:"location"`
}
func GoRoutine(){
values := AppLoadNew{}
go func() {
data, err := GetStripeTestKey(bson.M{"is_default": true})
if err == nil {
values.StripeTestKey := data.TestStripePublishKey
}
}()
go func() {
location, err := GetFormLocation(bson.M{"is_default": true})
if err == nil {
values.Locations := location
}
}()
fmt.Println(values) // prints nothing useful here: values is still empty
}
Can you please help me assign all the values to the AppLoadNew struct?
In Go no value is safe for concurrent read and write (from multiple goroutines). You must synchronize access.
Reading and writing variables from multiple goroutines can be protected using sync.Mutex or sync.RWMutex, but in your case there is something else involved: you should wait for the 2 launched goroutines to complete. For that, the go-to solution is sync.WaitGroup.
And since the 2 goroutines write 2 different fields of a struct (which act as 2 distinct variables), they don't have to be synchronized to each other (see more on this here: Can I concurrently write different slice elements). Which means using a sync.WaitGroup is sufficient.
This is how you can make it safe and correct:
func GoRoutine() {
values := AppLoadNew{}
wg := &sync.WaitGroup{}
wg.Add(1)
go func() {
defer wg.Done()
data, err := GetStripeTestKey(bson.M{"is_default": true})
if err == nil {
values.StripeTestKey = data.StripeTestKey
}
}()
wg.Add(1)
go func() {
defer wg.Done()
location, err := GetFormLocation(bson.M{"is_default": true})
if err == nil {
values.Locations = location
}
}()
wg.Wait()
fmt.Println(values)
}
See a (slightly modified) working example on the Go Playground.
See related / similar questions:
Reading values from a different thread
golang struct concurrent read and write without Lock is also running ok?
How to make a variable thread-safe
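For contrast, here is a sketch of the case where a sync.WaitGroup alone would not be enough: several goroutines appending to the same slice field, which does need a sync.Mutex. The collected type and the gather function are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collected guards a shared slice with a mutex (hypothetical type).
type collected struct {
	mu    sync.Mutex
	items []string
}

// add appends under the lock, because append reads and writes the same
// slice header from every goroutine.
func (c *collected) add(s string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items = append(c.items, s)
}

// gather starts one goroutine per input and waits for all of them.
func gather(inputs []string) []string {
	var c collected
	var wg sync.WaitGroup
	for _, in := range inputs {
		wg.Add(1)
		go func(in string) {
			defer wg.Done()
			c.add("loc:" + in)
		}(in)
	}
	wg.Wait() // after this, reading c.items without the lock is safe
	sort.Strings(c.items)
	return c.items
}

func main() {
	fmt.Println(gather([]string{"b", "a", "c"})) // prints [loc:a loc:b loc:c]
}
```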
You can use the WaitGroup type from the sync package; here is an example:
package main
import (
"fmt"
"sync"
"time"
)
type Foo struct {
One string
Two string
}
func main() {
f := Foo{}
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
// Perform long calculations
<-time.After(time.Second * 1)
f.One = "foo"
}()
wg.Add(1)
go func() {
defer wg.Done()
// Perform long calculations
<-time.After(time.Second * 2)
f.Two = "bar"
}()
fmt.Printf("Before %+v\n", f)
wg.Wait()
fmt.Printf("After %+v\n", f)
}
The output:
Before {One: Two:}
After {One:foo Two:bar}

Golang: Processing 5 huge files concurrently

I have 5 huge (4 million rows each) logfiles that I process in Perl currently and I thought I may try to implement the same in Go and its concurrent features. So, being very inexperienced in Go, I was thinking of doing as below. Any comments on the approach will be greatly appreciated.
Some rough pseudocode:
var wg1 sync.WaitGroup
var wg2 sync.WaitGroup
func processRow (r Row) {
wg2.Add(1)
defer wg2.Done()
res = <process r>
return res
}
func processFile(f File) {
wg1.Add(1)
open(newfile File)
defer wg1.Done()
line = <row from f>
result = go processRow(line)
newFile.Println(result) // Write new processed line to newFile
wg2.Wait()
newFile.Close()
}
func main() {
for each f logfile {
go processFile(f)
}
wg1.Wait()
}
So, idea is that I process these 5 files concurrently and then all rows of each file will in turn also be processed concurrently.
Will that work?
You should definitely use channels to manage your processed rows. Alternatively you could also write another goroutine to handle your output.
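import (
	"bufio"
	"log"
	"os"
	"sync"
)

// processRow transforms one row and sends the result on ch.
// process is the row transformation, assumed to be defined elsewhere.
func processRow(r string, ch chan<- string) {
	ch <- process(r)
}

// writeRow writes every string received on ch to f as one line and
// flushes the buffer once ch is closed.
func writeRow(f *os.File, ch <-chan string) {
	w := bufio.NewWriter(f)
	defer w.Flush()
	for s := range ch {
		if _, err := w.WriteString(s + "\n"); err != nil {
			log.Println(err) // keep draining so senders never block
		}
	}
}

func processFile(f *os.File) {
	// Placeholder path: use a distinct output path per input file.
	outFile, err := os.Create("/path/to/file.out")
	if err != nil {
		log.Println(err)
		return
	}
	defer outFile.Close()
	ch := make(chan string, 10) // play with this number for performance
	done := make(chan struct{})
	go func() { // a single writer, so output lines never interleave
		defer close(done)
		writeRow(outFile, ch)
	}()
	var wg sync.WaitGroup
	fScanner := bufio.NewScanner(f)
	for fScanner.Scan() {
		line := fScanner.Text() // copy before the next Scan replaces it
		wg.Add(1)
		go func() {
			defer wg.Done()
			processRow(line, ch)
		}()
	}
	wg.Wait() // every row has been processed and sent
	close(ch) // lets the range loop in writeRow end
	<-done    // wait until the writer has flushed
}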
Here we have processRow doing all the processing (I assumed to string), writeRow doing all the output I/O, and processFile tying each file together. Then all main has to do is hand off the files, spawn the goroutines, et voilà.
func main() {
	var wg sync.WaitGroup
	filenames := []string{"here", "are", "some", "log", "paths"}
	for _, fname := range filenames {
		inFile, err := os.Open(fname)
		if err != nil {
			log.Println(err) // skip files that cannot be opened
			continue
		}
		wg.Add(1)
		go func() {
			defer wg.Done() // matches the Add above
			defer inFile.Close()
			processFile(inFile)
		}()
	}
	wg.Wait()
}
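With 4 million rows per file, one goroutine per row may be more than is useful. A fixed pool of workers per file is a common alternative, sketched below under assumptions: processLines and upperAll are hypothetical names, strings.ToUpper stands in for the real row processing, and the output order is not guaranteed to match the input order.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"sync"
)

// processLines reads lines from r, transforms them with a fixed number of
// workers, and writes the results to w (not necessarily in input order).
func processLines(r io.Reader, w io.Writer, workers int, process func(string) string) error {
	lines := make(chan string, 64)
	results := make(chan string, 64)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range lines {
				results <- process(line)
			}
		}()
	}
	go func() { // closer: runs once all workers are done
		wg.Wait()
		close(results)
	}()

	scanErr := make(chan error, 1)
	go func() { // reader: feeds the workers, then closes lines
		defer close(lines)
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			lines <- sc.Text()
		}
		scanErr <- sc.Err()
	}()

	// Single writer: the calling goroutine drains results.
	bw := bufio.NewWriter(w)
	var writeErr error
	for res := range results {
		if writeErr == nil {
			_, writeErr = bw.WriteString(res + "\n")
		}
	}
	if err := <-scanErr; err != nil {
		return err
	}
	if writeErr != nil {
		return writeErr
	}
	return bw.Flush()
}

// upperAll is a small demo wrapper around processLines.
func upperAll(input string, workers int) (string, error) {
	var sb strings.Builder
	err := processLines(strings.NewReader(input), &sb, workers, strings.ToUpper)
	return sb.String(), err
}

func main() {
	out, err := upperAll("a\nb\nc\n", 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out) // line order may vary between runs
}
```

If the output has to keep the input order, each line would need to carry its index, or the lines would have to be processed in ordered batches.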
