I have a slice that contains work to be done, and a slice that will contain the results when everything is done. The following is a sketch of my general process:
var results = make([]Result, len(jobs))
wg := sync.WaitGroup{}
for i, job := range jobs {
    wg.Add(1)
    go func(i int, j Job) {
        defer wg.Done()
        var r Result = doWork(j)
        results[i] = r
    }(i, job)
}
wg.Wait()
// Use results
It seems to work, but I have not tested it thoroughly and am not sure if it is safe to do. Generally I would not feel good letting multiple goroutines write to anything, but in this case, each goroutine is limited to its own index in the slice, which is pre-allocated.
I suppose the alternative is collecting results via a channel, but since order of results matters, this seemed rather simple. Is it safe to write into slice elements this way?
The rule is simple: if multiple goroutines access a variable concurrently, and at least one of the accesses is a write, then synchronization is required.
Your example does not violate this rule. You don't write the slice value (the slice header), you only read it (implicitly, when you index it).
You don't read the slice elements, you only write them, and each goroutine writes only to a single, distinct, designated element. Since each slice element has its own address (its own memory space), the elements act like distinct variables. This is covered in Spec: Variables:
Structured variables of array, slice, and struct types have elements and fields that may be addressed individually. Each such element acts like a variable.
What must be kept in mind is that you can't read the results from the results slice without synchronization. And the waitgroup you used in your example is a sufficient synchronization. You are allowed to read the slice once wg.Wait() returns, because that can only happen after all worker goroutines called wg.Done(), and none of the worker goroutines modify the elements after they called wg.Done().
For example, this is a valid (safe) way to check / process the results:
wg.Wait()
// Safe to read results after the above synchronization point:
fmt.Println(results)
But if you would try to access the elements of results before wg.Wait(), that's a data race:
// This is a data race! Goroutines might still be running and modifying elements of results!
fmt.Println(results)
wg.Wait()
Yes, it's perfectly legal: a slice has an array as its underlying data storage, and, being a compound type, an array is a sequence of "elements" which behave as individual variables with distinct memory locations; modifying them concurrently is fine.
Just be sure to synchronize the shutdown of your worker goroutines with
the main one before it reads the updated contents of the slice.
Using sync.WaitGroup for this—as you do—is perfectly fine.
Also, as #icza said, you must not modify the slice value itself (which is a struct containing a pointer to the backing storage array, the capacity and the length).
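For contrast, here is a sketch of what would not be safe, reusing the question's hypothetical Job, Result, and doWork names: appending to a shared slice from several goroutines, because append reads and writes the slice header itself.

// UNSAFE sketch: every goroutine reads and writes the same slice header via append.
var results []Result
var wg sync.WaitGroup
for _, job := range jobs {
    wg.Add(1)
    go func(j Job) {
        defer wg.Done()
        results = append(results, doWork(j)) // data race on the slice header
    }(job)
}
wg.Wait()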
YES, YOU CAN.
tl;dr
The golang.org/x/sync/errgroup package uses this same pattern in its Example (Parallel):
Google := func(ctx context.Context, query string) ([]Result, error) {
    g, ctx := errgroup.WithContext(ctx)

    searches := []Search{Web, Image, Video}
    results := make([]Result, len(searches))
    for i, search := range searches {
        i, search := i, search
        g.Go(func() error {
            result, err := search(ctx, query)
            if err == nil {
                results[i] = result
            }
            return err
        })
    }
    if err := g.Wait(); err != nil {
        return nil, err
    }
    return results, nil
}
// ...
Related
I was exploring the possibility of concurrently accessing a map with fixed keys without a lock for performance improvement.
I had explored something similar with a slice before, and it seems to work:
func TestConcurrentSlice(t *testing.T) {
    fixed := []int{1, 2, 3}
    wg := &sync.WaitGroup{}
    for i := 0; i < len(fixed); i++ {
        idx := i
        wg.Add(1)
        go func() {
            defer wg.Done()
            fixed[idx]++
        }()
    }
    wg.Wait()
    fmt.Printf("%v\n", fixed)
}
The above code will pass the -race test.
That gave me the confidence to try the same thing with a map of fixed size (a fixed set of keys), because I assumed that if the number of keys doesn't change, the underlying array inside the map never needs to grow, so it should be safe to access different keys (different memory locations) from different goroutines. So I wrote this test:
type simpleStruct struct {
    val int
}

func TestConcurrentAccessMap(t *testing.T) {
    fixed := map[string]*simpleStruct{
        "a": {0},
        "b": {0},
    }
    wg := &sync.WaitGroup{}
    // here I use array instead of iterating the map to avoid read access
    keys := []string{"a", "b"}
    for _, k := range keys {
        kcopy := k
        wg.Add(1)
        go func() {
            defer wg.Done()
            // this failed the race test
            fixed[kcopy] = &simpleStruct{}
            // this actually can pass the race test!
            //fixed[kcopy].val++
        }()
    }
    wg.Wait()
}
However, the test failed the race test with a concurrent map write error reported from the runtime.mapassign_faststr() function.
One more interesting thing I found: the line I commented out, fixed[kcopy].val++, actually passes the race test (I assume because those writes go to different memory locations). But since the goroutines are accessing different keys of the map, why does the assignment fail the race test?
Accessing different slice elements without synchronization from multiple goroutines is OK, because each slice element acts as an individual variable. For details, see Can I concurrently write different slice elements.
However, this is not the case with maps. A value for a specific key does not act as a variable, and it is not addressable (because the actual memory space the value is stored at may be changed internally, at the sole discretion of the implementation).
So with maps, the general rule applies: if the map is accessed from multiple goroutines where at least one of them is a write (assign a value to a key), explicit synchronization is needed.
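A minimal sketch of the conventional fix for the test above: guard every map write with a sync.Mutex (sync.Map, or mutating pre-allocated per-key values as in the commented-out line, are alternatives).

// Sketch: protect concurrent map writes with a mutex.
var mu sync.Mutex
fixed := map[string]*simpleStruct{"a": {0}, "b": {0}}
wg := &sync.WaitGroup{}
for _, k := range []string{"a", "b"} {
    kcopy := k
    wg.Add(1)
    go func() {
        defer wg.Done()
        mu.Lock()
        fixed[kcopy] = &simpleStruct{} // safe: only one goroutine writes at a time
        mu.Unlock()
    }()
}
wg.Wait()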
I need someone to help, or at least give me a tip. I'm trying to read large files (100 MB to 11 GB) line by line and then store some data in a map.
var m map[string]string

// expensive func
func stress(s string, mutex sync.Mutex) {
    // some very costly operation .... that's why I want to use goroutines
    mutex.Lock()
    m[s] = s // store result
    mutex.Unlock()
}
func main() {
    file, err := os.Open("somefile.txt")
    if err != nil {
        fmt.Println(err)
        return
    }
    defer func() {
        if err = file.Close(); err != nil {
            fmt.Println(err)
            return
        }
    }()
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        go stress(scanner.Text(), mutex)
    }
}
Without goroutines it works fine but slowly. As you can see, the file is large, so the loop spawns a lot of goroutines, and that causes two problems:
Sometimes the mutex doesn't seem to work properly and the program crashes. (How many goroutines can a mutex handle?)
Some data is simply lost every time (but the program doesn't crash).
I suppose I should use a WaitGroup, but I can't figure out how. I also guess there should be some limit on the number of goroutines, maybe a counter; it would be great to run this with 5-20 goroutines.
UPD. Yes, as #user229044 mentioned, I have to pass the mutex by pointer. But the problem of limiting the number of goroutines in the loop is still open.
UPD2. This is how I worked around the problem. I don't fully understand how the program handles these goroutines, or what they cost in memory and processing time. Almost all commenters point at the map, but the main problem was managing the goroutines themselves: how many goroutines get spawned if the Scan() loop runs 10 billion iterations, and how are those goroutines stored in RAM?
func stress(s string, mutex *sync.Mutex) {
    // a lot of costly ops
    // ...
    // ...
    mutex.Lock()
    m[where] = result // store result
    mutex.Unlock()
    wg.Done()
}

// main
for scanner.Scan() {
    wg.Add(1)
    go func(data string) {
        stress(data, &mutex)
    }(scanner.Text())
}
wg.Wait()
Your specific problem is that you're copying the mutex by value. You should be passing a pointer to the mutex, so that a single instance of your mutex is shared by all function invocations. You're also spawning an unbounded number of goroutines, which will eventually exhaust your system's memory.
Beyond a certain point, spawning more goroutines only wastes resources for no gain, and juggling all of those extra goroutines will probably cause a net loss of performance. Increased parallelism can't help you when every parallel process has to wait for serialized access to a data structure, as is the case with your map. sync.WaitGroup and mutexes are the wrong approach here.
Instead, to add and control concurrency, you want a buffered channel and a single goroutine responsible for map inserts. This way you have one process reading from the file and one process inserting into the map, decoupling the disk IO from the map insertion.
Something like this:
scanner := bufio.NewScanner(file)
ch := make(chan string, 10)
done := make(chan struct{})

go func() {
    for s := range ch {
        m[s] = s
    }
    close(done) // signal that every received line has been inserted
}()

for scanner.Scan() {
    ch <- scanner.Text()
}
close(ch)
<-done // wait until the map-writer goroutine has drained the channel
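If the costly per-line work is what you actually want to parallelize, you can put a bounded pool of workers between the scanner and that single map-writing goroutine. A rough sketch under the same assumptions as above (m and scanner already exist; process stands in for the hypothetical expensive work; the pool size of 10 is just an illustrative bound):

// Sketch: N workers do the expensive part in parallel; one goroutine owns the map.
lines := make(chan string, 10)
resultsCh := make(chan string, 10)

var workers sync.WaitGroup
for i := 0; i < 10; i++ { // bounded: at most 10 goroutines do the costly work
    workers.Add(1)
    go func() {
        defer workers.Done()
        for s := range lines {
            resultsCh <- process(s) // hypothetical costly function
        }
    }()
}

done := make(chan struct{})
go func() { // single writer: no mutex needed for the map
    for r := range resultsCh {
        m[r] = r
    }
    close(done)
}()

for scanner.Scan() {
    lines <- scanner.Text()
}
close(lines)
workers.Wait()
close(resultsCh)
<-done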
Say I have a large list of strings and I want to sort them. Beyond the usual sort.Sort and sort.Slice etc., I wanted to use more than one core to speed things up. So while reading the large list I add the strings to two different slices: strings that start with a-m and n-z (for argument's sake).
Meanwhile I've fired up multiple goroutines that read from a channel of string slices and then sort their own sublists. So far, so good: "potentially" parallel processing of the lists, so my sort time is effectively halved. Great. Now my question is: how do I get the results back to the main goroutine?
Originally each goroutine had two channels, one for the incoming unsorted list and the other for the sorted list. Yes, it works... but uses SOOO much memory (hey, given the volume of data I'm tinkering with for this test, that's probably not unreasonable). But then it dawned on me that passing a slice on a channel is really just passing a reference, so I don't actually NEED to pass anything back. Not having to put the resulting sorted lists on a channel for the return journey is obviously far less taxing memory-wise, but it (to me) smells.
This means one of the goroutines could be sorting away while the main goroutine (in theory) manipulates the same list. As long as discipline is used this wouldn't be an issue, but it is still obviously a concern. Is there a generally accepted best practice in Go that says references shouldn't be passed as input from one goroutine to another, but that it IS acceptable for a goroutine to return reference data via a channel (since the goroutine then stops using the reference)?
Before anyone says it, yes I know I don't have to pass these in via channels etc but this just the case I was tinkering with and got me thinking.
Long and hand wavy I know. Here's a minimal subset of code showing the above.
package main

import (
    "bufio"
    "fmt"
    "os"
    "sort"
    "strings"
    "sync"
)

var wg sync.WaitGroup

func sortWordsList(id int, ch chan []string) {
    l := <-ch
    sort.Strings(l)
    wg.Done()
}

func main() {
    file, err := os.Open("big.txt")
    if err != nil {
        fmt.Printf("BOOM %s\n", err.Error())
        panic(err)
    }
    defer file.Close()

    // Start reading from the file with a reader.
    reader := bufio.NewReader(file)

    inCh1 := make(chan []string, 1000)
    inCh2 := make(chan []string, 1000)

    go sortWordsList(1, inCh1)
    go sortWordsList(2, inCh2)
    wg.Add(2)

    words1 := []string{}
    words2 := []string{}

    for {
        line, err := reader.ReadString('\n')
        if err != nil {
            break
        }
        sp := strings.Split(line, " ")
        for _, w := range sp {
            word := strings.ToLower(w)
            word = strings.TrimSuffix(word, "\n")
            if len(word) > 0 {
                // figure out where to go.
                // arbitrary split.
                if word[0] < 'm' {
                    words1 = append(words1, word)
                } else {
                    words2 = append(words2, word)
                }
            }
        }
    }

    inCh1 <- words1
    inCh2 <- words2

    close(inCh1)
    close(inCh2)

    wg.Wait()

    // now have sorted words1 and words2 slices.
}
There is nothing wrong with passing pointers, slices, or maps. As long as you synchronize access to the shared variable, you can pass a pointer and keep on using it in the sending goroutine. For large objects like arrays or large structs, passing a pointer is usually the logical thing to do to avoid expensive copies. Also, avoiding passing pointers would mean avoiding passing slices and maps, or anything that contains slices, maps, or pointers to other structs.
As you already know, you don't really need channels here, simply start your goroutines after you constructed your slices, and pass the slices directly.
go sortWordsList(words1)
go sortWordsList(words2)
or:
go sort.Strings(words1)
go sort.Strings(words2)
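Either way, you still need to know when the sorting goroutines have finished before the main goroutine reads the slices. A minimal sketch, reusing the WaitGroup idea from the question:

// Sketch: sort both halves in parallel, then wait before reading them.
var wg sync.WaitGroup
wg.Add(2)
go func() { defer wg.Done(); sort.Strings(words1) }()
go func() { defer wg.Done(); sort.Strings(words2) }()
wg.Wait()
// words1 and words2 are now sorted and safe to read.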
I saw the errgroup example in the godoc, and it confuses me that it simply assigns each result into the shared results slice instead of using a channel in each search goroutine. Here's the code:
Google := func(ctx context.Context, query string) ([]Result, error) {
    g, ctx := errgroup.WithContext(ctx)

    searches := []Search{Web, Image, Video}
    results := make([]Result, len(searches))
    for i, search := range searches {
        i, search := i, search // https://golang.org/doc/faq#closures_and_goroutines
        g.Go(func() error {
            result, err := search(ctx, query)
            if err == nil {
                results[i] = result
            }
            return err
        })
    }
    if err := g.Wait(); err != nil {
        return nil, err
    }
    return results, nil
}
I'm not sure: is there any reason or implied rule that guarantees this is correct? Thanks!
The intent here is to make searches and results congruent. The result for the Web search is always at results[0], the result for the Image search always at results[1], etc. It also makes for a simpler example, because there is no need for an additional goroutine that consumes a channel.
If the goroutines sent their results into a channel, the result order would be unpredictable. If predictable result order is not a property you care about, feel free to use a channel.
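For comparison, here is a sketch of what the channel-based variant of that example body might look like when order does not matter (same hypothetical Search/Result types and the ctx, query, and searches values from the example):

// Sketch: collect results over a channel; arrival order is unpredictable.
g, ctx := errgroup.WithContext(ctx)
resultCh := make(chan Result, len(searches)) // buffered so senders never block
for _, search := range searches {
    search := search
    g.Go(func() error {
        r, err := search(ctx, query)
        if err == nil {
            resultCh <- r
        }
        return err
    })
}
if err := g.Wait(); err != nil {
    return nil, err
}
close(resultCh) // safe: every sender has returned
var results []Result
for r := range resultCh {
    results = append(results, r)
}
return results, nil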
There is secret sauce in this code that creates siloing:
results := make([]Result, len(searches))
           ^^^^           ^^^^^^^^^^^^^

for i, search := ... {
    i, search := i, search
    ^^^^^^^^^^

    g.Go {
        results[i] = result
        ^^^^^^^^^^
    }
}
We know how big the result set is going to be, so we pre-allocate all the slots before starting any goroutines. This eliminates any contention over the slice object itself:

make(.., len(searches))
^^^^     ^^^^^^^^^^^^^
We then promote the index number and the search value into per-iteration variables captured by the closure, so there is no contention over the variables being used by the loop and the goroutines:
i, search := i, search
And finally, each worker operates on a singular slot in the pre-sized slice:
results[i] = result
The workers are guaranteed to only perform read operations on the "results" slice to find out where their element is (results[i]).
This particular pattern is limiting: you can't use the results until all the workers have completed. So ask yourself what you're going to do next when deciding whether to use this or a channel-based pipeline workflow.
results := getSearchResults(searches)
statistics := analyzeResults(results)
for stats := range statistics {
    out.Write("{%s}\n", stats.String())
}
If the analysis of a given result is independent of any other, this is a good candidate for a channel-based workflow.
But if the analysis depends on order, or has different results depending on each other then you may not have any choice but to serialize the flow.
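For the independent-analysis case, here is a minimal pipeline sketch; Stats, analyze, and this channel-returning analyzeResults are hypothetical stand-ins, not part of the original example:

// Sketch: stream analysis results as they finish instead of waiting for all of them.
func analyzeResults(in <-chan Result) <-chan Stats {
    out := make(chan Stats)
    go func() {
        defer close(out)
        for r := range in {
            out <- analyze(r) // hypothetical per-result analysis
        }
    }()
    return out
}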
I have been using Go for a little while and am still getting better every day, but I'm not an expert per se. Currently I am tackling concurrency and goroutines, as I think that is the final unknown in my Go tool belt. I think I am getting the hang of it, but I'm still definitely a beginner.
The task I am having an issue with seems pretty basic to me, but nothing I have tried works. I would like to figure out a way to calculate the length of a channel.
From what I have gathered, len() only works on buffered channels so that won't help me in this case. What I am doing is reading values from the DB in batches. I have a generator func that goes like
func gen() chan Result {
    out := make(chan Result)
    go func() {
        // ... query db
        for rows.Next() {
            out <- row
        }
        close(out)
    }()
    return out
}
then I am using it as such
c := gen()
...
// do other stuff
I would either like to return the count with the out channel, or wrap all of it in a struct type and just return that.
like so:
c, len := gen()
or:
a := gen()
fmt.Println(a.c)
fmt.Println(a.len)
I believe I have tried everything except atomic, which I think would actually work, but from what I've read it apparently isn't the right tool for this. What other options do I have that neither leave me with a 0 nor block forever?
Thanks!
The len built-in will return the "length" of a channel:
func len(v Type) int
The len built-in function returns the length of v, according to its type:
Array: the number of elements in v.
Pointer to array: the number of elements in *v (even if v is nil).
Slice, or map: the number of elements in v; if v is nil, len(v) is zero.
String: the number of bytes in v.
Channel: the number of elements queued (unread) in the channel buffer; if v is nil, len(v) is zero.
But I don't think that will help you.
What you really need is a new approach to your problem: counting the items in queue in a channel is not an appropriate way to handle "batches" of tasks.
What do you need this length for?
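Note that even on a buffered channel, len only reports an instantaneous snapshot of what happens to be queued, so it is rarely useful for coordination. A tiny illustration:

// Sketch: len(ch) is just a snapshot of what is currently buffered.
ch := make(chan int, 3)
ch <- 1
ch <- 2
fmt.Println(len(ch), cap(ch)) // 2 3
<-ch
fmt.Println(len(ch)) // 1, and already stale if other goroutines send or receive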
You are using an unbuffered channel. Thank you for that 👍👏👌🙌
An unbuffered channel has no buffer and never holds anything, which is why len always reports zero for it.
The only purpose of unbuffered channels is to achieve synchronization between goroutines by handing an element from one to another. That's it!
go func() {
    c := make(chan struct{})
    c <- struct{}{} // Definitely locked
}()
another deadlock
go func() {
    c := make(chan struct{})
    <-c             // Definitely locked
    c <- struct{}{} // Never get there
}()
use another goroutine to read the channel
go func() {
    c := make(chan struct{})
    go func() { <-c }()
    c <- struct{}{}
}()
In your case you have a generator, which means you have to read the channel until the producer goroutine closes it. This is good design: it ensures that your goroutines are not left dangling.
// Read the channel until the producer goroutine finishes and closes it.
for r := range gen() {
    // ...
}
// the goroutine inside gen() has finished
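If you really do want a count of what the generator produced, the simplest option is to count while draining the channel. A minimal sketch, where process stands in for whatever you do with each result:

// Sketch: count results while ranging over the generator's channel.
count := 0
for r := range gen() {
    process(r) // hypothetical per-result work
    count++
}
fmt.Println("received", count, "results")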
I am assuming, from your follow-up comments, that you actually want to know "good" values for the worker pool size and the channel buffer to keep everything working "optimally".
This is extremely hard and depends on what the workers are doing, but as a first guess I'd look at a minimally buffered channel and a pool of workers sized to runtime.GOMAXPROCS(0). If you have a lot of resources then you could go as far as "infinite" workers.
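A rough sketch of that starting point; Job, work, and pending are hypothetical stand-ins, and the pool size and buffer are only first-guess knobs to tune:

// Sketch: worker pool sized to the number of usable CPUs, with a minimal buffer.
jobs := make(chan Job, 1) // small buffer as a starting point
var wg sync.WaitGroup
for i := 0; i < runtime.GOMAXPROCS(0); i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for j := range jobs {
            work(j) // hypothetical per-job function
        }
    }()
}
for _, j := range pending { // pending is assumed to hold the queued jobs
    jobs <- j
}
close(jobs)
wg.Wait()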