Scanf in multiple goroutines giving unexpected results - go

I was simply experimenting in golang. I came across an interesting result. This is my code.
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
var str1, str2 string
wg.Add(2)
go func() {
fmt.Scanf("%s", &str1)
wg.Done()
}()
go func() {
fmt.Scanf("%s", &str2)
wg.Done()
}()
wg.Wait()
fmt.Printf("%s %s\n", str1, str2)
}
I gave the following input.
beat
it
I was expecting the result to be either
it beat
or
beat it
But I got.
eat bit
Can any one please help me figure out why it is so?

fmt.Scanf isn't an atomic operation. Here's the implementation : http://golang.org/src/pkg/fmt/scan.go#L1115
There's no semaphor, nothing preventing two parallel executions. So what happens is simply that the executions are really parallel, and as there's no buffering, any byte reading is an IO operation and thus a perfect time for the go scheduler to change goroutine.

The problem is that you are sharing a single resource (the stdin byte stream) across multiple goroutines.
Each goroutine could be spawn at different non-deterministic times. i.e:
first goroutine 1 read all stdin, then start goroutine 2
first goroutine 2 read all stdin, then start goroutine 1
first goroutine 1 block on read, then start goroutine 2 read one char and then restart goroutine 1
... and so on and on ...
In most cases is enough to use only one goroutine to access a linear resource as a byte stream and attach a channel to it and then spawn multiple consumers that listen to that channel.
For example:
package main
import (
"fmt"
"io"
"sync"
)
func main() {
var wg sync.WaitGroup
words := make(chan string, 10)
wg.Add(1)
go func() {
for {
var buff string
_, err := fmt.Scanf("%s", &buff)
if err != nil {
if err != io.EOF {
fmt.Println("Error: ", err)
}
break
}
words <- buff
}
close(words)
wg.Done()
}()
// Multiple consumers
for i := 0; i < 5; i += 1 {
go func() {
for word := range words {
fmt.Printf("%s\n", word)
}
}()
}
wg.Wait()
}

Related

How to return multiple values from goroutine?

I found some example with "Catch values from Goroutines"-> link
There is show how to fetch value but if I want to return value from several goroutines, it wont work.So, does anybody know, how to do it?
package main
import (
"fmt"
"io/ioutil"
"log"
"net/http"
"sync"
)
// WaitGroup is used to wait for the program to finish goroutines.
var wg sync.WaitGroup
func responseSize(url string, nums chan int) {
// Schedule the call to WaitGroup's Done to tell goroutine is completed.
defer wg.Done()
response, err := http.Get(url)
if err != nil {
log.Fatal(err)
}
defer response.Body.Close()
body, err := ioutil.ReadAll(response.Body)
if err != nil {
log.Fatal(err)
}
// Send value to the unbuffered channel
nums <- len(body)
}
func main() {
nums := make(chan int) // Declare a unbuffered channel
wg.Add(1)
go responseSize("https://www.golangprograms.com", nums)
go responseSize("https://gobyexample.com/worker-pools", nums)
go responseSize("https://stackoverflow.com/questions/ask", nums)
fmt.Println(<-nums) // Read the value from unbuffered channel
wg.Wait()
close(nums) // Closes the channel
// >> loading forever
Also, this example, worker pools
Is it possible to get value from result: fmt.Println(<-results) <- will be error.
Yes, just read from the channel multiple times:
answerOne := <-nums
answerTwo := <-nums
answerThree := <-nums
Channels function like thread-safe queues, allowing you to queue up values and read them out one by one
P.S. You should either add 3 to the wait group or not have one at all. The <-nums will block until a value is available on nums so it is not necessary

Why is this golang script giving me a deadlock ? + a few questions

I got this code from someone on github and I am trying to play around with it to understand concurrency.
package main
import (
"bufio"
"fmt"
"os"
"sync"
"time"
)
var wg sync.WaitGroup
func sad(url string) string {
fmt.Printf("gonna sleep a bit\n")
time.Sleep(2 * time.Second)
return url + " added stuff"
}
func main() {
sc := bufio.NewScanner(os.Stdin)
urls := make(chan string)
results := make(chan string)
for i := 0; i < 20; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for url := range urls {
n := sad(url)
results <- n
}
}()
}
for sc.Scan() {
url := sc.Text()
urls <- url
}
for result := range results {
fmt.Printf("%s arrived\n", result)
}
wg.Wait()
close(urls)
close(results)
}
I have a few questions:
Why does this code give me a deadlock?
How does that for loop exist before the operation of taking in input from user does the go routines wait until anything is passes in the urls channel then start doing work? I don't get this because it's not sequential, like why is taking in input from user then putting every input in the urls channel then running the go routines is considered wrong?
Inside the for loop I have another loop which is iterating over the urls channel, does each go routine deal with exactly one line of input? or does one go routine handle multiple lines at once? how does any of this work?
Am i gathering the output correctly here?
Mostly you're doing things correctly, but have things a little out of order. The for sc.Scan() loop will continue until Scanner is done, and the for result := range results loop will never run, thus no go routine ('main' in this case) will be able to receive from results. When running your example, I started the for result := range results loop before for sc.Scan() and also in its own go routine--otherwise for sc.Scan() will never be reached.
go func() {
for result := range results {
fmt.Printf("%s arrived\n", result)
}
}()
for sc.Scan() {
url := sc.Text()
urls <- url
}
Also, because you run wg.Wait() before close(urls), the main goroutine is left blocked waiting for the 20 sad() go routines to finish. But they can't finish until close(urls) is called. So just close that channel before waiting for the waitgroup.
close(urls)
wg.Wait()
close(results)
The for-loop creates 20 goroutines, all waiting input from the urls channel. When someone writes into this channel, one of the goroutines will pick it up and work on in. This is a typical worker-pool implementation.
Then, then scanner reads input line by line, and sends it to the urls channel, where one of the goroutines will pick it up and write the response to the results channel. At this point, there are no other goroutines reading from the results channel, so this will block.
As the scanner reads URLs, all other goroutines will pick them up and block. So if the scanner reads more than 20 URLs, it will deadlock because all goroutines will be waiting for a reader.
If there are fewer than 20 URLs, the scanner for-loop will end, and the results will be read. However that will eventually deadlock as well, because the for-loop will terminate when the channel is closed, and there is no one there to close the channel.
To fix this, first, close the urls channel right after you finish reading. That will release all the for-loops in the goroutines. Then you should put the for-loop reading from the results channel into a goroutine, so you can call wg.Wait while results are being processed. After wg.Wait, you can close the results channel.
This does not guarantee that all items in the results channel will be read. The program may terminate before all messages are processed, so use a third channel which you close at the end of the goroutine that reads from the results channel. That is:
done:=make(chan struct{})
go func() {
defer close(done)
for result := range results {
fmt.Printf("%s arrived\n", result)
}
}()
wg.Wait()
close(results)
<-done
I am not super happy with previous answers, so here is a solution based on the documented behavior in the go tour, the go doc, the specifications.
package main
import (
"bufio"
"fmt"
"strings"
"sync"
"time"
)
var wg sync.WaitGroup
func sad(url string) string {
fmt.Printf("gonna sleep a bit\n")
time.Sleep(2 * time.Millisecond)
return url + " added stuff"
}
func main() {
// sc := bufio.NewScanner(os.Stdin)
sc := bufio.NewScanner(strings.NewReader(strings.Repeat("blah blah\n", 15)))
urls := make(chan string)
results := make(chan string)
for i := 0; i < 20; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for url := range urls {
n := sad(url)
results <- n
}
}()
}
// results is consumed by so many goroutines
// we must wait for them to finish before closing results
// but we dont want to block here, so put that into a routine.
go func() {
wg.Wait()
close(results)
}()
go func() {
for sc.Scan() {
url := sc.Text()
urls <- url
}
close(urls) // done consuming a channel, close it, right away.
}()
for result := range results {
fmt.Printf("%s arrived\n", result)
} // the program will finish when it gets out of this loop.
// It will get out of this loop because you have made sure the results channel is closed.
}

Understanding golang channels: deadlock

The following code:
package main
import (
"fmt"
"strings"
)
var data = []string{
"The yellow fish swims slowly in the water",
"The brown dog barks loudly after a drink ...",
"The dark bird bird of prey lands on a small ...",
}
func main() {
histogram := make(map[string]int)
words := make(chan string)
for _, line := range data {
go func(l string) {
for _, w := range strings.Split(line, " ") {
words <- w
}
}(line)
}
defer close(words)
for w := range words {
histogram[w]++
}
fmt.Println(histogram)
}
ends with deadlock:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
/tmp/sandbox780076580/main.go:28 +0x1e0
My understanding is that channel words will block writers and readers to achieve some synchronization. I'm trying to use a single channel for all goroutines (writers) and a single reader in main (using "range" command).
I have tried also with buffered channels - similar failures.
I have problems to understand why this is not working. Any tips towards understanding?
Thank you.
As stated in the comments to the question, the defer is not executed until main returns. As a result, the range over words blocks forever.
To fix the issue, the application must close words when all of the goroutines are done sending. One way to do this is to use a wait group. The wait group is incremented for each goroutine, decremented when the goroutines exit. Yet another goroutine waits on the group and closes the channel.
func main() {
histogram := make(map[string]int)
words := make(chan string)
var wg sync.WaitGroup
for _, line := range data {
wg.Add(1)
go func(l string) {
for _, w := range strings.Split(l, " ") {
words <- w
}
wg.Done()
}(line)
}
go func() {
wg.Wait()
close(words)
}()
for w := range words {
histogram[w]++
}
fmt.Println(histogram)
}
Bonus fix: The goroutine in the question referred to the loop variable iine instead of the argument l. The FAQ explains why this is a problem.

Selecting between time interval and length of channel

I'm here to find out the most idiomatic way to do the follow task.
Task:
Write data from a channel to a file.
Problem:
I have a channel ch := make(chan int, 100)
I need to read from the channel and write the values I read from the channel to a file. My question is basically how do I do so given that
If channel ch is full, write the values immediately
If channel ch is not full, write every 5s.
So essentially, data needs to be written to the file at least every 5s (assuming that data will be filled into the channel at least every 5s)
Whats the best way to use select, for and range to do my above task?
Thanks!
There is no such "event" as "buffer of channel is full", so you can't detect that [*]. This means you can't idiomatically solve your problem with language primitives using only 1 channel.
[*] Not entirely true: you could detect if the buffer of a channel is full by using select with default case when sending on the channel, but that requires logic from the senders, and repetitive attempts to send.
I would use another channel from which I would receive as values are sent on it, and "redirect", store the values in another channel which has a buffer of 100 as you mentioned. At each redirection you may check if the internal channel's buffer is full, and if so, do an immediate write. If not, continue to monitor the "incoming" channel and a timer channel with a select statement, and if the timer fires, do a "regular" write.
You may use len(chInternal) to check how many elements are in the chInternal channel, and cap(chInternal) to check its capacity. Note that this is "safe" as we are the only goroutine handling the chInternal channel. If there would be multiple goroutines, value returned by len(chInternal) could be outdated by the time we use it to something (e.g. comparing it).
In this solution chInternal (as its name says) is for internal use only. Others should only send values on ch. Note that ch may or may not be a buffered channel, solution works in both cases. However, you may improve efficiency if you also give some buffer to ch (so chances that senders get blocked will be lower).
var (
chInternal = make(chan int, 100)
ch = make(chan int) // You may (should) make this a buffered channel too
)
func main() {
delay := time.Second * 5
timer := time.NewTimer(delay)
for {
select {
case v := <-ch:
chInternal <- v
if len(chInternal) == cap(chInternal) {
doWrite() // Buffer is full, we need to write immediately
timer.Reset(delay)
}
case <-timer.C:
doWrite() // "Regular" write: 5 seconds have passed since last write
timer.Reset(delay)
}
}
}
If an immediate write happens (due to a "buffer full" situation), this solution will time the next "regular" write 5 seconds after this. If you don't want this and you want the 5-second regular writes be independent from the immediate writes, simply do not reset the timer following the immediate write.
An implementation of doWrite() may be as follows:
var f *os.File // Make sure to open file for writing
func doWrite() {
for {
select {
case v := <-chInternal:
fmt.Fprintf(f, "%d ", v) // Write v to the file
default: // Stop when no more values in chInternal
return
}
}
}
We can't use for ... range as that only returns when the channel is closed, but our chInternal channel is not closed. So we use a select with a default case so when no more values are in the buffer of chInternal, we return.
Improvements
Using a slice instead of 2nd channel
Since the chInternal channel is only used by us, and only on a single goroutine, we may also choose to use a single []int slice instead of a channel (reading/writing a slice is much faster than a channel).
Showing only the different / changed parts, it could look something like this:
var (
buf = make([]int, 0, 100)
)
func main() {
// ...
for {
select {
case v := <-ch:
buf = append(buf, v)
if len(buf) == cap(buf) {
// ...
}
}
func doWrite() {
for _, v := range buf {
fmt.Fprintf(f, "%d ", v) // Write v to the file
}
buf = buf[:0] // "Clear" the buffer
}
With multiple goroutines
If we stick to leave chInternal a channel, the doWrite() function may be called on another goroutine to not block the other one, e.g. go doWrite(). Since data to write is read from a channel (chInternal), this requires no further synchronization.
if you just use 5 seconds write, to increase the file write performance,
you may fill the channel any time you need,
then writer goroutine writes that data to the buffered file,
see this very simple and idiomatic sample without using timer
with just using for...range:
package main
import (
"bufio"
"fmt"
"os"
"sync"
)
var wg sync.WaitGroup
func WriteToFile(filename string, ch chan int) {
f, e := os.Create(filename)
if e != nil {
panic(e)
}
w := bufio.NewWriterSize(f, 4*1024*1024)
defer wg.Done()
defer f.Close()
defer w.Flush()
for v := range ch {
fmt.Fprintf(w, "%d ", v)
}
}
func main() {
ch := make(chan int, 100)
wg.Add(1)
go WriteToFile("file.txt", ch)
for i := 0; i < 500000; i++ {
ch <- i // do the job
}
close(ch) // Finish the job and close output file
wg.Wait()
}
and notice the defers order.
and in case of 5 seconds write, you may add one interval timer just to flush the buffer of this file to the disk, like this:
package main
import (
"bufio"
"fmt"
"os"
"sync"
"time"
)
var wg sync.WaitGroup
func WriteToFile(filename string, ch chan int) {
f, e := os.Create(filename)
if e != nil {
panic(e)
}
w := bufio.NewWriterSize(f, 4*1024*1024)
ticker := time.NewTicker(5 * time.Second)
quit := make(chan struct{})
go func() {
for {
select {
case <-ticker.C:
if w.Buffered() > 0 {
fmt.Println(w.Buffered())
w.Flush()
}
case <-quit:
ticker.Stop()
return
}
}
}()
defer wg.Done()
defer f.Close()
defer w.Flush()
defer close(quit)
for v := range ch {
fmt.Fprintf(w, "%d ", v)
}
}
func main() {
ch := make(chan int, 100)
wg.Add(1)
go WriteToFile("file.txt", ch)
for i := 0; i < 25; i++ {
ch <- i // do the job
time.Sleep(500 * time.Millisecond)
}
close(ch) // Finish the job and close output file
wg.Wait()
}
here I used time.NewTicker(5 * time.Second) for interval timer with quit channel, you may use time.AfterFunc() or time.Tick() or time.Sleep().
with some optimizations ( removing quit channel):
package main
import (
"bufio"
"fmt"
"os"
"sync"
"time"
)
var wg sync.WaitGroup
func WriteToFile(filename string, ch chan int) {
f, e := os.Create(filename)
if e != nil {
panic(e)
}
w := bufio.NewWriterSize(f, 4*1024*1024)
ticker := time.NewTicker(5 * time.Second)
defer wg.Done()
defer f.Close()
defer w.Flush()
for {
select {
case v, ok := <-ch:
if ok {
fmt.Fprintf(w, "%d ", v)
} else {
fmt.Println("done.")
ticker.Stop()
return
}
case <-ticker.C:
if w.Buffered() > 0 {
fmt.Println(w.Buffered())
w.Flush()
}
}
}
}
func main() {
ch := make(chan int, 100)
wg.Add(1)
go WriteToFile("file.txt", ch)
for i := 0; i < 25; i++ {
ch <- i // do the job
time.Sleep(500 * time.Millisecond)
}
close(ch) // Finish the job and close output file
wg.Wait()
}
I hope this helps.

Writing concurrently with channels

I wrote a short script to write a file concurrently.
One goroutine is supposed to write strings to a file while the others are supposed to send the messages through a channel to it.
However, for some really strange reason the file is created but no message is added to it through the channel.
package main
import (
"fmt"
"os"
"sync"
)
var wg sync.WaitGroup
var output = make(chan string)
func concurrent(n uint64) {
output <- fmt.Sprint(n)
defer wg.Done()
}
func printOutput() {
f, err := os.OpenFile("output.txt", os.O_CREATE|os.O_RDWR|os.O_APPEND, 0666);
if err != nil {
panic(err)
}
defer f.Close()
for msg := range output {
f.WriteString(msg+"\n")
}
}
func main() {
wg.Add(2)
go concurrent(1)
go concurrent(2)
wg.Wait()
close(output)
printOutput()
}
The printOutput() goroutine is executed completely, if I tried to write something after the for loop it would actually get into the file. So this leads me to think that range output might be null
You need to have something taking from the output channel as it is blocking until something removes what you put on it.
Not the only/best way to do it but: I moved printOutput() to above the other funcs and run it as a go routine and it prevents the deadlock.
package main
import (
"fmt"
"os"
"sync"
)
var wg sync.WaitGroup
var output = make(chan string)
func concurrent(n uint64) {
output <- fmt.Sprint(n)
defer wg.Done()
}
func printOutput() {
f, err := os.OpenFile("output.txt", os.O_CREATE|os.O_RDWR|os.O_APPEND, 0666)
if err != nil {
panic(err)
}
defer f.Close()
for msg := range output {
f.WriteString(msg + "\n")
}
}
func main() {
go printOutput()
wg.Add(2)
go concurrent(1)
go concurrent(2)
wg.Wait()
close(output)
}
One of the reason why you get a null output is because channels are blocking for both send/receive.
According to your flow, the code snippet below will never reach wg.Done(), as sending channel is expecting a receiving end to pull the data out. This is a typical deadlock example.
func concurrent(n uint64) {
output <- fmt.Sprint(n) // go routine is blocked until data in channel is fetched.
defer wg.Done()
}
Let's examine the main func:
func main() {
wg.Add(2)
go concurrent(1)
go concurrent(2)
wg.Wait() // the main thread will be waiting indefinitely here.
close(output)
printOutput()
}
My take on the problem:
package main
import (
"fmt"
"os"
"sync"
)
var wg sync.WaitGroup
var output = make(chan string)
var donePrinting = make(chan struct{})
func concurrent(n uint) {
defer wg.Done() // It only makes sense to defer
// wg.Done() before you do something.
// (like sending a string to the output channel)
output <- fmt.Sprint(n)
}
func printOutput() {
f, err := os.OpenFile("output.txt", os.O_CREATE|os.O_RDWR|os.O_APPEND, 0666)
if err != nil {
panic(err)
}
defer f.Close()
for msg := range output {
f.WriteString(msg + "\n")
}
donePrinting <- struct{}{}
}
func main() {
wg.Add(2)
go printOutput()
go concurrent(1)
go concurrent(2)
wg.Wait()
close(output)
<-donePrinting
}
Each concurrent function will deduct from the wait-group.
After the two concurrent goroutines finish, the wg.Wait() will unblock, and the next instruction (close(output)) will be executed. You have to wait for the two goroutines to finish before closing the channel. If, instead, you try the following:
go printOutput()
go concurrent(1)
go concurrent(2)
close(output)
wg.Wait()
you could end up with the close(output) instruction executing before any one of the concurrent goroutines concludes. If the channel closes before the concurrent goroutines run, they will crash at runtime, (while trying to write to a closed channel).
If, then, you don’t wait up for the printOutput() goroutine to finish, you could actually quit main() before printOutput() has gotten the chance to finish writing to its file.
Because I want to wait for the printOutput() goroutine to finish before I quit the program, I also created a separate channel just to signal that printOutput() is done.
The <-donePrinting instruction blocks until main receives something over the donePrinting channel.
Once main receives anything (even the empty structure that printOutput() sends), it will unblock and run to conclusion.
https://play.golang.org/p/nXJoYLI758m

Resources