In an effort to learn Go, I was looking through the Go source for the reverse proxy:
https://golang.org/src/net/http/httputil/reverseproxy.go
I found this block of code (truncated):
```go
	...
	errc := make(chan error, 1)
	spc := switchProtocolCopier{user: conn, backend: backConn}
	go spc.copyToBackend(errc)
	go spc.copyFromBackend(errc)
	<-errc
	return
}

// switchProtocolCopier exists so goroutines proxying data back and
// forth have nice names in stacks.
type switchProtocolCopier struct {
	user, backend io.ReadWriter
}

func (c switchProtocolCopier) copyFromBackend(errc chan<- error) {
	_, err := io.Copy(c.user, c.backend)
	errc <- err
}

func (c switchProtocolCopier) copyToBackend(errc chan<- error) {
	_, err := io.Copy(c.backend, c.user)
	errc <- err
}
```
The portion that caught my attention was the creation of the errc buffered channel. I thought (probably naively) that we would use an unbuffered channel and the later receive from errc would need to run twice, like this:
```go
<-errc
<-errc
```
As written, I understand that receiving from the channel ensures at least one of the copy methods has finished. I also understand that the first send to the channel will not block, while the second will block only if the first value has not yet been received.
What I don't understand is why it is written like this. Is it to ensure that only one of the methods completes? If that is the case, couldn't they technically both run?
Thanks!
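The channel of size one helps realize a binary semaphore.
Since at most one value is consumed from the channel (the single <-errc, on line 549 of reverseproxy.go), making the buffer larger than one would not change the currently exhibited behavior, which is to wait until at least one of the two goroutines finishes its Copy. Both copies do run; the handler just stops waiting after the first one finishes. The buffer does matter compared with an unbuffered channel, though: with make(chan error) and a single receive, the goroutine that finishes second would block forever on its send and leak, whereas with a buffer of one its send completes and the goroutine exits.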
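A minimal, self-contained sketch of the same pattern, with hypothetical names: `firstResult` and the two worker funcs stand in for the proxy's two `io.Copy` goroutines. It receives from `errc` exactly once, like the proxy, and also returns a channel that is closed when both workers have exited. That makes the role of the buffer visible: with `make(chan error)` instead, the worker that finishes second would block on its send forever and `exited` would never be closed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// firstResult runs two workers that each report on errc, receives exactly once
// (like the single <-errc in the proxy), and returns that first result together
// with a channel that is closed after both workers have exited.
func firstResult(a, b func() error) (<-chan struct{}, error) {
	errc := make(chan error, 1) // one slot: the later worker's send never blocks forever
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); errc <- a() }()
	go func() { defer wg.Done(); errc <- b() }()

	exited := make(chan struct{})
	go func() {
		wg.Wait() // with make(chan error) this would wait forever
		close(exited)
	}()
	return exited, <-errc
}

func main() {
	exited, err := firstResult(
		func() error { return errors.New("copy to backend finished") },
		func() error { time.Sleep(10 * time.Millisecond); return nil },
	)
	fmt.Println("first result:", err)
	<-exited
	fmt.Println("both workers exited")
}
```

Which worker's result arrives first depends on scheduling; the point is only that both workers always get to exit.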
I was trying to understand the following piece of code that reads from a channel of channels. I am having some difficulties wrapping my head around the idea.
```go
bridge := func(done <-chan interface{}, chanStream <-chan <-chan interface{}) <-chan interface{} {
	outStream := make(chan interface{})
	go func() {
		defer close(outStream)
		for {
			var stream <-chan interface{}
			select {
			case <-done:
				return
			case maybeStream, ok := <-chanStream:
				if !ok {
					return
				}
				stream = maybeStream
			}
			for c := range orDone(done, stream) {
				select {
				case outStream <- c:
				case <-done: // Why are we selecting on the done channel here?
				}
			}
		}
	}()
	return outStream
}
```
The orDone function:
```go
orDone := func(done <-chan interface{}, inStream <-chan interface{}) <-chan interface{} {
	outStream := make(chan interface{})
	go func() {
		defer close(outStream)
		for {
			select {
			case <-done:
				return
			case v, ok := <-inStream:
				if !ok {
					return
				}
				select {
				case outStream <- v:
				case <-done: // Again, why are we only receiving from this channel?
					// Shouldn't we return from here? Why are we not returning?
				}
			}
		}
	}()
	return outStream
}
```
As mentioned in the comments, I need some help understanding why we select on done inside the for c := range orDone(done, stream) loop. Can anyone explain what is going on here?
Thanks in advance.
Edit
I took the code from the book Concurrency in Go. The full code can be found here: https://github.com/kat-co/concurrency-in-go-src/blob/master/concurrency-patterns-in-go/the-bridge-channel/fig-bridge-channel.go
In both cases, the select is there to avoid blocking: if the reader isn't reading from our output channel, the send might block (maybe even forever), but we want the goroutine to terminate when the done channel is closed, without waiting for anything else. With the select, the goroutine waits until either the send completes or done is closed, and then continues, instead of waiting indefinitely for the send before checking done.
As for the other question, "why are we not returning here?": well, we could. But we don't have to, because a closed channel remains readable forever (producing an unlimited number of zero values) once it's been closed. So it's okay to do nothing in those "bottom" selects; if done was in fact closed we will go back up to the top of the loop and hit the case <-done: return there. I suppose it's a matter of style. I probably would have written the return myself, but the author of this sample may have wanted to avoid handling the same condition in two places. As long as it's just return it doesn't really matter, but if you wanted to do some additional action on done, that behavior would have to be updated in two places if the bottom select returns, but only in one place if it doesn't.
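A sketch of the alternative style mentioned above, with a `return` in the inner select. It keeps the book's `<-chan interface{}` signatures; the `main` function is only an illustration. Closing `done` makes the goroutine return (from whichever select it is in) and close its output, which ends the `range` loop; depending on scheduling, the one pending value may or may not get through first.

```go
package main

import "fmt"

// orDone forwards values from in until in is closed or done is closed.
// Unlike the book's version, the inner select returns as soon as done fires.
func orDone(done <-chan interface{}, in <-chan interface{}) <-chan interface{} {
	out := make(chan interface{})
	go func() {
		defer close(out)
		for {
			select {
			case <-done:
				return
			case v, ok := <-in:
				if !ok {
					return
				}
				select {
				case out <- v:
				case <-done:
					return // exit here instead of going back to the outer select
				}
			}
		}
	}()
	return out
}

func main() {
	done := make(chan interface{})
	in := make(chan interface{}, 1)
	in <- "pending value"
	out := orDone(done, in)
	close(done)
	// The range ends because orDone closes out when its goroutine returns.
	for v := range out {
		fmt.Println("received:", v)
	}
	fmt.Println("orDone has exited")
}
```

Both versions behave the same here; the `return` only saves one trip around the loop.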
First I will explain the use of the done channel. It's a common pattern in many concurrency-related packages in Go. A done channel's purpose is to signal the end of a computation; it is more like a stop signal. Usually a done channel is listened to by many goroutines, or in multiple places in the code flow. One such example is the channel returned by Done() in the standard library's "context" package. A value sent on a channel is received by only one listener, so a send cannot signal all listeners at once (and expecting such a feature is not a good idea anyway). Instead, people close the channel, and every receive on it then returns immediately with the zero value (nil for a chan interface{}). In your case, since the second select statement is at the end of the for block, the author might have decided to just continue the loop, so that on the next iteration the first select statement, which listens on the done channel, returns from the function.
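To illustrate closing as a broadcast, a small sketch (the function name `broadcast` is hypothetical): several goroutines wait on the same channel, and a single `close` releases all of them, each receive returning the zero value with `ok == false`. A single send would have released only one.

```go
package main

import "fmt"

// broadcast starts listener goroutines that all wait on done, closes done
// once, and counts how many listeners were released by that single close.
func broadcast(listeners int) int {
	done := make(chan struct{})
	woke := make(chan bool, listeners) // buffered so no listener blocks while reporting
	for i := 0; i < listeners; i++ {
		go func() {
			_, ok := <-done // waits for done to be closed, then yields the zero value
			woke <- ok      // ok is false: the receive came from a closed channel
		}()
	}
	close(done) // one close releases every receiver; one send would release only one
	released := 0
	for i := 0; i < listeners; i++ {
		if ok := <-woke; !ok {
			released++
		}
	}
	return released
}

func main() {
	fmt.Println(broadcast(5), "of 5 listeners saw the close") // 5 of 5 listeners saw the close
}
```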
I know there is a SetReadDeadline method that can set a timeout on socket (net.Conn) reads, while io.Reader has nothing similar. One workaround is to start another goroutine as a timer, but that brings another problem: the reader goroutine (the one calling Read) still blocks:
```go
func (self *TimeoutReader) Read(buf []byte) (n int, err error) {
	ch := make(chan bool)
	n = 0
	err = nil
	go func() { // this goroutine still exists even after the timeout
		n, err = self.reader.Read(buf)
		ch <- true
	}()
	select {
	case <-ch:
		return
	case <-time.After(self.timeout):
		return 0, errors.New("Timeout")
	}
}
```
This question is similar to this post, but the answer there is unclear.
Does anyone have a good idea for solving this problem?
Instead of setting a timeout directly on the read, you can close the os.File after a timeout. As written in https://golang.org/pkg/os/#File.Close
Close closes the File, rendering it unusable for I/O. On files that support SetDeadline, any pending I/O operations will be canceled and return immediately with an error.
This should cause your read to fail immediately.
Your mistake here is something different:
When you read from the reader, you read only once, and that is wrong:
```go
go func() {
	n, err = self.reader.Read(buf) // this Read needs to be in a loop
	ch <- true
}()
```
Here is a simple example (https://play.golang.org/p/2AnhrbrhLrv):
```go
buf := bytes.NewBufferString("0123456789")
r := make([]byte, 3)
n, err := buf.Read(r)
fmt.Println(string(r), n, err)
// Output: 012 3 <nil>
```
Read fills at most len(buf) bytes per call: the size of the slice you pass to the io.Reader limits how much is read. If you logged the n variable in your code, you would see that not the whole file is read. The select statement outside of your goroutine is also in the wrong place:
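```go
go func() {
	var data []byte // everything read so far
	for {
		select {
		case <-quit:
			result <- []byte{}
			return
		default:
			// quit is only checked between Read calls; a Read that
			// blocks is not interrupted by this select.
			n, err := self.reader.Read(buf)
			data = append(data, buf[:n]...) // keep the n bytes from every call
			if err != nil { // io.EOF or a real error: stop reading
				result <- data
				return
			}
		}
	}
}()
```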
But there is something more! You want to implement the io.Reader interface. Read() is called repeatedly until the file ends, so you should not start a goroutine inside it, because each call reads just one chunk of the file.
Also, the timeout inside the Read() method doesn't help, because it applies to each call and not to the whole file.
In addition to @apxp's point about looping over Read, you could use a buffer size of 1 byte so that you never block as long as there is data to read.
When interacting with external resources anything can happen. It is possible for any given io.Reader implementation to simply block forever. Here, I'll write one for you...
```go
type BlockingReader struct{}

func (BlockingReader) Read(b []byte) (int, error) {
	<-make(chan struct{}) // receives from a channel nobody will ever send on
	return 0, nil
}
```
Remember that anyone can implement an interface, so you can't assume it will behave like *os.File or any other standard library io.Reader. Besides asinine code like mine above, an io.Reader could legitimately connect to a resource that can block forever.
You cannot kill goroutines, so if an io.Reader truly blocks forever, the blocked goroutine will keep consuming resources until your application terminates. However, this shouldn't be a problem: a blocked goroutine does not consume much in the way of resources, and it should be fine as long as you don't blindly retry blocked Reads by spawning more goroutines.
For some reason, once I started sending strings through a channel in my goroutine, the code stalls when I run it. I thought it was a scope/closure issue, so I moved all the code directly into the function, to no avail. I have looked through Go's documentation, and all the examples look similar to mine, so I am kind of clueless as to what is going wrong.
```go
func getPage(url string, c chan<- string, swg sizedwaitgroup.SizedWaitGroup) {
	defer swg.Done()
	doc, err := goquery.NewDocument(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	nodes := doc.Find(".v-card .info")
	for i := range nodes.Nodes {
		el := nodes.Eq(i)
		var name string
		if el.Find("h3.n span").Size() != 0 {
			name = el.Find("h3.n span").Text()
		} else if el.Find("h3.n").Size() != 0 {
			name = el.Find("h3.n").Text()
		}
		address := el.Find(".adr").Text()
		phoneNumber := el.Find(".phone.primary").Text()
		website, _ := el.Find(".track-visit-website").Attr("href")
		//c <- map[string]string{"name": name, "address": address, "Phone Number": phoneNumber, "website": website}
		c <- fmt.Sprintf("%s%s%s%s", name, address, phoneNumber, website)
		fmt.Println([]string{name, address, phoneNumber, website})
	}
}

func getNumPages(url string) int {
	doc, err := goquery.NewDocument(url)
	if err != nil {
		fmt.Println(err)
		return 0
	}
	pagination := strings.Split(doc.Find(".pagination p").Contents().Eq(1).Text(), " ")
	numItems, _ := strconv.Atoi(pagination[len(pagination)-1])
	return int(math.Ceil(float64(numItems) / 30))
}

func main() {
	arrChan := make(chan string)
	swg := sizedwaitgroup.New(8)
	zips := []string{"78705", "78710", "78715"}
	for _, item := range zips {
		swg.Add()
		go getPage(fmt.Sprintf(base_url, item, 1), arrChan, swg)
	}
	swg.Wait()
}
```
Edit:
So I fixed it by passing the sizedwaitgroup by reference, but when I remove the buffer it doesn't work. Does that mean I need to know in advance how many elements will be sent to the channel?
Issue
Building off of Colin Stewart's answer, from the code you have posted, as far as I can tell, your issue is actually with reading your arrChan. You write into it, but there's no place where you read from it in your code.
From the documentation:
If the channel is unbuffered, the sender blocks until the receiver has received the value. If the channel has a buffer, the sender blocks only until the value has been copied to the buffer; if the buffer is full, this means waiting until some receiver has retrieved a value.
By making the channel buffered, what's happening is your code is no longer blocking on the channel write operations, the line that looks like:
```go
c <- fmt.Sprintf("%s%s%s%s", name, address, phoneNumber, website)
```
My guess is that if you're still hanging when the channel has a size of 5000, it's because you send more than 5000 values across all of your loops over nodes.Nodes. Once your buffered channel is full, the send operations block until the channel has space, just as if you were writing to an unbuffered channel.
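A quick way to see the rule quoted above, using a hypothetical `trySend` helper: a `select` with a `default` case attempts a send without blocking and reports whether the send would have blocked.

```go
package main

import "fmt"

// trySend attempts a send without blocking and reports whether it succeeded.
func trySend(c chan<- string, v string) bool {
	select {
	case c <- v:
		return true
	default: // taken only when the send would block (buffer full, or no receiver waiting)
		return false
	}
}

func main() {
	c := make(chan string, 2)
	fmt.Println(trySend(c, "a"), trySend(c, "b"), trySend(c, "c")) // true true false
	<-c                           // a receiver frees one slot...
	fmt.Println(trySend(c, "c"))  // ...so this send fits: true
	unbuffered := make(chan string)
	fmt.Println(trySend(unbuffered, "x")) // false: no receiver is waiting
}
```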
Fix
Here's a minimal example showing you how you would fix something like this (basically just add a reader)
```go
package main

import "sync"

func getPage(item string, c chan<- string) {
	c <- item
}

func readChannel(c <-chan string) {
	for {
		<-c
	}
}

func main() {
	arrChan := make(chan string)
	wg := sync.WaitGroup{}
	zips := []string{"78705", "78710", "78715"}
	for _, item := range zips {
		wg.Add(1)
		go func(item string) { // pass item explicitly so each goroutine gets its own copy
			defer wg.Done()
			getPage(item, arrChan)
		}(item)
	}
	go readChannel(arrChan) // comment this out and you'll deadlock
	wg.Wait()
}
```
Your channel has no buffer, so writes will block until the value can be read, and at least in the code you have posted, there are no readers.
You don't need to know the size to make it work, but you might need it in order to exit cleanly. That can be a bit tricky to observe at times, because your program exits as soon as your main function returns, and any goroutines still running are killed immediately, finished or not.
As a warm-up example, change readChannel in photoionized's response to this:
```go
func readChannel(c <-chan string) {
	for {
		url := <-c
		fmt.Println(url)
	}
}
```
It only adds printing to the original code, but now you'll see better what is actually happening. Notice how it usually prints only two strings when the code actually writes three. This is because the program exits once all writing goroutines finish, and the reading goroutine is aborted midway as a result. You can "fix" it by removing "go" before readChannel (which is the same as reading the channel in the main function). Then you'll see three strings printed, but the program crashes with a deadlock because readChannel is still reading from the channel while nobody writes to it anymore. You can fix that too by reading exactly three strings in readChannel(), but that requires knowing how many strings you expect to receive.
Here is my minimal working example (I'll use it to illustrate the rest):
```go
package main

import (
	"fmt"
	"sync"
)

func getPage(url string, c chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	c <- fmt.Sprintf("Got page for %s\n", url)
}

func readChannel(c chan string, wg *sync.WaitGroup) {
	defer wg.Done()
	var url string
	ok := true
	for ok {
		url, ok = <-c
		if ok {
			fmt.Printf("Received: %s\n", url)
		} else {
			fmt.Println("Exiting readChannel")
		}
	}
}

func main() {
	arrChan := make(chan string)
	var swg sync.WaitGroup
	base_url := "http://test/%s/%d"
	zips := []string{"78705", "78710", "78715"}
	for _, item := range zips {
		swg.Add(1)
		go getPage(fmt.Sprintf(base_url, item, 1), arrChan, &swg)
	}
	var wg2 sync.WaitGroup
	wg2.Add(1)
	go readChannel(arrChan, &wg2)
	swg.Wait()
	// All written, signal end to readChannel by closing the channel
	close(arrChan)
	wg2.Wait()
}
```
Here I close the channel to signal to readChannel that there is nothing left to read, so it can exit cleanly at the proper time. But sometimes you might instead want to tell readChannel to read exactly three strings and finish. Or maybe you would want to start one reader for each writer, with each reader reading exactly one string... Well, there are many ways to skin a cat, and the choice is yours.
Note, if you remove wg2.Wait() line your code becomes equivalent to photoionized's response and will only print two strings whilst writing 3. This is because code exits once all writers finish (ensured by swg.Wait()), but it does not wait for readChannel to finish.
If you remove close(arrChan) line instead, your code will crash with a deadlock after printing 3 lines as code waits for readChannel to finish, but readChannel waits to read from a channel which nobody is writing to anymore.
If you just remove "go" before the readChannel call, it becomes equivalent to reading from the channel inside the main function. It will again crash with a deadlock after printing three strings, because readChannel is still reading after all writers have finished (and it has already read everything they wrote). A tricky point here is that the swg.Wait() line will never be reached, because readChannel never returns.
If you move the readChannel call after swg.Wait(), the code will crash before printing a single string. But this is a different deadlock. This time the code reaches swg.Wait() and stops there, waiting for the writers. None of the writers can finish: the channel is unbuffered, so each send blocks until someone reads from the channel, and nobody does, because readChannel has not been called yet. So everything stalls and the runtime reports a deadlock. This particular issue can be "fixed" by making the channel buffered, as in make(chan string, 3), which lets the writers keep writing even though nobody is reading from the channel yet. Sometimes this is what you want. But again you have to know the maximum number of messages that will ever be in the channel buffer. And most of the time it only defers the problem: add one more writer and you are back where you started, with the code stalling and crashing because the buffer is full and that one extra writer is waiting for someone to read from it.
Well, this should cover all the bases. So check your code and see which case is yours.
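On the original question of whether you must know the count in advance: a common variant of the close-the-channel approach avoids the second WaitGroup. A small goroutine waits for the writers and then closes the channel, while the caller ranges over it. In this sketch, `getPage` is a stand-in that sends one string per URL, `collect` is a hypothetical name, and the sorting is only there because goroutines finish in no particular order.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// getPage stands in for the real scraper: it sends one result per URL.
func getPage(url string, c chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	c <- fmt.Sprintf("Got page for %s", url)
}

// collect starts one writer per URL and reads results in the calling goroutine
// until the channel is closed, so the number of results need not be known up front.
func collect(urls []string) []string {
	results := make(chan string) // unbuffered on purpose
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go getPage(u, results, &wg)
	}
	go func() {
		wg.Wait()      // all writers are done...
		close(results) // ...so closing is safe, and it ends the range below
	}()
	var got []string
	for r := range results {
		got = append(got, r)
	}
	sort.Strings(got)
	return got
}

func main() {
	urls := []string{"http://test/78705/1", "http://test/78710/1", "http://test/78715/1"}
	for _, r := range collect(urls) {
		fmt.Println(r)
	}
}
```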
It doesn't seem possible to have two way communication via channels with a goroutine which is performing file operations, unless you block the channel communication on the file operations. How can I work around the limits this imposes?
Another way to phrase this question...
If I have a loop similar to the following running in a goroutine, how can I tell it to close the connection and exit without blocking on the next Read?
```go
func readLines(response *http.Response, outgoing chan string) error {
	defer response.Body.Close()
	reader := bufio.NewReader(response.Body)
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			return err
		}
		outgoing <- line
	}
}
```
It's not possible for it to read from a channel that tells it when to close down because it's blocking on the network reads (in my case, that can take hours).
It doesn't appear to be safe to simply call Close() from outside the goroutine, since the Read/Close methods don't appear to be fully thread safe.
I could simply put a lock around the references to response.Body that are used inside and outside the goroutine, but that would cause the external code to block until a pending read completes, and I specifically want to be able to interrupt an in-progress read.
To address this scenario, several io.ReadCloser implementations in the standard library support concurrent calls to Read and Close where Close interrupts an active Read.
The response body reader created by net/http Transport is one of those implementations. It is safe to concurrently call Read and Close on the response body.
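You can also interrupt an active Read on the response body by calling the Transport CancelRequest method. (CancelRequest has since been deprecated in favor of creating the request with a cancelable context.)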
Here's how to implement cancellation using Close on the body:
```go
func readLines(response *http.Response, outgoing chan string, done chan struct{}) error {
	cancel := make(chan struct{})
	go func() {
		select {
		case <-done:
			response.Body.Close() // interrupts the ReadString below
		case <-cancel:
			return
		}
	}()
	defer response.Body.Close()
	defer close(cancel) // ensure that the goroutine exits
	reader := bufio.NewReader(response.Body)
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			return err
		}
		outgoing <- line
	}
}
```
Calling close(done) from another goroutine will cancel reads on the body.
I'm trying to implement the Observer pattern suggested here: Observer pattern in Go language
(the code listed there doesn't compile and is incomplete). Here is complete code that compiles, but I get a deadlock error.
```go
package main

import (
	"fmt"
)

type Publisher struct {
	listeners []chan int
}

type Subscriber struct {
	Channel chan int
	Name    string
}

func (p *Publisher) Sub(c chan int) {
	p.listeners = append(p.listeners, c)
}

func (p *Publisher) Pub(m int, quit chan int) {
	for _, c := range p.listeners {
		c <- m
	}
	quit <- 0
}

func (s *Subscriber) ListenOnChannel() {
	data := <-s.Channel
	fmt.Printf("Name: %v; Data: %v\n", s.Name, data)
}

func main() {
	quit := make(chan int)
	p := &Publisher{}
	subscribers := []*Subscriber{&Subscriber{Channel: make(chan int), Name: "1"}, &Subscriber{Channel: make(chan int), Name: "2"}, &Subscriber{Channel: make(chan int), Name: "3"}}
	for _, v := range subscribers {
		p.Sub(v.Channel)
		go v.ListenOnChannel()
	}
	p.Pub(2, quit)
	<-quit
}
```
Also, if I get rid of quit completely, I get no error, but it only prints the first record.
The problem is that you're sending to quit on the same goroutine that's receiving from quit.
quit has a buffer size of 0, which means that in order to proceed there has to be a sender on one side and a receiver on the other at the same time. You're sending, but no one's on the other end, so you wait forever. In this particular case, the Go runtime is able to detect the problem and abort with the fatal error "all goroutines are asleep - deadlock!".
The reason only the first value is printed when you remove quit is that your main goroutine is exiting before your remaining two are able to print.
Do not just increase channel buffer sizes to get rid of problems like this. It can help (although in this case it doesn't), but it only covers up the problem and doesn't truly fix the underlying cause. Increasing a channel's buffer size is strictly an optimization. In fact, it's usually better to develop with no buffer because it makes concurrency problems more obvious.
There are two ways to fix the problem:
Keep quit, but send 0 on it in each goroutine inside ListenOnChannel. In main, make sure you receive a value from each goroutine before moving on. (In this case, you'll wait for three values.)
Use a WaitGroup. There's a good example of how it works in the documentation.
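A sketch of the second fix (the WaitGroup), keeping the asker's types. `Pub` no longer takes `quit`, and the helper `publishOnce` and the `out` channel are additions so the received values can be collected and checked instead of printed by the subscribers; `out` is sized to the number of subscribers, and the result is sorted because goroutines run in no particular order.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type Publisher struct {
	listeners []chan int
}

type Subscriber struct {
	Channel chan int
	Name    string
}

func (p *Publisher) Sub(c chan int) {
	p.listeners = append(p.listeners, c)
}

// Pub just delivers m to every listener; waiting is now the WaitGroup's job.
func (p *Publisher) Pub(m int) {
	for _, c := range p.listeners {
		c <- m
	}
}

// ListenOnChannel receives one value, reports it, and marks itself done.
func (s *Subscriber) ListenOnChannel(wg *sync.WaitGroup, out chan<- string) {
	defer wg.Done()
	data := <-s.Channel
	out <- fmt.Sprintf("Name: %v; Data: %v", s.Name, data)
}

// publishOnce wires up one subscriber per name, publishes m, and waits for
// every subscriber before returning what they received.
func publishOnce(m int, names ...string) []string {
	var wg sync.WaitGroup
	out := make(chan string, len(names)) // one slot per subscriber, so reporting never blocks
	p := &Publisher{}
	for _, name := range names {
		s := &Subscriber{Channel: make(chan int), Name: name}
		p.Sub(s.Channel)
		wg.Add(1)
		go s.ListenOnChannel(&wg, out)
	}
	p.Pub(m)
	wg.Wait() // replaces <-quit: returns once every subscriber has finished
	close(out)
	var got []string
	for line := range out {
		got = append(got, line)
	}
	sort.Strings(got)
	return got
}

func main() {
	for _, line := range publishOnce(2, "1", "2", "3") {
		fmt.Println(line)
	}
}
```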
In general this looks good, but there is one problem. Remember that channels are either buffered or unbuffered (synchronous or asynchronous). When you send to an unbuffered channel or to a channel with a full buffer the sender will block until the data has been removed from the channel by a receiver.
So with that, I'll ask a question or two of my own:
Is the quit channel synchronous or asynchronous?
What happens in Pub when execution hits quit<-0?
One solution that fixes your problem and allows the code to run is to change the second-to-last code line to be go p.Pub(2, quit). But there is another solution. Can you see what it is?
I don't actually get the same behavior you do if I remove <-quit from the original code. And this should not affect the output because as it is written that line is never executed.