Why am I getting an EOF out of my io.PipeReader? - go

I’m using something similar in a project and I'm a bit perplexed: why isn't anything being printed?
package main
import (
"fmt"
"encoding/json"
"io"
)
func main() {
m := make(map[string]string)
m["foo"] = "bar"
pr, pw := io.Pipe()
go func() { pw.CloseWithError(json.NewEncoder(pw).Encode(&m)) }()
fmt.Fscan(pr)
}
https://play.golang.org/p/OJT1ZRAnut
Is this a race condition of some sort? I tried removing pw.CloseWithError but it changes nothing.

fmt.Fscan takes two arguments. A reader to read from, and one or more pointers to objects to populate. Its result is (n int, err error), where n is the number of items read, and err is the reason why n is less than the (variadic...) slice of data objects you fed into its second argument.
In this case, the slice of data objects is length zero, so Fscan fills zero objects and reads no data. It dutifully reports that it scanned 0 objects, and since that number is not less than the number of objects you passed into it, it reports nil error.
Try the following:
func main() {
m := make(map[string]string)
m["foo"] = "bar"
pr, pw := io.Pipe()
go func() { pw.CloseWithError(json.NewEncoder(pw).Encode(&m)) }()
var s string
n, err := fmt.Fscan(pr, &s)
fmt.Println(n, err) // should be 1 <nil>
fmt.Println(s) // should be {"foo":"bar"}
}

Related

How can I get a goroutine's runtime ID?

How can I get a goroutine's runtime ID?
I'm getting interleaved logs from an imported package - one approach would be to add a unique identifier to the logs of each goroutine.
I've found some references to runtime.GoID:
func worker() {
id := runtime.GoID()
log.Println("Goroutine ID:", id)
}
But it looks like this is now outdated/has been removed - https://pkg.go.dev/runtime?
Go deliberately chooses not to provide an ID since it would encourage worse software and hurt the overall ecosystem: https://go.dev/doc/faq#no_goroutine_id
Generally, the desire to de-anonymize goroutines is a design flaw and is strongly not recommended. There is almost always going to be a much better way to solve the issue at hand. Eg, if you need a unique identifier, it should be passed into the function or potentially via context.Context.
However, internally the runtime needs IDs for the implementation. For educational purposes you can find them with something like:
package main
import (
"bytes"
"errors"
"fmt"
"runtime"
"strconv"
)
func main() {
fmt.Println(goid())
done := make(chan struct{})
go func() {
fmt.Println(goid())
done <- struct{}{}
}()
go func() {
fmt.Println(goid())
done <- struct{}{}
}()
<-done
<-done
}
var (
goroutinePrefix = []byte("goroutine ")
errBadStack = errors.New("invalid runtime.Stack output")
)
// This is terrible, slow, and should never be used.
func goid() (int, error) {
buf := make([]byte, 32)
n := runtime.Stack(buf, false)
buf = buf[:n]
// goroutine 1 [running]: ...
buf, ok := bytes.CutPrefix(buf, goroutinePrefix)
if !ok {
return 0, errBadStack
}
i := bytes.IndexByte(buf, ' ')
if i < 0 {
return 0, errBadStack
}
return strconv.Atoi(string(buf[:i]))
}
Example output:
1 <nil>
19 <nil>
18 <nil>
They can also be found (less portably) via assembly by accessing the goid field in the g struct. This is how packages like github.com/petermattis/goid typically do it.

In Go, even though []byte is passed to a method by value, the original value is still modified?

Normally, the original value will change only if it's passed as a pointer to a method.
But I see this scenario that the original value is changed when it's passed as a value to a method. This happens a lot in implementations of Reader interface.
For example:
package main
import (
"fmt"
"io"
"strings"
)
func main() {
r := strings.NewReader("Hello, Reader!")
b := make([]byte, 8)
for {
n, err := r.Read(b)
fmt.Printf("n = %v err = %v b = %v\n", n, err, b)
fmt.Printf("b[:n] = %q\n", b[:n])
if err == io.EOF {
break
}
}
}
the variable b is passed to the Read method using value
n, err := r.Read(b)
but somehow the original value got changed and populated with data to print with.
If I look into the implementation of Read method
func (r *Reader) Read(b []byte) (n int, err error) {
if r.i >= int64(len(r.s)) {
return 0, io.EOF
}
r.prevRune = -1
n = copy(b, r.s[r.i:])
r.i += int64(n)
return
}
We can clearly see it's a value parameter?
This obeys how in go everything is passed and copied by value.
From my understand, b should at least be passed as a pointer like
func (r *Reader) Read(b *[]byte) (n int, err error) {
Why is this? Please help thanks.
A slice does not directly hold its contents. Instead a slice holds a pointer to its underlying array which holds the contents of the slice.
So, essentially yes, you are passing by value, but the value is a pointer the "backing array".
See: https://dave.cheney.net/2018/07/12/slices-from-the-ground-up

I want to split a file into equally sized "chunks", or slices and use goroutines to process them simultaneously

Using Go, I have large log files. Currently I open them, create a new scanner bufio.NewScanner, and then for scanner.Scan() to loop through the lines. Each line is sent through a processing function, which matches it to regular expressions and extracts data. I would like to process this file in chunks simultaneously using goroutines. I believe this may be quicker than looping through the whole file sequentially.
It can take a few seconds per file, and I'm wondering if I can process a single file in, say, 10 pieces at a time. I believe I can sacrifice the memory if needed. I have ~3gb, and the biggest log file is maybe 75mb.
I see that a scanner has a .Split() method, where you can provide a custom split function, but I wasn't able to find a good solution using this method.
I've also tried creating a slice of slices, looping through the scanner with scanner.Scan() and appending scanner.Text() to each slice.
eg:
// pseudocode because I couldn't get this to work either
scanner := bufio.NewScanner(logInfo)
threads := [[], [], [], [], []]
i := 0
for scanner.Scan() {
i = i + 1
if i > 5 {
i = 0
}
threads[i] = append(threads[i], scanner.Text())
}
fmt.Println(threads)
I'm new to Go and concerned about efficiency and performance. I want to learn how to write good Go code! Any help or advice is really appreciated.
Peter gives a good starting point, if you wanted to do something like a fan-out, fan-in pattern you could do something like:
package main
import (
"bufio"
"fmt"
"log"
"os"
"sync"
)
func main() {
file, err := os.Open("/path/to/file.txt")
if err != nil {
log.Fatal(err)
}
defer file.Close()
lines := make(chan string)
// start four workers to do the heavy lifting
wc1 := startWorker(lines)
wc2 := startWorker(lines)
wc3 := startWorker(lines)
wc4 := startWorker(lines)
scanner := bufio.NewScanner(file)
go func() {
defer close(lines)
for scanner.Scan() {
lines <- scanner.Text()
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
}()
merged := merge(wc1, wc2, wc3, wc4)
for line := range merged {
fmt.Println(line)
}
}
func startWorker(lines <-chan string) <-chan string {
finished := make(chan string)
go func() {
defer close(finished)
for line := range lines {
// Do your heavy work here
finished <- line
}
}()
return finished
}
func merge(cs ...<-chan string) <-chan string {
var wg sync.WaitGroup
out := make(chan string)
// Start an output goroutine for each input channel in cs. output
// copies values from c to out until c is closed, then calls wg.Done.
output := func(c <-chan string) {
for n := range c {
out <- n
}
wg.Done()
}
wg.Add(len(cs))
for _, c := range cs {
go output(c)
}
// Start a goroutine to close out once all the output goroutines are
// done. This must start after the wg.Add call.
go func() {
wg.Wait()
close(out)
}()
return out
}
If it is acceptable that line N+1 is processed before line N, you can use a simple fan-out pattern to get started. The Go blog explains this and more advanced patterns, such as cancelation and fan-in.
Note that this is just a starting point to keep it simple and on point. You would almost certainly want to wait for the process functions to return before exiting, for instance. This is explained in the mentioned blog post.
package main
import "bufio"
func main() {
var sc *bufio.Scanner
lines := make(chan string)
go process(lines)
go process(lines)
go process(lines)
go process(lines)
for sc.Scan() {
lines <- sc.Text()
}
close(lines)
}
func process(lines <-chan string) {
for line := range lines {
// implement processing here
}
}

Go: zlib uncompressing a slice of bytes

I am trying to parse a file that annoying consists of many separately zipped segments. I have parsed these segments one at a time into a slice of bytes and I want to uncompress them as I go.
Here is my current code that does the decompressing, which doesn't work. from and to are just set at the top as an example, in reality they are set by the code. data is the byte array containing the entire file. I don't want to seek it while it's on disk because its location on another server, so it's only realistic for me to load the entire file to []byte first and then parse it.
from, to := 0, 1000;
b := bytes.NewReader(data[from:from+to])
z, err := zlib.NewReader(b)
CheckErr(err)
defer z.Close()
p := make([]byte,0,1024)
z.Read(p)
fmt.Println(string(p))
So how is it so massively difficult just to unzip a slice of bytes? Anyway...
The problem appears to with how I am reading it out. Where it says z.Read, that doesn't seem to do anything.
How can I read the entire thing in one go into a slice of bytes?
Here's an outline for you. Note: In Go, CHECK FOR ERRORS!
package main
import (
"bytes"
"compress/zlib"
"fmt"
"io/ioutil"
)
func readSegment(data []byte, from, to int) ([]byte, error) {
b := bytes.NewReader(data[from : from+to])
z, err := zlib.NewReader(b)
if err != nil {
return nil, err
}
defer z.Close()
p, err := ioutil.ReadAll(z)
if err != nil {
return nil, err
}
return p, nil
}
func main() {
from, to := 0, 1000
data := make([]byte, from+to)
// ** parse input segments into data **
p, err := readSegment(data, from, to)
if err != nil {
fmt.Println(err)
return
}
fmt.Println(string(p))
}
Use ReadAll(r io.Reader) ([]byte, error) from the io/ioutil package.
p, err := ioutil.ReadAll(b)
fmt.Println(string(p))
Read only reads up to the length of the given slice (1024 bytes in your case).
To read in chunks of 1024 bytes:
p := make([]byte,1024)
for {
numBytes, err := l.Read(p)
if err == io.EOF {
// you are done, numBytes might be less than len(p)
break
}
// do what you want with p
}
If you are getting the data from a webserver, you might even do
import (
"net/http"
"io/ioutil"
)
...
resp, errGet := http.Get("http://example.com/somefile")
// do error handling
z, errZ := zlib.NewReader(resp.Body)
// do error handling
resp.Body.Close()
p, err := ioutil.ReadAll(b)
// do error handling
since resp.Body happens to be an io.Reader as most io related types.

Consuming all elements of a channel into a slice

How can I construct a slice out of all of elements consumed from a channel (like Python's list does)? I can use this helper function:
func ToSlice(c chan int) []int {
s := make([]int, 0)
for i := range c {
s = append(s, i)
}
return s
}
but due to the lack of generics, I'll have to write that for every type, won't I? Is there a builtin function that implements this? If not, how can I avoid copying and pasting the above code for every single type I'm using?
If there's only a few instances in your code where the conversion is needed, then there's absolutely nothing wrong with copying the 7 lines of code a few times (or even inlining it where it's used, which reduces it to 4 lines of code and is probably the most readable solution).
If you've really got conversions between lots and lots of types of channels and slices and want something generic, then you can do this with reflection at the cost of ugliness and lack of static typing at the callsite of ChanToSlice.
Here's complete example code for how you can use reflect to solve this problem with a demonstration of it working for an int channel.
package main
import (
"fmt"
"reflect"
)
// ChanToSlice reads all data from ch (which must be a chan), returning a
// slice of the data. If ch is a 'T chan' then the return value is of type
// []T inside the returned interface.
// A typical call would be sl := ChanToSlice(ch).([]int)
func ChanToSlice(ch interface{}) interface{} {
chv := reflect.ValueOf(ch)
slv := reflect.MakeSlice(reflect.SliceOf(reflect.TypeOf(ch).Elem()), 0, 0)
for {
v, ok := chv.Recv()
if !ok {
return slv.Interface()
}
slv = reflect.Append(slv, v)
}
}
func main() {
ch := make(chan int)
go func() {
for i := 0; i < 10; i++ {
ch <- i
}
close(ch)
}()
sl := ChanToSlice(ch).([]int)
fmt.Println(sl)
}
You could make ToSlice() just work on interface{}'s, but the amount of code you save here will likely cost you in complexity elsewhere.
func ToSlice(c chan interface{}) []interface{} {
s := make([]interface{}, 0)
for i := range c {
s = append(s, i)
}
return s
}
Full example at http://play.golang.org/p/wxx-Yf5ESN
That being said: As #Volker said in the comments from the slice (haha) of code you showed it seems like it'd be saner to either process the results in a streaming fashion or "buffer them up" at the generator and just send the slice down the channel.

Resources