Golang read from pipe reads tons of data - go

I'm trying to read an archive that's being tarred, streaming, to stdin, but I'm somehow reading far more data in the pipe than tar is sending.
I run my command like this:
tar -cf - somefolder | ./my-go-binary
The source code is like this:
package main
import (
"bufio"
"io"
"log"
"os"
)
// Read from standard input
func main() {
reader := bufio.NewReader(os.Stdin)
// Read all data from stdin, processing subsequent reads as chunks.
parts := 0
for {
parts++
data := make([]byte, 4<<20) // Read 4MB at a time
_, err := reader.Read(data)
if err == io.EOF {
break
} else if err != nil {
log.Fatalf("Problems reading from input: %s", err)
}
}
log.Printf("Total parts processed: %d\n", parts)
}
For a 100MB tarred folder, I'm getting 1468 chunks of 4MB (that's 6.15GB)! Further, it doesn't seem to matter how large the data []byte array is: if I set the chunk size to 40MB, I still get ~1400 chunks of 40MB data, which makes no sense at all.
Is there something I need to do to read data from os.Stdin properly with Go?

Your code is inefficient. It's allocating and initializing data each time through the loop.
for {
data := make([]byte, 4<<20) // Read 4MB at a time
}
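Your use of the reader also breaks the io.Reader contract. For example, you discard the number of bytes read (the _ in _, err := reader.Read(data)), so every call is counted as a full chunk no matter how little was actually read, and you check err before processing the bytes that were returned.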
Package io
import "io"
type Reader
type Reader interface {
Read(p []byte) (n int, err error)
}
Reader is the interface that wraps the basic Read method.
Read reads up to len(p) bytes into p. It returns the number of bytes
read (0 <= n <= len(p)) and any error encountered. Even if Read
returns n < len(p), it may use all of p as scratch space during the
call. If some data is available but not len(p) bytes, Read
conventionally returns what is available instead of waiting for more.
When Read encounters an error or end-of-file condition after
successfully reading n > 0 bytes, it returns the number of bytes read.
It may return the (non-nil) error from the same call or return the
error (and n == 0) from a subsequent call. An instance of this general
case is that a Reader returning a non-zero number of bytes at the end
of the input stream may return either err == EOF or err == nil. The
next Read should return 0, EOF regardless.
Callers should always process the n > 0 bytes returned before
considering the error err. Doing so correctly handles I/O errors that
happen after reading some bytes and also both of the allowed EOF
behaviors.
Implementations of Read are discouraged from returning a zero byte
count with a nil error, except when len(p) == 0. Callers should treat
a return of 0 and nil as indicating that nothing happened; in
particular it does not indicate EOF.
Implementations must not retain p.
Here's a model file read program that conforms to the io.Reader interface:
package main
import (
"bufio"
"io"
"log"
"os"
)
func main() {
nBytes, nChunks := int64(0), int64(0)
r := bufio.NewReader(os.Stdin)
buf := make([]byte, 0, 4*1024)
for {
n, err := r.Read(buf[:cap(buf)])
buf = buf[:n]
if n == 0 {
if err == nil {
continue
}
if err == io.EOF {
break
}
log.Fatal(err)
}
nChunks++
nBytes += int64(len(buf))
// process buf
if err != nil && err != io.EOF {
log.Fatal(err)
}
}
log.Println("Bytes:", nBytes, "Chunks:", nChunks)
}
Output:
2014/11/29 10:00:05 Bytes: 5589891 Chunks: 1365

Read the documentation for Read:
Read reads data into p. It returns the number of bytes read into p. It
calls Read at most once on the underlying Reader, hence n may be less
than len(p). At EOF, the count will be zero and err will be io.EOF.
You are not reading 4MB at a time. You are providing buffer space and discarding the integer that would have told you how much the Read actually read. The buffer size is only the maximum; on my system, at least, a call usually returns about 128k. Try it out yourself:
// Read from standard input
func main() {
reader := bufio.NewReader(os.Stdin)
// Read all data from stdin, passing the data as parts into the channel
// for processing.
parts := 0
for {
parts++
data := make([]byte, 4<<20) // Up to 4MB of buffer space per read
amount, err := reader.Read(data)
// WILL NOT BE 4MB!
log.Printf("Read: %v\n", amount)
if err == io.EOF {
break
} else if err != nil {
log.Fatalf("Problems reading from input: %s", err)
}
}
log.Printf("Total parts processed: %d\n", parts)
}
You have to implement the logic for handling the varying read amounts.

Related

How to read arbitrary amounts of data directly from a file in Go?

Without reading the contents of a file into memory, how can I read "x" bytes from the file so that I can specify what x is for every separate read operation?
I see that the Read method of various Readers takes a byte slice of a certain length and I can read from a file into that slice. But in that case the size of the slice is fixed, whereas what I would like to do, ideally, is something like:
func main() {
f, err := os.Open("./file.txt")
if err != nil {
panic(err)
}
someBytes := f.Read(2)
someMoreBytes := f.Read(4)
}
bytes.Buffer has a Next method which behaves very closely to what I would want, but it requires an existing buffer to work, whereas I'm hoping to read an arbitrary amount of bytes from a file without needing to read the whole thing into memory.
What is the best way to accomplish this?
Thank you for your time.
Use this function:
// readN reads and returns n bytes from the reader.
// On error, readN returns the partial bytes read and
// a non-nil error.
func readN(r io.Reader, n int) ([]byte, error) {
// Allocate buffer for result
b := make([]byte, n)
// ReadFull ensures buffer is filled or error is returned.
n, err := io.ReadFull(r, b)
return b[:n], err
}
Call like this:
someBytes, err := readN(f, 2)
if err != nil { /* handle error here */ }
someMoreBytes, err := readN(f, 4)
if err != nil { /* handle error here */ }
You can do something like this (note that Read may return fewer bytes than the length of the slice):
f, err := os.Open("/tmp/dat")
check(err)
b1 := make([]byte, 5)
n1, err := f.Read(b1)
check(err)
fmt.Printf("%d bytes: %s\n", n1, string(b1[:n1]))
for more reading please check site.

Multiple serial requests result in empty buffer

The first TCP connection running on localhost on osx always parses the binary sent to it correctly. Subsequent requests lose the binary data, only seeing the first byte [8]. How have I failed to set up my Reader?
package main
import (
"fmt"
"log"
"net"
"os"
"app/src/internal/handler"
"github.com/golang-collections/collections/stack"
)
func main() {
port := os.Getenv("SERVER_PORT")
s := stack.New()
ln, err := net.Listen("tcp", ":8080")
if err != nil {
log.Fatalf("net.Listen: %v", err)
}
fmt.Println("Serving on " + port)
for {
conn, err := ln.Accept()
// defer conn.Close()
if err != nil {
log.Fatal("ln.Accept")
}
go handler.Handle(conn, s)
}
}
package handler
import (
"fmt"
"io"
"log"
"net"
"github.com/golang-collections/collections/stack"
)
func Handle(c net.Conn, s *stack.Stack) {
fmt.Printf("Serving %s\n", c.RemoteAddr().String())
buf := make([]byte, 0, 256)
tmp := make([]byte, 128)
n, err := c.Read(tmp)
if err != nil {
if err != io.EOF {
log.Fatalf("connection Read() %v", err)
}
return
}
buf = append(buf, tmp[:n]...)
}
log:
Serving [::1]:51699
------------- value ---------------:QCXhoy5t
Buffer Length: 9. First Value: 8
Serving [::1]:51700
------------- value ---------------:
Buffer Length: 1. First Value: 8
Serving [::1]:51701
test sent over:
push random string:
QCXhoy5t
push random string:
GPh0EnbS
push random string:
4kJ0wN0R
The docs for Reader say:
Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n
<= len(p)) and any error encountered. Even if Read returns n < len(p), it may use
all of p as scratch space during the call. If some data is available but not
len(p) bytes, Read conventionally returns what is available instead of waiting
for more.
So the most likely cause of your issue is that Read is returning only the data available at that moment (in this case a single byte). You can fix this by using ioutil.ReadAll (io.ReadAll since Go 1.16) or by performing the read in a loop (the fact that the data is being appended to a buffer suggests that was the original intention), with something like:
for {
	n, err := c.Read(tmp)
	// Data may arrive together with an error, so keep it before checking err.
	buf = append(buf, tmp[:n]...)
	if err != nil {
		if err != io.EOF {
			// Log and drop this connection; log.Fatalf would stop the whole server.
			log.Printf("connection Read() %v", err)
			return
		}
		break // All data received so process it
	}
}
Note: There is no guarantee that any data is received; you should check the length before trying to access it (for example, buf[0] will panic on an empty buffer).
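The ReadAll variant can be tried without a real network. The sketch below assumes one message per connection (the question does not confirm this) and uses net.Pipe as an in-memory stand-in for the TCP connection; readMessage and roundTrip are hypothetical helpers, and io.ReadAll needs Go 1.16 or later.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// readMessage (hypothetical helper) reads everything the peer sends until it
// closes its side. It assumes one message per connection.
func readMessage(c net.Conn) ([]byte, error) {
	defer c.Close()
	return io.ReadAll(c)
}

// roundTrip sends msg through an in-memory connection one byte at a time,
// mimicking data that arrives in several pieces, and returns what
// readMessage collected on the other end.
func roundTrip(msg []byte) ([]byte, error) {
	client, server := net.Pipe()
	go func() {
		for i := range msg {
			client.Write(msg[i : i+1])
		}
		client.Close()
	}()
	return readMessage(server)
}

func main() {
	buf, err := roundTrip([]byte("\x08QCXhoy5t"))
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	if len(buf) == 0 { // nothing received, so buf[0] would panic
		fmt.Println("no data received")
		return
	}
	fmt.Printf("Buffer Length: %d. First Value: %d\n", len(buf), buf[0])
	// Buffer Length: 9. First Value: 8
}
```

If the client keeps the connection open for several messages, reading until EOF will block, and a length prefix or delimiter is needed instead.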

Dynamic FlatBuffers delimiter

I'm using FlatBuffers to send binary data over a Unix socket. The flatbuffer that I send is of dynamic length. The problem I'm facing is how to know how many bytes I have to read for one table.
Is there something like a delimiter that can be appended while sending, which I can use to determine the end of the flatbuffer?
When I tried with a smaller size
buf := make([]byte, 512)
nr, err := c.Read(buf)
if err != nil {
fmt.Println("exit echo")
return
}
If a flatbuffer bigger than 512 bytes is sent, only part of it is read and decoding fails.
When I read by growing my buffer instead, I'm not able to find the end of the message:
var n, nr int
var err error
buf := make([]byte, 0, 4096) // big buffer
tmp := make([]byte, 512)
for {
n, err = c.Read(tmp)
if err != nil {
break
}
nr += n
if nr >= 4096 {
err = errOverrun
break
}
buf = append(buf, tmp[:n]...)
}
if err != nil {
fmt.Println("read error:", err)
break
}
FlatBuffers does not include a length field by design, since in most contexts the length is an implicit part of the storage or transfer of a buffer.
If you have no way to know the size of a buffer, or you are streaming buffers, the best approach is simply to prefix each buffer with a 32-bit length field, which you can then use to read the rest of the data.
In the C++ API this is even built in (see the SizePrefixed functions), but at the time of writing this hasn't been ported to Go, so you'd have to do it manually.

Why does conn.Read() write nothing into a []byte, but bufio.Reader.ReadString() works?

I have a connection, created like this:
conn, err = net.Dial("tcp", "127.0.0.1:20000")
I have tried reading from this connection in two ways. I think they both must work, but the first option doesn't.
Here is the first way of doing it:
var bytes []byte
for i := 0; i < 4; i++ {
conn.Read(bytes)
}
fmt.Printf("%v", bytes)
The output of this method is:
[]
And here is the same thing, done with bufio.Reader:
func readResponse(conn net.Conn) (response string, err error) {
reader := bufio.NewReader(conn)
_, err = reader.Discard(8)
if err != nil {
return
}
response, err = reader.ReadString('\n')
return
}
This function returns the response given by the server on the other end of the TCP connection.
Why does bufio.Reader.Read() work, but net.Conn.Read() doesn't?
The Conn.Read() method exists to implement io.Reader, the general interface for reading data from any source of bytes into a []byte. Quoting from the doc of Reader.Read():
Read reads up to len(p) bytes into p.
So Read() reads up to len(p) bytes, but since you pass a nil slice, it won't read anything (the length of a nil slice is 0). Please read the io.Reader documentation to learn how Reader.Read() works.
Reader.Read() does not allocate a buffer ([]byte) where the read data will be stored; you have to create one and pass it, e.g.:
var buf = make([]byte, 100)
n, err := conn.Read(buf)
// n is the number of read bytes; don't forget to check err!
Don't forget to always check the returned error, which may be io.EOF if the end of the data is reached. The general contract of io.Reader.Read() also allows returning some non-nil error (including io.EOF) and some read data (n > 0) at the same time. The number of read bytes will be in n, which means only the first n bytes of buf are useful (in other words: buf[:n]).
Your other example using bufio.Reader works because you called Reader.ReadString(), which doesn't require a []byte argument. If you had used the bufio.Reader.Read() method, you would also have had to pass a non-nil slice in order to actually get some data.

Reading specific number of bytes from a buffered reader in golang

I am aware of the specific function in golang from the bufio package.
func (b *Reader) Peek(n int) ([]byte, error)
Peek returns the next n bytes without advancing the reader. The bytes
stop being valid at the next read call. If Peek returns fewer than n
bytes, it also returns an error explaining why the read is short. The
error is ErrBufferFull if n is larger than b's buffer size.
I need to be able to read a specific number of bytes from a Reader that will advance the reader. Basically, identical to the function above, but it advances the reader. Does anybody know how to accomplish this?
Note that the bufio.Reader.Read method calls the underlying io.Reader's Read at most once, meaning that it can return n < len(p) without reaching EOF. If you want to read exactly len(p) bytes or fail with an error, you can use io.ReadFull like this:
n, err := io.ReadFull(reader, p)
This works even if the reader is buffered.
func (b *Reader) Read(p []byte) (n int, err error)
http://golang.org/pkg/bufio/#Reader.Read
The number of bytes read will be limited to len(p)
TLDR:
my42bytes, err := ioutil.ReadAll(io.LimitReader(myReader, 42))
Full answer:
@monicuta mentioned io.ReadFull which works great. Here I provide another method. It works by chaining ioutil.ReadAll and io.LimitReader together. Let's read the doc first:
$ go doc ioutil.ReadAll
func ReadAll(r io.Reader) ([]byte, error)
ReadAll reads from r until an error or EOF and returns the data it read. A
successful call returns err == nil, not err == EOF. Because ReadAll is
defined to read from src until EOF, it does not treat an EOF from Read as an
error to be reported.
$ go doc io.LimitReader
func LimitReader(r Reader, n int64) Reader
LimitReader returns a Reader that reads from r but stops with EOF after n
bytes. The underlying implementation is a *LimitedReader.
So if you want to get 42 bytes from myReader, you do this
import (
"io"
"io/ioutil"
)
func main() {
// myReader := ...
my42bytes, err := ioutil.ReadAll(io.LimitReader(myReader, 42))
if err != nil {
panic(err)
}
//...
}
Here is the equivalent code with io.ReadFull
$ go doc io.ReadFull
func ReadFull(r Reader, buf []byte) (n int, err error)
ReadFull reads exactly len(buf) bytes from r into buf. It returns the number
of bytes copied and an error if fewer bytes were read. The error is EOF only
if no bytes were read. If an EOF happens after reading some but not all the
bytes, ReadFull returns ErrUnexpectedEOF. On return, n == len(buf) if and
only if err == nil. If r returns an error having read at least len(buf)
bytes, the error is dropped.
import (
"io"
)
func main() {
// myReader := ...
buf := make([]byte, 42)
_, err := io.ReadFull(myReader, buf)
if err != nil {
panic(err)
}
//...
}
Compared to io.ReadFull, an advantage is that you don't need to manually make a buf, where len(buf) is the number of bytes you want to read, and then pass buf as an argument when you read.
Instead you simply tell io.LimitReader you want at most 42 bytes from myReader, and call ioutil.ReadAll to read them all, returning the result as a slice of bytes. Note that a nil error only guarantees at most 42 bytes: if myReader reaches EOF earlier, ReadAll returns the shorter slice with err == nil, so check len(my42bytes) when exactly 42 bytes are required.
I prefer Read(), especially if you are going to read arbitrary kinds of files; it can also be useful for sending data in chunks. Below is an example showing how it is used:
fs, err := os.Open("fileName")
if err != nil {
	fmt.Println("error opening file:", err)
	return
}
defer fs.Close()
reader := bufio.NewReader(fs)
buf := make([]byte, 1024)
for {
	v, err := reader.Read(buf) // ReadString and ReadLine are alternatives
	// For a text file, you could check its content here.
	// Only the first v bytes of buf are valid.
	fmt.Print(string(buf[:v]))
	if err != nil {
		if err != io.EOF {
			fmt.Println("error reading file:", err)
		}
		return
	}
}
Pass an n-byte buffer to the reader.
If you want to read the bytes from an io.Reader and into an io.Writer, then you can use io.CopyN
CopyN copies n bytes (or until an error) from src to dst. It returns the number of bytes copied and the earliest error encountered while copying.
On return, written == n if and only if err == nil.
written, err := io.CopyN(dst, src, n)
if err != nil {
// We didn't read the desired number of bytes
} else {
// We can proceed successfully
}
To do this, you just need to create a byte slice and read the data into it:
n := 512
buff := make([]byte, n)
got, err := fs.Read(buff) // fs is your reader, e.g. fs, _ := os.Open("file"); got may be less than n
func (b *Reader) Read(p []byte) (n int, err error)
Read reads data into p. It returns the number of bytes read into p.
The bytes are taken from at most one Read on the underlying Reader,
hence n may be less than len(p)

Resources