Dynamic FlatBuffers delimiter - go

I'm using FlatBuffers to send binary data over a Unix socket. The buffers I send vary in length, and my problem is how to know how many bytes I have to read for one table.
Is there something like a delimiter that I can append when sending and then use to determine the end of the FlatBuffer?
I first tried a small, fixed-size buffer:
buf := make([]byte, 512)
nr, err := c.Read(buf)
if err != nil {
fmt.Println("exit echo")
return
}
If the FlatBuffer is bigger than 512 bytes, this read fails.
When I instead grow my buffer as I read, I can't tell where the message ends:
var n, nr int
var err error
buf := make([]byte, 0, 4096) // big buffer
tmp := make([]byte, 512)
for {
n, err = c.Read(tmp)
if err != nil {
break
}
nr += n
if nr >= 4096 {
err = errOverrun
break
}
buf = append(buf, tmp[:n]...)
}
if err != nil {
fmt.Println("read error:", err)
break
}

FlatBuffers does not include a length field by design, since in most contexts the length is an implicit part of how a buffer is stored or transferred.
If you have no way of knowing the size of a buffer, or you are streaming buffers, the simplest fix is to prefix each buffer with a 32-bit length field and use that to read the rest of the data.
In the C++ API this is even built in (see the SizePrefixed functions), but this hasn't been ported to Go yet, so you'd have to do it manually.

Related

How to read arbitrary amounts of data directly from a file in Go?

Without reading the contents of a file into memory, how can I read "x" bytes from the file so that I can specify what x is for every separate read operation?
I see that the Read method of various Readers takes a byte slice of a certain length and I can read from a file into that slice. But in that case the size of the slice is fixed, whereas what I would like to do, ideally, is something like:
func main() {
f, err := os.Open("./file.txt")
if err != nil {
panic(err)
}
someBytes := f.Read(2)
someMoreBytes := f.Read(4)
}
bytes.Buffer has a Next method which behaves very closely to what I would want, but it requires an existing buffer to work, whereas I'm hoping to read an arbitrary amount of bytes from a file without needing to read the whole thing into memory.
What is the best way to accomplish this?
Thank you for your time.
Use this function:
// readN reads and returns n bytes from the reader.
// On error, readN returns the partial bytes read and
// a non-nil error.
func readN(r io.Reader, n int) ([]byte, error) {
// Allocate buffer for result
b := make([]byte, n)
// ReadFull ensures buffer is filled or error is returned.
n, err := io.ReadFull(r, b)
return b[:n], err
}
Call like this:
someBytes, err := readN(f, 2)
if err != nil { /* handle error here */ }
someMoreBytes, err := readN(f, 4)
if err != nil { /* handle error here */ }
You can do something like this:
f, err := os.Open("/tmp/dat")
check(err)
b1 := make([]byte, 5)
n1, err := f.Read(b1)
check(err)
fmt.Printf("%d bytes: %s\n", n1, string(b1[:n1]))
for more reading please check site.

Hash large file using little memory

I need to hash very large files (>10TB files). So I decided to hash 128KB per MB.
My idea is to divide the file into 1MB blocks and hash only the first 128KB of each block.
The following code works, but it uses insane amounts of memory and I can't tell why...
func partialMD5Hash(filePath string) string {
var blockSize int64 = 1024 * 1024
var sampleSize int64 = 1024 * 128
file, err := os.Open(filePath)
if err != nil {
return "ERROR"
}
defer file.Close()
fileInfo, _ := file.Stat()
fileSize := fileInfo.Size()
hash := md5.New()
var i int64
for i = 0; i < fileSize / blockSize; i++ {
sample := make([]byte, sampleSize)
_, err = file.Read(sample)
if err != nil {
return "ERROR"
}
hash.Write(sample)
_, err := file.Seek(blockSize-sampleSize, 1)
if err != nil {
return "ERROR"
}
}
return hex.EncodeToString(hash.Sum(nil))
}
Any help will be appreciated!
There are several problems with the approach, and with the program.
If you want to hash a large file, you have to hash all of it. Sampling parts of the file will not detect modifications to the parts you didn't sample.
You are allocating a new buffer for every iteration. Instead, allocate one buffer outside the for-loop, and reuse it.
Also, you seem to be ignoring how many bytes were actually read. So:
block := make([]byte, blockSize)
for {
	n, err := file.Read(block)
	if n > 0 {
		hash.Write(block[:n])
	}
	if err == io.EOF {
		break
	}
	if err != nil {
		return "ERROR"
	}
}
However, the following would be much more concise:
if _, err := io.Copy(hash, file); err != nil {
	return "ERROR"
}

Multi line buffered read in go

I am trying to read a file in a buffered manner because my files are very large, and I want to apply some text replacement to each file. Suppose that for each read I search for the word 'foo' and replace it with another word, 'bar'. If I read with a 5 MB buffer, 'foo' may be split across two reads, say 'fo' in one read and 'o' in the next, and then I will not be able to find the word. Is there a way to make a buffered read stop at the last newline, or to read multiple whole lines into the buffer?
I tried the code below, but it does not align reads to the next or previous line:
file, err := os.Open(filename)
if err != nil {
panic(err)
}
defer file.Close()
byteSlice := make([]byte, 5*1024*1024) // read 5 MB
bufioreader := bufio.NewReaderSize(file, bufferSize)
for {
n, err := bufioreader.Read(byteSlice)
if n > 0 {
fmt.Println(byteSlice[:n])
} else if err == io.EOF {
break
} else {
panic(err)
}
}
Since you're using the bufio reader, you shouldn't have to align the input with buffer boundaries yourself. Use one of the high-level read functions, such as bufioreader.ReadString('\n'), which reads a line through the underlying buffer so that you don't have to search for line delimiters yourself.
You don't need a bufio reader if you have your own buffer. In your code, bufio only adds a useless copy of the data from its internal buffer to byteSlice.
Regarding the split "foo" problem, the solution is to move the last 2 characters of the buffer to the front before the next read.
More precisely, if the word to replace has length m, copy the last m-1 bytes of the buffer to the front of the buffer, fill the remainder of the buffer, and then search the buffer for the word to replace.
// assume we want to find word
file, err := os.Open(filename)
if err != nil {
	panic(err)
}
defer file.Close()
trailingLen := len(word) - 1
data := make([]byte, trailingLen+5*1024*1024) // carried-over tail plus 5 MB of new data
carried := 0 // bytes kept from the end of the previous read
for {
	n, err := file.Read(data[carried:])
	end := carried + n
	// search and replace word in data[:end]
	// keep the last trailingLen bytes in case word straddles two reads
	carried = trailingLen
	if end < carried {
		carried = end
	}
	copy(data, data[end-carried:end])
	if err == io.EOF {
		break
	}
	if err != nil {
		panic(err)
	}
}

Why does conn.Read() write nothing into a []byte, but bufio.Reader.ReadString() works?

I have a connection, created like this:
conn, err = net.Dial("tcp", "127.0.0.1:20000")
I have tried reading from this connection in two ways. I think they both must work, but the first option doesn't.
Here is the first way of doing it:
var bytes []byte
for i := 0; i < 4; i++ {
conn.Read(bytes)
}
fmt.Printf("%v", bytes)
The output of this method is:
[]
And here is the same thing, done with bufio.Reader:
func readResponse(conn net.Conn) (response string, err error) {
reader := bufio.NewReader(conn)
_, err = reader.Discard(8)
if err != nil {
return
}
response, err = reader.ReadString('\n')
return
}
This function returns the response given by the server on the other end of the TCP connection.
Why does bufio.Reader.Read() work, but net.Conn.Read() doesn't?
The Conn.Read() method exists to implement io.Reader, the general interface for reading data from any source of bytes into a []byte. Quoting from the doc of Reader.Read():
Read reads up to len(p) bytes into p.
So Read() reads up to len(p) bytes but since you pass a nil slice, it won't read anything (length of a nil slice is 0). Please read the linked doc to know how Reader.Read() works.
Reader.Read() does not allocate a buffer ([]byte) where the read data will be stored, you have to create one and pass it, e.g.:
var buf = make([]byte, 100)
n, err := conn.Read(buf)
// n is the number of read bytes; don't forget to check err!
Always check the returned error, which may be io.EOF if the end of the data has been reached. The general contract of io.Reader.Read() also allows returning a non-nil error (including io.EOF) and some read data (n > 0) at the same time. The number of bytes read will be in n, which means only the first n bytes of buf are useful (in other words: buf[:n]).
Your other example using bufio.Reader works because you called Reader.ReadString(), which doesn't require a []byte argument. If you had used the bufio.Reader.Read() method, you would also have had to pass a non-nil slice in order to actually get some data.

Golang read from pipe reads tons of data

I'm trying to read an archive that's being tarred, streaming, to stdin, but I'm somehow reading far more data in the pipe than tar is sending.
I run my command like this:
tar -cf - somefolder | ./my-go-binary
The source code is like this:
package main
import (
"bufio"
"io"
"log"
"os"
)
// Read from standard input
func main() {
reader := bufio.NewReader(os.Stdin)
// Read all data from stdin, processing subsequent reads as chunks.
parts := 0
for {
parts++
data := make([]byte, 4<<20) // Read 4MB at a time
_, err := reader.Read(data)
if err == io.EOF {
break
} else if err != nil {
log.Fatalf("Problems reading from input: %s", err)
}
}
log.Printf("Total parts processed: %d\n", parts)
}
For a 100MB tarred folder, I'm getting 1468 chunks of 4MB (that's 6.15GB)! Further, it doesn't seem to matter how large the data []byte array is: if I set the chunk size to 40MB, I still get ~1400 chunks of 40MB data, which makes no sense at all.
Is there something I need to do to read data from os.Stdin properly with Go?
Your code is inefficient. It's allocating and initializing data each time through the loop.
for {
data := make([]byte, 4<<20) // Read 4MB at a time
}
Your use of the reader also does not follow the io.Reader contract. For example, _, err := reader.Read(data) discards the number of bytes read, and you don't handle err properly.
Package io
import "io"
type Reader
type Reader interface {
Read(p []byte) (n int, err error)
}
Reader is the interface that wraps the basic Read method.
Read reads up to len(p) bytes into p. It returns the number of bytes
read (0 <= n <= len(p)) and any error encountered. Even if Read
returns n < len(p), it may use all of p as scratch space during the
call. If some data is available but not len(p) bytes, Read
conventionally returns what is available instead of waiting for more.
When Read encounters an error or end-of-file condition after
successfully reading n > 0 bytes, it returns the number of bytes read.
It may return the (non-nil) error from the same call or return the
error (and n == 0) from a subsequent call. An instance of this general
case is that a Reader returning a non-zero number of bytes at the end
of the input stream may return either err == EOF or err == nil. The
next Read should return 0, EOF regardless.
Callers should always process the n > 0 bytes returned before
considering the error err. Doing so correctly handles I/O errors that
happen after reading some bytes and also both of the allowed EOF
behaviors.
Implementations of Read are discouraged from returning a zero byte
count with a nil error, except when len(p) == 0. Callers should treat
a return of 0 and nil as indicating that nothing happened; in
particular it does not indicate EOF.
Implementations must not retain p.
Here's a model file read program that conforms to the io.Reader interface:
package main
import (
"bufio"
"io"
"log"
"os"
)
func main() {
nBytes, nChunks := int64(0), int64(0)
r := bufio.NewReader(os.Stdin)
buf := make([]byte, 0, 4*1024)
for {
n, err := r.Read(buf[:cap(buf)])
buf = buf[:n]
if n == 0 {
if err == nil {
continue
}
if err == io.EOF {
break
}
log.Fatal(err)
}
nChunks++
nBytes += int64(len(buf))
// process buf
if err != nil && err != io.EOF {
log.Fatal(err)
}
}
log.Println("Bytes:", nBytes, "Chunks:", nChunks)
}
Output:
2014/11/29 10:00:05 Bytes: 5589891 Chunks: 1365
Read the documentation for Read:
Read reads data into p. It returns the number of bytes read into p. It
calls Read at most once on the underlying Reader, hence n may be less
than len(p). At EOF, the count will be zero and err will be io.EOF.
You are not reading 4MB at a time. You are providing buffer space and discarding the integer that would have told you how much the Read actually read. The buffer size is only the maximum; on my system, at least, each call usually reads about 128k. Try it out yourself:
// Read from standard input
func main() {
reader := bufio.NewReader(os.Stdin)
// Read all data from stdin, passing the data as parts into the channel
// for processing.
parts := 0
for {
parts++
data := make([]byte, 4<<20) // Read 4MB at a time
amount , err := reader.Read(data)
// WILL NOT BE 4MB!
log.Printf("Read: %v\n", amount)
if err == io.EOF {
break
} else if err != nil {
log.Fatalf("Problems reading from input: %s", err)
}
}
log.Printf("Total parts processed: %d\n", parts)
}
You have to implement the logic for handling the varying read amounts.
