Read exactly n bytes unless EOF? - go

I'm using a function that returns an io.Reader to download a file from the Internet.
I want to process the file in chunks of exactly 2048 bytes until it's no longer possible because of EOF.
The io.ReadFull function is almost what I want:
buf := make([]byte, 2048)
for {
    if _, err := io.ReadFull(reader, buf); err == io.EOF {
        return io.ErrUnexpectedEOF
    } else if err != nil {
        return err
    }
    // Do processing on buf
}
The problem with this is that not all files are a multiple of 2048 bytes, so the last chunk may be only, say, 500 bytes; io.ReadFull will therefore return ErrUnexpectedEOF and the last chunk is discarded.
A name that summarizes what I want could be io.ReadFullUnlessLastChunk: ErrUnexpectedEOF should not be returned if the reason buf cannot be filled with 2048 bytes is that the file hit EOF after, say, 500 bytes. In any other case, ErrUnexpectedEOF should be returned, as a problem has occurred.
What could I do to accomplish this?
Another problem is that reading only 2048 bytes at a time directly from the network seems to have a lot of overhead. If I could read, say, 256 KB from the network into a buffer and then take the 2048-byte chunks I need from that buffer, that would be better.

For example,
package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
)

func readChunks(r io.Reader) error {
    // Wrap r in a bufio.Reader so the underlying reader is read
    // in larger, more efficient blocks.
    if _, ok := r.(*bufio.Reader); !ok {
        r = bufio.NewReader(r)
    }
    buf := make([]byte, 0, 2048)
    for {
        n, err := io.ReadFull(r, buf[:cap(buf)])
        buf = buf[:n]
        if err != nil {
            if err == io.EOF {
                // No bytes were read: we are done.
                break
            }
            if err != io.ErrUnexpectedEOF {
                return err
            }
            // ErrUnexpectedEOF means a short final chunk: fall
            // through and process it; the next ReadFull returns io.EOF.
        }
        // Process buf
        fmt.Println(len(buf))
    }
    return nil
}
func main() {
    fName := `test.file`
    f, err := os.Open(fName)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f.Close()
    err = readChunks(f)
    if err != nil {
        fmt.Println(err)
        return
    }
}
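On the overhead point: bufio.NewReader uses a 4096-byte buffer by default. If you want the 256 KB read-ahead mentioned in the question, bufio.NewReaderSize lets you choose the size. A minimal sketch, where networkReader is hypothetical and stands in for whatever your download function returns:
// networkReader is hypothetical: the io.Reader from your download function.
// Reads from the network happen in blocks of up to 256 KB; readChunks then
// takes its 2048-byte chunks out of that buffer.
r := bufio.NewReaderSize(networkReader, 256*1024)
if err := readChunks(r); err != nil {
    fmt.Println(err)
}
Because readChunks only wraps its argument when it is not already a *bufio.Reader, the larger buffer is used as-is rather than being wrapped again.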

Related

reading golang websocket returns random bytes

My program:
package main

import (
    "fmt"
    "io"
    "log"
    "net"

    "github.com/gobwas/ws"
)

func HandleConn(conn net.Conn) {
    for {
        header, err := ws.ReadHeader(conn)
        if err != nil {
            log.Fatal(err)
        }
        buf := make([]byte, header.Length)
        _, err = io.ReadFull(conn, buf)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(buf)
        fmt.Println(string(buf))
    }
}

func main() {
    ln, err := net.Listen("tcp", "localhost:8080")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            log.Fatal(err)
        }
        _, err = ws.Upgrade(conn)
        if err != nil {
            log.Fatal(err)
        }
        go HandleConn(conn)
    }
}
I do in browser console:
let socket = new WebSocket("ws://127.0.0.1:8080")
socket.send("Hello world")
I see random bytes in my terminal. Each call to socket.send("Hello world") returns different bytes, but the length of the byte array is always equal to the length of the string. Where does Go get these random bytes? How can I fix this? My program is an example from the docs.
If you are not going to use wsutil, you need to unmask the payload:
buff := make([]byte, header.Length)
_, err = io.ReadFull(conn, buff)
if err != nil {
    // handle error
}
if header.Masked {
    ws.Cipher(buff, header.Mask, 0)
}
fmt.Println(string(buff))
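Alternatively, you can let the wsutil helper package from gobwas/ws do the header parsing and unmasking for you. A minimal sketch, assuming the same server setup as above:
import (
    "github.com/gobwas/ws/wsutil"
    // ... other imports as above
)

func HandleConn(conn net.Conn) {
    defer conn.Close()
    for {
        // ReadClientData reads one complete message from the
        // client and returns the unmasked payload.
        msg, op, err := wsutil.ReadClientData(conn)
        if err != nil {
            log.Println(err)
            return
        }
        fmt.Println(op, string(msg))
    }
}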

After calling the Peek method, the original data has changed

package main

import (
    "bufio"
    "fmt"
    "io"
    "io/ioutil"
    "net/http"

    "golang.org/x/net/html/charset"
    "golang.org/x/text/encoding"
    "golang.org/x/text/transform"
)

func main() {
    resp, err := http.Get("http://www.baidu.com")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        fmt.Println("Error: status code", resp.StatusCode)
        return
    }
    e := determineEncoding(resp.Body)
    utf8Reader := transform.NewReader(resp.Body, e.NewDecoder())
    all, err := ioutil.ReadAll(utf8Reader)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%s\n", all)
}

func determineEncoding(r io.Reader) encoding.Encoding {
    reader := bufio.NewReader(r)
    // The start position was not correct
    bytes, err := reader.Peek(1024)
    if err != nil {
        panic(err)
    }
    e, _, _ := charset.DetermineEncoding(bytes, "")
    return e
}
The result is not correct: the output does not start at the beginning of the data.
As the documentation describes: 'Peek returns the next n bytes without advancing the reader. The bytes stop being valid at the next read call. If Peek returns fewer than n bytes, it also returns an error explaining why the read is short. The error is ErrBufferFull if n is larger than b's buffer size.'
Peek returns the next n bytes without advancing the reader.
This refers to the *bufio.Reader, not the underlying reader. The buffered reader will read from the underlying reader if necessary. How else would it return the bytes?
In your case, you have to stop using the response body directly after calling determineEncoding and use the *bufio.Reader instead.
For instance:
func determineEncoding(r *bufio.Reader) encoding.Encoding {
    bytes, err := r.Peek(1024)
    // as before
}

func main() {
    // as before
    defer resp.Body.Close()
    r := bufio.NewReader(resp.Body)
    e := determineEncoding(r)
    utf8Reader := transform.NewReader(r, e.NewDecoder())
    // as before
}
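One caveat with the Peek(1024) call in both versions: if the response is shorter than 1024 bytes, Peek returns the bytes it has together with an error (io.EOF), so panicking on every error is fragile. A slightly more tolerant sketch:
bytes, err := r.Peek(1024)
if err != nil && err != io.EOF {
    panic(err)
}
// DetermineEncoding works with however many bytes were available.
e, _, _ := charset.DetermineEncoding(bytes, "")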

Reading more than 4096 bytes per chunk with part.Read

I'm trying to process a multipart file upload in small chunks to avoid storing the entire file in memory. The following function seems to solve this; however, when passing a []byte as the destination for the part.Read() method, it reads the part in chunks of 4096 bytes instead of chunks of the destination size (len([]byte)).
When opening a local file and Read()'ing it into a []byte of the same size, it uses the entire space available, as expected. Thus I think it's something specific to the part reader. However, I'm unable to find anything about a default or maximum size for that function.
For reference, the function is as follows:
func ReceiveFile(w http.ResponseWriter, r *http.Request) {
    reader, err := r.MultipartReader()
    if err != nil {
        panic(err)
    }
    if reader == nil {
        panic("Wrong media type")
    }
    buf := make([]byte, 16384)
    fmt.Println(len(buf))
    for {
        part, err := reader.NextPart()
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err)
        }
        var n int
        for {
            n, err = part.Read(buf)
            if err == io.EOF {
                break
            }
            if err != nil {
                panic(err)
            }
            fmt.Printf("Read %d bytes into buf\n", n)
            fmt.Println(len(buf))
        }
        n, err = part.Read(buf)
        fmt.Printf("Finally read %d bytes into buf\n", n)
        fmt.Println(len(buf))
    }
}
The part reader does not attempt to fill the caller's buffer as allowed by the io.Reader contract.
The best way to handle this depends on the requirements of the application.
If you want to slurp the part into memory, then use ioutil.ReadAll:
for {
    part, err := reader.NextPart()
    if err == io.EOF {
        break
    }
    if err != nil {
        // handle error
    }
    p, err := ioutil.ReadAll(part)
    if err != nil {
        // handle error
    }
    // p is []byte with the contents of the part
}
If you want to copy the part to the io.Writer w, then use io.Copy:
for {
    part, err := reader.NextPart()
    if err == io.EOF {
        break
    }
    if err != nil {
        // handle error
    }
    w := // open a writer
    _, err = io.Copy(w, part)
    if err != nil {
        // handle error
    }
}
If you want to process fixed size chunks, then use io.ReadFull:
buf := make([]byte, chunkSize)
for {
    part, err := reader.NextPart()
    if err == io.EOF {
        break
    }
    if err != nil {
        // handle error
    }
    _, err = io.ReadFull(part, buf)
    if err != nil {
        // handle error
        // Note that ReadFull returns an error if it cannot fill buf
    }
    // process the next chunk in buf
}
If the application data is structured in some other way than fixed-size chunks, then bufio.Scanner might be of help, as in the sketch below.
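For example, a minimal sketch that processes a part line by line with bufio.Scanner (Scanner splits on lines by default; a custom split function can handle other framings):
scanner := bufio.NewScanner(part)
for scanner.Scan() {
    line := scanner.Text() // one line, without the trailing newline
    // process line
}
if err := scanner.Err(); err != nil {
    // handle error
}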
Instead of changing the chunk size, why not use io.ReadFull?
https://golang.org/pkg/io/#ReadFull
It can manage the entire logic, and if it can't read it will just return an error.

Getting EOF from server as client in Go

I have a Go client for a custom protocol. The protocol is lz4-compressed JSON-RPC with a four-byte header giving the length of the compressed JSON.
func ReceiveMessage(conn net.Conn) ([]byte, error) {
    start := time.Now()
    bodyLen := 0
    body := make([]byte, 0, 4096)
    buf := make([]byte, 0, 256)
    for bodyLen == 0 || len(body) < bodyLen {
        if len(body) > 4 {
            header := body[:4]
            body = body[:4]
            bodyLen = int(unpack(header))
        }
        n, err := conn.Read(buf[:])
        if err != nil {
            if err != io.EOF {
                return body, err
            }
        }
        body = append(body, buf[0:n]...)
        now := time.Now()
        if now.Sub(start) > time.Duration(readTimeout)*time.Millisecond {
            return body, fmt.Errorf("Timed-out while reading from socket.")
        }
        time.Sleep(time.Duration(1) * time.Millisecond)
    }
    return lz4.Decode(nil, body)
}
The client:
func main() {
    address := os.Args[1]
    msg := []byte(os.Args[2])
    fmt.Printf("Sending %s to %s\n", msg, address)
    conn, err := net.Dial("tcp", address)
    if err != nil {
        fmt.Printf("%v\n", err)
        return
    }
    // Another library call
    _, err = SendMessage(conn, msg)
    if err != nil {
        fmt.Printf("%v\n", err)
        return
    }
    response, err := ReceiveMessage(conn)
    conn.Close()
    if err != nil {
        fmt.Printf("%v\n", err)
        return
    }
    fmt.Printf("Response: %s\n", response)
}
When I call it, I get no response and it just times out. (If I do not explicitly ignore the EOF, it returns there with an io.EOF error.) I have another library for this written in Python that also works against the same endpoint with the same payload. Do you see anything immediately?
[JimB just beat me to an answer, but here goes anyway.]
The root issue is that you did body = body[:4] when you wanted body = body[4:]. The former keeps only the first four header bytes, while the latter tosses the four header bytes just decoded.
Here is a self-contained version with some debug logs that works. It has some of the other changes I mentioned. (I guessed at various things that you didn't include, like the lz4 package used, the timeout, unpack, etc.)
package main

import (
    "encoding/binary"
    "errors"
    "fmt"
    "io"
    "log"
    "net"
    "time"

    "github.com/bkaradzic/go-lz4"
)

const readTimeout = 30 * time.Second // XXX guess

func ReceiveMessage(conn net.Conn) ([]byte, error) {
    bodyLen := 0
    body := make([]byte, 0, 4096)
    var buf [256]byte
    conn.SetDeadline(time.Now().Add(readTimeout))
    defer conn.SetDeadline(time.Time{}) // disable deadline
    for bodyLen == 0 || len(body) < bodyLen {
        if bodyLen == 0 && len(body) >= 4 {
            bodyLen = int(unpack(body[:4]))
            body = body[4:]
            if bodyLen <= 0 {
                return nil, errors.New("invalid body length")
            }
            log.Println("read bodyLen:", bodyLen)
            continue
        }
        n, err := conn.Read(buf[:])
        body = append(body, buf[:n]...)
        log.Printf("appended %d bytes, len(body) now %d", n, len(body))
        // Note, this is checked *after* handling any n bytes.
        // An io.Reader is allowed to return data with an error.
        if err != nil {
            if err != io.EOF {
                return nil, err
            }
            break
        }
    }
    if len(body) != bodyLen {
        return nil, fmt.Errorf("got %d bytes, expected %d",
            len(body), bodyLen)
    }
    return lz4.Decode(nil, body)
}

const address = ":5678"

var msg = []byte(`{"foo":"bar"}`)

func main() {
    //address := os.Args[1]
    //msg := []byte(os.Args[2])
    fmt.Printf("Sending %s to %s\n", msg, address)
    conn, err := net.Dial("tcp", address)
    if err != nil {
        fmt.Printf("%v\n", err)
        return
    }
    // Another library call
    _, err = SendMessage(conn, msg)
    if err != nil {
        fmt.Printf("%v\n", err)
        return
    }
    response, err := ReceiveMessage(conn)
    conn.Close()
    if err != nil {
        fmt.Printf("%v\n", err)
        return
    }
    fmt.Printf("Response: %s\n", response)
}

// a guess at what your `unpack` does
func unpack(b []byte) uint32 {
    return binary.LittleEndian.Uint32(b)
}

func SendMessage(net.Conn, []byte) (int, error) {
    // stub
    return 0, nil
}

func init() {
    // start a simple test server in the same process as a go-routine.
    ln, err := net.Listen("tcp", address)
    if err != nil {
        log.Fatal(err)
    }
    go func() {
        defer ln.Close()
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatalln("accept:", err)
            }
            go Serve(conn)
        }
    }()
}

func Serve(c net.Conn) {
    defer c.Close()
    // skip reading the initial request/message and just respond
    const response = `{"somefield": "someval"}`
    // normally (de)compression in Go is done streaming via
    // an io.Reader or io.Writer but we need the final length.
    data, err := lz4.Encode(nil, []byte(response))
    if err != nil {
        log.Println("lz4 encode:", err)
        return
    }
    log.Println("sending len:", len(data))
    if err = binary.Write(c, binary.LittleEndian, uint32(len(data))); err != nil {
        log.Println("writing len:", err)
        return
    }
    log.Println("sending data")
    if _, err = c.Write(data); err != nil {
        log.Println("writing compressed response:", err)
        return
    }
    log.Println("Serve done, closing connection")
}
Playground (but not runnable there).
You have a number of issues with the code. Without a full reproducing case, it's hard to tell if fixing these will fix everything.
for bodyLen == 0 || len(body) < bodyLen {
    if len(body) > 4 {
        header := body[:4]
        body = body[:4]
        bodyLen = int(unpack(header))
    }
Every iteration, if len(body) > 4, you slice body back to the first 4 bytes, so body might never get to be >= bodyLen.
n, err := conn.Read(buf[:])
You don't need to re-slice buf here; use conn.Read(buf).
if err != nil {
    if err != io.EOF {
        return body, err
    }
}
io.EOF is the end of the stream, and you need to handle it. Note that n might still be > 0 when you get an EOF. Check for io.EOF after processing the body, or you could loop indefinitely.
body = append(body, buf[0:n]...)
now := time.Now()
if now.Sub(start) > time.Duration(readTimeout)*time.Millisecond {
    return body, fmt.Errorf("Timed-out while reading from socket.")
You would be better off using conn.SetReadDeadline before each read, so a stalled Read could be interrupted.
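Putting those points together, a minimal sketch of the read loop (buf must be a non-empty slice, unlike the zero-length one in the question; readTimeout is the question's millisecond value; framing details are elided):
buf := make([]byte, 256)
for bodyLen == 0 || len(body) < bodyLen {
    // ... header handling as above ...

    // Interrupt a stalled Read instead of polling with time.Sleep.
    conn.SetReadDeadline(time.Now().Add(time.Duration(readTimeout) * time.Millisecond))
    n, err := conn.Read(buf)
    // Use the n bytes first: a Reader may return data along with an error.
    body = append(body, buf[:n]...)
    if err == io.EOF {
        break // remote closed the connection
    }
    if err != nil {
        return body, err // includes timeouts from the read deadline
    }
}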

Read whole data with Golang net.Conn.Read

So I'm building a network app in Go, and I've seen that Conn.Read reads into a limited byte slice, which I created with make([]byte, 2048). The problem is that I don't know the exact length of the content, so it could be too much or not enough.
My question is: how can I read the exact amount of data? I think I have to use bufio, but I'm not sure.
It highly depends on what you're trying to do and what kind of data you're expecting. For example, if you just want to read until EOF, you could use something like this:
func main() {
    conn, err := net.Dial("tcp", "google.com:80")
    if err != nil {
        fmt.Println("dial error:", err)
        return
    }
    defer conn.Close()
    fmt.Fprintf(conn, "GET / HTTP/1.0\r\n\r\n")
    buf := make([]byte, 0, 4096) // big buffer
    tmp := make([]byte, 256)     // using small tmp buffer for demonstrating
    for {
        n, err := conn.Read(tmp)
        if err != nil {
            if err != io.EOF {
                fmt.Println("read error:", err)
            }
            break
        }
        //fmt.Println("got", n, "bytes.")
        buf = append(buf, tmp[:n]...)
    }
    fmt.Println("total size:", len(buf))
    //fmt.Println(string(buf))
}
Edit: for completeness' sake, and per @fabrizioM's great suggestion, which completely skipped my mind:
func main() {
    conn, err := net.Dial("tcp", "google.com:80")
    if err != nil {
        fmt.Println("dial error:", err)
        return
    }
    defer conn.Close()
    fmt.Fprintf(conn, "GET / HTTP/1.0\r\n\r\n")
    var buf bytes.Buffer
    io.Copy(&buf, conn)
    fmt.Println("total size:", buf.Len())
}
You can use the ioutil.ReadAll function:
import (
    "fmt"
    "io/ioutil"
    "net"
)

func whois(domain, server string) ([]byte, error) {
    conn, err := net.Dial("tcp", server+":43")
    if err != nil {
        return nil, err
    }
    defer conn.Close()
    fmt.Fprintf(conn, "%s\r\n", domain)
    return ioutil.ReadAll(conn)
}
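A hypothetical call might look like this (the WHOIS server here is just an example; use the right one for the TLD you query):
data, err := whois("example.com", "whois.iana.org")
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(data))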
You can read data something like this:
// import net/textproto
import (
    "bufio"
    "net/textproto"
)

// ...
reader := bufio.NewReader(conn)
tp := textproto.NewReader(reader)
defer conn.Close()
for {
    // read one line (ended with \n or \r\n)
    line, err := tp.ReadLine()
    if err != nil {
        break // e.g. io.EOF when the peer closes the connection
    }
    // do something with the line here: concatenate, parse, etc.
    _ = line
}
// ...
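ReadLine only helps when the protocol is line-delimited. For a length-prefixed binary protocol, like the four-byte-header protocol in the earlier question, a minimal sketch reads the header first and then uses io.ReadFull (the big-endian uint32 header is an assumption; match your protocol):
// readFrame reads one length-prefixed frame from reader.
// Assumes a 4-byte big-endian length header; adjust to your protocol,
// and consider sanity-checking length before allocating.
func readFrame(reader io.Reader) ([]byte, error) {
    var length uint32
    if err := binary.Read(reader, binary.BigEndian, &length); err != nil {
        return nil, err
    }
    frame := make([]byte, length)
    if _, err := io.ReadFull(reader, frame); err != nil {
        return nil, err
    }
    return frame, nil
}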
