What should io.Writer's Write method return? - go

I was learning Go and doing some tests with the language when I found this weird behavior in my code. I created two types to demonstrate my findings and implemented the io.Writer interface on both. This little program downloads the content of a web page and prints it to the console.
The BrokenConsoleWriter keeps track of the bytes written by fmt.Println and any error it may "throw", and returns them. The ConsoleWriter, on the other hand, simply ignores the return values of fmt.Println and returns the total length of the slice and a nil error.
When I run the program, the BrokenConsoleWriter doesn't print the entire HTML content, while the ConsoleWriter does. Why is this happening?
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

type ConsoleWriter struct{}
type BrokenConsoleWriter struct{}

func (b BrokenConsoleWriter) Write(p []byte) (n int, err error) {
	bytesWritten, error := fmt.Println(string(p))
	return bytesWritten, error
}

func (c ConsoleWriter) Write(p []byte) (n int, err error) {
	fmt.Println(string(p))
	return len(p), nil
}

func main() {
	const url = "https://www.nytimes.com/"
	resp, err := http.Get(url)
	if err != nil {
		fmt.Println("Some error occurred:", err)
		os.Exit(1)
	}
	//io.Copy(ConsoleWriter{}, resp.Body)
	io.Copy(BrokenConsoleWriter{}, resp.Body)
}

Your Write() method implements io.Writer, whose Write() method is documented as follows:
Write writes len(p) bytes from p to the underlying data stream. It returns the number of bytes written from p (0 <= n <= len(p)) and any error encountered that caused the write to stop early. Write must return a non-nil error if it returns n < len(p). Write must not modify the slice data, even temporarily.
Implementations must not retain p.
So your Write() method must report how many bytes you processed from the p slice passed to you. Not how many bytes you generate to some other source.
And this is the error in your BrokenConsoleWriter.Write() implementation: you don't report how many bytes you processed from p; you report how many bytes fmt.Println() actually wrote. And since fmt.Println() also prints a newline after its arguments, on success the value it returns is len(p)+1, which is never a valid result for BrokenConsoleWriter.Write().
Note that fmt.Println() with a single string argument writes out that string and appends a newline, which is always the single character \n (Go does not convert it to \r\n on Windows). So you would also get correct behavior if you subtracted 1 from its return value:
func (b BrokenConsoleWriter) Write(p []byte) (n int, err error) {
	bytesWritten, error := fmt.Println(string(p))
	return bytesWritten - 1, error
}
Also note that the input is already formatted into lines, so inserting newlines "randomly" with fmt.Println() may even result in an invalid document. Use fmt.Print() instead of fmt.Println():
func (b BrokenConsoleWriter) Write(p []byte) (n int, err error) {
	bytesWritten, error := fmt.Print(string(p))
	return bytesWritten, error
}
But the correctness of this solution still depends on the implementation of fmt.Print(). The correct solution is to report len(p) because that's what happened: you processed len(p) bytes of the input slice (all of it).

Related

Go: Returning always nil error to implement interface

I have an interface:
type encoder interface {
	encode() ([]byte, error)
}
Some implementations of encoder return an error:
type fooEncoder string

func (e fooEncoder) encode() ([]byte, error) {
	if isSomeValidityCheck(e) {
		return []byte(e), nil
	}
	return nil, fmt.Errorf("Invalid type!")
}
But for others, there will never be an error:
type boolEncoder bool

func (e boolEncoder) encode() ([]byte, error) {
	if e {
		return []byte{0xff}, nil
	}
	return []byte{0x00}, nil
}
Is it idiomatic/correct to say a method will return an error, even if it will always be nil, so that it conforms to an interface? I have boolEncoder.encode returning an error only so that it conforms to encoder and can be used as such.
This is completely OK / normal. Often it's more important to implement an interface than to reduce the code (of the method).
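A minimal sketch of the payoff: once both types share the signature, code can treat them uniformly. The non-empty check in fooEncoder and the encodeAll helper are hypothetical stand-ins (the question's isSomeValidityCheck isn't shown), and the blank-identifier assignments are compile-time checks that each type satisfies encoder.

```go
package main

import (
	"errors"
	"fmt"
)

type encoder interface {
	encode() ([]byte, error)
}

// fooEncoder can fail; the non-empty check is a hypothetical stand-in for
// isSomeValidityCheck from the question.
type fooEncoder string

func (e fooEncoder) encode() ([]byte, error) {
	if e != "" {
		return []byte(e), nil
	}
	return nil, errors.New("invalid value")
}

// boolEncoder never fails, but still returns an error so it satisfies encoder.
type boolEncoder bool

func (e boolEncoder) encode() ([]byte, error) {
	if e {
		return []byte{0xff}, nil
	}
	return []byte{0x00}, nil
}

// Compile-time checks that both types implement encoder.
var (
	_ encoder = fooEncoder("")
	_ encoder = boolEncoder(false)
)

// encodeAll treats every encoder the same way, which is only possible
// because boolEncoder keeps the always-nil error in its signature.
func encodeAll(encs ...encoder) ([]byte, error) {
	var out []byte
	for _, e := range encs {
		b, err := e.encode()
		if err != nil {
			return nil, err
		}
		out = append(out, b...)
	}
	return out, nil
}

func main() {
	b, err := encodeAll(fooEncoder("hi"), boolEncoder(true), boolEncoder(false))
	fmt.Printf("%x %v\n", b, err) // 6869ff00 <nil>

	_, err = encodeAll(fooEncoder(""), boolEncoder(true))
	fmt.Println(err) // invalid value
}
```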
There are numerous examples in the standard lib too.
For example, bytes.Buffer.Write() implements io.Writer with
func (b *Buffer) Write(p []byte) (n int, err error)
But writing to an in-memory buffer cannot fail, so it documents that it will never return a non-nil error:
Write appends the contents of p to the buffer, growing the buffer as needed. The return value n is the length of p; err is always nil. If the buffer becomes too large, Write will panic with ErrTooLarge.
Buffer.Write() could have a signature that doesn't return anything because its return values carry no information (n is always len(p) and err is always nil), but then you couldn't use bytes.Buffer as an io.Writer, which is way more important.
See related: Is unnamed arguments a thing in Go? and Why does Go allow compilation of unused function parameters?

Using an io.WriteSeeker without a File in Go

I am using a third party library to generate PDFs. In order to write the PDF at the end (after all of content has been added using the lib's API), the pdfWriter type has a Write function that expects an io.WriteSeeker.
This is OK if I want to work with files, but I need to work in-memory. Trouble is, I can't find any way to do this - the only native type I found that implements io.WriteSeeker is File.
This is the part that works by using File for the io.Writer in the Write function of the pdfWriter:
fWrite, err := os.Create(outputPath)
if err != nil {
	return err
}
defer fWrite.Close()
err = pdfWriter.Write(fWrite)
Is there a way to do this without an actual File? Like getting a []byte or something?
Unfortunately there is no ready solution for an in-memory io.WriteSeeker implementation in the standard lib.
But as always, you can implement your own. It's not that hard.
An io.WriteSeeker is an io.Writer and an io.Seeker, so basically you only need to implement 2 methods:
Write(p []byte) (n int, err error)
Seek(offset int64, whence int) (int64, error)
Read the documentation of these methods for the general contract describing how they should behave.
Here's a simple implementation which uses an in-memory byte slice ([]byte). It's not optimized for speed, this is just a "demo" implementation.
import (
	"errors"
	"io"
)

type mywriter struct {
	buf []byte
	pos int
}

func (m *mywriter) Write(p []byte) (n int, err error) {
	minCap := m.pos + len(p)
	if minCap > cap(m.buf) { // Make sure buf has enough capacity:
		buf2 := make([]byte, len(m.buf), minCap+len(p)) // add some extra
		copy(buf2, m.buf)
		m.buf = buf2
	}
	if minCap > len(m.buf) {
		m.buf = m.buf[:minCap]
	}
	copy(m.buf[m.pos:], p)
	m.pos += len(p)
	return len(p), nil
}

func (m *mywriter) Seek(offset int64, whence int) (int64, error) {
	newPos, offs := 0, int(offset)
	switch whence {
	case io.SeekStart:
		newPos = offs
	case io.SeekCurrent:
		newPos = m.pos + offs
	case io.SeekEnd:
		newPos = len(m.buf) + offs
	default:
		return 0, errors.New("invalid whence")
	}
	if newPos < 0 {
		return 0, errors.New("negative result pos")
	}
	m.pos = newPos
	return int64(newPos), nil
}
Yes, and that's it.
Testing it:
my := &mywriter{}
var ws io.WriteSeeker = my
ws.Write([]byte("hello"))
fmt.Println(string(my.buf))
ws.Write([]byte(" world"))
fmt.Println(string(my.buf))
ws.Seek(-2, io.SeekEnd)
ws.Write([]byte("k!"))
fmt.Println(string(my.buf))
ws.Seek(6, io.SeekStart)
ws.Write([]byte("gopher"))
fmt.Println(string(my.buf))
Output (try it on the Go Playground):
hello
hello world
hello work!
hello gopher
Things that can be improved:
Create a mywriter value with an initial empty buf slice, but with a capacity that will most likely cover the size of the result PDF document. E.g. if you estimate the result PDFs are around 1 MB, create a buffer with capacity for 2 MB like this:
my := &mywriter{buf: make([]byte, 0, 2<<20)}
Inside mywriter.Write(), when capacity needs to be increased (and existing content copied over), it may be profitable to use a bigger increment, e.g. doubling the current capacity up to a certain limit, which reserves space for future appends and minimizes reallocations.

Tour of Go exercise #22: Reader, what does the question mean?

Exercise: Readers
Implement a Reader type that emits an infinite stream of the ASCII character 'A'.
I don't understand the question: how do I emit the character 'A'? Into which variable should I put that character?
Here's what I tried:
package main

import "golang.org/x/tour/reader"

type MyReader struct{}

// TODO: Add a Read([]byte) (int, error) method to MyReader.

func main() {
	reader.Validate(MyReader{}) // what did this function expect?
}

func (m MyReader) Read(b []byte) (i int, e error) {
	b = append(b, 'A') // this is wrong..
	return 1, nil      // this is also wrong..
}
Ah I understand XD
I think it would be better to say: "rewrite all values in []byte into 'A's"
package main

import "golang.org/x/tour/reader"

type MyReader struct{}

// TODO: Add a Read([]byte) (int, error) method to MyReader.

func (m MyReader) Read(b []byte) (i int, e error) {
	for x := range b {
		b[x] = 'A'
	}
	return len(b), nil
}

func main() {
	reader.Validate(MyReader{})
}
The role of io.Reader.Read is to fill the given memory location (the slice passed in) with data read from its source.
To implement a stream of 'A's, the function must fill the given slice with 'A' values.
It is not required to fill the entire slice: Read reads up to len(p) bytes into p, so it can decide how many bytes of the slice to write, but it must return that number so the consumer knows how much data to process.
By convention an io.Reader indicates its end by returning an io.EOF error. If the reader never returns an error, it behaves as an infinite source of data to its consumer, which can never detect an exit condition.
Note that a call to Read that returns 0 bytes can happen and does not indicate anything in particular: "Callers should treat a return of 0 and nil as indicating that nothing happened". This is why the non-solution at https://play.golang.org/p/aiUyc4UDYi2 fails with a timeout.
In that regard, the solution provided at https://stackoverflow.com/a/68077578/4466350, return copy(b, "A"), nil, is just right. It writes the minimum required, makes elegant use of built-ins and syntax, and never returns an error.
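A self-contained sketch of that copy-based Read, checked with io.ReadFull instead of the tour's reader.Validate (firstN is a hypothetical helper, not part of golang.org/x/tour/reader). Because copy stops at the shorter of its two arguments, it writes exactly one 'A' into a non-empty slice and nothing into an empty one, so it never panics; io.ReadFull simply keeps calling Read until the buffer is full.

```go
package main

import (
	"fmt"
	"io"
)

type MyReader struct{}

// Read writes at most one 'A' into b; copy returns min(len(b), 1).
func (MyReader) Read(b []byte) (int, error) {
	return copy(b, "A"), nil
}

// firstN is a hypothetical helper that reads exactly n bytes from r.
func firstN(r io.Reader, n int) (string, error) {
	b := make([]byte, n)
	_, err := io.ReadFull(r, b)
	return string(b), err
}

func main() {
	s, err := firstN(MyReader{}, 8)
	fmt.Println(s, err) // AAAAAAAA <nil>

	fmt.Println(MyReader{}.Read(nil)) // 0 <nil>
}
```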
The alleged answer didn't work for me, even without the typos.
Try as I might, I could not get that string into b.
func (r MyReader) Read(b []byte) (int, error) {
	return copy(b, "A"), nil
}
My solution: write one byte at a time, storing the index i in a closure.
package main

import (
	"golang.org/x/tour/reader"
)

type MyReader struct{}

func (mr MyReader) Read(b []byte) (int, error) {
	i := 0
	p := func() int {
		b[i] = 'A'
		i++
		return i
	}
	return p(), nil
}

func main() {
	reader.Validate(MyReader{})
}
Simplest one:
func (s MyReader) Read(b []byte) (int, error) {
	if len(b) == 0 {
		return 0, nil // nothing to write into
	}
	b[0] = 'A'
	return 1, nil
}
You can generalize the idea to create an eternal reader, alwaysReader, from which you always read the same byte value over and over (it never results in EOF):
package readers

type alwaysReader struct {
	value byte
}

func (r alwaysReader) Read(p []byte) (n int, err error) {
	for i := range p {
		p[i] = r.value
	}
	return len(p), nil
}

func NewAlwaysReader(value byte) alwaysReader {
	return alwaysReader{value}
}
NewAlwaysReader() is the constructor for alwaysReader (which isn't exported). The result of NewAlwaysReader('A') is a reader from which you will always read 'A'.
A clarifying unit test for alwaysReader:
package readers_test

import (
	"bytes"
	"io"
	"readers"
	"testing"
)

func TestAlwaysReader(t *testing.T) {
	const numBytes = 128
	const value = 'A'
	buf := bytes.NewBuffer(make([]byte, 0, numBytes))
	reader := io.LimitReader(readers.NewAlwaysReader(value), numBytes)
	n, err := io.Copy(buf, reader)
	if err != nil {
		t.Fatalf("copy failed: %v", err)
	}
	if n != numBytes {
		t.Errorf("%d bytes read but %d expected", n, numBytes)
	}
	for i, elem := range buf.Bytes() {
		if elem != value {
			t.Errorf("byte at position %d is %v, expected %v", i, elem, value)
		}
	}
}
Since we can read from the alwaysReader forever, we need to decorate it with an io.LimitReader so that we read at most numBytes from it. Otherwise io.Copy() would never stop, and the bytes.Buffer would keep reallocating its internal buffer until it runs out of memory.
Note that the following implementation of Read() for alwaysReader is also valid:
func (r alwaysReader) Read(p []byte) (n int, err error) {
	if len(p) > 0 {
		p[0] = r.value
		return 1, nil
	}
	return 0, nil
}
The former Read() implementation fills the whole byte slice with the byte value, whereas the latter writes a single byte.
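To see that the two variants are interchangeable for a consumer, the sketch below reads five bytes from each with io.ReadFull and counts the Read calls; fillReader, oneByteReader, countingReader and readN are hypothetical names. The filling version needs one call, the single-byte version five, and both yield the same data.

```go
package main

import (
	"fmt"
	"io"
)

// fillReader fills the whole slice, like the first alwaysReader.Read.
type fillReader struct{ value byte }

func (r fillReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = r.value
	}
	return len(p), nil
}

// oneByteReader writes a single byte per call, like the second variant.
type oneByteReader struct{ value byte }

func (r oneByteReader) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	p[0] = r.value
	return 1, nil
}

// countingReader counts the Read calls made on the wrapped reader.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// readN reads exactly n bytes and reports how many Read calls it took.
func readN(r io.Reader, n int) (string, int, error) {
	counter := &countingReader{r: r}
	buf := make([]byte, n)
	_, err := io.ReadFull(counter, buf)
	return string(buf), counter.calls, err
}

func main() {
	s, calls, err := readN(fillReader{'A'}, 5)
	fmt.Println(s, calls, err) // AAAAA 1 <nil>

	s, calls, err = readN(oneByteReader{'A'}, 5)
	fmt.Println(s, calls, err) // AAAAA 5 <nil>
}
```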

Golang read from pipe reads tons of data

I'm trying to read an archive that's being tarred, streaming, to stdin, but I'm somehow reading far more data in the pipe than tar is sending.
I run my command like this:
tar -cf - somefolder | ./my-go-binary
The source code is like this:
package main

import (
	"bufio"
	"io"
	"log"
	"os"
)

// Read from standard input
func main() {
	reader := bufio.NewReader(os.Stdin)
	// Read all data from stdin, processing subsequent reads as chunks.
	parts := 0
	for {
		parts++
		data := make([]byte, 4<<20) // Read 4MB at a time
		_, err := reader.Read(data)
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatalf("Problems reading from input: %s", err)
		}
	}
	log.Printf("Total parts processed: %d\n", parts)
}
For a 100MB tarred folder, I'm getting 1468 chunks of 4MB (that's 6.15GB)! Further, it doesn't seem to matter how large the data []byte slice is: if I set the chunk size to 40MB, I still get ~1400 chunks of 40MB data, which makes no sense at all.
Is there something I need to do to read data from os.Stdin properly with Go?
Your code is inefficient. It's allocating and initializing data each time through the loop.
for {
	data := make([]byte, 4<<20) // Read 4MB at a time
}
Your code's use of reader as an io.Reader is also wrong. For example, you ignore the number of bytes read by _, err := reader.Read(data), and you don't handle errors properly.
Package io
import "io"
type Reader
type Reader interface {
Read(p []byte) (n int, err error)
}
Reader is the interface that wraps the basic Read method.
Read reads up to len(p) bytes into p. It returns the number of bytes
read (0 <= n <= len(p)) and any error encountered. Even if Read
returns n < len(p), it may use all of p as scratch space during the
call. If some data is available but not len(p) bytes, Read
conventionally returns what is available instead of waiting for more.
When Read encounters an error or end-of-file condition after
successfully reading n > 0 bytes, it returns the number of bytes read.
It may return the (non-nil) error from the same call or return the
error (and n == 0) from a subsequent call. An instance of this general
case is that a Reader returning a non-zero number of bytes at the end
of the input stream may return either err == EOF or err == nil. The
next Read should return 0, EOF regardless.
Callers should always process the n > 0 bytes returned before
considering the error err. Doing so correctly handles I/O errors that
happen after reading some bytes and also both of the allowed EOF
behaviors.
Implementations of Read are discouraged from returning a zero byte
count with a nil error, except when len(p) == 0. Callers should treat
a return of 0 and nil as indicating that nothing happened; in
particular it does not indicate EOF.
Implementations must not retain p.
Here's a model read program that conforms to the io.Reader contract:
package main

import (
	"bufio"
	"io"
	"log"
	"os"
)

func main() {
	nBytes, nChunks := int64(0), int64(0)
	r := bufio.NewReader(os.Stdin)
	buf := make([]byte, 0, 4*1024)
	for {
		n, err := r.Read(buf[:cap(buf)])
		buf = buf[:n]
		if n == 0 {
			if err == nil {
				continue
			}
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
		nChunks++
		nBytes += int64(len(buf))
		// process buf
		if err != nil && err != io.EOF {
			log.Fatal(err)
		}
	}
	log.Println("Bytes:", nBytes, "Chunks:", nChunks)
}
Output:
2014/11/29 10:00:05 Bytes: 5589891 Chunks: 1365
Read the documentation for bufio.Reader.Read:
Read reads data into p. It returns the number of bytes read into p. It
calls Read at most once on the underlying Reader, hence n may be less
than len(p). At EOF, the count will be zero and err will be io.EOF.
You are not reading 4MB at a time. You are providing buffer space and discarding the integer that tells you how much Read actually read. The buffer size is only the maximum; on my system, about 128k usually seems to be read per call. Try it out yourself:
// Read from standard input
func main() {
	reader := bufio.NewReader(os.Stdin)
	// Read all data from stdin in parts, logging how much each Read returns.
	parts := 0
	for {
		parts++
		data := make([]byte, 4<<20) // Read up to 4MB at a time
		amount, err := reader.Read(data)
		// WILL NOT BE 4MB!
		log.Printf("Read: %v\n", amount)
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatalf("Problems reading from input: %s", err)
		}
	}
	log.Printf("Total parts processed: %d\n", parts)
}
You have to implement the logic for handling the varying read amounts.

Reading specific number of bytes from a buffered reader in golang

I am aware of this function from Go's bufio package:
func (b *Reader) Peek(n int) ([]byte, error)
Peek returns the next n bytes without advancing the reader. The bytes
stop being valid at the next read call. If Peek returns fewer than n
bytes, it also returns an error explaining why the read is short. The
error is ErrBufferFull if n is larger than b's buffer size.
I need to be able to read a specific number of bytes from a Reader that will advance the reader. Basically, identical to the function above, but it advances the reader. Does anybody know how to accomplish this?
Note that the bufio.Reader.Read method calls the underlying io.Reader's Read at most once, meaning that it can return n < len(p) without reaching EOF. If you want to read exactly len(p) bytes or fail with an error, you can use io.ReadFull like this:
n, err := io.ReadFull(reader, p)
This works even if the reader is buffered.
func (b *Reader) Read(p []byte) (n int, err error)
http://golang.org/pkg/bufio/#Reader.Read
The number of bytes read will be limited to len(p)
TLDR:
my42bytes, err := ioutil.ReadAll(io.LimitReader(myReader, 42))
Full answer:
@monicuta mentioned io.ReadFull, which works great. Here I provide another method. It works by chaining ioutil.ReadAll and io.LimitReader together. Let's read the docs first:
$ go doc ioutil.ReadAll
func ReadAll(r io.Reader) ([]byte, error)
ReadAll reads from r until an error or EOF and returns the data it read. A
successful call returns err == nil, not err == EOF. Because ReadAll is
defined to read from src until EOF, it does not treat an EOF from Read as an
error to be reported.
$ go doc io.LimitReader
func LimitReader(r Reader, n int64) Reader
LimitReader returns a Reader that reads from r but stops with EOF after n
bytes. The underlying implementation is a *LimitedReader.
So if you want to get 42 bytes from myReader, you do this:
import (
	"io"
	"io/ioutil"
)

func main() {
	// myReader := ...
	my42bytes, err := ioutil.ReadAll(io.LimitReader(myReader, 42))
	if err != nil {
		panic(err)
	}
	//...
}
Here is the equivalent code with io.ReadFull
$ go doc io.ReadFull
func ReadFull(r Reader, buf []byte) (n int, err error)
ReadFull reads exactly len(buf) bytes from r into buf. It returns the number
of bytes copied and an error if fewer bytes were read. The error is EOF only
if no bytes were read. If an EOF happens after reading some but not all the
bytes, ReadFull returns ErrUnexpectedEOF. On return, n == len(buf) if and
only if err == nil. If r returns an error having read at least len(buf)
bytes, the error is dropped.
import (
	"io"
)

func main() {
	// myReader := ...
	buf := make([]byte, 42)
	_, err := io.ReadFull(myReader, buf)
	if err != nil {
		panic(err)
	}
	//...
}
Compared to io.ReadFull, an advantage is that you don't need to manually make a buf, where len(buf) is the number of bytes you want to read, and then pass buf as an argument to Read.
Instead, you simply tell io.LimitReader you want at most 42 bytes from myReader, and call ioutil.ReadAll to read them all, returning the result as a byte slice. Note, however, that a successful call only guarantees at most 42 bytes: if myReader reaches EOF early, ReadAll returns the shorter slice with err == nil, whereas io.ReadFull would report io.ErrUnexpectedEOF. Check len(my42bytes) if you need exactly 42.
I prefer Read(), especially if you are going to read any type of file; it can also be useful for sending data in chunks. Below is an example showing how it is used:
fs, err := os.Open("fileName")
if err != nil {
	fmt.Println("error opening file:", err)
	return
}
defer fs.Close()
reader := bufio.NewReader(fs)
buf := make([]byte, 1024)
for {
	n, err := reader.Read(buf) // ReadString() and ReadLine() are also applicable alternatives
	// Only buf[:n] holds data from this call; if it is a text file, you could check its content here...
	fmt.Print(string(buf[:n]))
	if err == io.EOF {
		return
	}
	if err != nil {
		fmt.Println("error reading file:", err)
		return
	}
}
Pass an n-byte buffer to the reader.
If you want to copy bytes from an io.Reader into an io.Writer, you can use io.CopyN:
CopyN copies n bytes (or until an error) from src to dst. It returns the number of bytes copied and the earliest error encountered while copying.
On return, written == n if and only if err == nil.
written, err := io.CopyN(dst, src, n)
if err != nil {
	// We didn't read the desired number of bytes
} else {
	// We can proceed successfully
}
To do this, you just need to create a byte slice and read the data into it with
n := 512
buff := make([]byte, n)
fs.Read(buff) // fs is your reader, e.g. fs, _ := os.Open("file")
func (b *Reader) Read(p []byte) (n int, err error)
Read reads data into p. It returns the number of bytes read into p.
The bytes are taken from at most one Read on the underlying Reader,
hence n may be less than len(p)
