Do I understand correctly that crypto/rand.Reader can return Read error only on platforms not listed below, i.e. when it is not actually implemented?
// Reader is a global, shared instance of a cryptographically
// strong pseudo-random generator.
//
// On Linux, Reader uses getrandom(2) if available, /dev/urandom otherwise.
// On OpenBSD, Reader uses getentropy(2).
// On other Unix-like systems, Reader reads from /dev/urandom.
// On Windows systems, Reader uses the CryptGenRandom API.
var Reader io.Reader
TL;DR: crypto/rand's Read() (and Reader.Read()) methods may fail for a variety of reasons, even on the platforms listed as supported. Do not assume that calls to these functions will always succeed. Always check the error return value.
Do I understand correctly that crypto/rand.Reader can return Read error only on platforms not listed below, i.e. when it is not actually implemented?
No. For example, have a look at the Linux implementation of rand.Reader. If available, this implementation will use the getrandom Linux system call, which may fail with a number of errors (most importantly, EAGAIN):
EAGAIN - The requested entropy was not available, and getrandom() would
have blocked if the GRND_NONBLOCK flag was not set.
The EAGAIN error quite literally tells you to "try again later"; the official meaning according to man 3 errno is "Resource temporarily unavailable". So when receiving an EAGAIN error you could simply keep trying for a certain time.
If getrandom is not available, the crypto/rand package will try to open and read from /dev/urandom (see source code), which might also fail for any number of reasons. These errors are not necessarily temporary (for example, issues with file system permissions); if your application depends on the availability of random data, you should treat such an error like any other non-recoverable error in your application.
For these reasons, you should not assume that rand.Read() will always succeed on Linux/UNIX; always check its error return value.
type io.Reader
Reader is the interface that wraps the basic Read method.
Read reads up to len(p) bytes into p. It returns the number of bytes
read (0 <= n <= len(p)) and any error encountered. Even if Read
returns n < len(p), it may use all of p as scratch space during the
call. If some data is available but not len(p) bytes, Read
conventionally returns what is available instead of waiting for more.
When Read encounters an error or end-of-file condition after
successfully reading n > 0 bytes, it returns the number of bytes read.
It may return the (non-nil) error from the same call or return the
error (and n == 0) from a subsequent call. An instance of this general
case is that a Reader returning a non-zero number of bytes at the end
of the input stream may return either err == EOF or err == nil. The
next Read should return 0, EOF.
Callers should always process the n > 0 bytes returned before
considering the error err. Doing so correctly handles I/O errors that
happen after reading some bytes and also both of the allowed EOF
behaviors.
Implementations of Read are discouraged from returning a zero byte
count with a nil error, except when len(p) == 0. Callers should treat
a return of 0 and nil as indicating that nothing happened; in
particular it does not indicate EOF.
Implementations must not retain p.
type Reader interface {
Read(p []byte) (n int, err error)
}
No. rand.Reader is an io.Reader, and io.Readers can return errors.
I would like to parse data that I retrieve through an HTTP call several times with gocal. Since I would like to avoid making the call for each parse, I would like to save this data and reuse it.
The Body I get from http.Get is of type io.ReadCloser. The gocal parser requires an io.Reader, so passing the Body works.
Since I can read Body only once, I can save it with body, _ := io.ReadAll(get.Body), but then I do not know how to serve the []byte back as an io.Reader (to the gocal parser, several times, to account for different parsing conditions).
As you have figured out, http.Response.Body is exposed as an io.Reader; this reader is not reusable because it is connected straight to the underlying connection* (which might be tcp/utp or any other stream-like reader from the net package).
Once you read bytes out of the connection, they are consumed; the next read only returns new bytes waiting on the connection.
To save the response, you need to drain it first and store the result in a variable.
body, err := io.ReadAll(get.Body) // check err before relying on body
get.Body.Close()
To reuse that slice of bytes many times, the standard library provides bytes.NewReader, which returns a *bytes.Reader that reads from the slice.
bytes.Reader also offers a Reset([]byte) method that resets the reader to read from the given slice again.
bytes.Reader.Reset is very useful for reading the same byte slice multiple times with no allocations. In comparison, bytes.NewReader allocates a new Reader every time it is called.
Finally, between two consecutive calls to c.Parse(), reset the reader with the bytes you collected previously, for example:
buf := bytes.NewReader(body)
c := gocal.NewParser(buf) // initialize the parser
c.Parse()
// process the result
// reset buf and parse again with a fresh parser
buf.Reset(body)
c = gocal.NewParser(buf)
c.Parse()
You can try this version (it uses strings.NewReader, but the interface and behavior are similar): https://play.golang.org/p/YaVtCTZHZEP
* This is not super obvious, but it is the general principle: the transport reads the headers and leaves the body untouched unless you consume it. See also that.
Another question, "How to read/write from/to file using Go?", got into safely closing file descriptors in a comment:
Note that these examples aren't checking the error return from
fo.Close(). From the Linux man pages close(2): Not checking the return
value of close() is a common but nevertheless serious programming
error. It is quite possible that errors on a previous write(2)
operation are first reported at the final close(). Not checking the
return value when closing the file may lead to silent loss of data.
This can especially be observed with NFS and with disk quota. – Nick
Craig-Wood Jan 25 '13 at 7:12
The solution that updated the post used a panic:
// close fo on exit and check for its returned error
defer func() {
if err := fo.Close(); err != nil {
panic(err)
}
}()
I want to handle this error as a value instead of panicking.
If we are worried about writes not being completed, close isn't enough, so just updating the returned error with the error from close is still not correct.
If you want to avoid this problem, the correct solution is to fsync the file(s):
defer fd.Close()
// Do stuff
return fd.Sync()
It's easier to read than returning a modified non-nil error, either through defer or by maintaining it throughout the function.
This costs some performance, but it catches both close errors from writing buffered data and errors from the physical write to disk.
I wish to confirm there are no more bytes to be read from a buffered reader (neither from the internal buffer, nor from the underlying file object) by trying to read one more byte (and catching EOF).
Is using bufio.Read or bufio.ReadByte suitable for this purpose?
It's not clear from the bufio.Read documentation whether or not the integer returned can be zero, in non-EOF cases. Namely, is 0, nil a valid return value if len(p) > 0?
func (b *Reader) Read(p []byte) (n int, err error)
Read reads data into p. It returns the number of bytes read into p. The bytes are taken from at most one Read on the underlying Reader, hence n may be less than len(p). To read exactly len(p) bytes, use io.ReadFull(b, p). At EOF, the count will be zero and err will be io.EOF.
Similarly, the bufio.ReadByte documentation doesn't separate error cases from EOF cases very well, and it doesn't exactly define what it means by "available" (i.e. available in the internal buffer, or available in the underlying file)?
func (b *Reader) ReadByte() (byte, error)
ReadByte reads and returns a single byte. If no byte is available, returns an error.
Passing a buffer of length 1 to bufio.Read, when the reader is backed with an underlying os.File, will indeed return n==0, io.EOF if the file is at EOF.
The documentation is being a bit imprecise because some of the behavior depends on the underlying reader you pass to the bufio reader. The code for bufio.Read() draws a more accurate picture. I'll outline the logic.
bufio.Read: Only issues a Read to the underlying reader if all bytes in the internal buffer have been exhausted. So, presumably, if you've already read as many bytes from the buffered reader as the number of bytes in the underlying file, that internal buffer should be exhausted when you make the last call bufio.Read(buf[0:1]) to check for EOF.
When the internal buffer is exhausted, and you ask the bufio reader for more, bufio.Read will do at most one call to the underlying reader. The type of error you get then will depend on your underlying reader.
Asking to read for n > 0 bytes from an os.File when the read pointer is already at EOF should return 0, io.EOF (according to the doc on os.File File.Read). But if your underlying reader was something else, perhaps a custom type specific to your application designed to return 0, nil at EOF, then bufio.Read would echo that back instead.
bufio.ReadByte: The logic behind bufio.ReadByte is slightly different, but the outcome should be the same as bufio.Read in cases where the underlying reader is an os.File. The main difference from bufio.Read is that bufio.ReadByte can make several attempts to refill the internal buffer. If an error is encountered during refilling (which will be the case for an os.File reader at EOF), it is returned after the first erroneous read attempt. So, if your underlying Reader is an os.File reader, then you'll get 0, io.EOF if and only if your underlying file is at EOF. If your underlying reader was a custom reader type that only returned 0, nil at EOF, then bufio.ReadByte would eventually emit an io.ErrNoProgress error. I'm not sure why the retry logic is only in bufio.ReadByte, but the good news is that either option can be used if your underlying file behaves like an os.File.
Other info:
This is not directly applicable to Go, but you may find the following thread interesting: Can read(2) return zero bytes when not at EOF. Its topic is the semantics of the read() system call (POSIX). Reads on non-blocking sockets/files, even when no data is ready, should return -1, not 0, and set errno to EAGAIN (or EINTR when interrupted). Non-blocking sockets/files are not really a concept native to Go (as far as I know), and the bufio package in particular will panic() if the underlying reader returns a negative count, so you don't have to worry about it.
Is it a good idea to generate a secure random hex string until the process succeeds?
All examples I've come across show that if rand.Read returns error, we should panic, os.Exit(1) or return empty string and the error.
I need my program to continue to function in case of such errors and wait until a random string is generated. Is it a good idea to loop until the string is generated, any pitfalls with that?
import (
	"crypto/rand"
	"encoding/hex"
)
func RandomHex() string {
var buf [16]byte
for {
_, err := rand.Read(buf[:])
if err == nil {
break
}
}
return hex.EncodeToString(buf[:])
}
No. In certain contexts it may return an error on every call.
Example: playground: don't use /dev/urandom in crypto/rand
Imagine that a machine does not have the source that crypto/rand gets data from or the program runs in a context that doesn't have access to that source. In that case you might consider having the program return that error in a meaningful way rather than spin.
More explicitly, if you are serious about your use of crypto/rand, consider writing RandomHex so that it is exceptionally clear to the caller that it is meant for security contexts (possibly rename it), and return the error from RandomHex. The calling function needs to handle that error and let the user know that something is very wrong. For example, in a REST API I'd expect that error to surface to the request handler, fail and return a 500 at that point, and log a high-severity error.
Is it a good idea to loop until the string is generated,
That depends. Probably yes.
any pitfalls with that?
You discard the random bytes read on error, and you do this in a tight loop.
This may drain your entropy source (depending on the OS) faster than
it can be filled.
Instead of an unbounded infinite loop: break after n rounds and give up.
Graceful degradation or stopping is best: if your program is stuck in
an endless loop, it is not "continue"ing either.
Go 1.7 beta 1 was released this morning; here is the draft of the Go 1.7 release notes. A new function, KeepAlive, was added to the runtime package. The doc of runtime.KeepAlive gives an example:
type File struct { d int }
d, err := syscall.Open("/file/path", syscall.O_RDONLY, 0)
// ... do something if err != nil ...
p := &File{d}
runtime.SetFinalizer(p, func(p *File) { syscall.Close(p.d) })
var buf [10]byte
n, err := syscall.Read(p.d, buf[:])
// Ensure p is not finalized until Read returns.
runtime.KeepAlive(p)
// No more uses of p after this point.
The doc of runtime.SetFinalizer also gives an explanation of runtime.KeepAlive:
For example, if p points to a struct that contains a file descriptor
d, and p has a finalizer that closes that file descriptor, and if the
last use of p in a function is a call to syscall.Write(p.d, buf,
size), then p may be unreachable as soon as the program enters
syscall.Write. The finalizer may run at that moment, closing p.d,
causing syscall.Write to fail because it is writing to a closed file
descriptor (or, worse, to an entirely different file descriptor opened
by a different goroutine). To avoid this problem, call
runtime.KeepAlive(p) after the call to syscall.Write.
What confused me is that the variable p has not left its scope yet, so why would it be unreachable? Does that mean that a variable becomes unreachable as soon as there is no further use of it in the following code, regardless of whether it is still in scope?
A variable becomes unreachable when the runtime detects that the Go code cannot reach a point where that variable is referenced again.
In the example you posted, a syscall.Open() is used to open a file. The returned file descriptor (which is just an int value) is "wrapped" in a struct. Then a finalizer is attached to this struct value that closes the file descriptor. Now when this struct value becomes unreachable, its finalizer may be run at any moment, and the closing / invalidation / re-using of the file descriptor could cause unexpected behavior or errors in the execution of the Read() syscall.
The last use of this struct value p in Go code is when syscall.Read() is invoked (and the file descriptor p.d is passed to it). The implementation of the syscall uses that file descriptor after syscall.Read() is initiated, and may do so up until syscall.Read() returns. But this use of the file descriptor is "independent" of the Go code.
So the struct value p is not used during the execution of the syscall, and the syscall blocks the Go code until it returns. This means the Go runtime is allowed to mark p as unreachable during the execution of Read() (before Read() returns), or even before its actual execution begins (because p is only used to provide the arguments to the Read() call).
Hence the call to runtime.KeepAlive(): since this call comes after syscall.Read() and references the variable p, the Go runtime is not allowed to mark p unreachable before Read() returns.
Note that you could use other constructs to "keep p alive", e.g. _ = p or returning it. runtime.KeepAlive() does nothing magical in the background; its implementation is:
func KeepAlive(interface{}) {}
runtime.KeepAlive() does provide a much better alternative because:
It clearly documents that we want to keep p alive (to prevent finalizers from running).
Using other constructs such as _ = p might get "optimized" out by future compilers, but not runtime.KeepAlive() calls.