var r bufio.Reader
How do I check if r has no more data (is empty, is depleted)?
I understand that this may need to block until that information is known.
I can't find anything searching Google. I thought the Peek function would be useful to see if there is more data, but it seems to only peek at the internal buffer, if one exists. I could also Read one byte and subsequently call UnreadByte, but that's extremely messy and unclear; are there any better options?
If r.Peek(1) returns data, then the next call to Read will return data.
If there's no data in the buffer, then Peek calls the underlying reader and will block until data is available or an error occurs.
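For example, a minimal sketch of an EOF check built on Peek (assuming r is the bufio.Reader from the question):
if _, err := r.Peek(1); err == io.EOF {
    // r is depleted: neither the buffer nor the underlying reader has more data
} else if err != nil {
    // some other read error occurred
}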
If I understand your question correctly, would this work?
// func (*Reader) Size() int
// Size returns the size of the underlying buffer in bytes.
size := r.Size()
// func (*Reader) Buffered() int
// Buffered returns the number of bytes that can be read from the current buffer.
buffered := r.Buffered()
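A hedged sketch of how this might be used; note that a zero result only means the internal buffer is drained, not that the underlying reader is at EOF:
if r.Buffered() == 0 {
    // nothing left in the internal buffer; the underlying reader may still have data
}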
I am using mmap in Go. After mmapping a file, the resulting pointer is used across all goroutines.
I then want to update the file's data (with a new size and data layout). If I munmap it, any other goroutine that accesses the freed memory region will cause a segfault.
So instead of calling munmap, I create a new file with the updated data and then mmap that file onto the old pointer. Will this work, or will it cause a memory leak?
// mmap a file
b, err := syscall.Mmap(fdOldFile, 0, int(dataSize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
// mmap the new file with the new size
nb, e := syscall.Mmap(fdNewFile, 0, int(newSize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
// pour the data into the new file with the new data layout
// ...
// munmapping b would cause a segfault if b is being used in another goroutine
// syscall.Munmap(b)
os.Remove(oldFile)
os.Rename(newFile, oldFile)
syscall.Munmap(nb)
// set b to the new mapping instead
b, err = syscall.Mmap(fdNewFile, 0, int(newSize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
The code in your example will keep the old file memory mapped: the kernel keeps a mapping alive until you unmap it or the process exits. Because of this, the syscall/sys library always keeps a reference to the memory-mapped address to prevent it from being garbage collected, even if you lose your own reference.
The proper way to replace the file behind the same address is to use the mmap syscall with the same address. However, the syscall.Mmap wrapper will not let you specify the address parameter; it is always 0 (which means the kernel will pick an address not currently in use).
You can also grow or shrink an existing region with the mremap syscall, but no wrapper exists for it in the stdlib. The most likely reason for these limitations is that when you change an existing mapping, the length may change. Go returns a []byte, which internally has a len and a cap. So if the size of the underlying array changes but the len does not, you can get segfaults. And since the len and cap are passed by value, the stdlib cannot fix up those slices when the underlying memory changes.
So, in order to do this (assuming you still want to), you have to:
Expose the internal syscall.mmap function, which does allow you to specify the address:
import _ "unsafe"
//go:linkname mmap syscall.mmap
func mmap(addr uintptr, length uintptr, prot int, flags int, fd int, offset int64) (xaddr uintptr, err error)
You should still use syscall.Mmap for the initial allocation of the address because there are a few requirements and it is better to let the kernel pick a good address, but now you can change it. You will need unsafe pointer casting (or reflection) to get the address from the []byte you got back from syscall.Mmap.
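For illustration, a minimal sketch of recovering the mapping address from that []byte (names are assumptions; b is the slice returned by syscall.Mmap and must be non-empty):
// address of the first mapped byte
addr := uintptr(unsafe.Pointer(&b[0]))
// equivalently, via the slice header:
// hdr := (*reflect.SliceHeader)(unsafe.Pointer(&b))
// addr = hdr.Data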
If you are going to pass a different length, you must also change the len of all copies of the []byte, including subslices, to avoid segfaults. If you use the exact same length every time, this should not be an issue.
So, TLDR: you need to be very sure about what you are doing, or you will hit some nasty bugs, but it can be done.
I would like to parse, several times with gocal, data I retrieve through an HTTP call. Since I would like to avoid making the call for each parse, I would like to save this data and reuse it.
The Body I get from http.Get is of type io.ReadCloser. The gocal parser requires an io.Reader, so that works.
Since I can read Body only once, I can save it with body, _ := io.ReadAll(get.Body), but then I do not know how to serve the []byte back as an io.Reader (to the gocal parser, several times, to account for different parsing conditions).
As you have figured, http.Response.Body is exposed as an io.Reader, and this reader is not reusable because it is connected straight to the underlying connection* (which might be TCP, UDP, or any other stream-like reader under the net package).
Once you read the bytes out of the connection, new bytes are sitting there waiting for another read; the old ones are gone.
In order to save the response, you indeed need to drain it first and store the result in a variable.
body, _ := io.ReadAll(get.Body)
To reuse that slice of bytes many times, the standard library provides an in-memory reader, bytes.NewReader.
This reader also offers a Reset([]byte) method to reset its state.
bytes.Reader.Reset is very useful to read the same byte slice multiple times with no allocations. In comparison, bytes.NewReader allocates every time it is called.
Finally, between two consecutive calls to c.Parse, you should reset the reader with the byte slice you collected previously,
such as:
buf := bytes.NewReader(body)
// initialize the parser
c.Parse()
// process the result
// reset the buf, parse again
buf.Reset(body)
c.Parse()
You can try this version: https://play.golang.org/p/YaVtCTZHZEP. It uses strings.NewReader, but the interface and behavior are similar.
* Not super obvious, but that is the general principle: the transport reads the headers and leaves the body untouched unless you consume it.
I wish to confirm there are no more bytes to be read from a buffered reader (neither from the internal buffer, nor from the underlying file object) by trying to read one more byte (and catching EOF).
Is using bufio.Read or bufio.ReadByte suitable for this purpose?
It's not clear from the bufio.Read documentation whether or not the integer returned can be zero, in non-EOF cases. Namely, is 0, nil a valid return value if len(p) > 0?
func (b *Reader) Read(p []byte) (n int, err error)
Read reads data into p. It returns the number of bytes read into p. The bytes are taken from at most one Read on the underlying Reader, hence n may be less than len(p). To read exactly len(p) bytes, use io.ReadFull(b, p). At EOF, the count will be zero and err will be io.EOF.
Similarly, the bufio.ReadByte documentation doesn't separate error cases from EOF cases very well, and it doesn't exactly define what it means by "available" (i.e. available in the internal buffer, or available in the underlying file)?
func (b *Reader) ReadByte() (byte, error)
ReadByte reads and returns a single byte. If no byte is available, returns an error.
Passing a buffer of length 1 to bufio.Read, when the reader is backed with an underlying os.File, will indeed return n==0, io.EOF if the file is at EOF.
The documentation is being a bit imprecise because some of the behavior depends on the underlying reader you pass to the bufio reader. The code for bufio.Read() draws a more accurate picture. I'll outline the logic.
bufio.Read: only issues a Read to the underlying reader if all bytes in the internal buffer have been exhausted. So, presumably, if you've already read as many bytes from the buffered reader as there are bytes in the underlying file, that internal buffer should be exhausted when you make the last bufio.Read(buf[0:1]) call to check for EOF.
When the internal buffer is exhausted, and you ask the bufio reader for more, bufio.Read will do at most one call to the underlying reader. The type of error you get then will depend on your underlying reader.
Asking to read n > 0 bytes from an os.File whose read pointer is already at EOF should return 0, io.EOF (according to the documentation of os.File's Read method). But if your underlying reader were something else, perhaps a custom type specific to your application designed to return 0, nil at EOF, then bufio.Read would echo that back instead.
bufio.ReadByte: The logic behind bufio.ReadByte is slightly different but the outcome should be the same as bufio.Read in cases where the underlying reader is an os.File. The main difference with bufio.Read is that bufio.ReadByte can make several attempts to refill the internal buffer. If an error is encountered during refilling (which will be the case for a os.File reader at EOF), it is returned after the first erroneous read attempt. So, if your underlying Reader is an os.File reader, then you'll get 0, io.EOF if and only if your underlying file is at EOF. If your underlying reader was a custom reader type that only returned 0, nil at EOF, then bufio.ReadByte would eventually emit a "NoProgress" error. I'm not sure why the retry logic is only in bufio.ReadByte, but the good news is that either option can be used if your underlying file behaves like an os.File.
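Here is a minimal sketch of the check described above, assuming r is a *bufio.Reader wrapping an *os.File:
p := make([]byte, 1)
n, err := r.Read(p)
if n == 0 && err == io.EOF {
    // the internal buffer and the underlying file are both exhausted
}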
Other info:
This is not directly applicable to Go, but you may find the following thread interesting: Can read(2) return zero bytes when not at EOF. Its topic is the semantics of the read() system call (POSIX). Reads on non-blocking sockets/files, even when no data is ready, should return -1, not 0, and set errno to EAGAIN (or EINTR when interrupted). Non-blocking sockets/files are not really a concept native to Go (as far as I know), and the bufio package in particular will panic if the underlying reader ever returns a negative count, so you don't have to worry about it.
I know that Go's bytes.Buffer is not thread-safe, but what if I have one writer (in one goroutine) and one reader (in another goroutine)? Is that safe?
If not, why isn't it? Write appends to the buffer while Read reads from the start, so I don't see a scenario where they would access the same memory location.
No, it's not safe.
bytes.Buffer is a struct, and both the Buffer.Read() and Buffer.Write() methods read / modify the same fields of the same struct value (they have pointer receivers). This alone is enough to be unsafe for concurrent use. For more details, see Is it safe to read a function pointer concurrently without a lock?
Also consider that a bytes.Buffer stores its bytes in a byte slice, which is a field of the struct. When writing, it may sometimes be necessary to allocate a bigger buffer (if the slice capacity is not enough), and so the slice header (the slice struct field) must be changed (in Write()). Without synchronization there is no guarantee that a concurrent Read() will see this.
And even if no reallocation is needed (because the underlying byte slice has enough capacity to accommodate the data passed to Write()), storing the data in the byte slice requires reslicing it, so the slice header changes even when no reallocation happens (the length of the slice is also part of the slice header). To see what's in a slice header, check out the reflect.SliceHeader type.
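If you do need to share one buffer between a writer goroutine and a reader goroutine, a minimal sketch is to guard every call with a mutex (this wrapper is hypothetical, not part of the standard library):
// safeBuffer guards every operation with a mutex so one goroutine can Write
// while another Reads.
type safeBuffer struct {
    mu  sync.Mutex
    buf bytes.Buffer
}

func (s *safeBuffer) Write(p []byte) (int, error) {
    s.mu.Lock()
    defer s.mu.Unlock()
    return s.buf.Write(p)
}

func (s *safeBuffer) Read(p []byte) (int, error) {
    s.mu.Lock()
    defer s.mu.Unlock()
    return s.buf.Read(p)
}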
buff := bytes.NewBuffer(somebytes)
How can I write on top of buff? Currently I'm creating a new buffer; is this the right way?
newBuff := bytes.NewBuffer(otherbytes)
newBuff.ReadFrom(buff)
bytes.NewBuffer() returns a *Buffer. *Buffer implements io.Writer (and io.Reader) so you can simply write to it by calling its Write() or WriteString() methods.
Example:
somebytes := []byte("abc")
buff := bytes.NewBuffer(somebytes)
buff.Write([]byte("def"))
fmt.Println(buff)
Output as expected (try it on the Go Playground):
abcdef
If you want to start with an empty buffer, you can simply create an empty Buffer struct (and take its address):
buff := &bytes.Buffer{}
If you want to "overwrite" the current content of the buffer, you can use the Buffer.Reset() method or the equivalent Buffer.Truncate(0) call.
Note that resetting or truncating the buffer throws away the content (or only a part of it in the case of Truncate()), but the allocated buffer (byte slice) in the background is kept and reused.
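A small sketch of resetting and reusing the same buffer:
buff.Reset()            // drop the old content, keep the underlying allocation
buff.WriteString("xyz") // start over with new content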
Note:
What you really want to do is not possible directly: just imagine, if you want to insert some data in front of existing content, the existing content would have to be shifted every time you write / insert something in front of it. This is not really efficient.
Instead create your body in a Buffer. Once it's done, you will know what your header will be. Create the header in another Buffer, and when it's done, copy (write) the body (from the first Buffer) into the second already containing the header.
Or if you don't need to store the whole data, you don't need to create a 2nd Buffer for the header. Once the body is ready, write the header to your output, and then write the body from the Buffer.
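A minimal sketch of that approach (the strings are placeholders for your real header and body):
// build the body first
var body bytes.Buffer
body.WriteString("...body...")

// once the body is done, the header is known; write the header, then the body
var msg bytes.Buffer
msg.WriteString("...header...")
body.WriteTo(&msg) // appends the body after the header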