Golang's bytes.Buffer thread safety for one writer/one reader - go

I know that Go's bytes.Buffer is not thread-safe, but what if I have one writer (in one goroutine) and one reader (in another goroutine)? Is it safe?
If not, why not? Writes append to the buffer while reads consume from the start, so I don't see a scenario where they would access the same memory location.

No, it's not safe.
bytes.Buffer is a struct, and both the Buffer.Read() and Buffer.Write() methods read / modify the same fields of the same struct value (they have pointer receivers). This alone is enough to be unsafe for concurrent use. For more details, see Is it safe to read a function pointer concurrently without a lock?
Also consider that a bytes.Buffer stores its bytes in a byte slice, which is a field of the struct. When writing, it may sometimes be necessary to allocate a bigger buffer (if the slice capacity is not enough), so the slice header (the slice struct field) must be changed in Write(). Without synchronization there is no guarantee that a concurrent Read() will see this change.
And even if no reallocation is needed (because the underlying byte slice has enough capacity to accommodate the data passed to Write()), storing the data in the byte slice requires reslicing it, so the slice header still changes (the length of the slice is also part of the slice header). To see what's in the slice header, check out the reflect.SliceHeader type.

Related

Mmap new file to existing pointer instead of munmap

I am using mmap in Go. After I mmap a file, the resulting pointer is used across all goroutines.
Later I want to update the file's data (with a new size and data layout). If I munmap it, any other goroutine that accesses the freed memory region will cause a segfault.
So instead of calling munmap, I create a new file with the updated data and then mmap this file onto the old pointer. Will this work, or will it cause a memory leak?
// mmap a file
b, err := syscall.Mmap(fdOldFile, 0, int(dataSize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
// mmap new file with new size
nb, e := syscall.Mmap(fdNewFile, 0, int(newSize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
// pouring data into the new file with the new data layout
// ...
// munmap b will cause a segfault if b is being used in another goroutine
// syscall.Munmap(b)
os.Remove(oldFile)
os.Rename(newFile, oldFile)
syscall.Munmap(nb)
// set b = new b instead
b, err = syscall.Mmap(fdNewFile, 0, int(newSize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
The code in your example will keep the old file memory mapped, because the kernel keeps a mapping alive until you unmap it or the process exits. Because of this, the syscall/sys library always keeps a reference to the memory-mapped address to prevent it from being garbage collected, even if you lose your own reference.
The proper way to replace the file behind the same address is to use the mmap syscall with the same address. However, the syscall.Mmap wrapper will not let you specify the address param; it is always 0 (which means that the kernel will pick an address not currently in use).
You can also grow or shrink the existing region with the mremap syscall, but no wrapper exists for this syscall in the stdlib. The most likely reason for these limitations is that when you change an existing mapping, its length may change. Go returns a []byte, which internally has cap and len values. So if the size of the underlying array changes but the len does not, you can get segfaults. And since len and cap are passed by value, the stdlib can't update existing slices when it changes the underlying memory.
So, in order to do this, assuming you still want to, you have to:
Expose the internal syscall.mmap function, which does allow you to specify the address:
import _ "unsafe"
//go:linkname mmap syscall.mmap
func mmap(addr uintptr, length uintptr, prot int, flags int, fd int, offset int64) (xaddr uintptr, err error)
You should still use syscall.Mmap for the initial allocation of the address, because there are a few requirements and it is better to let the kernel pick a good address, but now you can change what is mapped there. You will need to use reflection and unsafe pointer casting to get the address from the []byte you got back from syscall.Mmap.
If you are going to pass a different length, you must also change the len of all copies of the []byte, including subslices, to avoid segfaults. If you use the exact same length every time, this should not be an issue.
So TL;DR: you need to be very sure of what you are doing to avoid mistakes, or you will get some nasty bugs, but it can be done.
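As a rough illustration of the "get the address from the []byte" step, here is a minimal sketch that uses only unsafe pointer casting. sliceAddr and region are hypothetical names, and an ordinary heap slice stands in for the mmap result so that the snippet runs anywhere. Keep in mind that on Linux the addr argument to mmap is only a hint unless MAP_FIXED is included in the flags.

```go
package main

import (
	"fmt"
	"unsafe"
)

// region stands in for the []byte returned by syscall.Mmap. It is a
// package-level variable so that its backing array does not move.
var region = make([]byte, 4096)

// sliceAddr is a hypothetical helper that returns the address of the first
// byte of b's backing array, or 0 for an empty slice.
func sliceAddr(b []byte) uintptr {
	if len(b) == 0 {
		return 0
	}
	return uintptr(unsafe.Pointer(&b[0]))
}

func main() {
	sub := region[16:]
	// A subslice shares the backing array, so its address is offset by 16.
	fmt.Println(sliceAddr(sub)-sliceAddr(region) == 16)
}
```

The subslice check also shows why every copy of the slice matters: each copy and subslice carries its own pointer and length into the same region, and none of them is updated when the mapping behind that address changes.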

How to save, and then serve again data of type io.Reader?

I would like to parse several times with gocal data I retrieve through a HTTP call. Since I would like to avoid making the call for each of the parsing, I would like to save this data and reuse it.
The Body I get from http.Get is of type io.ReadCloser. The gocal parser requires io.Reader so it works.
Since I can retrieve Body only once, I can save it with body, _ := io.ReadAll(get.Body) but then I do not know how to serve []byte as io.Reader back (to the gocal parser, several times to account for different parsing conditions)
As you have figured out, http.Response.Body is exposed as an io.ReadCloser. This reader is not reusable because it is connected straight to the underlying connection* (TCP or any other stream-like reader under the net package).
Once you have read the bytes out of the connection, they are gone; the next read only returns whatever new bytes are sitting there waiting.
So in order to save the response you do indeed need to drain it first and save the result in a variable.
body, _ := io.ReadAll(get.Body)
To reuse that slice of bytes many times, the standard library provides bytes.NewReader, which wraps a []byte in a reader.
This reader offers the Reset([]byte) method to reset its state.
bytes.Reader.Reset is very useful for reading the same byte slice multiple times with no allocations. In comparison, bytes.NewReader allocates every time it is called.
Finally, between two consecutive calls to c.Parse, you should reset the reader with the bytes you collected previously,
such as:
buf := bytes.NewReader(body)
// initialize the parser
c.Parse()
// process the result
// reset the buf, parse again
buf.Reset(body)
c.Parse()
You can try this version https://play.golang.org/p/YaVtCTZHZEP It uses the strings.NewReader buffer, but the interface and behavior are similar.
* It is not super obvious, but that is the general principle: the transport reads the headers and leaves the body untouched unless you consume it.

Go check if bufio reader is empty

var r bufio.Reader
How do I check if r has no more data (is empty, is depleted)?
I understand that this may need to block until that information is known.
Can't find anything searching Google. I thought the Peek function would be useful to see if there is more data, but this seems to only peek an underlying buffer if exists. I could also try to Read one byte and subsequently call UnreadByte but that's extremely messy and unclear, are there any better options?
If r.Peek(1) returns data, then the next call to Read will return data.
If there's no data in the buffer, then Peek calls the underlying reader and blocks until data is available or an error occurs.
If I understand your question correctly, would this work?
// func (*Reader) Size() int
// Size returns the size of the underlying buffer in bytes.
size := r.Size()
// func (*Reader) Buffered() int
// Buffered returns the number of bytes that can be read from the current buffer
buffered := r.Buffered()

Program changes memory values when using calloc() vs make() for slices

I am trying to build a slice of pointers manually, using C.calloc() to allocate the array portion of the slice. I am able to do this successfully, but when I add pointers to memory that I allocate with make(), some of the values (of what the pointers point to) get changed, seemingly at random. On the other hand, if I C.calloc() space for the entries I will be adding, the values are not changed. Likewise, if I allocate the slice with make() and the entries I add are also allocated with make(), the values are not changed.
I do notice that the memory locations of the pointers when using C.calloc() vs make() are very different, but I don't see why this should cause the memory to be changed randomly. I am new to Go, so please forgive me if I am overlooking something very simple.
Here is the code I use for allocating my slices manually:
type caster struct {
ptr *byte;
len int64;
cap int64;
}
var temp caster;
temp.ptr=(*byte)(C.calloc(C.ulong(size),8));
temp.len=int64(size);
temp.cap=int64(size);
newTable.table=*(*[]*entry)(unsafe.Pointer(&temp));
This works if the entries I add are allocated as follows:
var temp caster;
var e []entry;
temp.ptr=(*byte)(C.calloc(C.ulong(ninserts),8));
temp.len=int64(ninserts);
temp.cap=int64(ninserts);
e=*(*[]entry)(unsafe.Pointer(&temp));
for i:=0;i<ninserts;i++ {
e[i].val=hint64(rand.Int63());
}
for i:=0;i<ninserts;i++ {
ht.insert(&e[i]);
}
though the memory of the entries gets randomly changed if they are allocated as follows:
var e []entry = make([]entry, ninserts);
for i:=0;i<ninserts;i++ {
e[i].val=hint64(rand.Int63());
}
for i:=0;i<ninserts;i++ {
ht.insert(&e[i]);
}
Unless I build my slices normally as follows:
newTable.table = make([]*entry, size);
I am trying to build a slice of pointers manually and with C.calloc() for allocating the array portion of the slice.
This is explicitly forbidden.
To quote from the official cgo documentation:
Go is a garbage collected language, and the garbage collector needs to know the location of every pointer to Go memory. Because of this, there are restrictions on passing pointers between Go and C.
In this section the term Go pointer means a pointer to memory allocated by Go (such as by using the & operator or calling the predefined new function) and the term C pointer means a pointer to memory allocated by C (such as by a call to C.malloc). Whether a pointer is a Go pointer or a C pointer is a dynamic property determined by how the memory was allocated; it has nothing to do with the type of the pointer.
Note that values of some Go types, other than the type's zero value, always include Go pointers. This is true of string, slice, interface, channel, map, and function types. A pointer type may hold a Go pointer or a C pointer. Array and struct types may or may not include Go pointers, depending on the element types. All the discussion below about Go pointers applies not just to pointer types, but also to other types that include Go pointers.
The boldface above is mine. It means that you must not allocate any of these types via C's allocators.
It is possible to defeat this enforcement by using the unsafe package, and of course there is nothing stopping the C code from doing anything it likes. However, programs that break these rules are likely to fail in unexpected and unpredictable ways.
This bit of your own code:
newTable.table=*(*[]*entry)(unsafe.Pointer(&temp));
violates the rules, but defeats their enforcement. You have allocated C memory, and are now trying to use it as if it were Go memory, in the form of a slice.

Is it necessary to return pointer type in sync.Pool New function?

I saw the issue on Github which says sync.Pool should be used only with pointer types, for example:
var TPool = sync.Pool{
New: func() interface{} {
return new(T)
},
}
Does it make sense? What about return T{} and which is the better choice, why?
The whole point of sync.Pool is to avoid (expensive) allocations of things like large-ish buffers. You allocate a few buffers and they stay in memory, available for reuse. Hence the use of pointers.
But if New returns T{}, you'll be copying the values on every step, defeating the purpose. (Assuming your T is a "normal" struct and not something like SliceHeader.)
It is not necessary. In most cases it should be a pointer as you want to share an object, not to make copies.
In some use cases this can be a non pointer type, like an id of some external resource. I can imagine a pool of paths (mounted disk drives) represented with strings where some large file operations are being conducted.
