Mmap new file to existing pointer instead of munmap - go

I am using mmap in Go. After mmapping a file, the returned pointer is used across all goroutines.
Now I want to update the file's data (with a new size and data layout). If I munmap it, any other goroutine that accesses the freed memory region will cause a segfault.
So instead of calling munmap, I create a new file with the updated data and then mmap that file onto the old pointer. Will this work, or will it cause a memory leak?
// mmap a file
b, err := syscall.Mmap(fdOldFile, 0, int(dataSize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
// mmap a new file with the new size
nb, e := syscall.Mmap(fdNewFile, 0, int(newSize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
// pour data into the new file with the new data layout
// ...
// munmap of b would cause a segfault if b is being used in another goroutine
// syscall.Munmap(b)
os.Remove(oldFile)
os.Rename(newFile, oldFile)
syscall.Munmap(nb)
// set b to the new mapping instead
b, err = syscall.Mmap(fdNewFile, 0, int(newSize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)

The code in your example will keep the old file memory mapped: the kernel keeps a mapping alive until you unmap it or the process exits. On top of that, the syscall package keeps its own internal reference to every active mapping (so Munmap can later look up its length), so the old mapping stays around even if you lose your own reference to the slice.
The proper way to replace the file behind the same address is to call the mmap syscall again with that address. However, the syscall.Mmap wrapper does not let you specify the address parameter; it always passes 0 (which means the kernel picks an address not currently in use).
You can also grow or shrink an existing region with the mremap syscall, but no wrapper for it exists in the standard library. The most likely reason for these limitations is that when you change an existing mapping, its length may change. Go hands you a []byte, which internally carries a len and a cap. If the size of the underlying array changes but the len does not, you can get segfaults. And since len and cap are passed by value, the standard library cannot fix up the slices that reference the changed memory.
So, in order to do this, assuming you still want to, you have to:
Expose the internal syscall.mmap function, which does allow you to specify the address:
import _ "unsafe"
//go:linkname mmap syscall.mmap
func mmap(addr uintptr, length uintptr, prot int, flags int, fd int, offset int64) (xaddr uintptr, err error)
You should still use syscall.Mmap for the initial allocation of the address, because there are a few requirements and it is better to let the kernel pick a good address, but now you can remap over it. You will need unsafe pointer casting (or reflection on the slice header) to get the address from the []byte that syscall.Mmap returned.
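For illustration, here is a minimal sketch of that idea. It assumes Linux, that the new mapping has the same length as the old one, and that MAP_FIXED semantics (atomically replacing whatever is mapped at that address) are what you want; remapInPlace is a hypothetical helper name. The linked signature matches the unexported syscall.mmap in the standard library, and depending on the Go version you may also need an empty .s file in the package so the compiler accepts the body-less declaration.

package remap

import (
    "syscall"
    "unsafe"
)

//go:linkname mmap syscall.mmap
func mmap(addr uintptr, length uintptr, prot int, flags int, fd int, offset int64) (xaddr uintptr, err error)

// remapInPlace maps fd over the memory already backing b, keeping the same address.
// It assumes newSize == len(b); if the sizes differ, every copy of the slice header
// (including subslices) must be adjusted as well, as described above.
func remapInPlace(b []byte, fd int, newSize int) error {
    addr := uintptr(unsafe.Pointer(&b[0])) // start address of the existing mapping
    _, err := mmap(addr, uintptr(newSize),
        syscall.PROT_READ|syscall.PROT_WRITE,
        syscall.MAP_SHARED|syscall.MAP_FIXED, // MAP_FIXED replaces the old mapping in place
        fd, 0)
    return err
}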
If you are going to pass a different length, you must also change the len of all copies of the []byte, including subslices, to avoid segfaults. If you use the exact same length every time, this should not be an issue.
So, TLDR: you need to be very sure about what you are doing or you will hit some nasty bugs, but it can be done.

Related

Go check if bufio reader is empty

var r bufio.Reader
How do I check if r has no more data (is empty, is depleted)?
I understand that this may need to block until that information is known.
I can't find anything searching Google. I thought the Peek function would be useful to see if there is more data, but it seems to only peek into the underlying buffer if one exists. I could also Read one byte and then call UnreadByte, but that's extremely messy and unclear. Are there any better options?
If r.Peek(1) returns data, then the next call to Read will return data.
If there's no data in the buffer, then Peek calls through to the underlying reader and blocks until data is available or an error occurs.
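A minimal sketch of that approach, assuming "empty" means "the next Read would block or return io.EOF"; hasMore is a hypothetical helper name:

package main

import (
    "bufio"
    "fmt"
    "io"
    "strings"
)

// hasMore reports whether the next Read on r will return data.
// It blocks until at least one byte is buffered or the underlying reader errors.
func hasMore(r *bufio.Reader) (bool, error) {
    _, err := r.Peek(1)
    if err == io.EOF {
        return false, nil // depleted
    }
    return err == nil, err
}

func main() {
    r := bufio.NewReader(strings.NewReader("x"))
    fmt.Println(hasMore(r)) // true <nil>
    r.ReadByte()
    fmt.Println(hasMore(r)) // false <nil>
}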
If I understand your question correctly, would this work?
// func (*Reader) Size() int
// Size returns the size of the underlying buffer in bytes.
size := r.Size()
// func (*Reader) Buffered() int
// Buffered returns the number of bytes that can be read from the current buffer
buffered := r.Buffered()

Program changes memory values when using calloc() vs make() for slices

I am trying to build a slice of pointers manually, using C.calloc() to allocate the array portion of the slice. I can do this successfully, but when I add pointers that I allocated with make(), some of the values (of what the pointers point to) get changed seemingly at random. On the other hand, if I C.calloc() space for the pointers I will be adding, the values are not changed. Likewise, if I allocate the slices with make() and the pointers I add are also allocated with make(), the values are not changed.
I notice that the memory locations of the pointers are very different when using C.calloc() vs make(), but I don't see why this should cause the memory to be changed randomly. I am new to Go, so please forgive me if I am overlooking something very simple.
Here is the code I use for allocating my slices manually:
type caster struct {
    ptr *byte
    len int64
    cap int64
}

var temp caster
temp.ptr = (*byte)(C.calloc(C.ulong(size), 8))
temp.len = int64(size)
temp.cap = int64(size)
newTable.table = *(*[]*entry)(unsafe.Pointer(&temp))
This works if the entries I add are allocated as follows:
var temp caster
var e []entry
temp.ptr = (*byte)(C.calloc(C.ulong(ninserts), 8))
temp.len = int64(ninserts)
temp.cap = int64(ninserts)
e = *(*[]entry)(unsafe.Pointer(&temp))
for i := 0; i < ninserts; i++ {
    e[i].val = hint64(rand.Int63())
}
for i := 0; i < ninserts; i++ {
    ht.insert(&e[i])
}
though the memory of the entries gets randomly changed if they are allocated as follows:
var e []entry = make([]entry, ninserts)
for i := 0; i < ninserts; i++ {
    e[i].val = hint64(rand.Int63())
}
for i := 0; i < ninserts; i++ {
    ht.insert(&e[i])
}
Unless I build my slices normally as follows:
newTable.table = make([]*entry, size)
I am trying to build a slice of pointers manually and with C.calloc() for allocating the array portion of the slice.
This is explicitly forbidden.
To quote from the official cgo documentation:
Go is a garbage collected language, and the garbage collector needs to know the location of every pointer to Go memory. Because of this, there are restrictions on passing pointers between Go and C.
In this section the term Go pointer means a pointer to memory allocated by Go (such as by using the & operator or calling the predefined new function) and the term C pointer means a pointer to memory allocated by C (such as by a call to C.malloc). Whether a pointer is a Go pointer or a C pointer is a dynamic property determined by how the memory was allocated; it has nothing to do with the type of the pointer.
Note that values of some Go types, other than the type's zero value, always include Go pointers. This is true of string, slice, interface, channel, map, and function types. A pointer type may hold a Go pointer or a C pointer. Array and struct types may or may not include Go pointers, depending on the element types. All the discussion below about Go pointers applies not just to pointer types, but also to other types that include Go pointers.
Note the inclusion of slice types in that list: it means that you must not allocate any of these types via C's allocators.
It is possible to defeat this enforcement by using the unsafe package, and of course there is nothing stopping the C code from doing anything it likes. However, programs that break these rules are likely to fail in unexpected and unpredictable ways.
This bit of your own code:
newTable.table = *(*[]*entry)(unsafe.Pointer(&temp))
violates the rules, but defeats their enforcement. You have allocated C memory, and are now trying to use it as if it were Go memory, in the form of a slice.
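For contrast, here is a minimal sketch of a C.calloc use that stays within the rules, assuming Go 1.17+ for unsafe.Slice; cSliceOfInt64 is a hypothetical helper. C memory may back a Go slice as long as the element type contains no Go pointers.

package cmem

/*
#include <stdlib.h>
*/
import "C"

import "unsafe"

// cSliceOfInt64 returns a Go slice backed by C-allocated memory.
// This is fine because int64 contains no Go pointers. Storing Go pointers
// such as &e[i] from a make'd slice into C memory hides them from the
// garbage collector, which would explain the "randomly changed" values
// observed above once those entries are collected and their memory reused.
func cSliceOfInt64(n int) []int64 {
    p := C.calloc(C.size_t(n), C.size_t(unsafe.Sizeof(int64(0))))
    return unsafe.Slice((*int64)(p), n)
}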

Is it necessary to return pointer type in sync.Pool New function?

I saw the issue on Github which says sync.Pool should be used only with pointer types, for example:
var TPool = sync.Pool{
    New: func() interface{} {
        return new(T)
    },
}
Does this make sense? What about returning T{} instead? Which is the better choice, and why?
The whole point of sync.Pool is to avoid (expensive) allocations of large-ish buffers and the like. You allocate a few buffers and they stay in memory, available for reuse. Hence the use of pointers.
But if you return T{} you will be copying the value on every step, and storing a non-pointer value in the pool also typically costs an extra allocation when it is boxed into the interface{}, defeating the purpose. (This assumes your T is a "normal" struct and not something like a SliceHeader.)
It is not necessary. In most cases it should be a pointer, since you want to share an object, not make copies of it.
In some use cases a non-pointer type makes sense, such as an id of some external resource. I can imagine a pool of paths (mounted disk drives), represented as strings, over which some large file operations are being conducted.
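A minimal sketch of the usual pointer-returning pattern, using bytes.Buffer purely as an illustrative pooled type:

package main

import (
    "bytes"
    "fmt"
    "sync"
)

// Pooling pointers means Get and Put move only a pointer; no per-call copy of
// the buffer is made and no boxing allocation occurs when storing it in the pool.
var bufPool = sync.Pool{
    New: func() interface{} { return new(bytes.Buffer) },
}

func main() {
    b := bufPool.Get().(*bytes.Buffer)
    b.Reset() // a recycled buffer may still hold old data
    b.WriteString("hello")
    fmt.Println(b.String())
    bufPool.Put(b)
}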

Golang's bytes.Buffer thread safety for one writer/one reader

I know that Go's bytes.Buffer is not thread-safe, but what if I have one writer (in one goroutine) and one reader (in another goroutine)? Is that safe?
If not, why not? Write appends to the buffer while Read reads from the start, so I don't see a scenario where they would access the same memory location.
No, it's not safe.
bytes.Buffer is a struct, and both the Buffer.Read() and Buffer.Write() methods read / modify the same fields of the same struct value (they have pointer receivers). This alone is enough to be unsafe for concurrent use. For more details, see Is it safe to read a function pointer concurrently without a lock?
Also consider that a bytes.Buffer stores its bytes in a byte slice, which is a field of the struct. When writing, it is sometimes necessary to allocate a bigger buffer (if the slice's capacity is not enough), so the slice header (the slice struct field) must be changed in Write(). Without synchronization there is no guarantee that a concurrent Read() will see this change.
And even if no reallocation is needed (because the underlying byte slice has enough capacity to accommodate the data passed to Write()), storing the data requires reslicing, so the slice header changes anyway (the length of the slice is also part of the slice header). To see what is in a slice header, check out the reflect.SliceHeader type.
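For the one-writer/one-reader case, one idiomatic alternative is io.Pipe, which provides the synchronization that bytes.Buffer lacks; a minimal sketch:

package main

import (
    "fmt"
    "io"
)

func main() {
    pr, pw := io.Pipe()

    // One writer goroutine; each Write blocks until the reader consumes the data.
    go func() {
        pw.Write([]byte("hello"))
        pw.Close()
    }()

    // One reader (here, main); safe without any extra locking.
    data, _ := io.ReadAll(pr)
    fmt.Println(string(data))
}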

Transfer a pointer through boost::interprocess::message_queue

What I am trying to do is have application A send application B a pointer to an object which A has allocated in shared memory (using boost::interprocess). For that pointer transfer I intend to use boost::interprocess::message_queue. Obviously a raw pointer from A is not valid in B, so I try to transfer an offset_ptr allocated in the shared memory. However, that also does not seem to work.
Process A does this:
typedef offset_ptr<MyVector> MyVectorPtr;

MyVectorPtr *myvector;
myvector = segment->construct<MyVectorPtr>(boost::interprocess::anonymous_instance)();
*myvector = segment->construct<MyVector>(boost::interprocess::anonymous_instance)(*alloc_inst_vec);

// myvector gets filled with data here

// Send on the message queue
mq->send(myvector, sizeof(MyVectorPtr), 0);
Process B does this:
// Create a "buffer" on this side of the queue
MyVectorPtr *myvector;
myvector = segment->construct<MyVectorPtr>(boost::interprocess::anonymous_instance)();
mq->receive(myvector, sizeof(MyVectorPtr), recvd_size, priority);
As I see it, this does a bitwise copy of the offset pointer, which invalidates it in process B. How do I do this right?
It seems you can address it as described in this post on the boost mailing list.
I agree there is some awkwardness here, and offset_ptr doesn't really work for what you are trying to do. offset_ptr is useful when the pointer itself is stored inside another class/struct that is also allocated in your shared memory segment, but generally you have some top-level item which is not a member of an object allocated in shared memory.
You'll notice the offset_ptr example kind of glosses over this: it just has a comment "Communicate list to other processes" with no details. In some cases you may have a single named top-level object, and that name can be how you communicate it, but if you have an arbitrary number of top-level objects to communicate, it seems like just sending the offset from the shared memory's base address is the best you can do.
You calculate the offset on the sending end, send it, and then add it to the base address on the receiving end. If you want to be able to send nullptr as well, you could do what offset_ptr does and agree that 1 is an offset that is sufficiently unlikely to be used, or pick another unlikely sentinel value.
