CGO known implementation bug in pointer checks

According to the documentation of CGO (https://pkg.go.dev/cmd/cgo), there is a known bug in the implementation:
Note: the current implementation has a bug. While Go code is permitted to write nil or a C pointer (but not a Go pointer) to C memory, the current implementation may sometimes cause a runtime error if the contents of the C memory appear to be a Go pointer. Therefore, avoid passing uninitialized C memory to Go code if the Go code is going to store pointer values in it. Zero out the memory in C before passing it to Go.
I looked for this in the issue tracker at GitHub but can't find it there. Could someone please elaborate on why this might happen? How does the runtime find Go pointers in uninitialized C memory?
E.g., let's say I am passing an uninitialized char array to a Go function from C: how can the runtime interpret a Go pointer in this memory?
Also, the "if the Go code is going to store pointer values in it" part confuses me. Why does later use of this memory matter?

I looked for this in the issue tracker at GitHub but can't find it there.
The bug to which this comment refers is https://golang.org/issue/19928, which is admittedly not easy to find. 😅
Could someone please elaborate on why this might happen? How does the runtime find Go pointers in uninitialized C memory?
During certain parts of a garbage collection cycle, the collector turns on a “write barrier” for writes to pointers in the Go heap, recording the previously-stored pointer value to ensure that it is not missed during the GC scan.
The bug here is that the write barrier sometimes also records the previously-stored pointer value for pointers outside the Go heap. If that value looks like a Go pointer, the garbage collector may try to scan it recursively, and could crash if it isn't actually a valid pointer.
E.g., let's say I am passing an uninitialized char array to a Go function from C: how can the runtime interpret a Go pointer in this memory?
This bug should not occur if the uninitialized data passed to Go is of a type that does not contain any pointers. So for a char array in particular you should be fine either way.
Also, the "if the Go code is going to store pointer values in it" part confuses me. Why does later use of this memory matter?
The compiler inserts write barriers at store instructions for pointer types. If the Go program does not store pointers, then the compiler will not emit any write barriers, and the bug in the write barrier will not be triggered.

Related

Is allocation C memory to hold a Go struct a supported use case for cgo?

I've been exploring strategies to avoid passing nested Go pointers into C. Here's an example of how I'm trying out allocating a block of C memory with the intent of holding a Go struct:
(*MyGoStruvt)(C.calloc(1, C.size_t(unsafe.Sizeof(MyGoStruvt{}))))
Does anyone know if this is a supported use case? And if not, could someone explain how wrong this approach is?

C++ primer 5 edition: Container of shared_ptr

Again reading C++ Primer 5 Edition. I am on chapter 12 Dynamic memory. Everything is OK. Until this point in the book:
"Because memory is not freed until the last shared_ptr goes away, it can be important to be sure that shared_ptrs don’t stay around after they are no longer needed. The program will execute correctly but may waste memory if you neglect to destroy shared_ptrs that the program does not need. One way that shared_ptrs might stay around after you need them is if you put shared_ptrs in a container and subsequently reorder the container so that you don’t need all the elements. You should be sure to erase shared_ptr elements once you no longer need those elements.
Note
If you put shared_ptrs in a container, and you subsequently need to use some, but not all, of the elements, remember to erase the elements you no longer need."
I don't understand this paragraph. Can someone explain to me how shared_ptrs may leak, and give an example of a "container" of shared_ptrs that can cause a leak? Thank you.

It essentially means that as long as you have a std::shared_ptr object in your container, the object it points to will not be deleted.
So once you have no more use of that object, you should remove the corresponding std::shared_ptr from your container so the storage can be freed.
If you were to keep adding elements to your container and never removed any, you would essentially leak memory (of course it will be cleaned up when the reference count hits 0, but until then it stays reserved for no reason).
As a side note, always think about whether you really need std::shared_ptr. Often a std::unique_ptr is enough, and should the need arise to make it shared, that is easy to do.
See Does C++11 unique_ptr and shared_ptr able to convert to each other's type?
also
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rr-unique

(Idiomatic?) Difference between new(T) and &T{...}?

I started kidding around with Go and am a little irritated by the new function. It seems to be quite limited, especially when considering structures with anonymous fields or inline initialisations. So I read through the spec and stumbled over the following paragraph:
Calling the built-in function new or taking the address of a composite literal allocates storage for a variable at run time.
So I have the suspicion that new(T) and &T{} will behave in the exact same way, is that correct? And if that is correct, in what situation should new be used?

Yes, you are correct. new is not that useful with structs. But it is with other basic types. new(int) will get you a pointer to a zero-valued int, and you can't do &int{} or similar.
In any case, in my experience, you rarely want that, so new is rarely used. You can just declare a plain int and pass around a pointer to it. In fact, doing this is probably better because it frees you from thinking about allocating on the stack vs. on the heap, as the compiler will decide for you.
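A small sketch of the points above. The Point type and the two helper functions are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// Point is a hypothetical struct used only for illustration.
type Point struct {
	X, Y int
}

// zeroIntWithNew returns a pointer to a zero-valued int using new.
func zeroIntWithNew() *int {
	return new(int)
}

// zeroIntWithVar declares a plain int and returns its address.
// The compiler decides whether n lives on the stack or the heap.
func zeroIntWithVar() *int {
	var n int
	return &n
}

func main() {
	a := new(Point) // pointer to a zero-valued Point
	b := &Point{}   // same result via a composite literal
	fmt.Println(*a == *b)

	// Only the composite literal form can set fields inline.
	c := &Point{X: 1, Y: 2}
	fmt.Println(c.X, c.Y)

	fmt.Println(*zeroIntWithNew(), *zeroIntWithVar())
}
```

Both forms yield a pointer to freshly allocated storage; the composite literal simply allows initial field values as well.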

Is there valid "use cases" for Undefined Behaviour?

I have found a piece of code which has UB, and was told to leave it in the code, with a comment that states it is UB. Using MSVC2012 only.
The code itself has a raw array of Foo objects, then casts that array to char* with reinterpret_cast<char*> and then calls delete casted_array (like this, not delete[]) on it.
Like this:
Foo* foos = new Foo[500];
char* CastedFoos = reinterpret_cast<char*>(foos);
delete CastedFoos;
Per the Standard 5.3.5/3 this is clearly Undefined Behavior.
Apparently this code does what it does to avoid having to call destructors as an optimisation.
I wondered, are there actually places where leaving UB in the code could be considered valid?
Also, as far as I'm concerned, leaving the above in code is not smart, am I right?

It depends entirely on your perspective.
Take an extreme example: in C++03, threads were undefined behavior. As soon as you had more than one thread, your program's behavior was no longer defined by the C++ standard.
And yet, most people would say threads are useful.
Of course, multithreading may have been UB according to the C++ standard, but individual compilers didn't treat it as undefined. They provided an additional guarantee that multithreading is going to work as you'd expect.
When talking about C++ in the abstract, UB has no uses whatsoever. How could it? You don't know what could or would happen.
But in specific applications, specific code compiled by specific compilers to run on specific operating systems, you may sometimes know that a piece of UB is (1) safe, and (2) ends up having some kind of beneficial effect.

The C++ standard defines "undefined behaviour" as follows:
behavior for which this standard imposes no requirements
So if you want your code to be portable to different compilers and platforms, then your code should not depend on undefined behavior, because what the programs (that are produced by different compilers compiling your code) do in these cases may vary.
If you don't care about portability, then you should check if your compiler documents how it behaves under the circumstances of interest. If it doesn't document what it does (and it doesn't have to), beware that the compiler could change what it does without warning between different versions. Also note that its behaviour may be non-deterministic. So for example it could crash 1% of the time, which you may not notice in ad-hoc testing, but will come back and bite you later when it goes into production. So even if you are using one compiler, it may still be a bad idea to depend on undefined behavior.
With regard to your specific example, you can rewrite it to achieve the same effect (not calling destructors, but reclaiming memory) in a way that does not result in undefined behaviour. Allocate a std::aligned_storage to hold the Foo array, use placement new to construct the Foo array in the aligned_storage, then when you want to deallocate the array, deallocate the aligned_storage without explicitly calling the destructors.
Of course this is still a terrible design, may cause memory leaks or other problems depending on what Foo::~Foo() was supposed to do, but at least it isn't UB.

What is the Go language garbage collection approach compared to others?

I do not know much about the Go programming language, but I have seen several claims that Go has latency-free garbage collection and that it is much better than other garbage collectors (like the JVM garbage collector). I have developed applications for the JVM and I know that the JVM garbage collector is not latency-free (especially with large memory usage).
I was wondering, what is the difference between the garbage collection approach in Go and the others that makes it latency-free?
Thanks in advance.

Go does not have latency-free garbage collection. If you can point out where those claims are, I'd like to try to correct them.
One advantage that we believe Go has over Java is that it gives you more control over memory layout. For example, a simple 2D graphics package might define:
type Rect struct {
    Min Point
    Max Point
}
type Point struct {
    X int
    Y int
}
In Go, a Rect is just four integers contiguous in memory. You can still pass &r.Max to a function expecting a *Point; that's just a pointer into the middle of the Rect variable r.
In Java, the equivalent expression would be to make Rect and Point classes, in which case the Min and Max fields in Rect would be pointers to separately allocated objects. This requires more allocated objects, taking up more memory, and giving the garbage collector more to track and more to do. On the other hand, it does avoid ever needing to create a pointer to the middle of an object.
Compared to Java, then, Go gives you the programmer more control over memory layout, and you can use that control to reduce the load on the garbage collector. That can be very important in programs with large amounts of data. Control over memory layout may also be important for extracting performance from the hardware due to cache effects and such, but that's tangential to the original question.
The collector in the current Go distributions is reasonable but by no means state of the art. We have plans to spend more effort improving it over the next year or two. To be clear, Go's garbage collector is certainly not as good as modern Java garbage collectors, but we believe it is easier in Go to write programs that don't need as much garbage collection to begin with, so the net effect can still be that garbage collection is less of an issue in a Go program than in an equivalent Java program.
