c++11: what is its gc interface, and how to implement?

I was watching Bjarne Stroustrup's talk "The Essence of C++".
At 44:26 he mentions that "C++11 specifies a GC Interface".
What is this interface, and how is it implemented?
Is there a more detailed introduction online, or some sample code demonstrating it, please?

Stroustrup expands on this discussion in his C++ FAQ; the key point is that GC usage is optional, and library vendors are free to implement one or not:
Garbage collection (automatic recycling of unreferenced regions of memory) is optional in C++; that is, a garbage collector is not a compulsory part of an implementation. However, C++11 provides a definition of what a GC can do if one is used and an ABI (Application Binary Interface) to help control its actions.
The rules for pointers and lifetimes are expressed in terms of "safely derived pointer" (3.7.4.3); roughly: "pointer to something allocated by new or to a sub-object thereof." To ordinary mortals: [...]
The functions in the C++ standard that support this (the "interface" to which Stroustrup is referring) are:
std::declare_reachable
std::undeclare_reachable
std::declare_no_pointers
std::undeclare_no_pointers
These functions are presented in the N2670 proposal:
Its purpose is to support both garbage collected implementations and reachability-based leak detectors. This is done by giving undefined behavior to programs that "hide a pointer" by, for example, xor-ing it with another value, and then later turn it back into an ordinary pointer and dereference it. Such programs may currently produce incorrect results with conservative garbage collectors, since an object referenced only by such a "hidden pointer" may be prematurely collected. For the same reason, reachability-based leak detectors may erroneously report that such programs leak memory.
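For illustration, here is a minimal sketch of how these functions are meant to be used around a "hidden" pointer like the one the proposal describes (on implementations with relaxed pointer safety the calls are effectively no-ops; the xor mask is arbitrary):

#include <cstdint>
#include <memory>

int main()
{
    int* p = new int(42);
    // Tell a (hypothetical) collector the object stays reachable even though
    // no ordinary pointer to it will exist for a while.
    std::declare_reachable(p);
    std::uintptr_t hidden = reinterpret_cast<std::uintptr_t>(p) ^ 0x5555;  // "hide" it
    p = nullptr;
    // Later: recover the pointer and cancel the declaration; the returned
    // pointer is safely derived again.
    int* q = std::undeclare_reachable(reinterpret_cast<int*>(hidden ^ 0x5555));
    delete q;
}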
Either your implementation supports "strict pointer safety", in which case implementing a GC is possible, or it has "relaxed pointer safety" (the default), in which case it is not. You can determine which by looking at the result of std::get_pointer_safety(), if available.
I don't know of any actual standard C++ GC implementation, but at least the standard has prepared the ground for one to happen.

In addition to the good answer by quantdev, which I've upvoted, I wanted to provide a little more information here (which would not fit in a comment).
Here is a C++11 conforming program which demonstrates whether or not an implementation supports the GC interface:
#include <iostream>
#include <memory>

int
main()
{
#ifdef __STDCPP_STRICT_POINTER_SAFETY__
    std::cout << __STDCPP_STRICT_POINTER_SAFETY__ << '\n';
#endif
    switch (std::get_pointer_safety())
    {
    case std::pointer_safety::relaxed:
        std::cout << "relaxed\n";
        break;
    case std::pointer_safety::preferred:
        std::cout << "preferred\n";
        break;
    case std::pointer_safety::strict:
        std::cout << "strict\n";
        break;
    }
}
An output of:
relaxed
means that the implementation has only a trivial implementation of the GC interface, one which does nothing at all.
libc++ outputs:
relaxed
VS-2015 outputs:
relaxed
gcc 5.0 outputs:
prog.cc: In function 'int main()':
prog.cc:10:13: error: 'get_pointer_safety' is not a member of 'std'
switch (std::get_pointer_safety())
^


Sharing memory with the kernel and compiler optimizations

A frame is shared with the kernel.
User-space code:

    read frame          // read the frame contents
    _mm_mfence          // prevent "releasing" the frame before we have read everything
    frame.status = 0    // "release" the frame

Kernel code:

    poll for frame.status   // reads the frame's status
    _mm_lfence
The kernel can poll it asynchronously, in another thread, so there is no syscall between the user-space and kernel-space code.
Is it correctly synchronized?
I doubt it, because of the following situation: the compiler has a weak memory model, and we have to assume that it can make arbitrary transformations as long as the optimized program is consistent within a single thread.
So, as I see it, we need a second barrier, because it is possible that the compiler optimizes out the store frame.status = 0.
Yes, it would be a very wild optimization, but if the compiler could prove that no one in the context (within the thread) reads that field, it could optimize it out.
I believe that this is theoretically possible, isn't it?
So, to prevent that, we can add a second barrier:

User-space code:

    read frame          // read the frame contents
    _mm_mfence          // prevent "releasing" the frame before we have read everything
    frame.status = 0    // "release" the frame
    _mm_fence

OK, now the compiler restrains itself from optimizing the store away.
What do you think?
EDIT
The question is raised by the issue that _mm_mfence does not prevent stores from being optimized out.
@PeterCordes, to make sure I understand: _mm_mfence does not prevent stores from being optimized out (it is just an x86 memory barrier, not a compiler barrier)? However, atomic_thread_fence(any_order) prevents reorderings (depending on any_order, obviously), but does it also prevent stores from being optimized out?
For example:

    // x is an int pointer
    *x = 5;
    *(x + 4) = 6;
    std::atomic_thread_fence(std::memory_order_release);

Does this prevent the stores through x from being optimized out? It seems that it must; otherwise every store through x would have to be volatile.
However, I have seen a lot of lock-free code, and it does not make fields volatile.
_mm_mfence is also a compiler barrier. (See "When should I use _mm_sfence, _mm_lfence and _mm_mfence", and also BeeOnRope's answer there.)
atomic_thread_fence with release, acq_rel, or seq_cst stops earlier stores from merging with later stores, but an acquire fence doesn't have to.
Writes to non-atomic global variables can only be optimized out by merging with other writes to the same non-atomic variables, not by optimizing them away entirely. So the real question is what reorderings can happen that can let two non-atomic assignments come together.
There has to be an assignment to an atomic variable in there somewhere for there to be anything that another thread could synchronize with. Some compilers might give atomic_thread_fence stronger behaviour wrt. non-atomic variables, but in C++11 there's no way for another thread to legally observe anything about the ordering of *x and x[4] in:
#include <atomic>

std::atomic<int> shared_flag {0};
int x[8];

void writer() {
    *x = 0;
    x[4] = 0;
    std::atomic_thread_fence(std::memory_order_release);
    x[4] = 1;
    std::atomic_thread_fence(std::memory_order_release);
    shared_flag.store(1, std::memory_order_relaxed);
}
The store to shared_flag has to appear after the stores to x[0] and x[4], but it's only an implementation detail what order the stores to x[0] and x[4] happen in, and whether there are 2 stores to x[4].
For example, on the Godbolt compiler explorer gcc7 and earlier merge the stores to x[4], but gcc8 doesn't, and neither do clang or ICC. The old gcc behaviour does not violate the ISO C++ standard, but I think they strengthened gcc's thread_fence because it wasn't strong enough to prevent bugs in other cases.
For example,
void writer_gcc_bug() {
    *x = 0;
    std::atomic_thread_fence(std::memory_order_release);
    shared_flag.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    *x = 2;  // gcc7 and earlier merge this with the first store, which is arguably a bug
}
gcc only does shared_flag = 1; *x = 2; in that order. You could argue that there's no way for another thread to safely observe *x after seeing shared_flag == 1, because this thread writes it again right away with no synchronization. (i.e. data race UB in any potential observer makes this reordering arguably legal).
But gcc developers don't think that's enough reason (it may violate the guarantees of the builtin __atomic functions that the <atomic> header uses to implement the API). And there may be other cases where there is a real bug, where even a standards-conforming program could observe the aggressive reordering that violated the standard.
Apparently this changed on 2017-09 with the fix for gcc bug 80640.
Alexander Monakov wrote:
I think the bug is that on x86 __atomic_thread_fence(x) is expanded into nothing for x!=__ATOMIC_SEQ_CST, it should place a compiler barrier similar to expansion of __atomic_signal_fence.
(__atomic_signal_fence includes something as strong as asm("" ::: "memory" ).)
Yup, that would definitely be a bug. So it's not that gcc was being really clever and doing allowed reorderings; it was just mostly failing at thread_fence, and any correctness that did happen was due to other factors, like non-inline function boundaries! (And that it doesn't optimize atomics, only non-atomics.)
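For reference, here is a sketch of how the question's frame-release protocol could be expressed with C++11 atomics instead of raw fences (Frame and its fields are stand-ins for the real shared layout):

#include <atomic>

struct Frame {
    char data[64];            // frame contents, read by user space
    std::atomic<int> status;  // 0 = released back to the kernel
};

void release_frame(Frame& f)
{
    // ... read f.data here ...
    // The release store orders all earlier loads and stores before it, so it
    // both emits any required hardware barrier and stops the compiler from
    // sinking the reads past the release point; being a store to an atomic,
    // it also cannot simply be optimized away.
    f.status.store(0, std::memory_order_release);
}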

Does ios_base::sync_with_stdio(false) affect <fstream>?

It is well known that ios_base::sync_with_stdio(false) helps the performance of cin and cout in <iostream> by preventing synchronization between C and C++ I/O. However, I am curious whether it makes any difference at all for <fstream>.
I ran some tests with GNU C++11 and the following code (with and without the ios_base::sync_with_stdio(false) line):
#include <fstream>
#include <iostream>
#include <chrono>
using namespace std;

ofstream out("out.txt");

int main() {
    auto start = chrono::high_resolution_clock::now();
    long long val = 2;
    long long x = 1 << 22;
    ios_base::sync_with_stdio(false);
    while (x--) {
        val += x % 666;
        out << val << "\n";
    }
    auto end = chrono::high_resolution_clock::now();
    chrono::duration<double> diff = end - start;
    cout << diff.count() << " seconds\n";
    return 0;
}
The results are as follows:
With sync_with_stdio(false): 0.677863 seconds (average 3 trials)
Without sync_with_stdio(false): 0.653789 seconds (average 3 trials)
Is this to be expected? Is there a reason for the nearly identical, if not slower, speed with sync_with_stdio(false)?
Thank you for your help.
The idea of sync_with_stdio() is to allow mixing input and output to the standard stream objects (stdin, stdout, and stderr in C and std::cin, std::cout, std::cerr, and std::clog as well as their wide-character stream counterparts in C++) without any need to worry about characters being buffered in any of the buffers of the involved objects. Effectively, with std::ios_base::sync_with_stdio(true) the C++ IOStreams can't use their own buffers. In practice that normally means that buffering at the std::streambuf level is entirely disabled. Without a buffer, IOStreams are rather expensive, though, as they process individual characters, potentially involving multiple virtual function calls per character. Essentially, the speed-up you get from std::ios_base::sync_with_stdio(false) comes from allowing both the C and the C++ library to use their own buffers.
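As a small illustration, this is the kind of mixing that the default sync_with_stdio(true) makes well-defined (a sketch):

#include <cstdio>
#include <iostream>

int main()
{
    // With the default sync_with_stdio(true), these pieces appear in order.
    std::printf("one ");
    std::cout << "two ";
    std::printf("three\n");
    // After std::ios_base::sync_with_stdio(false), interleaving printf and
    // std::cout like this may produce scrambled output, because each library
    // then buffers independently.
}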
An alternative approach could be to share the buffer between the C and C++ library facilities, e.g., by building the C library facilities on top of the more powerful C++ library facilities (before people complain that this would be a terrible idea, making C I/O slower: that is actually not true at all with a proper implementation of the standard C++ library IOStreams). I'm not aware of any non-experimental implementation which does use that. With this setup std::ios_base::sync_with_stdio(value) wouldn't have any effect at all.
Typical implementations of IOStreams use different stream buffers for the standard stream objects than for file streams. Part of the reason is probably that the standard stream objects are normally not opened using a name but some other entity identifying them, e.g., a file descriptor on UNIX systems, and it would require a "back door" interface to allow using a std::filebuf for the standard stream objects. However, at least early implementations of Dinkumware's standard C++ library, which shipped (ships?), e.g., with MSVC++, used std::filebuf for the standard stream objects. This std::filebuf implementation was just a wrapper around FILE*, i.e., literally implementing what the C++ standard says rather than implementing its semantics. That was already a terrible idea to start with, but it was made worse by inhibiting std::streambuf-level buffering for all file streams with std::ios_base::sync_with_stdio(true), as that setting also affected file streams. I do not know whether this [performance] problem has been fixed since. Old issues of the C/C++ Users Journal and/or P.J. Plauger's "The [draft] Standard C++ Library" should contain a discussion of this implementation.
tl;dr: According to the standard std::ios_base::sync_with_stdio(false) only changes the constraints for the standard stream objects to make their use faster. Whether it has other effects depends on the IOStream implementation and there was at least one (Dinkumware) where it made a difference.

C++11 guarantee no concurrent calls

Are std::mutex and std::unique_ptr sufficient to guarantee that there will be no concurrent calls to an object? In the following code snippet, will Object be protected from concurrent calls?
class Example {
public:
    std::mutex Mutex;
    Example() { ... }
    // ...
private:
    static std::unique_ptr<Object> Mutex;
};
No, you would have to lock and unlock the mutex when you need it. Just the existence of a mutex is no guarantee, and a unique_ptr cannot change this!
A mutex usage example is in the reference: http://en.cppreference.com/w/cpp/thread/mutex
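What that locking could look like, as a minimal sketch (Object and its do_work() member are hypothetical):

#include <memory>
#include <mutex>

class Object {
public:
    void do_work() {}  // hypothetical payload
};

class Example {
public:
    void safe_call() {
        std::lock_guard<std::mutex> lock(Mutex);  // held for the whole call
        Obj->do_work();                           // no concurrent calls while locked
    }
private:
    std::mutex Mutex;
    std::unique_ptr<Object> Obj{new Object};
};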
const, however, does now guarantee that nothing can change your object while you are using it through a const reference.
This obviously supposes that you use a well-coded object (like the STL containers) and that no one tries to work around the compiler checks.
See : http://herbsutter.com/2013/01/01/video-you-dont-know-const-and-mutable/
unique_ptr is mainly related to resource management and the RAII idiom. You don't need unique_ptr to achieve thread safety, even if in some scenarios it may prove useful. Also, the example that you provided is a bit unusual: you used the identifier Mutex twice, to name both a std::mutex and a std::unique_ptr.
I suggest getting some good books on C++ before delving into more complex topics like concurrency and resource management.
This nice article by Diego Dagum explains what is new regarding concurrency in C++11:
http://msdn.microsoft.com/en-us/magazine/hh852594.aspx
And an article about unique_ptr:
http://www.drdobbs.com/cpp/c11-uniqueptr/240002708

When using CoTaskMemAlloc, should I always call CoTaskMemFree?

I'm writing some COM and ATL code, and for some reason all the existing code uses CoTaskMemAlloc to allocate memory instead of new or malloc. So I followed this coding style, and I also use CoTaskMemAlloc.
My teachers taught me to always delete or free what I allocate. However, I'm not sure: should I always be calling CoTaskMemFree if I use CoTaskMemAlloc?
Using the CRT's new/malloc and delete/free is a problem in COM interop. To make them work, it is very important that the same copy of the CRT both allocates and releases the memory. That's impossible to enforce in a COM interop scenario: your COM server and the client are practically guaranteed to use different versions of the CRT, each using their own heap to allocate from. This causes undiagnosable memory leaks on Windows XP and a hard exception on Vista and up.
Which is why the COM heap exists: a single predefined heap in a process that's used by both the server and the client. IMalloc is the generic interface to access that shared heap; CoTaskMemAlloc() and CoTaskMemFree() are the system-provided helper functions that use that interface.
That said, this is only necessary in a case where the server allocates the memory and the client has to release it, or the other way around. This should always be rare in an interop scenario; the odds for accidents are just too large. In COM Automation there are just two such cases, BSTR and SAFEARRAY, types that are already wrapped. You avoid it in other cases by having the method caller provide the memory and the callee fill it in. That also allows a strong optimization: the memory can come from the caller's stack.
Review the code and check who allocates the memory and who needs to release it. If both exist in the same module then using new/malloc is fine because there's now a hard guarantee that the same CRT instance takes care of it. If that's not the case then consider fixing it so the caller provides the memory and releases it.
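A sketch of that caller-provides-the-memory pattern (GetWidgetName and its signature are made up for illustration):

#include <cstddef>
#include <cwchar>

// The caller owns the buffer and the callee only fills it in, so no
// allocator ever has to cross the module boundary.
bool GetWidgetName(wchar_t* buffer, std::size_t cch)
{
    const wchar_t name[] = L"widget";
    if (cch < sizeof(name) / sizeof(name[0]))
        return false;             // stand-in for a COM error HRESULT
    std::wcscpy(buffer, name);
    return true;                  // stand-in for S_OK
}

int main()
{
    wchar_t buf[16];              // memory comes from the caller's stack
    GetWidgetName(buf, 16);
}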
The allocation and freeing of memory must always come from the same source. If you use CoTaskMemAlloc then you must use CoTaskMemFree to free the memory.
Note, though, that in C++ the acts of managing memory and of object construction/destruction (new/delete) are independent. It's possible to have specific types use a different memory allocator while still allowing the standard new/delete syntax, which is preferred. For example:
#include <objbase.h>   // CoTaskMemAlloc / CoTaskMemFree
#include <cstddef>     // std::size_t
#include <new>         // std::bad_alloc

class MyClass {
public:
    void* operator new(std::size_t size) {
        // Route all allocations for this type through the COM task allocator.
        if (void* p = ::CoTaskMemAlloc(size))
            return p;
        throw std::bad_alloc();  // operator new must not return null
    }
    void* operator new[](std::size_t size) {
        if (void* p = ::CoTaskMemAlloc(size))
            return p;
        throw std::bad_alloc();
    }
    void operator delete(void* pMemory) {
        ::CoTaskMemFree(pMemory);
    }
    void operator delete[](void* pMemory) {
        ::CoTaskMemFree(pMemory);
    }
};
Now I can use this type just like any other C++ type, and yet the memory will come from the COM heap:

// Normal object construction, but memory comes from CoTaskMemAlloc
MyClass* pClass = new MyClass();
...
// Normal object destruction, and memory freed with CoTaskMemFree
delete pClass;
The answer to the question is: Yes, you should use CoTaskMemFree to free memory allocated with CoTaskMemAlloc.
The other answers do a good job explaining why CoTaskMemAlloc and CoTaskMemFree are necessary for memory passed between COM servers and COM clients, but they didn't directly answer your question.
Your teacher was right: You should always use the corresponding release function for any resource. If you use new, use delete. If you use malloc, use free. If you use CreateFile, use CloseHandle. Etc.
Better yet, in C++, use RAII objects that allocate the resource in the constructor and release the resource in the destructor, and then use those RAII wrappers instead of the bare functions. This makes it easier and cleaner to write code that doesn't leak, even if you get something like an exception.
The standard template library provides containers that implement RAII, which is why you should learn to use a std::vector or std::string rather than allocating bare memory and trying to manage it yourself. There are also smart pointers like std::shared_ptr and std::unique_ptr that can be used to make sure the right release call is always made at the right time.
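For instance, a std::unique_ptr with a custom deleter can pair CoTaskMemAlloc with CoTaskMemFree automatically (a sketch; CoTaskMemDeleter and co_task_ptr are made-up names):

#include <objbase.h>
#include <memory>

struct CoTaskMemDeleter {
    void operator()(void* p) const { ::CoTaskMemFree(p); }
};

template <typename T>
using co_task_ptr = std::unique_ptr<T, CoTaskMemDeleter>;

int main()
{
    // The buffer is freed with CoTaskMemFree even on an early return or exception.
    co_task_ptr<wchar_t> buf(
        static_cast<wchar_t*>(::CoTaskMemAlloc(64 * sizeof(wchar_t))));
}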
ATL provides some classes like ATL::CComPtr, wrapper objects that handle the reference counting of COM objects for you. They are not foolproof and, in fact, have a few more gotchas than most of the modern STL classes, so read the documentation carefully. When used correctly, it's relatively easy to make sure the AddRef and Release calls all match up.

Protecting memory from changing

Is there a way to protect an area of memory?
I have this struct:
#define BUFFER 4

struct
{
    char s[BUFFER - 1];
    const char zc;
} str = {'\0'};

printf("'%s', zc=%d\n", str.s, str.zc);
It is supposed to hold strings of length BUFFER-1 and guarantee that they end in '\0'.
But the compiler gives an error only for:
str.zc = 'e'; /* error */
Not for:
str.s[3] = 'e'; /* no error */
If compiling with gcc and some flag would do it, that is good as well.
Thanks,
Beco
To detect errors at runtime, take a look at the -fstack-protector-all option in gcc. It may be of limited use when attempting to detect very small overflows like the one you described.
Unfortunately you aren't going to find a lot of info on detecting buffer overflow scenarios like the one you described at compile time. From a C language perspective the syntax is totally correct, and the language gives you just enough rope to hang yourself with. If you really want to protect your buffers from yourself, you can write a front-end to array accesses that validates the index before it allows access to the memory.
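A sketch of such a checked front-end (set_char and its assert-based check are illustrative, not a standard facility):

#include <cassert>
#include <cstddef>

#define BUFFER 4

struct guarded {
    char s[BUFFER - 1];
    const char zc;
};

// Validate the index before writing, so s[] can never be overrun into zc.
inline void set_char(guarded& g, std::size_t i, char c)
{
    assert(i < sizeof g.s);  // trips at runtime on an out-of-range index
    g.s[i] = c;
}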
