D Dynamic Arrays - RAII - memory-management

D Dynamic Arrays - RAII - memory-management

I admit I have no deep understanding of D at this point, my knowledge relies purely on what documentation I have read and the few examples I have tried.
In C++ you could rely on the RAII idiom to call the destructor of objects on exiting their local scope.
Can you in D?
I understand D is a garbage collected language, and that it also supports RAII.
Why does the following code not cleanup the memory as it leaves a scope then?
import std.stdio;
void main() {
{
const int len = 1000 * 1000 * 256; // ~1GiB
int[] arr;
arr.length = len;
arr[] = 99;
}
while (true) {}
}
The infinite loop is there so as to keep the program open to make residual memory allocations easy visible.
A comparison of a equivalent same program in C++ is shown below.
It can be seen that C++ immediately cleaned up the memory after allocation (the refresh rate makes it appear as if less memory was allocated), whereas D kept it even though it had left scope.
Therefore, when does the GC cleanup?

scope declarations are going in D2, so I'm not terribly certain on the semantics, but what I'd imagine is happening is that scope T[] a; only allocates the array struct on the stack (which needless to say, already happens, regardless of scope). As they are going, don't use scope (using scope(exit) and friends is different -- keep using them).
Dynamic arrays always use the GC to allocate their memory -- there's no getting around that. If you want something more deterministic, using std.container.Array would be the simplest manner, as I think you could pretty much drop it in where your scope vector3b array is:
Array!vector3b array
Just don't bother setting the length to zero -- the memory will be free'd once it goes out of scope (Array uses malloc/free from libc under the hood).

No, you cannot assume that the garbage collector will collect your object at any point in time.
There is, however, a delete keyword (as well as a scope keyword) that can delete an object deterministically.
scope is used like:
{
scope auto obj = new int[5];
//....
} //obj cleaned up here
and delete is used like in C++ (there's no [] notation for delete).
There are some gotcha's, though:
It doesn't always work properly (I hear it doesn't work well with arrays)
The developers of D (e.g. Andrei) are intending to remove them in later versions, because it can obviously mess up things if used incorrectly. (I personally hate this, given that it's so easy to screw things up anyway, but they're sticking with removing it, and I don't think people can convince them otherwise although I'd love it if that was the case.)
In its place, there is already a clear method that you can use, like arr.clear(); however, I'm not quite sure what it exactly does yet myself, but you could look at the source code in object.d in the D runtime if you're interested.
As to your amazement: I'm glad you're amazed, but it shouldn't be really surprising considering that they're both native code. :-)

Related

Do I change the value of a defglobal in a correct way?

I'm writing an app which should at some point get the value of a defglobal variable and change it. For this I do the following:
DATA_OBJECT cur_time_q;
if (!EnvGetDefglobalValue(CLIEnvironment, cur_timeq_kw, &cur_time_q)) return CUR_TIME_GLBVAR_MISSING;
uint64_t cur_time = t_left;
SetType(cur_time_q, INTEGER);
void* val = EnvAddLong(CLIEnvironment, cur_time);
SetValue(cur_time_q, val);
EnvSetDefglobalValue(CLIEnvironment, cur_timeq_kw, &cur_time_q);
I partly took this approach from "Advanced Programming Guide" and it works fine, but I have some questions:
Does EnvAddLong(...) add a value, which will retain in memory, until the environment is destroyed? May it consume memory and increase the execution time of other API-functions like EnvRun(...), if the function with this fragment of code is called for, say, several thousand iterations?
Isn't it overkill? Should I go for something like EnvEval("(bind ...)") instead?

There's information in the CLIPS Advanced Programming Guide on how CLIPS handles garbage collection. API calls like EnvAddLong which are used to create values to pass to other API functions don't trigger garbage collection. Generally, API calls which cause code to execute or deallocate data structures such as Run, Reset, Clear, and Eval, trigger garbage collection and will deallocate any transient data created by functions like EnvAddLong. So if your program design repeatedly assigns values to globals and then runs, any CLIPS data structures you allocate will eventually be freed once the data is confirmed to be garbage and is no longer referenced by any CLIPS data structures.
If you can easily construct a string to pass to the Eval function, it's often easier to do this rather than make multiple API calls to achieve the same result.
The API was overhauled in release 6.4, so many tasks such as assigning a value to a defglobal can be done with one step rather than several.
CLIPSValue rv;
Defglobal *global;
mainEnv = CreateEnvironment();
Build(mainEnv,"(defglobal ?*x* = 3.1)");
Eval(mainEnv,"?*x*",&rv);
printf("%lf\n",rv.floatValue->contents);
global = FindDefglobal(mainEnv,"x");
if (global != NULL)
{
DefglobalSetInteger(global,343433);
Eval(mainEnv,"(println ?*x*)",NULL);
DefglobalGetValue(global,&rv);
printf("%lf\n",rv.floatValue->contents);
}

What happens when the raw pointer from shared_ptr get() is deleted?

I wrote some code like this:
shared_ptr<int> r = make_shared<int>();
int *ar = r.get();
delete ar; // report double free or corruption
// still some code
When the code ran up to delete ar;, the program crashed, and reported "double free or corruption", I'm confused why double free? The "r" is still in the scope, and not popped-off from stack. Do the delete operator do something magic?? Does it know the raw pointer is handled by a smart pointer currently? and then counter in "r" be decremented to zero automatically?
I know the operations is not recommended， but I want to know why?

You are deleting a pointer that didn't come from new, so you have undefined behavior (anything can happen).
From cppreference on delete:
For the first (non-array) form, expression must be a pointer to an object type or a class type contextually implicitly convertible to such pointer, and its value must be either null or pointer to a non-array object created by a new-expression, or a pointer to a base subobject of a non-array object created by a new-expression. If expression is anything else, including if it is a pointer obtained by the array form of new-expression, the behavior is undefined.
If the allocation is done by new, we can be sure that the pointer we have is something we can use delete on. But in the case of shared_ptr.get(), we cannot be sure if we can use delete because it might not be the actual pointer returned by new.

shared_ptr<int> r = make_shared<int>();
There is no guarantee that this will call new int (which isn't strictly observable by the user anyway) or more generally new T (which is observable with a user defined, class specific operator new); in practice, it won't (there is no guarantee that it won't).
The discussion that follows isn't just about shared_ptr, but about "smart pointers" with ownership semantics. For any owning smart pointer smart_owning:
The primary motivation for make_owning instead of smart_owning<T>(new T) is to avoid having a memory allocation without owner at any time; that was essential in C++ when order of evaluation of expressions didn't provide the guarantee that evaluation of the sub-expressions in the argument list was immediately before call of that function; historically in C++:
f (smart_owning<T>(new T), smart_owning<U>(new U));
could be evaluated as:
T *temp1 = new T;
U *temp2 = new U;
auto &&temp3 = smart_owning<T>(temp1);
auto &&temp4 = smart_owning<U>(temp2);
This way temp1 and temp2 are not managed by any owning object for a non trivial time:
obviously new U can throw an exception
constructing an owning smart pointer usually requires the allocation of (small) ressources and can throw
So either temp1 or temp2 could be leaked (but not both) if an exception was thrown, which was the exact problem we were trying to avoid in the first place. This means composite expressions involving construction of owning smart pointers was a bad idea; this is fine:
auto &&temp_t = smart_owning<T>(new T);
auto &&temp_u = smart_owning<U>(new U);
f (temp_t, temp_u);
Usually expression involving as many sub-expression with function calls as f (smart_owning<T>(new T), smart_owning<U>(new U)) are considered reasonable (it's a pretty simple expression in term of number of sub-expressions). Disallowing such expressions is quite annoying and very difficult to justify.
[This is one reason, and in my opinion the most compelling reason, why the non determinism of the order of evaluation was removed by the C++ standardisation committee so that such code is not safe. (This was an issue not just for memory allocated, but for any managed allocation, like file descriptors, database handles...)]
Because code frequently needed to do things such as smart_owning<T>(allocate_T()) in sub-expressions, and because telling programmers to decompose moderately complex expressions involving allocation in many simple lines wasn't appealing (more lines of code doesn't mean easier to read), the library writers provided a simple fix: a function to do the creation of an object with dynamic lifetime and the creation of its owning object together. That solved the order of evaluation problem (but was complicated at first because it needed perfect forwarding of the arguments of the constructor).
Giving two tasks to a function (allocate an instance of T and a instance of smart_owning) gives the freedom to do an interesting optimization: you can avoid one dynamic allocation by putting both the managed object and its owner next to each others.
But once again, that was not the primary purpose of functions like make_shared.
Because exclusive ownership smart pointers by definition don't need to keep a reference count, and by definition don't need to share the data needed for the deleter either between instances, and so can keep that data in the "smart pointer"(*), no additional allocation is needed for the construction of unique_ptr; yet a make_unique function template was added, to avoid the dangling pointer issue, not to optimize a non-thing (an allocation that isn't done in the fist place).
(*) which BTW means unique owner "smart pointers" do not have pointer semantic, as pointer semantic implies that you can makes copies of the "pointer", and you can't have two copies of a unique owner pointing to the same instance; "smart pointers" were never pointers anyway, the term is misleading.
Summary:
make_shared<T> does an optional optimization where there is no separate dynamic memory allocation for T: there is no operator new(sizeof (T)). There is obviously still the creation of an instance with dynamic lifetime with another operator new: placement new.
If you replace the explicit memory deallocation with an explicit destruction and add a pause immediately after that point:
class C {
public:
~C();
};
shared_ptr<C> r = make_shared<C>();
C *ar = r.get();
ar->~C();
pause(); // stops the program forever
The program will probably run fine; it is still illogical, indefensible, incorrect to explicitly destroy an object managed by a smart pointer. It isn't "your" resource. If pause() could exit with an exception, the owning smart pointer would try to destroy the managed object which doesn't even exist anymore.

It of course depends on how library implements make_shared, however most probable implementation is that:
std::make_shared allocates one block for two things:
shared pointer control block
contained object
std::make_shared() will invoke memory allocator once and then it will call placement new twice to initialize (call constructors) of mentioned two things.
| block requested from allocator |
| shared_ptr control block | X object |
#1 #2 #3
That means that memory allocator has provided one big block, which address is #1.
Shared pointer then uses it for control block (#1) and actual contained object (#2).
When you invoke delete with actual object kept by shred_ptr ( .get() ) you call delete(#2).
Because #2 is not known by allocator you get an corruption error.

See here. I quot:
std::shared_ptr is a smart pointer that retains shared ownership of an object through a pointer. Several shared_ptr objects may own the same object. The object is destroyed and its memory deallocated when either of the following happens:
the last remaining shared_ptr owning the object is destroyed;
the last remaining shared_ptr owning the object is assigned another pointer via operator= or reset().
The object is destroyed using delete-expression or a custom deleter that is supplied to shared_ptr during construction.
So the pointer is deleted by shared_ptr. You're not suppose to delete the stored pointer yourself
UPDATE:
I didn't realize that there were more statements and the pointer was not out of scope, I'm sorry.
I was reading more and the standard doesn't say much about the behavior of get() but here is a note, I quote:
A shared_ptr may share ownership of an object while storing a pointer to another object. get() returns the stored pointer, not the managed pointer.
So it looks that it is allowed that the pointer returned by get() is not necessarily the same pointer allocated by the shared_ptr (presumably using new). So delete that pointer is undefined behavior. I will be looking a little more into the details.
UPDATE 2:
The standard says at § 20.7.2.2.6 (about make_shared):
6 Remarks: Implementations are encouraged, but not required, to perform no more than one memory allocation. [ Note: This provides efficiency equivalent to an intrusive smart pointer. — end note ]
7 [ Note: These functions will typically allocate more memory than sizeof(T) to allow for internal bookkeeping structures such as the reference counts. — end note ]
So an specific implementation of make_shared could allocate a single chunk of memory (or more) and use part of that memory to initialize the stored pointer (but maybe not all the memory allocated). get() must return a pointer to the stored object, but there is no requirement by the standard, as previously said, that the pointer returned by get() has to be the one allocated by new. So delete that pointer is undefined behavior, you got a signal raised but anything can happen.

Sharing memory with the kernel and compiler optimizations

a frame is shared with a kernel.
User-space code:
read frame // read frame content
_mm_mfence // prevent before "releasing" a frame before we read everything.
frame.status = 0 // "release" a frame
Kernel code:
poll for frame.status // reads a frame's status
_mm_lfence
Kernel can poll it asynchronically, in another thread. So, there is no syscall between userspace code and kernelspace.
Is it correctly synchronized?
I doubt because of the following situation:
A compiler has a weak memory model and we have to assume that it can do wild changes as you can imagine if optimizied/changed program is consistent within one-thread.
So, on my eye we need a second barrier because it is possible that a compiler optimize out store frame.status, 0.
Yes, it will be a very wild optimization but if a compiler would be able to prove that noone in the context (within thread) reads that field it can optimize out it.
I believe that it is theoretically possibe, isn't it?
So, to prevent that we can put the second barrier:
User-space code:
read frame // read frame content
_mm_mfence // prevent before "releasing" a frame before we read everything.
frame.status = 0 // "release" a frame
_mm_fence
Ok, now compiler restrain itself before optimization.
What do you think?
EDIT
[The question is raised by the issue that __mm_fence does not prevent before optimizations-out.
#PeterCordes, to make sure myself: __mm_fence does not prevent before optimizations out (it is just x86 memory barrier, not compiler). However, atomic_thread_fence(any_order) prevents before reorderings (it depends on any_order, obviously) but it also prevents before optimizations out?
For example:
// x is an int pointer
*x = 5
*(x+4) = 6
std::atomic_thread_barrier(memory_order_release)
prevents before optimizations out of stores to x? It seems that it must- otherwise every store to x should be volatile.
However, I saw a lot of lock-free code and there is no making fields as volatile.

_mm_mfence is also a compiler barrier. (See When should I use _mm_sfence _mm_lfence and _mm_mfence, and also BeeOnRope's answer there).
atomic_thread_fence with release, rel_acq, or seq_cst stops earlier stores from merging with later stores. But mo_acquire doesn't have to.
Writes to non-atomic globals variables can only be optimized out by merging with other writes to the same non-atomic variables, not by optimizing them away entirely. So the real question is what reorderings can happen that can let two non-atomic assignments come together.
There has to be an assignment to an atomic variable in there somewhere for there to be anything that another thread could synchronize with. Some compilers might give atomic_thread_fence stronger behaviour wrt. non-atomic variables, but in C++11 there's no way for another thread to legally observer anything about the ordering of *x and x[4] in
#include <atomic>
std::atomic<int> shared_flag {0};
int x[8];
void writer() {
*x = 0;
x[4] = 0;
atomic_thread_fence(mo_release);
x[4] = 1;
atomic_thread_fence(mo_release);
shared_flag.store(1, mo_relaxed);
}
The store to shared_flag has to appear after the stores to x[0] and x[4], but it's only an implementation detail what order the stores to x[0] and x[4] happen in, and whether there are 2 stores to x[4].
For example, on the Godbolt compiler explorer gcc7 and earlier merge the stores to x[4], but gcc8 doesn't, and neither do clang or ICC. The old gcc behaviour does not violate the ISO C++ standard, but I think they strengthened gcc's thread_fence because it wasn't strong enough to prevent bugs in other cases.
For example,
void writer_gcc_bug() {
*x = 0;
std::atomic_thread_fence(std::memory_order_release);
shared_flag.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
*x = 2; // gcc7 and earlier merge this, which arguably a bug
}
gcc only does shared_flag = 1; *x = 2; in that order. You could argue that there's no way for another thread to safely observe *x after seeing shared_flag == 1, because this thread writes it again right away with no synchronization. (i.e. data race UB in any potential observer makes this reordering arguably legal).
But gcc developers don't think that's enough reason, (it may be violating the guarantees of the builtin __atomic functions that the <atomic> header uses to implement the API). And there may be other cases where there is a real bug that even a standards-conforming program could observe the aggressive reordering that violated the standard.
Apparently this changed on 2017-09 with the fix for gcc bug 80640.
Alexander Monakov wrote:
I think the bug is that on x86 __atomic_thread_fence(x) is expanded into nothing for x!=__ATOMIC_SEQ_CST, it should place a compiler barrier similar to expansion of __atomic_signal_fence.
(__atomic_signal_fence includes something as strong as asm("" ::: "memory" ).)
Yup that would definitely be a bug. So it's not that gcc was being really clever and doing allowed reorderings, it was just mostly failing at thread_fence, and any correctness that did happen was due to other factors, like non-inline function boundaries! (And that it doesn't optimize atomics, only non-atomics.)

Free C pointer when collected by GC

I have a package that interfaces with a C library. Now I need to store a pointer to a C struct in the Go struct
type A struct {
s *C.struct_b
}
Obviously this pointer needs to be freed before the struct is collected by the GC. How can I accomplish that?

The best thing to do is when possible copy the C struct into go controlled memory.
var ns C.struct_b
ns = *A.s
A.s = &ns
Obviously, that won't work in all cases. C.struct_b may be too complicated or shared with something still in C code. In this case, you need to create a .Free() or .Close() method (whichever makes more sense) and document that the user of your struct must call it. In Go, a Free method should always be safe to call. For example, after free is run, be sure to set A.s = nil so that if the user calls Free twice, the program does not crash.
There is also a way to create finalizers. See another answer I wrote here. However, they may not always run and if garbage is created fast enough, it is very possible that the creation of garbage will out pace collection. This should be considered as a supplement to having a Free/Close method and not a replacement.

Call to _freea really necessary?

I am developping on Windows with DevStudio, in C/C++ unmanaged.
I want to allocate some memory on the stack instead of the heap because I don't want to have to deal with releasing that memory manually (I know about smart pointers and all those things. I have a very specific case of memory allocation I need to deal with), similar to the use of A2W() and W2A() macros.
_alloca does that, but it is deprecated. It is suggested to use malloca instead. But _malloca documentation says that a call to ___freea is mandatory for each call to _malloca. It then defeats my purpose to use _malloca, I will use malloc or new instead.
Anybody knows if I can get away with not calling _freea without leaking and what the impacts are internally?
Otherwise, I will end-up just using deprecated _alloca function.

It is always important to call _freea after every call to _malloca.
_malloca is like _alloca, but adds some extra security checks and enhancements for your protection. As a result, it's possible for _malloca to allocate on the heap instead of the stack. If this happens, and you do not call _freea, you will get a memory leak.
In debug mode, _malloca ALWAYS allocates on the heap, so also should be freed.
Search for _ALLOCA_S_THRESHOLD for details on how the thresholds work, and why _malloca exists instead of _alloca, and it should make sense.
Edit:
There have been comments suggesting that the person just allocate on the heap, and use smart pointers, etc.
There are advantages to stack allocations, which _malloca will provide you, so there are reasons for wanting to do this. _alloca will work the same way, but is much more likely to cause a stack overflow or other problem, and unfortunately does not provide nice exceptions, but rather tends to just tear down your process. _malloca is much safer in this regard, and protects you, but the cost is that you still need to free your memory with _freea since it's possible (but unlikely in release mode) that _malloca will choose to allocate on the heap instead of the stack.
If your only goal is to avoid having to free memory, I would recommend using a smart pointer that will handle the freeing of memory for you as the member goes out of scope. This would assign memory on the heap, but be safe, and prevent you from having to free the memory. This will only work in C++, though - if you're using plain ol' C, this approach will not work.
If you are trying to allocate on the stack for other reasons (typically performance, since stack allocations are very, very fast), I would recommend using _malloca and living with the fact that you'll need to call _freea on your values.

Another thing to consider is using an RAII class to manage the allocation - of course that's only useful if your macro (or whatever) can be restricted to C++.
If you want to avoid hitting the heap for performance reasons, take a look at the techniques used by Matthew Wilson's auto_buffer<> template class (http://www.stlsoft.org/doc-1.9/classstlsoft_1_1auto__buffer.html). This will allocate on the stack unless your runtime size request exceeds a size specified at compiler time - so you get the speed of no heap allocation for the majority of allocations (if you size the template right), but everything still works correctly if your exceed that size.
Since STLsoft has a whole lot of cruft to deal with portability issues, you may want to look at a simpler version of auto_buffer<> which is described in Wilson's book, "Imperfect C++".
I found it quite handy in an embedded project.

To allocate memory on the stack, simply declare a variable of the appropriate type and size.

I answered this before, but I'd missed something fundamental that meant that it only worked in debug mode. I moved the call to _malloca into the constructor of a class that would auto-free.
In debug this is fine, as it always allocates on the heap. However, in release, it allocates on the stack, and upon returning from the constructor, the stack pointer has been reset, and the memory lost.
I went back and took a different approach, resulting in a combination of using a macro (eurgh) to allocate the memory and instantiate an object that will automatically call _freea on that memory. As it's a macro, it's allocated in the same stack frame, and so will actually work in release mode. It's just as convenient as my class, but slightly less nice to use.
I did the following:
class EXPORT_LIB_CLASS CAutoMallocAFree
{
public:
CAutoMallocAFree( void *pMem ) : m_pMem( pMem ) {}
~CAutoMallocAFree() { _freea( m_pMem ); }
private:
void *m_pMem;
CAutoMallocAFree();
CAutoMallocAFree( const CAutoMallocAFree &rhs );
CAutoMallocAFree &operator=( const CAutoMallocAFree &rhs );
};
#define AUTO_MALLOCA( Var, Type, Length ) \
Type* Var = (Type *)( _malloca( ( Length ) * sizeof ( Type ) ) ); \
CAutoMallocAFree __MALLOCA_##Var( (void *) Var );
This way I can allocate using the following macro call, and it's released when the instantiated class goes out of scope:
AUTO_MALLOCA( pBuffer, BYTE, Len );
Ar.LoadRaw( pBuffer, Len );
My apologies for posting something that was plainly wrong!

If you're using _malloca() then you must call _freea() to prevent memory leak because _malloca() can do the allocation either on stack or heap. It resorts to allocate on heap if the given size exceeds_ALLOCA_S_THRESHOLD value. Thus, it's safer to call _freea() which won't do anything if allocation happened on stack.
If you're using _alloca() which seems to be deprecated as of today; there is no need to call _freea() as the allocation happens on stack.

If your concern is having to free temp memory, and you know all about things like smart-pointers then why not use a similar pattern where memory is freed when it goes out of scope?
template <class T>
class TempMem
{
TempMem(size_t size)
{
mAddress = new T[size];
}
~TempMem
{
delete [] mAddress;
}
T* mAddress;
}
void foo( void )
{
TempMem<int> buffer(1024);
// alternatively you could override the T* operator..
some_memory_stuff(buffer.mAddress);
// temp-mem auto-freed
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio