C++ std::atomic and shared memory? - c++11

I have 4096 bytes of shared memory allocated. How to treat it as an array of std::atomic<uint64_t> objects?
The final goal is to place array of 64-bit variables to shared memory and perform __sync_fetch_and_add (GCC built-in) on these variables. But I would prefer using native C++11 code instead of using GCC built-ins. So how to use the allocated memory as an std::atomic objects? Should I invoke placement new on 512 counters? What If std::atomic's constructor require additional memory allocations in some imeplementations? Should I consider the aligning of std::atomic object in shared memory?

With C++20, you can use std::atomic_ref if you can make sure that your objects are suitably aligned. For C++17 and anything older, boost::atomic_ref might work as well, though I haven't tested it.
If you don't want to use Boost, then compiler builtins are the only solution left. In that case, you should prefer the __atomic builtins over the old __sync functions, as stated on the GCC documentation page for atomics.

Related

why does arm atomic_[read/write] operations implemented as volatile pointers?

He is example of atomic_read implementation:
#define atomic_read(v) (*(volatile int *)&(v)->counter)
Also, should we explicitly use memory barriers for atomic operations on arm?
He is example of atomic_read implementation:
A problematic one actually, which assumes that a cast is not a nop, which isn't guaranteed.
Also, should we explicitly use memory barriers for atomic operations
on arm?
Probably. It depends on what you are doing and what you are expecting.
Yes, the casting to volatile is to prevent the compiler from assuming the value of v cannot change. As for using memory barriers, the GCC builtins already allow you to specify the memory ordering you desire, no need to do it manually: https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/_005f_005fatomic-Builtins.html#g_t_005f_005fatomic-Builtins
The default behavior on GCC is to use __ATOMIC_SEQ_CST which will emit the barriers necessary on Arm to make sure your atomics execute in the order you place them in the code. To optimize performance on Arm, you will want to consider using weaker semantics to allow the compiler to elide barriers and let the hardware execute faster. For more information on the types of memory barriers the Arm architecture has, see https://developer.arm.com/docs/den0024/latest/memory-ordering/barriers.

Memory Management in Glib

How is storage/memory reclaimed in Glib? I've called g_object_unref() and the ref-counts are zero but I'm not sure any storage is ever reclaimed.
Do I need to call a routine? If so, which routine. If not, what?
Much of the memory allocation in GLib is done using the slice allocator, which has better performance when allocating lots of identical-sized blocks of memory, as happens a lot in GLib-using code.
You won't see memory usage jump up and down with the slice allocator in the same way that you would when using traditional malloc. The slice allocator often keeps memory in use for a while in order to reallocate it to other blocks.
If you want to force the slice allocator to behave like malloc, use the environment variable G_SLICE=always-malloc. That's not recommended for production, but it is the recommended way to use valgrind on GLib programs.

How to reserve a range of memory in data section (RAM) and the prevent heap/stack of same application using that memory?

I want to reserve/allocate a range of memory in RAM and the same application should not overwrite or use that range of memory for heap/stack storage. How to allocate a range of memory in ram protected from stack/heap overwrite?
I thought about adding(or allocating) an array to the application itself and reserve memory, But its optimized out by compiler as its not referenced anywhere in the application.
I am using ARM GNU toolchain for compiling.
There are several solutions to this problem. Listing in best to worse order,
Use the linker
Annotate the variable
Global scope
Volatile (maybe)
Linker script
You can obviously use a linker file to do this. It is the proper tool for the job. Pass the linker the --verbose parameter to see what the default script is. You may then modify it to precisely reserve the memory.
Variable Attributes
With more recent versions of gcc, the attribute used will also do what you want. Most modern gcc versions will support this. It is also significantly easier than the linker script; but only the linker script gives precise control over the position of the hole in a reliable manner.
Global scope
You may also give your array global scope and the compiler should not eliminate it. This may not be true if you use link time optimization.
Volatile
Theoretically, a compiler may eliminate a static volatile array. The volatile comes into play when you have code involving the array. It modifies the access behavior so the compiler will never caches access to that range. Dr. Dobbs on volatile At least the behavior is unclear to me and I would not recommend this method. It may work with some versions (and optimization levels) of the compiler and not others.
Limitations
Also, the linker option -gc-sections, can eliminate space reserved with either the global scope and the volatile methods as the symbol may not be annotated in any way in object formats; see the linker script (KEEP).
Only the Linker script can definitely restrict over-writes by the stack. You need to position the top of the stack before your reserved area. Typically, the heap grows up and the stack grows down. So these two collide with each other. This is particular to your environment/C library (for instance newlib is the typical ARM bare metal library). Looking at the linker file will give the best clue to this.
My guess is you want a fallow area to reserve for some sort of debugging information in the event of a system crash? A more explicit explaination of you problem would be helpful. You don't seem to be concerned with the position of the memory, so I guess this is not hardware related.

Structure memory alignment - Compile time vs Dynamically allocated memory

I was just going through glibc manual for description about posix_memalign function when I encountered this statement:
The address of a block returned by malloc or realloc in the GNU system is always a
multiple of eight (or sixteen on 64-bit systems). If you need a block whose address is a
multiple of a higher power of two than that, use memalign, posix_memalign, or valloc.
If I consider a simple structure containing just an int data member:
struct Mystruct
{
int member;
};
Then I can see that Mystruct should be 4-byte aligned. But according to libc manual on a 64-bit architecture, dynamically allocating memory for such structure would return memory allocated on an address with 16 byte alignment.
Correct me if I am wrong. To me it seems like compiler uses natural alignment of a structure only for global/static/automatic variables (data, bss, stack). But on the other hand, to allocate the same structure on heap memory, the malloc call uses a predefined alignment(8 on 32-bit architectures and 16 on 64 bit architectures) ?
One thing worth remembering is that malloc() is nothing magical -- it's just a function. Likewise, the heap is just a bunch of memory that the compiler promises not to touch for its stuff. Allocating things statically (on the stack or in data/bss) is something that the compiler does do, so it can align them with whatever alignment is specified. malloc() (and the associated functions) are functions that manage the heap, but they are called at runtime. The compiler doesn't know anything about malloc() except that it's a function, and malloc() doesn't know anything about the memory that you're asking it to allocate except how much you're asking for. Because 16-bit aligned is usually fine for most common uses, malloc() ensures this alignment to be safe.
PS: A fun (and eye-opening) exercise is to write your own malloc() implementation.

How to ensure that malloc and mmap cooperate, i.e., work on non-overlapping memory regions?

My main problem is that I need to enable multiple OS processes to communicate via a large shared memory heap that is mapped to identical address ranges in all processes. (To make sure that pointer values are actually meaningful.)
Now, I run into trouble that part of the program/library is using standard malloc/free and it seems to me that the underlying implementation does not respect mappings I create with mmap.
Or, another option is that I create mappings in regions that malloc already planned to use.
Unfortunately, I am not able to guarantee 100% identical malloc/free behavior in all processes before I establish the mmap-mappings.
This leads me to give the MAP_FIXED flag to mmap. The first process is using 0x0 as base address to ensure that the mapping range is at least somehow reasonable, but that does not seem to transfer to other processes. (The binary is also linked with -Wl,-no_pie.)
I tried to figure out whether I could query the system to know which pages it plans to use for malloc by reading up on malloc_default_zone, but that API does not seem to offer what I need.
Is there any way to ensure that malloc is not using particular memory pages/address ranges?
(It needs to work on OSX. Linux tips, which guide me in the right direction are appreciate, too.)
I notice this in the mmap documentation:
If MAP_FIXED is specified, a successful mmap deletes any previous mapping in the allocated address range
However, malloc won't use map fixed, so as long as you get in before malloc, you'd be okay: you could test whether a region is free by first trying to map it without MAP_FIXED, and if that succeeds at the same address (which it will do if the address is free) then you can remap with MAP_FIXED knowing that you're not choosing a section of address space that malloc had already grabbed
The only guaranteed way to guarantee that the same block of logical memory will be available in two processes is to have one fork from the other.
However, if you're compiling with 64-bit pointers, then you can just pick an (unusual) region of memory, and hope for the best, since the chance of collision is tiny.
See also this question about valid address spaces.
OpenBSD malloc() implementation uses mmap() for memory allocation. I suggest you to see how does it work then write your own custom implementation of malloc() and tell your program and the libraries used by it to use your own implementation of malloc().
Here is OpenBSD malloc():
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/malloc.c?rev=1.140
RBA

Resources