Structure memory alignment - Compile time vs Dynamically allocated memory

Structure memory alignment - Compile time vs Dynamically allocated memory - memory-management

I was just going through glibc manual for description about posix_memalign function when I encountered this statement:
The address of a block returned by malloc or realloc in the GNU system is always a
multiple of eight (or sixteen on 64-bit systems). If you need a block whose address is a
multiple of a higher power of two than that, use memalign, posix_memalign, or valloc.
If I consider a simple structure containing just an int data member:
struct Mystruct
{
int member;
};
Then I can see that Mystruct should be 4-byte aligned. But according to libc manual on a 64-bit architecture, dynamically allocating memory for such structure would return memory allocated on an address with 16 byte alignment.
Correct me if I am wrong. To me it seems like compiler uses natural alignment of a structure only for global/static/automatic variables (data, bss, stack). But on the other hand, to allocate the same structure on heap memory, the malloc call uses a predefined alignment(8 on 32-bit architectures and 16 on 64 bit architectures) ?

One thing worth remembering is that malloc() is nothing magical -- it's just a function. Likewise, the heap is just a bunch of memory that the compiler promises not to touch for its stuff. Allocating things statically (on the stack or in data/bss) is something that the compiler does do, so it can align them with whatever alignment is specified. malloc() (and the associated functions) are functions that manage the heap, but they are called at runtime. The compiler doesn't know anything about malloc() except that it's a function, and malloc() doesn't know anything about the memory that you're asking it to allocate except how much you're asking for. Because 16-bit aligned is usually fine for most common uses, malloc() ensures this alignment to be safe.
PS: A fun (and eye-opening) exercise is to write your own malloc() implementation.

Related

What is Layout?

I want to reimplement some stdlib smart pointers in rust (mainly Box) to learn it better, i come from a C background which has the simple malloc and free functions, but rust's alloc and dealloc need some Layout. What is it? It's not really explained in the docs.

As its documentation says, Layout describes a block of memory that is to be allocated or deallocated. In addition to memory size, it specifies alignment and optionally the trailing padding of the block.
C's malloc isn't told what alignment to use, so it has to assume worst-case alignment, possibly wasting memory. And even in C one sometimes needs memory of non-standard alignment, for which APIs outside of standard C had to be used until C11.
One final difference between Rust's and C's allocator interface is that Rust requires the layout to be passed to dealloc as well as to alloc - equivalent to C's free requiring the size that was passed to malloc. At first this sounds like a disadvantage because the user of the API must track the allocated size in order to be able to deallocate. But it turns out not to be an issue in practice because:
when allocating a single value such as Box<T> or array Box<[T; n]>, the size is determined at compile-time and thus known when Box::drop is invoked.
when allocating a dynamic array, such as Vec<T>, the capacity of the vector is tracked by the vector itself and thus also available in Vec::drop.
when boxing a slice, such as Box<[T]>, the slice cannot be resized and its capacity is equal to its length and again known to Box::drop.
So it turns out that it is the C-style allocation API that results in storing redundant size information. Passing the size to dealloc eliminates the redundancy, providing opportunity to save space, especially for small blocks where the size information is a non-negligible percentage of the memory used.
When comparing Rust's allocation interface to that of C, keep in mind that in Rust raw allocation is much more of a specialized tool, used only to implement safe abstractions such as Box, Rc or Vec. Unlike C, where a programmer is routinely expected to invoke malloc and free, most Rust programmers never need to invoke the global allocator directly.

windows memory management: Difference between VirtualAlloc() and heap functions [duplicate]

There are lots of method to allocate memory in Windows environment, such as VirtualAlloc, HeapAlloc, malloc, new.
Thus, what's the difference among them?

Each API is for different uses. Each one also requires that you use the correct deallocation/freeing function when you're done with the memory.
VirtualAlloc
A low-level, Windows API that provides lots of options, but is mainly useful for people in fairly specific situations. Can only allocate memory in (edit: not 4KB) larger chunks. There are situations where you need it, but you'll know when you're in one of these situations. One of the most common is if you have to share memory directly with another process. Don't use it for general-purpose memory allocation. Use VirtualFree to deallocate.
HeapAlloc
Allocates whatever size of memory you ask for, not in big chunks than VirtualAlloc. HeapAlloc knows when it needs to call VirtualAlloc and does so for you automatically. Like malloc, but is Windows-only, and provides a couple more options. Suitable for allocating general chunks of memory. Some Windows APIs may require that you use this to allocate memory that you pass to them, or use its companion HeapFree to free memory that they return to you.
malloc
The C way of allocating memory. Prefer this if you are writing in C rather than C++, and you want your code to work on e.g. Unix computers too, or someone specifically says that you need to use it. Doesn't initialise the memory. Suitable for allocating general chunks of memory, like HeapAlloc. A simple API. Use free to deallocate. Visual C++'s malloc calls HeapAlloc.
new
The C++ way of allocating memory. Prefer this if you are writing in C++. It puts an object or objects into the allocated memory, too. Use delete to deallocate (or delete[] for arrays). Visual studio's new calls HeapAlloc, and then maybe initialises the objects, depending on how you call it.
In recent C++ standards (C++11 and above), if you have to manually use delete, you're doing it wrong and should use a smart pointer like unique_ptr instead. From C++14 onwards, the same can be said of new (replaced with functions such as make_unique()).
There are also a couple of other similar functions like SysAllocString that you may be told you have to use in specific circumstances.

It is very important to understand the distinction between memory allocation APIs (in Windows) if you plan on using a language that requires memory management (like C or C++.) And the best way to illustrate it IMHO is with a diagram:
Note that this is a very simplified, Windows-specific view.
The way to understand this diagram is that the higher on the diagram a memory allocation method is, the higher level implementation it uses. But let's start from the bottom.
Kernel-Mode Memory Manager
It provides all memory reservations & allocations for the operating system, as well as support for memory-mapped files, shared memory, copy-on-write operations, etc. It's not directly accessible from the user-mode code, so I'll skip it here.
VirtualAlloc / VirtualFree
These are the lowest level APIs available from the user mode. The VirtualAlloc function basically invokes ZwAllocateVirtualMemory that in turn does a quick syscall to ring0 to relegate further processing to the kernel memory manager. It is also the fastest method to reserve/allocate block of new memory from all available in the user mode.
But it comes with two main conditions:
It only allocates memory blocks aligned on the system granularity boundary.
It only allocates memory blocks of the size that is the multiple of the system granularity.
So what is this system granularity? You can get it by calling GetSystemInfo. It is returned as the dwAllocationGranularity parameter. Its value is implementation (and possibly hardware) specific, but on many 64-bit Windows systems it is set at 0x10000 bytes, or 64K.
So what all this means, is that if you try to allocate, say just an 8 byte memory block with VirtualAlloc:
void* pAddress = VirtualAlloc(NULL, 8, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
If successful, pAddress will be aligned on the 0x10000 byte boundary. And even though you requested only 8 bytes, the actual memory block that you will get will be the entire page (or, something like 4K bytes. The exact page size is returned in the dwPageSize parameter.) But, on top of that, the entire memory block spanning 0x10000 bytes (or 64K in most cases) from pAddress will not be available for any further allocations. So in a sense, by allocating 8 bytes you could as well be asking for 65536.
So the moral of the story here is not to substitute VirtualAlloc for generic memory allocations in your application. It must be used for very specific cases, as is done with the heap below. (Usually for reserving/allocating large blocks of memory.)
Using VirtualAlloc incorrectly can lead to severe memory fragmentation.
HeapCreate / HeapAlloc / HeapFree / HeapDestroy
In a nutshell, the heap functions are basically a wrapper for VirtualAlloc function. Other answers here provide a pretty good concept of it. I'll add that, in a very simplistic view, the way heap works is this:
HeapCreate reserves a large block of virtual memory by calling VirtualAlloc internally (or ZwAllocateVirtualMemory to be specific). It also sets up an internal data structure that can track further smaller size allocations within the reserved block of virtual memory.
Any calls to HeapAlloc and HeapFree do not actually allocate/free any new memory (unless, of course the request exceeds what has been already reserved in HeapCreate) but instead they meter out (or commit) a previously reserved large chunk, by dissecting it into smaller memory blocks that a user requests.
HeapDestroy in turn calls VirtualFree that actually frees the virtual memory.
So all this makes heap functions perfect candidates for generic memory allocations in your application. It is great for arbitrary size memory allocations. But a small price to pay for the convenience of the heap functions is that they introduce a slight overhead over VirtualAlloc when reserving larger blocks of memory.
Another good thing about heap is that you don't really need to create one. It is generally created for you when your process starts. So one can access it by calling GetProcessHeap function.
malloc / free
Is a language-specific wrapper for the heap functions. Unlike HeapAlloc, HeapFree, etc. these functions will work not only if your code is compiled for Windows, but also for other operating systems (such as Linux, etc.)
This is a recommended way to allocate/free memory if you program in C. (Unless, you're coding a specific kernel mode device driver.)
new / delete
Come as a high level (well, for C++) memory management operators. They are specific for the C++ language, and like malloc for C, are also the wrappers for the heap functions. They also have a whole bunch of their own code that deals C++-specific initialization of constructors, deallocation in destructors, raising an exception, etc.
These functions are a recommended way to allocate/free memory and objects if you program in C++.
Lastly, one comment I want to make about what has been said in other responses about using VirtualAlloc to share memory between processes. VirtualAlloc by itself does not allow sharing of its reserved/allocated memory with other processes. For that one needs to use CreateFileMapping API that can create a named virtual memory block that can be shared with other processes. It can also map a file on disk into virtual memory for read/write access. But that is another topic.

VirtualAlloc is a specialized allocation of the OS virtual memory (VM) system. Allocations in the VM system must be made at an allocation granularity which (the allocation granularity) is architecture dependent. Allocation in the VM system is one of the most basic forms of memory allocation. VM allocations can take several forms, memory is not necessarily dedicated or physically backed in RAM (though it can be). VM allocation is typically a special purpose type of allocation, either because of the allocation has to
be very large,
needs to be shared,
must be aligned on a particular value (performance reasons) or
the caller need not use all of this memory at once...
etc...
HeapAlloc is essentially what malloc and new both eventually call. It is designed to be very fast and usable under many different types of scenarios of a general purpose allocation. It is the "Heap" in a classic sense. Heaps are actually setup by a VirtualAlloc, which is what is used to initially reserve allocation space from the OS. After the space is initialized by VirtualAlloc, various tables, lists and other data structures are configured to maintain and control the operation of the HEAP. Some of that operation is in the form of dynamically sizing (growing and shrinking) the heap, adapting the heap to particular usages (frequent allocations of some size), etc..
new and malloc are somewhat the same, malloc is essentially an exact call into HeapAlloc( heap-id-default ); new however, can [additionally] configure the allocated memory for C++ objects. For a given object, C++ will store vtables on the heap for each caller. These vtables are redirects for execution and form part of what gives C++ it's OO characteristics like inheritance, function overloading, etc...
Some other common allocation methods like _alloca() and _malloca() are stack based; FileMappings are really allocated with VirtualAlloc and set with particular bit flags which designate those mappings to be of type FILE.
Most of the time, you should allocate memory in a way which is consistent with the use of that memory ;). new in C++, malloc for C, VirtualAlloc for massive or IPC cases.
*** Note, large memory allocations done by HeapAlloc are actually shipped off to VirtualAlloc after some size (couple hundred k or 16 MB or something I forget, but fairly big :) ).
*** EDIT
I briefly remarked about IPC and VirtualAlloc, there is also something very neat about a related VirtualAlloc which none of the responders to this question have discussed.
VirtualAllocEx is what one process can use to allocate memory in an address space of a different process. Most typically, this is used in combination to get remote execution in the context of another process via CreateRemoteThread (similar to CreateThread, the thread is just run in the other process).

In outline:
VirtualAlloc, HeapAlloc etc. are Windows APIs that allocate memory of various types from the OS directly. VirtualAlloc manages pages in the Windows virtual memory system, while HeapAlloc allocates from a specific OS heap. Frankly, you are unlikely to ever need to use eiither of them.
malloc is a Standard C (and C++) library function that allocates memory to your process. Implementations of malloc will typically use one of the OS APIs to create a pool of memory when your app starts and then allocate from it as you make malloc requests
new is a Standard C++ operator which allocates memory and then calls constructors appropriately on that memory. It may be implemented in terms of malloc or in terms of the OS APIs, in which case it too will typically create a memory pool on application startup.

VirtualAlloc ===> sbrk() under UNIX
HeapAlloc ====> malloc() under UNIX

VirtualAlloc => Allocates straight into virtual memory, you reserve/commit in blocks. This is great for large allocations, for example large arrays.
HeapAlloc / new => allocates the memory on the default heap (or any other heap that you may create). This allocates per object and is great for smaller objects. The default heap is serializable therefore it has guarantee thread allocation (this can cause some issues on high performance scenarios and that's why you can create your own heaps).
malloc => uses the C runtime heap, similar to HeapAlloc but it is common for compatibility scenarios.
In a nutshell, the heap is just a chunk of virtual memory that is governed by a heap manager (rather than raw virtual memory)
The last model on the memory world is memory mapped files, this scenario is great for large chunk of data (like large files). This is used internally when you open an EXE (it does not load the EXE in memory, just creates a memory mapped file).

Linux page allocation

In linux if malloc can't allocate data chunk from preallocated page. it uses mmap() or brk() to assign new pages. I wanted to clarify a few things :
I don't understand the following statment , I thought that when I use mmap() or brk() the kernel maps me a whole page. but the allocator alocates me only what I asked for from that new page? In this case trying to access unallocated space (In the newly mapped page) will not cause a page fault? (I understand that its not recommended )
The libc allocator manages each page: slicing them into smaller blocks, assigning them to processes, freeing them, and so on. For example, if your program uses 4097 bytes total, you need to use two pages, even though in reality the allocator gives you somewhere between 4105 to 4109 bytes
How does the allocator knows the VMA bounders ?(I assume no system call used) because the VMA struct that hold that information can only get accessed from kernel mode?

The system-level memory allocation (via mmap and brk) is all page-aligned and page-sized. This means if you use malloc (or other libc API that allocates memory) some small amount of memory (say 10 bytes), you are guaranteed that all the other bytes on that page of memory are readable without triggering a page fault.
Malloc and family do their own bookkeeping within the pages returned from the OS, so the mmap'd pages used by libc also contain a bunch of malloc metadata, in addition to whatever space you've allocated.
The libc allocator knows where everything is because it invokes the brk() and mmap() calls. If it invokes mmap() it passes in a size, and the kernel returns a start address. Then the libc allocator just stores those values in its metadata.
Doug Lea's malloc implementation is a very, very well documented memory allocator implementation and its comments will shed a lot of light on how allocators work in general:
http://g.oswego.edu/dl/html/malloc.html

Executable memory within 32 bits displacement of code area

Writing a JIT compiler in C++ on 64-bit Windows, generated code will sometimes need to call run-time functions that are written in C++. At the moment I'm allocating memory in which to place the generated code with VirtualAlloc(0, bytes, MEM_COMMIT, PAGE_EXECUTE_READWRITE); the last flag is important because allocated memory is not otherwise executable.
VirtualAlloc could presumably return memory anywhere in the 64-bit address space, which is fine for data (of which in general more than 4 gigabytes will be needed, so it does need 64-bit addressing), but the most efficient form of the x64 call instruction wants a 32-bit IP-relative offset, and since the amount of generated code will be less than 4 gigabytes, it would be preferable to locate it within a 32-bit displacement of the code compiled from C++.
Is there a way to arrange this?

You can specify a virtual address near which you want the allocation to happen as the first argument. To increase the chance of getting the allocation within the boundaries you want to could reserve the virtual memory region first and then request for committed memory as and when needed from the reserved space.
Allocation by default happens bottom unless MEM_TOP_DOWN is specified or system is configured to perform memory layout top down to catch pointer truncation problems. Gist is that you can only increase the chance of having allocation within the boundary but should have code to handle when allocation is out of boundary.

mips memory management

how do you manually manage the heap in mips assembly, specifically the SPIM simulator?
the heap, i've found begins at 0x10040000 when using the sbrk syscall, e.g.
li $t0, 1
li $s0, 9
syscall
sw $t0, ($s0) # 1 located at 0x10040000
so, does a call to sbrk NOT guarantee that you will get back the next free memory slot? for instance, if i called sbrk for a single 4 byte space, SPIM might allocate addresses: 0x10040000-0x10040003. however, a second call for another 4 byte space might be unrelated to the prior 4 byte allocation? thus, a data structure is required to keep track of which memory slots have been allocated? finally, do memory managers try to reduce the number of calls to sbrk by determining free space that lies between the addresses that are tracked by the particular data structure?

On real systems, sbrk returns page-granularity allocations. I'm not sure if the SPIM simulator does (the meagre online docs imply it will return any byte-oriented granularity).
Generally, the sbrk system call just sets the "end-of-heap" pointer. All the underlying OS knows is the beginning of the heap (where sbrk started at the beginning of the program), and the current end-of-heap pointer. All the memory in that bound is considered (from the OS point of view) heap memory in use by the program.
(Note, in your case I believe the SPIM simulator takes an integer to bump the pointer by, so implicitly all the memory is contiguous and I think the break is always increasing?)
The "malloc" API that is generally built on sbrk provides more than a simple contiguous, growing memory region. A good malloc library generally lets you mark a region of memory as "free" (so it can be used to satisfy a subsequent malloc call). Note that OS generally doesn't know about this "free". Malloc keeps track of the memory that is free or not. In general malloc cannot return this arbitrary region of the heap to the OS, because the heap is a single contiguous region from the OS point of view.
Real malloc implementations have to deal with the mis-match between allocation requests and page-size (normal sbrk only returns multiples of page-aligned, page-sized allocations). Your simulator case doesn't have this problem because sbrk is fine-grained.
Note that tracking which memory is in use or not requires memory, so malloc has some overhead. Some implementations are designed to store most of this bookkeeping in the "free" memory (reducing the apparent cost to the client). There are lots of other strategies for matching malloc to sbrk.... (In your case mapping malloc to sbrk, and making free a no-op will 'work' as long as you don't allocate too much memory ...)
Here's an overview of how malloc and sbrk relate that draws some ASCII art I don't have the patience to transcribe: http://web.eecs.utk.edu/~huangj/cs360/360/notes/Malloc1/lecture.html

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio