Writing a JIT compiler in C++ on 64-bit Windows, generated code will sometimes need to call run-time functions that are written in C++. At the moment I'm allocating memory in which to place the generated code with VirtualAlloc(0, bytes, MEM_COMMIT, PAGE_EXECUTE_READWRITE); the last flag is important because allocated memory is not otherwise executable.
VirtualAlloc could presumably return memory anywhere in the 64-bit address space, which is fine for data (of which in general more than 4 gigabytes will be needed, so it does need 64-bit addressing), but the most efficient form of the x64 call instruction wants a 32-bit IP-relative offset, and since the amount of generated code will be less than 4 gigabytes, it would be preferable to locate it within a 32-bit displacement of the code compiled from C++.
Is there a way to arrange this?
You can specify a virtual address near which you want the allocation to happen as the first argument. To increase the chance of getting the allocation within the boundaries you want to could reserve the virtual memory region first and then request for committed memory as and when needed from the reserved space.
Allocation by default happens bottom unless MEM_TOP_DOWN is specified or system is configured to perform memory layout top down to catch pointer truncation problems. Gist is that you can only increase the chance of having allocation within the boundary but should have code to handle when allocation is out of boundary.
Related
I'm developing a C-library to be used under uCOS-III. The CPU is an ARM Cortex M4 SAM4C. Within the library I want to use a third party product X, whose particular name is not relevant here. The source code for X is completely available and compiles without problems.
Inside X a lot of memory allocations are executed, using calloc() and free().
The problem is, that plain usage of malloc is not advisable for embedded systems, because of memory fragmentation. The documentation for uCOS-III explicitly advises against using malloc - instead OSMemCreate/OSMemGet/OSMemPut shall be used to allocate and free chunks of memory out of a statically allocated memory block.
Question-1:
What is the general advice to get around the "standard implementation" of malloc? I would prefer a kind of malloc, where I have access to a fixed memory pool (e.g. dedicated for a special task)
Question-2:
How should OSMemCreate() be used correctly? I have first to initialize a memory partition with a certain block size. The amount of requested memory is between 4 Bytes and about 800 bytes. I can get blocks on request, but with fixed size. If block-size=4 I cannot allocate 16 Bytes, since blocks are not contiguous in memory. If block-size=800 and I need only 4 bytes, most of the block is left unused and I will very soon run out of blocks.
So I don't know, how to solve my original problem by use of OSMemCreate...
Can anybody give me an advice how I could proceed?
Many thanks,
Michael
1) Don't link with the standard library version of malloc/free. Instead create your own implementation of malloc/free that serves as a wrapper to OSMemGet/OSMemPut.
2) You can create more than one memory partition with OSMemCreate. Create small, medium, and large partitions that hold block sizes which are tuned for your application to reduce waste.
If you want malloc to get an appropriately sized block from your various memory partitions then you'll have to invent some magic so that free returns the block to the appropriate memory partition. (Perhaps malloc allocates an extra word, stores the pointer to the memory partition in the first word, and then returns the address after the word where the pointer is stored. Then free knows to get the memory partition pointer from the preceding word.)
An alternative to using malloc/free is to rewrite that code to use statically allocated variables or call OSMemGet/OSMemPut directly.
There are lots of method to allocate memory in Windows environment, such as VirtualAlloc, HeapAlloc, malloc, new.
Thus, what's the difference among them?
Each API is for different uses. Each one also requires that you use the correct deallocation/freeing function when you're done with the memory.
VirtualAlloc
A low-level, Windows API that provides lots of options, but is mainly useful for people in fairly specific situations. Can only allocate memory in (edit: not 4KB) larger chunks. There are situations where you need it, but you'll know when you're in one of these situations. One of the most common is if you have to share memory directly with another process. Don't use it for general-purpose memory allocation. Use VirtualFree to deallocate.
HeapAlloc
Allocates whatever size of memory you ask for, not in big chunks than VirtualAlloc. HeapAlloc knows when it needs to call VirtualAlloc and does so for you automatically. Like malloc, but is Windows-only, and provides a couple more options. Suitable for allocating general chunks of memory. Some Windows APIs may require that you use this to allocate memory that you pass to them, or use its companion HeapFree to free memory that they return to you.
malloc
The C way of allocating memory. Prefer this if you are writing in C rather than C++, and you want your code to work on e.g. Unix computers too, or someone specifically says that you need to use it. Doesn't initialise the memory. Suitable for allocating general chunks of memory, like HeapAlloc. A simple API. Use free to deallocate. Visual C++'s malloc calls HeapAlloc.
new
The C++ way of allocating memory. Prefer this if you are writing in C++. It puts an object or objects into the allocated memory, too. Use delete to deallocate (or delete[] for arrays). Visual studio's new calls HeapAlloc, and then maybe initialises the objects, depending on how you call it.
In recent C++ standards (C++11 and above), if you have to manually use delete, you're doing it wrong and should use a smart pointer like unique_ptr instead. From C++14 onwards, the same can be said of new (replaced with functions such as make_unique()).
There are also a couple of other similar functions like SysAllocString that you may be told you have to use in specific circumstances.
It is very important to understand the distinction between memory allocation APIs (in Windows) if you plan on using a language that requires memory management (like C or C++.) And the best way to illustrate it IMHO is with a diagram:
Note that this is a very simplified, Windows-specific view.
The way to understand this diagram is that the higher on the diagram a memory allocation method is, the higher level implementation it uses. But let's start from the bottom.
Kernel-Mode Memory Manager
It provides all memory reservations & allocations for the operating system, as well as support for memory-mapped files, shared memory, copy-on-write operations, etc. It's not directly accessible from the user-mode code, so I'll skip it here.
VirtualAlloc / VirtualFree
These are the lowest level APIs available from the user mode. The VirtualAlloc function basically invokes ZwAllocateVirtualMemory that in turn does a quick syscall to ring0 to relegate further processing to the kernel memory manager. It is also the fastest method to reserve/allocate block of new memory from all available in the user mode.
But it comes with two main conditions:
It only allocates memory blocks aligned on the system granularity boundary.
It only allocates memory blocks of the size that is the multiple of the system granularity.
So what is this system granularity? You can get it by calling GetSystemInfo. It is returned as the dwAllocationGranularity parameter. Its value is implementation (and possibly hardware) specific, but on many 64-bit Windows systems it is set at 0x10000 bytes, or 64K.
So what all this means, is that if you try to allocate, say just an 8 byte memory block with VirtualAlloc:
void* pAddress = VirtualAlloc(NULL, 8, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
If successful, pAddress will be aligned on the 0x10000 byte boundary. And even though you requested only 8 bytes, the actual memory block that you will get will be the entire page (or, something like 4K bytes. The exact page size is returned in the dwPageSize parameter.) But, on top of that, the entire memory block spanning 0x10000 bytes (or 64K in most cases) from pAddress will not be available for any further allocations. So in a sense, by allocating 8 bytes you could as well be asking for 65536.
So the moral of the story here is not to substitute VirtualAlloc for generic memory allocations in your application. It must be used for very specific cases, as is done with the heap below. (Usually for reserving/allocating large blocks of memory.)
Using VirtualAlloc incorrectly can lead to severe memory fragmentation.
HeapCreate / HeapAlloc / HeapFree / HeapDestroy
In a nutshell, the heap functions are basically a wrapper for VirtualAlloc function. Other answers here provide a pretty good concept of it. I'll add that, in a very simplistic view, the way heap works is this:
HeapCreate reserves a large block of virtual memory by calling VirtualAlloc internally (or ZwAllocateVirtualMemory to be specific). It also sets up an internal data structure that can track further smaller size allocations within the reserved block of virtual memory.
Any calls to HeapAlloc and HeapFree do not actually allocate/free any new memory (unless, of course the request exceeds what has been already reserved in HeapCreate) but instead they meter out (or commit) a previously reserved large chunk, by dissecting it into smaller memory blocks that a user requests.
HeapDestroy in turn calls VirtualFree that actually frees the virtual memory.
So all this makes heap functions perfect candidates for generic memory allocations in your application. It is great for arbitrary size memory allocations. But a small price to pay for the convenience of the heap functions is that they introduce a slight overhead over VirtualAlloc when reserving larger blocks of memory.
Another good thing about heap is that you don't really need to create one. It is generally created for you when your process starts. So one can access it by calling GetProcessHeap function.
malloc / free
Is a language-specific wrapper for the heap functions. Unlike HeapAlloc, HeapFree, etc. these functions will work not only if your code is compiled for Windows, but also for other operating systems (such as Linux, etc.)
This is a recommended way to allocate/free memory if you program in C. (Unless, you're coding a specific kernel mode device driver.)
new / delete
Come as a high level (well, for C++) memory management operators. They are specific for the C++ language, and like malloc for C, are also the wrappers for the heap functions. They also have a whole bunch of their own code that deals C++-specific initialization of constructors, deallocation in destructors, raising an exception, etc.
These functions are a recommended way to allocate/free memory and objects if you program in C++.
Lastly, one comment I want to make about what has been said in other responses about using VirtualAlloc to share memory between processes. VirtualAlloc by itself does not allow sharing of its reserved/allocated memory with other processes. For that one needs to use CreateFileMapping API that can create a named virtual memory block that can be shared with other processes. It can also map a file on disk into virtual memory for read/write access. But that is another topic.
VirtualAlloc is a specialized allocation of the OS virtual memory (VM) system. Allocations in the VM system must be made at an allocation granularity which (the allocation granularity) is architecture dependent. Allocation in the VM system is one of the most basic forms of memory allocation. VM allocations can take several forms, memory is not necessarily dedicated or physically backed in RAM (though it can be). VM allocation is typically a special purpose type of allocation, either because of the allocation has to
be very large,
needs to be shared,
must be aligned on a particular value (performance reasons) or
the caller need not use all of this memory at once...
etc...
HeapAlloc is essentially what malloc and new both eventually call. It is designed to be very fast and usable under many different types of scenarios of a general purpose allocation. It is the "Heap" in a classic sense. Heaps are actually setup by a VirtualAlloc, which is what is used to initially reserve allocation space from the OS. After the space is initialized by VirtualAlloc, various tables, lists and other data structures are configured to maintain and control the operation of the HEAP. Some of that operation is in the form of dynamically sizing (growing and shrinking) the heap, adapting the heap to particular usages (frequent allocations of some size), etc..
new and malloc are somewhat the same, malloc is essentially an exact call into HeapAlloc( heap-id-default ); new however, can [additionally] configure the allocated memory for C++ objects. For a given object, C++ will store vtables on the heap for each caller. These vtables are redirects for execution and form part of what gives C++ it's OO characteristics like inheritance, function overloading, etc...
Some other common allocation methods like _alloca() and _malloca() are stack based; FileMappings are really allocated with VirtualAlloc and set with particular bit flags which designate those mappings to be of type FILE.
Most of the time, you should allocate memory in a way which is consistent with the use of that memory ;). new in C++, malloc for C, VirtualAlloc for massive or IPC cases.
*** Note, large memory allocations done by HeapAlloc are actually shipped off to VirtualAlloc after some size (couple hundred k or 16 MB or something I forget, but fairly big :) ).
*** EDIT
I briefly remarked about IPC and VirtualAlloc, there is also something very neat about a related VirtualAlloc which none of the responders to this question have discussed.
VirtualAllocEx is what one process can use to allocate memory in an address space of a different process. Most typically, this is used in combination to get remote execution in the context of another process via CreateRemoteThread (similar to CreateThread, the thread is just run in the other process).
In outline:
VirtualAlloc, HeapAlloc etc. are Windows APIs that allocate memory of various types from the OS directly. VirtualAlloc manages pages in the Windows virtual memory system, while HeapAlloc allocates from a specific OS heap. Frankly, you are unlikely to ever need to use eiither of them.
malloc is a Standard C (and C++) library function that allocates memory to your process. Implementations of malloc will typically use one of the OS APIs to create a pool of memory when your app starts and then allocate from it as you make malloc requests
new is a Standard C++ operator which allocates memory and then calls constructors appropriately on that memory. It may be implemented in terms of malloc or in terms of the OS APIs, in which case it too will typically create a memory pool on application startup.
VirtualAlloc ===> sbrk() under UNIX
HeapAlloc ====> malloc() under UNIX
VirtualAlloc => Allocates straight into virtual memory, you reserve/commit in blocks. This is great for large allocations, for example large arrays.
HeapAlloc / new => allocates the memory on the default heap (or any other heap that you may create). This allocates per object and is great for smaller objects. The default heap is serializable therefore it has guarantee thread allocation (this can cause some issues on high performance scenarios and that's why you can create your own heaps).
malloc => uses the C runtime heap, similar to HeapAlloc but it is common for compatibility scenarios.
In a nutshell, the heap is just a chunk of virtual memory that is governed by a heap manager (rather than raw virtual memory)
The last model on the memory world is memory mapped files, this scenario is great for large chunk of data (like large files). This is used internally when you open an EXE (it does not load the EXE in memory, just creates a memory mapped file).
I was just going through glibc manual for description about posix_memalign function when I encountered this statement:
The address of a block returned by malloc or realloc in the GNU system is always a
multiple of eight (or sixteen on 64-bit systems). If you need a block whose address is a
multiple of a higher power of two than that, use memalign, posix_memalign, or valloc.
If I consider a simple structure containing just an int data member:
struct Mystruct
{
int member;
};
Then I can see that Mystruct should be 4-byte aligned. But according to libc manual on a 64-bit architecture, dynamically allocating memory for such structure would return memory allocated on an address with 16 byte alignment.
Correct me if I am wrong. To me it seems like compiler uses natural alignment of a structure only for global/static/automatic variables (data, bss, stack). But on the other hand, to allocate the same structure on heap memory, the malloc call uses a predefined alignment(8 on 32-bit architectures and 16 on 64 bit architectures) ?
One thing worth remembering is that malloc() is nothing magical -- it's just a function. Likewise, the heap is just a bunch of memory that the compiler promises not to touch for its stuff. Allocating things statically (on the stack or in data/bss) is something that the compiler does do, so it can align them with whatever alignment is specified. malloc() (and the associated functions) are functions that manage the heap, but they are called at runtime. The compiler doesn't know anything about malloc() except that it's a function, and malloc() doesn't know anything about the memory that you're asking it to allocate except how much you're asking for. Because 16-bit aligned is usually fine for most common uses, malloc() ensures this alignment to be safe.
PS: A fun (and eye-opening) exercise is to write your own malloc() implementation.
sorry for my rather general question, but I could not find a definite answer to it:
Given that I have free swap memory left and I allocate memory in reasonable chunks (~1MB) -> can memory allocation still fail for any reason?
The smartass answer would be "yes, memory allocation can fail for any reason". That may not be what you are looking for.
Generally, whether your system has free memory left is not related to whether allocations succeed. Rather, the question is whether your process address space has free virtual address space.
The allocator (malloc, operator new, ...) first looks if there is free address space in the current process that is already mapped, that is, the kernel is aware that the addresses should be usable. If there is, that address space is reserved in the allocator and returned.
Otherwise, the kernel is asked to map new address space to the process. This may fail, but generally doesn't, as mapping does not imply using physical memory yet -- it is just a promise that, should someone try to access this address, the kernel will try to find physical memory and set up the MMU tables so the virtual->physical translation finds it.
When the system is out of memory, there is no physical memory left, the process is suspended and the kernel attempts to free physical memory by moving other processes' memory to disk. The application does not notice this, except that executing a single assembler instruction apparently took a long time.
Memory allocations in the process fail if there is no mapped free region large enough and the kernel refuses to establish a mapping. For example, not all virtual addresses are useable, as most operating systems map the kernel at some address (typically, 0x80000000, 0xc0000000, 0xe0000000 or something such on 32 bit architectures), so there is a per-process limit that may be lower than the system limit (for example, a 32 bit process on Windows can only allocate 2 GB, even if the system is 64 bit). File mappings (such as the program itself and DLLs) further reduce the available space.
A very general and theoretical answer would be no, it can not. One of the reasons it could possibly under very peculiar circumstances fail is that there would be some weird fragmentation of your available / allocatable memory. I wonder whether you're trying get (probably very minor) performance boost (skipping if pointer == NULL - kind of thing) or you're just wondering and want to discuss it, in which case you should probably use chat.
Yes, memory allocation often fails when you run out of memory space in a 32-bit application (can be 2, 3 or 4 GB depending on OS version and settings). This would be due to a memory leak. It can also fail if your OS runs out of space in your swap file.
I have encountered a problem in a C program running on an AVR microcontroller (ATMega328P). I believe it is due to a stack/heap collision but I'd like to be able to confirm this.
Is there any way I can visualise SRAM usage by the stack and the heap?
Note: the program is compiled with avr-gcc and uses avr-libc.
Update: The actual problem I am having is that the malloc implementation is failing (returning NULL). All mallocing happens on startup and all freeing happens at the end of the application (which in practice is never since the main part of the application is in an infinite loop). So I'm sure fragmentation is not the issue.
You can check RAM static usage using avr-size utility, as decribed in
http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=62968,
http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=82536,
http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=95638,
and http://letsmakerobots.com/node/27115
avr-size -C -x Filename.elf
(avr-size documentation: http://ccrma.stanford.edu/planetccrma/man/man1/avr-size.1.html )
Follows an example of how to set this on an IDE:
On Code::Blocks, Project -> Build options -> Pre/post build steps -> Post-build steps, include:
avr-size -C $(TARGET_OUTPUT_FILE)
or
avr-size -C --mcu=atmega328p $(TARGET_OUTPUT_FILE)
Example output at the end of build:
AVR Memory Usage
----------------
Device: atmega16
Program: 7376 bytes (45.0% Full)
(.text + .data + .bootloader)
Data: 81 bytes (7.9% Full)
(.data + .bss + .noinit)
EEPROM: 63 bytes (12.3% Full)
(.eeprom)
Data is your SRAM usage, and it is only the amount that the compiler
knows at compile time. You also need room for things created at
runtime (particularly stack usage).
To check stack usage (dynamic RAM),
from http://jeelabs.org/2011/05/22/atmega-memory-use/
Here’s a small utility function which determines how much RAM is
currently unused:
int freeRam () {
extern int __heap_start, *__brkval;
int v;
return (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval);
}
And here’s a sketch using that code:
void setup () {
Serial.begin(57600);
Serial.println("\n[memCheck]");
Serial.println(freeRam());
}
The freeRam() function returns how many bytes exists between the end of the heap and the last allocated memory on the stack, so it is effectively how much the stack/heap can grow before they collide.
You could check the return of this function around code you suspect may be causing stack/heap collision.
You say malloc is failing and returning NULL:
The obvious cause which you should look at first is that your heap is "full" - i.e, the memory you've asked to malloc cannot be allocated, because it's not available.
There are two scenarios to bear in mind:
a: You have a 16 K heap, you've already malloced 10 K and you try and malloc a further 10K. Your heap is simply too small.
b: More commonly, you have a 16 k Heap, you've been doing a bunch of malloc/free/realloc calls and your heap is less than 50% 'full': You call malloc for 1K and it FAILS - what's up? Answer - the heap free space is fragmented - there isn't a contigous 1K of free memory that can be returned. C Heap managers can not compact the heap when this happens, so you're generally in a bad way. There are techniques to avoid fragmentation, but it's difficult to know if this is really the problem. You'd need to add logging shims to malloc and free so that you can get an idea of what dynamic memory operations are being performed.
EDIT:
You say all mallocs happen at startup, so fragmentation isn't the issue.
In which case, it should be easy to replace the dynamic allocation with static.
old code example:
char *buffer;
void init()
{
buffer = malloc(BUFFSIZE);
}
new code:
char buffer[BUFFSIZE];
Once you've done this everywhere, your LINKER should warn you if everything cannot fit into the memory available. Don't forget to reduce the heap size - but beware that some runtime io system functions may still use the heap, so you may not be able to remove it entirely.
Don't use the heap / dynamic allocation on smaller embedded targets. Especially with a processor with such limited resources. Rather redesign your application because the problem will reoccur as your program grows.
The usual approach would be to fill the memory with a known pattern and then to check which areas are overwritten.
If you're using both stack and heap, then it can be a little more tricky. I'll explain what I've done when no heap is used. As a general rule, all the companies I've worked for (in the domain of embedded C software) have avoided using heap for small embedded projects—to avoid the uncertainty of heap memory availability. We use statically declared variables instead.
One method is to fill most of the stack area with a known pattern (e.g. 0x55) at start-up. This is usually done by a small bit of code early in the software execution, either right at the start of main(), or perhaps even before main() begins, in the start-up code. Take care not to overwrite the small amount of stack in use at that point of course. Then, after running the software for a while, inspect the contents of stack space and see where the 0x55 is still intact. How you "inspect" depends on your target hardware. Assuming you have a debugger connected, then you can simply stop the micro running and read the memory.
If you have a debugger that can do a memory-access breakpoint (a bit more fancy than the usual execution breakpoint), then you can set a breakpoint in a particular stack location—such as the farthest limit of your stack space. That can be extremely useful, because it also shows you exactly what bit of code is running when it reaches that extent of stack usage. But it requires your debugger to support the memory-access breakpoint feature and it's often not found in the "low-end" debuggers.
If you're also using heap, then it can be a bit more complicated because it may be impossible to predict where stack and heap will collide.
Assuming you're using just one stack (so not an RTOS or anything) and that the stack is at the end of the memory, growing down, while the heap is starting after the BSS/DATA region, growing up. I've seen implementations of malloc that actually check the stackpointer and fail on a collision. You could try to do that.
If you're not able to adapt the malloc code, you could choose to put your stack at the start of the memory (using the linker file). In general it's always a good idea to know/define the maximum size of the stack. If you put it at the start, you'll get an error on reading beyond the beginning of the RAM. The Heap will be at the end and can probably not grow beyond the end if it's a decent implemantation (will return NULL instead). Good thing is you know have 2 separate error cases for 2 separate issues.
To find out the maximum stack size, you could fill your memory with a pattern, run the application and see how far it went, see also reply from Craig.
If you can edit the code for your heap, you could pad it with a couple of extra bytes (tricky on such low resources) on each block of memory. These bytes could contain a known pattern different from the stack. This might give you a clue if it collides with the stack by seeing it appear inside the stack or vice versa.
On Unix like operating systems a library function named sbrk() with a parameter of 0 allows you to access the topmost address of dynamically allocated heap memory. The return value is a void * pointer and could be compared with the address of an arbitrary stack allocated variable.
Using the result of this comparison should be used with care. Depending on the CPU and system architecture, the stack may be growing down from a arbitrary high address while the allocated heap will move up from low-bound memory.
Sometimes the operating system has other concepts for memory management (i.e. OS/9) which places heap and stack in different memory segments in free memory. On these operating systems - especially for embedded systems - you need to define the maximum memory requirements of your
applications in advance to enable the system to allocate memory segments of matching sizes.