how can i override malloc(), calloc(), free() etc under OS X? - xcode

Assuming the latest XCode and GCC, what is the proper way to override the memory allocation functions (I guess operator new/delete as well). The debugging memory allocators are too slow for a game, I just need some basic stats I can do myself with minimal impact.
I know its easy in Linux due to the hooks, and this was trivial under codewarrior ten years ago when I wrote HeapManager.
Sadly smartheap no longer has a mac version.

I would use library preloading for this task, because it does not require modification of the running program. If you're familiar with the usual Unix way to do this, it's almost a matter of replacing LD_PRELOAD with DYLD_INSERT_LIBRARIES.
First step is to create a library with code such as this, then build it using regular shared library linking options (gcc -dynamiclib):
void *malloc(size_t size)
{
void * (*real_malloc)(size_t);
real_malloc = dlsym(RTLD_NEXT, "malloc");
fprintf(stderr, "allocating %lu bytes\n", (unsigned long)size);
/* Do your stuff here */
return real_malloc(size);
}
Note that if you also divert calloc() and its implementation calls malloc(), you may need additional code to check how you're being called. C++ programs should be pretty safe because the new operator calls malloc() anyway, but be aware that no standard enforces that. I have never encountered an implementation that didn't use malloc(), though.
Finally, set up the running environment for your program and launch it (might require adjustments depending on how your shell handles environment variables):
export DYLD_INSERT_LIBRARIES=./yourlibrary.dylib
export DYLD_FORCE_FLAT_NAMESPACE=1
yourprogram --yourargs
See the dyld manual page for more information about the dynamic linker environment variables.
This method is pretty generic. There are limitations, however:
You won't be able to divert direct system calls
If the application itself tricks you by using dlsym() to load malloc's address, the call won't be diverted. Unless, however, you trick it back by also diverting dlsym!

The malloc_default_zone technique mentioned at http://lists.apple.com/archives/darwin-dev/2005/Apr/msg00050.html appears to still work, see e.g. http://code.google.com/p/fileview/source/browse/trunk/fileview/fv_zone.cpp?spec=svn354&r=354 for an example use that seems to be similar to what you intend.

After much searching (here included) and issues with 10.7 I decided to write a blog post about this topic: How to set malloc hooks in OSX Lion
You'll find a few good links at the end of the post with more information on this topic.
The basic solution:
malloc_zone_t *dz=malloc_default_zone();
if(dz->version>=8)
{
vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ | VM_PROT_WRITE);//remove the write protection
}
original_free=dz->free;
dz->free=&my_free; //this line is throwing a bad ptr exception without calling vm_protect first
if(dz->version==8)
{
vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ);//put the write protection back
}

This is an old question, but I came across it while trying to do this myself. I got curious about this topic for a personal project I was working on, mainly to make sure that what I thought was automatically deallocated was being properly deallocated. I ended up writing a C++ implementation to allow me to track the amount of allocated heap and report it out if I so chose.
https://gist.github.com/monitorjbl/3dc6d62cf5514892d5ab22a59ff34861
As the name notes, this is OSX-specific. However, I was able to do this on Linux environments using the malloc_usable_size
Example
#define MALLOC_DEBUG_OUTPUT
#include "malloc_override_osx.hpp"
int main(){
int* ip = (int*)malloc(sizeof(int));
double* dp = (double*)malloc(sizeof(double));
free(ip);
free(dp);
}
Building
$ clang++ -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk \
-pipe -stdlib=libc++ -std=gnu++11 -g -o test test.cpp
$ ./test
0x7fa28a403230 -> malloc(16) -> 16
0x7fa28a403240 -> malloc(16) -> 32
0x7fa28a403230 -> free(16) -> 16
0x7fa28a403240 -> free(16) -> 0
Hope this helps someone else out in the future!

If the basic stats you need can be collected in a simple wrapper, a quick (and kinda dirty) trick is just using some #define macro replacement.
void* _mymalloc(size_t size)
{
void* ptr = malloc(size);
/* do your stat work? */
return ptr;
}
and
#define malloc(sz_) _mymalloc(sz_)
Note: if the macro is defined before the _mymalloc definition it will end up replacing the malloc call inside that function leaving you with infinite recursion... so ensure this isn't the case. You might want to explicitly #undef it before that function definition and simply (re)define it afterward depending on where you end up including it to hopefully avoid this situation.

I think if you define a malloc() and free() in your own .c file included in the project the linker will resolve that version.
Now then, how do you intend to implement malloc?

Check out Emery Berger's -- the author of the Hoard memory allocator's -- approach for replacing the allocator on OSX at https://github.com/emeryberger/Heap-Layers/blob/master/wrappers/macwrapper.cpp (and a few other files you can trace yourself by following the includes).
This is complementary to Alex's answer, but I thought this example was more to-the-point of replacing the system provided allocator.

Related

Disable Lazy Evaluation in QEMU

Is it possible to disable lazy evaluation in QEMU (User-Mode)?
I found no flags for it when running qemu-i386.
Looking at the Code i found the function:
static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
in target/i386/translate.c which converts one instruction into a host instruction. Within this function, the function:
static void gen_compute_eflags(DisasContext *s)
is used to generate eflags for specific instructions (that will need them).
My first idea would be to add the gen_compute_eflags() to every instruction, i wonder if there is a more effective and less error-prone way to do this.
What are you actually trying to achieve? There is no easy way to somehow disable the optimisation away of the flag-register codegen when the flag value is never used before it is overwritten, but I don't know why you would want to do that. The optimisation makes execution faster and QEMU does it in only the places where it is safe (ie where the flag value we don't compute is never used by the guest and not visible to the user via the debug stub).

GCC: avoiding long time linking while using static arrays

My question is practically repeats this one, which asks why this issue occurs. I would like ot know if it is possible to avoid it.
The issue is: if I allocate a huge amount of memory statically:
unsigned char static_data[ 8 * BYTES_IN_GYGABYTE ];
then linker (ld) takes very long time to make an executable. There is a good explanation from #davidg about this behaviour in question I gave above:
This leaves us with the follow series of steps:
The assembler tells the linker that it needs to create a section of memory that is 1GB long.
The linker goes ahead and allocates this memory, in preparation for placing it in the final executable.
The linker realizes that this memory is in the .bss section and is marked NOBITS, meaning that the data is just 0, and doesn't need to be physically placed into the final executable. It avoids writing out the 1GB of data, instead just throwing the allocated memory away.
The linker writes out to the final ELF file just the compiled code, producing a small executable.
A smarter linker might be able to avoid steps 2 and 3 above, making your compile time much faster
Ok. #davidg had explained why does linker takes a lot of time, but I want to know how can I avoid it. Maybe GCC have some options, that will say to linker to be a little smarter and to avoid steps 2 and 3 above ?
Thank you.
P.S. I use GCC 4.5.2 at Ubuntu
You can allocate the static memory in the release version only:
#ifndef _DEBUG
unsigned char static_data[ 8 * BYTES_IN_GYGABYTE ];
#else
unsigned char *static_data;
#endif
I would have 2 ideas in mind that could help:
As already mentioned in some comment: place it in a separate compilation unit.That itself will not reduce linking time. But maybe together with incremental linking it helps (ld option -r).
Other is similar. Place it in a separate compilation unit, and generate a shared library from it. And just link later with the shared library.
Sadly I can not promise that one of it helps, as I have no way to test: my gcc(4.7.2) and bin tools dont show this time consuming behaviour, 8, 16 or 32 Gigabytes testprogram compile and link in under a second.

Protecting memory from changing

Is there a way to protect an area of the memory?
I have this struct:
#define BUFFER 4
struct
{
char s[BUFFER-1];
const char zc;
} str = {'\0'};
printf("'%s', zc=%d\n", str.s, str.zc);
It is supposed to operate strings of lenght BUFFER-1, and garantee that it ends in '\0'.
But compiler gives error only for:
str.zc='e'; /*error */
Not if:
str.s[3]='e'; /*no error */
If compiling with gcc and some flag might do, that is good as well.
Thanks,
Beco
To detect errors at runtime take a look at the -fstack-protector-all option in gcc. It may be of limited use when attempting to detect very small overflows like the one your described.
Unfortunately you aren't going to find a lot of info on detecting buffer overflow scenarios like the one you described at compile-time. From a C language perspective the syntax is totally correct, and the language gives you just enough rope to hang yourself with. If you really want to protect your buffers from yourself you can write a front-end to array accesses that validates the index before it allows access to the memory you want.

Can I assume sizeof(GUID)==16 at all times?

The definition of GUID in the windows header's is like this:
typedef struct _GUID {
unsigned long Data1;
unsigned short Data2;
unsigned short Data3;
unsigned char Data4[ 8 ];
} GUID;
However, no packing is not defined. Since the alignment of structure members is dependent on the compiler implementation one could think this structure could be longer than 16 bytes in size.
If i can assume it is always 16 bytes - my code using GUIDs is more efficient and simple.
However, it would be completely unsafe - if a compiler adds some padding in between of the members for some reason.
My questions do potential reasons exist ? Or is the probability of the scenario that sizeof(GUID)!=16 actually really 0.
It's not official documentation, but perhaps this article can ease some of your fears. I think there was another one on a similar topic, but I cannot find it now.
What I want to say is that Windows structures do have a packing specifier, but it's a global setting which is somewhere inside the header files. It's a #pragma or something. And it is mandatory, because otherwise programs compiled by different compilers couldn't interact with each other - or even with Windows itself.
It's not zero, it depends on your system. If the alignment is word (4-bytes) based, you'll have padding between the shorts, and the size will be more than 16.
If you want to be sure that it's 16 - manually disable the padding, otherwise use sizeof, and don't assume the value.
If I feel I need to make an assumption like this, I'll put a 'compile time assertion' in the code. That way, the compiler will let me know if and when I'm wrong.
If you have or are willing to use Boost, there's a BOOST_STATIC_ASSERT macro that does this.
For my own purposes, I've cobbled together my own (that works in C or C++ with MSVC, GCC and an embedded compiler or two) that uses techniques similar to those described in this article:
http://www.pixelbeat.org/programming/gcc/static_assert.html
The real tricks to getting the compile time assertion to work cleanly is dealing with the fact that some compilers don't like declarations mixed with code (MSVC in C mode), and that the techniques often generate warnings that you'd rather not have clogging up an otherwise working build. Coming up with techniques that avoid the warnings is sometimes a challenge.
Yes, on any Windows compiler. Otherwise IsEqualGUID would not work: it compares only the first 16 bytes. Similarly, any other WinAPI function that takes a GUID* just checks the first 16 bytes.
Note that you must not assume generic C or C++ rules for windows.h. For instance, a byte is always 8 bits on Windows, even though ISO C allows 9 bits.
Anytime you write code dependent on the size of someone else's structure,
warning bells should go off.
Could you give an example of some of the simplified code you want to use?
Most people would just use sizeof(GUID) if the size of the structure was needed.
With that said -- I can't see the size of GUID ever changing.
#include <stdio.h>
#include <rpc.h>
int main () {
GUID myGUID;
printf("size of GUID is %d\n", sizeof(myGUID));
return 0;
}
Got 16. This is useful to know if you need to manually allocate on the heap.

How to reference segment beginning and size from C code

I am porting a program for an ARM chip from a IAR compiler to gcc.
In the original code, IAR specific operators such as __segment_begin and __segment_size are used to obtain the beginning and size respectively of certain memory segments.
Is there any way to do the same thing with GCC? I've searched the GCC manual but was unable to find anything relevant.
More details:
The memory segments in question have to be in fixed locations so that the program can interface correctly with certain peripherals on the chip. The original code uses the __segment_begin operator to get the address of this memory and the __segment_size to ensure that it doesn't overflow this memory.
I can achieve the same functionality by adding variables to indicate the start and end of these memory segments but if GCC had similar operators that would help minimise the amount of compiler dependent code I end up having to write and maintain.
What about the linker's flag --section-start? Which I read is supported here.
An example on how to use it can be found on the AVR Freaks Forum:
const char __attribute__((section (".honk"))) ProjString[16] = "MY PROJECT V1.1";
You will then have to add to the linker's options: -Wl,--section-start=.honk=address.
Modern versions of GCC will declare two variables for each segment, namely __start_MY_SEGMENT and __stop_MY_SEGMENT. To use these variables, you need to declare them as externs with the desired type. Following that, you and then use the '&' operator to get the address of the start and end of that segment.
extern uint8_t __start_MY_SEGMENT;
extern uint8_t __stop_MY_SEGMENT;
#define MY_SEGMENT_LEN (&__stop_MY_SEGMENT - &__start_MY_SEGMENT)

Resources