I need a variant of snprintf that is guaranteed never to call malloc. That's because
this snprintf (let's call it safe_snprintf()) is going to be called from places
where malloc would fail or deadlock.
Which is closer to the truth, 1 or 2?
On Windows, the native snprintf might call malloc. Then
I need to pull in an open-source snprintf.c and call it safe_snprintf(). Or
On Windows, the native snprintf is guaranteed never to call malloc.
I'd prefer (2) if it is documented somewhere. Thanks
The implementation of _snprintf() in the VC libs (MSVCRT) will call malloc on floating-point conversions only when the format precision exceeds 163 characters. This applies to the %E, %G, %A, %e, %f, %g and %a format specifiers.
This should apply to all releases of MSVCRT since at least version 6.
I cannot find documentation regarding the Windows snprintf implementation, but this link states that there can be a reason for an internal malloc call. Nor do I think any standard forbids implementers from using it. So I recommend the first approach. Here you can find a list of snprintf implementations.
There is no reason why snprintf() should ever call malloc(). Period.
However, you cannot be sure without reading the source, and even then you cannot be sure it will stay that way.
If you have to be absolutely sure, then you might want to implement your own snprintf().
Related
I need longjmp/setjmp in a .kext file for OS X. Unfortunately, I don't think there's any official support for these functions in XNU. Is there any fundamental reason why this cannot work or is it just not implemented right now?
Any ideas how I could get this to work?
If it helps, I want to try to get Lua to run in the OS X kernel, but the runtime seems to depend on either longjmp/setjmp or C++ exceptions, neither of which is available in XNU.
There's nothing about standard-compliant use of setjmp/longjmp which stops you from using it in a kernel context. The main thing to be careful about regarding the kernel execution context is that the current thread is usually identified via pointer arithmetic on the current stack pointer, so unlike in user space, you can't use green threads or otherwise mess with the rsp register (on x86-64). longjmp does set the stack pointer, but only to the value previously saved by setjmp, which will be in the same stack if you stick to standard use, so that's safe.
As far as I'm aware, compilers don't treat setjmp() calls specially, so you can implement your own version quite easily as a function in assembly language. Setjmp will need to save the return pointer, the stack pointer, and any callee-saved registers to the jmp_buf-typed array passed into the function; all of this is defined in the ABI for the platform in question (x86-64 sysv in the case of OS X). Then return 0 (set rax to 0 on x86-64). Your version of longjmp will simply need to restore the contents of this array and return to the saved location, with the passed-in value as the return value (copy the argument to rax on x86-64). To comply with the standard, you must return 1 if 0 is passed to longjmp.
In userspace, setjmp/longjmp typically also affect the signal mask, which doesn't apply in the kernel.
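For reference, here is a portable illustration of the return-value contract described above, which a hand-rolled kernel setjmp/longjmp must preserve: setjmp returns 0 on the initial call and the longjmp value on resumption (with 0 mapped to 1).

```cpp
#include <csetjmp>

static std::jmp_buf env;

static void fail(int code) {
    std::longjmp(env, code);  // control returns to the setjmp site
}

int run() {
    // setjmp may only appear in a few contexts per the C standard;
    // the controlling expression of a switch is one of them.
    switch (setjmp(env)) {
    case 0:           // initial call
        fail(7);      // never returns
        return -1;
    case 7:           // resumed via longjmp(env, 7)
        return 7;
    default:
        return -2;
    }
}
```

A kernel-side assembly implementation just has to reproduce exactly this observable behavior with the ABI's callee-saved registers.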
I am amazed that I can't find any document stating the difference between _int_malloc and malloc in the output of Valgrind's callgrind tool.
Could anybody explain what's their difference?
Furthermore, I actually write C++ code, so I use new exclusively, never malloc, yet only mallocs show up in the callgrind output.
The malloc listed in the callgrind output will be the implementation of malloc provided by the glibc function __libc_malloc in the file glibc/malloc/malloc.c.
This function calls another function, intended for internal use only, named _int_malloc, which does most of the hard work.
As writing standard libraries is very difficult, the authors must be very good programmers and therefore very lazy. So, instead of writing memory allocation code twice, the new operator calls malloc in order to get the memory it requires.
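As an illustration of that layering, here is a minimal sketch of a replacement global operator new that forwards to malloc. Common C++ runtimes such as libstdc++ behave similarly by default, though the exact code path is library-specific:

```cpp
#include <cstdlib>
#include <new>

// Replacement global operator new: like the default one in common
// C++ runtimes, it just gets its memory from malloc.
void *operator new(std::size_t n) {
    if (void *p = std::malloc(n ? n : 1))  // malloc(0) may return NULL
        return p;
    throw std::bad_alloc();
}

void operator delete(void *p) noexcept { std::free(p); }
```

This is why a profiler that aggregates by callee shows malloc (and its internal helper _int_malloc) even in code that only ever writes new.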
When I compile 32-bit C code with GCC and the -fomit-frame-pointer option, the frame pointer (ebp) is not used unless my function calls Windows API functions via stdcall with at least one parameter.
For example, if I only use GetCommandLine() from the Windows API, which has no parameters/arguments, GCC will omit the frame pointer and use ebp for other things, speeding up the code and not having that useless prologue.
But the moment I call a stdcall Win32 function that accepts at least one argument, GCC completely ignores -fomit-frame-pointer and uses the frame pointer anyway, and the code is worse on inspection since it can't use ebp for general-purpose things. Not to mention I find the frame pointer quite pointless. I mean, I want to compile for release and distribution, so why should I care about debugging? (If I want to debug, I'll just use a debug build after reproducing the bug.)
My stack most certainly does NOT contain dynamic allocation like alloca, so the stack has a defined structure, yet GCC chooses the dumb method despite my options. Is there something I'm missing to force it not to use the frame pointer?
My second gripe is that it refuses to use "push" instructions for Win32 functions. Every other compiler I tried used push instructions to put arguments on the stack, resulting in much better, more compact code, not to mention it is the most natural way to pass arguments for stdcall. Yet GCC stubbornly uses "mov" instructions to write each argument manually at offsets relative to esp, because it needs to keep the stack pointer completely static. stdcall is made to be easy on the caller, and yet GCC completely misses the point since it generates this crappy code when interfacing with it. What's worse, since the stack pointer is static, it still uses a frame pointer? Just why?
I tried -mpush-args, it doesn't do anything.
I also noticed that if I make my stack frame big enough to exceed a page (4096 bytes), GCC adds a prologue calling a function that does nothing but "bitwise or" the stack with zero every 4096 bytes (which changes nothing). I assume it's for touching the stack and automatically committing memory via page faults if the stack was only reserved? Unfortunately, it does this even if I set the initial commit of the stack (not the reserve) high enough to hold my stack, not to mention this shouldn't be needed in the first place. Redundant code at its best.
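A frame like the following is enough to provoke that probe on a Windows-targeted GCC (the probing routine is typically named ___chkstk_ms there; verify with -S on your own toolchain). The code itself is portable and does nothing interesting at run time:

```cpp
#include <cstring>

// A local larger than one 4 KiB page; on Windows targets GCC emits a
// stack-probe call in the prologue here to touch each new stack page,
// so guard-page growth / commit-on-fault works for reserved stacks.
int big_frame() {
    char buf[8192];               // > 4096 bytes => probe on mingw
    std::memset(buf, 'x', sizeof buf);
    return buf[0] + buf[8191];    // keep buf from being optimized away
}
```

On non-Windows targets the same function compiles without any probe, which shows this is an ABI/OS concern rather than anything about the C code.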
Are these bugs in GCC? Or something I'm missing in options? Should I use something else? Please tell me if I'm missing some options.
I seriously hope I won't have to make an inline asm macro just to call stdcall functions and use push instructions (and this will avoid frame pointer too I guess). That sounds really overkill for something so basic that should be in compilers of today. And yes I use GCC 4.8.1 so not an ancient version.
As an extra question, is it possible to force GCC not to save registers on the stack in the function prologue? I use my own direct entry point with the -nostartfiles argument, because it is a pure Windows application and works just fine without the standard-lib startup. If I use __attribute__((noreturn)), it will discard the epilogue that restores the registers, but it will still push them on the stack in the prologue; I don't know if there's a way to stop it saving registers for this entry-point function. Either way, not a big deal in the least; it would just feel more complete, I guess. Thanks!
See the answer Force GCC to push arguments on the stack before calling function (using PUSH instruction)
I.e. try -mpush-args -mno-accumulate-outgoing-args. It may also require -mno-stack-arg-probe if gcc complains.
It looks like supplying -mpush-args -mno-accumulate-outgoing-args -mno-stack-arg-probe works, specifically the last one. Now the code is cleaner and more like other compilers' output, and it uses PUSH for arguments, which even makes it easier to track in OllyDbg.
Unfortunately, this FORCES the stupid frame pointer to be used, even in small functions that absolutely do not need it. Seriously, is there a way to absolutely force GCC to disable the frame pointer?!
GCC has a built-in version of the C99/POSIX memcpy function: __builtin_memcpy.
Sometimes GCC expands it into an inline version of memcpy, and in other cases it emits a call to libc's memcpy. E.g. it was noted here:
Finally, on a compiler note, __builtin_memcpy can fall back to emitting a memcpy function call.
What is the logic behind this selection? Is the logic the same in other GCC-compatible compilers, like clang/LLVM, the Intel C++ compiler, PCC, or suncc (Oracle Studio)?
When should I prefer __builtin_memcpy over plain memcpy?
I experimented with the builtin replacement some time ago and found that the <string.h> functions are only replaced when the size of the source argument is known at compile time, in which case the call to libc is replaced directly by unrolled code.
Unless you compile with -fno-builtin, -ansi, -std=c89 or something similar, it actually doesn't matter whether you use the __builtin_ prefix or not.
Although it's hard to follow, the code that decides whether to emit a library call or a chunk of inline code seems to be here.
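To see the constant-size case concretely, here is a small sketch; with optimization enabled, GCC and Clang normally expand this copy inline rather than calling libc (verify with -S, since the exact decision is target- and version-specific):

```cpp
struct Pair { int a, b; };

Pair copy_pair(const Pair &src) {
    Pair dst;
    // The size is a compile-time constant (sizeof dst == 8 on typical
    // targets), so the compiler usually emits a single 8-byte move here
    // instead of a call to memcpy.
    __builtin_memcpy(&dst, &src, sizeof dst);
    return dst;
}
```

With a runtime-variable size the same builtin generally falls back to a plain memcpy call, which matches the quoted note.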
What calls best emulate pread/pwrite in MSVC 10?
At the C runtime library level, look at fread, fwrite and fseek.
At the Win32 API level, have a look at ReadFile, WriteFile, and SetFilePointer. MSDN has extensive coverage of file I/O API's.
Note that both ReadFile and WriteFile take an OVERLAPPED struct argument, which lets you specify a file offset. The offset is respected for all files that support byte offsets, even when opened for synchronous (i.e. non 'overlapped') I/O.
Depending on the problem you are trying to solve, file mapping may be a better design choice.
It looks like you just use the lpOverlapped parameter to ReadFile/WriteFile to pass a pointer to an OVERLAPPED structure with the offset specified in Offset and OffsetHigh.
(Note: You don't actually get overlapping IO unless the handle was opened with FILE_FLAG_OVERLAPPED.)
The answer provided by Oren seems correct but doesn't quite meet the need. I too came here searching for the answer and couldn't find it, so I will expand a bit here.
As said,
At the C runtime library level, there are fread, fwrite and fseek.
Below stdio there are two further levels of abstraction: the low-level C runtime I/O, which works with file descriptors, and the Win32 API, which works with Windows HANDLEs.
If you wish to work with HANDLEs, you have ReadFile, WriteFile, and SetFilePointer. But most of the time, C developers prefer working with file descriptors; for that, you have _read, _write and _lseek.