libstdc++ invalid free a std::string - c++11

gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) run my code will coredump. valgrind message is :
==14892== Invalid free() / delete / delete[] / realloc()
==14892== at 0x4C2B343: operator delete(void*) (vg_replace_malloc.c:502)
==14892== by 0x53404DE: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
==14892== by 0x5AE2258: __run_exit_handlers (exit.c:82)
==14892== by 0x5AE22A4: exit (exit.c:104)
==14892== by 0x5AC7ECB: (below main) (libc-start.c:321)
==14892== Address 0x55892e8 is in the BSS segment of /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19
when i use gcc version 4.7.2 20121015 (Red Hat 4.7.2-5) (GCC), it won't coredump.
EDIT:
yesterday, i found out the bug of my codes. such as:
static array[1024][4];
array[1023][4] = '\0'; // this destroy other object
after i changed this code, it runs well.

On the balance of probabilities, no, it's far more likely to be a bug in your own code. There are countless millions of developers "testing" gcc every day whereas you code has probably been tested by, well, just you :-)
The most likely explanation is that you're invoking undefined behaviour that just happens to "work" on one of the platforms, though that in no way makes it a good idea.
If you were to provide the code, we could be a lot more definitive. Without that, I'm afraid, generalities are the best we can do.
One thing you may want to look at, though it may not be fruitful. If that address on the last line of your valgrind output is the address being freed, it's a potential worry that it's in the BSS, which is generally used for statically defined variables, not the memory arenas used for dynamic allocation.
Still, that's very dependent on the implementation, hence why I'm including it as an aside.
The core answer remains the likelihood of a problem (of some description) in your own code.
And, based on your later comment that you have code like:
static char array[1024][4];
array[1023][4] = 0;
that is indeed a bug.
That definition gives you an array of elements that you can validly access as:
array[0..1023][0..3]
Using an array index outside of that range is considered undefined behaviour, which may well corrupt some other piece of information, such as a pointer to heap memory that you will later try to free.

Related

How can I enable UnAligned Access for ARM NEON in LLVM compiler?

What is the flag to enable unaligned memory access for ARM NEON in LLVM compiler.
I was testing my ARM NEON intrinsic program in Xcode. I am accessing data from unaligned memory:
char TempMemory[32] = {0};
char * pTempMem = TempMemory;
pTempMem += 7;
int32x2_t i32x2_value = vld1_lane_s32((int32_t const *) pTempMem, i32x2_offset, 0);
Equivalent assembly for the intrinsic should be VLD1.32 {d0[0]}, [pTempMem], but the compiler align it to next multiple of 32 and access data. Because of that, my program is not working fine.
So, How can I enable unaligned access in LLVM compiler?
This isn't actually a NEON problem, it's a C problem, and the issue is:
vld1_lane_s32((int32_t const *) pTempMem , i32x2_offset, 0);
Casting a pointer is a message to the compiler saying "hey, I know this looks bad, but trust me, I really know what I'm doing". Converting a pointer to type A to a pointer to type B, if the pointer does not have suitable alignment for type B, gives undefined behaviour. Therefore the compiler is free to assume that the argument to vld_1_lane_s32 is always 32-bit aligned because there's no valid way it couldn't be (and you've promised you know what you're doing), so it emits the instruction with the alignment hint.
Now, you could fiddle around with options in an attempt to get a different kind of undefined behaviour that matches what you want, but that's just bodging around the problem rather than fixing it. That the underlying NEON instruction set can support unaligned accesses doesn't affect the C language's definition of and restrictions around data alignment.
I'm not familiar with how clever LLVM is, so I'm not sure if simply omitting the pointer cast would work (technically, C permits converting char * to any other type of data pointer, so it should be able to sort out the alignment itself). Otherwise, the solution is to use an appropriate vld*_u8 operation to load the data into the vector via the correct type, then cast that with vreinterpret_s32_u8 once it's in the register.

Casting a pointer to int and a warning?

I got this warning when building beta release of SageMath 6.1 on OSX 10.9.1, with 64 bit processor:
extra.cc:940:30: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Can you give an example of the command which can cause this kind of warning?
The post here says that there is no way to do it:
void *ptr = ...;
int x = (int)ptr;
...
ptr = (void *)x;
which is invalid according to the source.
Gcc version
$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix
The link is correct. The issue is that neither an int nor a pointer has a standard size, but on a 32-bit system they are most likely to both be 32-bits, whereas on a 64-bit system, a pointer is 64-bits wide while an int is still likely 32-bits.
Can you give an example of the command which can cause this kind of warning?
On a 64-bit system using glibc, take an int and cast its value to a pointer:
int x;
void *p = (void*)x;
Of course, if the issue is that pointers are twice the size of ints, this all by itself isn't a problem, it just throws a warning. The reason this kind of thing should throw a warning is that it could, in a "not all by itself" context, indicate a serious mistake.
So, it's not a good programming practice but it does not have to indicate a real error -- e.g., p += (void*)x will do the same thing, and this is more obviously likely innocuous, although the cast is unnecessary (and causes the warning).
I notice all kinds of (hopefully) innocent warnings like this when compiling even widely used, well aged software. You just have to trust that the people who've put it together are aware of this and aren't phased because they are sure there isn't a real issue, just a spurious warning. They could almost certainly still eliminate them, but GCC is notoriously picky (suggest YET MORE parentheses...), and not everyone cares, particularly on portable projects that may be more often built elsewhere.

GCC: avoiding long time linking while using static arrays

My question is practically repeats this one, which asks why this issue occurs. I would like ot know if it is possible to avoid it.
The issue is: if I allocate a huge amount of memory statically:
unsigned char static_data[ 8 * BYTES_IN_GYGABYTE ];
then linker (ld) takes very long time to make an executable. There is a good explanation from #davidg about this behaviour in question I gave above:
This leaves us with the follow series of steps:
The assembler tells the linker that it needs to create a section of memory that is 1GB long.
The linker goes ahead and allocates this memory, in preparation for placing it in the final executable.
The linker realizes that this memory is in the .bss section and is marked NOBITS, meaning that the data is just 0, and doesn't need to be physically placed into the final executable. It avoids writing out the 1GB of data, instead just throwing the allocated memory away.
The linker writes out to the final ELF file just the compiled code, producing a small executable.
A smarter linker might be able to avoid steps 2 and 3 above, making your compile time much faster
Ok. #davidg had explained why does linker takes a lot of time, but I want to know how can I avoid it. Maybe GCC have some options, that will say to linker to be a little smarter and to avoid steps 2 and 3 above ?
Thank you.
P.S. I use GCC 4.5.2 at Ubuntu
You can allocate the static memory in the release version only:
#ifndef _DEBUG
unsigned char static_data[ 8 * BYTES_IN_GYGABYTE ];
#else
unsigned char *static_data;
#endif
I would have 2 ideas in mind that could help:
As already mentioned in some comment: place it in a separate compilation unit.That itself will not reduce linking time. But maybe together with incremental linking it helps (ld option -r).
Other is similar. Place it in a separate compilation unit, and generate a shared library from it. And just link later with the shared library.
Sadly I can not promise that one of it helps, as I have no way to test: my gcc(4.7.2) and bin tools dont show this time consuming behaviour, 8, 16 or 32 Gigabytes testprogram compile and link in under a second.

How to align stack at 32 byte boundary in GCC?

I'm using MinGW64 build based on GCC 4.6.1 for Windows 64bit target. I'm playing around with the new Intel's AVX instructions. My command line arguments are -march=corei7-avx -mtune=corei7-avx -mavx.
But I started running into segmentation fault errors when allocating local variables on the stack. GCC uses the aligned moves VMOVAPS and VMOVAPD to move __m256 and __m256d around, and these instructions require 32-byte alignment. However, the stack for Windows 64bit has only 16 byte alignment.
How can I change the GCC's stack alignment to 32 bytes?
I have tried using -mstackrealign but to no avail, since that aligns only to 16 bytes. I couldn't make __attribute__((force_align_arg_pointer)) work either, it aligns to 16 bytes anyway. I haven't been able to find any other compiler options that would address this. Any help is greatly appreciated.
EDIT:
I tried using -mpreferred-stack-boundary=5, but GCC says that 5 is not supported for this target. I'm out of ideas.
I have been exploring the issue, filed a GCC bug report, and found out that this is a MinGW64 related problem. See GCC Bug#49001. Apparently, GCC doesn't support 32-byte stack alignment on Windows. This effectively prevents the use of 256-bit AVX instructions.
I investigated a couple ways how to deal with this issue. The simplest and bluntest solution is to replace of aligned memory accesses VMOVAPS/PD/DQA by unaligned alternatives VMOVUPS etc. So I learned Python last night (very nice tool, by the way) and pulled off the following script that does the job with an input assembler file produced by GCC:
import re
import fileinput
import sys
# fix aligned stack access
# replace aligned vmov* by unaligned vmov* with 32-byte aligned operands
# see Intel's AVX programming guide, page 39
vmova = re.compile(r"\s*?vmov(\w+).*?((\(%r.*?%ymm)|(%ymm.*?\(%r))")
aligndict = {"aps" : "ups", "apd" : "upd", "dqa" : "dqu"};
for line in fileinput.FileInput(sys.argv[1:],inplace=1):
m = vmova.match(line)
if m and m.group(1) in aligndict:
s = m.group(1)
print line.replace("vmov"+s, "vmov"+aligndict[s]),
else:
print line,
This approach is pretty safe and foolproof. Though I observed a performance penalty on rare occasions. When the stack is unaligned, the memory access crosses the cache line boundary. Fortunately, the code performs as fast as aligned accesses most of the time. My recommendation: inline functions in critical loops!
I also attempted to fix the stack allocation in every function prolog using another Python script, trying to align it always at the 32-byte boundary. This seems to work for some code, but not for other. I have to rely on the good will of GCC that it will allocate aligned local variables (with respect to the stack pointer), which it usually does. This is not always the case, especially when there is a serious register spilling due to the necessity to save all ymm register before a function call. (All ymm registers are callee-save). I can post the script if there's an interest.
The best solution would be to fix GCC MinGW64 build. Unfortunately, I have no knowledge of its internal workings, just started using it last week.
You can get the effect you want by
Declaring your variables not as variables, but as fields in a struct
Declaring an array that is larger than the structure by an appropriate amount of padding
Doing pointer/address arithmetic to find a 32 byte aligned address in side the array
Casting that address to a pointer to your struct
Finally using the data members of your struct
You can use the same technique when malloc() does not align stuff on the heap appropriately.
E.g.
void foo() {
struct I_wish_these_were_32B_aligned {
vec32B foo;
char bar[32];
}; // not - no variable definition, just the struct declaration.
unsigned char a[sizeof(I_wish_these_were_32B_aligned) + 32)];
unsigned char* a_aligned_to_32B = align_to_32B(a);
I_wish_these_were_32B_aligned* s = (I_wish_these_were_32B_aligned)a_aligned_to_32B;
s->foo = ...
}
where
unsigned char* align_to_32B(unsiged char* a) {
uint64_t u = (unit64_t)a;
mask_aligned32B = (1 << 5) - 1;
if (u & mask_aligned32B == 0) return (unsigned char*)u;
return (unsigned char*)((u|mask_aligned_32B) + 1);
}
I just ran in the same issue of having segmentation faults when using AVX inside my functions. And it was also due to the stack misalignment. Given the fact that this is a compiler issue (and the options that could help are not available in Windows), I worked around the stack usage by:
Using static variables (see this issue). Given the fact that they are not stored in the stack, you can force their alignment by using __attribute__((align(32))) in your declaration. For example: static __m256i r __attribute__((aligned(32))).
Inlining the functions/methods receiving/returning AVX data. You can force GCC to inline your function/method by adding inline and __attribute__((always_inline)) to your function prototype/declaration. Inlining your functions increase the size of your program, but they also prevent the function from using the stack (and hence, avoids the stack-alignment issue). Example: inline __m256i myAvxFunction(void) __attribute__((always_inline));.
Be aware that the usage of static variables is no thread-safe, as mentioned in the reference. If you are writing a multi-threaded application you may have to add some protection for your critical paths.

how can i override malloc(), calloc(), free() etc under OS X?

Assuming the latest XCode and GCC, what is the proper way to override the memory allocation functions (I guess operator new/delete as well). The debugging memory allocators are too slow for a game, I just need some basic stats I can do myself with minimal impact.
I know its easy in Linux due to the hooks, and this was trivial under codewarrior ten years ago when I wrote HeapManager.
Sadly smartheap no longer has a mac version.
I would use library preloading for this task, because it does not require modification of the running program. If you're familiar with the usual Unix way to do this, it's almost a matter of replacing LD_PRELOAD with DYLD_INSERT_LIBRARIES.
First step is to create a library with code such as this, then build it using regular shared library linking options (gcc -dynamiclib):
void *malloc(size_t size)
{
void * (*real_malloc)(size_t);
real_malloc = dlsym(RTLD_NEXT, "malloc");
fprintf(stderr, "allocating %lu bytes\n", (unsigned long)size);
/* Do your stuff here */
return real_malloc(size);
}
Note that if you also divert calloc() and its implementation calls malloc(), you may need additional code to check how you're being called. C++ programs should be pretty safe because the new operator calls malloc() anyway, but be aware that no standard enforces that. I have never encountered an implementation that didn't use malloc(), though.
Finally, set up the running environment for your program and launch it (might require adjustments depending on how your shell handles environment variables):
export DYLD_INSERT_LIBRARIES=./yourlibrary.dylib
export DYLD_FORCE_FLAT_NAMESPACE=1
yourprogram --yourargs
See the dyld manual page for more information about the dynamic linker environment variables.
This method is pretty generic. There are limitations, however:
You won't be able to divert direct system calls
If the application itself tricks you by using dlsym() to load malloc's address, the call won't be diverted. Unless, however, you trick it back by also diverting dlsym!
The malloc_default_zone technique mentioned at http://lists.apple.com/archives/darwin-dev/2005/Apr/msg00050.html appears to still work, see e.g. http://code.google.com/p/fileview/source/browse/trunk/fileview/fv_zone.cpp?spec=svn354&r=354 for an example use that seems to be similar to what you intend.
After much searching (here included) and issues with 10.7 I decided to write a blog post about this topic: How to set malloc hooks in OSX Lion
You'll find a few good links at the end of the post with more information on this topic.
The basic solution:
malloc_zone_t *dz=malloc_default_zone();
if(dz->version>=8)
{
vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ | VM_PROT_WRITE);//remove the write protection
}
original_free=dz->free;
dz->free=&my_free; //this line is throwing a bad ptr exception without calling vm_protect first
if(dz->version==8)
{
vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ);//put the write protection back
}
This is an old question, but I came across it while trying to do this myself. I got curious about this topic for a personal project I was working on, mainly to make sure that what I thought was automatically deallocated was being properly deallocated. I ended up writing a C++ implementation to allow me to track the amount of allocated heap and report it out if I so chose.
https://gist.github.com/monitorjbl/3dc6d62cf5514892d5ab22a59ff34861
As the name notes, this is OSX-specific. However, I was able to do this on Linux environments using the malloc_usable_size
Example
#define MALLOC_DEBUG_OUTPUT
#include "malloc_override_osx.hpp"
int main(){
int* ip = (int*)malloc(sizeof(int));
double* dp = (double*)malloc(sizeof(double));
free(ip);
free(dp);
}
Building
$ clang++ -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk \
-pipe -stdlib=libc++ -std=gnu++11 -g -o test test.cpp
$ ./test
0x7fa28a403230 -> malloc(16) -> 16
0x7fa28a403240 -> malloc(16) -> 32
0x7fa28a403230 -> free(16) -> 16
0x7fa28a403240 -> free(16) -> 0
Hope this helps someone else out in the future!
If the basic stats you need can be collected in a simple wrapper, a quick (and kinda dirty) trick is just using some #define macro replacement.
void* _mymalloc(size_t size)
{
void* ptr = malloc(size);
/* do your stat work? */
return ptr;
}
and
#define malloc(sz_) _mymalloc(sz_)
Note: if the macro is defined before the _mymalloc definition it will end up replacing the malloc call inside that function leaving you with infinite recursion... so ensure this isn't the case. You might want to explicitly #undef it before that function definition and simply (re)define it afterward depending on where you end up including it to hopefully avoid this situation.
I think if you define a malloc() and free() in your own .c file included in the project the linker will resolve that version.
Now then, how do you intend to implement malloc?
Check out Emery Berger's -- the author of the Hoard memory allocator's -- approach for replacing the allocator on OSX at https://github.com/emeryberger/Heap-Layers/blob/master/wrappers/macwrapper.cpp (and a few other files you can trace yourself by following the includes).
This is complementary to Alex's answer, but I thought this example was more to-the-point of replacing the system provided allocator.

Resources