How to reference segment beginning and size from C code - gcc

I am porting a program for an ARM chip from a IAR compiler to gcc.
In the original code, IAR specific operators such as __segment_begin and __segment_size are used to obtain the beginning and size respectively of certain memory segments.
Is there any way to do the same thing with GCC? I've searched the GCC manual but was unable to find anything relevant.
More details:
The memory segments in question have to be in fixed locations so that the program can interface correctly with certain peripherals on the chip. The original code uses the __segment_begin operator to get the address of this memory and the __segment_size to ensure that it doesn't overflow this memory.
I can achieve the same functionality by adding variables to indicate the start and end of these memory segments but if GCC had similar operators that would help minimise the amount of compiler dependent code I end up having to write and maintain.

What about the linker's flag --section-start? Which I read is supported here.
An example on how to use it can be found on the AVR Freaks Forum:
const char __attribute__((section (".honk"))) ProjString[16] = "MY PROJECT V1.1";
You will then have to add to the linker's options: -Wl,--section-start=.honk=address.

Modern versions of GCC will declare two variables for each segment, namely __start_MY_SEGMENT and __stop_MY_SEGMENT. To use these variables, you need to declare them as externs with the desired type. Following that, you and then use the '&' operator to get the address of the start and end of that segment.
extern uint8_t __start_MY_SEGMENT;
extern uint8_t __stop_MY_SEGMENT;
#define MY_SEGMENT_LEN (&__stop_MY_SEGMENT - &__start_MY_SEGMENT)

Related

How can I enable UnAligned Access for ARM NEON in LLVM compiler?

What is the flag to enable unaligned memory access for ARM NEON in LLVM compiler.
I was testing my ARM NEON intrinsic program in Xcode. I am accessing data from unaligned memory:
char TempMemory[32] = {0};
char * pTempMem = TempMemory;
pTempMem += 7;
int32x2_t i32x2_value = vld1_lane_s32((int32_t const *) pTempMem, i32x2_offset, 0);
Equivalent assembly for the intrinsic should be VLD1.32 {d0[0]}, [pTempMem], but the compiler align it to next multiple of 32 and access data. Because of that, my program is not working fine.
So, How can I enable unaligned access in LLVM compiler?
This isn't actually a NEON problem, it's a C problem, and the issue is:
vld1_lane_s32((int32_t const *) pTempMem , i32x2_offset, 0);
Casting a pointer is a message to the compiler saying "hey, I know this looks bad, but trust me, I really know what I'm doing". Converting a pointer to type A to a pointer to type B, if the pointer does not have suitable alignment for type B, gives undefined behaviour. Therefore the compiler is free to assume that the argument to vld_1_lane_s32 is always 32-bit aligned because there's no valid way it couldn't be (and you've promised you know what you're doing), so it emits the instruction with the alignment hint.
Now, you could fiddle around with options in an attempt to get a different kind of undefined behaviour that matches what you want, but that's just bodging around the problem rather than fixing it. That the underlying NEON instruction set can support unaligned accesses doesn't affect the C language's definition of and restrictions around data alignment.
I'm not familiar with how clever LLVM is, so I'm not sure if simply omitting the pointer cast would work (technically, C permits converting char * to any other type of data pointer, so it should be able to sort out the alignment itself). Otherwise, the solution is to use an appropriate vld*_u8 operation to load the data into the vector via the correct type, then cast that with vreinterpret_s32_u8 once it's in the register.

Optimizing used registers when using inline ARM assembly in GCC

I want to write some inline ARM assembly in my C code. For this code, I need to use a register or two more than just the ones declared as inputs and outputs to the function. I know how to use the clobber list to tell GCC that I will be using some extra registers to do my computation.
However, I am sure that GCC enjoys the freedom to shuffle around which registers are used for what when optimizing. That is, I get the feeling it is a bad idea to use a fixed register for my computations.
What is the best way to use some extra register that is neither input nor output of my inline assembly, without using a fixed register?
P.S. I was thinking that using a dummy output variable might do the trick, but I'm not sure what kind of weird other effects that will have...
Ok, I've found a source that backs up the idea of using dummy outputs instead of hard registers:
4.8 Temporary registers:
People also sometimes erroneously use clobbers for temporary registers. The right way is
to make up a dummy output, and use “=r” or “=&r” depending on the permitted overlap
with the inputs. GCC allocates a register for the dummy value. The difference is that
GCC can pick a convenient register, so it has more flexibility.
from page 20 of this pdf.
For anyone who is interested in more info on inline assembly with GCC this website turned out to be very instructive.

Protecting memory from changing

Is there a way to protect an area of the memory?
I have this struct:
#define BUFFER 4
struct
{
char s[BUFFER-1];
const char zc;
} str = {'\0'};
printf("'%s', zc=%d\n", str.s, str.zc);
It is supposed to operate strings of lenght BUFFER-1, and garantee that it ends in '\0'.
But compiler gives error only for:
str.zc='e'; /*error */
Not if:
str.s[3]='e'; /*no error */
If compiling with gcc and some flag might do, that is good as well.
Thanks,
Beco
To detect errors at runtime take a look at the -fstack-protector-all option in gcc. It may be of limited use when attempting to detect very small overflows like the one your described.
Unfortunately you aren't going to find a lot of info on detecting buffer overflow scenarios like the one you described at compile-time. From a C language perspective the syntax is totally correct, and the language gives you just enough rope to hang yourself with. If you really want to protect your buffers from yourself you can write a front-end to array accesses that validates the index before it allows access to the memory you want.

Can I assume sizeof(GUID)==16 at all times?

The definition of GUID in the windows header's is like this:
typedef struct _GUID {
unsigned long Data1;
unsigned short Data2;
unsigned short Data3;
unsigned char Data4[ 8 ];
} GUID;
However, no packing is not defined. Since the alignment of structure members is dependent on the compiler implementation one could think this structure could be longer than 16 bytes in size.
If i can assume it is always 16 bytes - my code using GUIDs is more efficient and simple.
However, it would be completely unsafe - if a compiler adds some padding in between of the members for some reason.
My questions do potential reasons exist ? Or is the probability of the scenario that sizeof(GUID)!=16 actually really 0.
It's not official documentation, but perhaps this article can ease some of your fears. I think there was another one on a similar topic, but I cannot find it now.
What I want to say is that Windows structures do have a packing specifier, but it's a global setting which is somewhere inside the header files. It's a #pragma or something. And it is mandatory, because otherwise programs compiled by different compilers couldn't interact with each other - or even with Windows itself.
It's not zero, it depends on your system. If the alignment is word (4-bytes) based, you'll have padding between the shorts, and the size will be more than 16.
If you want to be sure that it's 16 - manually disable the padding, otherwise use sizeof, and don't assume the value.
If I feel I need to make an assumption like this, I'll put a 'compile time assertion' in the code. That way, the compiler will let me know if and when I'm wrong.
If you have or are willing to use Boost, there's a BOOST_STATIC_ASSERT macro that does this.
For my own purposes, I've cobbled together my own (that works in C or C++ with MSVC, GCC and an embedded compiler or two) that uses techniques similar to those described in this article:
http://www.pixelbeat.org/programming/gcc/static_assert.html
The real tricks to getting the compile time assertion to work cleanly is dealing with the fact that some compilers don't like declarations mixed with code (MSVC in C mode), and that the techniques often generate warnings that you'd rather not have clogging up an otherwise working build. Coming up with techniques that avoid the warnings is sometimes a challenge.
Yes, on any Windows compiler. Otherwise IsEqualGUID would not work: it compares only the first 16 bytes. Similarly, any other WinAPI function that takes a GUID* just checks the first 16 bytes.
Note that you must not assume generic C or C++ rules for windows.h. For instance, a byte is always 8 bits on Windows, even though ISO C allows 9 bits.
Anytime you write code dependent on the size of someone else's structure,
warning bells should go off.
Could you give an example of some of the simplified code you want to use?
Most people would just use sizeof(GUID) if the size of the structure was needed.
With that said -- I can't see the size of GUID ever changing.
#include <stdio.h>
#include <rpc.h>
int main () {
GUID myGUID;
printf("size of GUID is %d\n", sizeof(myGUID));
return 0;
}
Got 16. This is useful to know if you need to manually allocate on the heap.

How to align stack at 32 byte boundary in GCC?

I'm using MinGW64 build based on GCC 4.6.1 for Windows 64bit target. I'm playing around with the new Intel's AVX instructions. My command line arguments are -march=corei7-avx -mtune=corei7-avx -mavx.
But I started running into segmentation fault errors when allocating local variables on the stack. GCC uses the aligned moves VMOVAPS and VMOVAPD to move __m256 and __m256d around, and these instructions require 32-byte alignment. However, the stack for Windows 64bit has only 16 byte alignment.
How can I change the GCC's stack alignment to 32 bytes?
I have tried using -mstackrealign but to no avail, since that aligns only to 16 bytes. I couldn't make __attribute__((force_align_arg_pointer)) work either, it aligns to 16 bytes anyway. I haven't been able to find any other compiler options that would address this. Any help is greatly appreciated.
EDIT:
I tried using -mpreferred-stack-boundary=5, but GCC says that 5 is not supported for this target. I'm out of ideas.
I have been exploring the issue, filed a GCC bug report, and found out that this is a MinGW64 related problem. See GCC Bug#49001. Apparently, GCC doesn't support 32-byte stack alignment on Windows. This effectively prevents the use of 256-bit AVX instructions.
I investigated a couple ways how to deal with this issue. The simplest and bluntest solution is to replace of aligned memory accesses VMOVAPS/PD/DQA by unaligned alternatives VMOVUPS etc. So I learned Python last night (very nice tool, by the way) and pulled off the following script that does the job with an input assembler file produced by GCC:
import re
import fileinput
import sys
# fix aligned stack access
# replace aligned vmov* by unaligned vmov* with 32-byte aligned operands
# see Intel's AVX programming guide, page 39
vmova = re.compile(r"\s*?vmov(\w+).*?((\(%r.*?%ymm)|(%ymm.*?\(%r))")
aligndict = {"aps" : "ups", "apd" : "upd", "dqa" : "dqu"};
for line in fileinput.FileInput(sys.argv[1:],inplace=1):
m = vmova.match(line)
if m and m.group(1) in aligndict:
s = m.group(1)
print line.replace("vmov"+s, "vmov"+aligndict[s]),
else:
print line,
This approach is pretty safe and foolproof. Though I observed a performance penalty on rare occasions. When the stack is unaligned, the memory access crosses the cache line boundary. Fortunately, the code performs as fast as aligned accesses most of the time. My recommendation: inline functions in critical loops!
I also attempted to fix the stack allocation in every function prolog using another Python script, trying to align it always at the 32-byte boundary. This seems to work for some code, but not for other. I have to rely on the good will of GCC that it will allocate aligned local variables (with respect to the stack pointer), which it usually does. This is not always the case, especially when there is a serious register spilling due to the necessity to save all ymm register before a function call. (All ymm registers are callee-save). I can post the script if there's an interest.
The best solution would be to fix GCC MinGW64 build. Unfortunately, I have no knowledge of its internal workings, just started using it last week.
You can get the effect you want by
Declaring your variables not as variables, but as fields in a struct
Declaring an array that is larger than the structure by an appropriate amount of padding
Doing pointer/address arithmetic to find a 32 byte aligned address in side the array
Casting that address to a pointer to your struct
Finally using the data members of your struct
You can use the same technique when malloc() does not align stuff on the heap appropriately.
E.g.
void foo() {
struct I_wish_these_were_32B_aligned {
vec32B foo;
char bar[32];
}; // not - no variable definition, just the struct declaration.
unsigned char a[sizeof(I_wish_these_were_32B_aligned) + 32)];
unsigned char* a_aligned_to_32B = align_to_32B(a);
I_wish_these_were_32B_aligned* s = (I_wish_these_were_32B_aligned)a_aligned_to_32B;
s->foo = ...
}
where
unsigned char* align_to_32B(unsiged char* a) {
uint64_t u = (unit64_t)a;
mask_aligned32B = (1 << 5) - 1;
if (u & mask_aligned32B == 0) return (unsigned char*)u;
return (unsigned char*)((u|mask_aligned_32B) + 1);
}
I just ran in the same issue of having segmentation faults when using AVX inside my functions. And it was also due to the stack misalignment. Given the fact that this is a compiler issue (and the options that could help are not available in Windows), I worked around the stack usage by:
Using static variables (see this issue). Given the fact that they are not stored in the stack, you can force their alignment by using __attribute__((align(32))) in your declaration. For example: static __m256i r __attribute__((aligned(32))).
Inlining the functions/methods receiving/returning AVX data. You can force GCC to inline your function/method by adding inline and __attribute__((always_inline)) to your function prototype/declaration. Inlining your functions increase the size of your program, but they also prevent the function from using the stack (and hence, avoids the stack-alignment issue). Example: inline __m256i myAvxFunction(void) __attribute__((always_inline));.
Be aware that the usage of static variables is no thread-safe, as mentioned in the reference. If you are writing a multi-threaded application you may have to add some protection for your critical paths.

Resources