arm-gcc compiled code is bigger than armclang's

I generated the same project in STM32CubeMX twice, once as a Keil MDK project and once as a Makefile project, and added the same code to both: the allocation uint8_t* data = new uint8_t[16], plus HAL_Delay and HAL_GPIO_TogglePin in an infinite loop. In both projects main.c was changed to main.cpp.
The whole user code looks like this:
uint8_t* data;
int main(void)
{
HAL_Init();
SystemClock_Config();
MX_GPIO_Init();
MX_USART3_UART_Init();
MX_USB_OTG_FS_PCD_Init();
data = new uint8_t[16];
for(int i = 0; i < 16; i++)data[i] = i+1;
while (1)
{
HAL_GPIO_TogglePin(LD3_GPIO_Port, GPIO_PIN_14);
HAL_Delay(500);
}
}
In Keil I use armclang v6.19, and in the Makefile project I use arm-none-eabi-gcc v12.2.1 20221205.
I checked the build output with different optimization flags; here are the results:
O0
Keil:
Program Size: Code=11950 RO-data=510 RW-data=12 ZI-data=3012
arm-gcc:
text data bss dec hex
17572 100 3268 20940 51cc
O3
Keil:
Program Size: Code=8238 RO-data=510 RW-data=12 ZI-data=3012
arm-gcc:
text data bss dec hex
12448 100 3268 15816 3dc8
Oz
Keil:
Program Size: Code=6822 RO-data=510 RW-data=12 ZI-data=3012
arm-gcc:
text data bss dec hex
11876 100 3268 15244 3b8c
What is the reason for such a difference? How can I fix it?
I guess the -O* flags mean different things in these compilers and enable different optimizer options, but I am not sure.

Related

Compiling a flat library?

I'd like to compile a set of functions into a flat library for lack of a better term. There are a bunch of functions like
// add.c
int add (int a, int b) {
return a + b;
}
// multiply.c
int multiply (int a, int b) {
int result = 0;
if (a >= 0)
for (; a > 0; --a) result = add(result, b);
else
for (; a < 0; ++a) result = add(result, -b);
return result;
}
// double.c
int two = 2;
int double_ (int x) {
return multiply(x, two);
}
and the compiled binary shall have
1. no main or __start entry points (it's a library, not an executable),
2. only instructions and data, no headers,
3. position-independent code,
4. no external dependencies (I'm not using any external libraries, but GCC appears to always include standard library stuff, which I don't need), and
5. little to no padding (i.e. no excessive amounts of null bytes for page/sector alignment).
And to be able to call the functions from outside the binary I either need to know their offsets from the beginning of the binary, or have a jump table at the beginning of the binary.
Using GCC, points 3 and 4 can probably be achieved with -fPIC and -nostdlib. If the functions were independent of each other, I could achieve point 5 by simply compiling the files separately and concatenating them manually, which would also give me the function offsets. Here, however, the functions are not independent of each other, so I rely on GCC to stitch the functions together with minimal padding. For point 2 there is probably some objcopy --oformat binary trick or something similar. But I have no clue how to get point 1 to work. So far every single guide I've found online is for compiling custom/"hello world" kernels, all of which are executables and have entry points. And if I don't provide an entry point, ld complains that the symbol __start cannot be found. Furthermore, I don't know how to get the function offsets of the compiled binary or how to tell GCC to include a jump table (whichever of the two is possible).
Any ideas on how to compile the example above so that the compiled binary satisfies points 1 through 5 and is callable from outside the binary (either by offsets or via a jump table at the beginning)?
After realizing that my requirements kinda look like compiling stuff for embedded devices with little storage, I looked up how firmware is compiled for embedded devices and found out that the linker can be finely tuned with linker scripts. I ended up writing my own linker script that looks like this:
// File: link.ld
OUTPUT(test.bin);
OUTPUT_FORMAT(binary);
SECTIONS {
.text 0 : {
add.o(.text);
multiply.o(.text);
double.o(.text);
}
/DISCARD/ : {
*(*)
}
}
Now, I can compile my source code with
gcc -c -fPIC -nostartfiles -nostdlib add.c multiply.c double.c
ld -M -T link.ld
where the first line compiles (-c) the source code into position-independent (-fPIC) object files (*.c -> *.o) without standard library (-nostartfiles -nostdlib), and the second line basically takes the .text sections of the object files and concatenates them, and prints out (-M) the section layout of the output file including the offsets of all symbols.
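For the jump-table variant mentioned in the question, the idea can be illustrated on a host machine before involving the linker. The sketch below is only an assumption about how such a table might look: struct jump_table and flat_lib_table are hypothetical names, and a real position-independent blob would more likely store offsets from the start of the binary than absolute pointers, since nothing would apply relocations to it.

```c
/* Host-side sketch of a jump table for the three example functions.
   struct jump_table and flat_lib_table are hypothetical names. */

int add(int a, int b) { return a + b; }

int multiply(int a, int b)
{
    int result = 0;
    if (a >= 0)
        for (; a > 0; --a) result = add(result, b);
    else
        for (; a < 0; ++a) result = add(result, -b); /* negative a: subtract b */
    return result;
}

int two = 2;

int double_(int x) { return multiply(x, two); }

/* One slot per exported function, in a fixed order that callers rely on. */
struct jump_table {
    int (*add)(int, int);
    int (*multiply)(int, int);
    int (*double_)(int);
};

/* In the flat binary this object would have to be placed first,
   for example by listing its section first in the linker script. */
const struct jump_table flat_lib_table = { add, multiply, double_ };
```

A caller that knows only the table layout would then go through flat_lib_table.double_(5) rather than hard-coding the offset of double_.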

How to generate unaligned access condition and how to detect it using gcc?

I'm currently working on heap memory allocation schemes. I want to detect unaligned accesses without actual hardware (using only GCC).
I use arm-none-eabi-gcc as the compiler, and my workstation runs Ubuntu 16.04. I tried to detect unaligned accesses with this compiler.
I expect the following code to be valid in terms of alignment.
int main ( int argc, char ** argv )
{
int32_t volatile * const ptr = ( int32_t volatile * ) 0x20000000;
*ptr = 3;
*(ptr + 1 ) = 4;
*(ptr + 2 ) = 5;
*(ptr + 3 ) = 6;
return 0;
}
But I expect the following code to be invalid in terms of alignment.
int main ( int argc, char** argv )
{
int32_t volatile * const ptr = ( int32_t volatile * ) 0x20000003;
*ptr = 3;
*(ptr + 1 ) = 4;
*(ptr + 2 ) = 5;
*(ptr + 3 ) = 6;
return 0;
}
I convert these C sources to assembly using the following commands, respectively:
arm-none-eabi-gcc -O0 -o main_0.s -S ../main.c
arm-none-eabi-gcc -O0 -o main_3.s -S ../main.c
When I compare the two assembly files, there is no difference between them. GCC also doesn't show any warning.
So why are the assembly files the same? How can I generate an unaligned access? Is there any way to detect an unaligned access using GCC?
Thanks
Since in ARM, the data address is always a register value (unless you're performing a PC relative load, and in the case of the PC the lower bits are special), there is nothing in the assembly to distinguish between one word load and another.
The 'problem' with unaligned accesses stems from the implementations of the architecture, and the interactions of the processor with its memory interfaces. Even when an unaligned access is supported, it will be slower, so the processor can be configured to trap them at run time.
The code you provide will perform unaligned accesses, but this will run fine on many cores. This is assuming that the compiler isn't detecting undefined behaviour (which clearly it does not). I'm assuming that the pointer is actually initialised differently when you compile your example.
You can enable or disable the use of unaligned access features within libraries using the -munaligned-access switch (see the Keil description), but all that does is define a macro; it does not enable any run-time or compile-time checks.
In general, alignment issues will show up at runtime just as much as at compile time - and in order to detect them at runtime you will need an instruction set simulator (with traps that can be enabled).
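Because the check has to happen at run time anyway, one option when no simulator is at hand is to test addresses in software before using them, for instance inside the allocator under test. A minimal sketch, assuming power-of-two alignments; is_aligned and is_word_aligned are hypothetical helper names, not part of GCC or any library:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when addr is a multiple of align.
   Assumes align is a power of two. */
static bool is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical helper: would a 32-bit access at addr be naturally aligned? */
static bool is_word_aligned(uintptr_t addr)
{
    return is_aligned(addr, sizeof(int32_t));
}
```

With the two addresses from the question, is_word_aligned(0x20000000) holds and is_word_aligned(0x20000003) does not. The helpers only inspect the numeric address and never dereference it, so they can run on a host machine.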

How do I ask the assembler to "give me a full size register"?

I'm trying to let the assembler give me a register of its choosing, and then use that register in inline assembly. I'm working with the program below, and it's segfaulting. The program was compiled with g++ -O1 -g2 -m64 wipe.cpp -o wipe.exe.
When I look at the crash under lldb, I believe I'm getting a 32-bit register rather than a 64-bit register. I'm trying to compute an address (base + offset) using lea, and store the result in a register the assembler chooses:
"lea (%0, %1), %2\n"
Above, I'm trying to say "use a register, and I'll refer to it as %2".
When I perform a disassembly, I see:
0x100000b29: leal (%rbx,%rsi), %edi
-> 0x100000b2c: movb $0x0, (%edi)
So it appears the generated code calculates an address using 64-bit values (rbx and rsi), but saves it to a 32-bit register (edi) that the assembler chose.
Here are the values at the time of the crash:
(lldb) type format add --format hex register
(lldb) p $edi
(unsigned int) $3 = 1063330
(lldb) p $rbx
(unsigned long) $4 = 4296030616
(lldb) p $rsi
(unsigned long) $5 = 10
A quick note on the Input Operands below. If I drop the "r" (2), then I get a compiler error when I refer to %2 in the call to lea: invalid operand number in inline asm string.
How do I tell the assembler to "give me a full size register" and then refer to it in my program?
int main(int argc, char* argv[])
{
string s("Hello world");
cout << s << endl;
char* ptr = &s[0];
size_t size = s.length();
if(ptr && size)
{
__asm__ __volatile__
(
"%=:\n" /* generate a unique label for TOP */
"subq $1, %1\n" /* 0-based index */
"lea (%0, %1), %2\n" /* calculate ptr[idx] */
"movb $0, (%2)\n" /* 0 -> ptr[size - 1] .. ptr[0] */
"jnz %=b\n" /* Back to TOP if non-zero */
: /* no output */
: "r" (ptr), "r" (size), "r" (2)
: "0", "1", "2", "cc"
);
}
return 0;
}
Sorry about these inline assembly questions. I hope this is the last one. I'm not really thrilled with using inline assembly in GCC because of pain points like this (and my fading memory). But it's the only legal way I know to do what I want to do given GCC's interpretation of the qualifier volatile in C.
If interested, GCC interprets C's volatile qualifier as hardware backed memory, and anything else is an abuse and it results in an illegal program. So the following is not legal for GCC:
volatile void* g_tame_the_optimizer = NULL;
...
unsigned char* ptr = ...
size_t size = ...;
for(size_t i = 0; i < size; i++)
ptr[i] = 0x00;
g_tame_the_optimizer = ptr;
Interestingly, Microsoft uses a more customary interpretation of volatile (what most programmers expect - namely, anything can change the memory, and not just memory mapped hardware), and the code above is acceptable.
gcc inline asm is a complicated beast. "r" (2) means allocate an int sized register and load it with the value 2. If you just need an arbitrary scratch register you can declare a 64 bit early-clobber dummy output, such as "=&r" (dummy) in the output section, with void *dummy declared earlier. You can consult the gcc manual for more details.
As for the final code snippet, it looks like you want a memory barrier, just as the linked email says. See the manual for an example.
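A compiler-level barrier of the kind this answer points to might look like the sketch below. It assumes GCC or Clang (the empty asm statement with a "memory" clobber is an extension), and wipe is a hypothetical helper name. The asm emits no instructions; it only tells the compiler that memory reachable through ptr may be read, so the preceding stores should not be discarded as dead. It is not a hardware fence.

```c
#include <stddef.h>

/* Zeroes size bytes at ptr, then places a compiler barrier so the
   stores are not optimized away. wipe is a hypothetical helper name. */
static void wipe(unsigned char *ptr, size_t size)
{
    for (size_t i = 0; i < size; i++)
        ptr[i] = 0x00;

    /* Empty asm: takes ptr as an input and clobbers "memory", so the
       compiler must assume the zeroed bytes are observed here. */
    __asm__ __volatile__("" : : "r"(ptr) : "memory");
}
```

For the string in the question, the call would be wipe((unsigned char *)ptr, size), with no hand-written loop in assembly and no scratch register to request.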

Determine program segments (HEADER, TEXT, CONST, etc...) at run time

So I realize I can open a binary in IDA Pro and determine where the segments start and stop. Is it possible to determine this at run time in Cocoa?
I'm assuming there are some C-level library functions that enable this. I poked around in the Mach headers but couldn't find much :/
Thanks in advance!
Cocoa doesn’t include classes for handling Mach-O files. You need to use the Mach-O functions provided by the system. You were right to read the Mach-O headers.
I’ve coded a small program that accepts as input a Mach-O file name and dumps information about its segments. Note that this program deals with thin files (i.e., not fat/universal) for the x86_64 architecture only.
Note that I’m also not checking every operation, nor whether the file is a correctly formed Mach-O file. Doing the appropriate checks is left as an exercise for the reader.
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mach-o/loader.h>
#include <sys/mman.h>
#include <sys/stat.h>
int main(int argc, char *argv[]) {
int fd;
struct stat stat_buf;
size_t size;
char *addr = NULL;
struct mach_header_64 *mh;
struct load_command *lc;
struct segment_command_64 *sc;
// Open the file and get its size
fd = open(argv[1], O_RDONLY);
fstat(fd, &stat_buf);
size = stat_buf.st_size;
// Map the file to memory
addr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_FILE | MAP_PRIVATE, fd, 0);
// The first bytes of a Mach-O file comprise its header
mh = (struct mach_header_64 *)addr;
// Load commands follow the header
addr += sizeof(struct mach_header_64);
printf("There are %d load commands\n", mh->ncmds);
for (int i = 0; i < mh->ncmds; i++) {
lc = (struct load_command *)addr;
if (lc->cmdsize == 0) continue;
// If the load command is a (64-bit) segment,
// print information about the segment
if (lc->cmd == LC_SEGMENT_64) {
sc = (struct segment_command_64 *)addr;
printf("Segment %s\n\t"
"vmaddr 0x%llx\n\t"
"vmsize 0x%llx\n\t"
"fileoff %llu\n\t"
"filesize %llu\n",
sc->segname,
sc->vmaddr,
sc->vmsize,
sc->fileoff,
sc->filesize);
}
// Advance to the next load command
addr += lc->cmdsize;
}
printf("\nDone.\n");
// addr was advanced in the loop; mh still points at the start of the mapping
munmap(mh, size);
close(fd);
return 0;
}
You need to compile this program for x86_64 only and run it against an x86_64 Mach-O binary. For instance, assuming you’ve saved this program as test.c:
$ clang test.c -arch x86_64 -o test
$ ./test ./test
There are 11 load commands
Segment __PAGEZERO
vmaddr 0x0
vmsize 0x100000000
fileoff 0
filesize 0
Segment __TEXT
vmaddr 0x100000000
vmsize 0x1000
fileoff 0
filesize 4096
Segment __DATA
vmaddr 0x100001000
vmsize 0x1000
fileoff 4096
filesize 4096
Segment __LINKEDIT
vmaddr 0x100002000
vmsize 0x1000
fileoff 8192
filesize 624
Done.
If you want more examples of how to read Mach-O files, cctools on Apple’s Open Source website is probably your best bet. You’ll also want to read the Mac OS X ABI Mach-O File Format Reference.

How to find the address & length of a C++ function at runtime (MinGW)

As this is my first post to stackoverflow I want to thank you all for your valuable posts that helped me a lot in the past.
I use MinGW (gcc 4.4.0) on Windows-7(64) - more specifically I use Nokia Qt + MinGW but Qt is not involved in my Question.
I need to find the address and, more importantly, the length of specific functions of my application at runtime, in order to encode/decode these functions and implement a software protection system.
I already found a way to compute the length of a function, by assuming that static functions placed one after another in a source file are also placed sequentially in the compiled object file and subsequently in memory.
Unfortunately this is true only if the whole CPP file is compiled with the option "g++ -O0" (optimization level = 0).
If I compile it with "g++ -O2" (which is the default for my project) the compiler seems to relocate some of the functions, and as a result the computed function length is incorrect and sometimes negative(!).
This is happening even if I put a "#pragma GCC optimize 0" line in the source file,
which is supposed to be the equivalent of a "g++ -O0" command line option.
I suppose that "g++ -O2" instructs the compiler to perform some global file-level optimization (some function relocation?) which is not avoided by using the #pragma directive.
Do you have any idea how to prevent this, without having to compile the whole file with -O0 option?
OR: Do you know of any other method to find the length of a function at runtime?
I prepared a small example, with the results for different compilation options, to highlight the case.
The Source:
// ===================================================================
// test.cpp
//
// Intention: To find the addr and length of a function at runtime
// Problem: The application output is correct when compiled with: "g++ -O0"
// but it's erroneous when compiled with "g++ -O2"
// (although a directive "#pragma GCC optimize 0" is present)
// ===================================================================
#include <stdio.h>
#include <math.h>
#pragma GCC optimize 0
static int test_01(int p1)
{
putchar('a');
putchar('\n');
return 1;
}
static int test_02(int p1)
{
putchar('b');
putchar('b');
putchar('\n');
return 2;
}
static int test_03(int p1)
{
putchar('c');
putchar('\n');
return 3;
}
static int test_04(int p1)
{
putchar('d');
putchar('\n');
return 4;
}
// Print a HexDump of a specific address and length
void HexDump(void *startAddr, long len)
{
unsigned char *buf = (unsigned char *)startAddr;
printf("addr:%ld, len:%ld\n", (long )startAddr, len);
len = (long )fabs(len);
while (len)
{
printf("%02x.", *buf);
buf++;
len--;
}
printf("\n");
}
int main(int argc, char *argv[])
{
printf("======================\n");
long fun_len = (long )test_02 - (long )test_01;
HexDump((void *)test_01, fun_len);
printf("======================\n");
fun_len = (long )test_03 - (long )test_02;
HexDump((void *)test_02, fun_len);
printf("======================\n");
fun_len = (long )test_04 - (long )test_03;
HexDump((void *)test_03, fun_len);
printf("Test End\n");
getchar();
// Just a trick to block optimizer from eliminating test_xx() functions as unused
if (argc > 1)
{
test_01(1);
test_02(2);
test_03(3);
test_04(4);
}
}
The (correct) Output when compiled with "g++ -O0":
[note the 'c3' byte (= assembly 'ret') at the end of all functions]
======================
addr:4199344, len:37
55.89.e5.83.ec.18.c7.04.24.61.00.00.00.e8.4e.62.00.00.c7.04.24.0a.00.00.00.e8.42
.62.00.00.b8.01.00.00.00.c9.c3.
======================
addr:4199381, len:49
55.89.e5.83.ec.18.c7.04.24.62.00.00.00.e8.29.62.00.00.c7.04.24.62.00.00.00.e8.1d
.62.00.00.c7.04.24.0a.00.00.00.e8.11.62.00.00.b8.02.00.00.00.c9.c3.
======================
addr:4199430, len:37
55.89.e5.83.ec.18.c7.04.24.63.00.00.00.e8.f8.61.00.00.c7.04.24.0a.00.00.00.e8.ec
.61.00.00.b8.03.00.00.00.c9.c3.
Test End
The erroneous Output when compiled with "g++ -O2":
(a) function test_01 addr & len seem correct
(b) functions test_02, test_03 have negative lengths,
and fun. test_02 length is also incorrect.
======================
addr:4199416, len:36
83.ec.1c.c7.04.24.61.00.00.00.e8.c5.61.00.00.c7.04.24.0a.00.00.00.e8.b9.61.00.00
.b8.01.00.00.00.83.c4.1c.c3.
======================
addr:4199452, len:-72
83.ec.1c.c7.04.24.62.00.00.00.e8.a1.61.00.00.c7.04.24.62.00.00.00.e8.95.61.00.00
.c7.04.24.0a.00.00.00.e8.89.61.00.00.b8.02.00.00.00.83.c4.1c.c3.57.56.53.83.ec.2
0.8b.5c.24.34.8b.7c.24.30.89.5c.24.08.89.7c.24.04.c7.04.
======================
addr:4199380, len:-36
83.ec.1c.c7.04.24.63.00.00.00.e8.e9.61.00.00.c7.04.24.0a.00.00.00.e8.dd.61.00.00
.b8.03.00.00.00.83.c4.1c.c3.
Test End
> This is happening even if I put a "#pragma GCC optimize 0" line in the source file, which is supposed to be the equivalent of a "g++ -O0" command line option.
I don't believe this is true: it is supposed to be the equivalent of attaching __attribute__((optimize(0))) to subsequently defined functions, which causes those functions to be compiled with a different optimisation level. But this does not affect what goes on at the top level, whereas the command line option does.
If you really must do horrible things that rely on top level ordering, try the -fno-toplevel-reorder option. And I suspect that it would be a good idea to add __attribute__((noinline)) to the functions in question as well.
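Applied to the example, the two suggestions might look like the sketch below. It assumes GCC (for __attribute__((noinline))) and that the file is built with -fno-toplevel-reorder; entry_distance is a hypothetical helper. The distance is kept signed on purpose, because without that flag the order of functions in memory is not guaranteed, and even with it the value includes any padding between the functions.

```c
#include <stdint.h>
#include <stdio.h>

/* noinline asks GCC not to inline the function into its callers. */
__attribute__((noinline)) static int test_01(int p1)
{
    (void)p1;
    putchar('a');
    putchar('\n');
    return 1;
}

__attribute__((noinline)) static int test_02(int p1)
{
    (void)p1;
    putchar('b');
    putchar('b');
    putchar('\n');
    return 2;
}

/* Signed distance in bytes from the entry of `from` to the entry of `to`.
   Converting function pointers to integers is implementation-defined. */
static intptr_t entry_distance(int (*from)(int), int (*to)(int))
{
    return (intptr_t)((uintptr_t)to - (uintptr_t)from);
}
```

A negative result from entry_distance(test_01, test_02) is then a direct sign that the functions were not emitted in source order, rather than something to hide with an absolute value.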
