Why adding delay when gcc -o2 optimization is used?


I was reading example code for an STM32 with an LCD and found the function below. Its purpose is to write the LCD controller register index to the LCD controller.
void LCD_WR_REG(uint16_t regval)
{
regval = regval; // Necessary delay when using -o2 optimization
LCD->LCD_REG = regval;
}
I searched for a while for -o2, but didn't find much useful information about what the comment means, or why a self-assignment is necessary here.

The comment is simply wrong. This operation will be optimized out. I believe the comment was written while the original author was struggling to make the code work, and something else was on this line at the time. This is what the compiler generates:
LCD_WR_REG:
ldr r3, .L3
strh r0, [r3] # movhi
bx lr
.L3:
.word 1207993344
It could have some effect if regval were declared volatile:
void LCD_WR_REG1(volatile uint16_t regval)
{
regval = regval; // Necessary delay when using -o2 optimization
LCD->LCD_REG = regval;
}
LCD_WR_REG1:
sub sp, sp, #8
strh r0, [sp, #6] # movhi
ldrh r3, [sp, #6]
strh r3, [sp, #6] # movhi
ldr r2, .L7
ldrh r3, [sp, #6]
strh r3, [r2] # movhi
add sp, sp, #8
bx lr
.L7:
.word 1207993344
https://godbolt.org/z/Th7naabf7
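
If a short delay before the register write really is needed, it has to be written in a form the optimizer cannot discard. The following is a minimal sketch, assuming a busy-wait on a volatile counter is acceptable. The helper short_delay, the iteration count, and the fake_lcd stand-in for the memory-mapped register block are hypothetical and not taken from the original project:

```c
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped LCD register block.
   On real hardware LCD would be a pointer to a fixed peripheral address. */
typedef struct {
    volatile uint16_t LCD_REG;
    volatile uint16_t LCD_RAM;
} LCD_TypeDef;

static LCD_TypeDef fake_lcd;
#define LCD (&fake_lcd)

/* Busy-wait: the counter is volatile, so every increment and compare is a
   real memory access that the compiler must keep, even at -O2. */
static void short_delay(uint32_t iterations)
{
    for (volatile uint32_t i = 0; i < iterations; i++) {
        /* intentionally empty */
    }
}

void LCD_WR_REG(uint16_t regval)
{
    short_delay(4);          /* assumed count; tune against the LCD datasheet */
    LCD->LCD_REG = regval;   /* store to a volatile member, always emitted */
}
```

Unlike the self-assignment, the loop survives optimization because the accesses to the volatile counter count as observable behaviour. How long it actually takes still depends on the clock and the compiler, so a hardware timer is the safer choice when a specific duration is required.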

Related

ARM GCC hardfault when using -O2

When using ARM GCC g++ compiler with optimization level -O2 (and up) this code:
void foo(void)
{
DBB("#0x%08X: 0x%08X", 1, *((uint32_t *)1));
DBB("#0x%08X: 0x%08X", 0, *((uint32_t *)0));
}
Compiles to:
0800abb0 <_Z3foov>:
800abb0: b508 push {r3, lr}
800abb2: 2301 movs r3, #1
800abb4: 4619 mov r1, r3
800abb6: 681a ldr r2, [r3, #0]
800abb8: 4802 ldr r0, [pc, #8] ; (800abc4 <_Z3foov+0x14>)
800abba: f007 fa83 bl 80120c4 <debug_print_blocking>
800abbe: 2300 movs r3, #0
800abc0: 681b ldr r3, [r3, #0]
800abc2: deff udf #255 ; 0xff
800abc4: 08022704 stmdaeq r2, {r2, r8, r9, sl, sp}
This gives me a hard fault at the undefined instruction at 0x0800abc2.
Also, if there is more code after that, it is not compiled into the final binary.
The question is: why does the compiler generate this, and why the undefined instruction?
By the way, it works fine for stuff like this:
...
uint32_t num = 2;
num -= 2;
DBB("#0x%08X: 0x%08X", 0, *((uint32_t *)num));
...
Compiler version:
arm-none-eabi-g++.exe (GNU Tools for ARM Embedded Processors 6-2017-q2-update) 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437]

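You can disable this (and verify this answer) by using -fno-delete-null-pointer-checks.
The pointer you are dereferencing has a value which matches the null pointer, and the compiler can see that from static analysis. Dereferencing a null pointer is undefined behaviour, so the compiler is free to replace whatever follows the dereference with a trap (the udf instruction). That is also why any code after it is missing from the binary.
In your second example, the static analysis doesn't identify a NULL.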
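
The flag above is the documented switch. As an alternative, the idea behind the second example (keeping the address out of reach of static analysis) can be sketched as follows. The helper names opaque_address and read_word are hypothetical, and this only helps on targets where address 0 is readable memory; on a hosted system reading address 0 will still crash:

```c
#include <stdint.h>

/* Pass an address through a volatile object so the optimizer cannot
   prove at compile time that it equals zero. */
static uintptr_t opaque_address(uintptr_t addr)
{
    volatile uintptr_t tmp = addr;
    return tmp;   /* value is re-read from memory, unknown to the optimizer */
}

/* Read one 32-bit word from an arbitrary address. */
uint32_t read_word(uintptr_t addr)
{
    return *(volatile uint32_t *)opaque_address(addr);
}
```

With this in place, a call such as read_word(0) compiles to an ordinary load because the compiler no longer sees a constant null pointer at the dereference.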

Why does _exit() jump to _etext?

I am running a project built with the ARM Embedded Toolchain on an STM32 microcontroller, which uses newlib.
I called assert(false) to test the assert output and ended up in a Hard Fault exception. I debugged into the assembly of assert(...) and found that a subsequent call to _exit(1) jumps to an address labelled _etext. The man page for _etext says that _etext is the address of the end of the .text section.
I am really confused. I had assumed that _exit() calls __exit() (which is defined as a global symbol by newlib), which I had implemented in a file named syscalls.c.
Why does _exit() jump to _etext?
Here are some code snippets for a better understanding:
The subsequent call to _exit() by assert(), taken from newlib 2.5:
_VOID
_DEFUN_VOID (abort)
{
#ifdef ABORT_MESSAGE
write (2, "Abort called\n", sizeof ("Abort called\n")-1);
#endif
while (1)
{
raise (SIGABRT);
_exit (1);
}
}
The disassembly of abort and assert. Take a special look at address 0808a10a, where the jump to 80a5198 (_etext) is performed:
abort:
0808a100: push {r3, lr}
0808a102: movs r0, #6
0808a104: bl 0x808bfdc <raise>
0808a108: movs r0, #1
0808a10a: bl 0x80a51d8
0808a10e: nop
__assert_func:
0808a110: push {lr}
0808a112: ldr r4, [pc, #40] ; (0x808a13c <__assert_func+44>)
0808a114: ldr r6, [r4, #0]
0808a116: mov r5, r0
0808a118: sub sp, #20
0808a11a: mov r4, r3
0808a11c: ldr r0, [r6, #12]
0808a11e: cbz r2, 0x808a136 <__assert_func+38>
0808a120: ldr r3, [pc, #28] ; (0x808a140 <__assert_func+48>)
0808a122: str r2, [sp, #8]
0808a124: stmia.w sp, {r1, r3}
0808a128: mov r2, r4
0808a12a: mov r3, r5
0808a12c: ldr r1, [pc, #20] ; (0x808a144 <__assert_func+52>)
0808a12e: bl 0x808a5f4 <fiprintf>
0808a132: bl 0x808a100 <abort>
0808a136: ldr r3, [pc, #16] ; (0x808a148 <__assert_func+56>)
0808a138: mov r2, r3
0808a13a: b.n 0x808a122 <__assert_func+18>
0808a13c: str r0, [r3, #120] ; 0x78
0808a13e: movs r0, #0
0808a140: add r12, r11
0808a142: lsrs r2, r1, #32
0808a144: add r12, sp
0808a146: lsrs r2, r1, #32
0808a148: add r8, sp
0808a14a: lsrs r2, r1, #32
The lss-file which shows that 80a5198 is the address of _etext:
0808a0c0 <abort>:
808a0c0: b508 push {r3, lr}
808a0c2: 2006 movs r0, #6
808a0c4: f001 ff6a bl 808bf9c <raise>
808a0c8: 2001 movs r0, #1
808a0ca: f01b f865 bl 80a5198 <_etext>
808a0ce: bf00 nop

Hardfault on STM32F030 startup, __libc_init_array

I'm trying to get a STM32Cube project compiled using arm-none-eabi-gcc and a Makefile.
I have specified:
CFLAGS = -mthumb\
-march=armv6-m\
-mlittle-endian\
-mcpu=cortex-m0\
-ffunction-sections\
-fdata-sections\
-MMD\
-std=c99\
-Wall\
-g\
-D$(PART)\
-c
and:
LDFLAGS = -Wl,--gc-sections\
-Wl,-T$(LDFILE)\
-Wl,-v
The firmware builds without problems, but when I boot the MCU I get stuck in Hard Fault.
Stack trace is:
#0 HardFault_Handler () at ./Src/main.c:156
#1 <signal handler called>
#2 0x0800221c in ____libc_init_array_from_thumb ()
#3 0x080021be in LoopFillZerobss () at Src/startup_stm32f030x8.s:103
#4 0x080021be in LoopFillZerobss () at Src/startup_stm32f030x8.s:103
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
and I go straight to Hard Fault when stepping to bl __libc_init_array in the startup file.
/* Zero fill the bss segment. */
FillZerobss:
movs r3, #0
str r3, [r2]
adds r2, r2, #4
LoopFillZerobss:
ldr r3, = _ebss
cmp r2, r3
bcc FillZerobss
/* Call the clock system intitialization function.*/
bl SystemInit
/* Call static constructors */
bl __libc_init_array
/* Call the application's entry point.*/
bl main
Any ideas what could be wrong?
My arm-none-eabi-gcc version is 4.8.4 20140725 (release)
[edit]
The disassembly of the calls
08002218 <____libc_init_array_from_thumb>:
8002218: 4778 bx pc
800221a: 46c0 nop ; (mov r8, r8)
800221c: eafff812 b 800026c <__libc_init_array>
0800026c <__libc_init_array>:
800026c: e92d4070 push {r4, r5, r6, lr}
8000270: e59f506c ldr r5, [pc, #108] ; 80002e4 <__libc_init_array+0x78>
8000274: e59f606c ldr r6, [pc, #108] ; 80002e8 <__libc_init_array+0x7c>
8000278: e0656006 rsb r6, r5, r6
800027c: e1b06146 asrs r6, r6, #2
8000280: 12455004 subne r5, r5, #4
8000284: 13a04000 movne r4, #0
8000288: 0a000005 beq 80002a4 <__libc_init_array+0x38>
800028c: e2844001 add r4, r4, #1
8000290: e5b53004 ldr r3, [r5, #4]!
8000294: e1a0e00f mov lr, pc
8000298: e12fff13 bx r3
800029c: e1560004 cmp r6, r4
80002a0: 1afffff9 bne 800028c <__libc_init_array+0x20>
80002a4: e59f5040 ldr r5, [pc, #64] ; 80002ec <__libc_init_array+0x80>
80002a8: e59f6040 ldr r6, [pc, #64] ; 80002f0 <__libc_init_array+0x84>
80002ac: e0656006 rsb r6, r5, r6
80002b0: eb0007ca bl 80021e0 <_init>
80002b4: e1b06146 asrs r6, r6, #2
80002b8: 12455004 subne r5, r5, #4
80002bc: 13a04000 movne r4, #0
80002c0: 0a000005 beq 80002dc <__libc_init_array+0x70>
80002c4: e2844001 add r4, r4, #1
80002c8: e5b53004 ldr r3, [r5, #4]!
80002cc: e1a0e00f mov lr, pc
80002d0: e12fff13 bx r3
80002d4: e1560004 cmp r6, r4
80002d8: 1afffff9 bne 80002c4 <__libc_init_array+0x58>
80002dc: e8bd4070 pop {r4, r5, r6, lr}
80002e0: e12fff1e bx lr
80002e4: 08002258 .word 0x08002258
80002e8: 08002258 .word 0x08002258
80002ec: 08002258 .word 0x08002258
80002f0: 08002260 .word 0x08002260
[edit 2]
The register values from gdb:
(gdb) info reg
r0 0x20000000 536870912
r1 0x1 1
r2 0x0 0
r3 0x40021000 1073876992
r4 0xffffffff -1
r5 0xffffffff -1
r6 0xffffffff -1
r7 0x20001fd0 536879056
r8 0xffffffff -1
r9 0xffffffff -1
r10 0xffffffff -1
r11 0xffffffff -1
r12 0xffffffff -1
sp 0x20001fd0 0x20001fd0
lr 0xfffffff9 -7
pc 0x800067c 0x800067c <HardFault_Handler+4>
xPSR 0x61000003 1627389955

That __libc_init_array is ARM code, not Thumb, hence the M0 will fall over trying to execute some nonsense it doesn't understand (actually, it never quite gets there since it faults on the attempt to switch to ARM state in the bx, but hey, same difference...)
You'll need to make sure you use pure-Thumb versions of any libraries - a Cortex-M-specific toolchain might be a better bet than a generic ARM one. If you have a multilib toolchain, I'd suggest checking the output of arm-none-eabi-gcc --print-multi-lib to make sure you've specified all the relevant options to get proper Cortex-M libraries, and if you're using a separate link step, make sure you invoke it with LD=arm-none-eabi-gcc (plus the relevant multilib options), rather than LD=arm-none-eabi-ld.
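
The values in the question can be checked by hand. The sketch below assumes the ARMv6-M register layout (T bit at bit 24 of xPSR, exception number in the low six bits); the function names are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* T bit (EPSR, bit 24): set while the core is in Thumb state. */
bool xpsr_thumb_bit(uint32_t xpsr)
{
    return ((xpsr >> 24) & 1u) != 0;
}

/* IPSR: number of the active exception. ARMv6-M uses bits [5:0];
   exception number 3 is HardFault. */
uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x3Fu;
}

/* A bx target with bit 0 clear requests ARM state, which a Cortex-M
   core does not have. */
bool is_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0;
}
```

For the xPSR value 0x61000003 from the gdb dump this gives exception number 3 with the T bit set, which matches being inside HardFault_Handler. The veneer's bx pc at 0x08002218 reads pc as 0x0800221c, an even address, so is_thumb_target returns false for it: that is the attempted switch to ARM state on which the M0 faults.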

Indirect function call uses odd address

When GCC 4.7.3 (20121207) for ARM Cortex-M3 takes the address of a function, it doesn't get the exact address of the function. I can see an off-by-one in that pointer.
// assume at address 0x00001204;
int foo() {
return 42;
}
void bar() {
int(*p)() = &foo; // p = 0x1205;
p(); // executed successfully
foo(); // assembly: "bl 0x00001204;"
}
Although the pointer points to an odd address, the execution is successful. I would expect an exception at this point. Why does it take that strange address, and why doesn't it hurt?
Edit
The SO article describes a difference between Thumb and ARM mode. Why is that offset not visible when the function is called directly, although the CPU is in the same mode?
Should the odd address be kept, or would resetting bit 0 cause harm? (I have not seen any so far.)

I cobbled up something from one of my examples to quickly demonstrate what is going on.
vectors.s:
/* vectors.s */
.cpu cortex-m3
.thumb
.word 0x20002000 /* stack top address */
.word _start /* 1 Reset */
.word hang /* 2 NMI */
.word hello /* 3 HardFault */
.word hang /* 4 MemManage */
.word hang /* 5 BusFault */
.word hang /* 6 UsageFault */
.word hang /* 7 RESERVED */
.word hang /* 8 RESERVED */
.word hang /* 9 RESERVED*/
.word hang /* 10 RESERVED */
.word hang /* 11 SVCall */
.word hang /* 12 Debug Monitor */
.word hang /* 13 RESERVED */
.word hang /* 14 PendSV */
.word hang /* 15 SysTick */
.word hang /* 16 External Interrupt(0) */
.word hang /* 17 External Interrupt(1) */
.word hang /* 18 External Interrupt(2) */
.word hang /* 19 ... */
.thumb_func
.global _start
_start:
/*ldr r0,stacktop */
/*mov sp,r0*/
bl notmain
ldr r0,=notmain
mov lr,pc
bx r0
b hang
.thumb_func
hang: b .
hello: b .
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.end
blinker01.c:
extern void PUT32 ( unsigned int, unsigned int );
int notmain ( void )
{
PUT32(0x12345678,0xAABBCCDD);
return(0);
}
Makefile:
#ARMGNU = arm-none-eabi
ARMGNU = arm-none-linux-gnueabi
AOPS = --warn --fatal-warnings
COPS = -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding
all : blinker01.gcc.thumb.bin
vectors.o : vectors.s
$(ARMGNU)-as vectors.s -o vectors.o
blinker01.gcc.thumb.o : blinker01.c
$(ARMGNU)-gcc $(COPS) -mthumb -c blinker01.c -o blinker01.gcc.thumb.o
blinker01.gcc.thumb2.o : blinker01.c
$(ARMGNU)-gcc $(COPS) -mthumb -mcpu=cortex-m3 -march=armv7-m -c blinker01.c -o blinker01.gcc.thumb2.o
blinker01.gcc.thumb.bin : memmap vectors.o blinker01.gcc.thumb.o
$(ARMGNU)-ld -o blinker01.gcc.thumb.elf -T memmap vectors.o blinker01.gcc.thumb.o
$(ARMGNU)-objdump -D blinker01.gcc.thumb.elf > blinker01.gcc.thumb.list
$(ARMGNU)-objcopy blinker01.gcc.thumb.elf blinker01.gcc.thumb.bin -O binary
Disassembly:
Disassembly of section .text:
08000000 <_start-0x50>:
8000000: 20002000 andcs r2, r0, r0
8000004: 08000051 stmdaeq r0, {r0, r4, r6}
8000008: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800000c: 0800005e stmdaeq r0, {r1, r2, r3, r4, r6}
8000010: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000014: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000018: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800001c: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000020: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000024: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000028: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800002c: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000030: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000034: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000038: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800003c: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000040: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000044: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000048: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800004c: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
08000050 <_start>:
8000050: f000 f80a bl 8000068 <notmain>
8000054: 4803 ldr r0, [pc, #12] ; (8000064 <PUT32+0x4>)
8000056: 46fe mov lr, pc
8000058: 4700 bx r0
800005a: e7ff b.n 800005c <hang>
0800005c <hang>:
800005c: e7fe b.n 800005c <hang>
0800005e <hello>:
800005e: e7fe b.n 800005e <hello>
08000060 <PUT32>:
8000060: 6001 str r1, [r0, #0]
8000062: 4770 bx lr
8000064: 08000069 stmdaeq r0, {r0, r3, r5, r6}
08000068 <notmain>:
8000068: b508 push {r3, lr}
800006a: 4803 ldr r0, [pc, #12] ; (8000078 <notmain+0x10>)
800006c: 4903 ldr r1, [pc, #12] ; (800007c <notmain+0x14>)
800006e: f7ff fff7 bl 8000060 <PUT32>
8000072: 2000 movs r0, #0
8000074: bd08 pop {r3, pc}
8000076: 46c0 nop ; (mov r8, r8)
8000078: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
800007c: aabbccdd bge 6ef33f8 <_start-0x110cc58>
First off, note hang vs hello. This is a gnuism: in assembly you need to declare a label to be a thumb function in order for it to actually work for this kind of thing. hang is properly declared and the vector table properly uses the odd address; hello is not properly declared and the even address is put in there. C compiled code automatically does this properly.
Here is a prime example of what you are asking, though: bl to the C function notmain does not, cannot, use an odd address. But to use bx you ask for the address of the function notmain, and that address is provided to the code as 0x8000069 for a function at address 0x8000068. If you did a bx to 0x8000068 on an ARMvsomethingT it would switch to ARM mode and eventually crash when it hit Thumb code (hopefully crash and not stumble along); on a Cortex-M a bx to an even address should fault immediately.
08000050 <_start>:
8000050: f000 f80a bl 8000068 <notmain>
8000054: 4803 ldr r0, [pc, #12] ; (8000064 <PUT32+0x4>)
8000056: 46fe mov lr, pc
8000058: 4700 bx r0
800005a: e7ff b.n 800005c <hang>
8000064: 08000069 stmdaeq r0, {r0, r3, r5, r6}
Why can't bl be odd? Look at the encoding above: bl from 0x8000050 to 0x8000068. The pc is two instructions ahead, so 4 bytes, so take 0x8000068 - 0x8000054 = 0x14, divide that by 2 and you get 0x00A. That is the offset from the pc and that is what is encoded in the instruction (the 0A in the second half of the instruction). The divide by two is based on the knowledge that thumb instructions are always 2 bytes (well, at the time), so a branch can reach twice as far if the offset is counted in 2-byte units rather than in bytes. So the lsbit of the delta between the two is lost; it is handled by the hardware.
What your code did was this: in one place you asked for the address of a thumb function, which gives the odd address; the other case was looking at the disassembly of a branch link, whose target is always even.
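
The arithmetic above can be written out as a small sketch. The function names are hypothetical, and the encoding helper only covers the low 11 bits of the second halfword, which is enough for the short branches in this listing:

```c
#include <stdint.h>

/* Value a function pointer holds for Thumb code: code address with bit 0 set. */
uint32_t thumb_function_pointer(uint32_t code_addr)
{
    return code_addr | 1u;
}

/* Offset stored in a bl: distance from pc (instruction address + 4)
   to the target, counted in 2-byte units. */
int32_t bl_halfword_offset(uint32_t bl_addr, uint32_t target)
{
    int64_t delta = (int64_t)target - ((int64_t)bl_addr + 4);
    return (int32_t)(delta / 2);
}

/* Low 11 bits of the offset, as they appear in the second halfword of the bl. */
uint32_t bl_low11(int32_t halfword_offset)
{
    return (uint32_t)halfword_offset & 0x7FFu;
}
```

For the bl at 0x8000050 to notmain at 0x8000068 the offset is 0x00A, matching f000 f80a. For the backward bl at 0x800006e to PUT32 at 0x8000060 the offset is -9, whose low 11 bits are 0x7F7, matching f7ff fff7. The pointer loaded for the bx is thumb_function_pointer(0x8000068), which is 0x8000069.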

Inline NOPs not optimized out in LLVM

I'm working through an example in this overview of compiling inline ARM assembly using GCC. Rather than GCC, I'm using llvm-gcc 4.2.1, and I'm compiling the following C code:
#include <stdio.h>
int main(void) {
printf("Volatile NOP\n");
asm volatile("mov r0, r0");
printf("Non-volatile NOP\n");
asm("mov r0, r0");
return 0;
}
Using the following commands:
llvm-gcc -emit-llvm -c -o compiled.bc input.c
llc -O3 -march=arm -o output.s compiled.bc
My output.s ARM ASM file looks like this:
.syntax unified
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.file "compiled.bc"
.text
.globl main
.align 2
.type main,%function
main: # #main
# BB#0: # %entry
str lr, [sp, #-4]!
sub sp, sp, #16
str r0, [sp, #12]
ldr r0, .LCPI0_0
str r1, [sp, #8]
bl puts
#APP
mov r0, r0
#NO_APP
ldr r0, .LCPI0_1
bl puts
#APP
mov r0, r0
#NO_APP
mov r0, #0
str r0, [sp, #4]
str r0, [sp]
ldr r0, [sp, #4]
add sp, sp, #16
ldr lr, [sp], #4
bx lr
# BB#1:
.align 2
.LCPI0_0:
.long .L.str
.align 2
.LCPI0_1:
.long .L.str1
.Ltmp0:
.size main, .Ltmp0-main
.type .L.str,%object # #.str
.section .rodata.str1.1,"aMS",%progbits,1
.L.str:
.asciz "Volatile NOP"
.size .L.str, 13
.type .L.str1,%object # #.str1
.section .rodata.str1.16,"aMS",%progbits,1
.align 4
.L.str1:
.asciz "Non-volatile NOP"
.size .L.str1, 17
The two NOPs are between their respective #APP/#NO_APP pairs. My expectation is that the asm() statement without the volatile keyword will be optimized out of existence due to the -O3 flag, but clearly both inline assembly statements survive.
Why does the asm("mov r0, r0") line not get recognized and removed as a NOP?

As Mystical and Mārtiņš Možeiko have described, the compiler does not optimize the code, i.e. change the instructions. What the compiler does optimize is when the instruction is scheduled. When you use volatile, the compiler will not re-schedule. In your example, re-scheduling would mean moving the instruction before or after the printf.
The other optimization the compiler might make is to get C values into registers for you. Register allocation is very important to optimization. This doesn't optimize the assembler, but allows the compiler to do sensible things with other code within the function.
To see the effect of volatile, here is some sample code:
int example(int test, int add)
{
int v1=5, v2=0;
int i=0;
if(test) {
asm volatile("add %0, %1, #7" : "=r" (v2) : "r" (v2));
i+= add * v1;
i+= v2;
} else {
asm ("add %0, %1, #7" : "=r" (v2) : "r" (v2));
i+= add * v1;
i+= v2;
}
return i;
}
The two branches have identical code except for the volatile. gcc 4.7.2 generates the following code for an ARM926,
example:
cmp r0, #0
bne 1f /* branch if test set? */
add r1, r1, r1, lsl #2
add r0, r0, #7 /* add seven delayed */
add r0, r0, r1
bx lr
1: mov r0, #0 /* test set */
add r0, r0, #7 /* add seven immediate */
add r1, r1, r1, lsl #2
add r0, r0, r1
bx lr
Note: The assembler branches are reversed relative to the 'C' code. The 2nd branch is slower on some processors due to pipelining. The compiler prefers that
add r1, r1, r1, lsl #2
add r0, r0, r1
do not execute sequentially.
The Ethernut ARM Tutorial is an excellent resource. However, optimize is a bit of an overloaded word. The compiler doesn't analyze the assembler, only the arguments and where the code will be emitted.

volatile is implied if the asm statement has no outputs declared.
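
The three cases can be put side by side in a minimal sketch. It assumes GCC or Clang (GNU extended asm), uses empty templates so that it builds on any target, and the function names are hypothetical:

```c
/* No output operands: implicitly volatile, so the compiler keeps it.
   The "memory" clobber additionally stops memory accesses from being
   moved across it. */
static inline void compiler_barrier(void)
{
    __asm__("" ::: "memory");
}

/* Has an output operand and no volatile: the compiler may move the
   statement, or delete it entirely when the output is never used. */
static inline int launder(int x)
{
    __asm__("" : "+r"(x));
    return x;
}

/* Same statement marked volatile: it is emitted even when the result
   is ignored. */
static inline int launder_volatile(int x)
{
    __asm__ volatile("" : "+r"(x));
    return x;
}

int demo(int a)
{
    int kept = launder_volatile(a);  /* stays whether or not kept is used */
    (void)launder(a);                /* result unused: eligible for removal */
    compiler_barrier();              /* no outputs, so treated as volatile */
    return kept + 1;
}
```

This is why both statements in the question survive: neither asm("mov r0, r0") nor its volatile twin declares an output, so the compiler treats them the same way.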
