I'm using a Cortex-M0 MCU from NXP (LPC845) and I'm trying to figure out what GCC is trying to do :)
Basically, the C code (pseudo) is as follows:
volatile uint8_t readb1 = 0x1a; // dummy
readb1 = GpioPadB(GPIO_PIN);
and the macro I wrote is
(*((volatile uint8_t*)(SOME_GPIO_ADDRESS)))
Now the code is working, but it produced an extra UXTB instruction that I don't understand:
00000378: ldrb r3, [r3, #0]
0000037a: ldr r2, [pc, #200] ; (0x444 <AppInit+272>)
0000037c: uxtb r3, r3
0000037e: strb r3, [r2, #0]
105 asm("nop");
My explanation is as follows:
Load a BYTE from the address in R3 and put the result in R3 <-- this is the load from the GPIO register as a BYTE
Load the address of the readb1 variable into R2
UXTB zero-extends the uint8 value??? But the rotate argument is 0, so it basically does nothing for a uint8!
Store the data from R3 as a BYTE to the address in R2 (my variable)
Why does it do that?
First of all, it should know that data in R3 has just a BYTE meaning (it already generates LDRB correctly). Second, STRB already stores only bits 7..0, so why use UXTB?
Thanks for clarifications,
EDITED:
Compiler version:
gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major)
I use -O3
Looks like an extra instruction left in by the compiler and/or there is some nuance to the cortex-m or newer cores (would love to know what that nuance is).
#define GpioPadB(x) (*((volatile unsigned char *)(x)))
volatile unsigned char readb1;
void fun ( void )
{
readb1 = 0x1A;
readb1 = GpioPadB(0x1234000);
}
Using a gcc installed via apt:
arm-none-eabi-gcc --version
arm-none-eabi-gcc (15:4.9.3+svn231177-1) 4.9.3 20150529 (prerelease)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
arm-none-eabi-gcc -O2 -c -mthumb so.c -o so.o
arm-none-eabi-objdump -d so.o
00000000 <fun>:
0: 231a movs r3, #26
2: 4a03 ldr r2, [pc, #12] ; (10 <fun+0x10>)
4: 7013 strb r3, [r2, #0]
6: 4b03 ldr r3, [pc, #12] ; (14 <fun+0x14>)
8: 781b ldrb r3, [r3, #0]
a: 7013 strb r3, [r2, #0]
c: 4770 bx lr
e: 46c0 nop ; (mov r8, r8)
10: 00000000 .word 0x00000000
14: 01234000 .word 0x01234000
as one would expect.
arm-none-eabi-gcc -O2 -c -mthumb -march=armv7-m so.c -o so.o
arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: 4a03 ldr r2, [pc, #12] ; (10 <fun+0x10>)
2: 211a movs r1, #26
4: 4b03 ldr r3, [pc, #12] ; (14 <fun+0x14>)
6: 7011 strb r1, [r2, #0]
8: 781b ldrb r3, [r3, #0]
a: b2db uxtb r3, r3
c: 7013 strb r3, [r2, #0]
e: 4770 bx lr
10: 00000000 .word 0x00000000
14: 01234000 .word 0x01234000
with the extra uxtb instruction in there.
Something a bit newer
arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
for armv6m and armv7m
00000000 <fun>:
0: 231a movs r3, #26
2: 4a03 ldr r2, [pc, #12] ; (10 <fun+0x10>)
4: 7013 strb r3, [r2, #0]
6: 4b03 ldr r3, [pc, #12] ; (14 <fun+0x14>)
8: 781b ldrb r3, [r3, #0]
a: 7013 strb r3, [r2, #0]
c: 4770 bx lr
e: 46c0 nop ; (mov r8, r8)
10: 00000000 .word 0x00000000
14: 01234000 .word 0x01234000
for armv4t
00000000 <fun>:
0: 231a movs r3, #26
2: 4a03 ldr r2, [pc, #12] ; (10 <fun+0x10>)
4: 7013 strb r3, [r2, #0]
6: 4b03 ldr r3, [pc, #12] ; (14 <fun+0x14>)
8: 781b ldrb r3, [r3, #0]
a: 7013 strb r3, [r2, #0]
c: 4770 bx lr
e: 46c0 nop ; (mov r8, r8)
10: 00000000 .word 0x00000000
14: 01234000 .word 0x01234000
and the uxtb is gone.
I think it is just a missed optimization, peephole or otherwise.
As already answered, though, when you use variables smaller than a general-purpose register you can expect, and should tolerate, the compiler widening them to the register size. Whether it does so on the way in or on the way out (when the variable is read, or just before it is written or used later) varies by compiler and target.
For x86, where you can access portions of a register separately (or use memory operands), you will see that gcc does not do this even in cases that clearly need a sign extension or padding; it sorts that out later, when the value is used.
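The widening described above mirrors C's own integer promotions: operands narrower than int are converted to int before arithmetic, and the result is narrowed only when it is stored back. Here is a minimal sketch of that behaviour. The function names are made up for illustration, and it assumes an 8-bit unsigned char and a wider int, as on ARM and x86.

```c
#include <assert.h>
#include <stdint.h>

/* Both operands are promoted to int before the addition, so the sum is
   computed at full width and does not wrap at 8 bits. */
int add_promoted(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Narrowing happens only here, when the result is converted back to
   uint8_t: the value is reduced modulo 256. */
uint8_t add_narrowed(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Hypothetical self-check, not part of the original answer. */
void check_promotion(void)
{
    assert(add_promoted(200, 100) == 300);
    assert(add_narrowed(200, 100) == 44);   /* 300 mod 256 */
}
```

Where the compiler places the corresponding extension instruction (at the load, or just before the narrow value is used) is an implementation choice, which is why the generated code differs between versions and targets.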
You can search the gcc sources for uxtb and perhaps find the issue or a comment about it.
EDIT
Note that clang takes a different path: it burns clocks generating the address but does not do the extension.
00000000 <fun>:
0: f240 0000 movw r0, #0
4: f2c0 0000 movt r0, #0
8: 211a movs r1, #26
a: 7001 strb r1, [r0, #0]
c: f244 0100 movw r1, #16384 ; 0x4000
10: f2c0 1123 movt r1, #291 ; 0x123
14: 7809 ldrb r1, [r1, #0]
16: 7001 strb r1, [r0, #0]
18: 4770 bx lr
clang --version
clang version 11.1.0 (https://github.com/llvm/llvm-project.git 1fdec59bffc11ae37eb51a1b9869f0696bfd5312)
Target: armv7m-none-unknown-eabi
Thread model: posix
InstalledDir: /opt/llvm11armv7m/bin
I think it is simply an optimization problem with gcc/gnu.
The "volatile" qualifier is to blame. On a write, GCC does not emit a type extension, because it would make no sense. On a read, however, it always emits one, because the data now lives in a register and must be ready for any operation throughout its scope.
Dropping "volatile" removes the extra operations on the data, but it can also let the compiler remove the access to the variable altogether.
https://godbolt.org/z/cGvc8r6se
First of all, it should know that data in R3 has just a BYTE meaning
Registers are only 32 bits. They do not have any other "meaning". The register must contain the same value as the loaded byte - thus UXTB. Any other operation later (for example, adding something) requires the whole register to contain the correct value.
Generally speaking, using types shorter than 32 bits adds some overhead, as Cortex-M processors do not operate on "portions" of registers.
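To make the register argument concrete, here is a small host-side model of the three instructions from the question. It is only a sketch: the helper names are hypothetical, and plain C integers stand in for the registers. LDRB already zero-extends the byte into the full 32-bit register, so in this model a UXTB that directly follows it leaves the register unchanged, and STRB stores only bits 7..0 either way.

```c
#include <assert.h>
#include <stdint.h>

/* LDRB: load one byte and zero-extend it to 32 bits. */
uint32_t model_ldrb(const volatile uint8_t *addr)
{
    return (uint32_t)*addr;
}

/* UXTB with rotation 0: keep bits 7..0, clear bits 31..8. */
uint32_t model_uxtb(uint32_t reg)
{
    return reg & 0xFFu;
}

/* STRB: store bits 7..0 of the register, ignore the rest. */
void model_strb(uint32_t reg, volatile uint8_t *addr)
{
    *addr = (uint8_t)reg;
}

/* Hypothetical self-check replaying the sequence from the question. */
void check_uxtb_model(void)
{
    volatile uint8_t gpio = 0xA5;   /* stand-in for the GPIO byte register */
    volatile uint8_t readb1 = 0x1A;

    uint32_t r3 = model_ldrb(&gpio);
    assert(model_uxtb(r3) == r3);   /* no change right after LDRB */
    model_strb(r3, &readb1);
    assert(readb1 == 0xA5);

    /* With garbage in the upper bits, UXTB does change the register... */
    assert(model_uxtb(0x12345678u) == 0x78u);
    /* ...but STRB would have stored the same byte anyway. */
    model_strb(0x12345678u, &readb1);
    assert(readb1 == 0x78);
}
```

Under these assumptions the extension only matters when a narrow value arrives in a register by some path that does not already clear the upper bits.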
To get this fixed, you need to file a bug at https://gcc.gnu.org/bugzilla/. But there are two difficulties.
There are a lot of bugs related to "volatile"; none of them are closed, and most are not even confirmed. As far as I understand, the developers are tired of tilting at windmills and no longer even react to them.
To actually get the problem fixed, you need to find the person responsible, the author of the code at the root of it. You will not be allowed into someone else's branch, and only the most experienced contributors are allowed to commit to master.
But even before that, you need to find the cause of this behavior, and here again there are problems.
The GCC code base is huge; you could search it endlessly.
My personal opinion: GCC treats the ARM core registers as a kind of fast memory. This memory can be accessed via a physical address, which only adds to the problems. And if this is memory and the sizes do not match, then, by GCC's logic, extension instructions have to be added.
Why does GCC use the correct instructions for a plain access? Because then it reads from memory to memory. The emphasis is on "from memory": no matter what happens next, the read has to happen right now.
I am trying to get the address of a label in Thumb assembly and I am having some trouble.
I have already read this post, but it does not help me, and I will explain why.
I am writing a simple program in Thumb assembly (unfortunately I cannot use Thumb2).
Let's consider this code:
.arch armv5te
.syntax unified
.text
.thumb
.thumb_func
thumbnow:
0x0 PUSH {LR}
0x2 LDR R0, =loadValues
0x4 POP {PC}
.align
loadValues:
0x8 .word 0xdeadbee1
0xC .word 0xdeadbee2
0x10 .word 0xdeadbee3
I am using the arm-linux-gnueabi toolchain to assemble that.
My microcontroller doesn't have an MMU, so the memory addresses are static: no virtual pages etc.
What I am trying to do is make R0 hold the value 0x8 here, so that I can then access the three words like this:
LDR R1, [R0]
LDR R2, [R0,#4]
LDR R3, [R0,#8]
This is not possible with LDR, though, because the value does not fit in a MOV instruction. The assembler documentation states that if the value cannot fit in a MOV instruction, it is placed in a literal pool.
So my question is: is it possible in Thumb assembly to get the actual address of the label if the value cannot fit in a MOV instruction?
Starting with this
.thumb
ldr r0,=hello
adr r0,hello
nop
nop
nop
nop
hello:
.word 0,1,2,3
gives this unlinked
00000000 <hello-0xc>:
0: 4806 ldr r0, [pc, #24] ; (1c <hello+0x10>)
2: a002 add r0, pc, #8 ; (adr r0, c <hello>)
4: 46c0 nop ; (mov r8, r8)
6: 46c0 nop ; (mov r8, r8)
8: 46c0 nop ; (mov r8, r8)
a: 46c0 nop ; (mov r8, r8)
0000000c <hello>:
c: 00000000 andeq r0, r0, r0
10: 00000001 andeq r0, r0, r1
14: 00000002 andeq r0, r0, r2
18: 00000003 andeq r0, r0, r3
1c: 0000000c andeq r0, r0, r12
linked
00001000 <hello-0xc>:
1000: 4806 ldr r0, [pc, #24] ; (101c <hello+0x10>)
1002: a002 add r0, pc, #8 ; (adr r0, 100c <hello>)
1004: 46c0 nop ; (mov r8, r8)
1006: 46c0 nop ; (mov r8, r8)
1008: 46c0 nop ; (mov r8, r8)
100a: 46c0 nop ; (mov r8, r8)
0000100c <hello>:
100c: 00000000 andeq r0, r0, r0
1010: 00000001 andeq r0, r0, r1
1014: 00000002 andeq r0, r0, r2
1018: 00000003 andeq r0, r0, r3
101c: 0000100c andeq r1, r0, r12
Both ways, r0 ends up holding the address of the start of the data, and you can then offset into that data from the caller or wherever you need.
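The offsets in the two listings can be checked by hand. In Thumb state, reading PC yields the address of the current instruction plus 4, and both adr and the PC-relative ldr round that value down to a multiple of 4 before adding the immediate. Below is a small sketch of that arithmetic; the function names are made up for illustration, and the addresses are the ones from the disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a Thumb "add rX, pc, #imm" (adr) or "ldr rX, [pc, #imm]":
   PC reads as the instruction address + 4, and that value is rounded
   down to a multiple of 4 before the immediate is added. */
uint32_t thumb_pc_relative(uint32_t insn_addr, uint32_t imm)
{
    uint32_t pc = insn_addr + 4u;
    uint32_t base = pc & ~3u;
    return base + imm;
}

/* Hypothetical self-check against the listings. */
void check_listing(void)
{
    /* Unlinked: "add r0, pc, #8" at 0x2 points at hello (0xc). */
    assert(thumb_pc_relative(0x2u, 8u) == 0xCu);
    /* Unlinked: "ldr r0, [pc, #24]" at 0x0 reads the pool word at 0x1c. */
    assert(thumb_pc_relative(0x0u, 24u) == 0x1Cu);
    /* Linked at 0x1000: same offsets, new absolute targets. */
    assert(thumb_pc_relative(0x1002u, 8u) == 0x100Cu);
    assert(thumb_pc_relative(0x1000u, 24u) == 0x101Cu);
}
```

The adr encoding is unchanged by linking because it is purely relative, while the ldr version depends on the linker patching the pool word (0x0000000c becomes 0x0000100c).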
Edit
.thumb
adr r0,hello
nop
nop
nop
arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:2: Error: address calculation needs a strongly defined nearby symbol
So the tool won't turn that into a load from the pool for you.
For what you want to do, I think the pc-relative add (adr) is the best you are going to get. You can try other toolchains, as all of this is language and toolchain specific: assembly language is defined by the assembler, not the target, and each toolchain's assembler can differ in the language it accepts. Within gnu, how the linker and assembler work together has changed over time; the linker now patches up things it didn't use to.
You could of course go into the linker and add code to perform this optimization. The problem is most likely that by link time the linker only has to resolve an address in the pool, which is easy because it does not have to change the instruction. The assembler would have to leave information for the linker saying that this is more than a fill-this-memory-location-with-an-address fixup. Alternatively, you could modify gas to allow adr to work, and then have the linker bail out with an error if it cannot resolve the address within the instruction.
Or you could just hard-code what you want and maintain it. I am not sure why the adr solution isn't adequate.
mov r0,#8 is a valid thumb instruction.
I am running a project built with the ARM Embedded Toolchain on an stm32 microcontroller, which uses newLib.
I called assert(false) to test the assert output and ended up in a Hard Fault exception. I debugged into the assembly of assert(...) and found that a subsequent call to _exit(1) jumps to an address labeled _etext. The man page for _etext says that _etext is the address of the end of the .text section.
I am really confused. I had assumed that _exit() calls __exit() (which newLib defines as a global symbol), which I had implemented in a file named syscalls.c.
Why does _exit() jump to _etext?
Here are some code snippets for a better understanding:
The subsequent call to _exit() by assert() taken from newLib 2.5:
_VOID
_DEFUN_VOID (abort)
{
#ifdef ABORT_MESSAGE
write (2, "Abort called\n", sizeof ("Abort called\n")-1);
#endif
while (1)
{
raise (SIGABRT);
_exit (1);
}
}
The disassembly of abort and assert. Note address 0808a10a, where the jump to 80a5198 (_etext) is performed:
abort:
0808a100: push {r3, lr}
0808a102: movs r0, #6
0808a104: bl 0x808bfdc <raise>
0808a108: movs r0, #1
0808a10a: bl 0x80a51d8
0808a10e: nop
__assert_func:
0808a110: push {lr}
0808a112: ldr r4, [pc, #40] ; (0x808a13c <__assert_func+44>)
0808a114: ldr r6, [r4, #0]
0808a116: mov r5, r0
0808a118: sub sp, #20
0808a11a: mov r4, r3
0808a11c: ldr r0, [r6, #12]
0808a11e: cbz r2, 0x808a136 <__assert_func+38>
0808a120: ldr r3, [pc, #28] ; (0x808a140 <__assert_func+48>)
0808a122: str r2, [sp, #8]
0808a124: stmia.w sp, {r1, r3}
0808a128: mov r2, r4
0808a12a: mov r3, r5
0808a12c: ldr r1, [pc, #20] ; (0x808a144 <__assert_func+52>)
0808a12e: bl 0x808a5f4 <fiprintf>
0808a132: bl 0x808a100 <abort>
0808a136: ldr r3, [pc, #16] ; (0x808a148 <__assert_func+56>)
0808a138: mov r2, r3
0808a13a: b.n 0x808a122 <__assert_func+18>
0808a13c: str r0, [r3, #120] ; 0x78
0808a13e: movs r0, #0
0808a140: add r12, r11
0808a142: lsrs r2, r1, #32
0808a144: add r12, sp
0808a146: lsrs r2, r1, #32
0808a148: add r8, sp
0808a14a: lsrs r2, r1, #32
The lss-file which shows that 80a5198 is the address of _etext:
0808a0c0 <abort>:
808a0c0: b508 push {r3, lr}
808a0c2: 2006 movs r0, #6
808a0c4: f001 ff6a bl 808bf9c <raise>
808a0c8: 2001 movs r0, #1
808a0ca: f01b f865 bl 80a5198 <_etext>
808a0ce: bf00 nop
I'm trying to get a STM32Cube project compiled using arm-none-eabi-gcc and a Makefile.
I have specified:
CFLAGS = -mthumb\
-march=armv6-m\
-mlittle-endian\
-mcpu=cortex-m0\
-ffunction-sections\
-fdata-sections\
-MMD\
-std=c99\
-Wall\
-g\
-D$(PART)\
-c
and:
LDFLAGS = -Wl,--gc-sections\
-Wl,-T$(LDFILE)\
-Wl,-v
The FW builds without problems, but when I boot the MCU I get stuck in Hard Fault.
Stack trace is:
#0 HardFault_Handler () at ./Src/main.c:156
#1 <signal handler called>
#2 0x0800221c in ____libc_init_array_from_thumb ()
#3 0x080021be in LoopFillZerobss () at Src/startup_stm32f030x8.s:103
#4 0x080021be in LoopFillZerobss () at Src/startup_stm32f030x8.s:103
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
and I go straight to Hard Fault when stepping to bl __libc_init_array in the startup file.
/* Zero fill the bss segment. */
FillZerobss:
movs r3, #0
str r3, [r2]
adds r2, r2, #4
LoopFillZerobss:
ldr r3, = _ebss
cmp r2, r3
bcc FillZerobss
/* Call the clock system intitialization function.*/
bl SystemInit
/* Call static constructors */
bl __libc_init_array
/* Call the application's entry point.*/
bl main
Any ideas what could be wrong?
My arm-none-eabi-gcc version is 4.8.4 20140725 (release)
[edit]
The disassembly of the calls
08002218 <____libc_init_array_from_thumb>:
8002218: 4778 bx pc
800221a: 46c0 nop ; (mov r8, r8)
800221c: eafff812 b 800026c <__libc_init_array>
0800026c <__libc_init_array>:
800026c: e92d4070 push {r4, r5, r6, lr}
8000270: e59f506c ldr r5, [pc, #108] ; 80002e4 <__libc_init_array+0x78>
8000274: e59f606c ldr r6, [pc, #108] ; 80002e8 <__libc_init_array+0x7c>
8000278: e0656006 rsb r6, r5, r6
800027c: e1b06146 asrs r6, r6, #2
8000280: 12455004 subne r5, r5, #4
8000284: 13a04000 movne r4, #0
8000288: 0a000005 beq 80002a4 <__libc_init_array+0x38>
800028c: e2844001 add r4, r4, #1
8000290: e5b53004 ldr r3, [r5, #4]!
8000294: e1a0e00f mov lr, pc
8000298: e12fff13 bx r3
800029c: e1560004 cmp r6, r4
80002a0: 1afffff9 bne 800028c <__libc_init_array+0x20>
80002a4: e59f5040 ldr r5, [pc, #64] ; 80002ec <__libc_init_array+0x80>
80002a8: e59f6040 ldr r6, [pc, #64] ; 80002f0 <__libc_init_array+0x84>
80002ac: e0656006 rsb r6, r5, r6
80002b0: eb0007ca bl 80021e0 <_init>
80002b4: e1b06146 asrs r6, r6, #2
80002b8: 12455004 subne r5, r5, #4
80002bc: 13a04000 movne r4, #0
80002c0: 0a000005 beq 80002dc <__libc_init_array+0x70>
80002c4: e2844001 add r4, r4, #1
80002c8: e5b53004 ldr r3, [r5, #4]!
80002cc: e1a0e00f mov lr, pc
80002d0: e12fff13 bx r3
80002d4: e1560004 cmp r6, r4
80002d8: 1afffff9 bne 80002c4 <__libc_init_array+0x58>
80002dc: e8bd4070 pop {r4, r5, r6, lr}
80002e0: e12fff1e bx lr
80002e4: 08002258 .word 0x08002258
80002e8: 08002258 .word 0x08002258
80002ec: 08002258 .word 0x08002258
80002f0: 08002260 .word 0x08002260
[edit 2]
The register values from gdb:
(gdb) info reg
r0 0x20000000 536870912
r1 0x1 1
r2 0x0 0
r3 0x40021000 1073876992
r4 0xffffffff -1
r5 0xffffffff -1
r6 0xffffffff -1
r7 0x20001fd0 536879056
r8 0xffffffff -1
r9 0xffffffff -1
r10 0xffffffff -1
r11 0xffffffff -1
r12 0xffffffff -1
sp 0x20001fd0 0x20001fd0
lr 0xfffffff9 -7
pc 0x800067c 0x800067c <HardFault_Handler+4>
xPSR 0x61000003 1627389955
That __libc_init_array is ARM code, not Thumb, hence the M0 will fall over trying to execute some nonsense it doesn't understand (actually, it never quite gets there since it faults on the attempt to switch to ARM state in the bx, but hey, same difference...)
You'll need to make sure you use pure-Thumb versions of any libraries - a Cortex-M-specific toolchain might be a better bet than a generic ARM one. If you have a multilib toolchain, I'd suggest checking the output of arm-none-eabi-gcc --print-multi-lib to make sure you've specified all the relevant options to get proper Cortex-M libraries, and if you're using a separate link step, make sure you invoke it with LD=arm-none-eabi-gcc (plus the relevant multilib options), rather than LD=arm-none-eabi-ld.
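The veneer in the question's disassembly shows the mechanism. BX uses bit 0 of its target value to select the instruction set, and a Thumb "bx pc" sees PC as its own address plus 4, which always has bit 0 clear. Cortex-M cores are Thumb-only, so that request for ARM state is what faults. Here is a rough sketch of the arithmetic, with helper names that are purely illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BX uses bit 0 of the target value to select the instruction set:
   1 = Thumb, 0 = ARM. */
bool bx_selects_thumb(uint32_t target)
{
    return (target & 1u) != 0u;
}

/* Value of PC as read by a Thumb instruction located at insn_addr. */
uint32_t thumb_pc_value(uint32_t insn_addr)
{
    return insn_addr + 4u;
}

/* Hypothetical self-check using the addresses from the question. */
void check_veneer(void)
{
    /* "bx pc" at 0x8002218 in ____libc_init_array_from_thumb */
    uint32_t target = thumb_pc_value(0x08002218u);
    assert(target == 0x0800221Cu);      /* the ARM-encoded "b" instruction */
    assert(!bx_selects_thumb(target));  /* bit 0 clear: requests ARM state */
}
```

The computed target, 0x0800221c, is the same address that appears in frame #2 of the backtrace.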
I have a strange problem where the MMU translates addresses for the str instruction but not for ldr. I'm compiling with gcc (no optimization) for an ARM7TDMI.
The program enters a function and stores 4 parameters (r0 to r3) on the stack.
I have these register values:
r0 = 0x1e10c8
r1 = 0x12adf0
r2 = 0x0
r3 = 0x2
r11 = 0x12ade4
The MMU is active and everything between 0x0 and 0x00FFFFFF is located physically between 0xC0000000 and 0xC0FFFFFF
The PC executes these 4 lines of assembly:
str r0, [r11, #-24]
str r1, [r11, #-28]
str r2, [r11, #-32]
strb r3, [r11, #-33]
This is the range of memory where the data is stored after execution:
0xC012ADC0 02000000 ....
0xC012ADC4 00000000 ....
0xC012ADC8 0012ADF0 ð..
0xC012ADCC 001E10C8 È...
And this range of memory contains FF:
0x0012ADC0 FFFFFFFF ÿÿÿÿ
0x0012ADC4 FFFFFFFF ÿÿÿÿ
0x0012ADC8 FFFFFFFF ÿÿÿÿ
0x0012ADCC FFFFFFFF ÿÿÿÿ
We see that the data was physically stored in the 0xC0000000 region because of the MMU.
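The addresses in the two dumps follow from the register values. As a sanity check, here is a small sketch that computes the effective address of each store and the physical address it should land on, assuming the mapping is a flat offset of 0xC0000000 as described above. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "str rX, [rN, #offset]" (offset may be negative). */
uint32_t effective_address(uint32_t base, int32_t offset)
{
    return base + (uint32_t)offset;
}

/* Assumed mapping from the question: virtual 0x0..0x00FFFFFF is backed
   by physical 0xC0000000..0xC0FFFFFF. Other ranges are not modelled. */
uint32_t virt_to_phys(uint32_t va)
{
    assert(va <= 0x00FFFFFFu);
    return va + 0xC0000000u;
}

/* Hypothetical self-check using r11 = 0x12ade4 from the question. */
void check_stack_slots(void)
{
    uint32_t r11 = 0x0012ADE4u;

    /* str r0, [r11, #-24] */
    assert(effective_address(r11, -24) == 0x0012ADCCu);
    assert(virt_to_phys(0x0012ADCCu) == 0xC012ADCCu);

    /* strb r3, [r11, #-33]: the top byte of the word at ...ADC0 */
    assert(effective_address(r11, -33) == 0x0012ADC3u);
    assert(virt_to_phys(0x0012ADC3u) == 0xC012ADC3u);
}
```

These results match the dump: 0x001E10C8 (r0) appears at 0xC012ADCC, and the byte 0x02 (r3) is the most significant byte of the little-endian word shown at 0xC012ADC0.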
Because I'm in debug mode, I can manually change this area to the following values:
0x0012ADC0 F4F4F4F4 ôôôô
0x0012ADC4 3F3F3F3F ????
0x0012ADC8 F2F2F2F2 òòòò
0x0012ADCC 1F1F1F1F ....
Now, 2-3 instructions later, I reach this line of assembly:
ldr r3, [r11, #-24]
I execute this line and get this value in r3:
r3=0x1f1f1f1f
(if I don't change the memory between 0x0012ADC0 and 0x0012ADCC I normally get 0xFFFFFFFF...)
I really do not understand why r3 is not equal to 0x1E10C8. It's as if the MMU does its job when str is executed, but when ldr is executed, the MMU does not translate the address (0x0012ADCC instead of 0xC012ADCC). There is something I do not understand here.
Just in case, here is a snippet of the assembly involved:
kapiReceiveQueue:
000195fc: push {r11, lr}
00019600: add r11, sp, #4
00019604: sub sp, sp, #32
00019608: str r0, [r11, #-24] <----- r0 stored physically at C012ADCC
0001960c: str r1, [r11, #-28]
00019610: str r2, [r11, #-32]
00019614: strb r3, [r11, #-33] ; 0x21
693 switch (Option)
00019618: ldrb r3, [r11, #-33] ; 0x21
0001961c: cmp r3, #2
00019620: beq 0x1974c <kapiReceiveQueue+336>
...
0001974c: ldr r3, [r11, #-24] <------ r3 get the value of physical address 0x12ADCC
00019750: ldr r2, [r3]
00019754: sub r3, r11, #17
00019758: mov r0, r2
In case this is related to my compilation flags, here they are:
arm-none-eabi-gcc -march=armv4t -mcpu=arm7tdmi -dp -DNG_COMP_GCC -c
-Wa,-adhlns="../../Base/Lib/Pa/Kapi.o.lst" -fmessage-length=0 -fno-zero-initialized-in-bss
-MMD -MP -MF"../../Base/Lib/Pa/Kapi.d" -MT"../../Base/Lib/Pa/Kapi.d" -fpic
-mlittle-endian -Wall -DNGHW_TOPMEM_ADDR=0x00800000 -DNG_CPU_ARM -DNG_CPU_ARMv4T
-DNG_CODE_ARM -DNG_LITTLE_ENDIAN -DNG_DEBUG -DNG_RTOS -DNG_COMP_GCC -DNG_RTOS_UCOSII
-DDHCP_CLIENT -g3 -gdwarf-2 ../../Base/Kernel/Alos/Ucos-II/Kapi.c -o"../../Base/Lib/Pa/Kapi.o"
Any help will be greatly appreciated!!