Is there a way to have gcc generate %pc relative addresses of constants? Even when the string appears in the text segment, arm-elf-gcc will generate a constant pointer to the data, load the address of the pointer via a %pc relative address and then dereference it. For a variety of reasons, I need to skip the middle step. As an example, this simple function:
const char * filename(void)
{
static const char _filename[]
__attribute__((section(".text")))
= "logfile";
return _filename;
}
generates (when compiled with arm-elf-gcc-4.3.2 -nostdlib -c
-O3 -W -Wall logfile.c):
00000000 <filename>:
0: e59f0000 ldr r0, [pc, #0] ; 8 <filename+0x8>
4: e12fff1e bx lr
8: 0000000c .word 0x0000000c
0000000c <_filename.1175>:
c: 66676f6c .word 0x66676f6c
10: 00656c69 .word 0x00656c69
I would have expected it to generate something more like:
filename:
add r0, pc, #0
bx lr
_filename.1175:
.ascii "logfile\000"
The code in question needs to be partially position independent since it will be relocated in memory at load time, but also integrate with code that was not compiled -fPIC, so there is no global offset table.
My current work around is to call a non-inline function (which will be done via a %pc relative address) to find the offset from the compiled location in a technique similar to how -fPIC code works:
static intptr_t
__attribute__((noinline))
find_offset( void )
{
uintptr_t pc;
asm __volatile__ (
"mov %0, %%pc" : "=&r"(pc)
);
return pc - 8 - (uintptr_t) find_offset;
}
But this technique requires that all data references be fixed up manually, so the filename() function in the above example would become:
const char * filename(void)
{
static const char _filename[]
__attribute__((section(".text")))
= "logfile";
return _filename + find_offset();
}
Hmmm, maybe you have to compile it as -fPIC to get PIC. Or simply write it in assembler, assembler is a lot easier than the C you are writing.
00000000 :
0: e59f300c ldr r3, [pc, #12] ; 14
4: e59f000c ldr r0, [pc, #12] ; 18
8: e08f3003 add r3, pc, r3
c: e0830000 add r0, r3, r0
10: e12fff1e bx lr
14: 00000004 andeq r0, r0, r4
18: 00000000 andeq r0, r0, r0
0000001c :
1c: 66676f6c strbtvs r6, [r7], -ip, ror #30
20: 00656c69 rsbeq r6, r5, r9, ror #24
Are you getting the same warning I am getting?
/tmp/ccySyaUE.s: Assembler messages:
/tmp/ccySyaUE.s:35: Warning: ignoring changed section attributes for .text
Related
I'm using a Cortex-M0 MCU from NXP (LPC845) and I'm trying to figure out what GCC is trying to do :)
Basically, the C code (pseudo) is as follows:
volatile uint8_t readb1 = 0x1a; // dummy
readb1 = GpioPadB(GPIO_PIN);
and the macro I wrote is
(*((volatile uint8_t*)(SOME_GPIO_ADDRESS)))
Now the code is working, but it produced some extra UXTB instruction I don't understand
00000378: ldrb r3, [r3, #0]
0000037a: ldr r2, [pc, #200] ; (0x444 <AppInit+272>)
0000037c: uxtb r3, r3
0000037e: strb r3, [r2, #0]
105 asm("nop");
My explanation is as follows:
load BYTE from address specified in R3, put result in R3 <-- this is load from GPIO register as BYTE
load in R2 address of readb1 variable
UXTB extends the uint8 value ??? But rotate argument is 0, so basically does nothing for uint8 !
store as BYTE to R2's address (my variable) data from R3
Why does that?
First of all, it should know that data in R3 has just a BYTE meaning (it already generates LDRB correctly). Second, the STRB will already trim 7..0 LSB so why using UXTB ?
Thanks for clarifications,
EDITED:
Compiler version:
gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major)
I use -O3
Looks like an extra instruction left in by the compiler and/or there is some nuance to the cortex-m or newer cores (would love to know what that nuance is).
#define GpioPadB(x) (*((volatile unsigned char *)(x)))
volatile unsigned char readb1;
void fun ( void )
{
readb1 = 0x1A;
readb1 = GpioPadB(0x1234000);
}
an apt gotten gcc
arm-none-eabi-gcc --version
arm-none-eabi-gcc (15:4.9.3+svn231177-1) 4.9.3 20150529 (prerelease)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
arm-none-eabi-gcc -O2 -c -mthumb so.c -o so.o
arm-none-eabi-objdump -d so.o
00000000 <fun>:
0: 231a movs r3, #26
2: 4a03 ldr r2, [pc, #12] ; (10 <fun+0x10>)
4: 7013 strb r3, [r2, #0]
6: 4b03 ldr r3, [pc, #12] ; (14 <fun+0x14>)
8: 781b ldrb r3, [r3, #0]
a: 7013 strb r3, [r2, #0]
c: 4770 bx lr
e: 46c0 nop ; (mov r8, r8)
10: 00000000 .word 0x00000000
14: 01234000 .word 0x01234000
as one would expect.
arm-none-eabi-gcc -O2 -c -mthumb -march=armv7-m so.c -o so.o
arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: 4a03 ldr r2, [pc, #12] ; (10 <fun+0x10>)
2: 211a movs r1, #26
4: 4b03 ldr r3, [pc, #12] ; (14 <fun+0x14>)
6: 7011 strb r1, [r2, #0]
8: 781b ldrb r3, [r3, #0]
a: b2db uxtb r3, r3
c: 7013 strb r3, [r2, #0]
e: 4770 bx lr
10: 00000000 .word 0x00000000
14: 01234000 .word 0x01234000
with the extra utxb instruction in there
Something a bit newer
arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
for armv6m and armv7m
00000000 <fun>:
0: 231a movs r3, #26
2: 4a03 ldr r2, [pc, #12] ; (10 <fun+0x10>)
4: 7013 strb r3, [r2, #0]
6: 4b03 ldr r3, [pc, #12] ; (14 <fun+0x14>)
8: 781b ldrb r3, [r3, #0]
a: 7013 strb r3, [r2, #0]
c: 4770 bx lr
e: 46c0 nop ; (mov r8, r8)
10: 00000000 .word 0x00000000
14: 01234000 .word 0x01234000
for armv4t
00000000 <fun>:
0: 231a movs r3, #26
2: 4a03 ldr r2, [pc, #12] ; (10 <fun+0x10>)
4: 7013 strb r3, [r2, #0]
6: 4b03 ldr r3, [pc, #12] ; (14 <fun+0x14>)
8: 781b ldrb r3, [r3, #0]
a: 7013 strb r3, [r2, #0]
c: 4770 bx lr
e: 46c0 nop ; (mov r8, r8)
10: 00000000 .word 0x00000000
14: 01234000 .word 0x01234000
and the utxb is gone.
I think it is just a missed optimization, peephole or otherwise.
As answered already though, when you use non-gpr-sized variables you can expect and/or tolerate the compiler converting up to the register size. Varies by compiler and target as to whether they do it on the way in or the way out (when a variable is read or just before it is written or used down the road).
For x86 where you can access various portions of the register separately (or use memory based operands) you will see they do not do this (in gcc) even for cases when it clearly needs a sign extension or padding. And sort it out down the road when the value is used.
You can search the gcc sources for utxb and perhaps see the issue or a comment.
EDIT
Note that clang takes a different path, it burns clocks generating the address but does not do the extension
00000000 <fun>:
0: f240 0000 movw r0, #0
4: f2c0 0000 movt r0, #0
8: 211a movs r1, #26
a: 7001 strb r1, [r0, #0]
c: f244 0100 movw r1, #16384 ; 0x4000
10: f2c0 1123 movt r1, #291 ; 0x123
14: 7809 ldrb r1, [r1, #0]
16: 7001 strb r1, [r0, #0]
18: 4770 bx lr
clang --version
clang version 11.1.0 (https://github.com/llvm/llvm-project.git 1fdec59bffc11ae37eb51a1b9869f0696bfd5312)
Target: armv7m-none-unknown-eabi
Thread model: posix
InstalledDir: /opt/llvm11armv7m/bin
I think it is simply an optimization problem with gcc/gnu.
The "volatile" modifier is to blame. It does not call type extensions when written, because it doesn't make sense. But when reading, it always calls the extension. Because now the data is stored in a register, and must be ready for any operations, over the entire range of the visibility limit.
Abandoning "volatile" removes any additional operations on the data, but it can also remove the very fact of using the variable.
https://godbolt.org/z/cGvc8r6se
First of all, it should know that data in R3 has just a BYTE meaning
Registers are only 32 bits. They do not have any other "meaning". The register must contain the same value as the loaded byte - thus UXTB. Any other operation later (for example adding something requires the whole register to contain the correct value.
Generally speaking, using shorter types than 32 bit usually adds some overhead as Cortex-Mx processors do not do operations on the "portions" of the registers.
To fix this problem, you need to file a bug at https://gcc.gnu.org/bugzilla/. But there are two difficult situations.
There are a lot of bugs related to "volatile", and all of them are not closed, and most of them are not even confirmed. As far as I understand, the developers are already tired of fighting windmills, and do not even react to it.
To successfully fix the problem - you need to find the extreme, the very one that wrote the root of evil. Authorship and all. You will not be allowed into someone else's branch, and only the most advanced are allowed into the master.
But even before this moment, you need to find the reason for this behavior, and here again there are problems.
The GCC code is huge, you can search endlessly.
My personal opinion: GCC treats ARM kernel registers as part of fast memory. This memory can be accessed via a physical address, which only adds to the problems. Well, if this is memory, and the dimension does not match, then, according to GCC, you need to add expansion commands.
Why does GCC use the correct commands when simply accessed? - well, he reads from memory to memory. Emphasis - "from memory". No matter what happens next, you need to read it right now.
When using ARM GCC g++ compiler with optimization level -O2 (and up) this code:
void foo(void)
{
DBB("#0x%08X: 0x%08X", 1, *((uint32_t *)1));
DBB("#0x%08X: 0x%08X", 0, *((uint32_t *)0));
}
Compiles to:
0800abb0 <_Z3foov>:
800abb0: b508 push {r3, lr}
800abb2: 2301 movs r3, #1
800abb4: 4619 mov r1, r3
800abb6: 681a ldr r2, [r3, #0]
800abb8: 4802 ldr r0, [pc, #8] ; (800abc4 <_Z3foov+0x14>)
800abba: f007 fa83 bl 80120c4 <debug_print_blocking>
800abbe: 2300 movs r3, #0
800abc0: 681b ldr r3, [r3, #0]
800abc2: deff udf #255 ; 0xff
800abc4: 08022704 stmdaeq r2, {r2, r8, r9, sl, sp}
And this gives me hardfault at undefined instruction #0x0800abc2.
Also, if there is more code after that, it is not compiled into final binary.
The question is why compiler generates it like that, why undefined istruction?
By the way, it works fine for stuff like this:
...
uint32_t num = 2;
num -= 2;
DBB("#0x%08X: 0x%08X", 0, *((uint32_t *)num));
...
Compiler version:
arm-none-eabi-g++.exe (GNU Tools for ARM Embedded Processors 6-2017-q2-update) 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437]
You can disable this (and verify this answer) by using -fno-delete-null-pointer-checks
The pointer you are passing has a value which matches the null pointer, and the compiler can see that from static analysis, so it faults (because that is the defined behaviour).
In your second example, the static analysis doesn't identify a NULL.
I have built a simple application using GNU gcc (4.8.1429) for ARM (ATsam4LC4B: cortexM4 core, 256K flash) that:
loads at 0x3F000 (linker option -Ttext=0x3F000) and initializes,
copies itself (mirrors) at 0
jumps to the mirror Reset_Handler() (near 0) using assembly
the mirror application begins to run as expected
The problem raises when the flash memory location of the mirror main() call (flash location 0x532) is reached:
The instruction ldr [pc, #32] that is located there, loads r3 with 0x3F565 (thus calling the original main()), instead of the expected 0x565 (so that to call the mirror main() as expected).
This happens even if all the registers containing 0x3F... are zeroed prior to the ldr instruction.
The debugger reports the PC register to be 0x534 as expected and in compliance to the flash region within the stepping is performed.
Can anybody please explain to me what is going on?
Thank you in advance.
The program is compiled with [-mthumb -nostdlib -nostartfiles -ffreestanding -msingle-pic-base -mno-pic-data-is-text-relative -mlong-calls -mpoke-function-name] flags and linked with the [-Wl,--gc-sections -mcpu=cortex-m4 -Ttext=0x3f000 -nostdlib -nostartfiles -ffreestanding -Wl,-dead-strip -Wl,-static -Wl,--entry=Reset_Handler -Wl,--cref -mthumb] flags. The linker script defines rom start=0.
The region of interest within the produced .lss file contains (omitted lines are marked as ...):
...
void Reset_Handler(void)
{
...
/* Initialize the relocate segment */
...
/* Clear the zero segment */
...
/* Set the vector table base address */
/* Initialize the C library */
// __libc_init_array();
/* Branch to main function */
main();
3f532: 4b08 ldr r3, [pc, #32] ; (3f554 <Reset_Handler+0x5c>)
3f534: 4798 blx r3
3f536: e7fe b.n 3f536 <Reset_Handler+0x3e>
3f538: 20000000 .word 0x20000000
3f53c: 0003f59c .word 0x0003f59c
3f540: 20000004 .word 0x20000004
3f544: 20000004 .word 0x20000004
3f548: 20000008 .word 0x20000008
3f54c: e000ed00 .word 0xe000ed00
3f550: 0003f000 .word 0x0003f000
3f554: 0003f565 .word 0x0003f565
3f558: 6e69616d .word 0x6e69616d
3f55c: 00 .byte 0x00
3f55d: 00 .byte 0x00
3f55e: bf00 nop
3f560: ff000008 .word 0xff000008
0003f564 <main>:
int main (void)
{
3f564: b510 push {r4, lr}
...
asm("bx %0"::"r"(*(unsigned*)0x3F004 - 0x3F000));
3f580: 4b05 ldr r3, [pc, #20] ; (3f598 <main+0x34>)
3f582: 681b ldr r3, [r3, #0]
3f584: f5a3 337c sub.w r3, r3, #258048 ; 0x3f000
3f588: 4718 bx r3
}
return 0;
}
3f58a: 2000 movs r0, #0
3f58c: bd10 pop {r4, pc}
3f58e: bf00 nop
3f590: e0001000 .word 0xe0001000
3f594: 0003f329 .word 0x0003f329
3f598: 0003f004 .word 0x0003f004
...
#Notlikethat answered my question.
I was confused assuming that pc+32 address should be loaded to r3 directly, instead of the contents of this address.
I am working on a project for an ARM Cortex-M3 (SiLabs) SOC. I need to move the interrupt vector [edit] and code away from the bottom of flash to make room for a "boot loader". The boot loader starts at address 0 to come up when the core comes out of reset. Its function is to validate the main image, loaded at a higher address and possibly replace that main image with a new one.
Therefore, the boot loader would have its vector table at 0, followed by its code. At a higher, fixed address, say 8KB, would be the main image, starting with its vector table.
I have found this page which describes the Vector Table Offset Register which the boot loader can use (with interrupts masked, obviously) to point the hardware to the new vector table.
My question is how to get the "main" image linked so that it will work when written to flash, starting not at zero. I'm not familiar with ARM assembly but I assume the code is not position independent.
I'm using SiLabs's Precision32 IDE which uses gcc for the toolchain. I've found how to add linker flags. My question is what gcc flag(s) will provide the change to the base of the vector table and code.
Thank you.
vectors.s
.cpu cortex-m3
.thumb
.word 0x20008000 /* stack top address */
.word _start /* Reset */
.word hang
.word hang
/* ... */
.thumb_func
hang: b .
.thumb_func
.globl _start
_start:
bl notmain
b hang
notmain.c
extern void fun ( unsigned int );
void notmain ( void )
{
fun(7);
}
fun.c
void fun ( unsigned int x )
{
}
Makefile
hello.elf : vectors.s fun.c notmain.c memmap
arm-none-eabi-as vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c notmain.c -o notmain.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c fun.c -o fun.o
arm-none-eabi-ld -o hello.elf -T memmap vectors.o notmain.o fun.o
arm-none-eabi-objdump -D hello.elf > hello.list
arm-none-eabi-objcopy hello.elf -O binary hello.bin
so if the linker script (memmap is the name I used) looks like this
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x40000
ram : ORIGIN = 0x20000000, LENGTH = 0x8000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram
}
since all of the above is .text, no .bss nor .data, the linker takes the objects as listed on the ld command line and places them starting at that address...
mbly of section .text:
00000000 <hang-0x10>:
0: 20008000 andcs r8, r0, r0
4: 00000013 andeq r0, r0, r3, lsl r0
8: 00000011 andeq r0, r0, r1, lsl r0
c: 00000011 andeq r0, r0, r1, lsl r0
00000010 <hang>:
10: e7fe b.n 10 <hang>
00000012 <_start>:
12: f000 f801 bl 18 <notmain>
16: e7fb b.n 10 <hang>
00000018 <notmain>:
18: 2007 movs r0, #7
1a: f000 b801 b.w 20 <fun>
1e: bf00 nop
00000020 <fun>:
20: 4770 bx lr
22: bf00 nop
So for this to work you have to be careful to put your bootstrap code first on the command line. But you can also do things like this with a linker script.
the order appears to matter, listing the specific object files first then the generic .text later
MEMORY
{
romx : ORIGIN = 0x00000000, LENGTH = 0x1000
romy : ORIGIN = 0x00010000, LENGTH = 0x1000
ram : ORIGIN = 0x00030000, LENGTH = 0x1000
bob : ORIGIN = 0x00040000, LENGTH = 0x1000
ted : ORIGIN = 0x00050000, LENGTH = 0x1000
}
SECTIONS
{
abc : { vectors.o } > romx
def : { fun.o } > ted
.text : { *(.text*) } > romy
.bss : { *(.bss*) } > ram
}
and we get this
00000000 <hang-0x10>:
0: 20008000 andcs r8, r0, r0
4: 00000013 andeq r0, r0, r3, lsl r0
8: 00000011 andeq r0, r0, r1, lsl r0
c: 00000011 andeq r0, r0, r1, lsl r0
00000010 <hang>:
10: e7fe b.n 10 <hang>
00000012 <_start>:
12: f00f fff5 bl 10000 <notmain>
16: e7fb b.n 10 <hang>
Disassembly of section def:
00050000 <fun>:
50000: 4770 bx lr
50002: bf00 nop
Disassembly of section .text:
00010000 <notmain>:
10000: 2007 movs r0, #7
10002: f03f bffd b.w 50000 <fun>
10006: bf00 nop
the short answer is with gnu tools you use a linker script to manipulate where things end up, I assume you want these functions to be in the rom at some specified location. I dont quite understand exactly what you are doing. But if for example you are trying to put something simple like the branch to main() in the flash initially with main() being deeper in the flash, then somehow either through whatever code is deeper in the flash or through some other method, then later you erase and reprogram only the stuff near zero. you will still need a simple branch to main() the first time. you can force what I am calling vectors.o to be at address zero then .text can be deeper in the flash putting all of the reset of the code basically there, then leave that in flash and replace just the stuff at zero.
like this
MEMORY
{
romx : ORIGIN = 0x00000000, LENGTH = 0x1000
romy : ORIGIN = 0x00010000, LENGTH = 0x1000
ram : ORIGIN = 0x00030000, LENGTH = 0x1000
bob : ORIGIN = 0x00040000, LENGTH = 0x1000
ted : ORIGIN = 0x00050000, LENGTH = 0x1000
}
SECTIONS
{
abc : { vectors.o } > romx
.text : { *(.text*) } > romy
.bss : { *(.bss*) } > ram
}
giving
00000000 <hang-0x10>:
0: 20008000 andcs r8, r0, r0
4: 00000013 andeq r0, r0, r3, lsl r0
8: 00000011 andeq r0, r0, r1, lsl r0
c: 00000011 andeq r0, r0, r1, lsl r0
00000010 <hang>:
10: e7fe b.n 10 <hang>
00000012 <_start>:
12: f00f fff7 bl 10004 <notmain>
16: e7fb b.n 10 <hang>
Disassembly of section .text:
00010000 <fun>:
10000: 4770 bx lr
10002: bf00 nop
00010004 <notmain>:
10004: 2007 movs r0, #7
10006: f7ff bffb b.w 10000 <fun>
1000a: bf00 nop
then leave the 0x10000 stuff and replace 0x00000 stuff later.
Anyway, the short answer is linker script you need to craft a linker script to put things where you want them. gnu linker scripts can get extremely complicated, I lean toward the simple.
If you want to place everything at some other address, including your vector table, then perhaps something like this:
hop.s
.cpu cortex-m3
.thumb
.word 0x20008000 /* stack top address */
.word _start /* Reset */
.word hang
.word hang
/* ... */
.thumb_func
hang: b .
.thumb_func
ldr r0,=_start
bx r0
and this
MEMORY
{
romx : ORIGIN = 0x00000000, LENGTH = 0x1000
romy : ORIGIN = 0x00010000, LENGTH = 0x1000
ram : ORIGIN = 0x00030000, LENGTH = 0x1000
bob : ORIGIN = 0x00040000, LENGTH = 0x1000
ted : ORIGIN = 0x00050000, LENGTH = 0x1000
}
SECTIONS
{
abc : { hop.o } > romx
.text : { *(.text*) } > romy
.bss : { *(.bss*) } > ram
}
gives this
Disassembly of section abc:
00000000 <hang-0x10>:
0: 20008000 andcs r8, r0, r0
4: 00010013 andeq r0, r1, r3, lsl r0
8: 00000011 andeq r0, r0, r1, lsl r0
c: 00000011 andeq r0, r0, r1, lsl r0
00000010 <hang>:
10: e7fe b.n 10 <hang>
12: 4801 ldr r0, [pc, #4] ; (18 <hang+0x8>)
14: 4700 bx r0
16: 00130000
1a: 20410001
Disassembly of section .text:
00010000 <hang-0x10>:
10000: 20008000 andcs r8, r0, r0
10004: 00010013 andeq r0, r1, r3, lsl r0
10008: 00010011 andeq r0, r1, r1, lsl r0
1000c: 00010011 andeq r0, r1, r1, lsl r0
00010010 <hang>:
10010: e7fe b.n 10010 <hang>
00010012 <_start>:
10012: f000 f803 bl 1001c <notmain>
10016: e7fb b.n 10010 <hang>
00010018 <fun>:
10018: 4770 bx lr
1001a: bf00 nop
0001001c <notmain>:
1001c: 2007 movs r0, #7
1001e: f7ff bffb b.w 10018 <fun>
10022: bf00 nop
then you can change the vector table to 0x10000 for example.
if you are asking a different question like having the bootloader at 0x00000, then the bootloader modifies the flash to add an application say at 0x20000, then you want to run that application, there are simpler solutions that dont necessarily require you to modify the location of the vector table.
In the CMSIS definitions for gcc you can find something like this:
static __INLINE void __DMB(void) { __ASM volatile ("dmb"); }
My question is: what use does a memory barrier have if it does not declare "memory" in the clobber list?
Is it an error in the core_cm3.h or is there a reason why gcc should behave correctly without any additional help?
I did some testing with gcc 4.5.2 (built with LTO). If I compile this code:
static inline void __DMB(void) { asm volatile ("dmb"); }
static inline void __DMB2(void) { asm volatile ("dmb" ::: "memory"); }
char x;
char test1 (void)
{
x = 15;
return x;
}
char test2 (void)
{
x = 15;
__DMB();
return x;
}
char test3 (void)
{
x = 15;
__DMB2();
return x;
}
using arm-none-eabi-gcc -Os -mcpu=cortex-m3 -mthumb -c dmb.c, then from arm-none-eabi-objdump -d dmb.o I get this:
00000000 <test1>:
0: 4b01 ldr r3, [pc, #4] ; (8 <test1+0x8>)
2: 200f movs r0, #15
4: 7018 strb r0, [r3, #0]
6: 4770 bx lr
8: 00000000 .word 0x00000000
0000000c <test2>:
c: 4b02 ldr r3, [pc, #8] ; (18 <test2+0xc>)
e: 200f movs r0, #15
10: 7018 strb r0, [r3, #0]
12: f3bf 8f5f dmb sy
16: 4770 bx lr
18: 00000000 .word 0x00000000
0000001c <test3>:
1c: 4b03 ldr r3, [pc, #12] ; (2c <test3+0x10>)
1e: 220f movs r2, #15
20: 701a strb r2, [r3, #0]
22: f3bf 8f5f dmb sy
26: 7818 ldrb r0, [r3, #0]
28: 4770 bx lr
2a: bf00 nop
2c: 00000000 .word 0x00000000
It is obvious that __DBM() only inserts the dmb instruction and it takes DMB2() to actually force the compiler to flush the values cached in the registers.
I guess I found a CMSIS bug.
IMHO the CMSIS version is right.
Injecting the barrier instruction without the memory in clobber list achieves exactly what it is supposed to do:
If the previous write on "x" variable was buffered then it is committed. This is useful, for example, if you are going to pass "x" address to as a DMA address, or if you are going to setup MPU.
It has no effect on returning "x" (your program is guaranteed to be correct even if you omit memory barrier).
On the other hand by inserting memory in clobber list, you have no kind of effect in situations like the example before (DMA, MPU..).
The only difference in the latter case is that if you have for example an ISR modifying the value of "x" right after "strb" , then the value that will be returned is the value modified by the ISR, because the clobber caused the compiler to read from memory to register again.
But if you want to obtain this thing then you should use "volatile" variables.
In other words: the barrier forces cache vs memory commit in order to guarantee consistency with other HW resources that might access RAM memory, while clobbering memory causes compiler to stop assuming the memory has not changed and to read again in local registers, that is another thing with different purposes (it does not matter if a memory change is still in cache or already committed on RAM, because an eventual asm load operation is guaranteed to work in both cases without barriers).