Why is a write to a memory-mapped peripheral register not actioned (LPC43xx)? - gcc

I'm building an application for NXP LPC4330 (Arm Cortex M0/M4 dual core). I'm compiling using arm-none-eabi-gcc 4.9.3. At one point in my code, I am performing a write to a (32-bit) memory location. Immediately afterwards, if I read back from that memory location, around one time in ten the result indicates that the write did not occur. Subsequent reads at later times indicate the same thing, so it is not a transient condition. Interrupts are disabled at the global level, and the assembler generated by the compiler is clearly attempting the write, so how is it possible that the write is not being actioned?
Specifically, I am writing to SLICE_MUX_CFG0 which is a memory-mapped register in the SGPIO peripheral. When the write works, the peripheral functions correctly. When the read-back indicates that the write has not worked, the peripheral does not function correctly. So, it seems that the register in question is not being set correctly, as indicated by the read-back.
Looking into the .asm (listed below), the write is clear. When I read back the value afterwards, it reads as zero, which - given the listing below - seems to me to be impossible. If I perform a read immediately before the write (see the .c listing, below), the problem goes away, which is perhaps a clue.
So the above indicates, what? Does this break some rule for use of the memory bus? I've looked at the GCC bugs list and can't see anything that relates to this.
The function follows, both source and ASM, with some annotation. What could be happening, here? Why does the write at "store value" apparently not have any effect?
20000f7c <camera_SGPIO_init_sub>:
; disable interrupts globally
20000f7c: b672 cpsid i
20000f7e: 2346 movs r3, #70 ; 0x46
20000f80: 4a16 ldr r2, [pc, #88] ; (20000fdc <camera_SGPIO_init_sub+0x60>)
20000f82: 6013 str r3, [r2, #0]
20000f84: 4a16 ldr r2, [pc, #88] ; (20000fe0 <camera_SGPIO_init_sub+0x64>)
20000f86: 6013 str r3, [r2, #0]
20000f88: 4a16 ldr r2, [pc, #88] ; (20000fe4 <camera_SGPIO_init_sub+0x68>)
20000f8a: 6013 str r3, [r2, #0]
20000f8c: 4a16 ldr r2, [pc, #88] ; (20000fe8 <camera_SGPIO_init_sub+0x6c>)
20000f8e: 6013 str r3, [r2, #0]
20000f90: 4a16 ldr r2, [pc, #88] ; (20000fec <camera_SGPIO_init_sub+0x70>)
20000f92: 3301 adds r3, #1
20000f94: 6013 str r3, [r2, #0]
20000f96: 4a16 ldr r2, [pc, #88] ; (20000ff0 <camera_SGPIO_init_sub+0x74>)
20000f98: 6013 str r3, [r2, #0]
20000f9a: 4a16 ldr r2, [pc, #88] ; (20000ff4 <camera_SGPIO_init_sub+0x78>)
20000f9c: 6013 str r3, [r2, #0]
20000f9e: 4a16 ldr r2, [pc, #88] ; (20000ff8 <camera_SGPIO_init_sub+0x7c>)
20000fa0: 6013 str r3, [r2, #0]
20000fa2: 4a16 ldr r2, [pc, #88] ; (20000ffc <camera_SGPIO_init_sub+0x80>)
20000fa4: 6013 str r3, [r2, #0]
20000fa6: 2240 movs r2, #64 ; 0x40
20000fa8: 4b15 ldr r3, [pc, #84] ; (20001000 <camera_SGPIO_init_sub+0x84>)
20000faa: 601a str r2, [r3, #0]
20000fac: 2290 movs r2, #144 ; 0x90
20000fae: 4b15 ldr r3, [pc, #84] ; (20001004 <camera_SGPIO_init_sub+0x88>)
20000fb0: 0512 lsls r2, r2, #20
20000fb2: 601a str r2, [r3, #0]
; load value
20000fb4: 23c6 movs r3, #198 ; 0xc6
; load destination address
20000fb6: 4a14 ldr r2, [pc, #80] ; (20001008 <camera_SGPIO_init_sub+0x8c>)
; store value
20000fb8: 6013 str r3, [r2, #0]
; read value back
20000fba: 6810 ldr r0, [r2, #0]
20000fbc: 4a13 ldr r2, [pc, #76] ; (2000100c <camera_SGPIO_init_sub+0x90>)
20000fbe: 6013 str r3, [r2, #0]
20000fc0: 4a13 ldr r2, [pc, #76] ; (20001010 <camera_SGPIO_init_sub+0x94>)
20000fc2: 6013 str r3, [r2, #0]
20000fc4: 4a13 ldr r2, [pc, #76] ; (20001014 <camera_SGPIO_init_sub+0x98>)
20000fc6: 6013 str r3, [r2, #0]
20000fc8: 4a13 ldr r2, [pc, #76] ; (20001018 <camera_SGPIO_init_sub+0x9c>)
20000fca: 6013 str r3, [r2, #0]
20000fcc: 4a13 ldr r2, [pc, #76] ; (2000101c <camera_SGPIO_init_sub+0xa0>)
20000fce: 6013 str r3, [r2, #0]
20000fd0: 4a13 ldr r2, [pc, #76] ; (20001020 <camera_SGPIO_init_sub+0xa4>)
20000fd2: 6013 str r3, [r2, #0]
20000fd4: 4a13 ldr r2, [pc, #76] ; (20001024 <camera_SGPIO_init_sub+0xa8>)
20000fd6: 6013 str r3, [r2, #0]
; enable interrupts globally
20000fd8: b662 cpsie i
20000fda: 4770 bx lr
20000fdc: 40086480 .word 0x40086480
20000fe0: 40086484 .word 0x40086484
20000fe4: 40086488 .word 0x40086488
20000fe8: 40086494 .word 0x40086494
20000fec: 40086380 .word 0x40086380
20000ff0: 40086384 .word 0x40086384
20000ff4: 40086388 .word 0x40086388
20000ff8: 4008639c .word 0x4008639c
20000ffc: 40086208 .word 0x40086208
20001000: 40086204 .word 0x40086204
20001004: 40050064 .word 0x40050064
20001008: 40101080 .word 0x40101080
2000100c: 401010a0 .word 0x401010a0
20001010: 40101090 .word 0x40101090
20001014: 401010a4 .word 0x401010a4
20001018: 40101088 .word 0x40101088
2000101c: 401010a8 .word 0x401010a8
20001020: 40101094 .word 0x40101094
20001024: 401010ac .word 0x401010ac
The C-code which compiled to the above follows.
volatile uint32_t vol_dummy_for_read;
#define __SFS(addr, value) *((volatile uint32_t*)addr) = value;
#define SGPIO_SLICE_MUX_CFG0 (*((volatile uint32_t*) ... some address ... ))
uint32_t camera_SGPIO_init_sub()
{
__asm volatile ("cpsid i" : : : "memory");
// configure pins to SGPIO
__SFS(P9_0, SCU_SFS_INPUT | 6); // D0, SGPIO0
__SFS(P9_1, SCU_SFS_INPUT | 6);
__SFS(P9_2, SCU_SFS_INPUT | 6);
__SFS(P9_5, SCU_SFS_INPUT | 6);
__SFS(P7_0, SCU_SFS_INPUT | 7);
__SFS(P7_1, SCU_SFS_INPUT | 7);
__SFS(P7_2, SCU_SFS_INPUT | 7);
__SFS(P7_7, SCU_SFS_INPUT | 7); // D7, SGPIO7
// SGPIO8
__SFS(P4_2, SCU_SFS_INPUT | 7); // PCLK, SGPIO8
// configure pins to GPIO
__SFS(P4_1, SCU_SFS_INPUT | 0); // HSYNC, GPIO2[1]
// bring SGPIO clock up to full speed (same as PLL1, M4)
CGU_BASE_PERIPH_CLK = (0 << 1) | (0 << 11) | (9 << 24);
// SLICE_MUX_CFG
uint32_t SLICE_MUX_CFG_VALUE =
(1 << 1) /* clock on falling edge */
| (1 << 2) /* clock from external pin */
| (3 << 6) /* shift 8 bytes per clock */
;
//// see note above (this fixes it)
//vol_dummy_for_read = SGPIO_SLICE_MUX_CFG0 ;
SGPIO_SLICE_MUX_CFG0 = SLICE_MUX_CFG_VALUE; // A
uint32_t ret = SGPIO_SLICE_MUX_CFG0;
SGPIO_SLICE_MUX_CFG8 = SLICE_MUX_CFG_VALUE; // I
SGPIO_SLICE_MUX_CFG4 = SLICE_MUX_CFG_VALUE; // E
SGPIO_SLICE_MUX_CFG9 = SLICE_MUX_CFG_VALUE; // J
SGPIO_SLICE_MUX_CFG2 = SLICE_MUX_CFG_VALUE; // C
SGPIO_SLICE_MUX_CFG10 = SLICE_MUX_CFG_VALUE; // K
SGPIO_SLICE_MUX_CFG5 = SLICE_MUX_CFG_VALUE; // F
SGPIO_SLICE_MUX_CFG11 = SLICE_MUX_CFG_VALUE; // L
__asm volatile ("cpsie i" : : : "memory");
return ret;
}

(I am answering my own question; this answer was reached based on clues offered in the comments, above).
Short Answer
The peripheral is not actioning the register update reliably because the peripheral clock that is driving it (CGU_BASE_PERIPH_CLK) has only just had its speed changed at the time of the write operation. Setting the AUTOBLOCK bit when updating the clock's speed eliminates the problem.
Discussion
Presumably, the clock to the peripheral is transiently invalid during the frequency change, depending on conditions. Perhaps, if the timing of edges happens to be just-so, very short clock pulses find their way through during the change. Or something similarly unpleasant finds its way down the clock line to the peripheral. In any case, in these unpredictable conditions, the write may not occur, causing the reported failure.
Waiting for a period of time between the clock speed change and the subsequent assignment also eliminates the problem, understandably. As reported in the question, performing a read of the register prior to the write also eliminates the problem; whether this is because it takes time, or the read operation blocks (for reasons unclear) until the peripheral clock has settled, is unclear.
AUTOBLOCK is documented only as far as the statement of function: "Block clock automatically during frequency change". The User Manual gives no indication of the conditions under which the bit should be set, or left clear, during a clock speed change. However, given the evidence reported here, a policy of always setting AUTOBLOCK when updating the speed of a clock in one of these devices, unless there is a known reason to leave it clear, seems wise.
Reference: NXP User Manual for LPC43xx, UM10503 Rev 1.9, Chapter 13.

Related

Is this a GCC bug or am I doing something wrong?

I am trying to get the final accumulate in the code below to use the ARM M7 SMLAL 32*32->64 bit accumulate function. If I include the T3 = T3 + 1 than it does use this, but if I comment it out it does a full 64*64 bit and accumulate using 3 multiply and 2 add instructions. I don't actually want to add 1 to T3 so it needs to go.
I've broken the code down so that I could analyse it in more detail and it definitely seems to be that the cast of T3 to int32_t and throwing away the bottom 32 bits from the multiply isn't being picked up by the compiler and it thinks T3 still has 64 bits. Bit when I add the simple increment of T3 it then gets it correct. I tried adding zero but then it goes back to the full 64*64 bit multiply.
I'm using the -O2 optimisation on STM's STM32CubeIDE which uses a version of GCC. Other optimations never use SMLAL or unroll everything.
int64_t T4 = 0;
osc = key * NumHarmonics;
harmonic = 0;
do
{
if (OscLevel[osc] > 1)
{
OscPhase[osc] = OscPhase[osc] + (uint32_t)(T2);
int32_t T5 = Sine[(OscPhase[osc] >> 16) & 0x0000FFFF];
int64_t T6 = (int64_t)T1 * Tremelo[harmonic];
int32_t T3 = (int32_t)(T6 >> 32); // grab the most significant register
// T3 = T3 + 1; // needs the +1 to force use of SMLAL in next instruction ! (+0 doesn't help)
T4 = T4 + (int64_t)T3 * (int64_t)T5; // should be SMLAL but does a full 64*64 mult if no +1 above
}
osc++;
harmonic++;
}
while (harmonic < NumHarmonics);
OscTotal = T4;
without the addition :
800054e: 4b13 ldr r3, [pc, #76] ; (800059c <main+0xd8>)
8000550: f853 1024 ldr.w r1, [r3, r4, lsl #2]
8000554: ea4f 79e1 mov.w r9, r1, asr #31
8000558: fba7 4501 umull r4, r5, r7, r1
800055c: fb07 f309 mul.w r3, r7, r9
8000560: fb01 3202 mla r2, r1, r2, r3
8000564: 4415 add r5, r2
8000566: e9dd 2300 ldrd r2, r3, [sp]
800056a: 1912 adds r2, r2, r4
800056c: 416b adcs r3, r5
800056e: e9cd 2300 strd r2, r3, [sp]
}
osc++;
8000572: 3001 adds r0, #1
harmonic++;
with the addition
8000542: 4b0b ldr r3, [pc, #44] ; (8000570 <main+0xac>)
8000544: f853 3020 ldr.w r3, [r3, r0, lsl #2]
8000548: fbc3 6701 smlal r6, r7, r3, r1
}
osc++;
800054c: 3201 adds r2, #1
harmonic++;

Why does _exit() jump to _etext?

I am running a project using the ARM Embedded Tollchain on a stm32 microcontroller which uses the newLib.
I called assert(false) to test the assert output and ended in a Hard Fault Exception. I debugged into the assembly of assert(...) and found out that a subsequent call to _exit(1) jumps to a Address which is called _etext. Taking a look to the manpage of _etext shows that _etext is the address of the end of the .text section.
I am really confused. Normally I had supposed that _exit() is calling __exit() (which is defined as global symbol by the newLib) which I had implemented in a file named syscalls.c.
Why does _exit() jump to _etext?
Here are some cope snippets for a better understanding:
The subsequent call to _exit() by assert() taken from newLib 2.5:
_VOID
_DEFUN_VOID (abort)
{
#ifdef ABORT_MESSAGE
write (2, "Abort called\n", sizeof ("Abort called\n")-1);
#endif
while (1)
{
raise (SIGABRT);
_exit (1);
}
}
The disassembly of abort and assert. Take a special look to address 0808a10a where the jump to 80a5198 (_etext) is performed:
abort:
0808a100: push {r3, lr}
0808a102: movs r0, #6
0808a104: bl 0x808bfdc <raise>
0808a108: movs r0, #1
0808a10a: bl 0x80a51d8
0808a10e: nop
__assert_func:
0808a110: push {lr}
0808a112: ldr r4, [pc, #40] ; (0x808a13c <__assert_func+44>)
0808a114: ldr r6, [r4, #0]
0808a116: mov r5, r0
0808a118: sub sp, #20
0808a11a: mov r4, r3
0808a11c: ldr r0, [r6, #12]
0808a11e: cbz r2, 0x808a136 <__assert_func+38>
0808a120: ldr r3, [pc, #28] ; (0x808a140 <__assert_func+48>)
0808a122: str r2, [sp, #8]
0808a124: stmia.w sp, {r1, r3}
0808a128: mov r2, r4
0808a12a: mov r3, r5
0808a12c: ldr r1, [pc, #20] ; (0x808a144 <__assert_func+52>)
0808a12e: bl 0x808a5f4 <fiprintf>
0808a132: bl 0x808a100 <abort>
0808a136: ldr r3, [pc, #16] ; (0x808a148 <__assert_func+56>)
0808a138: mov r2, r3
0808a13a: b.n 0x808a122 <__assert_func+18>
0808a13c: str r0, [r3, #120] ; 0x78
0808a13e: movs r0, #0
0808a140: add r12, r11
0808a142: lsrs r2, r1, #32
0808a144: add r12, sp
0808a146: lsrs r2, r1, #32
0808a148: add r8, sp
0808a14a: lsrs r2, r1, #32
The lss-file which shows that 80a5198 is the address of _etext:
0808a0c0 <abort>:
808a0c0: b508 push {r3, lr}
808a0c2: 2006 movs r0, #6
808a0c4: f001 ff6a bl 808bf9c <raise>
808a0c8: 2001 movs r0, #1
808a0ca: f01b f865 bl 80a5198 <_etext>
808a0ce: bf00 nop

Is gcc reordering my volatile variables?

I am developping on an STM32F746 (ARM Cortex-M7) with the GNU ARM Toolchain (gcc 5.4.1) using FreeRTOS. I am using the following CFLAGS set:
-g -Werror -Wextra -O2 -fno-common -ffunction-sections -fmessage-length=0 -g -mcpu=cortex-m7 -mthumb -mfloat-abi=hard -mfpu=fpv5-sp-d16 -fno-common -ffunction-sections -fmessage-length=0 -g -mcpu=cortex-m7 -mthumb -mfloat-abi=hard -mfpu=fpv5-sp-d16
The following code shows '0' in the console as if the DMA controller was not powered on. Normally, writing RCC_AHB1ENR_DMA1EN in the AHB1ENR register power on the DMA controller.
/* *** Other peripheral configurations *** */
RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN;
DMA1_Stream3->CR = DMA_SxCR_PL | DMA_SxCR_MINC |
DMA_SxCR_TCIE | DMA_SxCR_TEIE | DMA_SxCR_DMEIE;
DEBUG_PRINTF("%x\r\n", DMA1_Stream3->CR);
But the following code is printing the good value of DMA1_Stream3->CR.
/* *** Other peripheral configurations *** */
RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN;
__asm__ __volatile__ ("" ::: "memory");
DMA1_Stream3->CR = DMA_SxCR_PL | DMA_SxCR_MINC |
DMA_SxCR_TCIE | DMA_SxCR_TEIE | DMA_SxCR_DMEIE;
DEBUG_PRINTF("%x\r\n", DMA1_Stream3->CR);
I never had to add those memory barriers for any other controllers so why do I have to do this for this one? I am thinking about either a load/store instructions reordering by GCC either a timing issue. I checked in errata sheet of the CPU (STM32F746) or datasheet but didn't find anything.
Here is the definition of the RCC structure:
typedef struct
{
/* ... */
__IO uint32_t AHB1ENR;
/* ... */
} RCC_TypeDef;
And here is the definition of the DMA1_Stream3 structure type:
typedef struct
{
__IO uint32_t CR; /*!< DMA stream x configuration register */
__IO uint32_t NDTR; /*!< DMA stream x number of data register */
__IO uint32_t PAR; /*!< DMA stream x peripheral address register */
__IO uint32_t M0AR; /*!< DMA stream x memory 0 address register */
__IO uint32_t M1AR; /*!< DMA stream x memory 1 address register */
__IO uint32_t FCR; /*!< DMA stream x FIFO control register */
} DMA_Stream_TypeDef;
__IO is defined as volatile.
Here is the assembly code generated for the function:
08003f4c <bsp_initSPIADIS>:
8003f4c: 4956 ldr r1, [pc, #344] ; (80040a8 <bsp_initSPIADIS+0x15c>)
8003f4e: 4a57 ldr r2, [pc, #348] ; (80040ac <bsp_initSPIADIS+0x160>)
8003f50: 6b08 ldr r0, [r1, #48] ; 0x30
8003f52: f440 7080 orr.w r0, r0, #256 ; 0x100
8003f56: b5f8 push {r3, r4, r5, r6, r7, lr}
8003f58: 6308 str r0, [r1, #48] ; 0x30
8003f5a: 2500 movs r5, #0
8003f5c: 6810 ldr r0, [r2, #0]
8003f5e: f240 3607 movw r6, #775 ; 0x307
8003f62: 4b53 ldr r3, [pc, #332] ; (80040b0 <bsp_initSPIADIS+0x164>)
8003f64: f44f 5eb8 mov.w lr, #5888 ; 0x1700
8003f68: f020 000c bic.w r0, r0, #12
8003f6c: 4c51 ldr r4, [pc, #324] ; (80040b4 <bsp_initSPIADIS+0x168>)
8003f6e: 4f52 ldr r7, [pc, #328] ; (80040b8 <bsp_initSPIADIS+0x16c>)
8003f70: 6010 str r0, [r2, #0]
8003f72: 6810 ldr r0, [r2, #0]
8003f74: f040 0008 orr.w r0, r0, #8
8003f78: 6010 str r0, [r2, #0]
8003f7a: 6890 ldr r0, [r2, #8]
8003f7c: f020 000c bic.w r0, r0, #12
8003f80: 6090 str r0, [r2, #8]
8003f82: 6890 ldr r0, [r2, #8]
8003f84: f040 0004 orr.w r0, r0, #4
8003f88: 6090 str r0, [r2, #8]
8003f8a: 6890 ldr r0, [r2, #8]
8003f8c: f040 0008 orr.w r0, r0, #8
8003f90: 6090 str r0, [r2, #8]
8003f92: 6a10 ldr r0, [r2, #32]
8003f94: f020 00f0 bic.w r0, r0, #240 ; 0xf0
8003f98: 6210 str r0, [r2, #32]
8003f9a: 6a10 ldr r0, [r2, #32]
8003f9c: f040 0010 orr.w r0, r0, #16
8003fa0: 6210 str r0, [r2, #32]
8003fa2: 6a10 ldr r0, [r2, #32]
8003fa4: f040 0040 orr.w r0, r0, #64 ; 0x40
8003fa8: 6210 str r0, [r2, #32]
8003faa: 6b08 ldr r0, [r1, #48] ; 0x30
8003fac: 4a43 ldr r2, [pc, #268] ; (80040bc <bsp_initSPIADIS+0x170>)
8003fae: f040 0002 orr.w r0, r0, #2
8003fb2: 6308 str r0, [r1, #48] ; 0x30
8003fb4: 6818 ldr r0, [r3, #0]
8003fb6: f020 5040 bic.w r0, r0, #805306368 ; 0x30000000
8003fba: 6018 str r0, [r3, #0]
8003fbc: 6818 ldr r0, [r3, #0]
8003fbe: f040 5000 orr.w r0, r0, #536870912 ; 0x20000000
8003fc2: 6018 str r0, [r3, #0]
8003fc4: 6898 ldr r0, [r3, #8]
8003fc6: f020 5040 bic.w r0, r0, #805306368 ; 0x30000000
8003fca: 6098 str r0, [r3, #8]
8003fcc: 6898 ldr r0, [r3, #8]
8003fce: f040 5080 orr.w r0, r0, #268435456 ; 0x10000000
8003fd2: 6098 str r0, [r3, #8]
8003fd4: 6898 ldr r0, [r3, #8]
8003fd6: f040 5000 orr.w r0, r0, #536870912 ; 0x20000000
8003fda: 6098 str r0, [r3, #8]
8003fdc: 6a58 ldr r0, [r3, #36] ; 0x24
8003fde: f020 6070 bic.w r0, r0, #251658240 ; 0xf000000
8003fe2: 6258 str r0, [r3, #36] ; 0x24
8003fe4: 6a58 ldr r0, [r3, #36] ; 0x24
8003fe6: f040 7080 orr.w r0, r0, #16777216 ; 0x1000000
8003fea: 6258 str r0, [r3, #36] ; 0x24
8003fec: 6a58 ldr r0, [r3, #36] ; 0x24
8003fee: f040 6080 orr.w r0, r0, #67108864 ; 0x4000000
8003ff2: 6258 str r0, [r3, #36] ; 0x24
8003ff4: 6b08 ldr r0, [r1, #48] ; 0x30
8003ff6: f040 0002 orr.w r0, r0, #2
8003ffa: 6308 str r0, [r1, #48] ; 0x30
8003ffc: 6818 ldr r0, [r3, #0]
8003ffe: f020 4040 bic.w r0, r0, #3221225472 ; 0xc0000000
8004002: 6018 str r0, [r3, #0]
8004004: 6818 ldr r0, [r3, #0]
8004006: f040 4000 orr.w r0, r0, #2147483648 ; 0x80000000
800400a: 6018 str r0, [r3, #0]
800400c: 6898 ldr r0, [r3, #8]
800400e: f020 4040 bic.w r0, r0, #3221225472 ; 0xc0000000
8004012: 6098 str r0, [r3, #8]
8004014: 6898 ldr r0, [r3, #8]
8004016: f040 4080 orr.w r0, r0, #1073741824 ; 0x40000000
800401a: 6098 str r0, [r3, #8]
800401c: 6898 ldr r0, [r3, #8]
800401e: f040 4000 orr.w r0, r0, #2147483648 ; 0x80000000
8004022: 6098 str r0, [r3, #8]
8004024: 6a58 ldr r0, [r3, #36] ; 0x24
8004026: f020 4070 bic.w r0, r0, #4026531840 ; 0xf0000000
800402a: 6258 str r0, [r3, #36] ; 0x24
800402c: 6a58 ldr r0, [r3, #36] ; 0x24
800402e: f040 5080 orr.w r0, r0, #268435456 ; 0x10000000
8004032: 6258 str r0, [r3, #36] ; 0x24
8004034: 6a58 ldr r0, [r3, #36] ; 0x24
8004036: f040 4080 orr.w r0, r0, #1073741824 ; 0x40000000
800403a: 6258 str r0, [r3, #36] ; 0x24
800403c: 6c0b ldr r3, [r1, #64] ; 0x40
800403e: 4820 ldr r0, [pc, #128] ; (80040c0 <bsp_initSPIADIS+0x174>)
8004040: f443 4380 orr.w r3, r3, #16384 ; 0x4000
8004044: 640b str r3, [r1, #64] ; 0x40
8004046: 61e5 str r5, [r4, #28]
8004048: 6026 str r6, [r4, #0]
800404a: 6823 ldr r3, [r4, #0]
800404c: 4e1d ldr r6, [pc, #116] ; (80040c4 <bsp_initSPIADIS+0x178>)
800404e: f023 0338 bic.w r3, r3, #56 ; 0x38
8004052: 6023 str r3, [r4, #0]
8004054: 6823 ldr r3, [r4, #0]
8004056: f043 0338 orr.w r3, r3, #56 ; 0x38
800405a: 6023 str r3, [r4, #0]
800405c: f8c4 e004 str.w lr, [r4, #4]
8004060: 6b0b ldr r3, [r1, #48] ; 0x30
8004062: 4c19 ldr r4, [pc, #100] ; (80040c8 <bsp_initSPIADIS+0x17c>)
8004064: f443 1300 orr.w r3, r3, #2097152 ; 0x200000
8004068: 630b str r3, [r1, #48] ; 0x30
800406a: 603a str r2, [r7, #0]
800406c: 6839 ldr r1, [r7, #0]
800406e: f000 faab bl 80045c8 <printf>
8004072: 4b16 ldr r3, [pc, #88] ; (80040cc <bsp_initSPIADIS+0x180>)
8004074: 4629 mov r1, r5
8004076: 2203 movs r2, #3
8004078: 601c str r4, [r3, #0]
800407a: 2001 movs r0, #1
800407c: f7fe f90e bl 800229c <xQueueGenericCreate>
8004080: 4629 mov r1, r5
8004082: 6170 str r0, [r6, #20]
8004084: 2203 movs r2, #3
8004086: 2001 movs r0, #1
8004088: f44f 4480 mov.w r4, #16384 ; 0x4000
800408c: f7fe f906 bl 800229c <xQueueGenericCreate>
8004090: 4b0f ldr r3, [pc, #60] ; (80040d0 <bsp_initSPIADIS+0x184>)
8004092: 2240 movs r2, #64 ; 0x40
8004094: f44f 4100 mov.w r1, #32768 ; 0x8000
8004098: 6130 str r0, [r6, #16]
800409a: 601c str r4, [r3, #0]
800409c: 6019 str r1, [r3, #0]
800409e: f883 230e strb.w r2, [r3, #782] ; 0x30e
80040a2: f883 230f strb.w r2, [r3, #783] ; 0x30f
80040a6: bdf8 pop {r3, r4, r5, r6, r7, pc}
80040a8: 40023800 andmi r3, r2, r0, lsl #16
80040ac: 40022000 andmi r2, r2, r0
80040b0: 40020400 andmi r0, r2, r0, lsl #8
80040b4: 40003800 andmi r3, r0, r0, lsl #16
80040b8: 40026058 andmi r6, r2, r8, asr r0
80040bc: 00030416 andeq r0, r3, r6, lsl r4
80040c0: 08008ec8 stmdaeq r0, {r3, r6, r7, r9, sl, fp, pc}
80040c4: 2000c2d0 ldrdcs ip, [r0], -r0
80040c8: 00030456 andeq r0, r3, r6, asr r4
80040cc: 40026070 andmi r6, r2, r0, ror r0
80040d0: e000e100 and lr, r0, r0, lsl #2
Yes after any operation enabling peripheral clock there is a bug in both STM32F4 & F7 micros. There is a small delay needed
I usually use __dsb(); for it. You can provide this delay (it is described in the errata but not for all affected uC (actually I have not found any unaffected F4 or F7)). I personally use dsb as it advised in the F4 errata, it reminds me about this bug.
2.1.5 Delay after an RCC peripheral clock enabling Description A delay between an RCC peripheral clock enable and the effective
peripheral enabling should be taken into account in order to manage
the peripheral read/write to registers. This delay depends on the
peripheral mapping: • If the peripheral is mapped on AHB: the delay
should be equal to 2 AHB cycles. • If the peripheral is mapped on APB:
the delay should be equal to 1 + (AHB/APB prescaler) cycles.
Workarounds
1. Use the DSB instructio n to stall the Cortex ®
-M4 CPU pipeline until the instruction is completed.
2. Insert “n” NOPs between the RCC enable bi t write and the peripheral register writes (n = 2 for AHB peripherals, n = 1 + AHB/
APB prescaler in case of APB peripherals).
3. Or simply insert a dummy read operation from the corresponding register just after enabling the peripheral clock.

How to align to cache line GCC ldr pc-relative

In ARM, GCC uses the PC-relative load is usually used to load constants into registers. The idea is that you store the constant relative to the instruction loading the constant. E.g. the following instruction can be used to load a constant from the address PC+8+offset
ldr r0, [pc, #offset]
As result, the .text segment interleaves instructions and data. The latter usually stored at the end of function's code. E.g.
00010860 <call_weak_fn>:
10860: e59f3014 ldr r3, [pc, #20] ; 1087c <call_weak_fn+0x1c>
10864: e59f2014 ldr r2, [pc, #20] ; 10880 <call_weak_fn+0x20>
10868: e08f3003 add r3, pc, r3
1086c: e7932002 ldr r2, [r3, r2]
10870: e3520000 cmp r2, #0
10874: 012fff1e bxeq lr
10878: e1a00000 nop ; (mov r0, r0)
1087c: 00089790 muleq r8, r0, r7
10880: 00000074 andeq r0, r0, r4, ror r0
For a research project, I would like to ensure that code and constant never reside on the same cache line (i.e. block 64 bytes aligned).
Is it possible to align the constants generated by GCC?

Indirect function call uses odd address

When the GCC 4.7.3 (20121207) for ARM Cortex-M3 takes the address of a function it doesn't get the exact address of the function. I can see an off-by-one in that pointer.
// assume at address 0x00001204;
int foo() {
return 42;
}
void bar() {
int(*p)() = &foo; // p = 0x1205;
p(); // executed successfully
foo(); // assembly: "bl 0x00001204;"
}
Although the pointer points to an odd address, the execution is successful. I would expect an exception at this point. Why does it takes that strange address and why doesn't it hurt.
Edit
The SO article describes a difference between thumb and ARM mode. Why is that offset not visible when the function is called directly although the CPU is in the same mode?
Should the odd address be kept or would resetting the bit 0 cause hard? (what I could not see until now)
I cobbled up something from one of my examples to quickly demonstrate what is going on.
vectors.s:
/* vectors.s */
.cpu cortex-m3
.thumb
.word 0x20002000 /* stack top address */
.word _start /* 1 Reset */
.word hang /* 2 NMI */
.word hello /* 3 HardFault */
.word hang /* 4 MemManage */
.word hang /* 5 BusFault */
.word hang /* 6 UsageFault */
.word hang /* 7 RESERVED */
.word hang /* 8 RESERVED */
.word hang /* 9 RESERVED*/
.word hang /* 10 RESERVED */
.word hang /* 11 SVCall */
.word hang /* 12 Debug Monitor */
.word hang /* 13 RESERVED */
.word hang /* 14 PendSV */
.word hang /* 15 SysTick */
.word hang /* 16 External Interrupt(0) */
.word hang /* 17 External Interrupt(1) */
.word hang /* 18 External Interrupt(2) */
.word hang /* 19 ... */
.thumb_func
.global _start
_start:
/*ldr r0,stacktop */
/*mov sp,r0*/
bl notmain
ldr r0,=notmain
mov lr,pc
bx r0
b hang
.thumb_func
hang: b .
hello: b .
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.end
blinker01.c:
extern void PUT32 ( unsigned int, unsigned int );
int notmain ( void )
{
PUT32(0x12345678,0xAABBCCDD);
return(0);
}
Makefile:
#ARMGNU = arm-none-eabi
ARMGNU = arm-none-linux-gnueabi
AOPS = --warn --fatal-warnings
COPS = -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding
all : blinker01.gcc.thumb.bin
vectors.o : vectors.s
$(ARMGNU)-as vectors.s -o vectors.o
blinker01.gcc.thumb.o : blinker01.c
$(ARMGNU)-gcc $(COPS) -mthumb -c blinker01.c -o blinker01.gcc.thumb.o
blinker01.gcc.thumb2.o : blinker01.c
$(ARMGNU)-gcc $(COPS) -mthumb -mcpu=cortex-m3 -march=armv7-m -c blinker01.c -o blinker01.gcc.thumb2.o
blinker01.gcc.thumb.bin : memmap vectors.o blinker01.gcc.thumb.o
$(ARMGNU)-ld -o blinker01.gcc.thumb.elf -T memmap vectors.o blinker01.gcc.thumb.o
$(ARMGNU)-objdump -D blinker01.gcc.thumb.elf > blinker01.gcc.thumb.list
$(ARMGNU)-objcopy blinker01.gcc.thumb.elf blinker01.gcc.thumb.bin -O binary
Disassembly:
Disassembly of section .text:
08000000 <_start-0x50>:
8000000: 20002000 andcs r2, r0, r0
8000004: 08000051 stmdaeq r0, {r0, r4, r6}
8000008: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800000c: 0800005e stmdaeq r0, {r1, r2, r3, r4, r6}
8000010: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000014: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000018: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800001c: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000020: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000024: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000028: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800002c: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000030: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000034: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000038: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800003c: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000040: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000044: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
8000048: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
800004c: 0800005d stmdaeq r0, {r0, r2, r3, r4, r6}
08000050 <_start>:
8000050: f000 f80a bl 8000068 <notmain>
8000054: 4803 ldr r0, [pc, #12] ; (8000064 <PUT32+0x4>)
8000056: 46fe mov lr, pc
8000058: 4700 bx r0
800005a: e7ff b.n 800005c <hang>
0800005c <hang>:
800005c: e7fe b.n 800005c <hang>
0800005e <hello>:
800005e: e7fe b.n 800005e <hello>
08000060 <PUT32>:
8000060: 6001 str r1, [r0, #0]
8000062: 4770 bx lr
8000064: 08000069 stmdaeq r0, {r0, r3, r5, r6}
08000068 <notmain>:
8000068: b508 push {r3, lr}
800006a: 4803 ldr r0, [pc, #12] ; (8000078 <notmain+0x10>)
800006c: 4903 ldr r1, [pc, #12] ; (800007c <notmain+0x14>)
800006e: f7ff fff7 bl 8000060 <PUT32>
8000072: 2000 movs r0, #0
8000074: bd08 pop {r3, pc}
8000076: 46c0 nop ; (mov r8, r8)
8000078: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
800007c: aabbccdd bge 6ef33f8 <_start-0x110cc58>
First off note hang vs hello, this is a gnuism you need to, in assembly, declare a label to be a thumb function in order for it to actually work for this kind of thing. hang is properly declared and the vector table properly uses the odd address, hello is not properly declared and the even address is put in there. C compiled code automatically does this properly.
Here is a prime example of what you are asking though, bl to the C function notmain does not, cannot, use an odd address. But to use bx you ask for the address to the function main and that address is provided to the code as 0x8000069 for for a function at address 0x8000068, if you did a bx to 0x800068 on an ARMvsometingT it would switch to arm mode and crash eventually if it hit thumb mode (hopefully crash and not stumble along) on a cortex-m a bx to an even address should fault immediately.
08000050 <_start>:
8000050: f000 f80a bl 8000068 <notmain>
8000054: 4803 ldr r0, [pc, #12] ; (8000064 <PUT32+0x4>)
8000056: 46fe mov lr, pc
8000058: 4700 bx r0
800005a: e7ff b.n 800005c <hang>
8000064: 08000069 stmdaeq r0, {r0, r3, r5, r6}
Why can't bl be odd? Look at the encoding above bl from 0x8000050 to 0x8000068, the pc is two ahead so 4 byte so take 0x8000068 - 0x8000054 = 0x14 divide that by 2 and you get 0x00A. That is the offset to the pc and that is what is encoded in the instructions (the 0A in the second half of the instruction). The divide by two is based on knowledge that thumb instructions are always 2 bytes (well at the time) and so they can reach twice as far if they put the offset in 2 byte instructions rather than in bytes. So the lsbit is lost of the delta between the two, so controlled by the hardware.
What your code did was in one place you asked for the address of a thumb function which gives the odd address, the other case was looking at the disassembly of a branch link which is always even.

Resources