I have a C program in file delay.c:
void delay(int num)
{
volatile int i;
for(i=0; i<num; i++);
}
Then I compile the program with gcc 4.6.3 on ARM emulator (armel, more specifically) with command gcc -g -O1 -o delay.o delay.c. The assembly in delay.o is:
00000000 <delay>:
0: e24dd008 sub sp, sp, #8
4: e3a03000 mov r3, #0
8: e58d3004 str r3, [sp, #4]
c: e59d3004 ldr r3, [sp, #4]
10: e1500003 cmp r0, r3
14: da000005 ble 30 <delay+0x30>
18: e59d3004 ldr r3, [sp, #4]
1c: e2833001 add r3, r3, #1
20: e58d3004 str r3, [sp, #4]
24: e59d3004 ldr r3, [sp, #4]
28: e1530000 cmp r3, r0
2c: bafffff9 blt 18 <delay+0x18>
30: e28dd008 add sp, sp, #8
34: e12fff1e bx lr
I want to figure out where the variable i is on the stack of function delay from debugging information. Below is the information about delay and i in .debug_info section:
<1><25>: Abbrev Number: 2 (DW_TAG_subprogram)
<26> DW_AT_external : 1
<27> DW_AT_name : (indirect string, offset: 0x19): delay
<2b> DW_AT_decl_file : 1
<2c> DW_AT_decl_line : 1
<2d> DW_AT_prototyped : 1
<2e> DW_AT_low_pc : 0x0
<32> DW_AT_high_pc : 0x38
<36> DW_AT_frame_base : 0x0 (location list)
<3a> DW_AT_sibling : <0x59>
...
<2><4b>: Abbrev Number: 4 (DW_TAG_variable)
<4c> DW_AT_name : i
<4e> DW_AT_decl_file : 1
<4f> DW_AT_decl_line : 3
<50> DW_AT_type : <0x60>
<54> DW_AT_location : 0x20 (location list)
It shows that the location of i is in the location list. So I output the location list:
Offset Begin End Expression
00000000 00000000 00000004 (DW_OP_breg13 (r13): 0)
00000000 00000004 00000038 (DW_OP_breg13 (r13): 8)
00000000 <End of list>
00000020 0000000c 00000020 (DW_OP_fbreg: -12)
00000020 00000024 00000028 (DW_OP_reg3 (r3))
00000020 00000028 00000038 (DW_OP_fbreg: -12)
00000020 <End of list>
From address 4 to 38, the frame base of delay should be r13 + 8. So from address c to 20 and from address 28 to 38, the location of i is r13 + 8 -12 = r13 - 4.
However, from the assembly, we can know that there is no location r13 - 4 and i is apparently at location r13 + 4.
Do I miss some calculation step? Anyone can explain the difference of i's location between calculation from debugging information and in assembly?
Thanks in advance!
TL;DR The analysis in the question is correct and the discrepancy is a bug in one of the gcc components (GNU Arm Embedded Toolchain is an obvious place to log one).
As it stands, this other answer is incorrect because it erroneously conflates the value of the stack pointer on evaluation of a location expression with the earlier value of the stack pointer on entry to the function.
As far as the DWARF is concerned, the location of i varies with the program counter. Consider, for example, the text address delay+0x18. At this point, the location of i is given by DW_OP_fbreg(-12), i.e. 12 bytes below the frame base. The frame base is given by the parent DW_TAG_subprogram's DW_AT_frame_base attribute which, in this case, is also dependent on the program counter: for delay+0x18 its expression is DW_OP_breg13(8), i.e. r13 + 8. Importantly, this calculation uses the current value of r13, i.e. the value of r13 when the program counter is equal to delay+0x18.
Thus the DWARF asserts that, at delay+0x18, i is located at r13 + 8 - 12, i.e. 4 bytes below the bottom of the existing stack. Inspection of the assembly shows that, at delay+018, i should be found 4 bytes above the bottom of the stack. Therefore the DWARF is in error and whatever generated it is defective.
One can demonstrate the bug using gdb with a simple wrapper around the test case provided in the question:
$ cat delay.c
void delay(int num)
{
volatile int i;
for(i=0; i<num; i++);
}
$ gcc-4.6 -g -O1 -c delay.c
$ cat main.c
void delay(int);
int main(int argc, char **argv) {
delay(3);
}
$ gcc-4.6 -o test main.c delay.o
$ gdb ./test
.
.
.
(gdb)
Set a breakpoint at delay+0x18 and run to the second occurrence (where we expect i to be 1):
(gdb) break *delay+0x18
Breakpoint 1 at 0x103cc: file delay.c, line 4.
(gdb) run
Starting program: /home/pi/test
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) cont
Continuing.
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb)
We know from the disassembly that i is four bytes above the stack pointer. Indeed, there it is:
(gdb) print *((int *)($r13 + 4))
$1 = 1
(gdb)
However, the bogus DWARF means that gdb looks in the wrong place:
(gdb) print i
$2 = 0
(gdb)
As explained above, the DWARF is incorrectly giving the location of i at four bytes below the stack pointer. There's a zero there, hence the reported value of i:
(gdb) print *((int *)($r13 - 4))
$3 = 0
(gdb)
This isn't a coincidence. A magic number written into this bogus location below the stack pointer reappears when gdb is asked to print i:
(gdb) set *((int *)($r13 - 4)) = 42
(gdb) print i
$6 = 42
(gdb)
Thus, at delay+0x18, the DWARF incorrectly encodes the location of i as r13 - 4 even though its true location is r13 + 4.
One can go a step further by editing the compilation unit by hand and replacing DW_OP_fbreg(-12) (bytes 0x91 0x74) with DW_OP_fbreg(-4) (bytes 0x91 0x7c). This gives
$ readelf --debug-dump=loc delay.modified.o
Contents of the .debug_loc section:
Offset Begin End Expression
00000000 00000000 00000004 (DW_OP_breg13 (r13): 0)
0000000c 00000004 00000038 (DW_OP_breg13 (r13): 8)
00000018 <End of list>
00000020 0000000c 00000020 (DW_OP_fbreg: -4)
0000002c 00000024 00000028 (DW_OP_reg3 (r3))
00000037 00000028 00000038 (DW_OP_fbreg: -4)
00000043 <End of list>
$
In other words, the DWARF has been corrected so that at, e.g., delay+0x18 the location of i is given as frame base - 4 = r13 + 8 - 4 = r13 + 4, matching the assembly. Repeating the gdb experiment with the corrected DWARF shows the expected value of i each time around the loop:
$ gcc-4.6 -o test.modified main.c delay.modified.o
$ gdb ./test.modified
.
.
.
(gdb) break *delay+0x18
Breakpoint 1 at 0x103cc: file delay.c, line 4.
(gdb) run
Starting program: /home/pi/test.modified
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) print i
$1 = 0
(gdb) cont
Continuing.
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) print i
$2 = 1
(gdb) cont
Continuing.
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) print i
$3 = 2
(gdb) cont
Continuing.
[Inferior 1 (process 30954) exited with code 03]
(gdb)
I am not agree with the OP's asm analysis:
00000000 <delay>: ; so far, let's suppose sp = sp(0)
0: e24dd008 sub sp, sp, #8 ; sp = sp(0) - 8
4: e3a03000 mov r3, #0 ; r3 = 0
8: e58d3004 str r3, [sp, #4] ; store the value of r3 in (sp + 4)
c: e59d3004 ldr r3, [sp, #4] ; load (sp + 4) in r3
10: e1500003 cmp r0, r3 ; compare r3 and r0
14: da000005 ble 30 <delay+0x30> ; go to end of loop
18: e59d3004 ldr r3, [sp, #4] ; i is in r3, and it is being loaded from
; (sp + 4), that is,
; sp(i) = sp(0) - 8 + 4 = sp(0) - 4
1c: e2833001 add r3, r3, #1 ; r3 = r3 + 1, that is, increment i
20: e58d3004 str r3, [sp, #4] ; store i (which is in r3) in (sp + 4),
; being again sp(i) = sp(0) - 8 + 4 = \
; sp(0) - 4
24: e59d3004 ldr r3, [sp, #4] ; load sp + 4 in r3
28: e1530000 cmp r3, r0 ; compare r3 and r0
2c: bafffff9 blt 18 <delay+0x18> ; go to init of loop
30: e28dd008 add sp, sp, #8 ; sp = sp + 8
34: e12fff1e bx lr ;
So i is located in sp(0) - 4, which matchs with the dwarf analysis (which says that i is being located in 0 + 8 - 12)
Edit in order to add information regarding my DWARF analysis:
According to this line: 00000020 0000000c 00000020 (DW_OP_fbreg: -12) , being DW_OP_fbreg :
The DW_OP_fbreg operation provides a signed LEB128 offset from
the address specified by
the location description in the DW_AT_frame_base attribute of the
current function. (This is
typically a “stack pointer” register plus or minus some offset.
On more sophisticated systems
it might be a location list that adjusts the offset according to
changes in the stack pointer as
the PC changes.)
,the address is frame_base + offset, where:
frame_base : is the stack pointer +/- some offset, and according to the previous line (00000000 00000004 00000038 (DW_OP_breg13 (r13): 8)), from 00000004 to 00000038, it has an offset of +8 (r13 is SP)
offset: obviously it is -12
Given that, DWARF indicates that it is pointing to sp(0) + 8 - 12 = sp(0) - 4
Related
So I'm having trouble with my program. It's supposed to read in a text file
that has a number on each line. It then stores that in an array, sorts it using selection sort, and then outputs it to a new file. The reading of and writing to the file work perfectly fine but my code for the sort isn't working properly. When I run the program, it only seems to store some of the numbers
in the array and then a bunch of zeroes.
So if my input is 112323, 32, 12, 19, 2, 1, 23. The output is 0,0,0,0, 2,1,23. I'm pretty sure the problem's with how I'm storing and loading from the array
onto the registers because assuming that part works, I can't find any reason why the selection sort algorithm shouldn't work.
Ok thanks to your help, I figured out that I needed to change the load and store instruction so that it matches the specifier used (ldr -> ldrb and str -> strb). But I need to make a sorting algorithm that works for 32 bit numbers so which combination of specifiers and load/store instructions would allow me to do that? Or would I have to load/store 8 bits a time? And if so, how would I do that?
.data
.balign 4
readfile: .asciz "myfile.txt"
.balign 4
readmode: .asciz "r"
.balign 4
writefile: .asciz "output.txt"
.balign 4
writemode: .asciz "w"
.balign 4
return: .word 0
.balign 4
scanformat: .asciz "%d"
.balign 4
printformat: .asciz "%d\n"
.balign 4
a: .space 32
.text
.global main
.global fopen
.global fprintf
.global fclose
.global fscanf
.global printf
main:
ldr r1, =return
str lr, [r1]
ldr r0, =readfile
ldr r1, =readmode
bl fopen
mov r4, r0
mov r5, #0
ldr r6, =a
loop:
cmp r5, #7
beq sort
mov r0, r4
ldr r1, =scanformat
mov r2, r6
bl fscanf
add r5, r5, #1
add r6, r6, #1
b loop
sort:
mov r5,#0 /*array parser for first loop*/
mov r6, #0 /* #stores index of minimum*/
mov r7, #0 /* #temp*/
mov r8, #0 /*# array parser for second loop*/
mov r9, #7 /*# stores length of array*/
ldr r10, =a /*# the array*/
mov r11, #0 /*#used to obtain offset for min*/
mov r12, #0 /*# used to obtain offset for second parser access*/
loop3:
cmp r5, r9 /*# check if first parser reached end of array*/
beq write /* #if it did array is sorted write it to file*/
mov r6, r5 /*#set the min index to the current position*/
mov r8, r6 /*#set the second parser to where first parser is at*/
b loop4 /*#start looking for min in this subarray*/
loop4:
cmp r8, r9 /* #if reached end of list min is found*/
beq increment /* #get out of this loop and increment 1st parser**/
lsl r7, r6, #3 /*multiplies min index by 8 */
ADD r7, r10, r7 /* adds offset to r10 address storing it in r7 */
ldr r11, [r7] /* loads value of min in r11 */
lsl r7, r8, #3 /* multiplies second parse index by 8 */
ADD r7, r10, r7 /* adds offset to r10 address storing in r7 */
ldr r12, [r7] /* loads value of second parse into r12 */
cmp r11, r12 /* #compare current min to the current position of 2nd parser !!!!!*/
movgt r6, r8 /*# set new min to current position of second parser */
add r8, r8, #1 /*increment second parser*/
b loop4 /*repeat */
increment:
lsl r11, r5, #3 /* multiplies first parse index by 8 */
ADD r11, r10, r11 /* adds offset to r10 address stored in r11*/
ldr r8, [r11] /* loads value in memory address in r11 to r8*/
lsl r12, r6, #3 /*multiplies min index by 8 */
ADD r12, r10, r12 /*ads offset to r10 address stored in r12 */
ldr r7, [r12] /* loads value in memory address in r12 to r7 */
str r8, [r12] /* # stores value of first parser where min was !!!!!*/
str r7, [r11] /*# store value of min where first parser was !!!!!*/
add r5, r5, #1 /*#increment the first parser*/
ldr r0,=printformat
mov r1, r7
bl printf
b loop3 /*#go to loop1*/
write:
mov r0, r4
bl fclose
ldr r0, =writefile
ldr r1, =writemode
bl fopen
mov r4, r0
mov r5, #0
ldr r6, =a
loop2:
cmp r5, #7
beq end
mov r0, r4
ldr r1, =printformat
ldrb r2, [r6]
bl fprintf
add r5, r5, #1
add r6, r6, #1
b loop2
end:
mov r0, r4
bl fclose
ldr r0, =a
ldr r0, [r0]
ldr lr, =return
ldr lr, [lr]
bx lr
I figured out that I needed to change the load and store instruction
so that it matches the specifier used (ldr -> ldrb and str -> strb).
But I need to make a sorting algorithm that works for 32 bit numbers
so which combination of specifiers and load/store instructions would
allow me to do that?
If you want to read 32b (4 bytes) values from memory, you have to have 4 bytes values in memory to begin with. Well that should not be surprising :)
Eg if your input is numbers 1, 2, 3, 4, each number is 32b value than in memory that would be
0x00000000: 01 00 00 00 | 02 00 00 00 <- 32b values of 1 & 2
0x00000008: 03 00 00 00 | 04 00 00 00 <- 32b values of 3 & 4
In such case ldr would read 32b each time and you would get 1, 2, 3, 4 with each read in register.
Now, you have in memory byte values (based on your statement that `ldrb` gives right result), eg
0x00000000: 01
0x00000001: 02
0x00000002: 03
0x00000003: 04
or same in one line
0x00000000: 01 02 03 04
So reading 8bit by ldrb gives you numbers 1, 2, 3, 4
But ldr would do read 32b value from memory (all 4 bytes at once) and you would get 32b value 0x04030201 in register.
Note: examples for little-endian systems
I'm trying to get familiar with the linking and startup procedures in ARM Cortex-M4 microcontrollers. Looking through the linker scripts almost all the sections are marked as loadable.
At first I thought that meant it would be copied from flash to RAM, but then I learned that doing that is handled in a different way. So what does it mean for a section in flash to be loadable? Isn't it already loaded and run from the location in flash? Also, I'm referring to a section section containing instructions.
Does loadable in this context mean loading by the debugger into the device?
a fully functional cortex-m program
flash.s
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl dummy
dummy:
bx lr
so.c
void dummy ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<10;ra++) dummy(ra);
return(0);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -c so.c -o so.o
arm-none-eabi-ld -o so.elf -T flash.ld flash.o so.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
(can use cortex-m4 either works)
so.list
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000011 andeq r0, r0, r1, lsl r0
8: 00000017 andeq r0, r0, r7, lsl r0
c: 00000017 andeq r0, r0, r7, lsl r0
00000010 <reset>:
10: f000 f804 bl 1c <notmain>
14: e7ff b.n 16 <hang>
00000016 <hang>:
16: e7fe b.n 16 <hang>
00000018 <dummy>:
18: 4770 bx lr
1a: 46c0 nop ; (mov r8, r8)
0000001c <notmain>:
1c: b510 push {r4, lr}
1e: 2400 movs r4, #0
20: 0020 movs r0, r4
22: 3401 adds r4, #1
24: f7ff fff8 bl 18 <dummy>
28: 2c0a cmp r4, #10
2a: d1f9 bne.n 20 <notmain+0x4>
2c: 2000 movs r0, #0
2e: bc10 pop {r4}
30: bc02 pop {r1}
32: 4708 bx r1
being a less complicated linker script and program the elf has less stuff
part of readelf
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 010000 000034 00 AX 0 0 4
[ 2] .ARM.attributes ARM_ATTRIBUTES 00000000 010034 00002d 00 0 0 1
[ 3] .comment PROGBITS 00000000 010061 000011 01 MS 0 0 1
[ 4] .symtab SYMTAB 00000000 010074 0000f0 10 5 12 4
[ 5] .strtab STRTAB 00000000 010164 00003d 00 0 0 1
[ 6] .shstrtab STRTAB 00000000 0101a1 00003a 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
y (purecode), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x00000000 0x00000000 0x00034 0x00034 R E 0x10000
hexdump -C so.bin
00000000 00 10 00 20 11 00 00 00 17 00 00 00 17 00 00 00 |... ............|
00000010 00 f0 04 f8 ff e7 fe e7 70 47 c0 46 10 b5 00 24 |........pG.F...$|
00000020 20 00 01 34 ff f7 f8 ff 0a 2c f9 d1 00 20 10 bc | ..4.....,... ..|
00000030 02 bc 08 47 |...G|
00000034
a "binary" comes in many flavors, it is an unfortunately poorly named term as it is so confusing. Everything you see in the disassembly above is required for the program to run, that is the real binary part of this, it has to be loaded wherever you need it loaded to run. if on an operating system then the operating system reads the elf file extracts the loadable sections with their addresses/offsets and loads them into memory before launching at the entry point. Take advantage of tools already having the elf file format and we can re-use some of that for microcontrollers, we cant/dont normally use the entry point as that makes no sense, we have to make the vector/entry point match what the hardware needs, in this case a vector table.
The hexdump coming from the objcopy also shows the parts we have to have visible to the processor for the program to run, or loaded in address/memory space (flash is in memory or address space in this case).
But the "binary" file, the elf also contains debug symbols in case you want to pull up a debugger, add some more options on the toolchain command line and you get even more information about where these items are in the source code file so you could in some cases see the high level langauge when single stepping through code. Those are not loadable sections, they are descriptive of the program or just there to help out, they are not machine code nor data that the processor needs to execute the program so they do not need to be loaded into the memory space.
yet another "binary" file format
arm-none-eabi-objcopy so.elf -O ihex so.hex
cat so.hex
:100000000010002011000000170000001700000081
:1000100000F004F8FFE7FEE77047C04610B5002483
:1000200020000134FFF7F8FF0A2CF9D1002010BCA2
:0400300002BC0847BF
:00000001FF
it has a little extra information but almost all of it is the part we have to load into the address space for the program to run
another binary file format
S00A0000736F2E7372656338
S1130000001000201100000017000000170000007D
S113001000F004F8FFE7FEE77047C04610B500247F
S113002020000134FFF7F8FF0A2CF9D1002010BC9E
S107003002BC0847BB
S9030000FC
also most of it is program.
elf is just another file format (fairly popular with gnu tools, but still just another file format), it contains the machine code and data required plus a bunch of other stuff, the machine code and data is what we have to load into ram if this is an operating system, or what we ideally load into flash for a microcontroller, but not all microcontrollers are equal some are ram only based and the program is downloaded via usb during enumeration (into ram). and other solutions, or if debugging you could probably have items loaded into ram depending on the mcu and tools, although that is not how the thing boots so that wouldnt really be a good binary.
if you feel the need to zero .bss and to have any .data then you need additional information, the offset and size of .bss and the offset and contents of .data, the bootstrap then zeros and copies those items, and you need that information in the non-volatile flash/rom, it is just more data that is required of the machine code and data needed to run the program. If you are letting others write the code for you then there are possibly tailored linker scripts and bootstrap code that allows you to just have .data items and push a build button on a gui and it all magically is in place when your entry point (main() by convention and or standard) starts execution or the code that represents your high level code at the C entry point.
Consider the following code:
extern unsigned int foo(char c, char **p, unsigned int *n);
unsigned int test(const char *s, char **p, unsigned int *n)
{
unsigned int done = 0;
while (*s)
done += foo(*s++, p, n);
return done;
}
Output in Assembly:
00000000 <test>:
0: b5f8 push {r3, r4, r5, r6, r7, lr}
2: 0005 movs r5, r0
4: 000e movs r6, r1
6: 0017 movs r7, r2
8: 2400 movs r4, #0
a: 7828 ldrb r0, [r5, #0]
c: 2800 cmp r0, #0
e: d101 bne.n 14 <test+0x14>
10: 0020 movs r0, r4
12: bdf8 pop {r3, r4, r5, r6, r7, pc}
14: 003a movs r2, r7
16: 0031 movs r1, r6
18: f7ff fffe bl 0 <foo>
1c: 3501 adds r5, #1
1e: 1824 adds r4, r4, r0
20: e7f3 b.n a <test+0xa>
C code compiled using arm-none-eabi-gcc versions: 4.9.1, 5.4.0, 6.3.0 and 7.1.0 on
Linux host. Assembly output is the same for all GCC versions.
CFLAGS := -Os -march=armv6-m -mcpu=cortex-m0plus -mthumb
My understanding of the execution flow is following:
Push R3-R7 + LR onto the stack (totally unclear)
Move R0 to R5 (this is clear)
Move R1 to R6 and R2 to R7 (totally unclear)
Dereference R5 into R0 (This is clear)
Compare R0 with 0 (This is clear)
If R0 != 0 go to line 14: - Restore R1 from R6 and R2 from R7 and call foo(),
If R0 == 0 stay at line 10, restore R3 - R7 + PC from stack (totally unclear)
Increment R5 (clear)
accumulate result from foo() (clear)
Branch back to line a: (clear)
My own Assembly. Not extensively tested, but definitely I would not need more than R4 + LR to be pushed onto the stack:
EDIT: According to the provided answers, my example from below will fail due to R1 and R2 not being persistent through call to foo()
51 unsigned int __attribute__((naked)) test_asm(const char *s, char **p, unsigned int *n)
52 {
53 // r0 - *s (move ptr to r3 and dereference it to r0)
54 // r1 - **p
55 // r2 - *n
56 asm volatile(
57 " push {r4, lr} \n\t"
58 " movs r4, #0 \n\t"
59 " movs r3, r0 \n\t"
60 "1: \n\t"
61 " ldrb r0, [r3, #0] \n\t"
62 " cmp r0, #0 \n\t"
63 " beq 2f \n\t"
64 " bl foo \n\t"
65 " add r4, r4, r0 \n\t"
66 " add r3, #1 \n\t"
67 " b 1b \n\t"
68 "2: \n\t"
69 " movs r0, r4 \n\t"
70 " pop {r4, pc} \n\t"
71 );
72 }
Questions:
Why GCC stores so many registers for such trivial function?
Why it pushes R3 while it is written in ABI that R0-R3 are argument registers
and supposed to be a caller save and should be safely used inside called function
in this case test()
Why it copy R1 to R6 and R2 to R7 while the prototype of extern function almost
ideally matches the test() function. So R1 and R2 are already ready to be passed
to foo() routine. My understanding is that only R0 need to be dereferenced before
call to foo()
LR must be saved since test is not a leaf function. r5-r7 are used by the function to store values that are used across function calls and since they are not scratch they must be saved. r3 is pushed to align the stack.
Adding an extra register to push is a fast and compact way to align the stack.
r1 and r2 may be trashed by the call to foo and since the values initially stored in these registers are needed after the call they must be stored in a location that survive calls.
I am working on a project for an ARM Cortex-M3 (SiLabs) SOC. I need to move the interrupt vector [edit] and code away from the bottom of flash to make room for a "boot loader". The boot loader starts at address 0 to come up when the core comes out of reset. Its function is to validate the main image, loaded at a higher address and possibly replace that main image with a new one.
Therefore, the boot loader would have its vector table at 0, followed by its code. At a higher, fixed address, say 8KB, would be the main image, starting with its vector table.
I have found this page which describes the Vector Table Offset Register which the boot loader can use (with interrupts masked, obviously) to point the hardware to the new vector table.
My question is how to get the "main" image linked so that it will work when written to flash, starting not at zero. I'm not familiar with ARM assembly but I assume the code is not position independent.
I'm using SiLabs's Precision32 IDE which uses gcc for the toolchain. I've found how to add linker flags. My question is what gcc flag(s) will provide the change to the base of the vector table and code.
Thank you.
vectors.s
.cpu cortex-m3
.thumb
.word 0x20008000 /* stack top address */
.word _start /* Reset */
.word hang
.word hang
/* ... */
.thumb_func
hang: b .
.thumb_func
.globl _start
_start:
bl notmain
b hang
notmain.c
extern void fun ( unsigned int );
void notmain ( void )
{
fun(7);
}
fun.c
void fun ( unsigned int x )
{
}
Makefile
hello.elf : vectors.s fun.c notmain.c memmap
arm-none-eabi-as vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c notmain.c -o notmain.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -march=armv7-m -c fun.c -o fun.o
arm-none-eabi-ld -o hello.elf -T memmap vectors.o notmain.o fun.o
arm-none-eabi-objdump -D hello.elf > hello.list
arm-none-eabi-objcopy hello.elf -O binary hello.bin
so if the linker script (memmap is the name I used) looks like this
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x40000
ram : ORIGIN = 0x20000000, LENGTH = 0x8000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram
}
since all of the above is .text, no .bss nor .data, the linker takes the objects as listed on the ld command line and places them starting at that address...
mbly of section .text:
00000000 <hang-0x10>:
0: 20008000 andcs r8, r0, r0
4: 00000013 andeq r0, r0, r3, lsl r0
8: 00000011 andeq r0, r0, r1, lsl r0
c: 00000011 andeq r0, r0, r1, lsl r0
00000010 <hang>:
10: e7fe b.n 10 <hang>
00000012 <_start>:
12: f000 f801 bl 18 <notmain>
16: e7fb b.n 10 <hang>
00000018 <notmain>:
18: 2007 movs r0, #7
1a: f000 b801 b.w 20 <fun>
1e: bf00 nop
00000020 <fun>:
20: 4770 bx lr
22: bf00 nop
So for this to work you have to be careful to put your bootstrap code first on the command line. But you can also do things like this with a linker script.
the order appears to matter, listing the specific object files first then the generic .text later
MEMORY
{
romx : ORIGIN = 0x00000000, LENGTH = 0x1000
romy : ORIGIN = 0x00010000, LENGTH = 0x1000
ram : ORIGIN = 0x00030000, LENGTH = 0x1000
bob : ORIGIN = 0x00040000, LENGTH = 0x1000
ted : ORIGIN = 0x00050000, LENGTH = 0x1000
}
SECTIONS
{
abc : { vectors.o } > romx
def : { fun.o } > ted
.text : { *(.text*) } > romy
.bss : { *(.bss*) } > ram
}
and we get this
00000000 <hang-0x10>:
0: 20008000 andcs r8, r0, r0
4: 00000013 andeq r0, r0, r3, lsl r0
8: 00000011 andeq r0, r0, r1, lsl r0
c: 00000011 andeq r0, r0, r1, lsl r0
00000010 <hang>:
10: e7fe b.n 10 <hang>
00000012 <_start>:
12: f00f fff5 bl 10000 <notmain>
16: e7fb b.n 10 <hang>
Disassembly of section def:
00050000 <fun>:
50000: 4770 bx lr
50002: bf00 nop
Disassembly of section .text:
00010000 <notmain>:
10000: 2007 movs r0, #7
10002: f03f bffd b.w 50000 <fun>
10006: bf00 nop
the short answer is with gnu tools you use a linker script to manipulate where things end up, I assume you want these functions to be in the rom at some specified location. I dont quite understand exactly what you are doing. But if for example you are trying to put something simple like the branch to main() in the flash initially with main() being deeper in the flash, then somehow either through whatever code is deeper in the flash or through some other method, then later you erase and reprogram only the stuff near zero. you will still need a simple branch to main() the first time. you can force what I am calling vectors.o to be at address zero then .text can be deeper in the flash putting all of the reset of the code basically there, then leave that in flash and replace just the stuff at zero.
like this
MEMORY
{
romx : ORIGIN = 0x00000000, LENGTH = 0x1000
romy : ORIGIN = 0x00010000, LENGTH = 0x1000
ram : ORIGIN = 0x00030000, LENGTH = 0x1000
bob : ORIGIN = 0x00040000, LENGTH = 0x1000
ted : ORIGIN = 0x00050000, LENGTH = 0x1000
}
SECTIONS
{
abc : { vectors.o } > romx
.text : { *(.text*) } > romy
.bss : { *(.bss*) } > ram
}
giving
00000000 <hang-0x10>:
0: 20008000 andcs r8, r0, r0
4: 00000013 andeq r0, r0, r3, lsl r0
8: 00000011 andeq r0, r0, r1, lsl r0
c: 00000011 andeq r0, r0, r1, lsl r0
00000010 <hang>:
10: e7fe b.n 10 <hang>
00000012 <_start>:
12: f00f fff7 bl 10004 <notmain>
16: e7fb b.n 10 <hang>
Disassembly of section .text:
00010000 <fun>:
10000: 4770 bx lr
10002: bf00 nop
00010004 <notmain>:
10004: 2007 movs r0, #7
10006: f7ff bffb b.w 10000 <fun>
1000a: bf00 nop
then leave the 0x10000 stuff and replace 0x00000 stuff later.
Anyway, the short answer is linker script you need to craft a linker script to put things where you want them. gnu linker scripts can get extremely complicated, I lean toward the simple.
If you want to place everything at some other address, including your vector table, then perhaps something like this:
hop.s
.cpu cortex-m3
.thumb
.word 0x20008000 /* stack top address */
.word _start /* Reset */
.word hang
.word hang
/* ... */
.thumb_func
hang: b .
.thumb_func
ldr r0,=_start
bx r0
and this
MEMORY
{
romx : ORIGIN = 0x00000000, LENGTH = 0x1000
romy : ORIGIN = 0x00010000, LENGTH = 0x1000
ram : ORIGIN = 0x00030000, LENGTH = 0x1000
bob : ORIGIN = 0x00040000, LENGTH = 0x1000
ted : ORIGIN = 0x00050000, LENGTH = 0x1000
}
SECTIONS
{
abc : { hop.o } > romx
.text : { *(.text*) } > romy
.bss : { *(.bss*) } > ram
}
gives this
Disassembly of section abc:
00000000 <hang-0x10>:
0: 20008000 andcs r8, r0, r0
4: 00010013 andeq r0, r1, r3, lsl r0
8: 00000011 andeq r0, r0, r1, lsl r0
c: 00000011 andeq r0, r0, r1, lsl r0
00000010 <hang>:
10: e7fe b.n 10 <hang>
12: 4801 ldr r0, [pc, #4] ; (18 <hang+0x8>)
14: 4700 bx r0
16: 00130000
1a: 20410001
Disassembly of section .text:
00010000 <hang-0x10>:
10000: 20008000 andcs r8, r0, r0
10004: 00010013 andeq r0, r1, r3, lsl r0
10008: 00010011 andeq r0, r1, r1, lsl r0
1000c: 00010011 andeq r0, r1, r1, lsl r0
00010010 <hang>:
10010: e7fe b.n 10010 <hang>
00010012 <_start>:
10012: f000 f803 bl 1001c <notmain>
10016: e7fb b.n 10010 <hang>
00010018 <fun>:
10018: 4770 bx lr
1001a: bf00 nop
0001001c <notmain>:
1001c: 2007 movs r0, #7
1001e: f7ff bffb b.w 10018 <fun>
10022: bf00 nop
then you can change the vector table to 0x10000 for example.
if you are asking a different question like having the bootloader at 0x00000, then the bootloader modifies the flash to add an application say at 0x20000, then you want to run that application, there are simpler solutions that dont necessarily require you to modify the location of the vector table.
Is there a way to have gcc generate %pc relative addresses of constants? Even when the string appears in the text segment, arm-elf-gcc will generate a constant pointer to the data, load the address of the pointer via a %pc relative address and then dereference it. For a variety of reasons, I need to skip the middle step. As an example, this simple function:
const char * filename(void)
{
static const char _filename[]
__attribute__((section(".text")))
= "logfile";
return _filename;
}
generates (when compiled with arm-elf-gcc-4.3.2 -nostdlib -c
-O3 -W -Wall logfile.c):
00000000 <filename>:
0: e59f0000 ldr r0, [pc, #0] ; 8 <filename+0x8>
4: e12fff1e bx lr
8: 0000000c .word 0x0000000c
0000000c <_filename.1175>:
c: 66676f6c .word 0x66676f6c
10: 00656c69 .word 0x00656c69
I would have expected it to generate something more like:
filename:
add r0, pc, #0
bx lr
_filename.1175:
.ascii "logfile\000"
The code in question needs to be partially position independent since it will be relocated in memory at load time, but also integrate with code that was not compiled -fPIC, so there is no global offset table.
My current work around is to call a non-inline function (which will be done via a %pc relative address) to find the offset from the compiled location in a technique similar to how -fPIC code works:
static intptr_t
__attribute__((noinline))
find_offset( void )
{
uintptr_t pc;
asm __volatile__ (
"mov %0, %%pc" : "=&r"(pc)
);
return pc - 8 - (uintptr_t) find_offset;
}
But this technique requires that all data references be fixed up manually, so the filename() function in the above example would become:
const char * filename(void)
{
static const char _filename[]
__attribute__((section(".text")))
= "logfile";
return _filename + find_offset();
}
Hmmm, maybe you have to compile it as -fPIC to get PIC. Or simply write it in assembler, assembler is a lot easier than the C you are writing.
00000000 :
0: e59f300c ldr r3, [pc, #12] ; 14
4: e59f000c ldr r0, [pc, #12] ; 18
8: e08f3003 add r3, pc, r3
c: e0830000 add r0, r3, r0
10: e12fff1e bx lr
14: 00000004 andeq r0, r0, r4
18: 00000000 andeq r0, r0, r0
0000001c :
1c: 66676f6c strbtvs r6, [r7], -ip, ror #30
20: 00656c69 rsbeq r6, r5, r9, ror #24
Are you getting the same warning I am getting?
/tmp/ccySyaUE.s: Assembler messages:
/tmp/ccySyaUE.s:35: Warning: ignoring changed section attributes for .text