I'm working on an ARMv6 core and have an FIQ handler that works great when I do all of my work in it. However, I need to branch to some additional code that's too large for the FIQ memory area.
When the handler is registered, the code between fiq_start and fiq_end gets copied to 0xFFFF001C:
static void test_fiq_handler(void)
{
asm volatile("\
.global fiq_start\n\
fiq_start:");
// clear gpio irq
asm("ldr r10, GPIO_BASE_ISR");
asm("ldr r9, [r10]");
asm("orr r9, #0x04");
asm("str r9, [r10]");
// clear force register
asm("ldr r10, AVIC_BASE_INTFRCH");
asm("ldr r9, [r10]");
asm("mov r9, #0");
asm("str r9, [r10]");
// prepare branch register
asm(" ldr r11, fiq_handler");
// save all registers, build sp and branch to C
asm(" adr r9, regpool");
asm(" stmia r9, {r0 - r8, r14}");
asm(" adr sp, fiq_sp");
asm(" ldr sp, [sp]");
asm(" add lr, pc,#4");
asm(" mov pc, r11");
#if 0
asm("ldr r10, IOMUX_ADDR12");
asm("ldr r9, [r10]");
asm("orr r9, #0x08 # top/vertex LED");
asm("str r9,[r10] #turn on LED");
asm("bic r9, #0x08 # top/vertex LED");
asm("str r9,[r10] #turn on LED");
#endif
asm(" adr r9, regpool");
asm(" ldmia r9, {r0 - r8, r14}");
// return
asm("subs pc, r14, #4");
asm("IOMUX_ADDR12: .word 0xFC2A4000");
asm("AVIC_BASE_INTCNTL: .word 0xFC400000");
asm("AVIC_BASE_INTENNUM: .word 0xFC400008");
asm("AVIC_BASE_INTDISNUM: .word 0xFC40000C");
asm("AVIC_BASE_FIVECSR: .word 0xFC400044");
asm("AVIC_BASE_INTFRCH: .word 0xFC400050");
asm("GPIO_BASE_ISR: .word 0xFC2CC018");
asm(".globl fiq_handler");
asm("fiq_sp: .long fiq_stack+120");
asm("fiq_handler: .long 0");
asm("regpool: .space 40");
asm(".pool");
asm(".align 5");
asm("fiq_stack: .space 124");
asm(".global fiq_end");
asm("fiq_end:");
}
fiq_handler gets set to the following function:
static void fiq_flip_pins(void)
{
asm("ldr r10, IOMUX_ADDR12_k");
asm("ldr r9, [r10]");
asm("orr r9, #0x08 # top/vertex LED");
asm("str r9,[r10] #turn on LED");
asm("bic r9, #0x08 # top/vertex LED");
asm("str r9,[r10] #turn on LED");
asm("IOMUX_ADDR12_k: .word 0xFC2A4000");
}
EXPORT_SYMBOL(fiq_flip_pins);
I know that since the FIQ handler operates outside of any normal kernel APIs and is a rather high-priority interrupt, I must ensure that whatever I call is already resident in memory. I do this by defining the fiq_flip_pins function in the monolithic kernel and not in a module, which would get vmalloc'ed.
If I don't branch to the fiq_flip_pins function, and instead do the work in the test_fiq_handler function everything works as expected. It's the branching that's causing me problems at the moment. Right after branching I get a kernel panic about a paging request. I don't understand why I'm getting the paging request.
fiq_flip_pins is in the kernel at:
c00307ec t fiq_flip_pins
Unable to handle kernel paging request at virtual address 736e6f63
pgd = c3dd0000
[736e6f63] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT
Modules linked in: hello_1
CPU: 0 Not tainted (2.6.31-207-g7286c01-svn4 #122)
PC is at strnlen+0x10/0x28
LR is at string+0x38/0xcc
pc : [<c016b004>] lr : [<c016c754>] psr: a00001d3
sp : c3817ea0 ip : 736e6f63 fp : 00000400
r10: c03cab5c r9 : c0339ae0 r8 : 736e6f63
r7 : c03caf5c r6 : c03cab6b r5 : ffffffff r4 : 00000000
r3 : 00000004 r2 : 00000000 r1 : ffffffff r0 : 736e6f63
Flags: NzCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment user
Control: 00c5387d Table: 83dd0008 DAC: 00000015
Process sh (pid: 1663, stack limit = 0xc3816268)
Stack: (0xc3817ea0 to 0xc3818000)
Since there are no API calls in my code I have to assume that something is going wrong in the C call and back. Any help solving this is appreciated.
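One detail in the Oops above is worth decoding: the faulting address 0x736e6f63 (also present in r0, r8 and ip) consists entirely of printable ASCII bytes. The sketch below is a host-side helper, with the hypothetical name word_to_ascii, that shows what the word spells in little-endian memory order. That would be consistent with string data ending up in a register that strnlen then used as a pointer, although the dump alone does not prove where the bad value came from.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: write the four bytes of `value`, lowest byte first
 * (little-endian memory order), into out[0..3] and NUL-terminate.
 * Returns 1 if all four bytes are printable ASCII, 0 otherwise. */
static int word_to_ascii(uint32_t value, char out[5])
{
    int printable = 1;

    assert(out != 0);
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)(value >> (8 * i));
        out[i] = (char)b;
        if (!isprint(b))
            printable = 0;
    }
    out[4] = '\0';
    return printable;
}
```

For 0x736e6f63 the helper yields the bytes 'c', 'o', 'n', 's', whereas a genuine kernel address such as 0xc00307ec does not decode to printable text.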
Here's the assembly with comments for fiq_flip_pins:
static void fiq_flip_pins(void)
{
asm("ldr r10, IOMUX_ADDR12_k");
0: e59fa010 ldr sl, [pc, #16] ; 18 <IOMUX_ADDR12_k>
asm("ldr r9, [r10]");
4: e59a9000 ldr r9, [sl]
asm("orr r9, #0x08 # top/vertex LED");
8: e3899008 orr r9, r9, #8 ; 0x8
asm("str r9,[r10] #turn on LED");
c: e58a9000 str r9, [sl]
asm("bic r9, #0x08 # top/vertex LED");
10: e3c99008 bic r9, r9, #8 ; 0x8
asm("str r9,[r10] #turn on LED");
14: e58a9000 str r9, [sl]
00000018 <IOMUX_ADDR12_k>:
18: fc2a4000 .word 0xfc2a4000
asm("IOMUX_ADDR12_k: .word 0xFC2A4000");
}
1c: e12fff1e bx lr
Unless I'm misunderstanding something, it looks like fiq_handler points to address 0, not fiq_flip_pins:
asm("fiq_handler: .long 0");
Another possible problem (assuming that there's code that fixes up the fiq_handler pointer when test_fiq_handler is copied over) is that you have this at the end of fiq_flip_pins:
asm("IOMUX_ADDR12_k: .word 0xFC2A4000");
You'll need to have some code that jumps over that data or have your own return sequence for fiq_flip_pins prior to that data word, otherwise the CPU will try to execute whatever opcode 0xFC2A4000 is, and I imagine it's not likely to be something benign.
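The pointer fix-up mentioned above could look roughly like the following host-side sketch. It assumes the handler image is copied first and the pointer slot is then patched in the copy; struct handler_image, install_handler and the placeholder code words are hypothetical stand-ins for the bytes between fiq_start and fiq_end, not a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the bytes between fiq_start and fiq_end. */
struct handler_image {
    uint32_t code[4];      /* placeholder instruction words */
    uint32_t handler_slot; /* plays the role of "fiq_handler: .long 0" */
};

/* Copy the image to its destination, then patch the pointer slot in the
 * copy so the relocated code loads the real target instead of 0. */
static void install_handler(void *dest, const struct handler_image *src,
                            uint32_t target)
{
    assert(dest != NULL && src != NULL);
    memcpy(dest, src, sizeof *src);
    memcpy((unsigned char *)dest + offsetof(struct handler_image, handler_slot),
           &target, sizeof target);
}
```

With target set to 0xc00307ec (the address of fiq_flip_pins from the symbol listing), the slot in the copy holds that address while the original template still holds 0. If the slot is only written in the template after the copy has been made, the relocated handler would still branch to address 0.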
So I'm having trouble with my program. It's supposed to read in a text file that has a number on each line. It then stores the numbers in an array, sorts them using selection sort, and outputs them to a new file. Reading and writing the files works perfectly fine, but my code for the sort isn't working properly. When I run the program, it only seems to store some of the numbers in the array, followed by a bunch of zeroes.
So if my input is 112323, 32, 12, 19, 2, 1, 23, the output is 0, 0, 0, 0, 2, 1, 23. I'm pretty sure the problem is with how I'm storing to and loading from the array, because assuming that part works, I can't find any reason why the selection sort algorithm shouldn't work.
OK, thanks to your help, I figured out that I needed to change the load and store instructions so that they match the specifier used (ldr -> ldrb and str -> strb). But I need to make a sorting algorithm that works for 32-bit numbers, so which combination of specifiers and load/store instructions would allow me to do that? Or would I have to load/store 8 bits at a time? And if so, how would I do that?
.data
.balign 4
readfile: .asciz "myfile.txt"
.balign 4
readmode: .asciz "r"
.balign 4
writefile: .asciz "output.txt"
.balign 4
writemode: .asciz "w"
.balign 4
return: .word 0
.balign 4
scanformat: .asciz "%d"
.balign 4
printformat: .asciz "%d\n"
.balign 4
a: .space 32
.text
.global main
.global fopen
.global fprintf
.global fclose
.global fscanf
.global printf
main:
ldr r1, =return
str lr, [r1]
ldr r0, =readfile
ldr r1, =readmode
bl fopen
mov r4, r0
mov r5, #0
ldr r6, =a
loop:
cmp r5, #7
beq sort
mov r0, r4
ldr r1, =scanformat
mov r2, r6
bl fscanf
add r5, r5, #1
add r6, r6, #1
b loop
sort:
mov r5,#0 /*array parser for first loop*/
mov r6, #0 /* #stores index of minimum*/
mov r7, #0 /* #temp*/
mov r8, #0 /*# array parser for second loop*/
mov r9, #7 /*# stores length of array*/
ldr r10, =a /*# the array*/
mov r11, #0 /*#used to obtain offset for min*/
mov r12, #0 /*# used to obtain offset for second parser access*/
loop3:
cmp r5, r9 /*# check if first parser reached end of array*/
beq write /* #if it did array is sorted write it to file*/
mov r6, r5 /*#set the min index to the current position*/
mov r8, r6 /*#set the second parser to where first parser is at*/
b loop4 /*#start looking for min in this subarray*/
loop4:
cmp r8, r9 /* #if reached end of list min is found*/
beq increment /* #get out of this loop and increment 1st parser**/
lsl r7, r6, #3 /*multiplies min index by 8 */
ADD r7, r10, r7 /* adds offset to r10 address storing it in r7 */
ldr r11, [r7] /* loads value of min in r11 */
lsl r7, r8, #3 /* multiplies second parse index by 8 */
ADD r7, r10, r7 /* adds offset to r10 address storing in r7 */
ldr r12, [r7] /* loads value of second parse into r12 */
cmp r11, r12 /* #compare current min to the current position of 2nd parser !!!!!*/
movgt r6, r8 /*# set new min to current position of second parser */
add r8, r8, #1 /*increment second parser*/
b loop4 /*repeat */
increment:
lsl r11, r5, #3 /* multiplies first parse index by 8 */
ADD r11, r10, r11 /* adds offset to r10 address stored in r11*/
ldr r8, [r11] /* loads value in memory address in r11 to r8*/
lsl r12, r6, #3 /*multiplies min index by 8 */
ADD r12, r10, r12 /*ads offset to r10 address stored in r12 */
ldr r7, [r12] /* loads value in memory address in r12 to r7 */
str r8, [r12] /* # stores value of first parser where min was !!!!!*/
str r7, [r11] /*# store value of min where first parser was !!!!!*/
add r5, r5, #1 /*#increment the first parser*/
ldr r0,=printformat
mov r1, r7
bl printf
b loop3 /*#go to loop1*/
write:
mov r0, r4
bl fclose
ldr r0, =writefile
ldr r1, =writemode
bl fopen
mov r4, r0
mov r5, #0
ldr r6, =a
loop2:
cmp r5, #7
beq end
mov r0, r4
ldr r1, =printformat
ldrb r2, [r6]
bl fprintf
add r5, r5, #1
add r6, r6, #1
b loop2
end:
mov r0, r4
bl fclose
ldr r0, =a
ldr r0, [r0]
ldr lr, =return
ldr lr, [lr]
bx lr
I figured out that I needed to change the load and store instruction
so that it matches the specifier used (ldr -> ldrb and str -> strb).
But I need to make a sorting algorithm that works for 32 bit numbers
so which combination of specifiers and load/store instructions would
allow me to do that?
If you want to read 32-bit (4-byte) values from memory, you have to have 4-byte values in memory to begin with. That should not be surprising :)
E.g. if your input is the numbers 1, 2, 3, 4 and each number is a 32-bit value, then in memory that would be
0x00000000: 01 00 00 00 | 02 00 00 00 <- 32b values of 1 & 2
0x00000008: 03 00 00 00 | 04 00 00 00 <- 32b values of 3 & 4
In that case ldr would read 32 bits each time, and you would get 1, 2, 3, 4 in the register, one value per read.
Now, you have byte values in memory (based on your statement that `ldrb` gives the right result), e.g.
0x00000000: 01
0x00000001: 02
0x00000002: 03
0x00000003: 04
or same in one line
0x00000000: 01 02 03 04
So reading 8 bits with ldrb gives you the numbers 1, 2, 3, 4.
But ldr would read a 32-bit value from memory (all 4 bytes at once), and you would get the 32-bit value 0x04030201 in the register.
Note: these examples are for little-endian systems.
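The two layouts above can be modelled on a host machine with a short C sketch. The helper names load_word_le, load_byte and word_offset are hypothetical; the word load is assembled byte by byte so that the result matches a little-endian ldr regardless of the host's own byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a 32-bit little-endian load (ldr) sees at p: four bytes combined,
 * lowest address in the least significant byte. */
static uint32_t load_word_le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* What a byte load (ldrb) sees at p: one byte, zero-extended. */
static uint32_t load_byte(const uint8_t *p)
{
    assert(p != NULL);
    return *p;
}

/* Byte offset of element `index` in an array of 32-bit words,
 * i.e. the index shifted left by 2 (multiplied by 4). */
static size_t word_offset(size_t index)
{
    return index << 2;
}
```

For 32-bit elements this means the store pointer has to advance by 4 per element and an index has to be scaled by 4 (lsl #2) before it is added to the base address. Seven such elements occupy 28 bytes. Since fscanf with "%d" writes a whole int, advancing the destination by 1 makes each value overwrite part of the previous one.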
I am trying to get the address of a label in Thumb assembly and I am having some trouble.
I already read this post, but it does not help me, and I will explain why.
I am writing a simple program in Thumb assembly (unfortunately I cannot use Thumb-2).
Let's consider this code:
.arch armv5te
.syntax unified
.text
.thumb
.thumb_func
thumbnow:
0x0 PUSH {LR}
0x2 LDR R0, =loadValues
0x4 POP {PC}
.align
loadValues:
0x8 .word 0xdeadbee1
0xC .word 0xdeadbee2
0x10 .word 0xdeadbee3
I am using the arm-linux-gnueabi toolchain to assemble that.
My microcontroller doesn't have an MMU, so the memory addresses are static: no virtual pages etc.
What I am trying to do is make R0 hold the value 0x8 here, so that I can then access the three words like this:
LDR R1, [R0]
LDR R2, [R0,#4]
LDR R3, [R0,#8]
This is not possible with LDR, though, because the value does not fit in a MOV instruction. The assembler documentation states that if the value cannot fit in a MOV instruction, it will be placed in a literal pool.
So my question is, is it possible in Thumb assembly to get the actual address of the label if the content of the address cannot fit in a MOV command?
Starting with this
.thumb
ldr r0,=hello
adr r0,hello
nop
nop
nop
nop
hello:
.word 0,1,2,3
gives this unlinked
00000000 <hello-0xc>:
0: 4806 ldr r0, [pc, #24] ; (1c <hello+0x10>)
2: a002 add r0, pc, #8 ; (adr r0, c <hello>)
4: 46c0 nop ; (mov r8, r8)
6: 46c0 nop ; (mov r8, r8)
8: 46c0 nop ; (mov r8, r8)
a: 46c0 nop ; (mov r8, r8)
0000000c <hello>:
c: 00000000 andeq r0, r0, r0
10: 00000001 andeq r0, r0, r1
14: 00000002 andeq r0, r0, r2
18: 00000003 andeq r0, r0, r3
1c: 0000000c andeq r0, r0, r12
linked
00001000 <hello-0xc>:
1000: 4806 ldr r0, [pc, #24] ; (101c <hello+0x10>)
1002: a002 add r0, pc, #8 ; (adr r0, 100c <hello>)
1004: 46c0 nop ; (mov r8, r8)
1006: 46c0 nop ; (mov r8, r8)
1008: 46c0 nop ; (mov r8, r8)
100a: 46c0 nop ; (mov r8, r8)
0000100c <hello>:
100c: 00000000 andeq r0, r0, r0
1010: 00000001 andeq r0, r0, r1
1014: 00000002 andeq r0, r0, r2
1018: 00000003 andeq r0, r0, r3
101c: 0000100c andeq r1, r0, r12
Both ways, r0 ends up holding the address of the start of the data, which you can then offset into from the caller or wherever.
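The pc-relative arithmetic in those listings can be checked by hand. In Thumb state the PC reads as the instruction's address plus 4, and for these two instructions it is rounded down to a multiple of 4 before the immediate is added. A minimal C sketch of that rule, with the hypothetical helper name thumb_pc_relative:

```c
#include <assert.h>
#include <stdint.h>

/* Address produced by a 16-bit Thumb "add rd, pc, #imm" (adr), or read
 * from by "ldr rd, [pc, #imm]", for an instruction located at insn_addr. */
static uint32_t thumb_pc_relative(uint32_t insn_addr, uint32_t imm)
{
    assert((insn_addr & 1u) == 0); /* Thumb instructions are halfword aligned */
    assert((imm & 3u) == 0);       /* the immediate is a multiple of 4 */
    return ((insn_addr + 4u) & ~3u) + imm;
}
```

Applied to the listings: the adr at 0x2 with #8 gives 0xc (hello), and the ldr at 0x0 with #24 reads the pool word at 0x1c, which contains 0xc. In the linked image the same instructions give 0x100c and 0x101c. The adr encoding is identical before and after linking, whereas the pool word is what the linker patches.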
Edit
.thumb
adr r0,hello
nop
nop
nop
arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:2: Error: address calculation needs a strongly defined nearby symbol
So the tool won't turn that into a load from the pool for you.
For what you want to do, I think the pc-relative add (adr) is the best you are going to get. You can try other toolchains, as all of this is toolchain specific: assembly language is defined by the assembler, not the target, and each toolchain's assembler can differ in the language it accepts. Over time within GNU, how the linker and assembler work together has changed; the linker now patches up things it didn't used to.
You could of course go into the linker and add code to perform this optimization. The problem is most likely that by link time the linker is only looking to resolve an address in the pool, which is easy because it doesn't have to change the instruction. The assembler would have to leave information for the linker saying that this is not just a "fill this memory location with an address" fix-up. So either you modify gas to allow adr to work, and have the linker bail out with an error if it can't resolve the address within the instruction,
or you could just hard-code what you want and maintain it. I am not sure why the adr solution isn't adequate.
mov r0,#8 is a valid thumb instruction.
I'm writing THUMB code for an embedded core (ARM7TDMI) that needs to be linked to existing THUMB code. I'm using the GNU ARM embedded toolchain. I cannot get the linker to treat the existing external code as THUMB; it seems to always think that it's ARM. The existing code that I'm linking to is absolutely static and cannot be changed/recompiled (it's a plain binary sitting on a ROM chip, basically).
Here is an example program, multiply.c, that demonstrates the issue:
extern int externalFunction(int x);
int multiply(int x, int y)
{
return externalFunction(x * y);
}
Compiled using:
arm-none-eabi-gcc -o multiply.o -c -O3 multiply.c -march=armv4t -mtune=arm7tdmi -mthumb
arm-none-eabi-ld -o linked.o multiply.o -T symbols.txt
Where symbols.txt is a simple linker script:
SECTIONS
{
.text 0x8000000 : { *(.text) }
}
externalFunction = 0x8002000;
When I objdump -d linked.o, I get:
08000000 <multiply>:
8000000: b510 push {r4, lr}
8000002: 4348 muls r0, r1
8000004: f000 f804 bl 8000010 <__externalFunction_from_thumb>
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
800000e: 46c0 nop ; (mov r8, r8)
08000010 <__externalFunction_from_thumb>:
8000010: 4778 bx pc
8000012: 46c0 nop ; (mov r8, r8)
8000014: ea0007f9 b 8002000 <externalFunction>
Instead of branching directly to 0x8002000, it branches to a stub that switches to ARM mode first and then branches to 0x8002000 in ARM mode. I want that BL to branch directly to 0x8002000 and stay in THUMB mode, so that I'd get this instead:
08000000 <multiply>:
8000000: b510 push {r4, lr}
8000002: 4348 muls r0, r1
8000004: ???? ???? bl 8002000 <__externalFunction>
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
ABI and calling convention issues aside, how do I achieve this?
One way to do it is to make it do what you want:
branchto.s
.thumb
.thumb_func
.globl branchto
branchto:
bx r0
so.c
extern unsigned int externalFunction;
extern int branchto ( unsigned int, int );
int fun ( int x )
{
return(branchto(externalFunction,x)+3);
}
so.ld
SECTIONS
{
.text 0x8000000 : { *(.text) }
}
externalFunction = 0x8002001;
producing
08000000 <fun>:
8000000: 4b04 ldr r3, [pc, #16] ; (8000014 <fun+0x14>)
8000002: b510 push {r4, lr}
8000004: 0001 movs r1, r0
8000006: 6818 ldr r0, [r3, #0]
8000008: f000 f806 bl 8000018 <branchto>
800000c: 3003 adds r0, #3
800000e: bc10 pop {r4}
8000010: bc02 pop {r1}
8000012: 4708 bx r1
8000014: 08002001 stmdaeq r0, {r0, sp}
08000018 <branchto>:
8000018: 4700 bx r0
Ross Ridge's solution in the comments works:
static int (* const externalFunction)(int x) = (int (*)(int)) 0x80002001;
int fun ( int x )
{
return((* externalFunction)(x)+3);
}
But the hard-coded address is then in the code, not in the linker script, if that matters. I was trying to solve that and couldn't.
08000000 <fun>:
8000000: b510 push {r4, lr}
8000002: 4b03 ldr r3, [pc, #12] ; (8000010 <fun+0x10>)
8000004: f000 f806 bl 8000014 <fun+0x14>
8000008: 3003 adds r0, #3
800000a: bc10 pop {r4}
800000c: bc02 pop {r1}
800000e: 4708 bx r1
8000010: 80002001 andhi r2, r0, r1
8000014: 4718 bx r3
8000016: 46c0 nop ; (mov r8, r8)
I prefer the assembly solution for something like this, to force the exact instruction I want. Naturally, if you had linked in the external function, it would/should have just worked (there are some exceptions, but GNU is getting really good at resolving the to and from ARM/Thumb for you in the linker).
I don't see it as a GNU bug, actually. Rather, the linker script needs a way to declare that symbol as a Thumb function address (or likewise as an ARM function address) instead of just some generic linker-defined variable, just like .thumb_func does (or a longer function/procedure declaration):
.word branchto
.thumb
.globl branchto
branchto:
bx r0
8000018: 0800001c stmdaeq r0, {r2, r3, r4}
0800001c <branchto>:
800001c: 4700 bx r0
.word branchto
.thumb
.thumb_func
.globl branchto
branchto:
bx r0
8000018: 0800001d stmdaeq r0, {r0, r2, r3, r4}
0800001c <branchto>:
800001c: 4700 bx r0
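The only difference between the two pool words above (0x0800001c versus 0x0800001d) is bit 0, which is what bx inspects to choose the instruction set. A minimal C sketch of that convention, using the hypothetical helper names thumb_entry, bx_selects_thumb and bx_branch_address:

```c
#include <assert.h>
#include <stdint.h>

/* Value a Thumb function symbol should have when used as a bx target:
 * the label address with bit 0 set. */
static uint32_t thumb_entry(uint32_t label_addr)
{
    assert((label_addr & 1u) == 0); /* code labels are at least halfword aligned */
    return label_addr | 1u;
}

/* bx switches to Thumb state when bit 0 of the target is 1, ARM state when 0. */
static int bx_selects_thumb(uint32_t target)
{
    return (int)(target & 1u);
}

/* Address execution actually continues at: the target with bit 0 cleared. */
static uint32_t bx_branch_address(uint32_t target)
{
    return target & ~1u;
}
```

This is why the linker script in the assembly solution defines externalFunction as 0x8002001 rather than 0x8002000: the code still runs from 0x8002000, but the set low bit keeps the core in Thumb state across the bx.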
Just from reading the GNU linker documentation, there may be hope of getting what you want:
SECTIONS
{
.text0 0x08000000 : { so.o }
.text1 0x08002000 (NOLOAD) : { ex.o }
}
ex.o coming from a dummy function to make everyone happy:
int externalFunction ( int x )
{
return(x);
}
08000000 <fun>:
8000000: b510 push {r4, lr}
8000002: f001 fffd bl 8002000 <externalFunction>
8000006: 3003 adds r0, #3
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
and the NOLOAD keeps the dummy function out of the binary.
arm-none-eabi-objcopy so.elf -O srec --srec-forceS3 so.srec
S00A0000736F2E7372656338
S3150800000010B501F0FDFF033010BC02BC0847C0461E
S315080000104743433A2028474E552920362E322E305C
S31508000020004129000000616561626900011F000046
S3150800003000053454000602080109011204140115CA
S31008000040011703180119011A011E021E
S70500000000FA
Note that it wasn't perfect: there was extra garbage that got pulled in, perhaps symbols.
08000000 <fun>:
8000000: b510 push {r4, lr}
8000002: f001 fffd bl 8002000 <externalFunction>
8000006: 3003 adds r0, #3
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
800000e: 46c0 nop ; (mov r8, r8)
8000010: 3a434347
8000014: 4e472820
8000018: 36202955
800001c: 302e322e
8000020: 00294100
8000024: 65610000
8000028: 00696261
800002c: 00001f01
8000030: 54340500
8000034: 08020600
8000038: 12010901
800003c: 15011404
8000040: 18031701
8000044: 1a011901
You can see that garbage in the srec, but the 0x08002000 code is not there, so your actual external function will get called.
I would go with just making the instruction you want, or with function pointers with an assignment if you don't want any asm.
The other comments/answers using long branches do work, but it would still be nice to have a direct BL call and avoid the unnecessary load.
I believe I've found a workaround here. Create a dummy file (let's call it ext.c) with:
__attribute__((naked)) int externalFunction(int x){}
Compile this file to ext.o (same way as you compile multiply.c). This generates a dummy object file with a correctly decorated function symbol for externalFunction, whose address gets overridden by the linker script, resulting in the desired BL instruction:
Disassembly of section .text:
08000000 <multiply>:
8000000: b510 push {r4, lr}
8000002: 4348 muls r0, r1
8000004: f001 fffc bl 8002000 <externalFunction>
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
800000e: 46c0 nop ; (mov r8, r8)
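To confirm that the bl in a listing like this really targets 0x8002000, the two halfwords can be decoded by hand. The following C sketch assumes the original two-halfword Thumb BL encoding used on ARMv4T; thumb_bl_target is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the branch target of a Thumb BL pair located at insn_addr.
 * first  = 11110 + high 11 bits of the offset (signed)
 * second = 11111 + low 11 bits of the offset (in halfword units) */
static uint32_t thumb_bl_target(uint32_t insn_addr, uint16_t first,
                                uint16_t second)
{
    assert((first & 0xF800u) == 0xF000u);
    assert((second & 0xF800u) == 0xF800u);

    int32_t high = first & 0x7FF;
    if (high & 0x400)
        high -= 0x800;               /* sign-extend the 11-bit field */
    uint32_t low = second & 0x7FFu;

    /* PC reads as the address of the first halfword plus 4. */
    return insn_addr + 4u + (uint32_t)(high * 4096) + (low << 1);
}
```

For the pair f001 fffc at 0x8000004 this yields 0x8002000, a direct call, whereas the original f000 f804 at the same address yields 0x8000010, the __externalFunction_from_thumb stub.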
I am booting Android on an IMX53 Sabre tablet. I am trying to initialize stacks for the different processor modes. The following is my monitor initialization code:
# Install Secure Monitor
# -----------------------
ldr r1, =ns_image # R1 is used
str r0, [r1]
ldr r0, =tz_monitor # Get address of Monitors vector table
mcr p15, 0, r0, c12, c0, 1 # Write Monitor Vector Base Address Register
# Save Secure state
# ------------------
ldr r0, =S_STACK_LIMIT # Get address of Secure state stack
stmfd r0!, {r4-r12} # Save general purpose registers
# ADD support for SPs
mrs r1, cpsr # Also get a copy of the CPSR
stmfd r0!, {r1, lr} # Save CPSR and LR
ldr r1, =STACK_ADDR
msr cpsr_c, #Mode_FIQ | I_Bit | F_Bit
sub sp, r1, #Offset_FIQ_Stack
msr cpsr_c, #Mode_IRQ | I_Bit | F_Bit
sub sp, r1, #Offset_IRQ_Stack
msr cpsr_c, #Mode_ABT | I_Bit | F_Bit
sub sp, r1, #Offset_ABT_Stack
msr cpsr_c, #Mode_UND | I_Bit | F_Bit
sub sp, r1, #Offset_UND_Stack
msr cpsr_c, #Mode_SYS | I_Bit | F_Bit
sub sp, r1, #Offset_SYS_Stack
msr cpsr_c, #Mode_SVC | I_Bit | F_Bit
sub sp, r1, #Offset_SVC_Stack
msr cpsr_c, #Mode_MON | I_Bit | F_Bit
sub sp, r1, #Offset_MON_Stack
cps #Mode_MON # Move to Monitor mode after saving Secure state
# Save Secure state stack pointer
# --------------------------------
ldr r1, =S_STACK_SP # Get address of global
str r0, [r1] # Save pointer
# Set up initial NS state stack pointer
# --------------------------------------
ldr r0, =NS_STACK_SP # Get address of global
ldr r1, =NS_STACK_LIMIT # Get top of Normal state stack (assuming FD model)
str r1, [r0] # Save pointer
# Set up exception return information
# ------------------------------------
#IMPORT ns_image
ldr lr, ns_image # ns_image
msr spsr_cxsf, #Mode_SVC # Set SPSR to be SVC mode
# Switch to Normal world
# -----------------------
mrc p15, 0, r4, c1, c1, 0 # Read Secure Configuration Register data
bic r4, #0x66
orr r4, #0x19
//orr r4, #NS_BIT # Set NS bit
mcr p15, 0, r4, c1, c1, 0 # Write Secure Configuration Register data
# Clear general purpose registers
# --------------------------------
mov r0, #0
mov r1, #0
mov r2, #0
mov r3, #0
mov r4, #0
mov r5, #0
mov r6, #0
mov r7, #0
mov r8, #0
mov r9, #0
mov r10, #0
mov r11, #0
mov r12, #0
movs pc, lr
Android booting happens fine with this, but I am not sure if the stack pointers are set up correctly, as I am not able to use them as described at IMX53 external abort. Are the stack initializations correct?
Other relevant code snippets:
.equ Mode_USR, 0x10 # User Mode
.equ Mode_FIQ, 0x11 # Fast Interrupt Mode
.equ Mode_IRQ, 0x12 # Interrupt Mode
.equ Mode_SVC, 0x13 # Supervisor Mode
.equ Mode_ABT, 0x17 # Abort Mode
.equ Mode_UND, 0x1B # Undefined Mode
.equ Mode_SYS, 0x1F # System Mode
.equ Mode_MON, 0x16 # Monitor Mode
.equ STACK_ADDR, 0xa0000000
.equ I_Bit, 0x80 # IRQ interrupts disabled
.equ F_Bit, 0x40 # FIQ interrupts disabled
.equ NS_BIT, 0x1
/* memory reserved (in bytes) for stacks of different mode */
.equ Len_FIQ_Stack, 64
.equ Len_IRQ_Stack, 64
.equ Len_ABT_Stack, 64
.equ Len_UND_Stack, 64
.equ Len_SVC_Stack, 64
.equ Len_USR_Stack, 64
.equ Len_MON_Stack, 64
.equ Len_SYS_Stack, 64
.equ Offset_FIQ_Stack, 0
.equ Offset_IRQ_Stack, Offset_FIQ_Stack + Len_FIQ_Stack
.equ Offset_ABT_Stack, Offset_IRQ_Stack + Len_IRQ_Stack
.equ Offset_UND_Stack, Offset_ABT_Stack + Len_ABT_Stack
.equ Offset_SVC_Stack, Offset_UND_Stack + Len_UND_Stack
.equ Offset_USR_Stack, Offset_SVC_Stack + Len_SVC_Stack
.equ Offset_MON_Stack, Offset_USR_Stack + Len_USR_Stack
.equ Offset_SYS_Stack, Offset_MON_Stack + Len_MON_Stack
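To make the resulting layout easier to check, the following C sketch evaluates the same constants on a host machine. The enum mode_slot and the function initial_sp are hypothetical names; the slot order mirrors the Offset_* chain above, and every stack is assumed to be 64 bytes, as in the Len_* definitions.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_ADDR 0xa0000000u /* same value as the .equ above */
#define STACK_LEN  64u         /* every Len_*_Stack is 64 bytes */

/* Slots in the same order as the Offset_* definitions. */
enum mode_slot {
    SLOT_FIQ, SLOT_IRQ, SLOT_ABT, SLOT_UND,
    SLOT_SVC, SLOT_USR, SLOT_MON, SLOT_SYS,
    SLOT_COUNT
};

/* Initial stack pointer for a slot: STACK_ADDR minus the slot's offset,
 * matching "sub sp, r1, #Offset_xxx_Stack". */
static uint32_t initial_sp(enum mode_slot slot)
{
    assert(slot < SLOT_COUNT);
    return STACK_ADDR - (uint32_t)slot * STACK_LEN;
}
```

With these values the FIQ stack starts at 0xa0000000, SVC at 0x9fffff00 and MON at 0x9ffffe80, each growing downward into its own 64-byte region. Note that System mode shares its stack pointer with User mode, so the write made in Mode_SYS also sets the User SP, and the Offset_USR_Stack slot is never assigned by the code shown.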
This code compiles just fine with gcc, but when using llvm (llvm-gcc), it says "constant expression expected" on the line with ldr.
The problem is the syntax: how do I specify where my array is? I do not want to hard-code the displacement in bytes (ldr r7, [pc, #some_shift]), but to use a literal to keep the code clean and safe.
Any idea how to make it work?
.globl func_name
func_name:
push {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
//[Some stripped code]
add r6, r6, sl, lsl #2
sub ip, ip, sl
ldr r7, =maskTable // Here it crashes
add sl, sl, #4 # 0x4
// Some stripped code here
mov r0, #0 # 0x0 // return 0
pop {r4, r5, r6, r7, r8, r9, sl, fp, ip, pc}
.word 0x00000000
.data
.align 5
maskTable:
.word 0x00000000, 0x00000000, 0x00000000, 0x00000000
.word 0x0000FFFF, 0x00000000, 0x00000000, 0x00000000
.word 0xFFFFFFFF, 0x00000000, 0x00000000, 0x00000000
Try changing
ldr r7, =maskTable
to
ldr r7, maskTable
and remove
.data
section. It seems to be a bug or missing capability in gcc < 4.6 when dealing with the .data section.
There are two things you can try:
Change ldr r7, =maskTable into adr r7, maskTable.
Store the address of the table under a separate label and load it manually, as follows:
.globl func_name
func_name:
push {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
//[Some stripped code]
add r6, r6, sl, lsl #2
sub ip, ip, sl
ldr r7, maskTable_adr // Here it crashes
add sl, sl, #4 # 0x4
// Some stripped code here
mov r0, #0 # 0x0 // return 0
pop {r4, r5, r6, r7, r8, r9, sl, fp, ip, pc}
.word 0x00000000
.data
.align 5
maskTable_adr:
.word maskTable
maskTable:
.word 0x00000000, 0x00000000, 0x00000000, 0x00000000
.word 0x0000FFFF, 0x00000000, 0x00000000, 0x00000000
.word 0xFFFFFFFF, 0x00000000, 0x00000000, 0x00000000
I don't know the answer myself, but if it was me, I'd look at some compiled C code, and see how the compiler does it. Make sure that the compiler isn't in PIC mode, or something, or it'll do something more complicated and unnecessary.
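A small C file along the following lines could serve as that experiment. It is only a sketch: the table contents are copied from the question, the function name mask_first_word is hypothetical, and the idea is to compile it with the -S option (without -fPIC) and read how the generated assembly obtains the table's address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same contents as maskTable in the question. */
static const uint32_t maskTable[3][4] = {
    {0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u},
    {0x0000FFFFu, 0x00000000u, 0x00000000u, 0x00000000u},
    {0xFFFFFFFFu, 0x00000000u, 0x00000000u, 0x00000000u},
};

/* Indexing with a run-time value forces the compiler to materialise the
 * table's address, which is the instruction sequence worth inspecting. */
uint32_t mask_first_word(size_t row)
{
    assert(row < 3);
    return maskTable[row][0];
}
```

Whatever sequence the compiler emits for the address of maskTable is, by construction, one that its own assembler accepts, which makes it a reasonable template for the hand-written code.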