Where are the shared libs in macOS?

I am trying to find the libdyld.dylib file in macOS, but I can't find it. According to the lldb debugger, it is supposed to be at /usr/lib/system/libdyld.dylib, but it is not there. I read this Apple support page, but it left me more confused.
I understand that dyld loads the code from somewhere, but from where? Where does the code of this library come from?
Using macOS 12.5 Monterey
on an M1 Mac.
Update:
I looked into /usr/lib/dyld (there is no /System/Library/dyld file on my system). The code that I see in lldb when stepping into a library function is different from the code I see when disassembling the same function in /usr/lib/dyld. For example, take dlopen: the debugger (lldb) shows that there are 2 implementations:
(lldbinit) image lookup -n dlopen
1 match found in /usr/lib/dyld:
Address: dyld[0x0000000000025954] (dyld.__TEXT.__text + 149844)
Summary: dyld`dyld4::APIs::dlopen(char const*, int)
1 match found in /usr/lib/system/libdyld.dylib:
Address: libdyld.dylib[0x000000018033329c] (libdyld.dylib.__TEXT.__text + 1532)
Summary: libdyld.dylib`dlopen
But when stepping into the dlopen function, it chooses the one in /usr/lib/system/libdyld.dylib and not the one in /usr/lib/dyld:
(lldbinit) image lookup -v -a $pc
Address: libdyld.dylib[0x000000018033329c] (libdyld.dylib.__TEXT.__text + 1532)
Summary: libdyld.dylib`dlopen
Module: file = "/usr/lib/system/libdyld.dylib", arch = "arm64e"
Symbol: id = {0x000001a3}, range = [0x0000000182f5b29c-0x0000000182f5b2d0), name="dlopen"
Also, the asm is different. When stepping into dlopen with lldb, I see the following instructions:
dlopen # /usr/lib/system/libdyld.dylib:
-> 0x182f5b29c (0x18033329c): e2 03 01 aa mov x2, x1
0x182f5b2a0 (0x1803332a0): e1 03 00 aa mov x1, x0
0x182f5b2a4 (0x1803332a4): 68 7b 2c b0 adrp x8, 364397
0x182f5b2a8 (0x1803332a8): 00 39 43 f9 ldr x0, [x8, #0x670]
0x182f5b2ac (0x1803332ac): 10 00 40 f9 ldr x16, [x0]
0x182f5b2b0 (0x1803332b0): f1 03 00 aa mov x17, x0
0x182f5b2b4 (0x1803332b4): 51 7f ec f2 movk x17, #0x63fa, lsl #48
0x182f5b2b8 (0x1803332b8): 30 1a c1 da autda x16, x17
0x182f5b2bc (0x1803332bc): 03 0e 47 f8 ldr x3, [x16, #0x70]!
0x182f5b2c0 (0x1803332c0): e4 03 10 aa mov x4, x16
0x182f5b2c4 (0x1803332c4): f0 03 04 aa mov x16, x4
0x182f5b2c8 (0x1803332c8): 30 e6 f7 f2 movk x16, #0xbf31, lsl
And when disassembling dlopen in dyld, I see the following instructions:
(jtool2 -d /usr/lib/dyld | less)
__ZN5dyld44APIs6dlopenEPKci:
25954 0xd503237f PACIBSP
25958 0xa9bd57f6 STP X22, X21, [SP, #-48]!
2595c 0xa9014ff4 STP X20, X19, [SP, #16]
25960 0xa9027bfd STP X29, X30, [SP, #32]
25964 0x910083fd ADD X29, SP, #32
25968 0xaa0203f3 _MOV_R X19, X2 R19 = R2 (0x0)
2596c 0xaa0103f5 _MOV_R X21, X1 R21 = R1 (0x0)
25970 0xaa0003f6 _MOV_R X22, X0 R22 = R0 (0x0)
25974 0xaa1e03f4 _MOV_R X20, X30 R20 = R30 (0x0)
25978 0xdac143f4 PACIA X20, X31
2597c 0xf9400408 _LDR X8, [X0, #8] ...R8 = *(R0 + 8) = *0x8 = 0x780000002
25980 0xb9403508 _LDR W8, [X8, #52] ...R8 = *(R8 + 52) = *0x3c = 0x6000000000000
25984 0x7100091f CMP W8, #2
25988 0xfa400824 CCMP
2598c 0x540000e0 B.EQ 0x259a8
25990 0xaa1503e0 _MOV_R X0, X21 R0 = R21 (0x0)
So it is still a mystery: where does this dlopen code come from?

Since macOS 11, Apple has made some efforts to make its shared libs harder to reverse engineer.
Long story short, they merged most libs and frameworks into a single binary (the dyld shared cache), which is loaded into memory on system start.
You can find it here: /System/Library/dyld/ (a folder, not a file); there may be several cache files for the Intel and Arm architectures.
All such system libraries referenced from the load commands of the Mach-O binary you run are then mapped directly from the loaded dyld cache, so Apple does not need the libs to be on the filesystem anymore. They made some efforts for compatibility, so to most apps it still looks as if the libraries are present on disk.
However, as Apple has to publish parts of its sources due to using a lot of open-source code, people found the code responsible for the dyld cache and created several extractors, like this one:
https://github.com/keith/dyld-shared-cache-extractor
(you can even install it with brew)
So if you need to look inside some library, you will need to install an extractor and run the extraction, and then you will have what you want.

Related

Regarding Linux Kernel Pagetable

I'm currently working on Linux kernel version 5.10.39. I am trying to boot the kernel on an Arm Cortex-A78 CPU core.
It successfully executes the instructions in the head.S file...
But it gets stuck on one instruction, cmp x9, x10, in the __relocate_kernel function.
The function where it gets stuck is shown below.
SYM_FUNC_START_LOCAL(__relocate_kernel)
ldr w9, =__rela_offset // Offset for the reloc table
ldr w10, =__rela_size // Size of the reloc table
mov_q x11, KIMAGE_VADDR // default virtual offset
add x11, x11, x23 // actual virtual offset
add x9, x9, x11 // __va(.rela)
add x10, x9, x10 // __va(.rela) + sizeof(.rela)
0: cmp x9, x10
b.hs 1f
ldp x12, x13, [x9], #24
ldr x14, [x9, #-8]
cmp w13, #R_AARCH64_RELATIVE
b.ne 0b
add x14, x14, x23 // relocate
str x14, [x12, x23]
b 0b
What's the solution for this?
Is this a page table fault?

Why do I find some never called instructions nopl, nopw after ret or jmp in GCC compiled code? [duplicate]

I've been working with C for a short while and very recently started to get into ASM. When I compile a program:
int main(void)
{
int a = 0;
a += 1;
return 0;
}
The objdump disassembly has the code, but nops after the ret:
...
08048394 <main>:
8048394: 55 push %ebp
8048395: 89 e5 mov %esp,%ebp
8048397: 83 ec 10 sub $0x10,%esp
804839a: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
80483a1: 83 45 fc 01 addl $0x1,-0x4(%ebp)
80483a5: b8 00 00 00 00 mov $0x0,%eax
80483aa: c9 leave
80483ab: c3 ret
80483ac: 90 nop
80483ad: 90 nop
80483ae: 90 nop
80483af: 90 nop
...
From what I've learned, nops do nothing, and since they come after ret, they wouldn't even be executed.
My question is: why bother? Couldn't ELF (linux-x86) work with a .text section (+main) of any size?
I'd appreciate any help, just trying to learn.
First of all, gcc doesn't always do this. The padding is controlled by -falign-functions, which is automatically turned on by -O2 and -O3:
-falign-functions
-falign-functions=n
Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance,
-falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only
if this can be done by skipping 23 bytes or less.
-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.
Some assemblers only support this flag when n is a power of two; in
that case, it is rounded up.
If n is not specified or is zero, use a machine-dependent default.
Enabled at levels -O2, -O3.
There could be multiple reasons for doing this, but the main one on x86 is probably this:
Most processors fetch instructions in aligned 16-byte or 32-byte blocks. It can be
advantageous to align critical loop entries and subroutine entries by 16 in order to minimize
the number of 16-byte boundaries in the code. Alternatively, make sure that there is no 16-byte boundary in the first few instructions after a critical loop entry or subroutine entry.
(Quoted from "Optimizing subroutines in assembly language" by Agner Fog.)
edit: Here is an example that demonstrates the padding:
// align.c
int f(void) { return 0; }
int g(void) { return 0; }
When compiled using gcc 4.4.5 with default settings, I get:
align.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 00 00 00 00 mov $0x0,%eax
9: c9 leaveq
a: c3 retq
000000000000000b <g>:
b: 55 push %rbp
c: 48 89 e5 mov %rsp,%rbp
f: b8 00 00 00 00 mov $0x0,%eax
14: c9 leaveq
15: c3 retq
Specifying -falign-functions gives:
align.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 00 00 00 00 mov $0x0,%eax
9: c9 leaveq
a: c3 retq
b: eb 03 jmp 10 <g>
d: 90 nop
e: 90 nop
f: 90 nop
0000000000000010 <g>:
10: 55 push %rbp
11: 48 89 e5 mov %rsp,%rbp
14: b8 00 00 00 00 mov $0x0,%eax
19: c9 leaveq
1a: c3 retq
This is done to align the next function to an 8-, 16- or 32-byte boundary.
From “Optimizing subroutines in assembly language” by A.Fog:
11.5 Alignment of code
Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an important subroutine entry or jump label happens to be near the end of a 16-byte block then the microprocessor will only get a few useful bytes of code when fetching that block of code. It may have to fetch the next 16 bytes too before it can decode the first instructions after the label. This can be avoided by aligning important subroutine entries and loop entries by 16.
[...]
Aligning a subroutine entry is as simple as putting as many NOP's as needed before the subroutine entry to make the address divisible by 8, 16, 32 or 64, as desired.
As far as I remember, instructions are pipelined in the CPU, and different CPU blocks (loader, decoder and such) process subsequent instructions. When a RET instruction is being executed, the next few instructions are already loaded into the CPU pipeline. It's a guess, but you can start digging here, and if you find out more (maybe the specific number of NOPs that is safe), please share your findings.

What does loadable mean in microcontrollers in the sense of linkers

I'm trying to get familiar with the linking and startup procedures in ARM Cortex-M4 microcontrollers. Looking through the linker scripts almost all the sections are marked as loadable.
At first I thought that meant it would be copied from flash to RAM, but then I learned that this is handled in a different way. So what does it mean for a section in flash to be loadable? Isn't it already loaded and run from its location in flash? Also, I'm referring to a section containing instructions.
Does loadable in this context mean loading by the debugger into the device?
a fully functional cortex-m program
flash.s
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl dummy
dummy:
bx lr
so.c
void dummy ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<10;ra++) dummy(ra);
return(0);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -c so.c -o so.o
arm-none-eabi-ld -o so.elf -T flash.ld flash.o so.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
(you can use cortex-m4 instead; either works)
so.list
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000011 andeq r0, r0, r1, lsl r0
8: 00000017 andeq r0, r0, r7, lsl r0
c: 00000017 andeq r0, r0, r7, lsl r0
00000010 <reset>:
10: f000 f804 bl 1c <notmain>
14: e7ff b.n 16 <hang>
00000016 <hang>:
16: e7fe b.n 16 <hang>
00000018 <dummy>:
18: 4770 bx lr
1a: 46c0 nop ; (mov r8, r8)
0000001c <notmain>:
1c: b510 push {r4, lr}
1e: 2400 movs r4, #0
20: 0020 movs r0, r4
22: 3401 adds r4, #1
24: f7ff fff8 bl 18 <dummy>
28: 2c0a cmp r4, #10
2a: d1f9 bne.n 20 <notmain+0x4>
2c: 2000 movs r0, #0
2e: bc10 pop {r4}
30: bc02 pop {r1}
32: 4708 bx r1
this being a less complicated linker script and program, the elf has less stuff in it
part of readelf
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 010000 000034 00 AX 0 0 4
[ 2] .ARM.attributes ARM_ATTRIBUTES 00000000 010034 00002d 00 0 0 1
[ 3] .comment PROGBITS 00000000 010061 000011 01 MS 0 0 1
[ 4] .symtab SYMTAB 00000000 010074 0000f0 10 5 12 4
[ 5] .strtab STRTAB 00000000 010164 00003d 00 0 0 1
[ 6] .shstrtab STRTAB 00000000 0101a1 00003a 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
y (purecode), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x00000000 0x00000000 0x00034 0x00034 R E 0x10000
hexdump -C so.bin
00000000 00 10 00 20 11 00 00 00 17 00 00 00 17 00 00 00 |... ............|
00000010 00 f0 04 f8 ff e7 fe e7 70 47 c0 46 10 b5 00 24 |........pG.F...$|
00000020 20 00 01 34 ff f7 f8 ff 0a 2c f9 d1 00 20 10 bc | ..4.....,... ..|
00000030 02 bc 08 47 |...G|
00000034
A "binary" comes in many flavors; it is an unfortunately poorly named term, which is why it is so confusing. Everything you see in the disassembly above is required for the program to run; that is the real binary part of this, and it has to be loaded wherever it needs to be in order to run. On an operating system, the operating system reads the elf file, extracts the loadable sections with their addresses/offsets, and loads them into memory before launching at the entry point. The tools already support the elf file format, so we can re-use some of that for microcontrollers. We can't/don't normally use the elf entry point, as that makes no sense there; we have to make the vector/entry point match what the hardware needs, in this case a vector table.
The hexdump coming from the objcopy also shows the parts that have to be visible to the processor for the program to run, that is, loaded into the address/memory space (flash is in the memory or address space in this case).
But the "binary" file, the elf, also contains debug symbols in case you want to pull up a debugger. Add some more options on the toolchain command line and you get even more information about where these items are in the source code file, so in some cases you can see the high level language when single stepping through code. Those are not loadable sections; they describe the program or are just there to help out. They are neither machine code nor data that the processor needs to execute the program, so they do not need to be loaded into the memory space.
yet another "binary" file format
arm-none-eabi-objcopy so.elf -O ihex so.hex
cat so.hex
:100000000010002011000000170000001700000081
:1000100000F004F8FFE7FEE77047C04610B5002483
:1000200020000134FFF7F8FF0A2CF9D1002010BCA2
:0400300002BC0847BF
:00000001FF
it has a little extra information but almost all of it is the part we have to load into the address space for the program to run
another binary file format (Motorola S-record)
S00A0000736F2E7372656338
S1130000001000201100000017000000170000007D
S113001000F004F8FFE7FEE77047C04610B500247F
S113002020000134FFF7F8FF0A2CF9D1002010BC9E
S107003002BC0847BB
S9030000FC
also most of it is program.
elf is just another file format (fairly popular with gnu tools, but still just another file format). It contains the required machine code and data plus a bunch of other stuff. The machine code and data are what we have to load into ram if this is an operating system, or what we ideally load into flash for a microcontroller. But not all microcontrollers are equal: some are ram-only, and the program is downloaded via usb during enumeration (into ram), and there are other solutions. When debugging you could probably have items loaded into ram, depending on the mcu and tools, although that is not how the thing boots, so that wouldn't really be a good binary.
if you feel the need to zero .bss and to have any .data, then you need additional information: the offset and size of .bss, and the offset and contents of .data. The bootstrap then zeros and copies those items, and that information has to live in the non-volatile flash/rom; it is just more data on top of the machine code and data needed to run the program. If you are letting others write the code for you, then there are possibly tailored linker scripts and bootstrap code that let you just have .data items and push a build button on a gui, and it is all magically in place when your entry point (main() by convention and/or standard), or rather the code that represents your high level code at the C entry point, starts execution.

Why is it necessary to use edi constraint in this inline assembly?

centos 6.5 64bit vps, 500MB ram gcc 4.8.2
I have the following function that works only if I use edi as the constraint to hold the string pointer. If I try to use any other register or constraint (g, q, etc.), it segfaults.
BUT this problem only occurs when both link-time optimization and -O3 are used together. With -O2 it's fine. If I don't use -flto, it's fine. But with both together, the only register I can use that doesn't crash is edi.
gcc -flto
CFLAGS=-I. -flto -std=gnu11 -msse4.2 -fno-builtin-printf -Wall -Winline -Wstrict-aliasing -g -pg -O3 -lrt -lpthread
It seems like there might be some sort of register clobbering going on or something else. I'm really at a loss to understand why and how to fix this. Another interesting aspect is the generated assembly puts rdi into rdx before using the pointer but if I try to use either register as the input constraint... it segfaults! If it fails under aggressive compiling options it suggests to me either the compiler is stuffing up somehow, or more likely I'm doing something wrong.
char *sse4_strCRLF(char *str)
{
__m128i M = _mm_set1_epi8(13);
char *res;
__asm__ __volatile__(
"xor %0,%0\n\t"
"sub $1, %1\n\t"
"1:" "sub $15,%1\n\t"
".align 16\n\t"
"2:" "add $16, %1\n\t"
"pcmpistri $0x08,(%1),%2\n\t"
"ja 2b\n\t"
"jnc 2f\n\t"
"cmpb $10,1(%1,%%rcx)\n\t"
"jne 1b\n\t"
"add %%rcx,%1\n\t"
"mov %1,%0\n\t"
"2:"
:"=q"(res)
:"edi"(str),"x"(M) //<-- if use anything except edi, it segfaults
:"rcx"
);
return (char*) res;
}
Disassembled output:
00000000000002e0 <sse4_strCRLF>:
2e0: 55 push rbp
2e1: 48 89 e5 mov rbp,rsp
2e4: e8 00 00 00 00 call 2e9 <sse4_strCRLF+0x9>
2e9: 66 0f 6f 05 00 00 00 00 movdqa xmm0,[rip+0x0] # 2f1 <sse4_strCRLF+0x11>
2f1: 48 89 fa mov rdx,rdi //<--- puts rdi into rdx!
2f4: 48 31 c0 xor rax,rax
2f7: 48 83 ea 01 sub rdx,0x1
2fb: 48 83 ea 0f sub rdx,0xf
2ff: 90 nop
300: 48 83 c2 10 add rdx,0x10
304: 66 0f 3a 63 02 08 pcmpistri xmm0,[rdx],0x8
30a: 77 f4 ja 300 <sse4_strCRLF+0x20>
30c: 73 0d jae 31b <sse4_strCRLF+0x3b>
30e: 80 7c 0a 01 0a cmp byte[rdx+rcx*1+0x1],0xa
313: 75 e6 jne 2fb <sse4_strCRLF+0x1b>
315: 48 01 ca add rdx,rcx
318: 48 89 d0 mov rax,rdx
31b: 5d pop rbp
31c: c3 ret
@David Wohlferd gave me the answer. I was making 2 dumb mistakes due to ignorance and assumptions. The code below is modified so that the input char pointer is not modified by the routine: it is copied into a register, and that register is used. I was also mistakenly thinking I could specify a particular register by name, as opposed to using a constraint letter such as a, b, etc.
gcc still seems to be fussy about which constraints I use. E.g. if I use a and b for res and str respectively, it compiles fine but segfaults when run. But using S and D seems to work fine.
@David Wohlferd, I'd like to credit you as the answerer, but I don't think I can do that for a comment.
char *sse4_strCRLF(char *str)
{
__m128i M = _mm_set1_epi8(13);
char *res;
__asm__ __volatile__(
"xor %0,%0\n\t"
"mov %1,%%rdx\n\t"
"sub $1,%%rdx\n\t"
"1:" "sub $15,%%rdx\n\t"
".align 16\n\t"
"2:" "add $16, %%rdx\n\t"
"pcmpistri $0x08,(%%rdx),%2\n\t"
"ja 2b\n\t"
"jnc 2f\n\t"
"cmpb $10,1(%%rdx,%%rcx)\n\t"
"jne 1b\n\t"
"add %%rcx,%%rdx\n\t"
"mov %%rdx,%0\n\t"
"2:"
:"=S"(res)
:"D"(str),"x"(M)
:"rcx","rdx"
);
return (char*) res;
}

Linux kernel ARM exception stack init

I am using Linux kernel 3.0.35 on Freescale i.MX6 (ARM Cortex-A9). After running into a kernel OOPS I tried to understand the exception stack initialization. Here is what I have uncovered so far.
In cpu_init() in arch/arm/kernel/setup.c, I see the exception stack getting initialized:
struct stack {
u32 irq[3];
u32 abt[3];
u32 und[3];
} ____cacheline_aligned;
static struct stack stacks[NR_CPUS];
void cpu_init(void)
{
struct stack *stk = &stacks[cpu];
...<snip>
/*
* setup stacks for re-entrant exception handlers
*/
__asm__ (
"msr cpsr_c, %1\n\t"
"add r14, %0, %2\n\t"
"mov sp, r14\n\t"
"msr cpsr_c, %3\n\t"
"add r14, %0, %4\n\t"
"mov sp, r14\n\t"
"msr cpsr_c, %5\n\t"
"add r14, %0, %6\n\t"
"mov sp, r14\n\t"
"msr cpsr_c, %7"
:
: "r" (stk),
PLC (PSR_F_BIT | PSR_I_BIT | IRQ_MODE),
"I" (offsetof(struct stack, irq[0])),
PLC (PSR_F_BIT | PSR_I_BIT | ABT_MODE),
"I" (offsetof(struct stack, abt[0])),
PLC (PSR_F_BIT | PSR_I_BIT | UND_MODE),
"I" (offsetof(struct stack, und[0])),
PLC (PSR_F_BIT | PSR_I_BIT | SVC_MODE)
: "r14");
I see that each stack has room for only three words. That is how the macro vector_stub in arch/arm/kernel/entry-armv.S uses it. It saves R0, LR (parent PC) and SPSR (parent CPSR) into those three words. Then it jumps to __irq_svc. That starts with a macro svc_entry which creates a stack frame
.macro svc_entry, stack_hole=0
UNWIND(.fnstart )
UNWIND(.save {r0 - pc} )
sub sp, sp, #(S_FRAME_SIZE + \stack_hole - 4)
That is also how I see the disassembled code from KGDB:
Dump of assembler code for function __irq_svc:
0xc01402c0 <+0>: 44 d0 4d e2 sub sp, sp, #68 ; 0x44
0xc01402c4 <+4>: 04 00 1d e3 tst sp, #4
0xc01402c8 <+8>: 04 d0 4d 02 subeq sp, sp, #4
0xc01402cc <+12>: fe 1f 8d e8 stm sp, {r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12}
0xc01402d0 <+16>: 0e 00 90 e8 ldm r0, {r1, r2, r3}
0xc01402d4 <+20>: 30 50 8d e2 add r5, sp, #48 ; 0x30
0xc01402d8 <+24>: 00 40 e0 e3 mvn r4, #0
0xc01402dc <+28>: 44 00 8d e2 add r0, sp, #68 ; 0x44
0xc01402e0 <+32>: 04 00 80 02 addeq r0, r0, #4
0xc01402e4 <+36>: 04 10 2d e5 push {r1} ; (str r1, [sp, #-4]!)
0xc01402e8 <+40>: 0e 10 a0 e1 mov r1, lr
0xc01402ec <+44>: 1f 00 85 e8 stm r5, {r0, r1, r2, r3, r4}
0xc01402f0 <+48>: ad 96 a0 e1 lsr r9, sp, #13
0xc01402f4 <+52>: 89 96 a0 e1 lsl r9, r9, #13
0xc01402f8 <+56>: 04 80 99 e5 ldr r8, [r9, #4]
0xc01402fc <+60>: 01 70 88 e2 add r7, r8, #1
0xc0140300 <+64>: 04 70 89 e5 str r7, [r9, #4]
0xc0140304 <+68>: 54 50 9f e5 ldr r5, [pc, #84] ; 0xc0140360
0xc0140308 <+72>: 00 50 95 e5 ldr r5, [r5]
0xc014030c <+76>: 0c 60 95 e5 ldr r6, [r5, #12]
0xc0140310 <+80>: 4c e0 9f e5 ldr lr, [pc, #76] ; 0xc0140364
0xc0140314 <+84>: 07 0b c6 e3 bic r0, r6, #7168 ; 0x1c00
0xc0140318 <+88>: 1d 00 50 e3 cmp r0, #29
0xc014031c <+92>: 00 00 50 31 cmpcc r0, r0
0xc0140320 <+96>: 0e 00 50 11 cmpne r0, lr
0xc0140324 <+100>: 00 00 50 21 cmpcs r0, r0
0xc0140328 <+104>: 0d 10 a0 11 movne r1, sp
0xc014032c <+108>: 28 e0 4f 12 subne lr, pc, #40 ; 0x28
0xc0140330 <+112>: 32 eb ff 1a bne 0xc013b000 <asm_do_IRQ>
0xc0140334 <+116>: 04 80 89 e5 str r8, [r9, #4]
0xc0140338 <+120>: 00 00 99 e5 ldr r0, [r9]
0xc014033c <+124>: 00 00 38 e3 teq r8, #0
0xc0140340 <+128>: 00 00 a0 13 movne r0, #0
0xc0140344 <+132>: 02 00 10 e3 tst r0, #2
0xc0140348 <+136>: 06 00 00 1b blne 0xc0140368 <svc_preempt>
0xc014034c <+140>: 40 40 9d e5 ldr r4, [sp, #64] ; 0x40
0xc0140350 <+144>: 04 f0 6f e1 msr SPSR_fsxc, r4
0xc0140354 <+148>: 1f f0 7f f5 clrex
0xc0140358 <+152>: ff ff dd e8 ldm sp, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, sp, lr, pc}^
End of assembler dump.
During an exception, SP is the banked R13. If I am following correctly, there is no room for this frame on that stack. That means I must have missed something. Is there some other place where the exception stacks are initialized?
tl;dr - We switch modes to supervisor and use that stack.
You are missing the key point of where control is handed to the CPU via the vector table and the mode is switched. See: entry-armv.S and __vectors_start. The vector stubs are the code where control is initially sent after the branch in the main vector table. The vector_stub macro saves three items: a corrected lr, r0 and the spsr of the excepted mode (as you noted).
The point you miss is that after this, all exceptions switch to SVC_MODE and as such use the current task's stack, which also holds the thread_info structure. Mode switching is a tough concept to get in ARM system-level assembler. Registers that were previously set are now completely different. Pay attention to msr and cps type instructions. Things can change completely after them; I have been confused by this dozens of times.
The mode bits of the spsr are used as an index into the table that follows each vector_stub, which will normally jump to either __irq_svc or __irq_usr. Just scroll down to look at the bottom of entry-armv.S, which you already found.
Related: Physical address of ARM-Linux vector table
