Mach-O - LINKEDIT section endianness - macos

I'm a little confused about the __LINKEDIT section.
Let me set the background:
What I understand about __LINKEDIT
In theory (http://www.newosxbook.com/articles/DYLD.html) the first "section" will be LC_DYLD_INFO.
If I check mach-o/loader.h I get:
#define LC_DYLD_INFO 0x22
...
struct dyld_info_command {
uint32_t cmd; /* LC_DYLD_INFO or LC_DYLD_INFO_ONLY */
uint32_t cmdsize; /* sizeof(struct dyld_info_command) */
...
If I check a mach-o file with otool I get:
$ otool -l MyBinary | grep -B3 -A8 LINKEDIT
Load command 3
cmd LC_SEGMENT_64
cmdsize 72
segname __LINKEDIT
vmaddr 0x0000000100038000
vmsize 0x0000000000040000
fileoff 229376
filesize 254720
maxprot 0x00000001
initprot 0x00000001
nsects 0
flags 0x0
If I check the hex using xxd
$ xxd -s 229376 -l 4 MyBinary
00038000: 1122 9002 ."..
I know that my binary is little-endian:
$ rabin2 -I MyBinary
arch arm
baddr 0x100000000
binsz 484096
bintype mach0
bits 64
canary false
class MACH064
crypto false
endian little
havecode true
intrp /usr/lib/dyld
laddr 0x0
lang swift
linenum false
lsyms false
machine all
maxopsz 16
minopsz 1
nx false
os darwin
pcalign 0
pic true
relocs false
sanitiz false
static false
stripped true
subsys darwin
va true
I can corroborate that the first section in __LINKEDIT is LC_DYLD_INFO by getting its offset from otool:
$ otool -l MyBinary | grep -B1 -A11 LC_DYLD_INFO
Load command 4
cmd LC_DYLD_INFO_ONLY
cmdsize 48
rebase_off 229376
rebase_size 976
bind_off 230352
bind_size 3616
weak_bind_off 0
weak_bind_size 0
lazy_bind_off 233968
lazy_bind_size 6568
export_off 240536
export_size 9744
If we compare the file offset of __LINKEDIT with rebase_off from LC_DYLD_INFO, we get the same value: 229376.
So far everything is fine and more or less makes sense.
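One possible reading of the bytes at that offset: loader.h describes rebase_off as the "file offset to rebase info", so the four bytes xxd printed at 229376 would be the start of the rebase opcode stream rather than a load command header. The following is a minimal Python sketch under that assumption. The REBASE_* constants are the ones defined in mach-o/loader.h; the helper names read_uleb128 and decode_rebase_prefix are made up for illustration, and only the two opcodes that occur in these four bytes are handled.

```python
# Rebase opcode constants as defined in <mach-o/loader.h>.
REBASE_OPCODE_MASK = 0xF0
REBASE_IMMEDIATE_MASK = 0x0F
REBASE_OPCODE_SET_TYPE_IMM = 0x10
REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB = 0x20


def read_uleb128(data, pos):
    """Decode one ULEB128 value starting at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def decode_rebase_prefix(data):
    """Decode leading opcodes this sketch knows about; stop at the first unknown one."""
    decoded = []
    pos = 0
    while pos < len(data):
        opcode = data[pos] & REBASE_OPCODE_MASK
        immediate = data[pos] & REBASE_IMMEDIATE_MASK
        pos += 1
        if opcode == REBASE_OPCODE_SET_TYPE_IMM:
            decoded.append(("SET_TYPE_IMM", immediate))
        elif opcode == REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            offset, pos = read_uleb128(data, pos)
            decoded.append(("SET_SEGMENT_AND_OFFSET_ULEB", immediate, offset))
        else:
            break
    return decoded


# The four bytes xxd printed at file offset 229376.
first_bytes = bytes.fromhex("11229002")
decoded = decode_rebase_prefix(first_bytes)
print(decoded)
```

Read this way, 0x11 is "set type, immediate 1" and 0x22 is "set segment 2 and offset", with 0x90 0x02 being the ULEB128 encoding of the offset 0x110.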
My confusion
Now I'm in lldb and want to make sense of the memory.
I can read the memory at the offset:
(lldb) image dump sections MyBinary
...
0x00000400 container [0x0000000100ca0000-0x0000000100ce0000) r-- 0x00038000 0x0003e300 0x00000000 MyBinary.__LINKEDIT
Ok, let's read that memory:
(lldb) x/x 0x00000100ca0000
0x100ca0000: 0x02902211
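The xxd output and the lldb output show the same four bytes. xxd prints them in file order (11 22 90 02), while x/x formats them as a single 32-bit word using the target's byte order. A short Python sketch of the two interpretations, using only the byte values shown above:

```python
import struct

# Byte order on disk, exactly as xxd printed it.
raw = bytes.fromhex("11229002")

# Interpret the same four bytes as one 32-bit unsigned integer.
as_little = struct.unpack("<I", raw)[0]  # least significant byte first
as_big = struct.unpack(">I", raw)[0]     # most significant byte first

print(hex(as_little))  # 0x2902211, i.e. lldb's 0x02902211
print(hex(as_big))     # 0x11229002
```

The little-endian interpretation reproduces lldb's 0x02902211, so 0x22 is simply the second byte in memory; it only looks like it is "in the middle" because four separate bytes are being displayed as one word.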
So this is my problem:
0x02902211
Let's assume I don't know whether it's little or big endian. I should find 0x22 at the beginning or at the end of the bytes, but it's in the middle? (This confuses me.)
I guess the 0x11 is the size, 17 in decimal, which might correspond to what I can see from the structure in loader.h (12 bytes + 5 bytes of padding?):
struct dyld_info_command {
uint32_t cmd; /* LC_DYLD_INFO or LC_DYLD_INFO_ONLY */
uint32_t cmdsize; /* sizeof(struct dyld_info_command) */
uint32_t rebase_off; /* file offset to rebase info */
uint32_t rebase_size; /* size of rebase info */
uint32_t bind_off; /* file offset to binding info */
uint32_t bind_size; /* size of binding info */
uint32_t weak_bind_off; /* file offset to weak binding info */
uint32_t weak_bind_size; /* size of weak binding info */
uint32_t lazy_bind_off; /* file offset to lazy binding info */
uint32_t lazy_bind_size; /* size of lazy binding infs */
uint32_t export_off; /* file offset to lazy binding info */
uint32_t export_size; /* size of lazy binding infs */
};
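As a sanity check on the size, the structure above is twelve consecutive uint32_t fields. The Python sketch below computes that size and packs a command from the values otool printed, to show what the first bytes of the actual load command would look like in a little-endian file. The bytes are rebuilt from otool's output, not read from the binary, and the LC_REQ_DYLD value (0x80000000, which is OR-ed into LC_DYLD_INFO to form LC_DYLD_INFO_ONLY) comes from mach-o/loader.h rather than from the excerpt above.

```python
import struct

# Field names in declaration order, as in struct dyld_info_command.
FIELDS = [
    "cmd", "cmdsize",
    "rebase_off", "rebase_size",
    "bind_off", "bind_size",
    "weak_bind_off", "weak_bind_size",
    "lazy_bind_off", "lazy_bind_size",
    "export_off", "export_size",
]
LAYOUT = "<" + "I" * len(FIELDS)  # little-endian, twelve uint32_t values

struct_size = struct.calcsize(LAYOUT)
print(struct_size)  # 48, the cmdsize otool reports

# Values taken from the otool output; the constants are from loader.h.
LC_REQ_DYLD = 0x80000000
LC_DYLD_INFO_ONLY = 0x22 | LC_REQ_DYLD
command = struct.pack(
    LAYOUT,
    LC_DYLD_INFO_ONLY, 48,
    229376, 976,
    230352, 3616,
    0, 0,
    233968, 6568,
    240536, 9744,
)
print(command[:8].hex())  # 2200008030000000
```

With this layout, 0x22 is the very first byte of the command and the size 0x30 starts at byte 4, which is not the byte sequence found at file offset 229376.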
My questions
1.) Why is the 0x22 not at the end (or the beginning)? Or am I reading the offset incorrectly?
2.) otool says that the command size is 48 (that's 0x30 in hex) but I can't get it from the bytes next to 0x22. Where do I get the size from?
Thanks for taking the time to read all the way here, and thanks for any help.

Related

Global variable symbols are incorrect when I debug a Unix-like kernel I wrote myself

The code is here at commit #489ee1c.
I am writing a Unix-like kernel following this tutorial for personal learning. When I debug it, the global variable symbols are incorrect.
I start the kernel using:
qemu-system-i386 -d cpu_reset -s -S -D ./run.log -drive format=raw,file=os_image -m 8G
There is also a problem that the code sees only 3 GB of physical memory even though I set -m 4G.
Then I start gdb, stopping at the init_global_mm_vars() function:
.gdbinit
set arch i386
symbol-file /root/os/2-kernel/kernel.elf
b init_global_mm_vars
target remote localhost:1234
You can see that the address of the symbol Kernel_Vmm_End is 0x58d4, but the address used in the asm is 0x68d4. All of the global variable symbols above are incorrect.
Why do all the global variable symbols go wrong?
I found that if I don't use the link.ld script and just pass -Ttext=0 when linking, all the problems seem to go away.
ENTRY(kernel_main) /* Kernel entry label */
OUTPUT_FORMAT("elf32-i386")
OUTPUT_ARCH(i386)
SECTIONS {
    . = 0x0; /* Kernel code is located at 0x0 */
    Kernel_Text_Vmm_Start_p = .; /* Export labels */
    .text : /* Align at 4KB and load at 4KB */
    {
        *(.text) /* All text sections from all files */
    }
    . = ALIGN(0x1000);
    Kernel_Rodata_Vmm_Start_p =.;
    .rodata ALIGN (0x1000) : AT(ADDR(.rodata)) /* Align at 4KB and load at 4KB */
    {
        *(.rodata) /* All read-only data sections from all files */
    }
    . = ALIGN(0x1000);
    Kernel_Data_Vmm_Start_p =.;
    .data ALIGN (0x1000) : AT(ADDR(.data)) /* Align at 4KB and load at 4KB */
    {
        *(.data) /* All data sections from all files */
    }
    . = ALIGN(0x1000);
    Kernel_Bss_Vmm_Start_p =.;
    .bss ALIGN (0x1000) : AT(ADDR(.bss)) /* Align at 4KB and load at 4KB */
    {
        *(COMMON) /* All COMMON sections from all files */
        *(.bss) /* All bss sections from all files */
    }
    . = ALIGN(0x1000);
    Kernel_Vmm_End_p = .;
}
I still have no idea why this ld script goes wrong.

How to specify linker option to not use a particular address space while generating the vmlinux kernel Image?

I have an address range in RAM that is used for something else: 4096 bytes starting at 0x00100000. For a Linux kernel built with Yocto, I want to specify a linker option so that this range is not used.
I am unsure about the syntax; something like this:
ROM 0 (NOLOAD)
SECTIONS
{
ROM 0 (NOLOAD) : { 0x00100000 - 4096 }
. = KERNELBASE;
_text = .;
_stext = .;
/*

What is *fill* section shows in the link map file?

Yesterday I created my own U-Boot module and wanted to set the text base address to 0xd0020010.
But after compiling, the .map file generated by the linker shows this:
Linker script and memory map
0x00000000 . = 0x0
0x00000000 . = ALIGN (0x4)
.text 0xd0020010 0x1f0
0xd0020010 __image_copy_start = .
*(.vectors)
*fill* 0xd0020010 0x10 00
.vectors 0xd0020020 0x60 arch/arm/lib/built-in.o
0xd0020020 _start
0xd0020044 _undefined_instruction
0xd0020048 _software_interrupt
0xd002004c _prefetch_abort
0xd0020050 _data_abort
0xd0020054 _not_used
0xd0020058 _irq
0xd002005c _fiq
You can see that above the .vectors section there are 16 bytes of 0x00, named "*fill*".
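The numbers in the map are consistent with alignment padding. The Python sketch below assumes, purely for illustration, that the .vectors input section requires 32-byte (0x20) alignment; that value is not stated anywhere in the post, it is just the smallest power of two that reproduces the addresses shown. align_up is a hypothetical helper.

```python
def align_up(address, alignment):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


text_start = 0xD0020010    # start of the .text output section in the map
vectors_alignment = 0x20   # ASSUMPTION: not given in the post

vectors_start = align_up(text_start, vectors_alignment)
fill_size = vectors_start - text_start

print(hex(vectors_start))  # 0xd0020020
print(hex(fill_size))      # 0x10

# With only 4-byte alignment, as requested by ALIGN(4), no padding would be needed.
print(align_up(text_start, 4) - text_start)  # 0
```

Under that assumption, the 0x10 bytes of fill are what the linker inserts between the start of the output section and the first input section that needs stricter alignment than the address it was given.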
My linker script looks like this:
SECTIONS
{
. = 0x00000000;
. = ALIGN(4);
.text :
{
__image_copy_start = .;
*(.vectors)
CPUDIR/start.o (.text*)
*(.text*)
}
.........
I tried removing ALIGN(4), but the fill is still there. And 0xd0020010 is already an aligned address, right? So it should have nothing to do with ALIGN.
The 16 bytes are filled with 0x00, which are nop instructions, but I still wonder why there is a "*fill*" section.

gcc / ld: overlapping sections (.tbss, .init_array) in statically-linked ELF binary

I'm compiling a very simple hello-world one-liner statically on a Debian 7 system on an x86_64 machine with gcc version 4.8.2 (Debian 4.8.2-21):
gcc test.c -static -o test
and I get an executable ELF file that includes the following sections:
[17] .tdata PROGBITS 00000000006b4000 000b4000
0000000000000020 0000000000000000 WAT 0 0 8
[18] .tbss NOBITS 00000000006b4020 000b4020
0000000000000030 0000000000000000 WAT 0 0 8
[19] .init_array INIT_ARRAY 00000000006b4020 000b4020
0000000000000010 0000000000000000 WA 0 0 8
[20] .fini_array FINI_ARRAY 00000000006b4030 000b4030
0000000000000010 0000000000000000 WA 0 0 8
[21] .jcr PROGBITS 00000000006b4040 000b4040
0000000000000008 0000000000000000 WA 0 0 8
[22] .data.rel.ro PROGBITS 00000000006b4060 000b4060
00000000000000e4 0000000000000000 WA 0 0 32
Note that .tbss section is allocated at addresses 0x6b4020..0x6b4050 (0x30 bytes) and it intersects with allocation of .init_array section at 0x6b4020..0x6b4030 (0x10 bytes), .fini_array section at 0x6b4030..0x6b4040 (0x10 bytes) and with .jcr section at 0x6b4040..0x6b4048 (8 bytes).
Note it does not intersect with the following sections, for example, .data.rel.ro, but that's probably because .data.rel.ro alignment is 32 and thus it can't be placed any earlier than 0x6b4060.
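The overlap described above can be checked mechanically. The Python sketch below takes the addresses and sizes from the section listing and tests each range against .tbss; the overlaps helper is a hypothetical name for the usual half-open interval test.

```python
# (start address, size) for each section, copied from the listing above.
sections = {
    ".tbss": (0x6B4020, 0x30),
    ".init_array": (0x6B4020, 0x10),
    ".fini_array": (0x6B4030, 0x10),
    ".jcr": (0x6B4040, 0x08),
    ".data.rel.ro": (0x6B4060, 0xE4),
}


def overlaps(a, b):
    """True if the half-open ranges [start, start + size) intersect."""
    a_start, a_size = a
    b_start, b_size = b
    return a_start < b_start + b_size and b_start < a_start + a_size


tbss = sections[".tbss"]
overlapping = [
    name for name, rng in sections.items()
    if name != ".tbss" and overlaps(tbss, rng)
]
print(overlapping)
```

This reports .init_array, .fini_array and .jcr as overlapping .tbss, and leaves out .data.rel.ro, which starts at 0x6b4060, after the end of .tbss at 0x6b4050.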
The resulting file runs OK, but I still don't quite understand how it works. From what I read in the glibc documentation, .tbss is just a .bss section for thread-local storage (i.e. allocated scratch memory, not really mapped in the physical file). Is .tbss so special that it can overlap other sections? Are .init_array, .fini_array and .jcr so useless (for example, no longer needed by the time TLS-related code runs) that they can be overwritten by bss? Or is it some sort of bug?
Basically, what do I get if I try to read address 0x6b4020 in my application: .tbss contents or .init_array pointers? Why?
The virtual address of .tbss is meaningless as that section only serves as a template for the TLS storage as allocated by the threading implementation in GLIBC.
This virtual address comes about because .tbss follows .tdata in the default linker script:
...
.gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
/* Thread Local Storage sections */
.tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) }
.tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) }
.preinit_array :
{
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array))
PROVIDE_HIDDEN (__preinit_array_end = .);
}
.init_array :
{
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array))
PROVIDE_HIDDEN (__init_array_end = .);
}
...
therefore its virtual address is simply the virtual address of the preceding section (.tdata) plus the size of the preceding section (possibly with some padding to reach the required alignment). .init_array (or .preinit_array if present) comes next and its location should be determined the same way, but .tbss is so special that it gets hard-coded treatment deep inside GNU ld:
/* .tbss sections effectively have zero size. */
if ((os->bfd_section->flags & SEC_HAS_CONTENTS) != 0
|| (os->bfd_section->flags & SEC_THREAD_LOCAL) == 0
|| link_info.relocatable)
dotdelta = TO_ADDR (os->bfd_section->size);
else
dotdelta = 0; // <----------------
dot += dotdelta;
This is not a relocatable link, .tbss has the SEC_THREAD_LOCAL flag set, and it does not have contents (NOBITS), so the else branch is taken. In other words, no matter how large .tbss is, the linker does not advance the location counter (also known as "the dot"), which determines where the following section is placed.
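The effect of that branch can be reproduced with a toy model of the location counter. The Python sketch below is not GNU ld's code: it only mimics the quoted condition (leaving out the relocatable-link case, which is assumed false here) and feeds it the sizes and alignments from the question's section listing. align_up and assign_addresses are hypothetical names.

```python
def align_up(address, alignment):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


# (name, size, alignment, has_contents, thread_local), from the section listing.
SECTIONS = [
    (".tdata", 0x20, 8, True, True),
    (".tbss", 0x30, 8, False, True),
    (".init_array", 0x10, 8, True, False),
    (".fini_array", 0x10, 8, True, False),
    (".jcr", 0x08, 8, True, False),
    (".data.rel.ro", 0xE4, 32, True, False),
]


def assign_addresses(sections, dot):
    """Place each section at the aligned dot; skip the advance for .tbss-like sections."""
    placed = {}
    for name, size, alignment, has_contents, thread_local in sections:
        dot = align_up(dot, alignment)
        placed[name] = dot
        if has_contents or not thread_local:
            dot += size
        # else: thread-local section without contents, so the dot stays put
    return placed


placed = assign_addresses(SECTIONS, 0x6B4000)
for name, address in placed.items():
    print(f"{name:14s} {address:#x}")
```

The model puts .tbss and .init_array both at 0x6b4020, and .data.rel.ro at 0x6b4060, matching the addresses in the question.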
Note also that .tbss sits in a non-loadable ELF segment:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000b1f24 0x00000000000b1f24 R E 200000
LOAD 0x00000000000b2000 0x00000000006b2000 0x00000000006b2000
0x0000000000002288 0x00000000000174d8 RW 200000
NOTE 0x0000000000000158 0x0000000000400158 0x0000000000400158
0x0000000000000044 0x0000000000000044 R 4
TLS 0x00000000000b2000 0x00000000006b2000 0x00000000006b2000 <---+
0x0000000000000020 0x0000000000000060 R 8 |
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 |
0x0000000000000000 0x0000000000000000 RW 8 |
|
Section to Segment mapping: |
Segment Sections... |
00 .note.ABI-tag ... |
01 .tdata .ctors ... |
02 .note.ABI-tag ... |
03 .tdata .tbss <---------------------------------------------------+
04
This is rather simple if you understand two things:
1) What SHT_NOBITS is
2) What the tbss section is
SHT_NOBITS means that the section occupies no space inside the file.
Normally, NOBITS sections such as bss are placed after all PROGBITS sections, at the end of the loaded segments.
tbss is a special section that holds uninitialized thread-local data contributing to the program's memory image. Note that this section must hold unique data for each program thread.
Now let's talk about overlapping. There are two possible kinds of overlap: inside the binary file and inside memory.
1) Binary file offsets:
There is no data to write for this section in the binary. It takes up no space inside the file, so the linker starts the next section, init_array, immediately after tbss is declared. You can think of its size not as a size but as bookkeeping information for code like:
if (isTLSSegment) tlsStartAddr += section->memSize();
So it doesn't overlap anything inside the file.
2) Memory offset
The tdata and tbss sections may be possibly modified at startup time by the dynamic linker performing relocations, but after that the section data is kept around as the initialization image and not modified anymore. For each thread, including the initial one, new memory is allocated into which then the content of the initialization image is copied. This ensures that all threads get the same starting conditions.
This is what makes tbss (and tdata) so special.
Do not think of their memory offsets as statically known; they are more like "generation patterns" for per-thread work. So they also cannot overlap with "normal" memory offsets, because they are processed in a different way.
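The "generation pattern" idea can be modelled in a few lines. The Python sketch below is a conceptual illustration only, not how glibc is implemented: each thread gets a fresh block made of a copy of the initialization image followed by zeroed space. The sizes are the FileSiz (0x20) and MemSiz (0x60) of the TLS program header shown in the other answer; the image contents and the name new_thread_tls_block are placeholders.

```python
TLS_FILESZ = 0x20  # bytes of initialized data in the file (FileSiz of the TLS header)
TLS_MEMSZ = 0x60   # total per-thread size, including zero-initialized space (MemSiz)


def new_thread_tls_block(init_image, mem_size):
    """Return a fresh per-thread block: a copy of the image, then zeroed space."""
    if mem_size < len(init_image):
        raise ValueError("mem_size must cover the initialization image")
    return bytearray(init_image) + bytearray(mem_size - len(init_image))


# Placeholder initialization image; a real one would come from the file.
init_image = bytes(range(TLS_FILESZ))

thread_a = new_thread_tls_block(init_image, TLS_MEMSZ)
thread_b = new_thread_tls_block(init_image, TLS_MEMSZ)

# A write in one thread's block affects neither the other block nor the image.
thread_a[0] = 0xFF
print(thread_a[0], thread_b[0], init_image[0])  # 255 0 0
```

Because every thread works on its own copy at an address chosen at allocation time, the link-time address recorded for .tbss never has to correspond to real storage.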
You may consult this paper to learn more.

Determine program segments (HEADER, TEXT, CONST, etc...) at run time

So I realize I can open a binary in IDA Pro and determine where the segments start and stop. Is it possible to determine this at run time in Cocoa?
I'm assuming there are some C-level library functions that enable this. I poked around in the Mach headers but couldn't find much :/
Thanks in advance!
Cocoa doesn’t include classes for handling Mach-O files. You need to use the Mach-O functions provided by the system. You were right to look at the Mach-O headers.
I’ve coded a small program that accepts a Mach-O file name as input and dumps information about its segments. Note that this program deals with thin files (i.e., not fat/universal) for the x86_64 architecture only.
Note that I’m also not checking every operation, nor whether the file is a correctly formed Mach-O file. Doing the appropriate checks is left as an exercise for the reader.
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mach-o/loader.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char *argv[]) {
    int fd;
    struct stat stat_buf;
    size_t size;
    char *base = NULL;
    char *addr = NULL;
    struct mach_header_64 *mh;
    struct load_command *lc;
    struct segment_command_64 *sc;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <mach-o file>\n", argv[0]);
        return EXIT_FAILURE;
    }

    // Open the file and get its size
    fd = open(argv[1], O_RDONLY);
    fstat(fd, &stat_buf);
    size = stat_buf.st_size;

    // Map the file to memory; keep the base address for munmap()
    base = mmap(0, size, PROT_READ, MAP_FILE | MAP_PRIVATE, fd, 0);

    // The first bytes of a Mach-O file comprise its header
    mh = (struct mach_header_64 *)base;

    // Load commands follow the header
    addr = base + sizeof(struct mach_header_64);

    printf("There are %u load commands\n", mh->ncmds);

    for (uint32_t i = 0; i < mh->ncmds; i++) {
        lc = (struct load_command *)addr;

        // A zero-sized command would never advance the cursor, so stop here
        if (lc->cmdsize == 0) break;

        // If the load command is a (64-bit) segment,
        // print information about the segment
        if (lc->cmd == LC_SEGMENT_64) {
            sc = (struct segment_command_64 *)addr;
            // segname is a fixed 16-byte field and may lack a terminating NUL
            printf("Segment %.16s\n\t"
                   "vmaddr 0x%llx\n\t"
                   "vmsize 0x%llx\n\t"
                   "fileoff %llu\n\t"
                   "filesize %llu\n",
                   sc->segname,
                   sc->vmaddr,
                   sc->vmsize,
                   sc->fileoff,
                   sc->filesize);
        }

        // Advance to the next load command
        addr += lc->cmdsize;
    }

    printf("\nDone.\n");

    // Unmap from the base address, not from the advanced cursor
    munmap(base, size);
    close(fd);
    return 0;
}
You need to compile this program for x86_64 only and run it against an x86_64 Mach-O binary. For instance, assuming you’ve saved this program as test.c:
$ clang test.c -arch x86_64 -o test
$ ./test ./test
There are 11 load commands
Segment __PAGEZERO
vmaddr 0x0
vmsize 0x100000000
fileoff 0
filesize 0
Segment __TEXT
vmaddr 0x100000000
vmsize 0x1000
fileoff 0
filesize 4096
Segment __DATA
vmaddr 0x100001000
vmsize 0x1000
fileoff 4096
filesize 4096
Segment __LINKEDIT
vmaddr 0x100002000
vmsize 0x1000
fileoff 8192
filesize 624
Done.
If you want more examples of how to read Mach-O files, cctools on Apple’s Open Source Web site is probably your best bet. You’ll also want to read the Mac OS X ABI Mach-O File Format Reference.
