Check if mac executable has debug info - macos

I want to make sure my executable has debug info, trying the linux equivalent doesn't help:
$ file ./my_lovely_program
./my_lovely_program: Mach-O 64-bit executable arm64 # with debug info? without?
EDIT (from the answer of #haggbart)
It seems that my executable has no debug info (?)
$ dwarfdump --debug-info ./compi
./compi: file format Mach-O arm64
.debug_info contents: # <--- empty, right?
And with the other option, I'm not sure:
$ otool -hv ./compi
./compi:
Mach header
magic cputype cpusubtype caps filetype ncmds sizeofcmds flags
MH_MAGIC_64 ARM64 ALL 0x00 EXECUTE 19 1816 NOUNDEFS DYLDLINK TWOLEVEL WEAK_DEFINES BINDS_TO_WEAK PIE
This is very weird because I can perfectly debug it with lldb
(lldb) b main
Breakpoint 1: where = compi`main + 24 at main.cpp:50:9, address = 0x0000000100018650
(lldb) run
Process 6067 launched: '/Users/oren/Downloads/compi' (arm64)
Process 6067 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100018650 compi`main(argc=3, argv=0x000000016fdff7b8) at main.cpp:50:9
47 /*****************/
48 int main(int argc, char **argv)
49 {
-> 50 if (argc == 3)
51 {
52 const char *input = argv[1];
53 const char *output = argv[2];
Target 0: (compi) stopped.

Mach-O isn't like ELF: Its debug info is "sold separately" in a .dSYM file.
When you compile with -g you'll see a file gets generated along side your output, such that:
(~) gcc a.c -o /tmp/a -g2
(~) %ls -lFd /tmp/a /tmp/a.dSYM
-rwxr-xr-x 1 morpheus wheel 34078 Dec 6 12:56 /tmp/a*
drwxr-xr-x 3 morpheus wheel 96 Dec 6 12:56 /tmp/a.dSYM/
The .dSYM is a bundle (i.e. a directory structure) whose Contents/Resources/DWARF has the "companion file":
(~) %file /tmp/a.dSYM/Contents/Resources/DWARF/a
/tmp/a.dSYM/Contents/Resources/DWARF/a: Mach-O 64-bit dSYM companion file arm64
(~) %jtool2 -l /tmp/a.dSYM/Contents/Resources/DWARF/a | grep UUID
LC 00: LC_UUID UUID: BDD5C13E-F7B8-3B4D-BAF9-14DF3CD03724
(~) %jtool2 -l /tmp/a | grep UUID
LC 09: LC_UUID UUID: BDD5C13E-F7B8-3B4D-BAF9-14DF3CD03724
tools like lldb can figure out the debug data by trying for the companion file directory (usually in same location as the binary, or specified in a path), and then check the LC_UUID matches. This enables you to ship the binary without its dSym, and use the dSym when symbolicating a crash report (this is what Apple does). The debug info includes all local variable names, as well as debug_aranges (addr2line), etc:
(~) %jtool2 -l /tmp/a.dSYM/Contents/Resources/DWARF/a | grep DWARF
LC 07: LC_SEGMENT_64 Mem: 0x100009000-0x10000a000 __DWARF
Mem: 0x100009000-0x10000921f __DWARF.__debug_line
Mem: 0x10000921f-0x10000924f __DWARF.__debug_aranges
Mem: 0x10000924f-0x1000093dc __DWARF.__debug_info
Mem: 0x1000093dc-0x100009478 __DWARF.__debug_abbrev
Mem: 0x100009478-0x100009590 __DWARF.__debug_str
Mem: 0x100009590-0x1000095e8 __DWARF.__apple_names
Mem: 0x1000095e8-0x10000960c __DWARF.__apple_namespac
Mem: 0x10000960c-0x100009773 __DWARF.__apple_types
Mem: 0x100009773-0x100009797 __DWARF.__apple_objc
If you really want to get of any debug info - including, say, local function symbols (which are included by default in the binary), strip -d -x is your friend. This operates on the binary.
Note that running "dsymutil" (As suggested in other answers) can be a bit misleading, since in order to display information it will track down the accompanying dSym - which will be present on your machine, but not if you move the binary elsewhere.

If you run :
dsymutil -s ./my_lovely_propgram | grep N_OSO
and it shows output, it means there is debug info.

Related

Why does gdb does not show debug symbols of kernel with debug info?

I am trying to learn more about kernel and driver development, so for that purpose I thought to use KVM and gdb to establish debug session with custom installed kernel (v5.1.0).
The kernel has debug info included, and here is a chunk of .config I used:
$ rg -i "(debug|kalls|GDB_SCRIPTS).*=y" .config
205:CONFIG_KALLSYMS=y
206:CONFIG_KALLSYMS_ALL=y
...
225:CONFIG_SLUB_DEBUG=y
...
9620:CONFIG_DEBUG_INFO=y
9623:CONFIG_DEBUG_INFO_DWARF4=y
9624:CONFIG_GDB_SCRIPTS=y
9640:CONFIG_DEBUG_KERNEL=y
...
By using "-s" option I can connect to Ubuntu 18.04 kernel in my VM, but gdb does not show any symbols:
Reading symbols from vmlinux...
(gdb) target remote :1234
Remote debugging using :1234
0xffffffff8ea4af66 in ?? ()
(gdb) bt
#0 0xffffffff8ea4af66 in ?? ()
#1 0xffffffff8f603e38 in ?? ()
#2 0xffffffff8ea4abb2 in ?? ()
#3 0x0000000000000000 in ?? ()
(gdb) i t
Ambiguous info command "t": target, tasks, terminal, threads, tp, tracepoints, tvariables, type-printers, types.
(gdb) i threads
Id Target Id Frame
* 1 Thread 1 (CPU#0 [halted ]) 0xffffffff8ea4af66 in ?? ()
2 Thread 2 (CPU#1 [halted ]) 0xffffffff8ea4af66 in ?? ()
(gdb) b printk
Breakpoint 1 at 0xffffffff81101fa3: file /home/ilukic/projects/kernel/linux-stable/kernel/printk/printk.c, line 2030.
(gdb) c
Continuing.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0xffffffff81101fa3
Command aborted.
(gdb) disassemble 0xffffffff81101f83,100
Dump of assembler code from 0xffffffff81101f83 to 0x64:
End of assembler dump.
(gdb) disassemble 0xffffffff81101f83,+100
Dump of assembler code from 0xffffffff81101f83 to 0xffffffff81101fe7:
0xffffffff81101f83 <kmsg_dump_rewind_nolock+19>: Cannot access memory at address 0xffffffff81101f83
(gdb) disassemble 0xffffffff81101fa3,+10
Dump of assembler code from 0xffffffff81101fa3 to 0xffffffff81101fad:
0xffffffff81101fa3 <printk+0>: Cannot access memory at address 0xffffffff81101fa3
At the end, when inspecting /proc/kallsyms on VM (e.g. searching for printk symbol from previous gdb session), no symbol is found:
~$ cat /proc/kallsyms | grep "t printk"
0000000000000000 t printk_safe_log_store
0000000000000000 t printk_late_init
~$ uname -a
Linux ubuntu18 5.1.0 #2 SMP Tue Nov 12 19:01:21 CET 2019 x86_64 x86_64 x86_64 GNU/Linux
On the other hand when using objdump, "printk" can be found in vmlinux and as seen, gdb does not complain about missing symbol when setting a breakpoint.
I am assuming that installation of kernel went well as no errors were reported, still I can't explain why I can't find corresponding symbols in kallsyms.
Other thing that I find strange is when going through /proc/kallsyms why do all the lines start with 0s.
Any ideas why is gdb not showing any symbols?
As #IanAbbott suggested, CONFIG_RANDOMIZE_BASE=y (or "nokaslr" kernel command line argument)
was missing to prevent KASLR.

MachO file format - Value of `fileoff` field in LC_SEGMENT_64 load command

I compiled a simple program such as
int main()
{
return 0;
}
using Clang into an executable and asked otool to report the load commands generated by the compiler. The one I'm interested in is LC_SEGMENT_64, in particular the one that describes the __TEXT segment within the file. The description I get is this:
$ otool -lV foo
foo:
Load command 0
cmd LC_SEGMENT_64
cmdsize 72
segname __PAGEZERO
vmaddr 0x0000000000000000
vmsize 0x0000000100000000
fileoff 0
filesize 0
maxprot ---
initprot ---
nsects 0
flags (none)
Load command 1
cmd LC_SEGMENT_64
cmdsize 312
segname __TEXT
vmaddr 0x0000000100000000
vmsize 0x0000000000001000
fileoff 0
filesize 4096
maxprot rwx
initprot r-x
nsects 3
flags (none)
Section
sectname __text
segname __TEXT
addr 0x0000000100000f90
size 0x000000000000000f
offset 3984
align 2^4 (16)
reloff 0
nreloc 0
type S_REGULAR
attributes PURE_INSTRUCTIONS SOME_INSTRUCTIONS
reserved1 0
reserved2 0
My question is: why is the fileoff field in the second load command set to zero?
Apple's documentation for this field states that
The file is mapped starting at fileoff to the beginning of the segment in memory, vmaddr.
This, initially, led me to believe that this field, in conjunction with filesize, indicated the loader something like this: "Take the contents of the file from fileoff to fileoff + filesize and this is the sequence of instructions you're gonna ask the processor to run". But my assumption doesn't hold if this value is zero, of course.
I thought that, since the segment has at least one section, the loader will use the value of the respective offset in the section's description to locate the code to run, and hence such value isn't exactly needed --- we can see that, in fact, the first section within this segment has a value for the offset field (in this case 3984, which I validated with otool -s __TEXT __text -j foo and indeed refers to the offset at which this section is located within the file).
But, if I do the same thing to the object file generated from the same source file (i.e. a file with type MH_OBJECT instead of MH_EXECUTE), the result I get is this:
$ otool -lV foo.o
foo.o:
Load command 0
cmd LC_SEGMENT_64
cmdsize 312
segname
vmaddr 0x0000000000000000
vmsize 0x0000000000000070
fileoff 464
filesize 112
maxprot rwx
initprot rwx
nsects 3
flags (none)
Section
sectname __text
segname __TEXT
addr 0x0000000000000000
size 0x000000000000000f
offset 464
align 2^4 (16)
reloff 0
nreloc 0
type S_REGULAR
attributes PURE_INSTRUCTIONS SOME_INSTRUCTIONS
reserved1 0
reserved2 0
In this case, the load command does have a value for its fileoff field, which is the same as the one for its first section, __text.
otool makes it hard to realize, but the answer is simple - Observe here:
$ jtool -v -l /tmp/a | grep SEG
LC 00: LC_SEGMENT_64 Mem: 0x000000000-0x100000000 File: Not Mapped ---/--__PAGEZERO
LC 01: LC_SEGMENT_64 Mem: 0x100000000-0x100001000 File: 0x0-0x1000 r-x/rw__TEXT
LC 02: LC_SEGMENT_64 Mem: 0x100001000-0x100002000 File: 0x1000-0x1098 r--/rw__LINKEDIT
The __TEXT segment is mapped from the beginning of the file (or slice, if fat ("universal")). That is, with the Mach-O header. This is actually a feature, because the Mach-O then gets parsed by dyld (your friendly loader) for other load commands (notably libraries). The other issue is that __TEXT.__text is often in the very same page , so you'd have to map the whole page anyway.

Why do common C compilers include the source filename in the output?

I have learnt from this recent answer that gcc and clang include the source filename somewhere in the binary as metadata, even when debugging is not enabled.
I can't really understand why this should be a good idea. Besides the tiny privacy risks, this happens also when one optimizes for the size of the resulting binary (-Os), which looks inefficient.
Why do the compilers include this information?
The reason why GCC includes the filename is mainly for debugging purposes, because it allows a programmer to identify from which source file a given symbol comes from as (tersely) outlined in the ELF spec p1-17 and further expanded upon in some Oracle docs on linking.
An example of using the STT_FILE section is given by this SO question.
I'm still confused why both GCC and Clang still include it even if you specify -g0, but you can stop it from including STT_FILE with -s. I couldn't find any explanation for this, nor could I find an "official reason" why STT_FILE is included in the ELF specification (which is very terse).
I have learnt from this recent answer that gcc includes the source filename somewhere in the binary as metadata, even when debugging is not enabled.
Not quite. In modern ELF object files the file name indeed is a symbol of type FILE:
$ readelf bignum.o # Source bignum.c
[...]
Symbol table (.symtab) contains 36 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS bignum.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 8
9: 00000000000003f0 172 FUNC GLOBAL DEFAULT 1 add
10: 00000000000004a0 104 FUNC GLOBAL DEFAULT 1 copy
However, once stripped, the symbol is gone:
$ strip bignum.o
$ readelf -all bignum.o | grep bignum.c
$
So to keep your privacy, strip the executable, or compile/link with -s.

Documented way to disable ASLR on OS X?

On OS X 10.9 (Mavericks), it's possible to disable address space layout randomization for a single process if you launch the process by calling posix_spawn() and passing the undocumented attribute 0x100. Like this:
extern char **environ;
pid_t pid;
posix_spawnattr_t attr;
posix_spawnattr_init(&attr);
posix_spawnattr_setflags(&attr, 0x100);
posix_spawn(&pid, argv[0], NULL, &attr, argv, environ);
(This is reverse-engineered from Apple's GDB sources.)
The trouble with undocumented features like this is that they tend to disappear without notice. According to this Stack Overflow answer the dynamic linker dyld used to consult the environment variable DYLD_NO_PIE, but this does not work in 10.9; similarly the static linker apparently used to take a --no-pie option, but this is no longer the case.
So is there a documented way to disable ASLR?
(The reason why I need to disable ASLR is to ensure repeatability, when testing and debugging, of code whose behaviour depends on the addresses of objects, for example address-based hash tables and BIBOP-based memory managers.)
Actually, there still is a -no_pie linker flag, but you might have thought it is actually called --no-pie.
Lets have a small test program:
#include <stdio.h>
const char *test = "test";
int main() {
printf("%p\n", (void*)test);
return 0;
}
And compile as usual first:
cc -o test-pie test-pie.c
And check the flags
$ otool -hv test-pie
test-pie:
Mach header
magic cputype cpusubtype caps filetype ncmds sizeofcmds flags
MH_MAGIC_64 X86_64 ALL LIB64 EXECUTE 16 1376 NOUNDEFS DYLDLINK TWOLEVEL PIE
Alright there is a PIE flag in there, let's verify
$ for x in $(seq 1 5); do echo -n "$x "; ./test-pie; done
1 0x10a447f96
2 0x10e3cbf96
3 0x1005daf96
4 0x10df50f96
5 0x104e63f96
That looks random enough.
Now, lets tell the linker we don't want PIE using -Wl,-no_pie:
cc -o test-nopie test-pie.c -Wl,-no_pie
Sure enough the PIE flag is gone:
$ otool -hv test-nopie
test-pie:
Mach header
magic cputype cpusubtype caps filetype ncmds sizeofcmds flags
MH_MAGIC_64 X86_64 ALL LIB64 EXECUTE 16 1376 NOUNDEFS DYLDLINK TWOLEVEL
And test:
$ for x in $(seq 1 5); do echo -n "$x "; ./test-nopie; done
1 0x100000f96
2 0x100000f96
3 0x100000f96
4 0x100000f96
5 0x100000f96
So we make the linker not add the PIE flag, and my Mavericks system seems to still abide by it.
FWIW, the PIE flag is defined and documented in /usr/include/mach-o/loader.h as MH_PIE.
There are tools all over the Internet to clear the PIE flag from existing binaries, e.g. http://src.chromium.org/svn/trunk/src/build/mac/change_mach_o_flags.py
While I cannot offer you a documented way to start a PIE-flagged binary without ASLR, since you want to test code, presumably your own, just linking your test programs with -no_pie or removing the PIE flag afterwards from your test binaries should suffice?

Validating application of -bind_at_load on OS X

I have a binary that I have linked with the -bind_at_load argument to ld. On an ELF system, I'd use -Wl,-z,now and then readelf to verify that the DT_BIND_NOW flag was enabled on the binary. On OS X, how can I verify that the the appropriate flag in the mach header has been ste to honor -bind_at_load? What is the name of the flag, and what value should it be set to?
You can use otool -l /path/to/binary and inspect the LC_DYLD_INFO_ONLY load command. If the binary was linked with -bind_at_load, then the lazy bind offset/size are equal to 0: dyld won’t lazily bind symbols and all symbols are bound when the binary is loaded.
Sample output:
With -bind_at_load
Load command 4
cmd LC_DYLD_INFO_ONLY
cmdsize 48
rebase_off 8192
rebase_size 8
bind_off 8200
bind_size 224
weak_bind_off 0
weak_bind_size 0
lazy_bind_off 0
lazy_bind_size 0
export_off 8424
export_size 48
Without -bind_at_load
Load command 4
cmd LC_DYLD_INFO_ONLY
cmdsize 48
rebase_off 8192
rebase_size 8
bind_off 8200
bind_size 128
weak_bind_off 0
weak_bind_size 0
lazy_bind_off 8328
lazy_bind_size 104
export_off 8432
export_size 48

Resources