Does the loader rewrite absolute addresses in machine code?


Let's say I have the following: a preferred image base of 0x40000, and ASLR is in use.
Am I correct in saying that the translation would be done like this?
(Original)
0004232f 8b 45 00 MOV EAX,dword ptr [EBP]
00042332 a3 64 00 6C 00 MOV [006C0064],EAX
00042337 8b 45 04 MOV EAX,dword ptr [EBP + 0x4]
(After ASLR)
0066232f 8b 45 00 MOV EAX,dword ptr [EBP]
00662332 a3 64 20 72 00 MOV [00722064],EAX
00662337 8b 45 04 MOV EAX,dword ptr [EBP + 0x4]
Is there anything else I'm overlooking that could change the bytes on load?

In 32-bit Windows all DLL files and many (but not all) EXE files have a so-called "Base Relocation Table".
This table contains a list of the locations of all absolute addresses contained in the file.
In your example, there is an absolute address (0x006C0064) stored at address 0x00042333.
The table will contain an entry saying that there is an absolute address stored at 0x00042333.
When the executable file or dynamic library is loaded to another address, the loader will indeed "rewrite" all these addresses.
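However, "rewriting" is quite a simple operation: the same fixed value (the difference between the actual load address and the preferred image base) is simply added to all of these addresses.
Please note that this concept is quite different from the one typically used by Linux and other OSs, which mostly rely on position-independent code rather than patching absolute addresses in the code at load time.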
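The fixup described above can be sketched in a few lines of Python. This is only an illustration, not the actual Windows loader: `apply_base_relocs` and its parameters are hypothetical names, the relocation table is reduced to a plain list of image offsets of 32-bit fields, and the new base 0x660000 is inferred from the code addresses in the question.

```python
import struct

def apply_base_relocs(image: bytearray, reloc_offsets, preferred_base: int, actual_base: int) -> None:
    """Add the load delta to every 32-bit absolute address listed in reloc_offsets.

    reloc_offsets is a simplified, hypothetical stand-in for the entries of a
    base relocation table: offsets of 32-bit fields relative to the image start.
    """
    delta = actual_base - preferred_base
    for off in reloc_offsets:
        (old,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (old + delta) & 0xFFFFFFFF)

# The three instructions from the question, placed at image offset 0x232f
# (address 0x0004232f with the preferred base 0x40000).
image = bytearray(0x3000)
image[0x232F:0x232F + 11] = bytes.fromhex("8b4500" "a364006c00" "8b4504")

# The absolute address 0x006C0064 sits one byte after the a3 opcode, at 0x2333.
apply_base_relocs(image, [0x2333], preferred_base=0x40000, actual_base=0x660000)

patched = image[0x2332:0x2337].hex()
print(patched)  # a36400ce00
```

With a delta of 0x620000 the operand becomes 0x00CE0064. The 0x00722064 shown in the question would correspond to a delta of 0x62000, which does not match the delta applied to the code addresses.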


How do I implement an FFI from Rust in assembler?

My Rust code needs to make winapi FFIs and I see winapi-rs is very popular. What I need now, is to see the actual instructions of these FFIs. The binary object files are available on github (for example GLU32).
Just as an example, it contains a 663 bytes object file dsjfbs00001.o, which I'd like to disassemble and see the instructions. I've tried without giving an offset (which means it starts at 0):
objdump -b binary -Mintel,x86-64 -m i386 -D dsjfbs00001.o
This line comes from the similar question Disassembling A Flat Binary File Using objdump and I get this output (I show just the first 16 lines, it goes on for 247 lines):
dsjfbs00001.o: file format binary
Disassembly of section .data:
00000000 <.data>:
0: 64 86 07 xchg BYTE PTR fs:[rdi],al
3: 00 00 add BYTE PTR [rax],al
5: 00 00 add BYTE PTR [rax],al
7: 00 84 01 00 00 0a 00 add BYTE PTR [rcx+rax*1+0xa0000],al
e: 00 00 add BYTE PTR [rax],al
10: 00 00 add BYTE PTR [rax],al
12: 04 00 add al,0x0
14: 2e 74 65 cs je 0x7c
17: 78 74 js 0x8d
...
I have some knowledge about assembler, but here I'm at a loss. The executable code obviously doesn't start at 0 so I wonder how can I discover the correct offset?
The output shows that this is the .data section. But how can it tell? Is this a guess? A hexdump returns exactly the same bytes with no header bytes (i.e. such as an ELF file would have):
0000000 8664 0007 0000 0000 0184 0000 000a 0000
0000010 0000 0004 742e 7865 0074 0000 0000 0000
Endianness aside, it starts with 0x64, 0x86, 0x07, as seen above for the xchg opcode. So how can it tell it's a .data section? And then... where's the .text section I'm interested in? It never says there's one.
From all of this I deduce that without an actual offset it's impossible to tell where the entry point is. Actually, the initial ~600 bytes contain many zeroes, while the last ~60 bytes have the typical entropy you'd expect from executable code. But I don't know how to determine this offset exactly by searching in the winapi-rs repo (the *.def files look useless to me, they just list the available routine names).
And as an additional question, would it be feasible to create those files on my own? Can't I just take/write some assembly code, produce an object file with NASM or similar, and use that for FFIs from my Rust code? Is this even possible?
Where would I start doing something like this, if I don't even have C/C++ WinAPI header files or Visual Studio?
BTW: I really need just some ~10 functions of GLU32, not the whole winapi.
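For what it's worth, the first bytes in the hexdump look like a COFF object file header rather than raw code: 0x8664 is the COFF machine value for x86-64, and the bytes at offset 20 spell the section name .text. The sketch below decodes only the 28 bytes transcribed from the dump. `parse_coff_header` is a hypothetical helper, and treating the file as a well-formed COFF object is an assumption based on these bytes alone.

```python
import struct

# First 28 bytes of dsjfbs00001.o, transcribed from the hexdump in the question
data = bytes.fromhex(
    "6486070000000000" "840100000a000000"
    "00000400" "2e74657874000000"
)

def parse_coff_header(buf: bytes) -> dict:
    """Decode the 20-byte COFF file header and the name of the first section."""
    (machine, num_sections, timestamp, symtab_offset,
     num_symbols, opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", buf, 0)
    # Section headers follow the file header (plus the optional header, if any).
    name_start = 20 + opt_header_size
    first_section = buf[name_start:name_start + 8].rstrip(b"\0").decode("ascii")
    return {
        "machine": machine,
        "num_sections": num_sections,
        "symtab_offset": symtab_offset,
        "num_symbols": num_symbols,
        "first_section": first_section,
    }

header = parse_coff_header(data)
print(hex(header["machine"]), header["num_sections"], header["first_section"])
```

If that reading is right, the .data label in the objdump output is just the name given to the whole file when it is read with -b binary, and letting objdump parse the headers itself (no -b binary) may be the more direct route, provided the installed binutils supports this object format.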

Cannot get correct immediate mode from GNU assembler for symbols with intel_syntax [duplicate]

I have an instruction written in Intel syntax (using gas as my assembler) that looks like this:
mov rdx, msg_size
...
msg: .ascii "Hello, world!\n"
.set msg_size, . - msg
but that mov instruction is being assembled to mov 0xe,%rdx, rather than mov $0xe,%rdx, as I would expect. How should I write the first instruction (or the definition of msg_size) to get the expected behavior?

Use mov edx, OFFSET symbol to get the symbol "address" as an immediate, rather than loading from it as an address. This works for actual label addresses as well as symbols you set to an integer with .set.
For the msg address (not msg_size assemble-time constant) in 64-bit code, you may want lea rdx, [RIP+msg] for a PIE executable where static addresses don't fit in 32 bits. See: How to load address of function or label into register
In GAS .intel_syntax noprefix mode:
OFFSET symbol works like AT&T $symbol. This is somewhat like MASM.
symbol works like AT&T symbol (i.e. a dereference) for unknown symbols.
[symbol] is always an effective-address, never an immediate, in GAS and NASM/YASM. LEA doesn't load from the address but it still uses the memory-operand machine encoding. (That's why lea uses the same syntax).
Interpretation of bare symbol depends on order of declaration
GAS is a one-pass assembler (which goes back and fills in symbol values once they're known).
It decides on the opcode and encoding for mov rdx, symbol when it first encounters that line. An earlier msize= . - msg or .equ / .set will make it choose mov reg, imm32, but a later directive won't be visible yet.
The default assumption for not-yet-defined symbols is that symbol is an address in some section (like you get from defining it with a label like symbol:, or from .set symbol, .). And because GAS .intel_syntax is like MASM not NASM, a bare symbol is treated like [symbol] - a memory operand.
If you put a .set or msg_length=msg_end - msg directive at the top of your file, before the instructions that reference it, they would assemble to mov reg, imm32 mov-immediate. (Unlike in AT&T syntax where you always need a $ for an immediate even for numeric literals like 1234.)
For example: source and disassembly interleaved with objdump -dS:
Assembled with gcc -g -c foo.s and disassembled with objdump -drwC -S -Mintel foo.o (with as --version = GNU assembler (GNU Binutils) 2.34). We get this:
0000000000000000 <l1>:
.intel_syntax noprefix
l1:
mov eax, OFFSET equsym
0: b8 01 00 00 00 mov eax,0x1
mov eax, equsym #### treated as a load
5: 8b 04 25 01 00 00 00 mov eax,DWORD PTR ds:0x1
mov rax, big #### 32-bit sign-extended absolute load address, even though the constant was unsigned positive
c: 48 8b 04 25 aa aa aa aa mov rax,QWORD PTR ds:0xffffffffaaaaaaaa
mov rdi, OFFSET label
14: 48 c7 c7 00 00 00 00 mov rdi,0x0 17: R_X86_64_32S .text+0x1b
000000000000001b <label>:
label:
nop
1b: 90 nop
.equ equsym, . - label # equsym = 1
big = 0xaaaaaaaa
mov eax, OFFSET equsym
1c: b8 01 00 00 00 mov eax,0x1
mov eax, equsym #### treated as an immediate
21: b8 01 00 00 00 mov eax,0x1
mov rax, big #### constant doesn't fit in 32-bit sign extended, assembler can see it when picking encoding so it picks movabs imm64
26: 48 b8 aa aa aa aa 00 00 00 00 movabs rax,0xaaaaaaaa
It's always safe to use mov edx, OFFSET msg_size to treat any symbol (or even a numeric literal) as an immediate regardless of how it was defined. So it's exactly like AT&T $ except that it's optional when GAS already knows the symbol value is just a number, not an address in some section. For consistency it's probably a good idea to always use OFFSET msg_size so your code doesn't change meaning if some future programmer moves code around so the data section and related directives are no longer first. (Including future you who's forgotten these strange details that are unlike most assemblers.)
BTW, .set is a synonym for .equ, and there's also symbol=value syntax for setting a value which is also synonymous to .set.
Operand-size: generally use 32-bit unless a value needs 64
mov rdx, OFFSET symbol will assemble to mov r/m64, sign_extended_imm32. You don't want that for a small length (vastly less than 4GiB) unless it's a negative constant, not an address. You also don't want movabs r64, imm64 for addresses; that's inefficient.
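To make the sign-extension point concrete, here is a small Python sketch of the range check that decides whether a constant can be encoded as a sign-extended 32-bit immediate. The helper names are hypothetical, and this models only the arithmetic, not GAS itself.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def sign_extend32(value: int) -> int:
    """Treat the low 32 bits of value as signed and widen them to a 64-bit pattern."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    return value & MASK64

def fits_simm32(value: int) -> bool:
    """True if the 64-bit pattern of value survives a round trip through imm32."""
    return sign_extend32(value) == (value & MASK64)

print(hex(sign_extend32(0xAAAAAAAA)))  # 0xffffffffaaaaaaaa, as in the listing above
print(fits_simm32(0xAAAAAAAA))         # False: needs movabs r64, imm64
print(fits_simm32(14))                 # True: a small length like msg_size
```

A value that fails this check but fits in 32 unsigned bits, such as 0xAAAAAAAA, is exactly the case where writing to the 32-bit register (mov eax, imm32) gives the zero-extended result with a shorter encoding.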
It's safe under GNU/Linux to write mov edx, OFFSET symbol in a position-dependent executable, and in fact you should always do that or use lea rdx, [rip + symbol], never sign-extended 32-bit immediate unless you're writing code that will be loaded into the high 2GB of virtual address space (e.g. a kernel). See: How to load address of function or label into register
See also 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIE executables being the default in modern distros.
Tip: if you know the AT&T syntax, or the NASM syntax, for something, use that to produce the encoding you want and then disassemble with objdump -Mintel to find out the right syntax for .intel_syntax noprefix.
But that doesn't help here because disassembly will just show the numeric literal like mov edx, 123, not mov edx, OFFSET name_not_in_object_file. Looking at gcc -masm=intel compiler output can also help, but again compilers do their own constant-propagation instead of using symbols for assemble-time constants.
BTW, no open-source projects that I'm aware of contain GAS intel_syntax source code. If they use gas, they use AT&T syntax. Otherwise they use NASM/YASM. (You sometimes also see MSVC inline asm in open source projects).
Same effect in AT&T syntax, or for [RIP + symbol]
This is a lot more artificial since you wouldn't normally do this with an integer constant that wasn't an address. I include it here just to show another facet of GAS's behaviour depending on a symbol being defined or not at a point during its 1 pass.
How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? - [RIP + symbol] is interpreted as using relative addressing to reach symbol, not actually adding two addresses. But [RIP + 4] is taken literally, as an offset relative to the end of this instruction.
So again, it matters what GAS knows about a symbol when it reaches an instruction that references it, because it's 1-pass. If undefined, it assumes it's a normal symbol. If defined as a numeric value with no section associated, it works like a literal number.
_start:
foo=4
jmpq *foo(%rip)
jmpq *bar(%rip)
bar=4
That assembles to the first jump being the same as jmp *4(%rip), loading a pointer from 4 bytes past the end of the current instruction. But the 2nd jump uses a symbol relocation for bar, using a RIP-relative addressing mode to reach the absolute address of the symbol bar, whatever that may turn out to be.
0000000000000000 <.text>:
0: ff 25 04 00 00 00 jmp QWORD PTR [rip+0x4] # a <.text+0xa>
6: ff 25 00 00 00 00 jmp QWORD PTR [rip+0x0] # c <bar+0x8> 8: R_X86_64_PC32 *ABS*
After linking with ld foo.o, the executable has:
401000: ff 25 04 00 00 00 jmp *0x4(%rip) # 40100a <bar+0x401006>
401006: ff 25 f8 ef bf ff jmp *-0x401008(%rip) # 4 <bar>
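The two displacements in the linked output can be checked by hand: a RIP-relative operand is resolved relative to the end of the instruction. A minimal Python sketch of that arithmetic follows; `rip_relative_target` is a hypothetical helper name, and the addresses and bytes are copied from the disassembly above.

```python
import struct

def rip_relative_target(insn_addr: int, insn_len: int, disp32_bytes: bytes) -> int:
    """Address reached by a RIP-relative operand: end of instruction + signed disp32."""
    (disp,) = struct.unpack("<i", disp32_bytes)
    return (insn_addr + insn_len + disp) & 0xFFFFFFFFFFFFFFFF

# jmp *foo(%rip) with foo=4 already known: the 4 is taken literally
first = rip_relative_target(0x401000, 6, bytes.fromhex("04000000"))
# jmp *bar(%rip) with bar defined later: the linker aims at absolute address 4
second = rip_relative_target(0x401006, 6, bytes.fromhex("f8efbfff"))

print(hex(first), hex(second))  # 0x40100a 0x4
```

So the same source operand ends up pointing either 4 bytes past the instruction or at absolute address 4, depending only on whether the symbol was defined before the instruction was assembled.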

Calls to Addresses in the Middle of Routines

I am tracing wireshark-2.6.10 using Pin. At several points during the initialization, I can see some calls, such as this:
00000000004e9400 <__libc_csu_init##Base>:
...
4e9449: 41 ff 14 dc callq *(%r12,%rbx,8)
...
The target of this call is 0x197db0, shown here:
0000000000197cb0 <_start##Base>:
...
197db0: 55 push %rbp
197db1: 48 89 e5 mov %rsp,%rbp
197db4: 5d pop %rbp
197db5: e9 66 ff ff ff jmpq 197d20 <_start##Base+0x70>
197dba: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
...
Pin says that this is in the middle of the containing routine, i.e., _start##Base. But, when I reach this target using gdb, I see the following output:
>│0x5555556ebdb0 <frame_dummy> push %rbp
│0x5555556ebdb1 <frame_dummy+1> mov %rsp,%rbp
│0x5555556ebdb4 <frame_dummy+4> pop %rbp
│0x5555556ebdb5 <frame_dummy+5> jmpq 0x5555556ebd20 <register_tm_clones>
│0x5555556ebdba <frame_dummy+10> nopw 0x0(%rax,%rax,1)
│0x5555556ebdc0 <main_window_update()> xor %edi,%edi
Note that if I subtract the bias value, the runtime target address will be consistent with the compile time value (i.e., 0x5555556ebdb0 - 0x555555554000 = 0x197db0). It seems that there exists a pseudo-routine called frame_dummy inside _start##Base. How is that possible? How can I extract the addresses for these pseudo-routines, beforehand (i.e., before execution)?
UPDATE:
These types of calls to the middle of functions were not present in GIMP and Anjuta (which are written almost purely in C and built from source), but they are present in Inkscape and Wireshark (written in C++, although I do not think that the language is the cause; these two were installed from packages).
At first, it seemed that this situation occurs only during the initialization and before calling the main() function. But, at least in wireshark-2.6.10 this occurs at least in one place after main() starts. Here, we have wireshark-qt.cpp: Lines 522-524 (which is part of main()).
/* Get the compile-time version information string */
comp_info_str = get_compiled_version_info(get_wireshark_qt_compiled_info,
get_gui_compiled_info);
This is a call to get_compiled_version_info(). In assembly, the function is called at address 0x5555556e74c2 (0x1934c2 without bias), as shown below:
>│0x5555556e74c2 <main(int, char**)+178> callq 0x5555556f5870 <get_compiled_version_info>
│0x5555556e74c7 <main(int, char**)+183> lea 0x4972(%rip),%rdi # 0x5555556ebe40 <get_wireshark_runtime_info(_GString*)>
│0x5555556e74ce <main(int, char**)+190> mov %rax,%r13
Again, the target is in the middle of another function, _ZN7QStringD1Ev##Base:
00000000001980f0 <_ZN7QStringD1Ev##Base>:
...
1a1870: 41 54 push %r12
...
This is the output of gdb (0x5555556f5870 - 0x555555554000 = 0x1a1870):
>│0x5555556f5870 <get_compiled_version_info> push %r12
│0x5555556f5872 <get_compiled_version_info+2> mov %rdi,%r12
│0x5555556f5875 <get_compiled_version_info+5> push %rbp
│0x5555556f5876 <get_compiled_version_info+6> lea 0x349445(%rip),%rdi # 0x555555a3ecc2
As can be seen, the debugger recognizes that this address is the start address of get_compiled_version_info(). This is because it has access to debug_info. In all the cases that I found, the symbols for these pseudo-routines had been removed from the original binary (because .symtab was removed from the binary). But the strange thing is that the function is located inside _ZN7QStringD1Ev##Base. Therefore, Pin considers get_compiled_version_info() to be inside _ZN7QStringD1Ev##Base.

How is that possible?
The frame_dummy is a bona-fide C function. If Pin thinks it's in the middle of _start, it's probably because:
_start is an assembly function, and
its .st_size is set incorrectly in the symbol table.
You can confirm this by looking at readelf -Ws a.out | egrep ' (_start|frame_dummy)'.
You are probably using a binary linked with a fairly old GLIBC.
GLIBC used to generate C runtime startup files (where _start comes from) by using gcc -S to create assembly from C source, then splitting and editing the assembly with sed. Getting the .size directive wrong was one problem with that approach, and it is no longer used on x86_64 as of 2012 (commit).
How can I extract the addresses for these pseudo-routines, beforehand (i.e., before execution)?
Pin doesn't magically create these pseudo-routines, they must be visible in the readelf -Ws output of the original binary.
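How a wrong .st_size can make an address appear to be "inside" another routine can be sketched as a simple range lookup. This is only a model of the idea, not Pin's actual algorithm: `containing_symbol` is a hypothetical helper, and the symbol sizes below are assumed values, not read from the Wireshark binary.

```python
def containing_symbol(symbols, addr):
    """Return the name of the first symbol whose [value, value + size) covers addr."""
    for name, value, size in symbols:
        if value <= addr < value + size:
            return name
    return None

# Assumed table for a stripped binary: no frame_dummy entry, and _start
# carries an oversized st_size (0x200 is made up for illustration).
stripped = [("_start", 0x197CB0, 0x200)]

# Assumed table with a small _start and a separate 10-byte frame_dummy.
full = [("_start", 0x197CB0, 0x2B), ("frame_dummy", 0x197DB0, 0xA)]

# Remove the load bias from the runtime address seen in gdb.
file_addr = 0x5555556EBDB0 - 0x555555554000

print(hex(file_addr))                        # 0x197db0
print(containing_symbol(stripped, file_addr))  # _start
print(containing_symbol(full, file_addr))      # frame_dummy
```

The real values for comparison are the Value and Size columns printed by readelf -Ws for the binary in question.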

Linking two .o files together

I have two .asm files, one that calls a function inside the other. My files look like:
mainProg.asm:
global main
extern factorial
section .text
main:
;---snip---
push rcx
call factorial
pop rcx
;---snip---
ret
factorial.asm:
section .text
factorial:
cmp rdi, 0
je l2
mov rax, 1
l1:
mul rdi
dec rdi
jnz l1
ret
l2:
mov rax, 1
ret
(Yes, there's some things I could improve with the implementation.)
I tried to compile them according to the steps at How to link two nasm source files:
$ nasm -felf64 -o factorial.o factorial.asm
$ nasm -felf64 -o mainProg.o mainProg.asm
$ gcc -o mainProg mainProg.o factorial.o
The first two commands work without issue, but the last fails with
mainProg.o: In function `main':
mainProg.asm:(.text+0x22): undefined reference to `factorial'
collect2: error: ld returned 1 exit status
Changing the order of the object files doesn't change the error.
I tried searching for solutions to link two .o files, and I found the question C Makefile given two .o files. As mentioned there, I ran objdump -S factorial.o and got
factorial.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <factorial>:
0: 48 83 ff 00 cmp $0x0,%rdi
4: 74 0e je 14 <l2>
6: b8 01 00 00 00 mov $0x1,%eax
000000000000000b <l1>:
b: 48 f7 e7 mul %rdi
e: 48 ff cf dec %rdi
11: 75 f8 jne b <l1>
13: c3 retq
0000000000000014 <l2>:
14: b8 01 00 00 00 mov $0x1,%eax
19: c3 retq
which is pretty much identical to the source file. It clearly contains the factorial function, so why doesn't ld detect it? Is there a different method to link two .o files?

You need a global factorial assembler directive in factorial.asm. Without that, it's still in the symbol table, but the linker won't consider it for linking between objects.
A plain label like factorial: is halfway between a global/external symbol and a local label like .loop1: (which is not present in the object file at all). Local labels are a good way to get less messy disassembly, with one block per function instead of a separate block starting after every branch target.
Non-global symbols are only useful for disassembly and stuff like that, AFAIK. I think they would get stripped, along with debug information, by strip.
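The effect of the missing global directive can be shown with a toy model of symbol resolution. This is a deliberately simplified sketch with hypothetical names (`resolve`, the dictionary layout); real linkers also deal with weak symbols, archives and more.

```python
def resolve(undefined_name, objects):
    """Return the object that provides a GLOBAL definition of undefined_name, or None."""
    for obj_name, symtab in objects.items():
        if symtab.get(undefined_name) == "GLOBAL":
            return obj_name
    return None

# factorial.asm as posted: every label is present, but with local binding
without_global = {"factorial.o": {"factorial": "LOCAL", "l1": "LOCAL", "l2": "LOCAL"}}
# factorial.asm with "global factorial" added
with_global = {"factorial.o": {"factorial": "GLOBAL", "l1": "LOCAL", "l2": "LOCAL"}}

print(resolve("factorial", without_global))  # None -> "undefined reference"
print(resolve("factorial", with_global))     # factorial.o
```

In the real object files, the binding of each symbol shows up in the Bind column of readelf -s.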
Also, note that imul rax, rdi runs faster, because it doesn't have to store the high half of the result in %rdx, or even calculate it.
Also note that you can objdump -Mintel -d to get intel-syntax disassembly. Agner Fog's objconv is also very nice, but it's more typing because the output doesn't go to stdout by default. (Although a shell wrapper function or script can solve that.)
Anyway, this would be better:
global factorial
factorial:
mov eax, 1 ; depending on the assembler, might save a REX prefix
; early-out branch after setting rax, instead of duplicating the constant
test rdi, rdi ; test is shorter than compare-against-zero
jz .early_out
.loop: ; local label won't appear in the object file
imul rax, rdi
dec rdi
jnz .loop
.early_out:
ret
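As a sanity check of the control flow, the routine above can be mirrored line by line in Python. This is only a model under the assumption that the argument arrives in rdi as an unsigned 64-bit value; `factorial_like_asm` is a hypothetical name, and the masking stands in for imul keeping just the low 64 bits of the product.

```python
import math

MASK64 = (1 << 64) - 1

def factorial_like_asm(rdi: int) -> int:
    """Mirror of the assembly loop, with 64-bit wraparound."""
    rax = 1                           # mov eax, 1
    if rdi == 0:                      # test rdi, rdi / jz .early_out
        return rax
    while True:                       # .loop:
        rax = (rax * rdi) & MASK64    # imul rax, rdi (low 64 bits only)
        rdi = (rdi - 1) & MASK64      # dec rdi
        if rdi == 0:                  # jnz .loop falls through at zero
            return rax                # .early_out: ret

print(factorial_like_asm(0), factorial_like_asm(5))  # 1 120
```

For inputs above 20 the result no longer fits in 64 bits, so the routine returns the factorial modulo 2^64.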
Why does main push/pop rcx? If you're writing functions that follow the standard ABI (definitely a good idea unless there's a large performance gain), and you want something to survive a call, keep it in a call-preserved register like rbx.

How do I get full assembler output in gcc?

I know I can get the assembler source code generated by the compiler by using:
gcc -S ...
even though that annoyingly doesn't give me an object file as part of the process.
But how can I get everything about the compiled code? I mean addresses, the bytes generated and so forth.
The instructions output by gcc -S do not tell me anything about instruction lengths or encodings, which is what I want to see.

I like objdump for this, but the most useful options are non-obvious - especially if you're using it on an object file which contains relocations, rather than a final binary.
objdump -d some_binary does a reasonable job.
objdump -d some_object.o is less useful because calls to external functions don't get disassembled helpfully:
...
00000005 <foo>:
5: 55 push %ebp
6: 89 e5 mov %esp,%ebp
8: 53 push %ebx
...
29: c7 04 24 00 00 00 00 movl $0x0,(%esp)
30: e8 fc ff ff ff call 31 <foo+0x2c>
35: 89 d8 mov %ebx,%eax
...
The call is actually to printf()... adding the -r flag helps with that; it marks relocations. objdump -dr some_object.o gives:
...
29: c7 04 24 00 00 00 00 movl $0x0,(%esp)
2c: R_386_32 .rodata.str1.1
30: e8 fc ff ff ff call 31 <foo+0x2c>
31: R_386_PC32 printf
...
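The odd-looking call 31 <foo+0x2c> follows directly from the placeholder bytes fc ff ff ff, and the relocation explains how they get fixed up. A short Python sketch of the arithmetic, using hypothetical helper names and made-up final addresses (0x8048430 for the call and 0x8048300 for printf are assumptions for illustration only):

```python
import struct

def call_target(insn_addr: int, rel32_bytes: bytes) -> int:
    """Target of a 5-byte call rel32: address of the next instruction + rel32."""
    (rel,) = struct.unpack("<i", rel32_bytes)
    return (insn_addr + 5 + rel) & 0xFFFFFFFF

def apply_r_386_pc32(symbol_addr: int, field_addr: int, rel32_bytes: bytes) -> bytes:
    """R_386_PC32 computes S + A - P, with the addend A stored in the field itself."""
    (addend,) = struct.unpack("<i", rel32_bytes)
    return struct.pack("<I", (symbol_addr + addend - field_addr) & 0xFFFFFFFF)

placeholder = bytes.fromhex("fcffffff")          # addend of -4

# In the object file: 0x30 + 5 - 4 = 0x31, which is what objdump prints
unlinked = call_target(0x30, placeholder)

# After linking (assumed addresses): the rel32 field is one byte into the call
patched = apply_r_386_pc32(0x8048300, 0x8048430 + 1, placeholder)
linked = call_target(0x8048430, patched)

print(hex(unlinked), hex(linked))  # 0x31 0x8048300
```

The addend of -4 compensates for the fact that the relocation is applied at the address of the rel32 field, while the CPU measures the displacement from the end of the instruction, 4 bytes later.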
Then, I find it useful to see each line annotated as <symbol+offset>. objdump has a handy option for that, but it has the annoying side effect of turning off the dump of the actual bytes - objdump --prefix-addresses -dr some_object.o gives:
...
00000005 <foo> push %ebp
00000006 <foo+0x1> mov %esp,%ebp
00000008 <foo+0x3> push %ebx
...
But it turns out that you can undo that by providing another obscure option, finally arriving at my favourite objdump incantation:
objdump --prefix-addresses --show-raw-insn -dr file.o
which gives output like this:
...
00000005 <foo> 55 push %ebp
00000006 <foo+0x1> 89 e5 mov %esp,%ebp
00000008 <foo+0x3> 53 push %ebx
...
00000029 <foo+0x24> c7 04 24 00 00 00 00 movl $0x0,(%esp)
2c: R_386_32 .rodata.str1.1
00000030 <foo+0x2b> e8 fc ff ff ff call 00000031 <foo+0x2c>
31: R_386_PC32 printf
00000035 <foo+0x30> 89 d8 mov %ebx,%eax
...
And if you've built with debugging symbols (i.e. compiled with -g), and you replace the -dr with -Srl, it will attempt to annotate the output with the corresponding source lines.

The easiest way to get a quick listing is to use the -a option to the assembler, which you can do by putting -Wa,-a on the gcc command line. You can use various modifiers to the a option to affect exactly what comes out -- see the as(1) man page.

It sounds to me like you want a disassembler. objdump is pretty much the standard (otool on Mac OS X); in concert with whatever map file information your linker gives you, the disassembly of your object file should give you everything you want.

gcc will produce an assembly language source file. You can then use as -a yourfile.S to produce a listing that includes offsets and encoded bytes for each instruction. -a also has some sub-options to control what shows up in the listing file (as --help will give a list of them along with the other available options).

nasm -f elf xx.asm -l x.lst
gcc xx.c xx.o -o xx
This generates a listing file x.lst, which covers only xx.asm.
To look at xx.c together with xx.asm, compile them both and then use gdb (the GNU debugger).
