I have two .asm files, one that calls a function inside the other. My files look like:
mainProg.asm:
global main
extern factorial
section .text
main:
;---snip---
push rcx
call factorial
pop rcx
;---snip---
ret
factorial.asm:
section .text
factorial:
cmp rdi, 0
je l2
mov rax, 1
l1:
mul rdi
dec rdi
jnz l1
ret
l2:
mov rax, 1
ret
(Yes, there are some things I could improve in the implementation.)
I tried to compile them according to the steps at How to link two nasm source files:
$ nasm -felf64 -o factorial.o factorial.asm
$ nasm -felf64 -o mainProg.o mainProg.asm
$ gcc -o mainProg mainProg.o factorial.o
The first two commands work without issue, but the last fails with
mainProg.o: In function `main':
mainProg.asm:(.text+0x22): undefined reference to `factorial'
collect2: error: ld returned 1 exit status
Changing the order of the object files doesn't change the error.
I tried searching for solutions to link two .o files, and I found the question C Makefile given two .o files. As mentioned there, I ran objdump -S factorial.o and got
factorial.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <factorial>:
0: 48 83 ff 00 cmp $0x0,%rdi
4: 74 0e je 14 <l2>
6: b8 01 00 00 00 mov $0x1,%eax
000000000000000b <l1>:
b: 48 f7 e7 mul %rdi
e: 48 ff cf dec %rdi
11: 75 f8 jne b <l1>
13: c3 retq
0000000000000014 <l2>:
14: b8 01 00 00 00 mov $0x1,%eax
19: c3 retq
which is pretty much identical to the source file. It clearly contains the factorial function, so why doesn't ld detect it? Is there a different method to link two .o files?
You need a global factorial assembler directive in factorial.asm. Without that, it's still in the symbol table, but the linker won't consider it for linking between objects.
A label like factorial: is halfway between a global/external symbol and what a local label like .loop1: would make (not present in the object file at all). Local labels are a good way to get less messy disassembly, with one block per function instead of a separate block starting after every branch target.
Non-global symbols are only useful for disassembly and stuff like that, AFAIK. I think they would get stripped, along with debug information, by strip.
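As a rough illustration of that rule, here is a toy Python model (not how ld is actually implemented): an undefined reference in one object is only satisfied by a symbol that another object defines with global binding. The dictionary layout and the name unresolved_references are made up for this sketch, and it ignores weak symbols, archives, and shared libraries.

```python
def unresolved_references(objects):
    """Return (object name, symbol) pairs that no object exports globally."""
    exported = set()
    for obj in objects:
        for name, binding in obj["defined"].items():
            if binding == "GLOBAL":      # only global definitions are visible across objects
                exported.add(name)
    missing = []
    for obj in objects:
        for name in obj["undefined"]:    # symbols declared with extern
            if name not in exported:
                missing.append((obj["name"], name))
    return missing


main_obj = {"name": "mainProg.o", "defined": {"main": "GLOBAL"}, "undefined": ["factorial"]}

# factorial.asm without "global factorial": the label is in the symbol table, but local.
fact_local = {"name": "factorial.o",
              "defined": {"factorial": "LOCAL", "l1": "LOCAL", "l2": "LOCAL"},
              "undefined": []}

# factorial.asm with "global factorial" added.
fact_global = {"name": "factorial.o",
               "defined": {"factorial": "GLOBAL", "l1": "LOCAL", "l2": "LOCAL"},
               "undefined": []}

print(unresolved_references([main_obj, fact_local]))
print(unresolved_references([main_obj, fact_global]))
```

The first call reports factorial as missing from mainProg.o, mirroring the "undefined reference" error; the second reports nothing.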
Also, note that imul rax, rdi runs faster than the one-operand mul rdi, because it doesn't have to store the high half of the result in rdx, or even calculate it.
Also note that you can objdump -Mintel -d to get intel-syntax disassembly. Agner Fog's objconv is also very nice, but it's more typing because the output doesn't go to stdout by default. (Although a shell wrapper function or script can solve that.)
Anyway, this would be better:
global factorial
factorial:
mov eax, 1 ; depending on the assembler, might save a REX prefix
; early-out branch after setting rax, instead of duplicating the constant
test rdi, rdi ; test is shorter than compare-against-zero
jz .early_out
.loop: ; local label won't appear in the object file
imul rax, rdi
dec rdi
jnz .loop
.early_out:
ret
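To sanity-check the control flow of that version, here is a small Python model of it. This is only a sketch: factorial_model is a made-up name, and the 64-bit masking stands in for the fact that imul rax, rdi keeps only the low 64 bits of the product.

```python
MASK64 = (1 << 64) - 1


def factorial_model(rdi):
    """Step through the same instruction sequence as the asm version."""
    rax = 1                              # mov eax, 1
    if rdi == 0:                         # test rdi, rdi / jz .early_out
        return rax
    while True:
        rax = (rax * rdi) & MASK64       # imul rax, rdi (low 64 bits only)
        rdi = (rdi - 1) & MASK64         # dec rdi
        if rdi == 0:                     # jnz .loop falls through when rdi hits 0
            break
    return rax                           # .early_out: ret


print(factorial_model(0))
print(factorial_model(5))
print(factorial_model(20))
```

20! is the largest factorial that fits in 64 bits; for larger inputs the model wraps modulo 2^64, just as the register would.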
Why does main push/pop rcx? If you're writing functions that follow the standard ABI (definitely a good idea unless there's a large performance gain), and you want something to survive a call, keep it in a call-preserved register like rbx.
Related
During the compilation process, the linker maps our code text content into the .text in the code memory section. I would like to know what is the meaning of the text content, does it mean the actual code in text or in assembly?
Thanks a lot!
does it mean the actual code in text or in assembly?
Neither: it's actual code in machine instructions.
For example:
$ cat > t.c
int foo() { return 42; }
$ gcc -c t.c
$ objdump -d t.o
t.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 2a 00 00 00 mov $0x2a,%eax
9: 5d pop %rbp
a: c3 retq
The contents of the .text section are the following 11 bytes:
554889e5b82a0000005dc3
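If you want to convince yourself that those bytes really are the instructions in the listing, a few lines of Python will do. This is just a sketch that slices the hex string by hand at the offsets objdump printed; it is not a disassembler.

```python
text = bytes.fromhex("554889e5b82a0000005dc3")

print(len(text))                         # number of bytes in .text

# Offset 4 holds opcode b8 (mov imm32 into eax); the next 4 bytes are the
# little-endian immediate.
opcode = text[4]
imm32 = int.from_bytes(text[5:9], "little")
print(hex(opcode), imm32)
```

The immediate decodes to 42 (0x2a), the value foo() returns in the C source.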
Update:
is it correct to say that it contains the assembly code which converted into machine readable binary format?
You could say that, but it's not very clear.
Perhaps "contains machine readable binary instructions, produced by compiling and assembling the program source, and applying relocations". (That last part is what the linker does, not demonstrated in the example above.)
I have an instruction written in Intel syntax (using gas as my assembler) that looks like this:
mov rdx, msg_size
...
msg: .ascii "Hello, world!\n"
.set msg_size, . - msg
but that mov instruction is being assembled to mov 0xe,%rdx, rather than mov $0xe,%rdx, as I would expect. How should I write the first instruction (or the definition of msg_size) to get the expected behavior?
Use mov edx, OFFSET symbol to get the symbol "address" as an immediate, rather than loading from it as an address. This works for actual label addresses as well as symbols you set to an integer with .set.
For the msg address (not msg_size assemble-time constant) in 64-bit code, you may want
lea rdx, [RIP+msg] for a PIE executable where static addresses don't fit in 32 bits. How to load address of function or label into register
In GAS .intel_syntax noprefix mode:
OFFSET symbol works like AT&T $symbol. This is somewhat like MASM.
symbol works like AT&T symbol (i.e. a dereference) for unknown symbols.
[symbol] is always an effective-address, never an immediate, in GAS and NASM/YASM. LEA doesn't load from the address but it still uses the memory-operand machine encoding. (That's why lea uses the same syntax).
Interpretation of bare symbol depends on order of declaration
GAS is a one-pass assembler (which goes back and fills in symbol values once they're known).
It decides on the opcode and encoding for mov rdx, symbol when it first encounters that line. An earlier msize= . - msg or .equ / .set will make it choose mov reg, imm32, but a later directive won't be visible yet.
The default assumption for not-yet-defined symbols is that symbol is an address in some section (like you get from defining it with a label like symbol:, or from .set symbol, .). And because GAS .intel_syntax is like MASM not NASM, a bare symbol is treated like [symbol] - a memory operand.
If you put a .set or msg_length=msg_end - msg directive at the top of your file, before the instructions that reference it, they would assemble to mov reg, imm32 mov-immediate. (Unlike in AT&T syntax where you always need a $ for an immediate even for numeric literals like 1234.)
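The order dependence can be mimicked with a toy one-pass model in Python. To be clear, this is not GAS's implementation: classify_bare_symbol_operands is a made-up helper that only understands ".set name, number" and "mov reg, operand" lines, and it only records whether the operand would be treated as an immediate or as a memory reference at the moment the line is seen.

```python
def classify_bare_symbol_operands(lines):
    """One pass, top to bottom: decide imm vs. mem when each mov is first seen."""
    absolute = {}                        # symbols already set to a plain number
    decisions = []
    for line in lines:
        line = line.strip()
        if line.startswith(".set "):
            name, value = line[len(".set "):].split(",")
            absolute[name.strip()] = int(value.strip(), 0)
        elif line.startswith("mov "):
            _dest, src = [part.strip() for part in line[len("mov "):].split(",")]
            if src.startswith("OFFSET "):
                decisions.append("immediate")    # OFFSET always forces an immediate
            elif src in absolute:
                decisions.append("immediate")    # value already known to be a number
            else:
                decisions.append("memory")       # unknown symbol: assumed to be an address
    return decisions


source = [
    "mov eax, equsym",          # before the definition
    "mov eax, OFFSET equsym",   # OFFSET works either way
    ".set equsym, 1",
    "mov eax, equsym",          # after the definition
]
print(classify_bare_symbol_operands(source))
```

The same bare operand is classified differently before and after the .set line, while the OFFSET form gives the same result in both places.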
For example: source and disassembly interleaved with objdump -dS:
Assembled with gcc -g -c foo.s and disassembled with objdump -drwC -S -Mintel foo.o (with as --version = GNU assembler (GNU Binutils) 2.34). We get this:
0000000000000000 <l1>:
.intel_syntax noprefix
l1:
mov eax, OFFSET equsym
0: b8 01 00 00 00 mov eax,0x1
mov eax, equsym #### treated as a load
5: 8b 04 25 01 00 00 00 mov eax,DWORD PTR ds:0x1
mov rax, big #### 32-bit sign-extended absolute load address, even though the constant was unsigned positive
c: 48 8b 04 25 aa aa aa aa mov rax,QWORD PTR ds:0xffffffffaaaaaaaa
mov rdi, OFFSET label
14: 48 c7 c7 00 00 00 00 mov rdi,0x0 17: R_X86_64_32S .text+0x1b
000000000000001b <label>:
label:
nop
1b: 90 nop
.equ equsym, . - label # equsym = 1
big = 0xaaaaaaaa
mov eax, OFFSET equsym
1c: b8 01 00 00 00 mov eax,0x1
mov eax, equsym #### treated as an immediate
21: b8 01 00 00 00 mov eax,0x1
mov rax, big #### constant doesn't fit in 32-bit sign extended, assembler can see it when picking encoding so it picks movabs imm64
26: 48 b8 aa aa aa aa 00 00 00 00 movabs rax,0xaaaaaaaa
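The two different treatments of big = 0xaaaaaaaa in that listing come down to 32-bit sign extension. A short Python sketch of the arithmetic (the helper names are made up for illustration):

```python
def sign_extend_32_to_64(value32):
    """64-bit value the CPU produces from a sign-extended 32-bit field."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:             # top bit set: upper 32 bits become ones
        return value32 | 0xFFFFFFFF00000000
    return value32


def fits_sign_extended_imm32(value64):
    """True if an unsigned 64-bit value survives a round trip through imm32."""
    return sign_extend_32_to_64(value64 & 0xFFFFFFFF) == value64


big = 0xAAAAAAAA
print(hex(sign_extend_32_to_64(big)))    # address used by the early "load" form
print(fits_sign_extended_imm32(big))     # False, hence movabs with imm64
print(fits_sign_extended_imm32(1))       # True, a plain imm32 is enough
```

The first result, 0xffffffffaaaaaaaa, is the address shown in the disassembly of the early mov rax, big.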
It's always safe to use mov edx, OFFSET msg_size to treat any symbol (or even a numeric literal) as an immediate regardless of how it was defined. So it's exactly like AT&T $ except that it's optional when GAS already knows the symbol value is just a number, not an address in some section. For consistency it's probably a good idea to always use OFFSET msg_size so your code doesn't change meaning if some future programmer moves code around so the data section and related directives are no longer first. (Including future you who's forgotten these strange details that are unlike most assemblers.)
BTW, .set is a synonym for .equ, and there's also symbol=value syntax for setting a value which is also synonymous to .set.
Operand-size: generally use 32-bit unless a value needs 64
mov rdx, OFFSET symbol will assemble to mov r/m64, sign_extended_imm32. You don't want that for a small length (vastly less than 4GiB) unless it's a negative constant, not an address. You also don't want movabs r64, imm64 for addresses; that's inefficient.
It's safe under GNU/Linux to write mov edx, OFFSET symbol in a position-dependent executable, and in fact you should always do that or use lea rdx, [rip + symbol], never sign-extended 32-bit immediate unless you're writing code that will be loaded into the high 2GB of virtual address space (e.g. a kernel). How to load address of function or label into register
See also 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIE executables being the default in modern distros.
Tip: if you know the AT&T or NASM syntax for something, use that to produce the encoding you want and then disassemble with objdump -Mintel to find out the right syntax for .intel_syntax noprefix.
But that doesn't help here because disassembly will just show the numeric literal like mov edx, 123, not mov edx, OFFSET name_not_in_object_file. Looking at gcc -masm=intel compiler output can also help, but again compilers do their own constant-propagation instead of using symbols for assemble-time constants.
BTW, no open-source projects that I'm aware of contain GAS intel_syntax source code. If they use gas, they use AT&T syntax. Otherwise they use NASM/YASM. (You sometimes also see MSVC inline asm in open source projects).
Same effect in AT&T syntax, or for [RIP + symbol]
This is a lot more artificial since you wouldn't normally do this with an integer constant that wasn't an address. I include it here just to show another facet of GAS's behaviour depending on a symbol being defined or not at a point during its 1 pass.
How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? - [RIP + symbol] is interpreted as using relative addressing to reach symbol, not actually adding two addresses. But [RIP + 4] is taken literally, as an offset relative to the end of this instruction.
So again, it matters what GAS knows about a symbol when it reaches an instruction that references it, because it's 1-pass. If undefined, it assumes it's a normal symbol. If defined as a numeric value with no section associated, it works like a literal number.
_start:
foo=4
jmpq *foo(%rip)
jmpq *bar(%rip)
bar=4
That assembles to the first jump being the same as jmp *4(%rip), loading a pointer from 4 bytes past the end of the current instruction. But the 2nd jump uses a symbol relocation for bar, using a RIP-relative addressing mode to reach the absolute address of the symbol bar, whatever that may turn out to be.
0000000000000000 <.text>:
0: ff 25 04 00 00 00 jmp QWORD PTR [rip+0x4] # a <.text+0xa>
6: ff 25 00 00 00 00 jmp QWORD PTR [rip+0x0] # c <bar+0x8> 8: R_X86_64_PC32 *ABS*
After linking with ld foo.o, the executable has:
401000: ff 25 04 00 00 00 jmp *0x4(%rip) # 40100a <bar+0x401006>
401006: ff 25 f8 ef bf ff jmp *-0x401008(%rip) # 4 <bar>
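The displacement bytes in the linked executable can be checked by hand: a RIP-relative operand is the address of the next instruction plus a signed 32-bit displacement. A Python sketch of that calculation (rip_relative_target is a made-up helper name; both jumps here are 6 bytes long with the disp32 in the last 4 bytes):

```python
def rip_relative_target(insn_addr, insn_len, disp32_bytes):
    """Address referenced by a RIP-relative operand with the given disp32."""
    disp = int.from_bytes(disp32_bytes, "little", signed=True)
    return insn_addr + insn_len + disp   # RIP = address of the *next* instruction


# 401000: ff 25 04 00 00 00    literal +4, taken relative to the end of the insn
first = rip_relative_target(0x401000, 6, bytes.fromhex("04000000"))

# 401006: ff 25 f8 ef bf ff    relocated so that it reaches the absolute value bar = 4
second = rip_relative_target(0x401006, 6, bytes.fromhex("f8efbfff"))

print(hex(first), hex(second))
```

The first jump reads its pointer from 0x40100a, and the second from absolute address 4, matching the comments objdump printed.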
I am tracing wireshark-2.6.10 using Pin. At several points during the initialization, I can see some calls, such as this:
00000000004e9400 <__libc_csu_init##Base>:
...
4e9449: 41 ff 14 dc callq *(%r12,%rbx,8)
...
The target of this call is 0x197db0, shown here:
0000000000197cb0 <_start##Base>:
...
197db0: 55 push %rbp
197db1: 48 89 e5 mov %rsp,%rbp
197db4: 5d pop %rbp
197db5: e9 66 ff ff ff jmpq 197d20 <_start##Base+0x70>
197dba: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
...
Pin says that this is in the middle of the containing routine, i.e., _start##Base. But, when I reach this target using gdb, I see the following output:
>│0x5555556ebdb0 <frame_dummy> push %rbp
│0x5555556ebdb1 <frame_dummy+1> mov %rsp,%rbp
│0x5555556ebdb4 <frame_dummy+4> pop %rbp
│0x5555556ebdb5 <frame_dummy+5> jmpq 0x5555556ebd20 <register_tm_clones>
│0x5555556ebdba <frame_dummy+10> nopw 0x0(%rax,%rax,1)
│0x5555556ebdc0 <main_window_update()> xor %edi,%edi
Note that if I subtract the bias value, the runtime target address will be consistent with the compile time value (i.e., 0x5555556ebdb0 - 0x555555554000 = 0x197db0). It seems that there exists a pseudo-routine called frame_dummy inside _start##Base. How is that possible? How can I extract the addresses for these pseudo-routines, beforehand (i.e., before execution)?
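The bias arithmetic can be checked mechanically. A small Python sketch, assuming the load bias 0x555555554000 seen in this gdb session (the bias can differ between runs when address randomization is enabled):

```python
LOAD_BIAS = 0x555555554000               # assumed: the bias observed in this gdb session


def to_link_time(runtime_addr):
    """Map a runtime address back to the address seen in the on-disk disassembly."""
    return runtime_addr - LOAD_BIAS


print(hex(to_link_time(0x5555556EBDB0)))   # frame_dummy in gdb
print(hex(to_link_time(0x5555556EBD20)))   # the jmpq target, register_tm_clones in gdb
```

Both results match the static listing: 0x197db0 is the call target and 0x197d20 is the jmpq destination.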
UPDATE:
These types of calls to the middle of functions were not present in GIMP and Anjuta (which are written almost purely in C and built from source). But are present in Inkscape and Wireshark (written in C++, although I do not think that the language is the cause. These two were installed from packages.).
At first, it seemed that this situation occurs only during the initialization and before calling the main() function. But, at least in wireshark-2.6.10 this occurs at least in one place after main() starts. Here, we have wireshark-qt.cpp: Lines 522-524 (which is part of main()).
/* Get the compile-time version information string */
comp_info_str = get_compiled_version_info(get_wireshark_qt_compiled_info,
get_gui_compiled_info);
This is a call to get_compiled_version_info(). In assembly, the function is called at address 0x5555556e74c2 (0x1934c2 without bias), as shown below:
>│0x5555556e74c2 <main(int, char**)+178> callq 0x5555556f5870 <get_compiled_version_info>
│0x5555556e74c7 <main(int, char**)+183> lea 0x4972(%rip),%rdi # 0x5555556ebe40 <get_wireshark_runtime_info(_GString*)>
│0x5555556e74ce <main(int, char**)+190> mov %rax,%r13
Again, the target is in the middle of another function, _ZN7QStringD1Ev##Base:
00000000001980f0 <_ZN7QStringD1Ev##Base>:
...
1a1870: 41 54 push %r12
...
This is the output of gdb (0x5555556f5870 - 0x555555554000 = 0x1a1870):
>│0x5555556f5870 <get_compiled_version_info> push %r12
│0x5555556f5872 <get_compiled_version_info+2> mov %rdi,%r12
│0x5555556f5875 <get_compiled_version_info+5> push %rbp
│0x5555556f5876 <get_compiled_version_info+6> lea 0x349445(%rip),%rdi # 0x555555a3ecc2
As can be seen, the debugger recognizes that this address is the start address of get_compiled_version_info(). This is because it has access to debug_info. In all cases that I found, the symbol for these pseudo-routines were removed from the original binary (because .symtab was removed from the binary). But the strange thing is that it is located inside _ZN7QStringD1Ev##Base. Therefore, Pin considers get_compiled_version_info() to be inside _ZN7QStringD1Ev##Base.
How is that possible?
The frame_dummy is a bona-fide C function. If Pin thinks it's in the middle of _start, it's probably because:
_start is an assembly function, and
its .st_size is set incorrectly in the symbol table.
You can confirm this by looking at readelf -Ws a.out | egrep ' (_start|frame_dummy)'.
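To see why a wrong .st_size would produce exactly this symptom, here is a Python sketch of one way a tool could attribute an address to a routine using only symbol start and size. This is an assumption about the general technique, not Pin's actual code, and the _start sizes below are hypothetical values chosen for illustration; only the two start addresses and the 10-byte length of frame_dummy come from the listings in the question.

```python
def containing_symbol(symbols, addr):
    """Return the first symbol whose [start, start + size) range covers addr."""
    for name, start, size in symbols:
        if start <= addr < start + size:
            return name
    return None


# Hypothetical: _start with an oversized st_size, and no frame_dummy symbol present.
bad_table = [("_start", 0x197CB0, 0x200)]

# Hypothetical: a plausible _start size, plus frame_dummy (0x197db0..0x197db9).
good_table = [("_start", 0x197CB0, 0x2B), ("frame_dummy", 0x197DB0, 0xA)]

print(containing_symbol(bad_table, 0x197DB0))
print(containing_symbol(good_table, 0x197DB0))
```

With the oversized entry the address is reported as part of _start; with sane sizes it resolves to frame_dummy.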
You are probably using a binary linked against a fairly old GLIBC.
GLIBC used to generate C runtime startup files (whence _start comes) by using gcc -S to create assembly from C source, then splitting and editing the assembly with sed. Getting the .size directive wrong was one problem with that approach, and it is no longer used on x86_64 as of 2012 (commit).
How can I extract the addresses for these pseudo-routines, beforehand (i.e., before execution)?
Pin doesn't magically create these pseudo-routines, they must be visible in the readelf -Ws output of the original binary.
To the best of my knowledge, x86-64 requires the stack to be 16-byte aligned before a call, while gcc with -m32 doesn't require this for main.
I have the following testing code:
.data
intfmt: .string "int: %d\n"
testint: .int 20
.text
.globl main
main:
mov %esp, %ebp
push testint
push $intfmt
call printf
mov %ebp, %esp
ret
Build with as --32 test.S -o test.o && gcc -m32 test.o -o test. I am aware that syscall write exists, but to my knowledge it cannot print ints and floats the way printf can.
After entering main, a 4 byte return address is on the stack. Then interpreting this code naively, the two push instructions each put 4 bytes on the stack, so call needs another 4 byte value pushed to be aligned.
Here is the objdump of the binary generated by gas and gcc:
0000053d <main>:
53d: 89 e5 mov %esp,%ebp
53f: ff 35 1d 20 00 00 pushl 0x201d
545: 68 14 20 00 00 push $0x2014
54a: e8 fc ff ff ff call 54b <main+0xe>
54f: 89 ec mov %ebp,%esp
551: c3 ret
552: 66 90 xchg %ax,%ax
554: 66 90 xchg %ax,%ax
556: 66 90 xchg %ax,%ax
558: 66 90 xchg %ax,%ax
55a: 66 90 xchg %ax,%ax
55c: 66 90 xchg %ax,%ax
55e: 66 90 xchg %ax,%ax
I am very confused about the push instructions generated.
If two 4 byte values are pushed, how is alignment achieved?
Why is 0x2014 pushed instead of 0x14? What is 0x201d?
What does call 54b even achieve? Output of hd matches objdump. Why is this different in gdb? Is this the dynamic linker?
B+>│0x5655553d <main> mov %esp,%ebp │
│0x5655553f <main+2> pushl 0x5655701d │
│0x56555545 <main+8> push $0x56557014 │
│0x5655554a <main+13> call 0xf7e222d0 <printf> │
│0x5655554f <main+18> mov %ebp,%esp │
│0x56555551 <main+20> ret
Resources on what goes on when a binary is actually executed are appreciated, since I don't know what's actually going on and the tutorials I've read don't cover it. I'm in the process of reading through How programs get run: ELF binaries.
The i386 System V ABI does guarantee / require 16 byte stack alignment before a call, like I said at the top of my answer that you linked. (Unless you're calling a private helper function, in which case you can make up your own rules for alignment, arg-passing, and which registers are clobbered for that function.)
Functions are allowed to crash or misbehave if you violate this ABI requirement, but are not required to. e.g. scanf in x86-64 Ubuntu glibc (as compiled by recent gcc) only recently started doing that: scanf Segmentation faults when called from a function that doesn't change RSP
Functions can depend on stack alignment for performance (to align a double or array of doubles to avoid cache-line splits when accessing them).
Usually the only case where a function depends on stack alignment for correctness is when compiled to use SSE/SSE2, so it can use 16-byte alignment-required loads/stores to copy a struct or array (movaps or movdqa), or to actually auto-vectorize a loop over a local array.
I think Ubuntu doesn't compile their 32-bit libraries with SSE (except functions like memcpy that use runtime dispatching), so they can still work on ancient CPUs like Pentium II. Multiarch libraries on an x86-64 system should assume SSE2, but with 4-byte pointers it's less likely that 32-bit functions would have 16 byte structs to copy.
Anyway, whatever the reason, obviously printf in your 32-bit build of glibc doesn't actually depend on 16-byte stack alignment for correctness, so it doesn't fault even when you misalign the stack.
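For reference, the padding the ABI asks for is simple arithmetic. A Python sketch, assuming the stack pointer was 16-byte aligned immediately before the call instruction that entered your function (padding_before_call is a made-up helper name):

```python
def padding_before_call(arg_bytes, return_address_size=4, already_pushed=0, alignment=16):
    """Extra bytes to subtract from the stack pointer so it is aligned at the next call."""
    used = return_address_size + already_pushed + arg_bytes
    return (-used) % alignment


# The 32-bit main in the question: 4-byte return address plus two 4-byte args.
print(padding_before_call(arg_bytes=8))

# x86-64 function passing everything in registers: only the 8-byte return address.
print(padding_before_call(arg_bytes=0, return_address_size=8))
```

For the question's code that comes out to 4 bytes, e.g. a sub $4, %esp before the two pushes (and restoring ESP from EBP afterwards still cleans it up).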
Why is 0x2014 pushed instead of 0x14? What is 0x201d?
0x14 (decimal 20) is the value in memory at that location. It will be loaded at runtime, because you used push r/m32, not push $20 (or an assemble time constant like .equ testint, 20 or testint = 20).
You used gcc -m32 to make a PIE (Position Independent Executable), which is relocated at runtime, because that's the default on Ubuntu's gcc.
0x2014 is the offset relative to the start of the file. If you disassemble at runtime after running the program, you'll see a real address.
Same for call 54b. It's presumably a call to the PLT (which is near the start of the file / text segment, hence the low address).
If you disassembled with objdump -drwC, you'd see symbol relocation info. (I like -Mintel as well, but beware it's MASM-like, not NASM).
You can link with gcc -m32 -no-pie to make classic position-dependent executables. I'd definitely recommend that especially for 32-bit code, and especially if you're compiling C, use gcc -m32 -no-pie -fno-pie to get non-PIE code-gen as well as linking into a non-PIE executable. (see 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIEs.)
Problem
When I compile my assembly code with as (binutils) and link using link.exe (Visual Studio 2015) the program crashes because of an unrelocated address.
When linking with gcc (gcc hello-64-gas.obj -o hello-64-gas.exe) the program runs correctly without crash though.
Am I correctly assuming that the object file generated by as should be compiler independent, since abi compatibility problems are in the hands of the assembly code writer?
Since I am a beginner, any explanation of my mistakes/incorrect assumptions is appreciated.
Platform
Windows 10, 64 bit
Linker: Visual Studio 2015 using the native command tools command prompt (x64)
Compiler: as from MinGW-w64
Example
The following code does not link correctly:
# hello-64-gas.asm print a string using printf
# Assemble: as hello-64-gas.asm -o hello-64-gas.obj --64
# Link: link -subsystem:CONSOLE hello-64-gas.obj -out:hello-64-gas.exe libcmt.lib libvcruntime.lib libucrt.lib legacy_stdio_definitions.lib
.intel_syntax noprefix
.global main
# Declare needed C functions
.extern printf
.section .data
msg: .asciz "Hello world"
fmt: .asciz "%s(%d; %f)\n"
myDouble: .double 2.33, -1.0
.text
main:
sub rsp, 8*5
mov rcx, offset flat: fmt
mov rdx, offset flat: msg
mov r8, 0xFF
mov r9, offset flat: myDouble
mov r9, [r9]
movq xmm4, r9
call printf
add rsp, 8*5
mov rax, 0
ret
When debugging it seems mov r9, offset flat: myDouble is not relocated: mov r9,18h, where 18h would be correct if the .data section were at position zero. Looking at the relocation table with objdump -dr hello-64-gas.obj yields:
...
19: 49 c7 c1 18 00 00 00 mov $0x18,%r9
1c: R_X86_64_32S .data
...
Variation (workaround?)
Replacing mov with movabs seems to work:
# hello-64-gas.asm print a string using printf
# Assemble: as hello-64-gas.asm -o hello-64-gas.obj --64
# Link: link -subsystem:CONSOLE hello-64-gas.obj -out:hello-64-gas.exe libcmt.lib libvcruntime.lib libucrt.lib legacy_stdio_definitions.lib
.intel_syntax noprefix
.global main
# Declare needed C functions
.extern printf
.section .data
msg: .asciz "Hello world"
fmt: .asciz "%s(%d; %f)\n"
myDouble: .double 2.33, -1.0
.text
main:
sub rsp, 8*5
movabs rcx, offset flat: fmt
movabs rdx, offset flat: msg
mov r8, 0xFF
movabs r9, offset flat: myDouble
mov r9, [r9]
movq xmm4, r9
call printf
add rsp, 8*5
mov rax, 0
ret
This does somehow run correctly when linked using link.exe.
The relocation that the GNU assembler is using for your references to myDouble, along with fmt and msg, isn't supported by Microsoft's linker. This relocation, called R_X86_64_32S by the GNU utilities and having a value of 0x11, isn't documented in Microsoft's PECOFF specification. As can be evidenced by using Microsoft's DUMPBIN on your object file, Microsoft's linker seems to use relocations with this value for some other undocumented purpose:
RELOCATIONS #1
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000007 EHANDLER 7 .data
0000000E EHANDLER 7 .data
0000001C EHANDLER 7 .data
00000029 REL32 00000000 C printf
As a workaround you can use any of the following:
a LEA instruction with RIP relative addressing, which generates a R_X86_64_PC32/REL32 relocation
as you found out yourself, a MOVABS instruction, which generates a R_X86_64_64/ADDR64 relocation
a 32-bit MOV instruction which generates a R_X86_64_32/ADDR32 relocation
In order these would be written as:
lea r9, [rip + myDouble]
movabs r9, offset myDouble
mov r9d, offset myDouble
These, along with mov r9, offset myDouble, are four different instructions with different encodings and subtly different semantics each requiring a different type of relocation.
The LEA instruction encodes myDouble as a 32-bit signed offset relative to RIP. This is the preferable instruction to use here, as it takes only 4 bytes to encode the address and it allows the executable to be loaded anywhere in the 64-bit address space. The only limitation is that the executable needs to be less than 2G in size, but this is a fundamental limitation of x64 PECOFF executables anyway.
The MOVABS encodes myDouble as a 64-bit absolute address. While in theory this allows myDouble to be located anywhere in the 64-bit address space, even more than 2G away from the instruction, it takes 8 bytes of encoding space and doesn't actually get you anything under Windows.
The 32-bit MOV instruction encodes myDouble as an unsigned 32-bit absolute address. It has the disadvantage of requiring the executable to be loaded somewhere in the first 4G of address space. Because of this you need to use the /LARGEADDRESSAWARE:NO flag with the Microsoft linker, otherwise you'll get an error.
The 64-bit MOV instruction you're using encodes myDouble as a 32-bit signed absolute address. This also limits where the executable can be loaded, and requires a type of relocation that Microsoft's PECOFF format isn't documented as having and isn't supported by Microsoft's linker.
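The address-range constraints of the four forms can be summarized in a few lines of Python. This is only a sketch of the encodability rules described above, not of what any linker does: encodable_forms is a made-up helper, and the example addresses (including the 0x140000000 image base) are assumptions chosen for illustration.

```python
def encodable_forms(addr, next_rip):
    """List which of the four instruction forms can represent addr at all."""
    forms = []
    if -(1 << 31) <= addr - next_rip < (1 << 31):
        forms.append("lea r9, [rip + sym]")      # signed 32-bit offset from RIP
    forms.append("movabs r9, offset sym")        # full 64-bit absolute, always fits
    if 0 <= addr < (1 << 32):
        forms.append("mov r9d, offset sym")      # zero-extended 32-bit absolute
    if addr < (1 << 31) or addr >= (1 << 64) - (1 << 31):
        forms.append("mov r9, offset sym")       # sign-extended 32-bit absolute
    return forms


# Assumed layout: image loaded above 4G, data a few KiB away from the code.
print(encodable_forms(0x140003018, next_rip=0x140001020))

# Assumed layout: image loaded low, below 2G.
print(encodable_forms(0x403018, next_rip=0x401020))
```

With the high load address only the LEA and MOVABS forms can represent the address; with the low one all four can, which is why the absolute 32-bit forms restrict where the executable may be loaded.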