x86 asm printf causes segfault when using intel syntax (gcc)


I'm just starting to learn x86 assembly, and I am a bit confused as to why this little example doesn't work. All I want to do is to print the content of the eax register as a decimal value. This is my code in AT&T Syntax:
.data
intout:
.string "%d\n"
.text
.globl main
main:
movl $666, %eax
pushl %eax
pushl $intout
call printf
movl $1, %eax
int $0x80
Which I compile and run as follows:
gcc -m32 -o hello helloworld.S
./hello
This works as expected (printing 666 to the console). On a little side note, I would like to point out that I don't understand what exactly "movl $1, %eax" and "int $0x80" are supposed to accomplish here. I'm also not sure what "pushl $intout" does. Why is my output composed of two separate stack entries? And what exactly does the .string macro do?
These are only side questions however, since my real problem is that I can't find a way to make this run using the much easier to read/write/comprehend Intel syntax.
Here is the code:
.intel_syntax noprefix
.data
intout:
.string "%d\n"
.text
.globl main
main:
mov eax, 666
push eax
push intout
call printf
mov eax, 1
int 0x80
Building and running this the same way as above, it just prints "Segmentation fault".
What am I doing wrong?

You need to use push OFFSET intout; otherwise the 32-bit value stored at intout will be pushed on the stack, rather than its address.
intout is just a label, which is basically a name assigned to an address in your program. The .string "%d\n" directive that follows it defines a sequence of bytes in your program, both allocating memory and initializing that memory. Specifically it allocates 4 bytes in the .data section and initializes them with the characters '%', 'd', '\n', and '\0'. Since the label intout is defined just before the .string line it has the address of the first byte in the string.
The line push intout results in an instruction that reads the 4 bytes starting at the address referred to by intout and pushes them onto the stack (specifically, it subtracts 4 from ESP and then copies them to the 4 bytes now pointed to by ESP). The line push $intout (or push OFFSET intout) pushes the 4 bytes that make up the 32-bit address of intout onto the stack.
This means that the line push intout pushes a meaningless value onto the stack. The function printf ends up interpreting it as a pointer, an address where the format string is supposed to be stored, but since it doesn't point to a valid location in memory, your program crashes.

Related

Porting JonesForth to macOS v10.15 (Catalina)

I'm trying to make JonesForth run on a recent MacBook out of the box, just using Mac tools.
I started by converting everything to 64 bits and adapting it to the Mac assembler syntax.
I got things to assemble, but I immediately run into a curious segmentation fault:
/* NEXT macro. */
.macro NEXT
lodsq
jmpq *(%rax)
.endm
...
/* Assembler entry point. */
.text
.globl start
.balign 16
start:
cld
mov %rsp,var_SZ(%rip) // Save the initial data stack pointer in FORTH variable S0.
mov return_stack_top(%rip),%rbp // Initialise the return stack.
//call set_up_data_segment
mov cold_start(%rip),%rsi // Initialise interpreter.
NEXT // Run interpreter!
.const
cold_start: // High-level code without a codeword.
.quad QUIT
QUIT is defined like this via macro defword:
.macro defword
.const_data
.balign 8
.globl name_$3
name_$3 :
.quad $4 // Link
.byte $2+$1 // Flags + length byte
.ascii $0 // The name
.balign 8 // Padding to next eight-byte boundary
.globl $3
$3 :
.quad DOCOL // Codeword - the interpreter
// list of word pointers follow
.endm
// QUIT must not return (ie. must not call EXIT).
defword "QUIT",4,,QUIT,name_TELL
.quad RZ,RSPSTORE // R0 RSP!, clear the return stack
.quad INTERPRET // Interpret the next word
.quad BRANCH,-16 // And loop (indefinitely)
...more code
When I run this, I get a segmentation fault the first time in the NEXT macro:
(lldb) run
There is a running process, kill it and restart?: [Y/n] y
Process 83000 exited with status = 9 (0x00000009)
Process 83042 launched: '/Users/klapauciusisgreat/jonesforth64/jonesforth' (x86_64)
Process 83042 stopped
* thread #1, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000100000698 jonesforth`start + 24
jonesforth`start:
-> 0x100000698 <+24>: jmpq *(%rax)
0x10000069a <+26>: nopw (%rax,%rax)
jonesforth`code_DROP:
0x1000006a0 <+0>: popq %rax
0x1000006a1 <+1>: lodsq (%rsi), %rax
Target 0: (jonesforth) stopped.
rax does point to what I think is the dereferenced address, DOCOL:
(lldb) register read
General Purpose Registers:
rax = 0x0000000100000660 jonesforth`DOCOL
So one mystery is:
Why does RAX point to DOCOL instead of QUIT? My guess is that the instruction was halfway executed and the result of the indirection was stored in rax. What are some good pointers to documentation?
Why the segmentation fault?
I commented out the original segment setup code in the original that called brk to set up a data segment. Another [implementation] also did not call it at all, so I thought I could as well ignore this. Is there any magic on how to set up segment permissions with syscalls in a 64-bit binary on Catalina? The make command is pretty much the standard JonesForth one:
jonesforth: jonesforth.S
gcc -nostdlib -g -static $(BUILD_ID_NONE) -o $@ $<
P.S.: Yes, I can get JonesForth to work perfectly in Docker images, but that's beside the point. I really want it to work in 64 bit on Catalina, out of the box.

The original code had something like
mov $cold_start,%rsi
And the Apple assembler complains about not being able to use 32-bit immediate addressing in 64-bit binaries.
So I tried
mov $cold_start(%rip),%rsi
but that also doesn't work.
So I tried
mov cold_start(%rip),%rsi
which assembles, but of course it dereferences cold_start (it loads the quad stored there instead of the address of the label), which is not what I need.
The correct way of doing this is apparently
lea cold_start(%rip),%rsi
This seems to work as intended.

Writing and debugging a min program in asm

I am trying to write a program to find the minimum value of a list of integers in asm. Here is what I have so far:
.section .data
data_items:
.long 2,3,4,5,1,9,10 # set 10 as the sentinel value
.section text
.globl _start
_start:
# %ebx holds min
# %edi holds index (destination index)
# %eax current data item
movl $255, %ebx # set the current min to 255
movl $0, %edi # the index is also zero
start_loop:
movl data_items(,%edi,4), %eax # set %eax equal to the current data item
cmpl $10, %eax # compare %eax with the sentinel (10) to see if we should exit
je exit_loop # if it's the sentinel value, exit
incl %edi # increment the index
cmpl %eax, %edi # compare the current value to the current min
jge start_loop # if it's not less than the current value, go to start
movl %eax, %ebx # move the current value if less than the current min
jmp start_loop # always go back to the start if we've gotten this far
exit_loop:
movl $1, %eax # push the linux system call to %eax (1=exit)
int $0x80 # give linux control (so it will exit)
When I run this, I get the following:
$ as min.s -o min.o && ld min.o -o min && ./min
Segmentation fault (core dumped)
How is one supposed to debug asm? For example, at least in C the compiler tells you what the error might be and the line number, whereas here I know just about nothing. (Note: the error is having .section text instead of .section .text but how would one figure that out?)

It's very possible in C to write a program that compiles with no warnings but crashes (e.g. NULL pointer deref), and you'll see exactly the same thing. It's much more likely in asm, though.
You debug asm with a debugger, GDB for example. See tips at the bottom of https://stackoverflow.com/tags/x86/info. And if you make any system calls, use strace to see what your program is actually doing.
To debug this, you'd run it under GDB and notice that it segfaulted on the first instruction, movl $255, %ebx. That instruction doesn't access memory, so the code fetch itself must have faulted. So there must be something wrong with your sections that resulted in your code being linked into a non-executable segment of your executable.
objdump -d would also have given you a hint: it disassembles the .text section by default, and this program doesn't have one.
The reason text instead of .text causes this problem is that the defaults for sections with random names that aren't one of the few specially-recognized ones are read+write without exec.
In GAS, use .text or .data, special shortcut directives for .section .text or .data which avoid this problem for those sections. https://sourceware.org/binutils/docs/as/Text.html
But not all "standard" sections have special directives; you do still need .section .rodata to switch to the read-only data section, where you should have put your array. (Read, no write. On newer toolchains, also no exec.) Instead of switching to the .bss section, though, you can use .comm or .lcomm (https://sourceware.org/binutils/docs/as/bss.html)
Another possible problem is that you're building this 32-bit code as a 64-bit executable (unless you're using a 32-bit-only install where as --32 is the default). Using 32-bit addressing modes works in 64-bit modes, truncating the address to 32 bits. That works when accessing static data in a position-dependent executable on Linux, because all code+data is linked into the low 2GiB of virtual address space.
But any access to (%esp) or -4(%ebp) or whatever would fault because the stack in a 64-bit process is mapped to a high address with non-zero bits outside the low 32.
You'd notice that problem in GDB because layout reg would show all 16 64-bit integer registers, RAX..R15.

Mac OS x86 Assembly: Why does the initialized memory amount change?

I just started learning assembly a week or so ago, and when debugging a program, I came across some strange memory usage. The following code (see end of post) is broken into two files for a reason.
If I compile and run with
gcc main.s
./a.out
with only code block 1 running (code block 2 commented out), then the program prints "8", meaning that right when my program starts, the Mac OS automatically puts 8 bytes worth of stuff on the stack, then leaves my program to do its thing.
However, if I compile and run with
gcc main.s print.s
./a.out
With only code block 2 running (code block 1 commented out), then the program prints "16", meaning that Mac OS is initially putting 16 bytes on the stack instead of 8. When this happens, the offsets applied to rsp to achieve 16-byte alignment remain the same, meaning that the start of the stack is being offset by 8 bytes whenever an outside function is called.
I also tried putting the _printNum function in the same file as main.s, but the discrepancy persisted. Another thing I tried was to add another format string and use it later on in the program to see if something to do with the format string was using memory, but it made no difference.
What I think is going on is that Mac OS is pushing the instruction pointer for the next instruction to execute when my program terminates onto the stack, then pushing the old base stack pointer onto the stack, both 32-bit, for a total of 8 bytes. When I include a function call (either local or external to the main file), it seems like the assembler decides to use 64-bit addresses instead of 32-bit addresses, doubling the memory used, and hence the 16 bytes used.
Why is this happening, and if I am wrong, what is Mac OS doing to the stack? Is any of the extra stack used of value to me? Is the computer doing something else instead of switching from 32-bit to 64-bit addressing? Thanks.
main program (main.s):
.cstring
_format: .asciz "%d\n"
.text
.globl _main
_main:
movq %rbp, %rax # Put stack base pointer in rax
subq %rsp, %rax # Subtract stack pointer to get total memory used
subq $8, %rsp # Get 16-byte alignment
#---------------------------------------------------------
# code block 1 - prints rax manually
#---------------------------------------------------------
movq %rax, %rsi # Value to print needs to be in rsi
lea _format(%rip), %rdi # Address of format string goes in rdi
# Don't know what the "_format(%rip)" does,
# but it works (any info would be handy)
call _printf
#---------------------------------------------------------
# code block 2 - prints rax via function call
#---------------------------------------------------------
call _printNum # Prints the value of rax
#---------------------------------------------------------
# stack cleanup and return
#---------------------------------------------------------
addq $8, %rsp # Account for the previous -8 to rsp
ret # end program
printing function (print.s):
.cstring
_format: .asciz "%d\n"
.text
.globl _printNum
# assumes 16-byte aligned when called
# prints the value of the rax register
_printNum:
push %rbp # save %rbp - previous stack base
movq %rsp, %rbp # update stack base
push %rsi # save %rsi - register
push %rdi # save %rdi - register
# print - already 16 byte aligned (rip and three values for 32 bytes)
movq %rax, %rsi # load the value to print
lea _format(%rip), %rdi # load the format string
call _printf
# restore registers
popq %rdi
popq %rsi
popq %rbp
# return
ret

x64 nasm: pushing memory addresses onto the stack & call function

I'm pretty new to x64 assembly on the Mac, so I'm getting confused porting some 32-bit code to 64-bit.
The program should simply print out a message via the printf function from the C standard library.
I've started with this code:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
push msg
call _printf
mov rsp, rbp
pop rbp
ret
Compiling it with nasm this way:
$ nasm -f macho64 main.s
Returned following error:
main.s:12: error: Mach-O 64-bit format does not support 32-bit absolute addresses
I've tried to fix that problem by changing the code to this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
mov rax, msg ; shouldn't rax now contain the address of msg?
push rax ; push the address
call _printf
mov rsp, rbp
pop rbp
ret
It assembled fine with the nasm command above, but now there is a warning when linking the object file with gcc into the actual program:
$ gcc main.o
ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not
allowed in code signed PIE, but used in _main from main.o. To fix this warning,
don't compile with -mdynamic-no-pic or link with -Wl,-no_pie
Since it's a warning not an error I've executed the a.out file:
$ ./a.out
Segmentation fault: 11
Hope anyone knows what I'm doing wrong.

The 64-bit OS X ABI largely complies with the System V ABI - AMD64 Architecture Processor Supplement. Its code model is very similar to the Small position independent code model (PIC) with the differences explained here. In that code model all local and small data is accessed directly using RIP-relative addressing. As noted in the comments by Z boson, the image base for 64-bit Mach-O executables is beyond the first 4 GiB of the virtual address space, therefore push msg is not only an invalid way to put the address of msg on the stack, but it is also an impossible one since PUSH does not support 64-bit immediate values. The code should rather look similar to:
; this is what you *would* do for later args on the stack
lea rax, [rel msg] ; RIP-relative addressing
push rax
But in that particular case one need not push the value on the stack at all. The 64-bit calling convention mandates that the first 6 integer/pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, exactly in that order. The first 8 floating-point or vector arguments go into XMM0, XMM1, ..., XMM7. Only after all the available registers are used, or when there are arguments that cannot fit in any of those registers (e.g. an 80-bit long double value), is the stack used. A 64-bit immediate is loaded into a register with MOV (the QWORD variant) and then pushed, since PUSH cannot encode it. Simple return values are passed back in the RAX register. Unlike the Windows x64 convention, the System V convention does not require the caller to reserve stack space for register arguments, but the stack must be 16-byte aligned at the point of the call.
printf is a special function because it takes a variable number of arguments. When calling such functions, AL (the low byte of RAX) should be set to the number of floating-point arguments passed in the vector registers. Also note that RIP-relative addressing is preferred for data that lies within 2 GiB of the code.
Here is how gcc translates printf("This is a test\n"); into assembly on OS X:
xorl %eax, %eax # (1)
leaq L_.str(%rip), %rdi # (2)
callq _printf # (3)
L_.str:
.asciz "This is a test\n"
(this is AT&T style assembly, source is left, destination is right, register names are prefixed with %, data width is encoded as a suffix to the instruction name)
At (1) zero is put into AL (by zeroing the whole RAX which avoids partial-register delays) since no floating-point arguments are being passed. At (2) the address of the string is loaded in RDI. Note how the value is actually an offset from the current value of RIP. Since the assembler doesn't know what this value would be, it puts a relocation request in the object file. The linker then sees the relocation and puts the correct value at link time.
I am not a NASM guru, but I think the following code should do it:
default rel ; make [rel msg] the default for [msg]
section .data
msg: db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp ; re-aligns the stack by 16 before call
mov rbp, rsp
xor eax, eax ; al = 0 FP args in XMM regs
lea rdi, [rel msg]
call _printf
mov rsp, rbp
pop rbp
ret

No answer yet has explained why NASM reports
Mach-O 64-bit format does not support 32-bit absolute addresses
The reason NASM won't do this is explained in Agner Fog's Optimizing Assembly manual, section 3.3 Addressing modes. Under the subsection titled 32-bit absolute addressing in 64 bit mode, he writes:
32-bit absolute addresses cannot be used in Mac OS X, where addresses are above 2^32 by
default.
This is not a problem on Linux or Windows. In fact I already showed this works at static-linkage-with-glibc-without-calling-main. That hello world code uses 32-bit absolute addressing with elf64 and runs fine.
@HristoIliev suggested using rip relative addressing but did not explain that 32-bit absolute addressing in Linux would work as well. In fact, if you change lea rdi, [rel msg] to lea rdi, [msg], it assembles and runs fine with nasm -felf64 but fails with nasm -fmacho64.
Like this:
section .data
msg db 'This is a test', 10, 0 ; something stupid here
section .text
global _main
extern _printf
_main:
push rbp
mov rbp, rsp
xor al, al
lea rdi, [msg]
call _printf
mov rsp, rbp
pop rbp
ret
You can check that this is an absolute 32-bit address and not rip relative with objdump. However, it's important to point out that the preferred method is still rip relative addressing. Agner in the same manual writes:
There is absolutely no reason to use absolute addresses for simple memory operands. Rip-
relative addresses make instructions shorter, they eliminate the need for relocation at load
time, and they are safe to use in all systems.
So when would you use 32-bit absolute addresses in 64-bit mode? Static arrays are a good candidate. See the following subsection, Addressing static arrays in 64 bit mode. The simple case would be e.g.:
mov eax, [A+rcx*4]
where A is the absolute 32-bit address of the static array. This works fine with Linux but once again you can't do this with Mac OS X because the image base is larger than 2^32 by default. To do this on Mac OS X see example 3.11c and 3.11d in Agner's manual. In example 3.11c you could do
mov eax, [(imagerel A) + rbx + rcx*4]
Where you use the extern reference from Mach-O __mh_execute_header to get the image base. In example 3.11d you use rip relative addressing and load the address like this
lea rbx, [rel A]; rel tells nasm to do [rip + A]
mov eax, [rbx + 4*rcx] ; A[i]

According to the documentation for the x86 64bit instruction set http://download.intel.com/products/processor/manual/325383.pdf
PUSH only accepts 8, 16 and 32bit immediate values (64bit registers and register addressed memory blocks are allowed though).
PUSH msg
Where msg is a 64bit immediate address will not assemble, as you found out.
What calling convention is _printf defined as in your 64bit library?
Is it expecting the parameter on the stack or using a fast-call convention where the parameters are in registers? Because x86-64 makes more general purpose registers available, the fast-call convention is used more often.

gnu assembler: get address of label/variable [INTEL SYNTAX]

I have a code like this:
.bss
woof: .long 0
.text
bleh:
...some op codes here.
Now I would like to move the address of woof into eax. What's the Intel syntax code for doing that? The same goes for moving bleh's address into, say, ebx.
Your help is much appreciated!

The bss section can't have any actual objects in it. Some assemblers may still allow you to switch to the .bss section, but all you can do there is say something like: x: . = . + 4.
In most assemblers these days and specifically in gnu for intel, there is no longer a .bss directive, so you temporarily switch to bss and create the bss symbol in one shot with something like: .comm sym,size,alignment. This is why you are presumably getting an error ".bss directive not recognized" or something like that.
And then you can get the address with either:
lea woof, %eax
or
movl $woof, %eax
Update: aha, intel syntax, not intel architecture. OK:
.intel_syntax noprefix
lea esi,fun
lea esi,[fun]
mov eax,OFFSET FLAT:fun
.att_syntax
lea fun, %eax
mov $fun, %eax
.data
fun: .long 0x123
All the lea forms should generate the same code.
