I am using mingw compiler on my windows 10 laptop. The following c code should display a segmentation fault error. But instead, it just does not show anything (empty output).
#include <stdio.h>
int main(){
main();
return 0;
}
I compile the code using:
gcc filename.c
But when I use some online c compilers, it shows segmentation fault. Please tell me how to get segfault error. It is hard to debug the large code if I don't get any error.
Related: How to get back plain old "Segmentation fault" message in windows?
Let's see the assembly:
-O0: (No optimizations,gcc, just a bit stripped):
main:
pushq %rbp #Push address to stack
movq %rsp, %rbp
movl $0, %eax
call main
movl $0, %eax
popq %rbp
ret
You just call a function recursively, always pushing the return address to the stack. But after some implementation/platform defined number of calls there is another memory. You just overwrite it or you access invalid memory. (Segmentation fault)
You can see it like this:
High=======Lower addresses
[stack].....[someOtherMemory]
...recursive call...
[stack ]...[someOtherMemory]
...recursive call...
[stack ].[someOtherMemory]
...recursive call...
[stack ]someOtherMemory]
Kaboom: You accessed the other memory, that was maybe not writable/readable => Crash
Now, you can compile with higher optimizations: -O3, example output, stripped again:
main:
jmp main
It is just an infinite loop. Theoretically both would have the same result. The compiler is nearly allowed to do anything, as long as the observable output stays the same (See: https://stackoverflow.com/a/46455917/13912132).
But in the first, it is translated "literally", leading to hitting other parts. But if no other parts would be hit, it would run all the time.
In the second, the compiler may see, that this would run infinite theoretically, so it got optimized.
Related
I'm trying to make JonesForth run on a recent MacBook out of the box, just using Mac tools.
I started to convert everything 64 bits and attend to the Mac assembler syntax.
I got things to assemble, but I immediately run into a curious segmentation fault:
/* NEXT macro. */
.macro NEXT
lodsq
jmpq *(%rax)
.endm
...
/* Assembler entry point. */
.text
.globl start
.balign 16
start:
cld
mov %rsp,var_SZ(%rip) // Save the initial data stack pointer in FORTH variable S0.
mov return_stack_top(%rip),%rbp // Initialise the return stack.
//call set_up_data_segment
mov cold_start(%rip),%rsi // Initialise interpreter.
NEXT // Run interpreter!
.const
cold_start: // High-level code without a codeword.
.quad QUIT
QUIT is defined like this via macro defword:
.macro defword
.const_data
.balign 8
.globl name_$3
name_$3 :
.quad $4 // Link
.byte $2+$1 // Flags + length byte
.ascii $0 // The name
.balign 8 // Padding to next four-byte boundary
.globl $3
$3 :
.quad DOCOL // Codeword - the interpreter
// list of word pointers follow
.endm
// QUIT must not return (ie. must not call EXIT).
defword "QUIT",4,,QUIT,name_TELL
.quad RZ,RSPSTORE // R0 RSP!, clear the return stack
.quad INTERPRET // Interpret the next word
.quad BRANCH,-16 // And loop (indefinitely)
...more code
When I run this, I get a segmentation fault the first time in the NEXT macro:
(lldb) run
There is a running process, kill it and restart?: [Y/n] y
Process 83000 exited with status = 9 (0x00000009)
Process 83042 launched: '/Users/klapauciusisgreat/jonesforth64/jonesforth' (x86_64)
Process 83042 stopped
* thread #1, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000100000698 jonesforth`start + 24
jonesforth`start:
-> 0x100000698 <+24>: jmpq *(%rax)
0x10000069a <+26>: nopw (%rax,%rax)
jonesforth`code_DROP:
0x1000006a0 <+0>: popq %rax
0x1000006a1 <+1>: lodsq (%rsi), %rax
Target 0: (jonesforth) stopped.
rax does point to what I think is the dereferenced address, DOCOL:
(lldb) register read
General Purpose Registers:
rax = 0x0000000100000660 jonesforth`DOCOL
So one mystery is:
Why does RAX point to DOCOL instead of QUIT? My guess is that the instruction was halfway executed and the result of the indirection was stored in rax. What are some good pointers to documentation?
Why the segmentation fault?
I commented out the original segment setup code in the original that called brk to set up a data segment. Another [implementation] also did not call it at all, so I thought I could as well ignore this. Is there any magic on how to set up segment permissions with syscalls in a 64-bit binary on Catalina? The make command is pretty much the standard JonesForth one:
jonesforth: jonesforth.S
gcc -nostdlib -g -static $(BUILD_ID_NONE) -o $# $<
P.S.: Yes, I can get JonesForth to work perfectly in Docker images, but that's besides the point. I really want it to work in 64 bit on Catalina, out of the box.
The original code had something like
mov $cold_start,%rsi
And the Apple assembler complains about not being able to use 32 immediate addressing in 64-bit binaries.
So I tried
mov $cold_start(%rip),%rsi
but that also doesn't work.
So I tried
mov cold_start(%rip),%rsi
which assembles, but of course it dereferences cold start, which is not something I need.
The correct way of doing this is apparently
lea cold_start(%rip),%rsi
This seems to work as intended.
I am trying to learn x86_64 assembly, and am using GCC as my assembler. The exact command I'm using is:
gcc -nostdlib tapydn.S -D__ASSEMBLY__
I'm mainly using gcc for its preprocessor. Here is tapydn.S:
.global _start
#include <asm-generic/unistd.h>
syscall=0x80
.text
_start:
movl $__NR_exit, %eax
movl $0x00, %ebx
int $syscall
This results in a segmentation fault. I believe the problem is with the following line:
movl $__NR_exit, %eax
I used __NR_exit because it was more descriptive than some magic number. However, it appears that my usage of it is incorrect. I believe this to be the case because when I change the line in question to the following, it runs fine:
movl $0x01, %eax
Further backing up this trail of thought is the contents of usr/include/asm-generic/unistd.h:
#define __NR_exit 93
__SYSCALL(__NR_exit, sys_exit)
I expected the value of __NR_exit to be 1, not 93! Clearly I am misunderstanding its purpose and consequently its usage. For all I know, I'm getting lucky with the $0x01 case working (much like undefined behaviour in C++), so I kept digging...
Next, I looked for the definition of sys_exit. I couldn't find it. I tried using it anyway as follows (with and without the preceeding $):
movl $sys_exit, %eax
This wouldn't link:
/tmp/cc7tEUtC.o: In function `_start':
(.text+0x1): undefined reference to `sys_exit'
collect2: error: ld returned 1 exit status
My guess is that it's a symbol in one of the system libraries and I'm not linking it due to my passing -nostdlib to GCC. I'd like to avoid linking such a large library for just one symbol if possible.
In response to Jester's comment about mixing 32 and 64 bit constants, I tried using the value 0x3C as suggested:
movq $0x3C, %eax
movq $0x00, %ebx
This also resulting a segmentation fault. I also tried swapping out eax and ebx for rax and rbx:
movq $0x3C, %rax
movq $0x00, %rbx
The segmentation fault remained.
Jester then commented stating that I should be using syscall rather than int $0x80:
.global _start
#include <asm-generic/unistd.h>
.text
_start:
movq $0x3C, %rax
movq $0x00, %rbx
syscall
This works, but I was later informed that I should be using rdi instead of rbx as per the System V AMD64 ABI:
movq $0x00, %rdi
This also works fine, but still ends up using the magic number 0x3C for the system call number.
Wrapping up, my questions are as follows:
What is the correct usage of __NR_exit?
What should I be using instead of a magic number for the exit system call?
The correct header file to get the system call numbers is sys/syscall.h. The constants are called SYS_### where ### is the name of the system call you are interested in. The __NR_### macros are implementation details and should not be used. As a rule of thumb, if an identifier begins with an underscore it should not be used, if it begins with two it should definitely not be used. The arguments go into rdi, rsi, rdx, r10, r8, and r9. Here is a sample program for Linux:
#include <sys/syscall.h>
.globl _start
_start:
mov $SYS_exit,%eax
xor %edi,%edi
syscall
These conventions are mostly portable to other UNIX-like operating systems.
While trying to make a very tiny program with NASM and GCC on my Ubuntu machine, I noticed something weird.
The following code compiles fine under 64-bit NASM and GCC:
global main
extern puts
section .text
main:
push rax
mov rdi, message
call puts
jmp exit
exit:
;return stack memory
pop rax
ret
message:
db "Hello from NASM!", 0
But when trying to compile the same code (only with registers changed) under 32-bit NASM and GCC, it will either result segmentation fault and/or random characters. Why is this happening? Does the x64 architecture have different way in storing memory to the stack than i386? If so, how could this behaviour be prevented?
When in 32-bit mode, most calling conventions (cdecl, stdcall, etc...) expect arguments to be pushed on the stack, not in registers, unlike in 64-bit mode, and also, you would need to adjust the stack pointer after calling puts, so you would need to do something like:
lea edx, #message
push edx
call puts
add esp, 4
For the program to produce the same output in 32-bit mode. I may not have the NASM syntax right as I usually write assembly code in MASM and GAS.
The book Assembly Language Step by Step provides the following code as a sandbox:
section .data
section .text
global _start
_start:
nop
//insert sandbox code here
nop
Any example that I include in the space for sandbox is creating a segmentation fault. For example, adding this code:
mov ax, 067FEh
mov bx, ax
mov cl, bh
mov ch, bl
Then compiling with:
nasm -f macho sandbox.asm
ld -o sandbox -e _start sandbox.o
creates a seg fault when I run it on my OS/X. Is there a way to get more information about what's causing the segmentation fault?
The problem you have is that you have created a program that runs past the end of the code that you have written.
When your program executes, the loader will end up issuing a jmp to your _start. Your code then runs, but you do not have anything to return to the OS at the end, so it will simply continue running, executing whatever bytes happen to be in RAM after your code.
The simplest fix would be to properly exit the code. For example:
mov eax, 0x1 ; system call number for exit
sub esp, 4 ; OS X system calls needs "extra space" on stack
int 0x80
Since you are not generating any actual output, you would need to step through with a debugger to see what's going on. After compiling you could use lldb to step through.
lldb ./sandbox
image dump sections
Make note of the address listed that is of type code for your executable (not dyld). It will likely be 0x0000000000001fe6. Continuing within lldb:
b s -a 0x0000000000001fe6
run
register read
step
register read
step
register read
At this point you should be past the NOPs and see things changing in registers. Have fun!
I run a simple piece x86-64 assembly code of hello world.
.global main
.text
main:
mov $message, %rdi
sub $8, %rsp
call puts
add $8, %rsp
ret
message:
.asciz "Hello, World"
I use gcc_4.8.2 under cygwin to compile this program under my 64-bit windows os.
gcc -o helloworld helloworld.s
but the compiler always give me the error:
/tmp/ccylxw5q.o:fake:(.text+0x3): relocation truncated to fit: R_X86_64_32S against `.text'
how to solve this problem?
Under gas (which is invoked by gcc) you need to use the movabsq mnemonic if you want to load a 64 bit immediate. Otherwise the assembler will use the 32 bit sign-extending mov, which is indicated by the relocation type of 32S as well. The final address of the message might not be represented as a 32 bit sign-extended value under certain memory layouts which is probably causing the truncation warning.