Debugging Backtrace gives weird line numbers - debugging

NOTE: While I mainly focus on using GDB as a debugger in this question, I've noticed the same behavior in other debugging tools (Forge DDT does the same thing as well)
I'm trying to troubleshoot some really old Fortran code with GDB. It isn't actually crashing, just printing a non-descriptive "Error Encountered" message on stdout.
I've isolated the subroutine that prints the unhelpful error message, and am trying to use GDB's backtrace to figure out what's actually happening when it stops. However, the backtrace keeps pointing back to a nonsensical section of code when it fires. It may look something like this in my progmain.F file:
...
246 inum=number
247 number=number-1
248 call func
249 call error(999,0,0,0,zero,'func')
...
then I have another source file helper.F with:
722 subroutine error(num,i1,i2,i3,r,c)
This will correspond to a gdb output of:
(gdb) break error
(gdb) run
Breakpoint 1, error (num=999, i1=0, i2=0, i3=0, r=0, c=..., .tmp.C.len_V$757=6)
at helper.F:722
722 subroutine error (num,i1,i2,i3,r,c)
Missing separate debuginfos, use: zypper install libgcc_s1-debuginfo-10.3.0+git1587-1.6.4.x86_64
(gdb) bt
#0 error (num=999, i1=0, i2=0, i3=0, r=0, c=..., .tmp.C.len_V$757=6)
at helper.F:722
#1 0x00000000004036a2 in progmain () at progmain.F:246
...
Note that line 246 in progmain.F is not a call to error; it's an assignment statement 3 lines above what I presume is the call to error that triggered the breakpoint.
The issue is, sometimes it isn't obvious which call to error is triggering the breakpoint. In particular, one breakpoint trigger has a backtrace that leads to the middle of a bunch of commented lines in progmain.F which sit between calls to error, and the code itself is RIFE with go to jumps, so I can't tell which call to error is actually triggering the breakpoint.
Why are the line numbers inaccurate? How can I make bt actually point back to the line where error was called? I'm compiling with -O0 and -g so it shouldn't be an optimizer issue...
UPDATE: Following a comment in response to this thread, I tried running frame 1 followed by disas /m in gdb. I then paged through the assembly code until I reached the section of code in question. The disas output looks like:
245 x=0
0x00000000004036d7 <+359>: mov $0x0,%eax
0x00000000004036dc <+364>: call 0x5d0dbf <func>
246 inum=number
0x00000000004036e1 <+369>: add $0xfffffffffffffff0,%rsp
0x00000000004036e5 <+373>: mov $0x75a988,%eax
0x00000000004036ea <+378>: mov $0x75a984,%edx
0x00000000004036ef <+383>: mov $0x75a984,%ecx
0x00000000004036f4 <+388>: mov $0x75a984,%ebx
0x00000000004036f9 <+393>: mov $0x11096c0,%esi
0x00000000004036fe <+398>: mov $0x75a3e0,%edi
0x0000000000403703 <+403>: movq $0x6,(%rsp)
0x000000000040370b <+411>: mov %rdi,-0x5a8(%rbp)
0x0000000000403712 <+418>: mov %rax,%rdi
0x0000000000403715 <+421>: mov %rsi,-0x5a0(%rbp)
0x000000000040371c <+428>: mov %rdx,%rsi
0x000000000040371f <+431>: mov %rcx,%rdx
0x0000000000403722 <+434>: mov %rbx,%rcx
0x0000000000403725 <+437>: mov -0x5a0(%rbp),%rax
0x000000000040372c <+444>: mov %rax,%r8
0x000000000040372f <+447>: mov -0x5a8(%rbp),%rax
0x0000000000403736 <+454>: mov %rax,%r9
0x0000000000403739 <+457>: mov $0x0,%eax
0x000000000040373e <+462>: call 0x644c45 <error>
=> 0x0000000000403743 <+467>: add $0x10,%rsp
247 number=number-1
248 call func
249 call error(999,0,0,0,zero,'func')
So if I'm reading this right, the assembly isn't happening in the same order as the original code. The call to func is happening first, then number is being loaded and decremented, then error is being called, and I guess this re-arrangement is breaking the debugging symbols? Why is this happening when I'm compiling without optimizations?

Related

Trouble debugging assembly code for greater of two numbers

I wrote the following code to check if the 1st number- 'x' is greater than the 2nd number- 'y'. For x>y output should be 1 and for x<=y output should be 0.
section .txt
global _start
global checkGreater
_start:
mov rdi,x
mov rsi,y
call checkGreater
mov rax,60
mov rdi,0
syscall
checkGreater:
mov r8,rdi
mov r9,rsi
cmp r8,r9
jg skip
mov [c],byte '0'
skip:
mov rax,1
mov rdi,1
mov rsi,c
mov rdx,1
syscall
ret
section .data
x db 7
y db 5
c db '1',0
But for some reason (surely a mistake on my end), the code always gives 0 as the output when executed.
I am using the following commands to run the code on Ubuntu 20.04.1 LTS with nasm 2.14.02-1
nasm -f elf64 fileName.asm
ld -s -o fileName fileName.o
./fileName
Where did I make a mistake?
And how should one debug assembly codes? I looked into printing the arguments received in checkGreater, but that turns out to be a headache in itself.
Note: If someone is wondering why I didn't use x and y directly in checkGreater, I want to extend the comparison to user inputs, so I wrote the code that way.
The instructions
mov rdi,x
mov rsi,y
write the address of x into rdi, and of y into rsi. The code that follows then compares the addresses, and the address of x is always lower than that of y, since x is defined before y. So jg is never taken and '0' is always printed.
What you should have written instead is
mov rdi,[x]
mov rsi,[y]
But then you have another problem: x and y variables are 1 byte long, while the destination registers are 8 bytes long. So simply doing the above fix will read extraneous bytes, leading to useless results. The final correction is to either fix the size of the variables (writing dq instead of db), or read them as bytes:
movzx rdi,byte [x]
movzx rsi,byte [y]
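To see why the byte-sized read matters, here is a rough Python model of the .data section from the question. It is only a sketch: the zero bytes after c are an assumption about padding, and load_qword / load_byte_zx are made-up helper names that mimic mov rdi,[x] and movzx rdi,byte [x].

```python
# Hypothetical model of the .data section: x db 7, y db 5, c db '1',0.
# The zero bytes after c are an assumption about what follows in memory.
data = bytes([7, 5, ord("1"), 0]) + bytes(8)

X_OFF, Y_OFF = 0, 1  # offsets of x and y within the section


def load_qword(off):
    """Mimic `mov rdi, [x]`: read 8 little-endian bytes starting at off."""
    return int.from_bytes(data[off:off + 8], "little")


def load_byte_zx(off):
    """Mimic `movzx rdi, byte [x]`: read 1 byte and zero-extend it."""
    return data[off]


# The 8-byte reads drag in the neighbouring variables.
print(hex(load_qword(X_OFF)), hex(load_qword(Y_OFF)))
# The 1-byte reads give the values that were actually defined.
print(load_byte_zx(X_OFF), load_byte_zx(Y_OFF))
```

With the 8-byte read, rdi and rsi contain y and c mixed in with the intended value, so the comparison is no longer between 7 and 5.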
As for
And how should one debug assembly codes
The main tool for you is an assembly-level debugger, like EDB on Linux or x64dbg on Windows. But in fact, most debuggers, even the ones intended for languages like C++, are capable of displaying disassembly for the program being debugged. So you can use e.g. GDB, or even a GUI wrapper for it like Qt Creator or Eclipse. Just be sure to switch to machine code mode, or use the appropriate commands like GDB's disassemble, stepi, info registers etc.
Note that you don't have to build EDB or GDB from source (as the links above might suggest): they are likely already packaged in the Linux distribution you use. E.g. on Ubuntu the packages are called edb-debugger and gdb.

Porting JonesForth to macOS v10.15 (Catalina)

I'm trying to make JonesForth run on a recent MacBook out of the box, just using Mac tools.
I started by converting everything to 64 bits and adapting to the Mac assembler syntax.
I got things to assemble, but I immediately run into a curious segmentation fault:
/* NEXT macro. */
.macro NEXT
lodsq
jmpq *(%rax)
.endm
...
/* Assembler entry point. */
.text
.globl start
.balign 16
start:
cld
mov %rsp,var_SZ(%rip) // Save the initial data stack pointer in FORTH variable S0.
mov return_stack_top(%rip),%rbp // Initialise the return stack.
//call set_up_data_segment
mov cold_start(%rip),%rsi // Initialise interpreter.
NEXT // Run interpreter!
.const
cold_start: // High-level code without a codeword.
.quad QUIT
QUIT is defined like this via macro defword:
.macro defword
.const_data
.balign 8
.globl name_$3
name_$3 :
.quad $4 // Link
.byte $2+$1 // Flags + length byte
.ascii $0 // The name
.balign 8 // Padding to next eight-byte boundary
.globl $3
$3 :
.quad DOCOL // Codeword - the interpreter
// list of word pointers follow
.endm
// QUIT must not return (ie. must not call EXIT).
defword "QUIT",4,,QUIT,name_TELL
.quad RZ,RSPSTORE // R0 RSP!, clear the return stack
.quad INTERPRET // Interpret the next word
.quad BRANCH,-16 // And loop (indefinitely)
...more code
When I run this, I get a segmentation fault the first time in the NEXT macro:
(lldb) run
There is a running process, kill it and restart?: [Y/n] y
Process 83000 exited with status = 9 (0x00000009)
Process 83042 launched: '/Users/klapauciusisgreat/jonesforth64/jonesforth' (x86_64)
Process 83042 stopped
* thread #1, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000100000698 jonesforth`start + 24
jonesforth`start:
-> 0x100000698 <+24>: jmpq *(%rax)
0x10000069a <+26>: nopw (%rax,%rax)
jonesforth`code_DROP:
0x1000006a0 <+0>: popq %rax
0x1000006a1 <+1>: lodsq (%rsi), %rax
Target 0: (jonesforth) stopped.
rax does point to what I think is the dereferenced address, DOCOL:
(lldb) register read
General Purpose Registers:
rax = 0x0000000100000660 jonesforth`DOCOL
So one mystery is:
Why does RAX point to DOCOL instead of QUIT? My guess is that the instruction was halfway executed and the result of the indirection was stored in rax. What are some good pointers to documentation?
Why the segmentation fault?
I commented out the segment setup code from the original, which called brk to set up a data segment. Another implementation did not call it at all, so I thought I could ignore it as well. Is there any magic on how to set up segment permissions with syscalls in a 64-bit binary on Catalina? The make command is pretty much the standard JonesForth one:
jonesforth: jonesforth.S
gcc -nostdlib -g -static $(BUILD_ID_NONE) -o $@ $<
P.S.: Yes, I can get JonesForth to work perfectly in Docker images, but that's beside the point. I really want it to work in 64 bit on Catalina, out of the box.
The original code had something like
mov $cold_start,%rsi
And the Apple assembler complains about not being able to use 32-bit immediate addressing in 64-bit binaries.
So I tried
mov $cold_start(%rip),%rsi
but that also doesn't work.
So I tried
mov cold_start(%rip),%rsi
which assembles, but of course it dereferences cold_start, which is not what I need.
The correct way of doing this is apparently
lea cold_start(%rip),%rsi
This seems to work as intended.
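This also explains why rax ended up pointing at DOCOL in the crash. Below is a rough Python model of the two variants, tracing what NEXT does with each value of rsi. It is only a sketch: the DOCOL address is taken from the register dump above, the QUIT and cold_start addresses are made up, and memory / next_step are hypothetical names.

```python
# Toy memory: address -> 8-byte value stored there.
# Only DOCOL's address comes from the register dump; the others are made up.
DOCOL = 0x100000660
QUIT = 0x100002000        # hypothetical address of QUIT's codeword
COLD_START = 0x100001000  # hypothetical address of cold_start

memory = {
    COLD_START: QUIT,  # cold_start: .quad QUIT
    QUIT: DOCOL,       # QUIT's codeword: .quad DOCOL
}


def next_step(rsi):
    """Mimic NEXT: lodsq (rax = [rsi]), then jmpq *(%rax)."""
    rax = memory[rsi]
    # None means [rax] holds no code pointer in this model; in the real
    # binary those bytes are machine code, so the jump target is garbage.
    target = memory.get(rax)
    return rax, target


rsi_mov = memory[COLD_START]  # mov cold_start(%rip),%rsi loads the contents
rsi_lea = COLD_START          # lea cold_start(%rip),%rsi loads the address

print("mov:", next_step(rsi_mov))
print("lea:", next_step(rsi_lea))
```

With mov, rsi already holds QUIT, so lodsq loads DOCOL into rax and the indirect jump reads DOCOL's machine code as if it were an address. With lea, lodsq loads QUIT and the jump goes to DOCOL, which is one level of indirection less.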

GDB Debugger: An internal issue to GDB has been detected

I'm new to the GNU Debugger. I've been playing around with it, debugging assembly files (x86_64 Linux) for a day or so, and just a few hours ago I "discovered" the TUI interface.
My first attempt using the TUI interface was to see the register changes as I execute each line at a time of a simple Hello World program (in asm). Here is the code of the program
section .data
text db "Hello, World!", 10
len equ $-text
section .text
global _start
_start:
nop
call _printText
mov rax, 60
mov rdi, 0
syscall
_printText:
nop
mov rax, 1
mov rdi, 1
mov rsi, text
mov rdx, len
syscall
ret
After creating the executable file in the terminal of linux I write
$ gdb -q ./hello -tui
Then I created three breakpoints: one right of the _start, another right after _printText and the last just above the mov rax, 60 for the SYS_EXIT.
After this:
1) I run the program.
2) On gdb mode I write layout asm to see the written code.
3) I write layout regs.
4) Finally I use stepi to see how the registers change as the hello world program runs.
The thing is that when the RIP register points to the address of ret, corresponding to SYS_EXIT and I hit Enter I get the following message in console
[Inferior 1 (process 2059) exited normally]
/build/gdb-cXfXJ3/gdb-7.11.1/gdb/thread.c:1100: internal-error: finish_thread_state: Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
If I type n, the following appears (if I type y, it quits, as it says):
This is a bug, please report it. For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.
/build/gdb-cXfXJ3/gdb-7.11.1/gdb/thread.c:1100: internal-error: finish_thread_state: Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n)
As I don't know what a core file of GDB is (or what it is useful for), I type n and the debugging session closes.
Does anyone know why this is happening and how can be fixed?
By the way, I'm new in Assembly also, so if this occurs because of something wrong in the program I'd also appreciate if anyone can point that out.
I use the same GDB version as you and I always use the TUI features, but I've never had this problem. However, when I use your code, the internal GDB error occurs. If I make one change to your write syscall function, the error does not manifest.
Although you are not calling another function from within a function, I generally create a stack frame by including at least the "push rbp", "mov rbp, rsp", and "leave" instructions in my x86-64 functions. This may be a band-aid or a workaround with respect to the "bug".
_printText:
push rbp
mov rbp, rsp
mov rax, 1
mov rdi, 1
mov rsi, text
mov rdx, len
syscall
leave
ret
Does anyone know why this is happening
It's happening because there is a bug in GDB (more precisely, an assertion that GDB internal variable tp is not NULL has been violated).
and how can be fixed?
You should try to reproduce this with a current version of GDB (the bug may have already been fixed), and file a bug report if it still occurs (as the message tells you).
I don't know what a core file of GDB (and what is useful for),
It's only useful to GDB developers.

Why does this assembly code throw a seg fault?

The book Assembly Language Step by Step provides the following code as a sandbox:
section .data
section .text
global _start
_start:
nop
; insert sandbox code here
nop
Any example that I include in the space for sandbox is creating a segmentation fault. For example, adding this code:
mov ax, 067FEh
mov bx, ax
mov cl, bh
mov ch, bl
Then compiling with:
nasm -f macho sandbox.asm
ld -o sandbox -e _start sandbox.o
creates a seg fault when I run it on OS X. Is there a way to get more information about what's causing the segmentation fault?
The problem you have is that you have created a program that runs past the end of the code that you have written.
When your program executes, the loader will end up issuing a jmp to your _start. Your code then runs, but nothing at the end returns control to the OS, so the CPU simply keeps going, executing whatever bytes happen to be in memory after your code.
The simplest fix would be to properly exit the code. For example:
mov eax, 0x1 ; system call number for exit
sub esp, 4 ; OS X system calls needs "extra space" on stack
int 0x80
Since you are not generating any actual output, you would need to step through with a debugger to see what's going on. After compiling you could use lldb to step through.
lldb ./sandbox
image dump sections
Make note of the address listed that is of type code for your executable (not dyld). It will likely be 0x0000000000001fe6. Continuing within lldb:
b s -a 0x0000000000001fe6
run
register read
step
register read
step
register read
At this point you should be past the NOPs and see things changing in registers. Have fun!

Segmentation fault in assembly program

I am trying to spawn a shell using the following code:
Section .Text
global _start
_start:
jmp short TrickCall
_ReturnHere:
pop esi
xor eax,eax
mov byte [esi+7],al
lea ebx,[esi]
mov long [esi+8],ebx
mov long [esi+12],eax
mov byte al,0x0b
mov ebx,esi
lea ecx,[esi+8]
lea edx,[esi+12]
int 0x80
TrickCall:
call _ReturnHere
db "/bin/shJAAAANNNN"
I am using gcc version 4.4.3 as my compiler. When I run it using gdb it gives the following output:
(gdb) run
Starting program: /root/spawn_shell
Program received signal SIGSEGV, Segmentation fault.
0x08048059 in _ReturnHere ()
It cannot access the memory address of _ReturnHere. Any way to get around this?
Your problem is DEP: when you pop the return address off the stack and try to write to the memory it points to, that memory is not marked as writable, only readable and executable. You either need to disable DEP (bad, since it's meant to protect against exploits that do something like this) or put the text just after call _ReturnHere into RW(X) memory.
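To make it concrete which writes are involved, here is a rough Python model of what the code tries to do to the string that follows the call. It is only a sketch: buf is a writable stand-in for those bytes, and ESI is a made-up address, since the real one is whatever pop esi yields at run time.

```python
import struct

# Hypothetical address of the string; the real value is whatever
# `pop esi` yields at run time.
ESI = 0x0804807F

# Writable stand-in for the bytes placed right after `call _ReturnHere`.
buf = bytearray(b"/bin/shJAAAANNNN")

buf[7] = 0                          # mov byte [esi+7],al : NUL-terminate "/bin/sh"
buf[8:12] = struct.pack("<I", ESI)  # mov [esi+8],ebx     : argv[0] = address of the string
buf[12:16] = struct.pack("<I", 0)   # mov [esi+12],eax    : argv[1] = NULL

print(buf)
```

All three stores target the bytes after the call instruction, so they only succeed if those bytes live in writable memory.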
