Debugging a simple ARM 64-bit executable causes internal error in GDB - gcc

I wrote a simple C program to print hello world. Then I ran it through
aarch64-linux-gnu-gcc -ohello hello.c -static -g3
gdb-multiarch hello
After this, I run and gdb encounters an internal error:
Reading symbols from hello...done.
(gdb) r
Starting program: /home/gt/hello
/build/gdb-GT4MLW/gdb-8.1/gdb/i387-tdep.c:592: internal-error: void i387_supply_fxsave(regcache*, int, const void*): Assertion `tdep->st0_regnum >= I386_ST0_REGNUM' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
Here's the output of file hello:
hello: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.7.0, BuildID[sha1]=a...b, with debug_info, not stripped
This is my hello.c:
#include<stdio.h>
int main(){
printf("hello world");
return 0;
}
What am I doing wrong? What else do I need to do? I am running Ubuntu 18.04 on an x86_64 machine.
When I use gdb hello, I am unable to use breakpoints, I get this error:
Reading symbols from hello...done.
(gdb) break 4
Breakpoint 1 at 0x400404: file hello.c, line 4.
(gdb) r
Starting program: /home/gt/hello
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x400404
(gdb)
I am following the guide given on this page under the first section.

In order to run and debug your AArch64 executable, you (in general) need to run it on an AArch64 machine, or in an AArch64 emulator.
You might have some setup where qemu more or less transparently emulates aarch64 binaries for you - but that doesn't work quite as transparently for the debugger. In general you can run the debugger on one machine, connected over a network to a debugging server on a different machine, allowing you to debug a process running on the machine with the debugging server.
The guide you linked shows how to set up qemu to allow it to transparently emulate binaries as you execute them. That guide only shows executing, not debugging, but it has got a link "Debugging using GDB" that points to https://ubuntuforums.org/showthread.php?t=2010979&s=096fb05dbd59acbfc8542b71f4b590db&p=12061325#post12061325, where it is explained how to debug a process which executes within qemu emulation. This essentially amounts to the same remote debugging with a debugging server as I mentioned above.
The essential bits of this post is this:
# In a terminal
$ qemu-arm-static -g 10101 ./hello
# In a new terminal
$ gdb-multiarch
(gdb) target remote :10101
Remote debugging using :10101
[New Remote target]
[Switching to Remote target]

Related

gdb-multiarch (MINGW64) cannot determine architecture from executable?

I have seen in Specifying an architecture in gdb-multiarch :
If I compile a C program with any arm compiler (e.g. arm-none-eabi-gcc) and afterwards call gdb-multiarch with the binary as second param(e)ter, it will correctly determine the machine type and I can debug my remote application.
I am using mingw-w64-x86_64-gdb-multiarch 12.1-1 on MINGW64 (MSYS2) on Windows 10; and unfortunately, it seems it cannot determine the architecture. I have built an executable for Pico/RP2040 with gcc on this same system, and the system sees it as:
$ file myexecutable.elf
myexecutable.elf: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, with debug_info, not stripped
However, if I try to run this in gdb-multiarch, I get:
$ gdb-multiarch myexecutable.elf
GNU gdb (GDB) 12.1
...
warning: A handler for the OS ABI "Windows" is not built into this configuration
of GDB. Attempting to continue with the default armv6s-m settings.
Reading symbols from myexecutable.elf...
(gdb)
Well, as the warning says, it seems that gdb-multiarch sees this .elf as 'OS ABI "Windows"' - similar to what was noted in Specifying an architecture in gdb-multiarch :
If however I call gdb-multiarch on its own, it will assume my machine type (x86_64) and tries to debug the remote target with the wrong architecture..
... except, here I'm calling gdb-multiarch with the binary as second parameter - and I still have that problem!
The linked question already explains that set architecture arch from within GDB should work; however, I'm not exactly sure what to enter: file says this executable is "ARM, EABI5", and I've tried to derive an architecture label from that, which doesn't quite work:
(gdb) set architecture armeabi5
Undefined item: "armeabi5".
... and I'm not that knowledgeable with ARM, so that I'd know exactly what to enter here manually.
Therefore, I would prefer if gdb-multiarch found the architecture on its own automatically.
How can I persuade gdb-multiarch to detect the (ARM) architecture of the compiled file correctly?
You may be trying to use a multiarch GDB suitable for use on Intel architectures only.
You can check by starting GDB and entering the following command:
set architecture. If this is the case, the list of supported architectures will look like this:
(gdb) set architecture
Requires an argument. Valid arguments are i386, i386:x86-64, i386:x64-32, i8086, i386:intel, i386:x86-64:intel, i386:x64-32:intel, auto.
(gdb)
For debugging a cortex-m0 target, I would suggest using the GDB provided with the Arm arm-none-eabi toolchain, arm-none-eabi-gdb:
(gdb) set architecture
Requires an argument. Valid arguments are arm, armv2, armv2a, armv3, armv3m, armv4, armv4t, armv5, armv5t, armv5te, xscale, ep9312, iwmmxt, iwmmxt2, armv5tej, armv6, armv6kz, armv6t2, armv6k, armv7, armv6-m, armv6s-m, armv7e-m, armv8-a, armv8-r, armv8-m.base, armv8-m.main, armv8.1-m.main, arm_any, auto.
(gdb)
(gdb) set architecture armv6-m
The target architecture is set to "armv6-m".
(gdb)
Architecture for Cortex-M0/Cortex-M0+ is armv6-m, but I was always able to debug Cortex-M0 programs with GDB without having to use set architecture.

Why does gfortran give segfault error when trying to replace file?

program test
implicit none
open(100, file="a.dat", status="replace")
end program test
When I try to run this program, it runs fine the first time when the file has not been created. But an attempt to run the program again (when the file a.dat exists and should be replaced) gives segfault error. Here is what I found using gdb.
Program received signal SIGSEGV, Segmentation fault.
0x00007ff95deb1150 in strcmp () from C:\Windows\system32\msvcrt.dll
I am using GNU Fortran (x86_64-win32-seh-rev0, Built by MinGW-W64 project) 5.1.0 with target as Target: x86_64-w64-mingw32. I didn't use any extra compiler flags except -g.
Also since gdb mentions Windows system files, here is which OS I use - Windows 8.1

seg fault when running arm-elf-gcc compiled code

Using MacPorts i have just installed arm-elf-gcc on to my MacBook Pro. This worked flawlessly and all seems to run fine.
However, after compiling a simple hello world test program in C and C++ and trying to run either on the target board (an ARM9 based board running Debian Linux) they immediately seg fault.
I'm a bit stuck as how to go about debugging this, as the target board has limited tools available and no gdb. I have successfully built and run other code using a Linux hosted cross compiler so it should work.
Any ideas?
Following the suggestion I have built and run gdbserver, I get the following in gdb on the host:
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
I thought it may be a problem with the standard c libs so I removed any calls and have just an empty main that return 0, it is compiled with -Wall -g hello-arm.cpp -static. As a test I compiled the same source with a Linux hosted cross compiler and it runs and exits fine. The only difference I can see is the that Linux compiled version is over twice the size and the difference in output from the file command:
arm-elf-gcc: ELF 32-bit LSB executable, ARM, version 1, statically linked, not stripped
arm-*-linux: ELF 32-bit LSB executable, ARM, version 1, statically linked, for GNU/Linux 2.4.18, not stripped
The usual method of debugging in this situation is to run gdbserver on the target board, and connect to it (via ethernet) with gdb running on a host computer.
Alternately, you could try comparing the assembly in a Mac-compiled "Hello World" program and a (working) Linux-compiled one to see what's different.
After digging around for a couple of days I am starting to understand a bit more about embedded compilers. I wasn't really sure of the difference between arm-elf-gcc installed via MacPorts and the arm-unknown-linux toolchain I had installed on my Linux box. I just came across a pdf titled "An introduction to the GNU compiler" which contains the following paragraph:
Important: Using the GNU Compiler to
create your executable is not quite
the same as using the GNU Linker,
arm-elf-ld, yourself. The reason is
that the GNU Compiler automatically
links a number of standard system
libraries into your executable. These
libraries allow your program to
interact with an operating system, to
use the standard C library functions,
to use certain language features and
operations (such as division), and so
on. If you wish to see exactly which
libraries are being linked into the
executable, you should pass the
verbose flag
-v to the compiler.
This has important implications for
embedded systems! Such systems do not
usually have an operating system.
This means that linking in the system
libraries is almost always
meaningless: if there is no operating
system, for example, then calling the
standard printf function does not make
much sense.
So when I get back to my dev machine later I will determine the libraries linked in with the Linux build and add them to the arm-elf-gcc build.
I'll update this when I have more information but I just want to document my findings in case any one else has these problems.

CUDA: Debug with -deviceemu and gdb

I wrote a CUDA application that has some hardcoded parameters in it (via #defines). Everything seemed to work right, so I tried some other parameters. Now, the program doesn't work correctly anymore.
So, I want to debug it. I compile the application with -deviceemu -g -O0 options, because I read that I can then use gdb to debug it. In gdb, I set a breakpoint at the kernel start using break kernelstart.
However, gdb, jumps at the start of my CUDA kernel, but I can not step through it, because it doesn't let me inspect things within the kernel. I think it's best if I give the output of gdb:
Breakpoint 1, kernelstart (__cuda_0=0x100000, __cuda_1=0x101000, __cuda_2=0x102000, __cuda_3=0x102100) at cudatest.cu:287
(gdb) s
__device_stub__Z12kernelstartPjS_S_S_ (__par0=0x100000, __par1=0x101000, __par2=0x102000, __par3=0x102100) at /tmp/tmpxft_000003c4_00000000-1_cudatest.cudafe1.stub.c:7
7 /tmp/tmpxft_000003c4_00000000-1_cudatest.cudafe1.stub.c: No such file or directory.
in /tmp/tmpxft_000003c4_00000000-1_cudatest.cudafe1.stub.c
(gdb) s
cudaLaunch<char> (entry=0x804a98d "U\211\345\203\354\030\213E\024\211D$\f\213E\020\211D$\b\213E\f\211D$\004\213E\b\211\004$\350\r\377\377\377\311\303U\211\345\203\354\070\307\004$\340 \005\b\350\345\341\377\377\243P!\005\b\307\004$x\234\004\b\350\b\001") at /usr/local/cuda/bin/../include/cuda_runtime.h:773
(gdb) s
(gdb) s
cudatest (__cuda_0=0x100000, __cuda_1=0x101000, __cuda_2=0x102000, __cuda_3=0x102100) at cudatest.cu:354
(gdb) s
After, this, it jumps back to my main procedure.
I know that my specifications are more than vague, but can anybody guess where the problem is? Is it possible to inspect kernels using gdb?
Use cuda-gdb
Compile: nvcc -g -G filename.cu
Invoke cuda-gdb on your a.out
You can set breakpoint inside your kernel function as usual.
Run the program, and it should stop inside your kernel function.
You can even get details of the current thread which is being executed using commands like cuda thread. Other commands like cuda block exist.
To switch between threads say cuda thread (x,y,z)
For more details refer to the latest version of cuda-gdb's documentation. If you are using the latest version of cuda toolkit (ie, 3.2 as of today), make sure you are looking at the latest version of the documentation (as the options have changed a lot).
And also make sure you are running cuda-gdb from a console (outside X11), since you are stopping your GPU for debugging.
Hope this helps.
Compiling with :
nvcc -g -G --keep
fixed this problem for me. This ensures all the intermediate files generated during compilation are not erased so that the debugger can find them.

After setting a breakpoint in Qt, gdb says: "Error accessing memory address"

I wrote a very simple Qt program here:
int main(int argc, char* argv[])
{
QApplication app(argc, argv);
QTableView table(&frame);
table.resize(100, 100);
table.show();
return app.exec();
}
And when I try to set a breakpoint where the table gets clicked, I get this error from gdb:
(gdb) symbol-file /usr/lib/libQtGui.so.4.4.3.debug
Load new symbol table from "/usr/lib/libQtGui.so.4.4.3.debug"? (y or n) y
Reading symbols from /usr/lib/libQtGui.so.4.4.3.debug...done.
(gdb) br 'QAbstractItemView::clicked(QModelIndex const&)'
Breakpoint 1 at 0x5fc660: file .moc/release-shared/moc_qabstractitemview.cpp, line 313.
(gdb) run
Starting program: ./qt-test
Warning:
Cannot insert breakpoint 1.
Error accessing memory address 0x5fc660: Input/output error.
Does anyone know why the breakpoint can't be inserted?
Don't use the gdb command symbol-file to load external symbols. The breakpoint addresses will be wrong since they're not relocated.
Instead, put a breakpoint in main, run the program, and then set your breakpoint:
gdb ./program
GNU gdb 6.8-debian blah blah blah
(gdb) br main
Breakpoint 1 at 0x80489c1
(gdb) run
Starting program: ./program
Breakpoint 1, 0x080489c1 in main ()
(gdb) br 'QAbstractItemView::clicked(QModelIndex const&)'
Breakpoint 2 at 0xb7d24664
(gdb) continue
Continuing.
Then make your breakpoint happen.
Make sure to specify the parameter list in the function you want to set a breakpoint in, without the names of those parameters, just their types.
The actual error:
Error accessing memory address 0x5fc660: Input/output error.
Can be caused by 32/64 bit mixups. Check, for example, that you didn't attach to a 32-bit binary with a 64-bit process' ID, or vice versa.
If you want to automatically break in main without setting a breakpoint you can also use the start command.
If you need to provide any arguments to the program you can use:
start argument1 argument2
OK for me I got this when building with mingw-w64 (native or cross compiler).
I'm not sure what the exact problem was, but if I build it using gcc mingw-w64 i686-5.1.0-posix-sjlj-rt_v4-rev0 then it creates (finally) debuggable builds. Otherwise
(gdb) break main
...
(gdb) r
...
Cannot insert breakpoint 1.
Cannot access memory at address 0x42445c
<process basically hangs>
message 19 times out of 20, though sometimes it did actually work (very rarely).
gdb 7.8.1 and 7.9.1 seemed to be able to debug the created exe. So it's probably not the version of gdb that makes a difference.
My current theory/suspect is either it was the version of gcc or possibly the sljl vs. dwarf2 "aspect" to the compiler [?] (i686-492-posix-dwarf-rt_v3-rev1 didn't work, and cross compiling with some form of gcc 4.9.2 didn't either). Didn't try other versions of gcc.
update: newer gcc (5.1.0) but cross compiling I still got this failure. The cause in this case turned out to be a dependency library that my build (FFmpeg) was using by linking against (libgme in this case) which is exporting a few errant "shared" symbols (when I am building a static executable). Because of this, "shared" builds brake (https://trac.ffmpeg.org/ticket/282) and somehow it screws up gdb as well. For instance possibly linking against SDL can do this to you as well. My thought is possibly an ld bug [?]

Resources