Challenge: Access violation reading/executing location after successful compile

A challenge emerged from the Danish Center for Cyber Security a few weeks ago.
See https://puzzling.stackexchange.com/questions/49702/programming-news-paper-puzzle/49757
Part of the challenge is to fix a piece of assembly code, load an .img file for it to process, and then compile it. The file is called u5emu.asm.
A user called David J posted a cleaned-up version of the .asm code here: http://pastebin.com/TChuYF29
There's a minor bug where he wrote . instead of , on line 126; otherwise it looks good. In addition, I changed getchar and putchar to _getchar and _putchar in the .asm code so that the C library would work. I also renamed the U5_LE label to _asm_main, since driver.c's main calls _asm_main.
I've gotten as far as creating an .exe with:
nasm -f win32 u5emu.asm
gcc -o u5emu u5emu.obj driver.c asm_io.obj
This creates an executable file. I'm pretty sure the program will ask me for input (since there's a getchar), then process the included file (a Base64-encoded string that I've cleaned up by removing odd symbols such as [ and ;) and print a clue for the next part of the challenge.
When I run the exe it crashes, and I get two kinds of errors when I debug:
Unhandled exception at 0x546CD4A1 in u5emu.exe: 0xC0000005: Access violation reading location 0x00000000.
And
Exception thrown at 0x00000000 in u5emu.exe: 0xC0000005: Access violation executing location 0x00000000
I've hit a dead end here, so I'm hoping someone can help me crack this.

Not an answer to your question, but I can tell you what I did: I rewrote the small program in C (using a switch for the 32 opcodes). This makes it MUCH easier to add debug printouts, etc. Hint #2: Remember to swap bytes; the emulated machine is big-endian.
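The byte-swapping hint can be sketched in C. This is only a minimal illustration, not the challenge's actual code: the helper names read_be16 and swap16 are hypothetical, and the 16-bit word size is an assumption, since the thread does not state the emulated machine's word width.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 16-bit value from two bytes stored in
 * big-endian order (most significant byte first). The 16-bit width is an
 * assumption made for illustration. Gives the same result on any host. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Hypothetical helper: swap the two bytes of a 16-bit value, e.g. after
 * loading a big-endian word directly on a little-endian host. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)(((unsigned)v << 8) | ((unsigned)v >> 8));
}
```

Reading memory through something like read_be16 does not depend on the host's byte order, so it is usually less error-prone than loading a native word and swapping it afterwards.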

Related

what are the two numbers for an instruction location in objdump of a kernel module? [duplicate]

Consider the following Linux kernel dump stack trace; e.g., you can trigger a panic from the kernel source code by calling panic("debugging a Linux kernel panic");:
[<001360ac>] (unwind_backtrace+0x0/0xf8) from [<00147b7c>] (warn_slowpath_common+0x50/0x60)
[<00147b7c>] (warn_slowpath_common+0x50/0x60) from [<00147c40>] (warn_slowpath_null+0x1c/0x24)
[<00147c40>] (warn_slowpath_null+0x1c/0x24) from [<0014de44>] (local_bh_enable_ip+0xa0/0xac)
[<0014de44>] (local_bh_enable_ip+0xa0/0xac) from [<0019594c>] (bdi_register+0xec/0x150)
In unwind_backtrace+0x0/0xf8 what does +0x0/0xf8 stand for?
How can I see the C code of unwind_backtrace+0x0/0xf8?
How to interpret the panic's content?

It's just an ordinary backtrace. The functions are listed innermost first: each one was called by the function listed after it, and so on:
unwind_backtrace+0x0/0xf8
warn_slowpath_common+0x50/0x60
warn_slowpath_null+0x1c/0x24
local_bh_enable_ip+0xa0/0xac
bdi_register+0xec/0x150
bdi_register+0xec/0x150 is the symbol plus the offset/length. There's more information about that in Understanding a Kernel Oops and in how you can debug a kernel oops. There's also this excellent tutorial on Debugging the Kernel.
Note: as suggested below by Eugene, you may want to try addr2line first. It still needs an image with debugging symbols, though. For example:
addr2line -e vmlinux_with_debug_info 0019594c(+offset)

Here are two alternatives to addr2line. Assuming you have the proper toolchain for your target, you can do one of the following:
Use objdump:
Locate your vmlinux or the .ko file under the kernel root directory, then disassemble the object file:
objdump -dS vmlinux > /tmp/kernel.s
Open the generated assembly file, /tmp/kernel.s, with a text editor such as vim. Go to unwind_backtrace+0x0/0xf8, i.e. search for the address of unwind_backtrace plus the offset. You have now located the problematic part of your source code.
Use gdb:
IMO, an even more elegant option is to use the one and only gdb. Assuming you have the suitable toolchain on your host machine:
Run gdb <path-to-vmlinux>.
Execute in gdb's prompt: list *(unwind_backtrace+0x10).
For additional information, you may check out the following resources:
Kernel Debugging Tricks.
Debugging The Linux Kernel Using Gdb

In unwind_backtrace+0x0/0xf8 what does +0x0/0xf8 stand for?
The first number (+0x0) is the offset from the beginning of the function (unwind_backtrace in this case). The second number (0xf8) is the total length of the function. Given these two pieces of information, if you already have a hunch about where the fault occurred, this might be enough to confirm your suspicion, since you can tell roughly how far along in the function you were.
To get the exact source line of the corresponding instruction (generally better than hunches), use addr2line or the other methods in other answers.
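As a rough illustration of the symbol+offset/length notation, here is a small C sketch that splits one such entry into its parts. The function name parse_frame and the 63-character limit on the symbol name are assumptions made for this example; nothing like this is part of the kernel or its tools.

```c
#include <stdio.h>

/* Hypothetical helper: split "symbol+0xOFF/0xLEN" into its three parts.
 * sym must have room for 64 chars. Returns 1 on success, 0 otherwise. */
int parse_frame(const char *text, char sym[64],
                unsigned long *offset, unsigned long *length)
{
    /* %63[^+] reads the symbol name up to the '+';
     * %lx reads a hex number and accepts an optional 0x prefix. */
    return sscanf(text, "%63[^+]+%lx/%lx", sym, offset, length) == 3;
}
```

For bdi_register+0xec/0x150 this yields the symbol bdi_register, offset 0xec and length 0x150. Since the trace shows that frame at address 0019594c, the function itself starts at 0x0019594c - 0xec = 0x00195860.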

When we get a runtime error in a Swift project, why does Xcode send us to thread output in assembly language? What's the point?

As you know, when something goes wrong while running a Swift project in Xcode, we are sent to the thread section of the debug navigator and faced with some assembly code like this:
I am wondering whether there is any reference, tutorial, or tool for understanding this code. There must be a reason we are directed to it.
Let me be clear: I know how to fix the errors, but it bothers me when I don't understand something like this. I want to know what this code is and how we can use it, or at least understand it.
Thanks :)

Original question: what language is that? That's AT&T syntax assembly language for x86-64. See https://stackoverflow.com/tags/x86/info for manuals from Intel and other resources, and https://stackoverflow.com/tags/att/info for how AT&T syntax differs from the Intel syntax used in most manuals. (I think the x86 tag wiki has a few AT&T syntax tutorials.) Most AT&T-syntax disassemblers have an Intel-syntax mode, too, so you can use that if you want asm that matches Intel's manuals.
What's the point?
The point is so you can debug your program if you know asm. Or you can show the asm to someone who does understand it, or include it in a bug report.
Did you compile without debug symbols? Or did it crash in library code without symbols? It's normal for a debugger to show you asm if it can't show you source, or if you ask for asm.
If you have debug symbols for your own code, you can at least backtrace into parent functions for which you do have source. (Unless the stack is corrupted.)
Did your program fault on that instruction highlighted in pink? That's a bit odd, since it's loading from static data (a RIP-relative load means the address is a link-time constant).
Did you maybe munmap or mprotect that page of your program's data or text segment so a load would fault? Normally you only get faults when an addressing mode involves a pointer.
(The call *0x1234(%rip) right before it is calling through a function pointer, though. The function pointer is stored in memory, and the code fetch after the call executes would fault if it pointed to an unmapped or non-executable page.) But your first image shows you got a SIGABRT, not SIGSEGV, so it's more likely that the program deliberately aborted after failing an assertion.
I believe the majority of Swift coders don't know asm
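To see how a deliberate abort shows up as SIGABRT rather than SIGSEGV, here is a small POSIX C sketch (plain C, not Swift, and not what Xcode does internally). The names terminating_signal and deliberate_abort are made up for illustration. It runs a function in a child process and reports which signal, if any, killed it; in C, a failed assert ends in abort(), which raises SIGABRT.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a failed assertion: abort() raises SIGABRT. */
void deliberate_abort(void)
{
    abort();
}

/* Hypothetical helper: run fn in a child process and return the number
 * of the signal that terminated it, 0 if it exited normally, or -1 if
 * the child could not be created or waited for. */
int terminating_signal(void (*fn)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {          /* child: run the function, then exit */
        fn();
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling terminating_signal(deliberate_abort) should report SIGABRT, which matches the "program aborted on purpose" reading of the first screenshot.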
There's nothing more useful a debugger can do without debug symbols and source files.
Also keep in mind that the majority of debugger authors do know asm, so for them it is an obviously-useful feature / behaviour. They know that many people won't be able to benefit from it, but that some will.
Asm is what's really running on the machine. Without asm, you couldn't find wrong-code compiler bugs, etc. As far as software bugs go, there is no lower level than asm, so it's not some arbitrary choice of lower-level layer to stop at.
(Unless there's also a bug in your disassembler or debugger, in which case you need to check the hex machine code.)

Compiling and linking NASM and 64-bit C code together into a bootloader [duplicate]

This question already has an answer here:
Relocation error when compiling NASM code in 64-bit mode
I made a very simple single-stage bootloader that does two main things: it switches from 16-bit real mode to 64-bit long mode, and it reads the next few sectors from the hard disk, which hold the code that starts the basic kernel.
For the basic kernel, I am trying to write code in C instead of assembly, and I have some questions regarding that:
How should I compile and link the nasm file and the C file?
When compiling the files, should I compile to 16-bit or 64-bit code, since I am switching from 16 to 64 bits?
How would I add more files from either C or assembly to the project?
I rewrote the question to make my goal clearer, so if source code is needed, tell me and I will add it.
Code: https://github.com/LatKid/BasicBootloaderNASMC

Since I am also linking a nasm file with the C file, it spits out an error from the nasm object file: relocation R_X86_64_16 against `.text' can not be used when making a shared object; recompile with -fPIC
One of your issues is probably inside that nasm assembler file (which you don't show in the initial version of your question). It should contain only position-independent code (PIC), so it must not produce an object file with an R_X86_64_16 relocation. In your edited question, mov sp, main is obviously not PIC; you should use the instruction-pointer-relative data access of x86-64. Also, you cannot define main both in your nasm file and in a C file, and you cannot mix 16-bit code with 64-bit code when linking.
Study ELF, then the x86-64 ABI to understand what kind of relocations are permitted in a PIC file (and what constraints an assembler file should follow to produce a PIC object file).
Use objdump(1) & readelf(1) to inspect object files (and shared objects and executables).
Once your nasm code produces a PIC object file, link with gcc and use gcc -v to understand what happens under the hood (you'll see that extra libraries and object files, including crt0 ones, -lgcc and -lc, are used).
Perhaps you need a better understanding of compilation and linking. Read Levine's book Linkers and Loaders, Drepper's paper How To Write Shared Libraries, and (about compilation) the Dragon book.
You might want to link with gcc but use your own linker script. See also this answer to a very related question (probably with motivations similar to yours); the references there are highly relevant for you.
PS. Your question lacks motivation and context (it has no MCVE but needs one) and might be an XY problem. I guess you are on Linux. I strongly recommend publishing your actual full code, even if buggy (perhaps on github or gitlab or elsewhere), as free software to get potential help. I also strongly recommend using an existing bootloader (probably GRUB) and focusing your efforts on your OS code (which should be published as free software, to get some feedback).

How do I interpret this error from GDB?

I feel pretty dumb right now, but how do I interpret this message in GDB?
Program received signal SIGSEGV, Segmentation fault.
0x00007fe2eb46073a in clearerr (fp=0x4359790) at clearerr.c:27
27 clearerr.c: No such file or directory.
in clearerr.c
What file is missing that's causing the segfault? Is it clearerr.c or the file that clearerr is trying to access?

What file is missing that's causing the segfault?
We don't know what is causing SIGSEGV, but it's unlikely that any missing file has anything to do with it.
First, this:
clearerr.c: No such file or directory.
simply means that GDB cannot show you the source where SIGSEGV occurred. That is because clearerr() is part of your libc, and you either didn't install sources for your libc (they may not even be available for your environment), or you didn't tell GDB how to find these sources.
Second, the actual cause of SIGSEGV is most likely that the fp you invoked it with has been corrupted or is invalid in some other way.
Here are a few ways this could happen:
char c;
FILE *fp = (FILE*) &c; // fp is bogus: doesn't point to a FILE at all
clearerr(fp); // likely will crash
FILE *fp2; // fp2 contains uninitialized garbage
clearerr(fp2); // likely will crash
FILE *fp3 = fopen("/tmp/foo", "w");
fclose(fp3); // destroys fp3
clearerr(fp3); // accesses dangling memory, likely will crash
There are of course many other ways as well. You'll need to look at the caller of clearerr to see if it's doing something stupid. To find the caller, use GDB's where command (an alias for backtrace).
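A common defensive habit that avoids the third case above is to reset the pointer to NULL when closing, and to check for NULL before use. Here is a minimal sketch; the helper names close_stream and reset_stream_errors are hypothetical, and a NULL check cannot detect the first two cases (a bogus or uninitialized non-NULL pointer).

```c
#include <stdio.h>

/* Hypothetical helper: close the stream and clear the caller's pointer
 * so that it cannot be used as a dangling FILE* afterwards. */
void close_stream(FILE **fpp)
{
    if (fpp != NULL && *fpp != NULL) {
        fclose(*fpp);
        *fpp = NULL;
    }
}

/* Hypothetical helper: call clearerr only on a non-NULL stream.
 * Returns 0 on success, -1 if fp is NULL. */
int reset_stream_errors(FILE *fp)
{
    if (fp == NULL)
        return -1;
    clearerr(fp);
    return 0;
}
```

With these, the sequence "close, then clearerr" turns into a reported error (-1) instead of an access to freed memory.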

The segfault is triggered at line 27 of clearerr.c, where clearerr accesses the FILE stream it was given (fp).

using gdb effectively

I'm used to using gdb quite effectively when dealing with ELF binaries that have been compiled with the -ggdb flag. However, I run into a few difficulties with normal non-stripped binaries.
I can set a breakpoint at main, but what if I need to set the breakpoint at a fixed offset (say 10 lines) from the start of main?
Usually I get the address of a character array (say buf) with print &buf. However, in the current case I get a message saying that buf cannot be found in the current context.
How do I deal with the above mentioned issues? It would be great if you could provide some reading material too.

To get things like source line numbers and variable information, your code needs to be compiled with debug symbols (-ggdb or similar). Compiling without debug symbols but leaving the binary unstripped keeps function and global variable names, but nothing else. Stripping the executable removes even some of those. So, in answer to your question, you can't do the things you want without compiling with -g.
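To make the distinction concrete, here is a small C sketch; the names global_buf, local_buf and fill are invented for this example. In an unstripped build without -g, gdb can still resolve fill and global_buf from the symbol table, so print &global_buf works, while local_buf is only described by debug info and print &local_buf fails with a "no symbol in current context" message. Likewise, gdb can still break on an instruction address (break *ADDRESS) but not on a source line.

```c
#include <stddef.h>
#include <string.h>

/* Global: its name stays in the symbol table of an unstripped binary,
 * even when the program is compiled without -g. */
char global_buf[16];

/* Function names also stay in the symbol table of an unstripped binary. */
size_t fill(const char *text)
{
    /* Local: its name and stack location are recorded only in debug
     * info, so gdb cannot find it unless the code was built with -g. */
    char local_buf[16];

    /* strncpy zero-pads the first 15 bytes; terminate explicitly. */
    strncpy(local_buf, text, sizeof local_buf - 1);
    local_buf[sizeof local_buf - 1] = '\0';

    memcpy(global_buf, local_buf, sizeof global_buf);
    return strlen(global_buf);
}
```

Building this once with -g and once without, then trying both print commands while stopped inside fill, shows the difference described above.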
