Accessing memory with gdb, for assembly code? - debugging

I'm currently debugging a simple C program, and was wondering about this assembly comparison:
cmpl $0x1d,-0xc(%ebp)
From what I gather, this is checking 29 against a location in memory.
How do I access this in gdb with the print or x commands? Is it as simple as looking at the location provided by ebp then moving 12 bits/bytes along or am I completely on the wrong track?

It is indeed comparing 29 with the value in memory 12 bytes below the address in ebp. Assuming the program you are disassembling uses frame pointers, it's reading a local variable off the stack, probably the first one (although the compiler is free to place locals in any order).
If it's not using frame pointers, disassemble the surrounding code and figure out what assigns ebp.

Yes, that's cmp with an immediate and a memory operand. And yes, the effective address used to load the memory operand is ebp - 12 bytes.
In gdb, $ebp gives you the contents of the ebp register as a value you can use in an expression. So you can do stuff like:
p $ebp-0xc # print the address
p *(int*)($ebp-0xc) # dereference it as an int*
x/4db $ebp-0xc # dump 4 bytes (b), each formatted as signed decimal (d)
Printing a char* prints the null-terminated string as well as the address, so you can do something like:
(gdb) p (char*)0x0804980B
$20 = 0x804980b "giants"
Of course, the address can be an expression involving a register value.

Related

Printing a string in x86 Assembly on Mac OS X (NASM)

I'm doing x86 on Mac OS X with NASM. Copying an example and experimenting, I noticed that my print command needed an extra four bytes pushed onto the stack after the other parameters, but I can't figure out why line five is necessary:
1 push dword len ;Length of message
2 push dword msg ;Message to write
3 push dword 1 ;STDOUT
4 mov eax,4 ;Command code for 'writing'
5 sub esp,4 ;<<< Effectively 'push' Without this the print breaks
6 int 0x80 ;SYSCALL
7 add esp,16 ;Functionally 'pop' everything off the stack
I am having trouble finding any documentation on this 'push the parameters to the stack' syntax that NASM/OS X seems to require. If anyone can point me to a resource for that in general that would most likely answer this question as well.
(Most of the credit goes to @Michael Petch's comment; I'm repeating it here so that it is an answer, and to clarify the reason for the additional four bytes on the stack.)
macOS is based on BSD and, as FreeBSD's documentation on system calls explains, the kernel by default uses the C calling convention (arguments are pushed onto the stack, from last to first), but expects four extra bytes on the stack on top of the arguments, because "it is assumed the program will call a function that issues int 80h, rather than issuing int 80h directly".
That is, the kernel is not built for direct int 80h calls, but rather for code that looks like this:
kernel: ; subroutine to make system calls
int 80h
ret
.
.
.
; code that makes a system call
call kernel ; instead of invoking int 80h directly
Notice that call kernel would push the return address (used by the kernel subroutine's ret to return to calling code after the system call) onto the stack, accounting for four additional bytes – that's why it's necessary to manually push four bytes to the stack (any four bytes – their actual value doesn't matter, as it is ignored by the kernel – so one way to achieve this is sub esp, 4) when invoking int 80h directly.
The reason the kernel expects this behaviour – of calling a method which invokes the interrupt instead of invoking it directly – is that when writing code that can be run on multiple platforms it's then only needed to provide a different version of the kernel subroutine, rather than of every place where a system call is invoked (more details and examples in the link above).
Note: all the above is for 32-bit; for 64-bit the calling conventions are different – registers are used to pass the arguments rather than the stack (there's also a call convention for 32-bit which uses registers, but even then it's not the same registers), the syscall instruction is used instead of int 80h, and no extra four bytes (which, on 64-bit systems, would actually be eight bytes) need to be pushed.

Activation records - C

Please consider the below program:
#include <stdio.h>
void my_f(int);
int main()
{
int i = 15;
my_f(i);
}
void my_f(int i)
{
int j[2] = {99, 100};
printf("%d\n", j[-2]);
}
My understanding is that the activation record (aka stack frame) for my_f() should look like this:
------------
| i | 15
------------
| Saved PC | Address of next instruction in caller function
------------
| j[0] | 99
------------
| j[1] | 100
------------
I expected j[-2] to print 15, but it prints 0. Could someone please explain what I am missing here? I am using GCC 4.0.1 on OS X 10.5.8 (Yes, I live under a rock, but that's beside the point here).
If you ever actually want the address of your stack frame in GNU C, use __builtin_frame_address(0) (non-zero args attempt to backtrace up the stack to parent stack frames). This is the address of the first thing pushed by the function, i.e. a saved ebp or rbp if you compiled with -fno-omit-frame-pointer. If you want to modify the return address on the stack, you might be able to do that with an offset from __builtin_frame_address(0), but to just read it reliably, use __builtin_return_address(0).
GCC keeps the stack 16-byte aligned in the usual x86 ABIs. There could easily be a gap between the return address and j[1]. In theory, it could put j[] as far down as it wanted, or optimize it away (or turn it into a read-only static constant, since nothing writes it).
If you compiled with optimization, i probably isn't stored anywhere, and my_f(int i) is inlined into main.
Also, like @EOF said, j[-2] is two spots below the bottom of your diagram. (Low addresses are at the bottom, because the stack grows down.) Also note that the diagram on Wikipedia (from the link I edited into the question) is drawn with low addresses at the top. The ASCII diagram in my answer has low addresses at the bottom.
If you compiled with -O0, then there's some hope. In 64-bit code (the default target for 64-bit builds of gcc and clang), the calling convention passes the first 6 args in registers, so the only i in memory will be in main's stack frame.
Also, in AMD64 code, j[3] might be the upper half of the return address (or the saved %rbp), if j[] is placed below one of those with no gap. (Pointers are 64-bit; int is still 32 bits.) j[2], the first out-of-bounds element, would alias onto the low 32 bits (aka the low dword in Intel terminology, where a "word" is 16 bits).
The best hope for this to work is in unoptimized 32-bit code, using a calling convention with no register args (e.g. the x86 32-bit SysV ABI; see also the x86 tag wiki).
In that case, your stack will look like:
# 32bit stack-args calling convention, unoptimized code
higher addresses
^^^^^^^^^^^^
| argv |
------------
| argc |
-------------------
| main's ret addr |
-------------------
| ... |
| main()'s local variables and stuff, layout decided by the compiler
| ... |
------------
| i | # function arg
------------ <-- 16B-aligned boundary for the first arg, as required in the ABI
| ret addr |
------------ <--- esp pointer on entry to the function
|saved ebp | # because gcc -m32 -O0 uses -fno-omit-frame-pointer
------------ <--- ebp after mov ebp, esp (part of no-omit-frame-pointer)
unpredictable amount of padding, up to the compiler. (likely 0 bytes in this case)
but actually not: clang 3.5, for example, makes a copy of its arg (`i`) here, and puts j[] right below that, so j[2] or j[5] will work
------------
| j[1] |
------------
| j[0] |
------------
| |
vvvvvvvvvvvv Lower addresses. (The wikipedia diagram is upside-down, IMO: it has low addresses at the top).
It's somewhat likely that the 8 byte j array will be placed right below the value written by push ebp, with no gap. That would make j[0] 16B-aligned, although there's no requirement or guarantee that local arrays have any particular alignment. (Except that C99 variable-length arrays are 16B-aligned, in the AMD64 SysV ABI. I don't remember there being a guarantee for non-variable length arrays, but I didn't check.)
If the function saved any other call-preserved registers (like ebx) so it could use them, those saved registers would be before or after the saved ebp, above space used for locals.
j[4] might work in 32-bit code, like @EOF suggested. I assume he arrived at 4 by the same reasoning I did, but forgot to mention that it only applies to 32-bit code.
Looking at the asm:
Of course, looking at what really happens is much better than all this guessing and hand-waving.
I put your function on the Godbolt compiler explorer, with the oldest gcc version it has (4.4.7), using -xc -O0 -Wall -fverbose-asm -m32. -xc is to compile as C, not C++.
my_f:
push ebp #
mov ebp, esp #,
sub esp, 40 #, # no idea why it reserves 40 bytes. clang 3.5 only reserves 24
mov DWORD PTR [ebp-16], 99 # j[0]
mov DWORD PTR [ebp-12], 100 # j[1]
mov edx, DWORD PTR [ebp+0] ###### This is the j[4] load
mov eax, OFFSET FLAT:.LC0 # put the format string address into eax
mov DWORD PTR [esp+4], edx # store j[4] on the stack, to become an arg for printf
mov DWORD PTR [esp], eax # store the format string
call printf #
leave
ret
So gcc puts j at ebp-16, not the ebp-8 that I guessed. j[4] gets the saved ebp. i is at j[6], 8 more bytes up the stack.
Remember, all we've learned here is what gcc 4.4 happens to do at -O0. There's no rule that says j[6] will refer to a location that holds a copy of i on any other setup, or with different surrounding code.
If you want to learn asm from compiler output, look at the asm from -Og or -O1 at least. -O0 stores everything to memory after every statement, so it's very noisy / bloated, which makes it harder to follow. Depending on what you want to learn, -O3 is good. Obviously you have to write functions that do something with input parameters instead of compile-time constants, so they don't optimize away. See How to remove "noise" from GCC/clang assembly output? (especially the link to Matt Godbolt's CppCon2017 talk), and other links in the x86 tag wiki.
clang 3.5, as noted in the diagram, copies i from the arg slot to a local, although when it calls printf, it copies from the arg slot again, not from the copy inside its own stack frame.
In theory you are right, but in practice it depends on many things: the calling convention, the operating system type and version, and the compiler type and version.
You can only tell for certain by looking at the final disassembly of your code.

process descriptor pointer doesn't match current macro in Linux Kernel

I am using the esp value of the kernel stack to calculate the process descriptor pointer.
According to the ULK book, I just need to mask out the 13 least significant bits of esp to obtain the base address of the thread_info structure.
My test is:
1. Write a kernel module, because I need to get the value of the kernel stack pointer.
2. In the module's init function, get the value of the kernel stack pointer.
3. Use the following formula to get the process descriptor pointer of the process running on the CPU: *(unsigned int *)(esp & 0xffffe000)
4. Use the current macro and print out its value.
I think the value from step 3 should be the same as the value from step 4.
But my experiment shows that sometimes they are the same and sometimes they are different. Could anyone explain why? Or am I missing anything?
This is because at the base of the kernel stack you will find a struct thread_info instance (platform dependent) and not a struct task_struct. The current macro provides a pointer to the current task_struct.
Try the following:
struct thread_info *info = (struct thread_info *)(esp & 0xffffe000);
struct task_struct *my_current = info->task;
Now you can compare my_current with current.
Finally, I solved this problem. Everything is correct except for the size of the kernel stack. My kernel uses 4 KB stacks instead of 8 KB stacks, so I just need to mask the low 12 bits of esp.
Thanks for all the suggestions and answer!

Simple "Hello-World", null-free shellcode for Windows needed

I would like to test a buffer-overflow by writing "Hello World" to console (using Windows XP 32-Bit). The shellcode needs to be null-free in order to be passed by "scanf" into the program I want to overflow. I've found plenty of assembly-tutorials for Linux, however none for Windows. Could someone please step me through this using NASM? Thxxx!
Assembly opcodes are the same, so the regular tricks to produce null-free shellcode still apply, but the way to make system calls is different.
On Linux you make system calls with the "int 0x80" instruction, while on Windows you must go through DLLs and make normal user-mode calls to their exported functions.
For that reason, on Windows your shellcode must either:
Hardcode the Win32 API function addresses (most likely will only work on your machine)
Use a Win32 API resolver shellcode (works on every Windows version)
If you're just learning, for now it's probably easier to just hardcode the addresses you see in the debugger. To make the calls position independent you can load the addresses in registers. For example, a call to a function with 4 arguments:
PUSH 4 ; argument #4 to the function
PUSH 3 ; argument #3 to the function
PUSH 2 ; argument #2 to the function
PUSH 1 ; argument #1 to the function
MOV EAX, 0xDEADBEEF ; put the address of the function to call
CALL EAX
Note that the arguments are pushed in reverse order. After the CALL instruction EAX contains the return value, and the stack will be just like it was before (i.e. the function pops its own arguments, as in the stdcall convention used by the Win32 API). The ECX and EDX registers may contain garbage, so don't rely on them keeping their values after the call.
A direct CALL instruction won't work, because it encodes its target relative to the shellcode's own address, which you don't know in advance.
To avoid zeros in the address itself, try any of the null-free tricks for x86 shellcode. There are many out there, but my favorite (albeit lengthy) is encoding the values using XOR instructions:
MOV EAX, 0xDEADBEEF ^ 0xFFFFFFFF ; your value xor'ed against an arbitrary mask
XOR EAX, 0xFFFFFFFF ; the arbitrary mask
You can also try NEG EAX or NOT EAX (sign inversion and bit flipping) to see if they work; they're much cheaper (two bytes each).
You can get help on the different API functions you can call here: http://msdn.microsoft.com
The most important ones you'll need are probably the following:
WinExec(): http://msdn.microsoft.com/en-us/library/ms687393(VS.85).aspx
LoadLibrary(): http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress(): http://msdn.microsoft.com/en-us/library/ms683212%28v=VS.85%29.aspx
The first launches a command, the next two are for loading DLL files and getting the addresses of its functions.
Here's a complete tutorial on writing Windows shellcodes: http://www.codeproject.com/Articles/325776/The-Art-of-Win32-Shellcoding
Assembly language is defined by your processor, and assembly syntax is defined by the assembler (hence AT&T and Intel syntax). The main difference (at least I think it used to be) is between real mode and protected mode. Under DOS, which ran in real mode, you call the actual interrupts to do stuff and can use all the memory accessible to your computer, not just your program's. Linux runs in protected mode: you only have access to your program's own little cubby of memory, and you call int 0x80 to make requests to the kernel instead of calling the hardware and BIOS. Windows XP also runs programs in protected mode, so there you go through the system DLLs rather than interrupts. Anyway, hello-world type stuff is more or less the same on Linux and Windows, as long as the processors are compatible.
To get the shellcode from the program you've made, just load it into your target system's debugger (gdb for Linux, debug for Windows). In debug, type u (unassemble; ? should list the commands), and between the addresses and the instructions you will see the opcode bytes.
Just copy them all into your text editor as one string, and maybe write a program that translates them into their ASCII values. Not sure how to do this in gdb, though...
Anyway, to make it into a buffer overflow exploit, enter aaaaa... and keep adding a's until the program crashes from a buffer overflow error. Find exactly how many a's it takes to crash it. The error message should then usually tell you what memory address was involved. If it shows 6161 followed by the rest of the original return address, then you've got it (0x61, decimal 97, is the ASCII code for 'a'). Now use your debugger to find out where this is. Disassemble the program and look for where scanf is called. Set a breakpoint there, run, and examine the stack. Look for all those 61 bytes and see where they end. Then remove the breakpoint and enter the number of a's you found it took (exactly that number; if the error message showed a single 61 followed by the rest of the original return address, remove that last a), followed by the address you found while examining the stack, and then your shellcode. If all goes well, you should see your shellcode execute.
Happy hacking...

grdb not working variables

I know this is probably a silly question, but I just can't figure it out. I'm debugging this:
xor eax,eax
mov ah,[var1]
mov al,[var2]
call addition
stop: jmp stop
var1: db 5
var2: db 6
addition:
add ah,al
ret
The numbers that I find at the addresses of var1 and var2 are 0x0E and 0x07. I know it's not segmented, but that's no reason for it to behave like this, because the call to addition works just fine. Could you please explain where my mistake is?
I see the problem, though I don't know how to fix it yet. For some reason the instruction pointer starts at 0x100 and all the segment registers at 0x1628. Instructions are addressed, I guess, through [cs:ip] (one of the segment registers plus the instruction pointer, for sure). The offset of var1 is 0x10 (probably because it's the 0x10th byte from the beginning of the code). I tried to examine the memory and what I got was:
1628:100 8 bytes
1628:108 8 bytes
1628:110 <- wtf? (assume another 8 bytes)
1628:118 ...
Whatever tricks are going on in memory, [cs:var1] points somewhere other than into my code, probably to where a .data label would usually be addressed through ds. I don't know what is supposed to be at 1628:10.
OK, I found out what caused this mess and wasted my whole damn day. The behaviour described above is actually correct, and the code is fully functional. What I didn't know is that the grdb debugger for some reason sets the beginning address to 0x100. The solution is to insert the directive ORG 0x100 on the first line, and that's the whole thing. The code ran because the instruction pointer had the right address of the first instruction and went on one by one, but the assembler doesn't know at what effective address your program will be stored, so everything stays relative to the first line of the code, which means all the variables (unless you use a label for a data section) keep pointing as if the program started at 0x0. That of course wouldn't work with DOS, and grdb apparently emulates some DOS features. Sorry for the language, thanks everyone for the effort; I hope this spares someone time if they hit the same problem.
Heheh, at least now I know the reason to use a .data section :)
Assuming that is x86 assembly, var1 and var2 must reside in the .data section.
Explanation: I'm not going to explain exactly how the executable file is structured (not to mention this is platform-specific), but here's a general idea as to why what you're doing is not working.
Assembly code must be divided into sections because each section corresponds directly (or almost directly) to a specific part of the binary/executable file. Global variables should be defined in the .data section, since it has a corresponding location in the binary file, which is where global data resides.
Defining a global variable (or a globally accessed part of memory) inside the code section can lead to unexpected behavior. Some x86 assemblers might even throw an error on this.
