What are possible reasons for the PC stopping at the address of RET and not moving in LC-3?

I used JSR and RET to jump to a subroutine and back to the main routine. However, every time the PC reaches the address of RET, it stops there and never moves. Is there any possible reason for this problem? I did not call any other subroutine inside my first subroutine, but I do use branches, so I think my R7 does not change in the subroutine.

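Are you using any TRAPs inside the subroutine? TRAP also overwrites R7 with its own return address, so unless you save and restore R7 around it, RET jumps back to the instruction after the TRAP instead of to the caller.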
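To make the symptom concrete, here is a minimal Python sketch of how JSR, TRAP, and RET share R7. It is a toy model rather than an LC-3 simulator, and the addresses (a JSR at x3000, a TRAP at x3100, the RET right after it at x3101) are assumptions made up for illustration.

```python
# Toy model of LC-3 linkage: JSR and TRAP both write a return address into R7,
# and RET simply copies R7 into the PC. All addresses are hypothetical.

JSR_ADDR = 0x3000    # JSR in the main routine
TRAP_ADDR = 0x3100   # a TRAP (e.g. OUT) inside the subroutine
RET_ADDR = 0x3101    # the subroutine's RET, directly after the TRAP

def pc_after_ret(save_r7):
    r7 = JSR_ADDR + 1                  # JSR: R7 <- address of the next instruction
    saved = r7 if save_r7 else None    # models "ST R7, SAVE_R7" at subroutine entry
    r7 = TRAP_ADDR + 1                 # TRAP: R7 <- x3101, overwriting the JSR link
    if save_r7:
        r7 = saved                     # models "LD R7, SAVE_R7" just before RET
    return r7                          # RET: PC <- R7

stuck = pc_after_ret(save_r7=False)
fixed = pc_after_ret(save_r7=True)
print(hex(stuck), hex(fixed))
```

In this model, without the save and restore, RET loads the PC with its own address (x3101), so the PC appears frozen on RET; with them, execution returns to x3001 in the caller.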

Related

Assembly - Why this CALL function doesn't work?

I don't understand why CALL function in this code doesn't work:
#include <stdio.h>
void main() {
    __asm {
        jmp L1
    L2:
        mov eax, 8
        ret
    L1:
        call L2
    }
}
If I debug the code step by step, the line 'call L2' is not processed, and the program skips directly to the end. What is wrong? I'm working in Visual Studio 2015 with Intel 32-bit registers.
The problem
You've stumbled on the difference between step over (F10) and step into (F11).
When you use (the default) step over, call appears to be ignored.
You need to step into the code and then the debugger will behave as you'd expect.
Step over
The way this works with step over is that the debugger sets a breakpoint on the next instruction, halts there and moves the breakpoint to the next instruction again.
Step over knows about (conditional) jumps and accounts for that, but disregards (steps over) call statements; it interprets a call as a jump to another subroutine and 'assumes' you want to stay within the current context.
These automatic breakpoints are ephemeral, unlike manual breakpoints which persist until you cancel them.
Step into
Step into does the same, but also sets a breakpoint at every call destination; in effect leading you deep into the woods traversing every subroutine.
Step out
If you've stepped too deep 'into' a subroutine, Visual Studio allows you to step out using Shift+F11; this will take you back to the next instruction after the originating call.
Some other debuggers name this feature "run until return".
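The difference between the two stepping modes can be sketched with a toy model in Python. This is only an illustration of where a debugger would stop, not how Visual Studio is implemented; the instruction list mirrors the code from the question, and the indices stand in for addresses.

```python
# Toy program mirroring the question's __asm block; the index acts as the address.
PROGRAM = [
    ("jmp", 3),      # 0:     jmp L1
    ("mov", None),   # 1: L2: mov eax, 8
    ("ret", None),   # 2:     ret
    ("call", 1),     # 3: L1: call L2
    ("end", None),   # 4:     end of the __asm block
]

def stops(step_into):
    """Return the addresses at which the debugger would halt."""
    pc, call_stack, halted_at = 0, [], []
    while True:
        # Step over only halts in the frame where stepping started.
        if step_into or not call_stack:
            halted_at.append(pc)
        op, target = PROGRAM[pc]
        if op == "end":
            return halted_at
        if op == "jmp":
            pc = target
        elif op == "call":
            call_stack.append(pc + 1)   # remember the return address
            pc = target
        elif op == "ret":
            pc = call_stack.pop()
        else:
            pc += 1

print("step over:", stops(step_into=False))
print("step into:", stops(step_into=True))
```

With step over, the model halts at 0, 3, and 4, so the body of L2 runs but is never shown; with step into, it also halts at 1 and 2 inside the subroutine.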
Debugging high level code
When the debugger is handling higher language source code (e.g. C) it keeps a list of target addresses for every line of source code. It will plan its breakpoints per line of source code.
Other than the fact that every line of high-level code translates to zero or more lines of assembly, it works the same as stepping through raw assembly code.

Using a custom LC-3 trap routine

I wrote a subroutine to be used as a trap call via TRAP x26. The code for my subroutine is at address x3300. I cannot figure out how to jump from x26 to the actual instructions of the subroutine at x3300, since the gap is greater than JSR's PC offset parameter allows. I know I could add some code near x26 to make it possible to jump all the way to x3300, but I don't think that's how I am supposed to do it. I think I'm missing something in my understanding of traps in general.
Here's my understanding/confusion of traps: So from x0000 - X00FF is the trap vector table. For example, if you call TRAP x20, then the PC goes to x20 and continues execution with the instruction at x20. (Please let me know if this is incorrect!) At this point I am confused because at the address x20 in the LC-3 is a BRZ x0021 command, which takes the PC to x21. At x21, there is a BRZ x52 command. When this branch is executed to go to x52 plus the PC, the command there is TRAP x00. Most of TRAP x20's commands seem to go to these (what look like) nonsense trap commands. After the TRAP x00 is executed, the program goes to xFD79. This is really confusing me, since at x00 in memory there is just another TRAP x00. To me, it seems like the program should go to x00 instead of xFD79.
Can someone help explain this to me please? What exactly is going on when a trap is called? I thought it just went to another address in memory where the actual code for the instruction was and executed that, but what I have seen doesn't reflect that. Any help is greatly appreciated as this is preventing me from completing a school project right now.
Thanks!
"So from x0000 - X00FF is the trap vector table. For example, if you call TRAP x20, then the PC goes to x20 and continues execution with the instruction at x20. (Please let me know if this is incorrect!)"
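The first part is correct: x0000 to x00FF is the trap vector table. However, the PC does not continue execution at x20 itself, which is where the next sentence goes wrong...
"At this point I am confused because at the address x20 in the LC-3 is a BRZ x0021 command, which takes the PC to x21"
The value you see, which looks like a BRz, is not in fact an instruction. It is an address! As an instruction, x0400 would be fairly useless: its PC offset is zero, so it just goes to the next line. TRAP x20 saves the return address in R7 and then loads the PC with the contents of x20, so if you interpret x0400 as an address and go there as part of the trap call, you will find the rest of the trap instructions. For your own trap, store x3300 in the vector table entry at x0026, and TRAP x26 will jump straight to your routine.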
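The lookup can be sketched in a few lines of Python. This is a simplified model that assumes the classic LC-3 behaviour in which TRAP links through R7; the entry x0400 at x20 and the routine address x3300 come from the discussion above, while the TRAP location x3000 is a made-up example.

```python
# The trap vector table holds ADDRESSES of service routines, not instructions.
memory = {
    0x0020: 0x0400,   # entry for TRAP x20: the routine starts at x0400
    0x0026: 0x3300,   # entry for the custom TRAP x26: the routine at x3300
}

def trap(pc, trapvect8):
    """Model TRAP: pc is the already-incremented PC (the instruction after TRAP)."""
    r7 = pc                             # save the return address
    new_pc = memory[trapvect8 & 0xFF]   # PC <- contents of the table entry
    return new_pc, r7

# A TRAP x26 located at the hypothetical address x3000:
new_pc, r7 = trap(0x3001, 0x26)
print(hex(new_pc), hex(r7))
```

In LC-3 assembly, filling the table entry amounts to placing the value x3300 at x0026, for example with .ORIG x0026 followed by .FILL x3300. No JSR or branch near x26 is needed, so the PC offset range never comes into play.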

About assembler far calls and the heaven's gate, do segment calls that trigger an exception push cs and eip BEFORE the exception is thrown?

Currently I'm playing with the Windows/WOW64 trick known as "the heaven's gate", which, as some of you will probably know, allows us to enter x64 mode even from within an x86 program (I was so amazed when I tested it and it worked!). But I know it is not supported on all Windows versions, so my code (because there is code) uses SEH. It looks like this:
start:
use32
;; setup seh...
call $33:.64bits_code ; specify 0x33 segment, it's that easy
;; success in x64 mode, quit seh...
jmp .exit
.64bits_code:
use64
;; ...
use32
retf
.seh_handler:
use32
;; ...
xor eax,eax ; EXCEPTION_CONTINUE_EXECUTION
ret
.32bits_code:
; we have been called by a far call (well, indirectly, routed by a seh handler)
; HERE IS THE PROBLEM => Should i use a retf since cs and eip are on the stack,
; or the exception has been triggered before pushing them???
; "retf" or "jmp .exit"?
.exit:
xor eax,eax
push eax
call [ExitProcess]
I know a simple "jmp .exit" would do the trick, but I'm terribly curious about it.
When the OS gets an interrupt or a fault happens, it expects that no matter what the user code was up to, the CPU has saved the required state on the kernel stack so that an IRET will invisibly resume whatever it was doing.
Note that there is no "magic" involved in that state on the kernel stack. "Continuing execution" only means restoring saved values of rflags, cs:rip and ss:rsp and running the code that cs:rip ends up pointing to.
That means that without involving SEH specifically, just thinking about any kind of exception happening during a far call, there are really only two cases to consider:
The exception happens "before" the jump: nothing has been pushed, the state on the kernel stack says we should resume by restarting the call instruction.
The exception happens "after" the jump: cs:eip have been pushed, rip points somewhere at or after the .64bits_code label, and that saved state says that to resume we should jump to the 64-bit code.
If the CPU allowed a far call to be interrupted "in the middle", there would be no possible value of cs:rip that produces a consistent result when the OS continues execution. For example, if the far call's return address was pushed before the exception happened but the saved cs:rip points to the far call instruction, you'll end up with two copies of the return address on the stack and all hell breaks loose.
Now, to actually answer your question: it depends on the value of rIP that the OS tells you the exception happened on. If it points to the 64-bit code, you must have cs:eip on the stack; if it points to the 32-bit far call, you can safely assume it has not been pushed yet.
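The decision described above can be written down as a small Python sketch. The function name and all addresses are hypothetical; in a real handler the faulting address would come from the exception information the OS passes to the SEH handler.

```python
def return_address_pushed(fault_ip, far_call_addr, code64_start, code64_end):
    """Decide from the reported fault address whether cs:eip is on the stack."""
    if fault_ip == far_call_addr:
        # The fault is reported on the far call itself: nothing was pushed,
        # so a plain "jmp .exit" is the right continuation.
        return False
    if code64_start <= fault_ip < code64_end:
        # Control had already transferred: cs:eip was pushed, so "retf" is needed.
        return True
    raise ValueError("fault did not come from the far call or its target")

FAR_CALL = 0x00401005                 # hypothetical address of the far call
CODE64 = (0x0040100E, 0x00401020)     # hypothetical range of .64bits_code

print(return_address_pushed(FAR_CALL, FAR_CALL, *CODE64))
print(return_address_pushed(0x00401010, FAR_CALL, *CODE64))
```

The point of the sketch is that there is no third, half-finished state to handle: the saved instruction pointer alone tells you which of the two cases you are in.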

call immediate versus call dword near [dword addr]

So recently I've been wanting to call some win32 calls from assembly, and I've been using NASM as my external assembler. I was calling SendMessage in my code in the following way:
call __imp__SendMessageW@16
This was assembled into a relative call (0xE8 opcode) and the result was an access violation. In the debugger, the computed offset seemed to be the correct one (in that __imp__SendMessageW@16 really did seem to reside there), but nonetheless it did not work. Examining the assembly produced by Visual Studio when I called the function from C++, I noticed that it wasn't using a relative immediate call but instead (in the language of MASM) a call dword ptr [__imp__SendMessageW@16], corresponding to an 0xFF15 opcode. After some futzing around I figured out that NASM syntax encodes this as call dword near [dword __imp__SendMessageW@16], and after making the change my code suddenly worked.
My question is, why does one work and not the other? Is there some relocation of code going on that causes the relative immediate call to jump somewhere unfriendly? I've never been much of an assembly programmer but my impression was always that the two calls should do the same thing and the main difference is that one is position independent and the other is not (assuming that they move the IP to the same place). The relocation of code theory makes sense given that, but then how do you explain the debugger showing the right address?
Also: what's the logic behind the [] syntax in this call? The offset is still an immediate (just little endian encoded immediately after 0xFF15), there's no memory access going on here beyond the instruction fetch (I tend to think of [] as a dereference outside the context of lea).
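call dword [__imp__SendMessageW@16]
__imp__SendMessageW@16 is the address of a slot in your imports section, and that slot contains the address of the API function. You use the square brackets to dereference it (call the address STORED at this address). The relative call did land exactly on the symbol, which is why the debugger showed the right address, but what lives there is a pointer, not code, so the CPU tried to execute the pointer's bytes as instructions.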
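A tiny Python model may help separate the two cases. Both addresses below are invented for illustration; only the distinction between "jump to the slot" and "jump to what the slot holds" is the point.

```python
import struct

IAT_SLOT = 0x00403000    # hypothetical address of the import slot
API_ENTRY = 0x7E42A1B0   # hypothetical address of SendMessageW inside user32.dll

# The import slot is data: four bytes holding a little-endian pointer.
memory = {IAT_SLOT: struct.pack("<I", API_ENTRY)}

def call_direct(target):
    """call rel32 (E8): execution continues AT target."""
    return target

def call_indirect(slot):
    """call dword [slot] (FF 15): execution continues at the address STORED in slot."""
    return struct.unpack("<I", memory[slot])[0]

print(hex(call_direct(IAT_SLOT)))     # lands on the slot itself (pointer bytes, not code)
print(hex(call_indirect(IAT_SLOT)))   # lands on the API function
```

This also shows why the brackets are a real memory access: the four bytes following FF 15 are the address of the slot, and the CPU must read the slot to learn where to go.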

Simple "Hello-World", null-free shellcode for Windows needed

I would like to test a buffer-overflow by writing "Hello World" to console (using Windows XP 32-Bit). The shellcode needs to be null-free in order to be passed by "scanf" into the program I want to overflow. I've found plenty of assembly-tutorials for Linux, however none for Windows. Could someone please step me through this using NASM? Thxxx!
Assembly opcodes are the same, so the regular tricks to produce null-free shellcodes still apply, but the way to make system calls is different.
In Linux you make system calls with the "int 0x80" instruction, while on Windows you must use DLL libraries and do normal usermode calls to their exported functions.
For that reason, on Windows your shellcode must either:
Hardcode the Win32 API function addresses (most likely will only work on your machine)
Use a Win32 API resolver shellcode (works on every Windows version)
If you're just learning, for now it's probably easier to just hardcode the addresses you see in the debugger. To make the calls position independent you can load the addresses in registers. For example, a call to a function with 4 arguments:
PUSH 4 ; argument #4 to the function
PUSH 3 ; argument #3 to the function
PUSH 2 ; argument #2 to the function
PUSH 1 ; argument #1 to the function
MOV EAX, 0xDEADBEEF ; put the address of the function to call
CALL EAX
Note that the arguments are pushed in reverse order. After the CALL instruction, EAX contains the return value, and the stack will be just like it was before (i.e. the function pops its own arguments). The ECX and EDX registers may contain garbage, so don't rely on them keeping their values after the call.
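A direct CALL instruction won't work here: it encodes the target as an offset relative to the CALL itself, so the bytes would only be correct if you knew exactly where your shellcode ends up in memory.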
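The push order and the "function pops its own arguments" behaviour described above can be modelled with a Python list standing in for the stack. This is a sketch of the convention only; the return address and the "caller data" placeholder are made up.

```python
def stdcall_call(stack, args, return_address):
    """Model PUSH (in reverse order) + CALL + a callee that ends in RET n."""
    for arg in reversed(args):         # PUSH 4, PUSH 3, PUSH 2, PUSH 1
        stack.append(arg)
    stack.append(return_address)       # CALL pushes the return address

    # What the callee sees, reading upward from the top of the stack:
    # [esp] = return address, [esp+4] = argument #1, [esp+8] = argument #2, ...
    seen_by_callee = stack[-2::-1][:len(args)]

    stack.pop()                            # RET pops the return address ...
    del stack[len(stack) - len(args):]     # ... and "RET 16" also drops the 4 arguments
    return seen_by_callee

stack = ["caller data"]
seen = stdcall_call(stack, [1, 2, 3, 4], return_address=0x00401234)
print(seen, stack)
```

Because the last value pushed is argument #1, the callee finds the arguments in their natural order, and after the call the stack holds only what the caller had there before.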
To avoid zeros in the address itself, try any of the null-free tricks for x86 shellcode. There are many out there, but my favorite (albeit lengthy) is encoding the values using XOR instructions:
MOV EAX, 0xDEADBEEF ^ 0xFFFFFFFF ; your value xor'ed against an arbitrary mask
XOR EAX, 0xFFFFFFFF ; the arbitrary mask
You can also try NEG EAX or NOT EAX (sign inversion and bit flipping) to see if they work; they're much cheaper (two bytes each).
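Whether a given mask actually removes the zero bytes is easy to check before assembling. The following Python sketch does the arithmetic; the helper names are made up, and 0x7C8600A0 is an invented address chosen only because it contains a zero byte.

```python
import struct

def xor_encode(value, mask=0xFFFFFFFF):
    """Return value XOR mask as a 32-bit number; applying it twice undoes it."""
    return (value ^ mask) & 0xFFFFFFFF

def has_null_byte(value):
    """True if the little-endian 32-bit encoding of value contains a 0x00 byte."""
    return b"\x00" in struct.pack("<I", value)

address = 0x7C8600A0              # hypothetical address with a zero byte in it
encoded = xor_encode(address)

print(hex(encoded), has_null_byte(address), has_null_byte(encoded))
print(hex(xor_encode(0xDEADBEEF)))   # the constant for the MOV in the example above
```

If the encoded value still contains a zero byte, a different mask is needed; the mask has to be free of zero bytes as well, since it is also emitted as an immediate.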
You can get help on the different API functions you can call here: http://msdn.microsoft.com
The most important ones you'll need are probably the following:
WinExec(): http://msdn.microsoft.com/en-us/library/ms687393(VS.85).aspx
LoadLibrary(): http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress(): http://msdn.microsoft.com/en-us/library/ms683212%28v=VS.85%29.aspx
The first launches a command; the next two are for loading DLL files and getting the addresses of their functions.
Here's a complete tutorial on writing Windows shellcodes: http://www.codeproject.com/Articles/325776/The-Art-of-Win32-Shellcoding
Assembly language is defined by your processor, and assembly syntax is defined by the assembler (hence AT&T and Intel syntax). The main difference between the two systems is not the processor mode: both Windows XP and Linux run your program in protected mode, where you only have access to the memory in your program's little cubby. (It was DOS that ran in real mode, where you call the actual interrupts to do stuff and can use all the memory accessible to your computer.) What differs is how you reach the OS: on Linux you call int 0x80 to make calls to the kernel, while on Windows you call functions exported by DLLs. Anyway, the instructions in hello-world type stuff would be more or less the same between Linux and Windows, as long as they are compatible processors.
To get the shellcode from the program you've made, just load it into your target system's debugger (gdb for Linux, debug for Windows). In debug, type u to unassemble (d only dumps raw memory; type ? for help), and between the addresses and the instructions will be the opcode bytes.
Just copy them all over to your text editor into one string, and maybe make a program that translates them all into their ASCII values. In gdb, disassemble /r shows the raw bytes next to each instruction.
Anyway, to make it into a buffer overflow exploit, enter aaaaa... and keep adding a's until it crashes from a buffer overflow error. But find exactly how many a's it takes to crash it. Then it should tell you what memory address that was; usually it's in the error message. If it says '9797[rest of original return address]' then you've got it. (97 is the decimal ASCII code for 'a'; a debugger will usually show it in hex, as 61.) Now you have to use your debugger to find out where this was. Disassemble the program with your debugger and look for where scanf was called. Set a breakpoint there, run, and examine the stack. Look for all those 97s and see where they end. Then remove the breakpoint and type the number of a's you found out it took (exactly that number). If the error message was "buffer overflow at '97[rest of original return address]'", then remove that last a, put in the address you found by examining the stack, and insert your shellcode. If all goes well, you should see your shellcode execute.
Happy hacking...
