Assembly syntax to distinguish two forms of near jump - windows

I'm assembling the same source with two different assemblers. I expect to get two identical results (modulo memory offsets, exact value of NOPs and such). Yet I've suddenly encountered the weirdest issue: there are two possible encodings of JZ:
74 cb
and
0F 84 cw/cd
The displacement, in my case, fits into one byte, and one assembler (a flavor of GAS, I think) emits the former while another (MASM) emits the latter. Since I perform some validation by matching outputs, this throws the validation off.
I have rather little control over the options of GAS, but I have complete control over MASM. Question - is there an option, a directive, or a specific command syntax to force one encoding over the other?

If all of the code, except for this one instruction is the exact same when assembled, this looks like a bug in MASM. These instructions resolve to:
74: jz rel8
0F 84: jz rel16/32
So, MASM is improperly using more space for that opcode than it should. You may be able to remedy this by using a more explicit form of the instruction in MASM, like
jz byte my_label
However, if your machine code is different at all, this may be the proper behavior of MASM. Ensure that the signed word/dword argument to jz rel16 would fit into a signed byte

Related

Accessing global variables in ARM64 position independent assembly code

I'm writing some ARM64 assembly code for macOS, and it needs to access a global variable.
I tried to use the solution in this SO answer, and it works fine if I just call the function as is. However, my application needs to patch some instructions of this function, and the way I'm doing it, the function gets moved somewhere else in memory in the process. Note the adrp/ldr pair is untouched during patching.
However, if I try to run the function after moving it elsewhere in memory, it no longer returns correct results. This happens even if I just memcpy() the code as is, without patching. After tracing with a debugger, I isolated the issue to the address of the global valuable being incorrectly loaded by the adrp/ldr pair (and weirdly, the ldr is assembled as an add, as seen with objdump straight after compiling the binary -- not sure if it's somehow related to the issue here.)
What would be the correct way to load a global variable, so that it survives the function being copied somewhere else and run from there?
Note the adrp/ldr pair is untouched during patching.
There's the issue. If you rip code out of the binary it's in, then you effectively need to re-link it.
There's two ways of dealing with this:
If you have complete control over the segment layout, then you could have one executable segment with all of your assembly in it, and right next to it one segment with all addresses that code needs, and make sure the assembly ONLY has references to things on that page. Then wherever you copy your assembly, you'd also copy the data page next to it. This would enable you to make use of static addresses that get rebased by the dynamic linker at the time your binary is loaded. This might look something like:
.section __ASM,__asm,regular
.globl _asm_stub
.p2align 2
_asm_stub:
adrp x0, _some_ref#PAGE
ldr x0, [x0, _some_ref#PAGEOFF]
ret
.section __REF,__ref
.globl _some_ref
.p2align 3
_some_ref:
.8byte _main
Compile that with -Wl,-segprot,__ASM,rx,rx and you'll get an executable __ASM and a writeable __REF segment. Those two would have to maintain their relative position to each other when they get copied around.
(Note that on arm64 macOS you cannot put symbol references into executable segments for the dynamic linker to rebase, because it will fault and crash while trying to do so, and even if it were able to do that, it would invalidate the code signature.)
You act as a linker, scanning for PC-relative instructions and re-linking them as you go. The list of PC-relative instructions in arm64 is quite short, so it should be a feasible amount of work:
adr and adrp
b and bl
b.cond (and bc.cond with FEAT_HBC)
cbz and cbnz
tbz and tbnz
ldr and ldrsw (literal)
ldr (SIMD & FP literal)
prfm (literal)
(You can look for the string PC[] in the ARMv8 Reference Manual to find all uses.)
For each of those you'd have to check whether their target address lies within the range that's being copied or not. If it does, then you'd leave the instruction alone (unless you copy the code to a different offset within the 4K page than it was before, in which case you have to fix up adrp instructions). If it isn't then you'll have to recalculate the offset and emit a new instruction. Some of the instructions have a really low maximum offset (tbz/tbnz ±32KiB). But usually the only instructions that reference addresses across function boundaries are adr, adrp, b, bl and ldr. If all code on the page is written by you then you can do adrp+add instead of adr and adrp+ldr instead of just ldr, and if you have compiler-generated code on there, then all adr's and ldr's will have a nop before or after, which you can use to turn them into an adrp combo. That should get your maximum reference range up to ±128MiB.

MOVXZ into register - "invalid operand for instruction"

I am trying to compile an assembler-based implementation of AES, viewable here. My assembler is giving me the following error, repeated several different times over what appear to be instances of the same error. The exact source location is here, but due to the large amount of preprocessor indirection used in this file, I have copied the exact error from my build output, which gives the exact code as seen by the compiler:
/Volumes/Sources/Andromeda/Kernel/libkern/crypto/aes/EncryptDecrypt.s:297:19: error: invalid operand for instruction
movzx 240(%r10), %rax
^~~~
I do not quite understand what may be causing this problem. If I understand it properly, this instruction moves a byte (or more, this is unclear, and may in fact be the source of the problem) into the RAX register, zero-extending it if the source is less than 64-bits in size. Do I need to explicitly specify a size by adding a tag to the movxz instruction (e.g. movzxb)? What else might be the cause of this problem? Thanks!
At&t syntax does not normally use movzx, but maybe some assembler versions accept it. My copy of GNU assembler 2.22 does, but maybe OSX version doesn't. In any case, the assembler generates code for a byte source. If you do in fact have that, the proper at&t syntax would be movzbq 240(%r10), %rax, or, taking advantage of automatic zero extension, movzbl 240(%r10), %eax.
If you have a 4 byte source, then you can't use movzx at all, since it does not exist for that operand type. All you need in this case is the automatic zero extension, so you can simply do movl 240(%r10), %eax.

What does the 66 in "66:PUSH 08" stand for?

Test platform is windows 32bit.
I use IDA pro to disassemble a PE file, do some very tedious transform work, and re-assembly it into a new PE file.
But there is some difference in the re-assembled PE file and the original one if I use OllyDbg
to debug the new PE file (although there is no difference of this part in the assembly file I transformed)
Here is part of the original one:
See the
PUSH 8
PUSH 0
is correct.
Here is part of my new PE file:
See now the
PUSH 8
PUSH 0
is changed to
66:6A 08
66:6A 00
and it lead to the failure of the new PE's execution.
Basically, from what I have seen, it lead to the un-align of stack.
So does anyone know what is wrong with this part? I don't see any difference in the assembly code I transform....
Could anyone give me some help? Thank you!
66h is the operand-size override prefix. In 32-bit code, it switches the operand size to 16-bit from the default 32-bit. So what happens here is that the PUSH instruction pushes a 16-bit value on the stack instead of the 32-bit one, and the ESP is decremented by 2 instead of 4. That's why you get unbalanced stack after the call.
You should check your assembler's documentation to see how you can force 32-bit operand size for the PUSH imm instructions. Different assemblers use different conventions for that. For example, in NASM you'd probably use something like push dword 8.
It is a "prefix" opcode byte: See http://wiki.osdev.org/X86-64_Instruction_Encoding#Legacy_Prefixes
0x66 means "operand size override". Your code is apparantly operating in 32-bit mode; PUSH without the prefix will push a 32 bit value. I think what this does is cause the PUSH to fetch a 16 bit value, and push that as a 32 bit value on the stack. (I write a lot of assembly code, and have never had need to do that).

More Null Free Shellcode

I need to find null-free replacements for the following instructions so I can put the following code in shellcode.
The first instruction I need to convert to null-free is:
mov ebx, str ; the string containing /dev/zero
The string str is defined in my .data section.
The second is:
mov eax,0x5a
Thanks!
Assuming what you want to learn is how assembly code is made up, what type of instruction choices ends up in assembly code with specific properties, then (on x86/x64) do the following:
Pick up Intel's instruction set reference manuals (four volumes as of this writing, I think). They contain opcode tables (instruction binary formats), and detailed lists of all allowed opcodes for a specific assembly mnemonic (instruction name).
Familiarize yourself with those and mentally divide them into two groups - those that match your expected properties (like, not containing the 'x' character ... or any other specific one), and those that don't. The 2nd category you need to eliminate from your code if they're present.
Compile your code telling the compiler not to discard compile intermediates:gcc -save-temps -c csource.c
Disassemble the object file:objdump -d csource.o
The disassembly output from objdump will contain the binary instructions (opcodes) as well as the instruction names (mnemonics), i.e. you'll see exactly which opcode format was chosen. You can now check whether any opcodes in there are from the 2nd set as per 1. above.
The creative bit of the work comes in now. When you've found an instruction in the disassembly output that doesn't match the expectations/requirements you have, look up / create a substitute (or, more often, a substitute sequence of several instructions) that gives the same end result but is only made up from instructions that do match what you need.
Go back to the compile intermediates from above, find the csource.s assembly, make changes, reassemble/relink, test.
If you want to make your assembly code standalone (i.e. not using system runtime libraries / making system calls directly), consult documentation on your operating system internals (how to make syscalls), and/or disassemble the runtime libraries that ordinarily do so on your behalf, to learn how it's done.
Since 5. is definitely homework, of the same sort like create a C for() loop equivalent to a given while() loop, don't expect too much help there. The instruction set reference manuals and experiments with the (dis)assembler are what you need here.
Additionally, if you're studying, attend lessons on how compilers work / how to write compilers - they do cover how assembly instruction selection is done by compilers, and I can well imagine it to be an interesting / challenging term project to e.g. write a compiler whose output is guaranteed to contain the character '?' (0x3f) but never '!' (0x21). You get the idea.
You mention the constant load via xor to clear plus inc and shl to get any set of bits you want.
The least fragile way I can think of to load an unknown constant (your unknown str) is to load the constant xor with some value like 0xAAAAAAAA and then xor that back out in a subsequent instruction. For example to load 0x1234:
0: 89 1d 9e b8 aa aa mov %ebx,0xaaaab89e
6: 31 1d aa aa aa aa xor %ebx,0xaaaaaaaa
You could even choose the 0xAAAAAAAA to be some interesting ascii!

Simple "Hello-World", null-free shellcode for Windows needed

I would like to test a buffer-overflow by writing "Hello World" to console (using Windows XP 32-Bit). The shellcode needs to be null-free in order to be passed by "scanf" into the program I want to overflow. I've found plenty of assembly-tutorials for Linux, however none for Windows. Could someone please step me through this using NASM? Thxxx!
Assembly opcodes are the same, so the regular tricks to produce null-free shellcodes still apply, but the way to make system calls is different.
In Linux you make system calls with the "int 0x80" instruction, while on Windows you must use DLL libraries and do normal usermode calls to their exported functions.
For that reason, on Windows your shellcode must either:
Hardcode the Win32 API function addresses (most likely will only work on your machine)
Use a Win32 API resolver shellcode (works on every Windows version)
If you're just learning, for now it's probably easier to just hardcode the addresses you see in the debugger. To make the calls position independent you can load the addresses in registers. For example, a call to a function with 4 arguments:
PUSH 4 ; argument #4 to the function
PUSH 3 ; argument #3 to the function
PUSH 2 ; argument #2 to the function
PUSH 1 ; argument #1 to the function
MOV EAX, 0xDEADBEEF ; put the address of the function to call
CALL EAX
Note that the argument are pushed in reverse order. After the CALL instruction EAX contains the return value, and the stack will be just like it was before (i.e. the function pops its own arguments). The ECX and EDX registers may contain garbage, so don't rely on them keeping their values after the call.
A direct CALL instruction won't work, because those are position dependent.
To avoid zeros in the address itself try any of the null-free tricks for x86 shellcode, there are many out there but my favorite (albeit lengthy) is encoding the values using XOR instructions:
MOV EAX, 0xDEADBEEF ^ 0xFFFFFFFF ; your value xor'ed against an arbitrary mask
XOR EAX, 0xFFFFFFFF ; the arbitrary mask
You can also try NEG EAX or NOT EAX (sign inversion and bit flipping) to see if they work, it's much cheaper (two bytes each).
You can get help on the different API functions you can call here: http://msdn.microsoft.com
The most important ones you'll need are probably the following:
WinExec(): http://msdn.microsoft.com/en-us/library/ms687393(VS.85).aspx
LoadLibrary(): http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress(): http://msdn.microsoft.com/en-us/library/ms683212%28v=VS.85%29.aspx
The first launches a command, the next two are for loading DLL files and getting the addresses of its functions.
Here's a complete tutorial on writing Windows shellcodes: http://www.codeproject.com/Articles/325776/The-Art-of-Win32-Shellcoding
Assembly language is defined by your processor, and assembly syntax is defined by the assembler (hence, at&t, and intel syntax) The main difference (at least i think it used to be...) is that windows is real-mode (call the actual interrupts to do stuff, and you can use all the memory accessible to your computer, instead of just your program) and linux is protected mode (You only have access to memory in your program's little cubby of memory, and you have to call int 0x80 and make calls to the kernel, instead of making calls to the hardware and bios) Anyway, hello world type stuff would more-or-less be the same between linux and windows, as long as they are compatible processors.
To get the shellcode from your program you've made, just load it into your target system's
debugger (gdb for linux, and debug for windows) and in debug, type d (or was it u? Anyway, it should say if you type h (help)) and between instructions and memory will be the opcodes.
Just copy them all over to your text editor into one string, and maybe make a program that translates them all into their ascii values. Not sure how to do this in gdb tho...
Anyway, to make it into a bof exploit, enter aaaaa... and keep adding a's until it crashes
from a buffer overflow error. But find exactly how many a's it takes to crash it. Then, it should tell you what memory adress that was. Usually it should tell you in the error message. If it says '9797[rest of original return adress]' then you got it. Now u gotta use ur debugger to find out where this was. disassemble the program with your debugger and look for where scanf was called. Set a breakpoint there, run and examine the stack. Look for all those 97's (which i forgot to mention is the ascii number for 'a'.) and see where they end. Then remove breakpoint and type the amount of a's you found out it took (exactly the amount. If the error message was "buffer overflow at '97[rest of original return adress]" then remove that last a, put the adress you found examining the stack, and insert your shellcode. If all goes well, you should see your shellcode execute.
Happy hacking...

Resources