I can't assemble the movd (MMX) instruction in Visual C++ 2008 Express Edition - Windows

When I try to assemble a movd instruction, I get this error:
error A2085:instruction or register not accepted in current CPU mode
My code is as follows:
.386
.model flat, c
.code
add_func_asm PROC
movd eax, ebx
ret
add_func_asm endp
END
This is the .asm file; I call the function from a C file.
I fixed it by using the code below:
.586
.mmx
.model flat, c
.code
add_func_asm PROC
movd mm1, ebx
ret
add_func_asm endp
END

.386
That cannot work: the 386 processor didn't have this instruction. You have to target .586 (Pentium and up) and explicitly state that you want to use the MMX instruction set. Fix:
.586
.mmx
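That will get the assembler to accept the MOVD instruction. The next thing to fix is the operands: movd eax, ebx is not valid, because MOVD moves data between an MMX (or XMM) register and a 32-bit register or memory location, so one operand has to be an MMX register such as mm1. A plain register-to-register copy would be pointless with MOVD anyway.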

MOVD and MOVQ are MMX instructions, so you need to use a .MMX (or .XMM) directive to enable the instruction set.

Try mov eax, ebx instead for moving 32 bits.

Related

Why is my branch instruction not jumping to the given address?

I'm currently learning reverse engineering, so I need to learn assembly. The code runs well, except for the JE and JNE instructions. To be more specific: I'm writing a detour hook as practice, and the program crashes because it jumps to the wrong address.
I wanted to write this, but I got an error (LNK2016):
JE 0x123 (0x123 is an example address)
I fixed that problem by writing:
JE short 0x123
Full function code:
my_hook:
push rbp
mov rbp, rsp
mov dword [rcx + 0x18], 99
test rcx, rcx
je 0x7FF7847902EE
lock dec dword [rcx + 0x08]
jne 0x7FF7847902EE
mov rcx, [rsp + 0x30]
leave
ret
But the problem now is that it jumps to the beginning of the function instead of to 0x123 (in our case).
The conditional jump instructions on x86 all take a relative displacement, not an absolute address. They work for jumping to another label in your code, since the assembler can compute the displacement between the jump instruction and the destination, but you can't use them to jump to an absolute address unless you know, at assembly time, the address where your code will be loaded.
Near absolute jumps on x86 are indirect: you need to have the address in a register or memory. And such jumps are only available in an unconditional form, so you'll have to use the conditional jump to get to them.
You could do
my_hook:
push rbp
mov rbp, rsp
mov dword [rcx + 0x18], 99
test rcx, rcx
je jump_elsewhere
lock dec dword [rcx + 0x08]
jne jump_elsewhere
mov rcx, [rsp + 0x30]
leave
ret
jump_elsewhere:
mov rax, 0x7FF7847902EE
jmp rax
If you can't spare a register, you could instead do
jump_elsewhere:
jmp [rel elsewhere_addr]
elsewhere_addr:
dq 0x7FF7847902EE
but this is more bytes of code.
If you do know the address where your code will be loaded, and it's within 2 GB of the destination address, you can use the ORG directive to tell the assembler about it. So if my_hook will be at address 0x7ff7DEADBEEF, you can do
org 0x7ff7DEADBEEF
my_hook:
; ...
je 0x7FF7847902EE
and the assembler will compute the correct 32-bit displacement.
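For illustration, here is a rough C sketch of the arithmetic the assembler does in that case. The function name jcc_rel32 is made up for this example, and it assumes the 6-byte near form of the conditional jump (0F 8x followed by a 32-bit displacement). The displacement is measured from the address of the instruction after the jump, and it has to fit in a signed 32-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Length of a near conditional jump: 0F 8x opcode + 4-byte displacement. */
#define JCC_REL32_LEN 6

/* Hypothetical helper: computes the rel32 field for a Jcc located at
 * insn_addr that should land on target. Returns false when the target
 * is out of range for a 32-bit displacement. */
bool jcc_rel32(uint64_t insn_addr, uint64_t target, int32_t *disp)
{
    /* The displacement is relative to the NEXT instruction's address. */
    int64_t delta = (int64_t)(target - (insn_addr + JCC_REL32_LEN));

    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;

    *disp = (int32_t)delta;
    return true;
}
```

Calling jcc_rel32(0x7FF7DEADBEEF, 0x7FF7847902EE, &d) succeeds because the two addresses are well within 2 GB of each other, while a jump placed at a low address such as 0x1000 would be rejected. Note that insn_addr is the address of the jump itself, not of my_hook, so in the real hook you would add the size of the preceding instructions.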
By the way, the test rcx, rcx doesn't really make sense: if rcx were zero then the previous instruction mov dword [rcx + 0x18], 99 would have faulted, since on any decent operating system, the zero page will be unmapped. Maybe you wanted this test and conditional jump to go before the mov. Unless this is some sort of strange bare-metal code that you're patching?

Translating Go assembler to NASM

I came across the following Go code:
type Element [12]uint64
//go:noescape
func CSwap(x, y *Element, choice uint8)
//go:noescape
func Add(z, x, y *Element)
where the CSwap and Add functions are basically coming from an assembly, and look like the following:
TEXT ·CSwap(SB), NOSPLIT, $0-17
MOVQ x+0(FP), REG_P1
MOVQ y+8(FP), REG_P2
MOVB choice+16(FP), AL // AL = 0 or 1
MOVBLZX AL, AX // AX = 0 or 1
NEGQ AX // RAX = 0x00..00 or 0xff..ff
MOVQ (0*8)(REG_P1), BX
MOVQ (0*8)(REG_P2), CX
// Rest removed for brevity
TEXT ·Add(SB), NOSPLIT, $0-24
MOVQ z+0(FP), REG_P3
MOVQ x+8(FP), REG_P1
MOVQ y+16(FP), REG_P2
MOVQ (REG_P1), R8
MOVQ (8)(REG_P1), R9
MOVQ (16)(REG_P1), R10
MOVQ (24)(REG_P1), R11
// Rest removed for brevity
What I'm trying to do is translate the assembly to a syntax that is more familiar to me (I think mine is closer to NASM), since the syntax above is Go assembler. I didn't have much trouble with the Add function and translated it correctly (according to test results). It looks like this in my case:
.text
.global add_asm
add_asm:
push r12
push r13
push r14
push r15
mov r8, [reg_p1]
mov r9, [reg_p1+8]
mov r10, [reg_p1+16]
mov r11, [reg_p1+24]
// Rest removed for brevity
But I have a problem translating the CSwap function. I have something like this:
.text
.global cswap_asm
cswap_asm:
push r12
push r13
push r14
mov al, 16
mov rax, al
neg rax
mov rbx, [reg_p1+(0*8)]
mov rcx, [reg_p2+(0*8)]
But this doesn't seem to be quite correct, as I get an error when assembling it. Any ideas how to translate the CSwap assembly above to something like NASM?
EDIT (SOLUTION):
Okay, after the two answers below, and some testing and digging, I found out that the code uses the following three registers for parameter passing:
#define reg_p1 rdi
#define reg_p2 rsi
#define reg_p3 rdx
Accordingly, rdx has the value of the choice parameter. So, all that I had to do was use this:
movzx rax, dl // Get the lower 8 bits of rdx (reg_p3)
neg rax
Using byte [rdx] or byte [reg_3] was giving an error, but using dl seems to work fine for me.
Basic docs about Go's asm: https://golang.org/doc/asm. It's not totally equivalent to NASM or AT&T syntax: FP is a pseudo-register name for whichever register it decides to use as the frame pointer (typically RSP or RBP). Go asm also seems to omit function prologue (and probably epilogue) instructions. As @RossRidge comments, it's a bit more like an internal representation such as LLVM IR than true asm.
Go also has its own object-file format, so I'm not sure you can make Go-compatible object files with NASM.
If you want to call this function from something other than Go, you'll also need to port the code to a different calling convention. Go appears to be using a stack-args calling convention even for x86-64, unlike the normal x86-64 System V ABI or the x86-64 Windows calling convention. (Or maybe the instructions that load the function args into REG_P1 and so on disappear when Go builds this source for a register-arg calling convention?)
(This is why you had to use movzx eax, dl instead of loading from the stack at all.)
BTW, rewriting this code in C instead of NASM would probably make even more sense if you want to use it with C. Small functions are best inlined and optimized away by the compiler.
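As a rough idea of what that could look like, here is a C sketch of a branch-free conditional swap. Only the mask computation (0 or 1 turned into all-zeros or all-ones, like MOVBLZX + NEGQ) comes from the code shown in the question. The XOR-and-mask loop body is an assumption about what the elided part does, and the name cswap and the ELEMENT_WORDS macro are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Matches `type Element [12]uint64` from the question. */
#define ELEMENT_WORDS 12

/* Swaps x and y when choice == 1 and leaves them alone when choice == 0,
 * without branching on choice. The loop body is an assumed, conventional
 * masked-XOR swap, not a transcription of the elided Go asm. */
void cswap(uint64_t x[ELEMENT_WORDS], uint64_t y[ELEMENT_WORDS], uint8_t choice)
{
    /* Same idea as MOVBLZX + NEGQ: 0 -> 0x00..00, 1 -> 0xff..ff. */
    uint64_t mask = (uint64_t)0 - (uint64_t)choice;

    for (size_t i = 0; i < ELEMENT_WORDS; i++) {
        uint64_t t = mask & (x[i] ^ y[i]); /* 0 when mask is 0 */
        x[i] ^= t;
        y[i] ^= t;
    }
}
```

Whether a compiler keeps this branch-free depends on the compiler and flags, so if constant-time behaviour matters you would still want to check the generated asm.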
It would be a good idea to check your translation, or get a starting point, by assembling with the Go assembler and using a disassembler.
objdump -drwC -Mintel or Agner Fog's objconv disassembler would be good, but they don't understand Go's object-file format. If Go has a tool to extract the actual machine code or get it in an ELF object file, do that.
If not, you could use ndisasm -b 64 (which treats input files as flat binaries, disassembling all the bytes as if they were instructions). You can specify an offset/length if you can find out where the function starts. x86 instructions are variable length, and disassembly will likely be "out of sync" at the start of the function. You might want to add a bunch of single-byte NOP instructions (kind of a NOP sled) for the disassembler, so if it decodes some 0x90 bytes as part of an immediate or disp32 for a long instruction that was really not part of the function, it will be in sync. (But the function prologue will still be messed up).
You might add some "signpost" instructions to your Go asm functions to make it easy to find the right place in the mess of crazy asm from disassembling metadata as instructions. e.g. put a pmuludq xmm0, xmm0 in there somewhere, or some other instruction with a unique mnemonic that you can search for which the Go code doesn't include. Or an instruction with an immediate that will stand out, like addq $0x1234567, SP. (An instruction that will crash so you don't forget to take it out again is good here.)
Or you could use gdb's built-in disassembler: add an instruction that will segfault (like a load from a bogus absolute address (movl 0, AX null-pointer deref), or a register holding a non-pointer value e.g. movl (AX), AX). Then you'll have an instruction-pointer value for the instructions in memory, and can disassemble from some point behind that. (Probably the function start will be 16-byte aligned.)
Specific instructions.
MOVBLZX AL, AX reads AL, so that's definitely an 8-bit operand. The size for AX is given by the L part of the mnemonic, meaning long for 32 bit, like in GAS AT&T syntax. (The gas mnemonic for that form of movzx is movzbl %al, %eax). See What does cltq do in assembly? for a table of cdq / cdqe and the AT&T equivalent, and the AT&T / Intel mnemonic for the equivalent MOVSX instruction.
The NASM instruction you want is movzx eax, al. Using rax as the destination would be a waste of a REX prefix. Using ax as the destination would be a mistake: it wouldn't zero-extend into the full register, and would leave whatever garbage was in the upper bits. Go asm syntax for x86 is very confusing when you're not used to it, because AX can mean AX, EAX, or RAX depending on the operand size.
Obviously mov rax, al isn't a possibility: Like most instructions, mov requires both its operands to be the same size. movzx is one of the rare exceptions.
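To make the difference between the destination sizes concrete, here is a small C model of what happens to the full 64-bit register in each case. It is only a sketch of the register-write rules (a 32-bit write zero-extends into the 64-bit register, a 16-bit write leaves the upper 48 bits alone), and the function names are invented for the example.

```c
#include <stdint.h>

/* Models `movzx eax, al`: writing a 32-bit register zero-extends the
 * result into the full 64-bit register, so only the old AL survives. */
uint64_t model_movzx_eax_al(uint64_t rax)
{
    return (uint64_t)(uint8_t)rax;
}

/* Models `movzx ax, al`: writing a 16-bit register leaves bits 16..63
 * of RAX untouched, so any old garbage up there is still present. */
uint64_t model_movzx_ax_al(uint64_t rax)
{
    return (rax & ~(uint64_t)0xFFFF) | (uint8_t)rax;
}
```

With garbage in the upper bits of RAX and AL = 1, the first model yields exactly 1, so a following neg rax gives the all-ones mask. The second keeps the garbage, and the neg produces something that is not a clean mask.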
MOVB choice+16(FP), AL is a byte load into AL, not an immediate move. choice+16 is an offset from FP. This syntax is basically the same as AT&T addressing modes, with FP as a register and choice as an assemble-time constant.
FP is a pseudo-register name. It's pretty clear that it should simply be loading the low byte of the 3rd arg-passing slot, because choice is the name of a function arg. (In Go asm, choice is just syntactic sugar, or a constant defined as zero.)
Before a call instruction, rsp points at the first stack arg, so that + 16 is the 3rd arg. It appears that FP is that base address (and might actually be rsp+8 or something). After a call (which pushes an 8 byte return address), the 3rd stack arg is at rsp + 24. After more pushes, the offset will be even larger, so adjust as necessary to reach the right location.
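The offset bookkeeping can be written down as a tiny C helper. This is just the arithmetic from the paragraph above, assuming every argument occupies one 8-byte stack slot and the first one sits at rsp right before the call. The function name and the SLOT macro are made up.

```c
#include <stddef.h>

/* Bytes per stack slot on x86-64. */
#define SLOT 8

/* Offset from rsp, inside the callee, of stack argument arg_index
 * (0-based), after the callee has done `pushes` 8-byte pushes.
 * Assumes one slot per argument, first argument at rsp before the call. */
size_t stack_arg_offset(size_t arg_index, size_t pushes)
{
    return SLOT             /* return address pushed by call */
         + SLOT * pushes    /* registers pushed by the callee so far */
         + SLOT * arg_index;
}
```

stack_arg_offset(2, 0) is 24, the rsp + 24 mentioned above. After push rbp / mov rbp, rsp there has been one push, so the same argument is at rbp + 32, and after the three pushes in the question's cswap_asm it would be at rsp + 48.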
If you're porting this function to be called with a standard calling convention, the 3 integer args will be passed in registers, with no stack args. Which 3 registers depends on whether you're building for Windows vs. non-Windows. (See Agner Fog's calling conventions doc: http://agner.org/optimize/)
BTW, a byte load into AL and then movzx eax, al is just dumb. Much more efficient on all modern CPUs to do it in one step with
movzx eax, byte [rsp + 24] ; or rbp+32 if you made a stack frame.
I hope the source in the question is from un-optimized Go compiler output? Or the assembler itself makes such optimizations?
I think you can translate these as just
mov rbx, [reg_p1]
mov rcx, [reg_p2]
Unless I'm missing some subtlety, the offsets which are zero can just be ignored. The *8 isn't a size hint since that's already in the instruction.
The rest of your code looks wrong though. The MOVB choice+16(FP), AL in the original is supposed to be fetching the choice argument into AL, but you're setting AL to a constant 16, and the code for loading the other arguments seems to be completely missing, as is the code for all of the arguments in the other function.

Assembly - visualise registers, stack etc

Hi, I was writing my programs in emu8086 and using it for debugging. However, I now need floating point and the FPU, which emu8086 doesn't support. I need an easy way to see what is in a given place in memory. For example, I want to visualise "dzielna" and "dzielnik", the contents of registers such as ax and bx, and what is in st(0), st(1), etc. Can you recommend a good program for this?
dane1 segment
dzielna dd 1.3
dzielnik dd 6.7
dane1 ends
assume cs:code1, ss:stos1, ds:dane1
stos1 segment stack
dw 400 dup(?)
top1 dw ?
stos1 ends
code1 segment
.386
.387
start1: mov ax,seg top1
mov ss,ax
mov sp,offset top1
mov ax,dane1
mov ds,ax
finit
fldpi
fld dword ptr [dzielna]
fld dword ptr [dzielnik]
fsub st(0),st(1)
fstp dword ptr [dzielna]
finish:
mov ah,4ch
int 21h
code1 ends
end start1
The program to visualise it is called a debugger. Since you are running in DOSBox, you need one that can run there.
If you can get your hands on Turbo Assembler, it comes with a debugger, TD.exe.
OpenWatcom also has a debugger that can be run in DOSBox.
Both allow you to show the FPU registers.

Register ESI causes RunTime-Check Failure #0 error

I've spent a lot of time trying to solve this problem and I don't understand why it doesn't work. The problem is described in the comments below:
.386
.MODEL FLAT, STDCALL
OPTION CASEMAP:NONE
.NOLIST
.NOCREF
INCLUDE \masm32\include\windows.inc
.LIST
.CODE
DllEntry PROC hInstDLL:HINSTANCE, reason:DWORD, reserved1:DWORD
mov eax, TRUE
ret
DllEntry ENDP
caesarAsm proc string: DWORD, key: DWORD, stringLength : DWORD
mov esi, 1 ; I cannot use this register, mov esi, (anything) causes Crash:
; Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention
mov eax, string
ret
caesarAsm endp
END DllEntry
I searched the whole Internet and found that the problem is connected with the stack, but no stack operation helped me solve it.
I'm using Microsoft Visual Studio 2012.
I assume the error does not occur in this function, rather, it is triggered elsewhere. The esi register is a callee-saved register. You must make sure its value is the same at function exit as it was at entry. You can use it in your function, but you must save and restore its value. Such as:
push esi
mov esi, 1
mov eax, string
pop esi
ret
This is all well documented. You may only use eax, ecx and edx without saving.
Side note: you are using high-level features of your assembler, you might want to check the actual generated code or refrain from using them until you are confident in what the result is going to be. Incidentally masm has a USES keyword which would do the save/restore for you.

How to make gcc compiler reserve registers when building intel-style inline assembly code?

I am building some Intel-style inline assembly code using the gcc compiler in Xcode 4.
Part of the inline assembly code is listed below:
_asm
{
mov eax, esp
sub esp, 116
and esp, ~15
mov [esp+112], eax
}
In ship (release) mode, GCC compiles the above four lines of asm to:
mov %esp,%eax
sub $0x74,%esp
and $0xfffffff0,%esp
mov %eax,0x70(%esp)
which is exactly what I want.
However, in debug mode GCC compiles that code to:
mov %esp,%eax
mov %eax,%esp
mov %esp,%eax
mov %eax,-0x28(%ebp)
mov %esp,%eax
mov %eax,%esp
sub $0x74,%esp
mov %esp,%eax
mov %eax,-0x24(%ebp)
mov %esp,%eax
mov %eax,%esp
and $0xfffffff0,%esp
mov %esp,%eax // changes the value of eax
mov %eax,-0x24(%ebp)
mov %esp,%ecx
mov %ecx,%esp
mov %eax,0x70(%esp) // stores a "dirty" value at 0x70(%esp), which is not what we want
One way to solve the above problem is to rewrite the inline asm code using AT&T-style instructions and add the register to the clobber list. But that would be very time-consuming, since the code to rewrite is so long.
Is there a more efficient way to solve the problem, so that the gcc compiler knows that register "eax" should be reserved?
There are two ways:
The best way to solve it is to use gcc's assembly template capabilities (extended asm). Then you can tell the compiler what you're doing, and the register allocator will not use your registers for anything else.
A quick hack would be to just use "asm volatile" instead of "asm", so that gcc will not reschedule any instructions inside that block. You'll still have to tell GCC that you're using the register, so it's not going to store anything in there. You should also list "memory" in the clobber list, so gcc knows that it can't trust values it might have loaded before your code block.
asm volatile(
"Code goes here"
: : : "eax", "esp", "memory"
);
Btw: Your code is doing some "bad things" like moving esp around, which might cause trouble down the line, unless you know exactly what you're doing.
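For reference, here is a minimal sketch of the template approach on a deliberately harmless instruction (an add, not the stack-pointer manipulation from the question). The function name add_u32 is invented for the example. It uses GCC's default AT&T operand order and falls back to plain C when not building with a GCC-compatible compiler for x86. The point is that the operands and clobbers are declared, so the compiler picks the registers and knows what the block changes.

```c
#include <stdint.h>

/* Hypothetical example: adds b to a using GCC extended asm on x86,
 * with every register and side effect declared to the compiler. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile (
        "addl %1, %0"   /* AT&T order: a += b */
        : "+r"(a)       /* %0: read-write operand, any general register */
        : "r"(b)        /* %1: input operand, any general register */
        : "cc"          /* add modifies the flags */
    );
    return a;
#else
    return a + b;       /* portable fallback for other targets */
#endif
}
```

Because the registers are chosen by the compiler through the "r" constraints, there is nothing for a debug build to accidentally overwrite between instructions. This sketch would not assemble if the file is built with -masm=intel, since the template string is written in AT&T syntax.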
An empty asm block after the intel-style block solves the problem, like this:
__asm volatile {
mov eax, esp
sub esp, 116
and esp, ~15
mov [esp+112], eax
};
__asm__ __volatile__ ("":::"eax", "memory");
However, if you don't restore %esp, it's going to wreak havoc.

Resources