Understanding assembly syntax issues with masm32 and Visual Studio 2013

After much trial and error, I still have some trouble understanding why the assembly syntax used in my textbook caused so many issues when using Windows 8.
.MODEL SMALL
.586
.STACK 100h
.DATA
Message DB 'Hello, my name blank', 13, 10, '$'
.CODE
Hello PROC
mov ax, #data
mov ds, ax
mov dx, OFFSET Message
mov ah, 9h
int 21h
mov al, 0
mov ah, 4ch
int 21h
Hello ENDP
END Hello
At first I tried running the code with masm32, using the command prompt and the correct linker. Then I tried Visual Studio 2013 Ultimate; even using masm32 within Visual Studio, I got similar issues each time. The assembler had issues with the #data line, and with Hello having no leading underscore. Fixing the latter only resulted in an issue with unmatched blocks.
I did find a workaround by using an MS-DOS virtual environment, and the code worked fine after removing the .586 directive.
I suspect the main issue was trying to run this code in an x64 OS environment, but I'm still learning the language, so I'd like to hear other opinions on why I couldn't get it to run initially.
The book we're using is Jones, Assembly Language for the IBM PC Family 3rd edition.

You are using a 32-bit linker. You need to use the 16-bit linker, called link16, in masm32/bin to link the code.
e.g.
ml /c /Fl filename.asm
-then-
link16 filename.obj
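Separately, the #data error comes from the source itself: MASM writes the data-group address as @data, not #data, and a CPU directive such as .586 placed after .MODEL SMALL makes the segments 32-bit (USE32), which commonly produces block-matching errors in 16-bit listings. A version of the program that should assemble with 16-bit MASM (a sketch, not tested against the book's assembler) is:
.586                ; CPU directive BEFORE .MODEL keeps the segments 16-bit
.MODEL SMALL
.STACK 100h
.DATA
Message DB 'Hello, my name blank', 13, 10, '$'
.CODE
Hello PROC
mov ax, @data       ; MASM spells the data group @data, not #data
mov ds, ax
mov dx, OFFSET Message
mov ah, 9h          ; DOS print-string service
int 21h
mov al, 0
mov ah, 4ch         ; DOS exit
int 21h
Hello ENDP
END Hello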

The difference between the 16-bit and the 32-bit address mode is the default size of the operands/registers and addresses inside the code segment, and whether the assembler has to emit operand-size and address-size prefixes.
In 16-bit address mode the default size is 16 bits. If we want to use 32-bit registers/operands and/or 32-bit addresses there, the assembler has to place an operand-size and/or address-size prefix in front of each of those 32-bit instructions. If we use only 16-bit instructions in 16-bit address mode, no prefixes are needed.
In 32-bit address mode the default size is 32 bits, so 32-bit instructions need no prefixes there (which helps minimize code size if the code is mostly 32-bit), while 16-bit instructions in 32-bit address mode do need the operand-size and/or address-size prefixes.
Additionally, there are two assembler directives (use16 and use32) to declare which address mode a block of code is written for, in case we want separate parts of the code for both address modes.
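To make that concrete, here is the same register-to-register move encoded in each mode (byte encodings shown in the comments):
mov ax, bx      ; 89 D8     in 16-bit mode: default operand size, no prefix
mov eax, ebx    ; 66 89 D8  in 16-bit mode: operand-size prefix 66h required
mov eax, ebx    ; 89 D8     in 32-bit mode: default operand size, no prefix
mov ax, bx      ; 66 89 D8  in 32-bit mode: operand-size prefix 66h required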
Besides the two address modes, there is also a large difference between real mode and protected mode.
In real mode combined with the 16-bit address mode (the default at startup) we get a default segment size of 64 KB, and every address is calculated from the segment part in a segment register together with an offset, as segment * 16 + offset (for example, 1234h:0010h yields the physical address 12350h). In protected mode we instead have to use global and/or local descriptor tables to specify the size of each segment that we want to use.
Finally, the architecture of the underlying operating system dictates the target for which we have to assemble our code and which software interrupts are available to us.
Dirk

Related

Why does basic assembly code fail to build?

I have a big problem. I just started with assembly, and I think I have at least understood the basics of MOV and the system calls, but I cannot really understand why my code won't build and run; it is the basic 'Hello World' program.
My code looks like this
global_start
_start:
mov eax,4
mov ebx,1
mov ecx,msg
mov edx,len
int 0x80
mov eax,1
int 0x80
segment .data
msg db 'Ide Gas na max', 0xa
len equ $ - msg
I tried to set up several different environments: the MASM32 SDK, Visual Studio, and Visual Studio Code with the MASM/TASM extension, which opens DOSBox; sadly it crashes instantly and the debug option gives an error for every line. (I did learn that TASM is more for 16-bit applications, so I changed to MASM only in the preferences.)
main.ASM(1): error A2008: syntax error : segment
main.ASM(2): error A2008: syntax error : global_start
main.ASM(4): error A2034: ...values for structure
main.ASM(16): error A2088: ...ring
I did not include the full error output, because only those four variations repeat for each line; I can post it if you want to see it, but I do not want to make this too long. So then I thought maybe I had just used the wrong set of instructions, so I found random hello world codes online and copy-pasted them, and absolutely none worked; there was always an error 🤔 Then I changed the IDE to the MASM32 editor, which only ever gives "Assembly Error", and Visual Studio prints a different message every time which I cannot make sense of; I deleted it, I just do not really like it. And yes, I did set up MASM for the VS project as well; I set it up following more tutorials, and I also have a book I followed.
So please, can someone explain to me what to do and what to try? I am clueless. In the code I also tried Section instead of Segment, and just changing the order or the dots; still nothing.
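For what it's worth, the listing is NASM syntax targeting 32-bit Linux (int 0x80 is the Linux system-call gate), so MASM, TASM, and DOSBox can neither assemble nor run it. A sketch of the corrected program, assuming a 32-bit Linux build with nasm and ld:
global _start          ; note the space: 'global_start' is read as one unknown token
section .text
_start:
mov eax, 4             ; sys_write
mov ebx, 1             ; fd 1 = stdout
mov ecx, msg           ; buffer address
mov edx, len           ; buffer length
int 0x80
mov eax, 1             ; sys_exit
mov ebx, 0             ; exit status
int 0x80
section .data
msg db 'Ide Gas na max', 0xa
len equ $ - msg
Assembled and linked with, for example: nasm -f elf32 main.asm followed by ld -m elf_i386 -o main main.o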

What does write_cr0(read_cr0() | 0x10000) do?

I searched the web a lot but didn't find a short explanation of what write_cr0(read_cr0() | 0x10000) really does. It is related to the Linux kernel, and I am curious about developing LKMs. I want to know what this line really does and what the security issues around it are.
It is used in connection with removing the write protection on the syscall table.
But how does it really work, and what does each part of this line do?
CR0 is one of the control registers available on x86 CPUs, which contains flags controlling CPU features related to memory protection, multitasking, paging, etc. You can find a full description in Volume 3, Section 2.5 of Intel's Software Developer's Manual.
These registers are accessed by special instructions that the compiler doesn't normally generate, so read_cr0() is a function which executes the instruction to read this register (via inline assembly) and returns the result in a general-purpose register. Likewise, write_cr0() writes to this register.
The function calls are likely to be inlined, so that the generated code would be something like
mov eax, cr0
or eax, 0x10000
mov cr0, eax
The OR with 0x10000 sets bit 16, the Write Protect bit. On early 32-bit x86 CPUs, code running at supervisor level (like the kernel) was always allowed to write all of virtual memory, regardless of whether the page was marked read-only. This bit makes that optional, so that when it is set, such accesses will cause page faults. This line of code probably follows an earlier line which temporarily cleared the bit.
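For context, the usual kernel-module pattern (a sketch; the surrounding patch code is hypothetical) is to clear the bit, perform the write, then set it again, which in the same style compiles to something like:
mov eax, cr0
and eax, 0xFFFEFFFF   ; clear bit 16 (WP): supervisor writes now ignore read-only pages
mov cr0, eax
; ... patch the read-only data here, e.g. a syscall-table entry ...
mov eax, cr0
or eax, 0x10000       ; set bit 16 (WP) again: restore write protection
mov cr0, eax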

What does the 66 in "66:PUSH 08" stand for?

The test platform is 32-bit Windows.
I use IDA Pro to disassemble a PE file, do some very tedious transformation work, and re-assemble it into a new PE file.
But there are some differences between the re-assembled PE file and the original one when I use OllyDbg to debug the new PE file (although there is no difference in this part of the assembly file I transformed).
Here is part of the original one (disassembly screenshot omitted), where the
PUSH 8
PUSH 0
sequence is correct.
Here is part of my new PE file (screenshot omitted). Now the
PUSH 8
PUSH 0
has changed to
66:6A 08
66:6A 00
and this leads to the failure of the new PE's execution. Basically, from what I have seen, it leads to an unbalanced stack.
So does anyone know what is wrong with this part? I don't see any difference in the assembly code I transformed...
Could anyone give me some help? Thank you!
66h is the operand-size override prefix. In 32-bit code, it switches the operand size to 16-bit from the default 32-bit. So what happens here is that the PUSH instruction pushes a 16-bit value on the stack instead of a 32-bit one, and ESP is decremented by 2 instead of 4. That's why you get an unbalanced stack after the call.
You should check your assembler's documentation to see how you can force 32-bit operand size for the PUSH imm instructions. Different assemblers use different conventions for that. For example, in NASM you'd probably use something like push dword 8.
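The two encodings side by side (32-bit mode, NASM size keywords shown for illustration):
6A 08       push 8          ; default 32-bit operand size, ESP decremented by 4
66 6A 08    push word 8     ; with the 66h prefix, 16-bit push, ESP decremented by 2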
It is a "prefix" opcode byte: See http://wiki.osdev.org/X86-64_Instruction_Encoding#Legacy_Prefixes
0x66 means "operand size override". Your code is apparently operating in 32-bit mode; PUSH without the prefix will push a 32-bit value. With the prefix, the PUSH instead fetches and pushes a 16-bit value, adjusting ESP by 2 rather than 4. (I write a lot of assembly code, and have never had a need to do that.)

Using Assembly On Mac

I'm using a MacBook Pro with an Intel Core 2 Duo processor at 2.53 GHz, but I was told Mac users must follow AT&T syntax (which adds to my confusion, since I am running Intel) and x86 (I'm not sure what this means exactly).
So I need to get into assembly but am finding it very hard to even begin. Searches online show assembly code that varies greatly in syntax and I can't find any resources that explain basic assembly how-tos. I keep reading about registers and a stack but don't understand how to look at this. Can anyone explain/point me in the right direction? Take, for example, this code which is the only code I found to work:
.data
_mystring: .ascii "Hello World\n\0" # C expects strings to terminate with a 0.
.text
.globl _foo
_foo:
push %ebp # save the caller's frame pointer
mov %esp,%ebp # set up this function's stack frame
pushl $_mystring # push the string's address as the argument
call _myprint # call the C helper below
add $4,%esp # remove the argument from the stack
pop %ebp # restore the caller's frame pointer
ret
Very simple, but what is it saying? I am having a hard time understanding how this code does what it does. I know Java, PHP, and C, among other languages, but the steps and syntax here aren't clear to me. Here's the main file to go with it:
#include <stdio.h>
void foo();
void myprint(char *s)
{printf("%s", s);}
main()
{foo();}
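(If you want to run it: the two files are linked together. Assuming the assembly is saved as foo.s beside main.c, and a 32-bit-capable gcc of that era, something like gcc -m32 main.c foo.s -o hello builds them into one executable.)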
Also, there's this which just multiplies numbers:
.data
.globl _cntr
_cntr: .long 0
.globl _prod
_prod: .long 0
.globl _x
_x: .long 0
.globl _y
_y: .long 0
.globl _mask
_mask: .long 1
.globl _multiply
_multiply: # the label needs the leading underscore to match .globl _multiply and the C caller
push %ebp
mov %esp,%ebp # prologue (AT&T order: source first, then destination)
mov $0,%eax # eax accumulates the product
mov _x,%ebx # ebx holds the multiplier
mov _y,%edx # edx holds the multiplicand
LOOP:
cmp $0,%ebx
je DONE # done when no multiplier bits remain
mov %ebx,%ecx
and $1,%ecx # isolate the low bit of the multiplier
cmp $1,%ecx
jne LOOPC
add %edx,%eax # low bit set: add the shifted multiplicand
LOOPC:
shr $1,%ebx # advance to the next multiplier bit
shl $1,%edx # double the multiplicand
jmp LOOP
DONE:
pop %ebp
ret
and the main.c to go with it:
#include <stdio.h>
extern int multiply();
extern int x, y;
int main()
{
x = 34;
y = 47;
printf("%d * %d = %d\n", x, y, multiply());
}
And finally three small questions:
What is the difference between .s and .h file names (I have both a main.c and main.h, which one is for what)?
And why does assembly need a main.c to go with it/how does it call it?
Can anyone recommend a good assembly IDE, like Eclipse is for Java or PHP?
Thanks to whoever answers (this is actually my first post on this site); I've been trying to figure this out for a few days, and every resource I have read just doesn't explain the assembly logic to me. It says what .data or .text does, but only someone who already knows how to "think assembly" would understand what they mean.
Also, if anyone is around New York City and feels very comfortable with Assembly and C I would love some private lessons. I feel there is a lot of potential with this language and would love to learn it.
Assembly language is a category of programming languages which are closely tied to CPU architectures. Traditionally, there is a one-to-one correspondence between each assembly instruction and the resulting CPU instruction.
There are also assembly pseudo-instructions which do not correspond to CPU instructions, but instead affect the assembler or the generated code. .data and .text are pseudo-instructions.
Historically, each CPU manufacturer implemented an assembly language as defined by their assembler, a source code translation utility. There have been thousands of specific assembly languages defined.
In modern times, it has been recognized that each assembly language shares a lot of common features, particularly with respect to pseudo-instructions. The GNU compiler collection (GCC) supports essentially every CPU architecture, so it has evolved generic assembly features.
x86 refers to the Intel 8086 family (8088, 8086, 8087, 80186, 80286, 80386, 80486, 80586 aka Pentium, 80686 aka Pentium II, etc.)
AT&T syntax is a notation style used by many assembly language architectures. A major feature is that instruction operands are written in the order from, to as was common historically. Intel syntax uses to, from operands. There are other differences as well.
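For instance, the same two instructions in each notation:
movl $1, %eax     # AT&T: source, destination — Intel equivalent: mov eax, 1
addl %ebx, %eax   # AT&T: source, destination — Intel equivalent: add eax, ebx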
As for your many questions, here are some resources which will 1) overwhelm you, and 2) eventually provide all your answers:
assembly language overview
tutorials and resources
x86 instruction summary
comprehensive x86 architecture reference
Ordinarily, an introductory assembly language programming class is a full semester with plenty of hands-on work. It assumes you are familiar with the basics of computer architecture. It is reasonable to expect that understanding the above material will take 300-500 hours. Good luck!

Using assembly JMP function on x86_64

I'm really new to programming (in general; it's pathetic), and some Python-related assembly has cropped up in this app that I'm hacking to run on 64-bit.
Essentially, the code goes like this:
#define FUNCTION(name) \
.globl _##name; \
_##name: \
jmp *(_p_##name)
.text
FUNCTION(name)
The FUNCTION(name) syntax is used about 50 times to define stubs for an external Python library, as far as I can tell (I'm not going to pretend that I fully understand it; I'm just bugfixing).
Since I'm compiling for x86_64, the following error is spit out by GCC for each FUNCTION(name) instance:
32-bit absolute addressing is not supported for x86-64
cannot do signed 4 byte relocation
How would I go about "fixing" this to run on x86_64?
Grab a copy of the Intel Architecture Software Developer's Manuals. As you're seeing, some forms of the jmp instruction are invalid in 64-bit mode. In particular, the two "Jump far, absolute, address given in operand" forms won't work. You will need to change to a relative addressing or absolute indirect addressing form of the instruction. Volume 2A, page 3-549 in my copy, of the manual has a huge pile of information about jmp.
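One minimal way to do that here (a sketch, assuming each _p_name variable holds a full 64-bit pointer to the target function) is to make the indirect jump RIP-relative, which is legal in 64-bit mode:
#define FUNCTION(name) \
.globl _##name; \
_##name: \
jmp *_p_##name(%rip) /* RIP-relative indirect jump: no 32-bit absolute address required */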
