MOV is probably the first instruction everyone learns while learning ASM.
Just now I came across the book Assembly Language Programming in GNU/Linux for IA32 Architectures by Rajat Moona, which writes the instruction with the source operand first.
But I learnt that it is MOV dest, src. It's like "load dest with src". Even Wikipedia says the same.
I'm not saying that the author is wrong. I know that he is right. But what am I missing here?
By the way, he is using the GNU assembler (as) to assemble these instructions. But that shouldn't change the instruction syntax, right?
mov dest, src is called Intel syntax. (e.g. mov eax, 123)
mov src, dest is called AT&T syntax. (e.g. mov $123, %eax)
Unix assemblers, including the GNU assembler, use AT&T syntax by default; all other x86 assemblers I know of use Intel syntax. You can read up on the differences on Wikipedia.
Yes, as/gas use AT&T syntax, which uses the order "src, dest". MASM, TASM, NASM, etc. all use the order "dest, src". As it happens, AT&T syntax doesn't fit very well with Intel processors, and (at least IMO) is a nearly unreadable mess. E.g. movzx comes out particularly badly.
There are two distinct types of assembly language syntax - Intel and AT&T syntax.
You can find a comparison of both on Wikipedia's assembly language page.
Chances are your book uses the AT&T syntax, where the source operand comes before the destination.
As already mentioned in the answer by Jerry Coffin, the Intel syntax fits better with the encoding of instructions for the x86 architecture. As a comment in my debugger's disassembler states, "the operands appear in the instruction in the same order as they appear in the disassembly output". For example, consider this instruction:
-a
1772:0100 test word [AA55], 1234
1772:0106
-u 100 l 1
1772:0100 F70655AA3412 test word [AA55], 1234
-
As you can read in the opcode hexdump, the instruction opcode 0F7h is first, then the ModR/M byte 06h, then the little-endian offset word 0AA55h, and then finally the immediate word 1234h. The Intel syntax matches that order in the assembly source. In the AT&T syntax this would look like testw $0x1234, (0xAA55) which swaps the order compared to the encoding.
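To make the byte order concrete, here is a small Python sketch that splits the hexdump from the listing above into its fields. It assumes 16-bit addressing (as in that debugger session) and handles only this one encoding; the variable names are my own and not part of any disassembler API.

```python
import struct

# Machine-code bytes copied from the debugger listing (16-bit code assumed).
code = bytes.fromhex("F70655AA3412")

opcode = code[0]              # 0xF7
modrm = code[1]               # 0x06
mod = modrm >> 6              # 0
reg = (modrm >> 3) & 0b111    # 0: the /0 extension, which selects TEST for opcode F7
rm = modrm & 0b111            # 6: with mod == 0 in 16-bit mode, a direct 16-bit offset

# The offset word comes first, then the immediate word, both little-endian.
offset, imm = struct.unpack_from("<HH", code, 2)

print(f"opcode={opcode:02X} modrm={modrm:02X} offset={offset:04X} imm={imm:04X}")
```

This prints opcode=F7 modrm=06 offset=AA55 imm=1234, i.e. the memory operand is encoded before the immediate, the same order in which the Intel syntax writes them.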
Another example that obeys the Intel syntax order is comparison conditions. For example, consider this sequence:
cmp ax, 26
jae .label
This will jump to .label if ax is above-or-equal-to 26 (in unsigned comparison). The mnemonic reads naturally only with the cmp dest, src operand order, which sets flags as for dest -= src (without storing the result).
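The jae condition can be modeled in a few lines of Python. This is only a sketch of the flag logic, assuming 16-bit operands; the function names are made up for illustration.

```python
def cmp_carry(dest: int, src: int, bits: int = 16) -> bool:
    """Carry flag after `cmp dest, src`: set when the unsigned subtraction borrows."""
    mask = (1 << bits) - 1
    return (dest & mask) < (src & mask)


def jae_taken(dest: int, src: int, bits: int = 16) -> bool:
    """jae (jump if above or equal) is taken when the carry flag is clear."""
    return not cmp_carry(dest, src, bits)


for ax in (25, 26, 0xFFFF):
    print(hex(ax), jae_taken(ax, 26))
```

Note that 0xFFFF would be -1 as a signed 16-bit value, yet jae is still taken because the comparison is unsigned. Read in the dest, src order, "jump if ax is above or equal to 26" matches the left-to-right order of the cmp operands.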
Is it possible to write inline assembly (Intel syntax) with GCC or Clang, without needing to understand the clobber list "stuff"?
I'm going to guess "no" because the clobber list "stuff" ensures you don't over-write the register the compiler wrote to (immediately before your inline assembly begins)?
GNU C Basic inline asm statements (no operand/clobber lists) are not recommended for basically anything except maybe the body of an __attribute__((naked)) function. See "Why can't local variable be used in GNU C basic inline asm statements?" (globals can't safely be used either).
https://gcc.gnu.org/wiki/DontUseInlineAsm says to see ConvertBasicAsmToExtended for reasons not to use Basic asm statements. You can't really do anything safely in Basic asm; even asm("cli"); can get reordered with any memory accesses that aren't volatile.
If you're going to use inline asm at all (instead of writing a stand-alone function in asm, or C with intrinsics), you need to describe your string of asm instructions in exact detail to the compiler, in terms of a black box with input and/or output operands, and/or clobbers. See https://stackoverflow.com/tags/inline-assembly/info for links to guides, including some SO answers about using input / output constraints.
Think hard before deciding it's really worth using GNU C inline asm for anything. If you can get the compiler to emit the same instructions another way, that's almost always better. Intrinsics or pure C allow constant-propagation optimization; inline asm doesn't (unless you do stuff like if(__builtin_constant_p(x)) { pure C version } else { inline asm version }).
Intel syntax: in GCC, compile with -masm=intel so your asm template will be part of an Intel-syntax .s, and the compiler will substitute in operands in Intel syntax. (Like dword ptr [rsp] instead of (%rsp) for "m"(my_int)).
In clang I'm not sure there's any convenient way to use Intel-syntax in normal asm statements.
There is one other option though, if you don't care about efficient code (but then why are you using asm?): clang supports -fasm-blocks to allow syntax like MSVC's inefficient style of inline asm. And yes, this uses Intel syntax.
The question "Is there any way to compile Microsoft-style inline-assembly code on a Linux platform?" shows how inefficient the resulting code is: it is full of compiler-generated instructions to store input variables to memory for the asm{} block to read, because MSVC-style asm blocks can't take inputs or produce outputs in registers. (Clang doesn't support the leave-a-value-in-EAX method for getting a single value out, so the output has to be stored/reloaded as well.)
You don't get to specify clobbers for this, so I assume an asm block implies a "memory" clobber, along with clobbers on all registers you write. (Or maybe even just mention.)
I would not recommend this; it's basically not possible to wrap a single instruction or handful of instructions efficiently this way. Only if you're writing a whole loop can you amortize the overhead of getting inputs into an asm{} block.
I looked a lot online for an answer to this question and found nothing.
My question is: how is it possible to access a specific address in assembler?
I'm asking for the actual syntax in Turbo Assembler. I have seen things like
mov ax, [value]
and
mov es:[bx], value
and I'm very confused.
What do the [value] and :[value] syntaxes even mean in Turbo Assembler, and how would I access a specific address like B4h, for example?
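The [Var] syntax in Turbo Assembler is simply a way to refer to the value stored at the address that Var designates. If Var holds 3Ah, for example, this syntax refers to the contents of memory address 3Ah. A prefix such as es: selects the segment register used to form the address, so es:[bx] means the memory at offset bx within the segment held in es.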
On Matt Godbolt's Compiler Explorer website, you can compile code using various pre-installed compilers. When using PowerPC gcc 4.8 the registers cannot be distinguished from immediates (for example addi 11,31,16).
However, when the -mregnames option is used, all registers are marked with %r followed by the register index. How do I omit just the % sign to get r1 instead of %r1?
For example, void nop () {} with gcc4.8 PowerPC -O0 -mregnames:
nop():
stwu %r1,-16(%r1)
stw %r31,12(%r1)
mr %r31,%r1
addi %r11,%r31,16
lwz %r31,-4(%r11)
mr %r1,%r11
blr
When targeting PowerPC, you basically have two options for the syntax of assembly listings:
You can either use the IBM syntax (common on IBM assemblers), where the registers do not use any type of special prefix: they are just referred to with numbers. Yes, this makes it difficult to distinguish them from immediates.
Or, you can use GNU/AT&T syntax, which always prefixes registers with a % symbol (and an r, in this case). This not only makes it easier to distinguish between registers and immediates, but it also makes it possible to distinguish between integer registers (%r?) and floating-point registers (%f?).
There is no intermediate option where you get the r (or f) prefix but no leading %. If you need this, you can do as Jester suggested and post-process the output, using the regular expression %r[0-9]+ for matching.
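A minimal sketch of that post-processing step in Python, using the regular expression mentioned above with a capture group added so the r and the register number are kept. The function name strip_percent is hypothetical, and the sketch only touches integer registers; %f registers are left as they are.

```python
import re

# Match %r followed by a register number; the group keeps everything but the %.
_REG = re.compile(r"%(r[0-9]+)")


def strip_percent(asm_text: str) -> str:
    """Rewrite %rN as rN in a PowerPC assembly listing."""
    return _REG.sub(r"\1", asm_text)


listing = """stwu %r1,-16(%r1)
mr %r31,%r1
blr"""
print(strip_percent(listing))
```

The same substitution could be done with any tool that supports regular-expression replacement; extending the pattern to %(r|f)[0-9]+ style matching would cover floating-point registers as well.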
An update:
powerpc-linux-gnu-gcc version 5.4.0 (the default package with Ubuntu 16.04)
When using -mregnames, you can use "%r0" or "r0" or "0" format for a register name in assembly source code files.
For disassembling, powerpc-linux-gnu-objdump defaults to the "r0" format (which I agree is easier to read).
In the example from that webpage, it looks like it is showing the listing output from the compiler, instead of using objdump. I do not know of a way to control the listing output format.
In some kernel-mode assembly source I have a line that looks like this:
; excerpt #1
.set __framesize, ROUND_TO_STACK(localvarsize)
(localvarsize is a parameter to a C-preprocessor macro, if you’re wondering.) I assume that __framesize is a compile-time variable that is usable in .if statements, and is then discarded. However, I find references to a symbol named __framesize in the symbol table and disassembly of my kernel. The symbol is defined (as output by nm -m) as such:
; excerpt #2
0000000000000000 (absolute) non-external __framesize
The usage of __framesize in compiler-generated assembly is as such:
; excerpt #3
movq %gs:__framesize, %rax
movq 0x140(%rax), %r15
Given what I understand of my compiler and my kernel, excerpt #3 should be emitted as movq %gs:0x140, %r15, and that code should work. (The code that is actually being emitted from the C as excerpt #3 is causing a triple fault on the second line.)
I have two questions:
Should this __framesize symbol be emitted into my binary by the assembler when used in this fashion? If possible, how can I suppress it?
Would this usage of __framesize cause a problem like what is discussed above?
I am using GAS assembler syntax and the Xcode 7.1.1 assembler, and a Mach-O output format, if it is useful.
The GNU as manual says that .set modifies the value (i.e. address) and/or type of an existing symbol. It's synonymous with .equ, so it can be used to set or modify an assembler macro variable, or to mess around with symbols which are also labels.
If __framesize is showing up in the object file, then it's probably declared somewhere else.
Try looking at the disassembly output, to see what really happened.
.text
.globl main
main:
xorl %eax,%eax # return 0
ret
Given a tiny program like the one above:
1. Is it true that indentation is just personal preference?
2. The whole of the assembly seems to consist of various .tags and func: lines. Is there any other part that cannot be included in these two categories?
Yes, I do think indentation is just to make it easy for you and other people to read your program.
"func:" lines are labels, which act as a way of helping you reference different parts of your program. They help a lot when doing loops and such.
".tags" such as ".globl" are directives; these are used by the assembler when assembling your code to machine instructions. Everything else, such as xorl and ret in your example, is an instruction.