Offset before square bracket in x86 intel asm on GCC - gcc

From all the docs I've found, there is no mention of syntax like offset[var+offset2] in Intel x86 syntax but GCC with the following flags
gcc -S hello.c -o - -masm=intel
for this program
#include <stdio.h>
int main(){
    char c = 'h';
    putchar(c);
    return 0;
}
produces
.file "hello.c"
.intel_syntax noprefix
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
sub rsp, 16
mov BYTE PTR -1[rbp], 104
movsx eax, BYTE PTR -1[rbp]
mov edi, eax
call putchar@PLT
mov eax, 0
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Arch Linux 9.3.0-1) 9.3.0"
.section .note.GNU-stack,"",@progbits
I'd like to highlight the line mov BYTE PTR -1[rbp], 104, where the offset -1 appears outside the square brackets. TBH, I'm just guessing that it is an offset; can anyone direct me to proper documentation covering this?
Here is a similar question: Squared Brackets in x86 asm from IDA where a comment does mention that it is an offset but I'd really like a proper documentation reference.

Yes, it's just another way of writing [rbp - 1], and the -1 is a displacement in technical x86 addressing mode terminology (see footnote 1).
The GAS manual's section on x86 addressing modes only mentions the [ebp - 4] possibility, not -4[ebp], but GAS does assemble it.
And disassembly in AT&T or Intel syntax confirms what it meant. x86 addressing modes are constrained by what the machine can encode (Referencing the contents of a memory location. (x86 addressing modes)), so there isn't a lot of wiggle room on what some syntax might mean. (This syntax was emitted by GCC so we can safely assume that it's valid. And that it means the same thing as the -1(%rbp) it emits in AT&T syntax mode.)
Footnote 1: The whole rbp-1 effective address is the offset part of a seg:off address. The segment base is fixed at 0 in 64-bit mode, except for FS and GS, and even in 32-bit mode mainstream OSes use a flat memory model, so you can ignore the segment base. I point this out only because "offset" in x86 terminology does have a specific technical meaning separate from "displacement", in case you care about using terminology that matches Intel's manuals.
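Whichever spelling is used, -1[rbp] or [rbp - 1], the operand names the same base-plus-displacement address. A minimal C sketch of that arithmetic follows; the function name effective_address is made up for illustration, and the model ignores the segment base, as the footnote says you usually can:

```c
#include <stdint.h>

/* Hypothetical helper: models base + index*scale + displacement.
   Not part of any assembler or library. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp)
{
    /* The displacement is sign-extended to the address width before the add,
       so a disp of -1 moves one byte below the base. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

With base standing in for rbp, no index, and disp = -1, the result is one byte below the base, which is what both spellings describe.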
For some reason GCC's choice of syntax depends on -fno-pie or not. https://godbolt.org/z/iK9jh6 (On modern GNU/Linux distros like your Arch system, -fpie is enabled by default. On Godbolt it isn't).
This choice continues with optimization enabled, if you use volatile to force the stack variable to be written, or do other stuff with pointers: e.g. https://godbolt.org/z/4P92Fk. It applies to arbitrary dereferences like ptr[1 + x] from function args.
GCC -fno-pie chooses [rbp - 1] and [rdi+4+rsi*4]
GCC -fpie chooses -1[rbp] and 4[rdi+rsi*4]
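To try this yourself, a small function that dereferences ptr[1 + x] from its arguments is enough. This is only a sketch (the name load_next is an assumption, not taken from the Godbolt links); compile it with something like gcc -O2 -S -masm=intel -o - and toggle -fpie / -fno-pie to compare the two spellings reported above:

```c
/* Loads the element at index 1 + x.
   With 4-byte int this is the address ptr + 4 + x*4. */
int load_next(const int *ptr, long x)
{
    return ptr[1 + x];
}
```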
IDK why GCC's internals choose differently based on PIE mode. No obvious reason; perhaps for some reason they just use different code paths in GCC's internals, or different format strings and they just happen to make different choices.
Both with and without PIE, a global (static storage) is referenced as glob[rip], not [RIP + glob] which is also supported. In both cases that means glob with respect to RIP, not actually RIP + absolute address of the symbol. But that's an exception to the rule that applies for any other register, or for no register.
GAS .intel_syntax is MASM-like, and MASM certainly does support symbol[register] and I think even 1234[register]. It's more normal for the displacement.

Related

(ASM) Error when moving a byte to a pointer [duplicate]

I'm learning about x86 inline assembly programming.
I wanted to write mov ecx, FFFFFFBB, however the compiler isn’t recognizing it. How should hex numbers like that be written in inline assembler code?
It depends on the flavour of your assembler.
AT&T: movl $0xFFFFFFBB, %ecx
Intel: mov ecx, 0FFFFFFBBh
FYI, AT&T syntax is used by assemblers such as the GNU Assembler, whereas NASM and most others use Intel syntax.
See the x86 tag wiki for links to assembler manuals, and lots of other stuff.
Different x86 assemblers support one or both of these syntaxes for hex constants:
0xDEADBEEF: NASM (and compat), GNU as, FASM, MSVC inline asm (but not MASM), emu8086.
0DEADBEEFh: NASM (and compat), FASM, MASM, TASM, emu8086.
DOS/Windows-only assemblers often only support the ...h syntax.
Portable assemblers typically support the 0x... syntax, or both.
Note the leading 0:
Numeric constants always have to start with a decimal digit to distinguish them from symbol names. (How do I write letter-initiated hexadecimal numbers in masm code? is specifically about that, for trailing-h style.)
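The leading-digit rule can be sketched in C. The parser below is hypothetical (parse_asm_hex is not any assembler's real code); it accepts both the 0x... and the ...h spellings and rejects text that starts with a letter, since an assembler would read that as a symbol name:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: parses "0x..." or "...h" into a 32-bit value. */
bool parse_asm_hex(const char *s, uint32_t *out)
{
    size_t len = strlen(s);
    if (len < 2 || !isdigit((unsigned char)s[0]))
        return false;               /* "FFh" would be taken as a symbol name */

    const char *p = s;
    const char *end = s + len;
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        p += 2;                     /* C-style prefix */
    else if (end[-1] == 'h' || end[-1] == 'H')
        end -= 1;                   /* trailing-h suffix */
    else
        return false;

    if (p == end)
        return false;               /* no digits at all, e.g. "0x" */

    uint64_t value = 0;
    for (; p < end; ++p) {
        unsigned char c = (unsigned char)*p;
        if (!isxdigit(c))
            return false;
        int digit = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        value = value * 16 + (uint64_t)digit;
        if (value > UINT32_MAX)
            return false;           /* does not fit in a 32-bit immediate */
    }
    *out = (uint32_t)value;
    return true;
}
```

Both "0FFFFFFBBh" and "0xFFFFFFBB" parse to the same value here, while "FFFFFFBBh" is rejected.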
Also note that assemblers, like C compilers, can evaluate expressions at assemble time, so you can write foo & 0xF (if foo is an assembler constant, defined with foo equ 0xABC or something). You can even add/subtract from labels (which are link-time constants, not assemble-time), so stuff like mov eax, OFFSET label - 20 still assembles to a mov r32, imm32 mov-immediate instruction, just with a different 32-bit immediate.
From the NASM manual's section on constants:
Some examples (all producing exactly the same code):
mov ax,200 ; decimal
mov ax,0200 ; still decimal
mov ax,0200d ; explicitly decimal
mov ax,0d200 ; also decimal
mov ax,0c8h ; hex
mov ax,$0c8 ; hex again: the 0 is required
mov ax,0xc8 ; hex yet again
mov ax,0hc8 ; still hex
mov ax,310q ; octal
mov ax,310o ; octal again
mov ax,0o310 ; octal yet again
mov ax,0q310 ; octal yet again
mov ax,11001000b ; binary
mov ax,1100_1000b ; same binary constant
mov ax,1100_1000y ; same binary constant once more
mov ax,0b1100_1000 ; same binary constant yet again
mov ax,0y1100_1000 ; same binary constant yet again
Most assemblers also allow character literals, like '0' for ASCII zero. Or even '0123' for four ASCII digits packed into a 32bit integer. Some support escape sequences ('\n'), some (like YASM) don't. NASM only supports escape-sequences inside backquotes, not double quotes.
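As a rough illustration of the packing, here is a C sketch that puts the first character in the lowest byte, which is the order the NASM manual describes for multi-character constants. The helper name pack4 is made up, and other assemblers may order the bytes differently:

```c
#include <stdint.h>

/* Hypothetical helper: packs four characters into a 32-bit integer,
   first character in the least significant byte. */
uint32_t pack4(const char s[4])
{
    return (uint32_t)(unsigned char)s[0]
         | (uint32_t)(unsigned char)s[1] << 8
         | (uint32_t)(unsigned char)s[2] << 16
         | (uint32_t)(unsigned char)s[3] << 24;
}
```

Stored to little-endian memory, a value packed this way reads back as the original characters in order.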
Other platforms:
ARM assembler: 0xDEADBEEF works.
I think 0x... is typical. the 0...h is mostly a DOS thing.
It depends on your assembler, but a common notation for hex literals is 0FFFFFFBBh.
Hex numbers are generally always represented with a leading 0x, so you'd use 0xFFFFFFBB.

What does gcc -fno-trapping-math do?

I cannot find any example where the -fno-trapping-math option has an effect.
I would expect -ftrapping-math to disable optimizations that may affect whether traps are generated or not. For example the calculation of an intermediate value with extended precision using x87 instructions or FMA instructions may prevent an overflow exception from occurring. The -ftrapping-math option does not prevent this.
Common subexpression elimination may result in one exception occurring rather than two, for example the optimization 1./x + 1./x = 2./x will generate one trap rather than two when x=0. The -ftrapping-math option does not prevent this.
Please give some examples of optimizations that are prevented by -ftrapping-math (that is, enabled by -fno-trapping-math).
Can you recommend any documents that explain the different floating point optimization options better than the gcc manual, perhaps with specific examples of code that is optimized by each option? Possibly for other compilers.
A simple example is as follows:
float foo()
{
    float a = 0;
    float nan = a/a;
    return nan;
}
Compiled with GCC 7.3 for x64, at -O3:
foo():
pxor xmm0, xmm0
divss xmm0, xmm0
ret
...which is pretty self-explanatory. Note that it's actually doing the div (despite knowing that 0/0 is nan), which is not especially cheap! It has to do that, because your code might be trying to deliberately raise a floating point trap.
With -O3 -fno-signaling-nans -fno-trapping-math:
foo():
movss xmm0, DWORD PTR .LC0[rip]
ret
.LC0:
.long 2143289344
That is, "just load in a NaN and return it". Which is identical behavior, as long as you're not relying on there being a trap.

Understanding 8086 assembler debugger

I'm learning assembler and I need some help with understanding codes in the debugger, especially the marked part.
mov ax, a
mov bx, 4
I know how above instructions works, but in the debugger I have "2EA10301" and "BB0400".
What do they mean?
The first instruction moves variable a from data segment to the ax register, but in debugger I have cs:[0103].
What do mean these brackets and these numbers?
Thanks for any help.
The 2EA10301 and BB0400 numbers are the opcodes for the two instructions highlighted.
2E is Code Segment (CS) prefix and instructs the CPU to access memory with the CS segment instead of the default DS one.
A1 is the opcode for MOV AX, moffs16 and 0301 is the immediate 0103h in little endian, the address to read from.
So 2EA10301 is mov ax, cs:[103h].
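The byte swap is just little-endian decoding. A short C sketch (read_le16 is a made-up helper name) that recovers 0103h from the bytes 03 01 following the A1 opcode:

```c
#include <stdint.h>

/* Hypothetical helper: reads a 16-bit little-endian value,
   low byte first, from a byte buffer. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Applied to the last two bytes of 2E A1 03 01, this yields 0103h.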
The square brackets are the preferred way to denote a memory access through one of the addressing modes, but some assemblers also support the confusing syntax without the brackets.
As this syntax is ambiguous and less standardised across different assemblers than the other, it is discouraged.
During the assembling the assembler keeps a location counter incremented for each byte emitted (each "section"/segment has its own counter, i.e. the counter is reset at the beginning of each "section").
This gives each variable an offset that is used to access it and to craft the instruction, variables names are for the human, CPUs can only read from addresses, numbers.
This offset will later be an address in memory once the file is loaded.
The assembler, the linker and the loader cooperate, there are various tricks at play, to make sure the final instruction is properly formed in memory and that the offset is transformed into the right address.
In your example their efforts culminate in the value 103h, that is the address of a in memory.
Again, in your example, the offset, if the file is a COM (by the way, don't put variables in the execution flow), was still 103h due to the peculiar structure of the COM files.
But in general, it could have been another number.
BB is MOV r16, imm16 with the register BX. The base form is B8 with the lower 3 bits indicating the register to use, BX is denoted by a value of 3 (011b in binary) and indeed 0B8h + 3 = 0BBh.
After the opcode, again, the WORD immediate 0400 that encodes 4 in little endian.
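The B8+r encoding described above can be sketched in C. The enum and function names below are illustrative only; the register numbers follow the standard x86 order (AX=0, CX=1, DX=2, BX=3, ...):

```c
#include <stdint.h>

/* 16-bit register numbers as used in the low 3 bits of the opcode. */
enum reg16 {
    REG_AX = 0, REG_CX = 1, REG_DX = 2, REG_BX = 3,
    REG_SP = 4, REG_BP = 5, REG_SI = 6, REG_DI = 7
};

/* Hypothetical helper: encodes MOV r16, imm16 into three bytes. */
void encode_mov_r16_imm16(enum reg16 reg, uint16_t imm, uint8_t out[3])
{
    out[0] = (uint8_t)(0xB8 + reg);   /* base opcode B8 plus register number */
    out[1] = (uint8_t)(imm & 0xFF);   /* immediate, low byte first */
    out[2] = (uint8_t)(imm >> 8);     /* then the high byte */
}
```

Encoding mov bx, 4 this way produces the bytes BB 04 00 shown in the debugger.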
You now are in the position to realise that the assembly source is not always fully informative, as the assemblers implement some form of syntactic sugar.
The instruction mov ax, a has the same syntax as mov bx, 4, so read literally it would mean "move the immediate value given by the address of a, a constant known at assembly time, into ax". Instead it is interpreted as "move the content of a, a value that lives in memory and can only be read with a memory access, into ax", because a is known to be a variable.
This phenomenon is limited in the x86, being CISC, and more widespread in the RISC world, where the lack of commonly needed instructions is compensated with pseudo-instructions.
Well, first, assembler is x86 Assembly. The assembler is what turns the instructions into machine code.
When you disassemble programs, it probably will use the hex values (like 90 is NOP instruction or B8 to move something to AX).
Square brackets denote a memory access: the operand is the value stored at the address given inside the brackets.
The hex on the side is called the address.
Everything is very simple. The instruction mov ax, cs:[0103] loads the value 000Ah into the ax register. This value is read from the code segment at offset 0103h. Slightly higher in the picture you can see it: cs:0101 0B900A00. Accordingly, address 0101h holds the value 0Bh, 0102h holds 90h, 0103h holds 0Ah, and 0104h holds 00h. So AL is loaded from address 0103h (0Ah) and AH from address 0104h (00h), giving ax = 000Ah. If the instruction were mov ax, cs:[0101] instead, then ax = 900Bh; with mov ax, cs:[0102], ax = 0A90h.

Assembly registers [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
My question WAS about getting as much info as I could about registers...No luck :/
Everyone got everything so wrong [Probably because English is not my native language].
So, the question will be more general... ;(
I need a tutorial with the BASICS!
Ah...Could I be more not-specific?
Also, thanks for the help in advance!
In general you can use any of eax, ebx, ecx, edx, esi and edi pretty much as you want. They can each hold any 32-bit value.
Keep in mind that if you call any Win32 API functions that they are free to modify eax, ecx and edx. So if you need to preserve the values of those registers across a function call you'll have to save them somewhere temporarily (e.g. on the stack).
Similarly, if you write a function that is to be called by another function (e.g. a Windows callback) you should preserve ebx, esi,edi and ebp within that function.
Some instructions are hardcoded to use certain registers. For example, the loop instruction uses (e)cx, the string instructions use esi/edi, the div instruction uses eax/edx, etc. You can find all such cases by going through the descriptions for all the instructions in Intel's manual.
The "fixed uses" of the registers derive from the ancient roots back in the 8086 days (and in some ways, even from before that).
The 8086 was an accumulator machine, you were supposed to do math mostly with ax (there was no eax yet), and a bit with dx. You can see this back in many instructions, for example most ALU ops have a smaller form for op ax, imm (also op al, imm) than for op other, imm, and the ancient decimal math instructions (daa and friends) operate only on al. There are instructions that always reference (e)ax and maybe (e)dx as "high half", see the "old multiplication" (with the single explicit operand), imul with an immediate was added in the 80186, imul reg, r/m was added in the 80386 which added a whole lot of stuff including 32bit mode. With 32bit mode also came the modern ModRM/SIB structure, here are the old 16bit version and the modern 32/64bit version. In the old version, there are only 4 registers that could ever be used in a memory operand, so there's a bit of the "fixed roles for registers" again. 32bit mode mostly removed that, except that esp can never be the index register (that wouldn't normally make sense anyway).
More recently, Haswell introduced shlx which removes the restriction that shifting by a variable amount could only be done using cl as the count, and mulx partially removed the fixed roles of registers for "wide multiplication" (80186 and 80386 only added the "general" forms for multiplication without the high half), mulx still gives edx a fixed role though.
More strangely, the relatively recently added pblendvb assigned a fixed role to xmm0, previous to that the vector registers weren't encumbered by such old-fashioned restrictions. That fixed role disappeared with AVX though, which allowed the extra operand to be encoded. pcmpistri and friends still assign a fixed role to ecx though.
With x64 came a change to 8 bit register operands: if a REX prefix is present it is now possible to use spl, bpl, sil and dil, previously unencodable, but at the cost of no longer being able to address ah, ch, dh or bh in that instruction. That's probably a symptom of moving away from special roles too, since previously it wouldn't have made much sense to be able to use bpl, but now that it's "more general purpose" it might have some uses (it's still often used as a base pointer though).
The general pattern is towards fewer restrictions/fixed roles. But much of the history of x86 is still visible today.
As a general comment, before you go much further, I recommend adopting a programming style, or you'll find it very hard to follow your own code. Below is a formatted example of your code, maybe not everything is correctly formatted but it gives you an idea. Once in the habit, it's easier than making higgledy-piggledy code. One of its main advantages, is with practice you can cast your eye down the code and follow it far quicker than if you have to read every line.
.386
.model flat, stdcall
option casemap :none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
.data
ProgramText db "Hello World!", 0
BadText db "Error: Sum is incorrect value", 0
GoodText db "Excellent! Sum is 6", 0
Sum sdword 0
.code
start:
; eax
mov ecx, 6 ; set the counter to 6 ?
xor eax, eax ; set eax to 0
_label:
add eax, ecx ; add the numbers ?
dec ecx ; from 0 to 6 ?
jnz _label ; 21
mov edx, 7 ; 21
mul edx ; multiply by 7 147
push eax ; pushes eax into the stack
pop Sum ; pops eax and places it in Sum
cmp Sum, 147 ; compares Sum to 147
jz _good ; if they are equal, go to _good
_bad:
invoke StdOut, addr BadText
jmp _quit
_good:
invoke StdOut, addr GoodText
_quit:
invoke ExitProcess, 0
end start
I'll single out one line:
push eax ; pushes eax into the stack
Don't use comments to explain what an instruction does: use them to say what you are trying to achieve, or what the register represents, to give added value to the code.
Good luck to you: plenty of practice and midnight oil!
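For reference, the arithmetic in the listing above can be traced in C. This is only a sketch that mirrors the instructions one by one (variable names copy the register names; the function name is made up):

```c
#include <stdint.h>

/* Mirrors the loop in the listing: sums 6 + 5 + ... + 1, then multiplies by 7. */
uint32_t sum_times_seven(void)
{
    uint32_t ecx = 6;       /* mov ecx, 6 */
    uint32_t eax = 0;       /* xor eax, eax */
    do {
        eax += ecx;         /* add eax, ecx */
        ecx -= 1;           /* dec ecx */
    } while (ecx != 0);     /* jnz _label */
    return eax * 7;         /* mov edx, 7 / mul edx (low 32 bits of the product) */
}
```

The loop leaves 21 in eax, and the multiply gives 147, matching the value the listing compares against.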

Subtract and detect underflow, most efficient way? (x86/64 with GCC)

I'm using GCC 4.8.1 to compile C code and I need to detect if underflow occurs in a subtraction on x86/64 architecture. Both are UNSIGNED. I know in assembly is very easy, but I'm wondering if I can do it in C code and have GCC optimize it in a way, cause I can't find it. This is a very used function (or lowlevel, is that the term?) so I need it to be efficient, but GCC seems to be too dumb to recognize this simple operation? I tried so many ways to give it hints in C, but it always uses two registers instead of just a sub and a conditional jump. And to be honest I get annoyed seeing such stupid code written so MANY times (function is called a lot).
My best approach in C seemed to be the following:
if((a-=b)+b < b) {
// underflow here
}
Basically, subtract b from a, and if result underflows detect it and do some conditional processing (which is unrelated to a's value, for example, it brings an error, etc).
GCC seems too dumb to reduce the above to just a sub and a conditional jump, and believe me I tried so many ways to do it in C code, and tried a lot of command line options (-O3 and -Os included of course). What GCC does is something like this (Intel syntax assembly):
mov rax, rcx ; 'a' is in rcx
sub rcx, rdx ; 'b' is in rdx
cmp rax, rdx ; useless comparison since sub already sets flags
jc underflow
Needless to say the above is stupid, when all it needs is this:
sub rcx, rdx
jc underflow
This is so annoying because GCC does understand that sub modifies flags that way, since if I typecast it into a "int" it will generate the exact above except it uses "js" which is jump with sign, instead of carry, which will not work if the unsigned values difference is high enough to have the high bit set. Nevertheless it shows it is aware of the sub instruction affecting those flags.
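The difference between the two conditions can be shown in plain C. This sketch uses 32-bit values for brevity (the question uses 64-bit registers, but the logic is the same), and both function names are made up:

```c
#include <stdbool.h>
#include <stdint.h>

/* What jc tests after sub: a borrow, i.e. a < b as unsigned values. */
bool sub_borrows(uint32_t a, uint32_t b)
{
    return a < b;
}

/* What js tests after sub: the top bit of the wrapped result. */
bool sub_sign_set(uint32_t a, uint32_t b)
{
    return ((a - b) >> 31) != 0;
}
```

For a = 0x90000000 and b = 1 there is no borrow, yet the sign bit of the result is set, so a sign-based jump would report an underflow that did not happen.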
Now, maybe I should give up on trying to make GCC optimize this properly and do it with inline assembly which I have no problems with. Unfortunately, this requires "asm goto" because I need a conditional JUMP, and asm goto is not very efficient with an output because it's volatile.
I tried something but I have no idea if it is "safe" to use or not. asm goto can't have outputs for some reason. I do not want to make it flush all registers to memory, that would kill the entire point I'm doing this which is efficiency. But if I use empty asm statements with outputs set to the 'a' variable before and after it, will that work and is it safe? Here's my macro:
#define subchk(a,b,g) { typeof(a) _a=a; \
asm("":"+rm"(_a)::"cc"); \
asm goto("sub %1,%0;jc %l2"::"r,m,r"(_a),"r,r,m"(b):"cc":g); \
asm("":"+rm"(_a)::"cc"); }
and using it like this:
subchk(a,b,underflow)
// normal code with no underflow
// ...
underflow:
// underflow occurred here
It's a bit ugly but it works just fine. On my test scenario, it compiles just FINE without volatile overhead (flushing registers to memory) without generating anything bad, and it seems it works ok, however this is just a limited test, I can't possibly test this everywhere I use this function/macro as I said it is used A LOT, so I'd like to know if someone is knowledgeable, is there something unsafe about the above construct?
Particularly, the value of 'a' is NOT NEEDED if underflow occurs, so with that in mind are there any side effects or unsafe stuff that can happen with my inline asm macro? If not I'll use it without problems till they optimize the compiler so I can replace it back after I guess.
Please don't turn this into a debate about premature optimizations or what not, stay on topic of the question, I'm fully aware of that, so thank you.
I probably miss something obvious, but why isn't this good?
extern void underflow(void) __attribute__((noreturn));
unsigned foo(unsigned a, unsigned b)
{
    unsigned r = a - b;
    if (r > a)
    {
        underflow();
    }
    return r;
}
I have checked, gcc optimizes it to what you want:
foo:
movl %edi, %eax
subl %esi, %eax
jb .L6
rep
ret
.L6:
pushq %rax
call underflow
Of course you can handle underflow however you want, I have just done this to keep the asm simple.
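Another option, if a newer compiler is available, is the overflow-checking builtin. Note the assumption: __builtin_sub_overflow exists in GCC 5 and later (and in Clang), so it is not available in the GCC 4.8.1 mentioned in the question. The wrapper name sub_underflows is made up for this sketch:

```c
#include <stdbool.h>

/* Stores the wrapped difference in *diff and returns true if a - b
   does not fit in unsigned, i.e. if the subtraction borrowed. */
bool sub_underflows(unsigned a, unsigned b, unsigned *diff)
{
    return __builtin_sub_overflow(a, b, diff);
}
```

Used as if (sub_underflows(a, b, &a)) { ... }, this expresses the same check as r > a without spelling out the comparison; whether it compiles down to a sub followed by a single conditional jump is worth checking in the generated assembly for your compiler version.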
How about the following assembly code (you can wrap it into GCC format):
sub rcx, rdx ; assuming operands are in rcx, rdx
setc al ; capture carry bit into AL (see Intel "setxx" instructions)
; return AL as boolean to compiler
Then you invoke/inline the assembly code, and branch on the resulting boolean.
Have you tested whether this is actually faster? Modern x86-microarchitectures use microcode, turning single assembly instructions into sequences of simpler micro-operations. Some of them also do macro-op fusion, in which a pair of assembly instructions is turned into a single micro-op. In particular, sequences like test %reg, %reg; jcc target are fused, probably because global processor flags are a bane of performance.
If cmp %reg, %reg; jcc target is macro-fused, gcc might use that to get faster code. In my experience, gcc is very good at scheduling and similar low-level optimizations.

Resources