What does .comm mean? - gcc

I just translated this program,
#include <stdio.h>
int dam[1000][1000];
int main (int argc, const char * argv[]) {
// insert code here...
printf("Hello, World!\n");
return 0;
}
to assembly using gcc producing,
.cstring
LC0:
.ascii "Hello, World!\0"
.text
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $20, %esp
call L3
"L00000000001$pb":
L3:
popl %ebx
leal LC0-"L00000000001$pb"(%ebx), %eax
movl %eax, (%esp)
call L_puts$stub
movl $0, %eax
addl $20, %esp
popl %ebx
leave
ret
.comm _dam,1000000,5
.section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
L_puts$stub:
.indirect_symbol _puts
hlt ; hlt ; hlt ; hlt ; hlt
.subsections_via_symbols
What does .comm mean? Does dam use heap space, stack space or data space?

From the as manual:
.comm declares a common symbol named
symbol. When linking, a common symbol
in one object file may be merged with
a defined or common symbol of the same
name in another object file. If ld
does not see a definition for the
symbol--just one or more common
symbols--then it will allocate length
bytes of uninitialized memory. length
must be an absolute expression. If ld
sees multiple common symbols with the
same name, and they do not all have
the same size, it will allocate space
using the largest size.
When using ELF, the .comm directive
takes an optional third argument. This
is the desired alignment of the
symbol, specified as a byte boundary
(for example, an alignment of 16 means
that the least significant 4 bits of
the address should be zero). The
alignment must be an absolute
expression, and it must be a power of
two. If ld allocates uninitialized
memory for the common symbol, it will
use the alignment when placing the
symbol. If no alignment is specified,
as will set the alignment to the
largest power of two less than or
equal to the size of the symbol, up to
a maximum of 16.
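The default-alignment rule in the last sentence of that quote can be written out as a small C function. This is only a sketch of the rule as the manual states it for ELF; default_comm_alignment is a hypothetical helper name, not something the assembler or linker exposes.

```c
#include <stddef.h>

/* Largest power of two less than or equal to size, capped at 16.
   This mirrors the default described in the quoted manual text for a
   .comm directive with no alignment argument. Hypothetical helper. */
size_t default_comm_alignment(size_t size)
{
    size_t align = 1;
    /* Keep doubling while the next power of two still fits in size. */
    while (align < 16 && align * 2 <= size)
        align *= 2;
    return align;
}
```

For example, a 3-byte symbol would get an alignment of 2, and anything of 16 bytes or more would get 16.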

.comm name, size, alignment
The .comm directive allocates storage in the data section. The storage is referenced by the identifier name. Size is measured in bytes and must be a positive integer. Name cannot be predefined. Alignment is optional. If alignment is specified, the address of name is aligned to a multiple of alignment.
Source: https://docs.oracle.com/cd/E26502_01/html/E28388/eoiyg.html
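To connect this back to the C source: a file-scope array with no initializer is a tentative definition, which is what GCC turns into a .comm directive when it is compiling with -fcommon (GCC 10 and later default to -fno-common and place such objects in .bss instead). Either way the object has static storage duration, so it lives in neither the heap nor the stack, and C guarantees it starts out zeroed. A minimal sketch, where grid and grid_is_zeroed are hypothetical names standing in for the question's dam:

```c
#include <stddef.h>

/* Tentative definition at file scope: no initializer, static storage.
   grid is a hypothetical stand-in for the question's dam array. */
int grid[1000][1000];

/* Returns 1 if every element is zero, which C guarantees for objects
   with static storage duration that have no explicit initializer. */
int grid_is_zeroed(void)
{
    for (size_t i = 0; i < 1000; i++)
        for (size_t j = 0; j < 1000; j++)
            if (grid[i][j] != 0)
                return 0;
    return 1;
}
```

Compiling this with gcc -S and looking for .comm or .bss in the output should show which of the two placements your compiler version chose.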

Related

Is movzbl followed by testl faster than testb?

Consider this C code:
int f(void) {
int ret;
char carry;
__asm__(
"nop # do something that sets eax and CF"
: "=a"(ret), "=@ccc"(carry)
);
return carry ? -ret : ret;
}
When I compile it with gcc -O3, I get this:
f:
nop # do something that sets eax and CF
setc %cl
movl %eax, %edx
negl %edx
testb %cl, %cl
cmovne %edx, %eax
ret
If I change char carry to int carry, I instead get this:
f:
nop # do something that sets eax and CF
setc %cl
movl %eax, %edx
movzbl %cl, %ecx
negl %edx
testl %ecx, %ecx
cmovne %edx, %eax
ret
That change replaced testb %cl, %cl with movzbl %cl, %ecx and testl %ecx, %ecx. The program is actually equivalent, though, and GCC knows it. As evidence of this, if I compile with -Os instead of -O3, then both char carry and int carry result in the exact same assembly:
f:
nop # do something that sets eax and CF
jnc .L1
negl %eax
.L1:
ret
It seems like one of two things must be true, but I'm not sure which:
A testb is faster than a movzbl followed by a testl, so GCC's use of the latter with int is a missed optimization.
A testb is slower than a movzbl followed by a testl, so GCC's use of the former with char is a missed optimization.
My gut tells me that an extra instruction will be slower, but I also have a nagging doubt that it's preventing a partial register stall that I just don't see.
By the way, the usual recommended approach of xoring the register to zero before the setc doesn't work in my real example. You can't do it after the inline assembly runs, since xor will overwrite the carry flag, and you can't do it before the inline assembly runs, since in the real context of this code, every general-purpose call-clobbered register is already in use somehow.
There's no downside I'm aware of to reading a byte register with test vs. movzb.
If you are going to zero-extend, it's also a missed optimization not to xor-zero a reg ahead of the asm statement, and setc into that so the cost of zero-extension is off the critical path. (On CPUs other than Intel IvyBridge+ where movzx r32, r8 is not zero latency). Assuming there's a free register, of course. Recent GCC does sometimes find this zero/set-flags/setcc optimization for generating a 32-bit boolean from a flag-setting instruction, but often misses it when things get complex.
Fortunately for you, your real use-case couldn't do that optimization anyway (except with mov $0, %eax zeroing, which would be off the critical path for latency but cause a partial-register stall on Intel P6 family, and cost more code size.) But it's still a missed optimization for your test case.
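If you want to compare the char and int variants without inline assembly in the way, one option is to let the compiler produce the carry itself. The sketch below is an assumption-laden stand-in for the question's f(): negate_if_carry is a hypothetical name, and it relies on the GCC/Clang builtin __builtin_add_overflow rather than on anything from the original code. Which instructions the compiler picks will depend on its version and flags.

```c
/* Hypothetical stand-in for the question's f(): the carry comes from an
   unsigned addition instead of from an inline asm statement. */
int negate_if_carry(unsigned a, unsigned b)
{
    unsigned sum;
    /* GCC/Clang builtin: stores the wrapped sum in sum and returns
       true if the addition carried out of the top bit. */
    _Bool carry = __builtin_add_overflow(a, b, &sum);
    int ret = (int)sum;
    return carry ? -ret : ret;
}
```

Changing the type of carry between _Bool, char and int and diffing the gcc -O3 -S output is a quick way to see whether your compiler still emits the extra movzbl.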

Segmentation fault: 11 With Array Assignment in Loop Using x86 GNU GAS Assembly

This question is similar to another question I posted here. I am attempting to write the Assembly version of the following in c/c++:
int x[10];
for (int i = 0; i < 10; i++){
x[i] = i;
}
Essentially, creating an array storing the values 0 through 9.
My current logic is to create a label that loops up to 10 (calling itself until reaching the end value). In the label, I have placed the instructions to update the array at the current index of iteration. However, after compiling with gcc filename.s and running with ./a.out, the error Segmentation fault: 11 is printed to the console. My code is below:
.data
x:.fill 10, 4
index:.int 0
end:.int 10
.text
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
jmp outer_loop
leave
ret
outer_loop:
movl index(%rip), %eax;
cmpl end(%rip), %eax
jge end_loop
lea x(%rip), %rdi;
mov index(%rip), %rsi;
movl index(%rip), %eax;
movl %eax, (%rdi, %rsi, 4)
incl index(%rip)
jmp outer_loop
leave
ret
end_loop:
leave
ret
Oddly the code below
lea x(%rip), %rdi;
mov index(%rip), %rsi;
movl index(%rip), %eax;
movl %eax, (%rdi, %rsi, 4)
works only if it is not in a label that is called repetitively. Does anyone know how I can implement the code above in a loop, without Segmentation fault: 11 being raised? I am using x86 Assembly on MacOS with GNU GAS syntax compiled with gcc.
Please note that this question is not a duplicate of this question as different Assembly syntax is being used and the scope of the problem is different.
You're using a 64-bit instruction to access a 32-bit area of memory:
mov index(%rip), %rsi;
This results in %rsi being assigned the contents of memory starting from index and ending at end (I'm assuming no alignment, though I don't remember GAS's rules regarding it). Thus, %rsi effectively is assigned the value 0xa00000000 (assuming first iteration of the loop), and executing the following movl %eax, (%rdi, %rsi, 4) results in the CPU trying to access the address that's not mapped by your process.
The solution is to remove the assignment, and replace the line after it with movl index(%rip), %esi. 32-bit operations are guaranteed to always clear out the upper bits of 64-bit registers, so you can then safely use %rsi in the address calculation, as it's going to contain the current index and nothing more.
Your debugger would've told you this, so please do use it next time.
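The effect of the too-wide load can be reproduced in C. This is only a sketch: struct counters, wide_load and narrow_load are hypothetical names, the struct is assumed to have no padding, and the result shown in the note below assumes a little-endian machine such as x86-64.

```c
#include <stdint.h>
#include <string.h>

/* Two adjacent 32-bit values, laid out like index and end in .data.
   Hypothetical type; assumes no padding between the members. */
struct counters {
    uint32_t index;
    uint32_t end;
};

/* Reads 8 bytes starting at index, like the 64-bit mov into %rsi. */
uint64_t wide_load(const struct counters *c)
{
    uint64_t value;
    memcpy(&value, c, sizeof value);
    return value;
}

/* Reads only the 4 bytes of index and zero-extends them,
   like movl index(%rip), %esi. */
uint64_t narrow_load(const struct counters *c)
{
    return (uint64_t)c->index;
}
```

With index = 0 and end = 10, wide_load gives 0xa00000000 on a little-endian machine, while narrow_load gives 0, which is the value the address calculation actually needs.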

Meaning of dollar sign in gnu assembly labels

What is the meaning of a dollar sign in front of a gnu assembly label?
For example, what is the difference between mov msg, %si and mov $msg, %si
(For more context, I'm playing around with the x86 Bare Metal Examples: https://github.com/cirosantilli/x86-bare-metal-examples/blob/master/bios_hello_world.S)
#include "common.h"
BEGIN
mov $msg, %si
mov $0x0e, %ah
loop:
lodsb
or %al, %al
jz halt
int $0x10
jmp loop
halt:
hlt
msg:
.asciz "hello world"
(What do the dollar ($) and percentage (%) signs represent in assembly intel x86? discusses the general use of % before registers and $ before constants, but I don't think it lays out the use of $ with labels nearly as clearly as the answer below.)
You use the $ (dollar) sign to mark an immediate constant, e.g.:
movl $1, %eax (puts 1 into the %eax register)
or to get the address of a variable, e.g.: movl $var, %eax (this takes the address of the label var and puts it into the %eax register).
Without the dollar sign, the operand is a memory reference: movl var, %eax means "load the value stored at the label var and put it into the register".
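If it helps, the distinction maps fairly directly onto C's value-versus-address distinction. A rough analogy only; var, load_value and load_address are hypothetical names chosen for illustration:

```c
/* Hypothetical variable standing in for the label var. */
int var = 42;

/* Like movl var, %eax: load the value stored at the label. */
int load_value(void)
{
    return var;
}

/* Like movl $var, %eax: produce the address of the label itself. */
int *load_address(void)
{
    return &var;
}
```

So mov $msg, %si in the boot sector example corresponds to taking a pointer to the string, which is what lodsb then reads through.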

Assembly multiplication loop returning wrong high number

I am trying to write a for loop that does multiplication by adding a number (var a) to a running sum another number (var b) of times.
.globl times
times:
movl $0, %ecx # i = 0
cmpl %ecx, %esi #if b-i
jge end # if >= 0, jump to end
loop:
addl (%edi, %eax), %eax #sum += a
incl %ecx # i++
cmpl %esi, %ecx # compare (i-b)
jl loop # < 0? loop b times total
end:
ret
Where am I going wrong? I've run through the logic and I can't figure out what the problem is.
TL;DR: you didn't zero EAX, and your ADD instruction is using a memory operand.
You should have used a debugger. You'd easily have seen that EAX wasn't zero to start with. See the bottom of the x86 tag wiki for tips on using gdb to debug asm.
I guess you're using the x86-64 System V ABI, so your args (a and b) are in %edi and %esi.
At the start of a function, registers other than the ones holding your args should be assumed to contain garbage. Even the high parts of registers that are holding your args can contain garbage. (exception to this rule: unofficially, narrow args are sign or zero extended to 32-bit by the caller)
Neither arg is a pointer, so you shouldn't dereference them. add (%edi, %eax), %eax calculates a 32-bit address as EDI+EAX, and then loads 32 bits from there. It adds that dword to EAX (the destination operand).
I'm shocked that your program didn't segfault, since you're using your integer arg as a pointer.
For many x86 instructions (like ADD), the destination operand is not write-only. add %edi, %eax does EAX += EDI. I think you're getting mixed up with 3-operand RISC syntax, where you might have an instruction like add %src1, %src2, %dst.
x86 has some instructions like that, added as recent extensions, like BMI2 bzhi, but the usual instructions are all 2-operand with destructive destinations (except for LEA, which, instead of loading from the address, stores the address in the destination. So lea (%edi, %eax), %eax would work, and you could even put the result in a different register). LEA is great for saving MOV instructions by doing shift+add and a mov all in one instruction, using the addressing mode syntax and machine-code encoding.
You have a comment that says ie eax = sum + (a x 4bits). No clue what you're talking about there. a is 4 bytes (not bits), and you're not multiplying a (%edi) by anything.
Just for fun, here's how I'd write your function (if I had to avoid imul %edi, %esi / mov %esi, %eax). I'll assume both args are non-negative, to keep it simple. If your args are signed integers, and you have to loop -b times if b is negative, then you need some extra code.
# args: int a(%edi), int b(%esi) # comments are important for documenting inputs/outputs to blocks of code
# return value: product in %eax
# assumptions: b is non-negative.
times:
xor %eax, %eax # zero eax
test %esi, %esi # set flags from b
jz loop_end # early-out if it's zero
loop: # do{
add %edi, %eax # sum += a,
dec %esi # b-- (setting flags based on the result, except for CF so don't use ja or jb after it)
jnz loop # }while(b != 0)
loop_end:
ret
Note the indenting style, so it's easy to find the branch targets. Some people like to indent extra for instructions inside loops.
Your way works fine (if you do it right), but my way illustrates that counting down is easier in asm (no need for an extra register or immediate to hold the upper bound). Also, avoiding redundant compares. But don't worry about optimizing until after you're comfortable writing code that at least works.
This is pseudocode (Intel operand order, destination first), so keep that in mind.
mov ebx, X ; counter, your B (assumed to be greater than zero)
mov edx, Y ; value, your A
mov eax, 0 ; result
loop:
add eax, edx
dec ebx
jnz loop ; repeat while EBX is not zero
The implementation above should leave the result in EAX. Your code looks like it's missing the EAX initialisation.
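For reference, here is the same repeated-addition idea in C, which can be handy to compare against your assembly (or to feed to gcc -S). It's a minimal sketch that assumes b is non-negative; times_by_addition is a hypothetical name.

```c
/* Multiply a by b using repeated addition; assumes b is non-negative. */
int times_by_addition(int a, int b)
{
    int sum = 0;                 /* the initialisation the asm was missing */
    for (int i = 0; i < b; i++)
        sum += a;                /* register-to-register add, no memory load */
    return sum;
}
```

Calling times_by_addition(3, 4) adds 3 to the sum four times.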

xorl %eax, %eax in x86_64 assembly code produced by gcc

I'm a total noob at assembly, just poking around a bit to see what's going on. Anyway, I wrote a very simple function:
void multA(double *x,long size)
{
long i;
for(i=0; i<size; ++i){
x[i] = 2.4*x[i];
}
}
I compiled it with:
gcc -S -m64 -O2 fun.c
And I get this:
.file "fun.c"
.text
.p2align 4,,15
.globl multA
.type multA, @function
multA:
.LFB34:
.cfi_startproc
testq %rsi, %rsi
jle .L1
movsd .LC0(%rip), %xmm1
xorl %eax, %eax
.p2align 4,,10
.p2align 3
.L3:
movsd (%rdi,%rax,8), %xmm0
mulsd %xmm1, %xmm0
movsd %xmm0, (%rdi,%rax,8)
addq $1, %rax
cmpq %rsi, %rax
jne .L3
.L1:
rep
ret
.cfi_endproc
.LFE34:
.size multA, .-multA
.section .rodata.cst8,"aM",@progbits,8
.align 8
.LC0:
.long 858993459
.long 1073951539
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
The assembly output makes sense to me (mostly) except for the line xorl %eax, %eax. From googling, I gather that the purpose of this is simply to set %eax to zero, which in this case corresponds to my iterator long i;.
However, unless I am mistaken, %eax is a 32-bit register. So it seems to me that this should actually be xorq %rax, %rax, particularly since this is holding a 64-bit long int. Moreover, further down in the code, it actually uses the 64-bit register %rax to do the iterating, which never gets initialized outside of xorl %eax, %eax, which would seem to only zero out the lower 32 bits of the register.
Am I missing something?
Also, out of curiosity, why are there two .long constants there at the bottom? The first one, 858993459 is equal to the double floating-point representation of 2.4 but I can't figure out what the second number is or why it is there.
"I gather that the purpose of this is simply to set %eax to zero"
Yes.
"which in this case corresponds to my iterator long i;."
No. Your i is uninitialized in the declaration. Strictly speaking, that operation corresponds to the i = 0 expression in the for loop.
"However, unless I am mistaken, %eax is a 32-bit register. So it seems to me that this should actually be xorq %rax, %rax, particularly since this is holding a 64-bit long int."
But on x86-64, writing to a 32-bit register zero-extends the result into the full 64-bit register, so xorl %eax, %eax clears all of %rax. This is not intuitive, but it's implicit in the architecture. The 32-bit form also needs no REX prefix, so its encoding is one byte shorter.
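C can't touch registers directly, but the rule can be modelled with plain integers. The functions below are only a model of the x86-64 behaviour described above, with hypothetical names: a 32-bit write discards the old upper half, whereas a 16-bit write (shown for contrast) leaves the upper bits alone.

```c
#include <stdint.h>

/* Model of a 32-bit register write on x86-64: the upper 32 bits of the
   64-bit register are cleared, whatever they held before. */
uint64_t write_low32(uint64_t old_reg, uint32_t value)
{
    (void)old_reg;               /* the old contents do not survive */
    return (uint64_t)value;
}

/* Model of a 16-bit register write, for contrast: the upper 48 bits
   of the 64-bit register are preserved. */
uint64_t write_low16(uint64_t old_reg, uint16_t value)
{
    return (old_reg & ~(uint64_t)0xFFFF) | value;
}
```

In this model, xorl %eax, %eax corresponds to write_low32(rax, 0), which yields 0 regardless of what rax held.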
Just to answer the second part: .long means 32 bit, and the two integral constants side-by-side form the IEEE-754 representation of the double 2.4:
Dec:  1073951539  858993459
Hex:  0x40033333  0x33333333
      400         3333333333333
      S+E         Mantissa
The exponent is offset by 1023, so the actual exponent is 0x400 − 1023 = 1. The leading "one" in the mantissa is implied, so it's 2^1 × 0b1.001100110011... (You recognize this periodic expansion as 3/15, i.e. 0.2. Sure enough, 2 × 1.2 = 2.4.)
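You can check this decoding yourself with a few lines of C. A small sketch, assuming the usual 64-bit IEEE-754 double; double_from_words and unbiased_exponent are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Combine the two .long values into one 64-bit pattern (low word first
   in the listing, high word second) and reinterpret it as a double. */
double double_from_words(uint32_t low, uint32_t high)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Unbiased exponent: the 11 bits below the sign bit, minus the bias of 1023. */
int unbiased_exponent(uint32_t high)
{
    return (int)((high >> 20) & 0x7FF) - 1023;
}
```

Passing 858993459 as the low word and 1073951539 as the high word reproduces the constant from the listing.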

Resources