I am new to 64-bit assembly coding, so I tried some simple programs:
C program:
#include <stdio.h>
extern double bla();
double x=0;
int main() {
x=bla();
printf(" %f",x);
return 0;
}
Assembly:
section .data
section .text
global bla
bla:
mov rax,10
movq xmm0,rax
ret
The result was always 0.0 instead of 10.0.
But when I do it without an immediate, it works fine:
#include <stdio.h>
extern double bla(double y);
double x=0;
double a=10;
int main() {
x=bla(a);
printf("add returned %f",x);
return 0;
}
section .data
section .text
global bla
bla:
movq rax,xmm0
movq xmm0,rbx ;xmm0=0 now
movq xmm0,rax ;xmm0=10 now
ret
Do I need a different instruction to load an immediate into a 64-bit register?
The problem here was that the OP was trying to move 10 into a floating-point register with the following code:
mov rax,10
movq xmm0,rax
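That cannot work, since movq into xmm0 copies the bit pattern of the source unchanged: it assumes the bits are already in floating-point format, and of course they aren't, because 10 is an integer. Reinterpreted as a double, the integer bit pattern 10 is a tiny denormal (about 4.9e-323), which printf("%f") shows as 0.000000.
@Michael Petch's suggestion was to use the (NASM) assembler's floating-point converter as follows: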
mov rax,__float64__(10.0)
movq xmm0,rax
That then produces the expected output.
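The difference between the two instruction sequences can be reproduced in C++. This is a minimal sketch, assuming a 64-bit IEEE-754 double; the helper names bits_to_double and double_to_bits are hypothetical, and std::memcpy stands in for movq, which moves the 64 bits without any numeric conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the 64 bits of an integer as a double, which is what
// `movq xmm0, rax` does: no int-to-float conversion takes place.
inline double bits_to_double(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Get the raw IEEE-754 bit pattern of a double, i.e. the integer that
// NASM's __float64__(10.0) puts into rax.
inline std::uint64_t double_to_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

With these helpers, bits_to_double(10) is the tiny denormal that printf shows as 0.000000, while double_to_bits(10.0) is 0x4024000000000000, the pattern that has to be in rax before movq for xmm0 to hold 10.0.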
case 1->
int a;
std :: cout << a << endl; // prints 0
case 2->
int a;
std :: cout << &a << " " << a << endl; // 0x7ffc057370f4 32764
Whenever I print the address of the variable, it isn't initialized to the default value. Why is that?
I thought the value of a in case 2 is junk, but every time I run the code it shows 32764, 32765, 32766 or 32767. Are these still junk values?
Local variables in C++ (those with automatic storage duration) are not initialized to a default value, so reading one before assigning to it is undefined behavior and there's no way to predict the value.
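If a predictable value is needed, the variable has to be initialized. A minimal sketch of well-defined alternatives to a bare int a; (the function names are hypothetical, for illustration only):

```cpp
#include <cassert>

// int a{} value-initializes: for int that means a guaranteed 0.
int value_initialized() {
    int a{};
    return a;
}

// Variables with static storage duration are zero-initialized
// before first use, even without an initializer.
int zero_initialized_static() {
    static int a;
    return a;
}

// The ordinary fix: give the variable an explicit value.
int explicitly_initialized() {
    int a = 42;
    return a;
}
```

Any of these removes the undefined behavior, so the printed value no longer depends on the compiler, the optimization level, or whether the address of a is taken.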
I'm afraid the accepted answer does not touch the main point of the question:
why
int a;
std :: cout << a << endl; // prints 0
always prints 0, as if a was initialized to its default value, whereas in
int a;
std :: cout << &a << " " << a << endl; // 0x7ffc057370f4 32764
the compiler produces some junk value for a.
Yes, in both cases we have an example of undefined behavior and ANY value for a is possible, so why in Case 1 there's always 0?
First of all remember that a C/C++ compiler is free to modify the source code in an arbitrary way as long as the meaning of the program remains the same. So, if you write
int a;
std :: cout << a << endl; // prints 0
the compiler is free to assume that a need not be associated with any real RAM cells. You never write to a and never take its address, so the compiler is free to keep a in one of its registers (or nowhere at all). In such a case a has no address and is functionally equivalent to something as weird as a "named, addressless temporary". However, in Case 2 you ask the compiler to print the address of a. In such a case the compiler cannot ignore the request and must reserve memory where a lives, even though the value stored there is junk.
The next factor is optimization. You can either switch it off completely in Debug compilation mode or turn on aggressive optimization in Release mode. So you can expect your simple code to behave differently depending on whether you compile it as Debug or Release. Moreover, since it is undefined behavior, your code may run differently if compiled with different compilers or even different versions of the same compiler.
I prepared a version of your program that is a bit easier to analyze:
#include <iostream>
int f()
{
int a;
return a; // prints 0
}
int g()
{
int a;
return reinterpret_cast<long long int>(&a) + a; // prints 0
}
int main() { std::cout << f() << " " << g() << "\n"; }
Function g differs from f in that it uses the address of the uninitialized variable a. I tested it in Godbolt Compiler Explorer: https://godbolt.org/z/os8b583ss You can switch there between various compilers and various optimization options. Please do experiment yourself. For Debug with gcc or clang, use -O0 -g; for Release, use -O3.
For the newest (trunk) gcc, we have the following assembly equivalent:
f():
xorl %eax, %eax
ret
g():
leaq -4(%rsp), %rax
addl -4(%rsp), %eax
ret
main:
subq $24, %rsp
xorl %esi, %esi
movl $_ZSt4cout, %edi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
leaq 12(%rsp), %rsi
movl $_ZSt4cout, %edi
addl 12(%rsp), %esi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
xorl %eax, %eax
addq $24, %rsp
ret
Please notice that f() was reduced to simply zeroing the eax register (xorl %eax, %eax is the usual idiom for that, since x xor x equals 0 for any x). eax is the register in which this function returns its value. Hence 0 in Release. Well, actually, no, the compiler is even smarter: it never calls f()! Instead, it zeroes the esi register that is passed to operator<<. Similarly, g is replaced by reading 12(%rsp), once as a value and once as an address. This produces a random value for a and rather similar values for &a. AFAIK, the addresses are somewhat randomized (address space layout randomization) to make life harder for attackers.
Now the same code in Debug:
f():
pushq %rbp
movq %rsp, %rbp
movl -4(%rbp), %eax
popq %rbp
ret
g():
pushq %rbp
movq %rsp, %rbp
leaq -4(%rbp), %rax
movl %eax, %edx
movl -4(%rbp), %eax
addl %edx, %eax
popq %rbp
ret
main:
pushq %rbp
movq %rsp, %rbp
call f()
movl %eax, %esi
movl $_ZSt4cout, %edi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
call g()
movl %eax, %esi
movl $_ZSt4cout, %edi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
movl $0, %eax
popq %rbp
ret
You can now clearly see, even without knowing x86 assembly (I don't know it either), that in Debug mode (-O0 -g) the compiler performs no optimization at all. In f() it reads a (4 bytes below the frame pointer, -4(%rbp)) and moves it to the "result register" eax. In g(), the same is done, but a is read once as a value and once as an address. Moreover, both f() and g() are called in main(). In this compiler mode, the program produces "random" results for a (try it yourself!).
To make things even more interesting, here are f() and g() as compiled by clang (trunk) in Release:
f(): # @f()
retq
g(): # @g()
retq
Can you see? These functions are so trivial to clang that it generated nothing but a ret for them. Moreover, it did not zero the register corresponding to a, so, unlike g++, clang produces a random value for a (in both Release and Debug).
You can take your experiments even further and find that what clang produces for f depends on whether f or g is called first in main.
Now you should have a better understanding of what Undefined Behavior is.
I wrote a simple program:
constexpr int strlen_c(char const* s)
{
return *s ? 1 + strlen_c(s + 1) : 0;
}
int main()
{
return strlen_c("hello world");
}
I expected that the compiler optimizes the function and evaluates its result in compile time. But actually the generated machine code evaluates the result in a loop:
mov edx, offset aHelloWorld ; "hello world"
loc_408D00:
add edx, 1
mov eax, edx
sub eax, offset aHelloWorld ; "hello world"
cmp byte ptr [edx], 0
jnz short loc_408D00
leave
retn
The program is compiled with g++ version 5.3 using the flags -std=c++11 -Ofast -O2. I get the same result with Visual Studio 2013 and g++ 4.9.
Question: why couldn't the compiler optimize this code?
A constexpr function is not necessarily evaluated at compile time. However, it must be evaluated at compile time if it is used in a context that requires a constant expression, so the following works regardless of compiler optimizations:
int main()
{
constexpr auto len = strlen_c("hello world");
return len;
}
This is the assembly generated for the above code:
main:
mov eax, 11
ret
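A few other constant-expression contexts force the same compile-time evaluation. This is a sketch assuming a C++11 (or later) compiler; the names kLen, LenT, strlen_ce and length_of_hello are hypothetical, and the consteval variant only compiles under C++20, so it is guarded.

```cpp
#include <cassert>
#include <type_traits>

constexpr int strlen_c(char const* s) {
    return *s ? 1 + strlen_c(s + 1) : 0;
}

// Each of these requires a constant expression, so strlen_c must be
// evaluated during compilation (otherwise the code would not compile).
static_assert(strlen_c("hello world") == 11, "computed at compile time");
constexpr int kLen = strlen_c("hello world");
using LenT = std::integral_constant<int, strlen_c("hello world")>;

#if __cplusplus >= 202002L
// C++20: every call to a consteval function must be a constant expression.
consteval int strlen_ce(char const* s) {
    return *s ? 1 + strlen_ce(s + 1) : 0;
}
static_assert(strlen_ce("hello world") == 11, "consteval result");
#endif

// Returns the already-computed constant; no loop is needed at run time.
int length_of_hello() { return kLen; }
```

The static_assert line is the cheapest way to confirm that a given call really is evaluated at compile time: if it were not a constant expression, compilation would fail.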
I successfully called the exit syscall from assembly, but I'm struggling to call the _getpid syscall and use its return value. Here is the code I'm using:
.text
.globl _getpiddirect
_getpiddirect:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl $39, %eax
int $0x80
addl $8, %esp
popl %ebp
ret
and
#include <stdio.h>
#include <unistd.h>
extern unsigned long getpiddirect();
int main(int argc, const char *argv[])
{
printf("%lu\n", getpiddirect());
printf("%lu\n", (unsigned long) getpid());
return 0;
}
getpiddirect keeps returning 4056.
That's because 39 is the code for getppid (get parent process ID), and that is what you're getting as 4056. The getpid code is 20, but please look at /usr/include/sys/syscall.h for the value of SYS_getpid, the exact constant used on your system.
Also, I'm not sure why you reserve 8 bytes on the stack before calling getpid through the interrupt. It doesn't affect anything and is just useless, no?
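From C or C++, the symbolic constants can be used directly through syscall(2), which avoids hard-coding a number that differs between systems (for instance, on Linux x86-64, 39 is getpid itself). A minimal sketch, assuming Linux or macOS where syscall() and <sys/syscall.h> exist (syscall is a platform extension, not standard C++); raw_getpid and raw_getppid are hypothetical names:

```cpp
#include <cassert>
#include <sys/syscall.h>  // SYS_getpid, SYS_getppid for this OS and ABI
#include <unistd.h>       // syscall(), getpid(), getppid()

// Issue the system calls by symbolic number; the header supplies
// the correct value for the platform being compiled for.
long raw_getpid()  { return syscall(SYS_getpid); }
long raw_getppid() { return syscall(SYS_getppid); }
```

Comparing raw_getpid() against getpid() (and raw_getppid() against getppid()) is a quick way to confirm which number belongs to which call before writing the int $0x80 version by hand.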
If I have the following C++ code to compare two 128-bit unsigned integers, with inline amd-64 asm:
struct uint128_t {
uint64_t lo, hi;
};
inline bool operator< (const uint128_t &a, const uint128_t &b)
{
uint64_t temp;
bool result;
__asm__(
"cmpq %3, %2;"
"sbbq %4, %1;"
"setc %0;"
: // outputs:
/*0*/"=r,1,2"(result),
/*1*/"=r,r,r"(temp)
: // inputs:
/*2*/"r,r,r"(a.lo),
/*3*/"emr,emr,emr"(b.lo),
/*4*/"emr,emr,emr"(b.hi),
"1"(a.hi));
return result;
}
Then it will be inlined quite efficiently, but with one flaw. The return value is done through the "interface" of a general register with a value of 0 or 1. This adds two or three unnecessary extra instructions and detracts from a compare operation that would otherwise be fully optimized. The generated code will look something like this:
mov r10, [r14]
mov r11, [r14+8]
cmp r10, [r15]
sbb r11, [r15+8]
setc al
movzx eax, al
test eax, eax
jnz is_lessthan
If I use "sbb %0,%0" with an "int" return value instead of "setc %0" with a "bool" return value, there's still two extra instructions:
mov r10, [r14]
mov r11, [r14+8]
cmp r10, [r15]
sbb r11, [r15+8]
sbb eax, eax
test eax, eax
jnz is_lessthan
What I want is this:
mov r10, [r14]
mov r11, [r14+8]
cmp r10, [r15]
sbb r11, [r15+8]
jc is_lessthan
GCC extended inline asm is wonderful, otherwise. But I want it to be just as good as an intrinsic function would be, in every way. I want to be able to directly return a boolean value in the form of the state of a CPU flag or flags, without having to "render" it into a general register.
Is this possible, or would GCC (and the Intel C++ compiler, which also allows this form of inline asm to be used) have to be modified or even refactored to make it possible?
Also, while I'm at it — is there any other way my formulation of the compare operator could be improved?
Here we are almost 7 years later, and YES, gcc finally added support for "outputting flags" (added in 6.1.0, released ~April 2016). The details are in the GCC documentation on flag output operands, but in short, it looks like this:
/* Test if bit 0 is set in 'value' */
char a;
asm("bt $0, %1"
: "=@ccc" (a)
: "r" (value) );
if (a)
blah;
To understand =@ccc: the output constraint (which requires =) is of type @cc followed by the condition code to use (in this case c, which refers to the carry flag).
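Applied to the comparison from the question, the flag output removes the need for setc or sbb into a general register. A minimal sketch, assuming x86-64 and GCC 6 or later (recent Clang versions also accept this syntax); the operand names are illustrative, and whether the compiler then emits a single jc depends on the surrounding code.

```cpp
#include <cassert>
#include <cstdint>

struct uint128_t { std::uint64_t lo, hi; };

// a < b for 128-bit unsigned values; the carry flag is returned
// directly through the "=@ccc" flag output operand.
inline bool operator<(const uint128_t& a, const uint128_t& b) {
    std::uint64_t hi = a.hi;  // scratch copy: sbb overwrites its destination
    bool borrow;
    __asm__("cmpq %[blo], %[alo]\n\t"  // a.lo - b.lo sets CF on borrow
            "sbbq %[bhi], %[hi]"       // a.hi - b.hi - CF: CF = 1 iff a < b
            : [hi] "+r"(hi), [lt] "=@ccc"(borrow)
            : [alo] "r"(a.lo), [blo] "rm"(b.lo), [bhi] "rm"(b.hi));
    return borrow;
}
```

No "cc" clobber is listed because the flags are now an explicit output of the asm statement.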
OK, this may not be an issue for your specific case anymore (since gcc now supports comparing 128-bit data types directly), but (currently) 1,326 people have viewed this question. Apparently there's some interest in this feature.
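For GCC and Clang on 64-bit targets, the built-in unsigned __int128 type avoids inline asm altogether and leaves the flag handling to the compiler. A sketch under that assumption (unsigned __int128 is a compiler extension, not standard C++); the helper widen is hypothetical, and the exact instructions emitted are up to the optimizer.

```cpp
#include <cassert>
#include <cstdint>

struct uint128_t { std::uint64_t lo, hi; };

// Combine the two halves into the compiler's native 128-bit type.
inline unsigned __int128 widen(const uint128_t& v) {
    return (static_cast<unsigned __int128>(v.hi) << 64) | v.lo;
}

// Plain C++ comparison; the compiler is free to branch on the flags directly.
inline bool operator<(const uint128_t& a, const uint128_t& b) {
    return widen(a) < widen(b);
}
```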
Now I personally favor the school of thought that says don't use inline asm at all. But if you must, yes you can (now) 'output' flags.
FWIW.
I don't know a way to do this. You may or may not consider this an improvement:
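inline bool operator< (const uint128_t &a, const uint128_t &b)
{
uint64_t temp = a.hi;
__asm__(
"cmpq %2, %1;"   // CF = (a.lo < b.lo)
"sbbq $0, %0;"   // temp = a.hi - CF
: // outputs:
/*0*/"=r"(temp)
: // inputs:
/*1*/"r"(a.lo),
/*2*/"mr"(b.lo),
"0"(temp));
// caveat: if a.hi == 0 and a.lo < b.lo, temp wraps to UINT64_MAX and this returns false
return temp < b.hi;
}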
It produces something like:
mov rdx, [r14]
mov rax, [r14+8]
cmp rdx, [r15]
sbb rax, 0
cmp rax, [r15+8]
jc is_lessthan
Can a pointer be used as offset and base of a memory reference with inline assembly?
For example:
int main(){
char a[16],b[16];
asm volatile("\
movq $123,16(%%rsp,%%rbx,1)"
:"=m"(*a)::"rbx");
}
Could it be something like this?
int main(){
char a[16],b[16];
asm volatile("\
movq $123,(%0,%%rbx,1)"
:"=m"(*a)::"rbx");
}
One option is to load the address into an additional register first:
int main(){
char a[16],b[16];
asm volatile("\
lea %0,%%rcx\n\
movq $123,(%%rcx,%%rbx,1)"
:"=m"(*a)::"rbx","rcx");
}
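Another common approach is to pass the pointer as an "r" input operand instead of an "=m" output, so that %0 expands to a register name that can appear inside the addressing mode. A minimal sketch, assuming x86-64 and GCC/Clang extended asm; store_qword is a hypothetical helper, and the "memory" clobber tells the compiler that memory not named in the operands was written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Store an 8-byte value at base + index, using the pointer itself
// as the base register of the (base,index,scale) addressing mode.
inline void store_qword(char* base, long index, std::int64_t value) {
    __asm__ volatile("movq %[val], (%[base],%[idx],1)"
                     :
                     : [base] "r"(base), [idx] "r"(index), [val] "r"(value)
                     : "memory");
}
```

For the question's a[16], store_qword(a, 0, 123) writes 123 into the first 8 bytes of a, and no extra lea or fixed register like rcx or rbx is needed.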