I need to combine two 32-bit values to create a 64-bit value. I'm looking for something analogous to MAKEWORD and MAKELONG. I can easily define my own macro or function, but if the API already provides one, I'd prefer to use that.
I cannot find any in the Windows API. However, I do know that you work mostly (or, at least, a lot) with Delphi, so here is a quick Delphi function:
function MAKELONGLONG(A, B: cardinal): UInt64; inline;
begin
  PCardinal(@result)^ := A;
  PCardinal(cardinal(@result) + sizeof(cardinal))^ := B;
end;
Even faster:
function MAKELONGLONG(A, B: cardinal): UInt64;
asm
end;
Explanation: In the normal register calling convention, the first two arguments (if cardinal-sized) are stored in EAX and EDX, respectively. A (cardinal-sized) result is stored in EAX. A 64-bit result, however, is stored in EAX (less significant bits, low address) and EDX (more significant bits, high address); hence we need to move A to EAX and B to EDX, but they are already there!
Personally, I prefer C macros:
#define MAKE_i64(hi, lo) ( ((LONGLONG)((DWORD)(hi) & 0xffffffff) << 32) | (LONGLONG)((DWORD)(lo) & 0xffffffff) )
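For comparison, the same combination can be written as a plain C function with fixed-width types. This is only a minimal sketch, not something the Windows API provides: the name make_u64 and the (hi, lo) argument order are assumptions chosen to match the macro above. Note that the Delphi function takes the low half first.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the Windows API): combines two 32-bit
   halves into one 64-bit value. Widening hi before the shift avoids
   shifting a 32-bit operand by 32 bits, which is undefined in C. */
static inline uint64_t make_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

Using unsigned types throughout sidesteps any question about shifting a bit into the sign position.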
I want to manually set a Pointer to the address stored in a string variable. I have:
addr : String;
ptr : Pointer;
then:
addr:='005F5770';
How do I assign it to ptr?
Like this:
ptr := Pointer($005F5770);
You don't need a string variable since the address is a literal that is known at compile time.
In fact you can make this a constant since the value is known at compile time:
const
ptr = Pointer($005F5770);
Of course, if the value isn't a literal and really does start life as a string with hexadecimal representation then you first need to convert to an integer:
ptr := Pointer(StrToUInt64('$' + S));
Convert it to a UInt64 so that your code is immune to 32 bit pointer truncation when compiled for 64 bit.
Prepend $ or 0x to the hexadecimal string and use the standard StrToInt():
ptr := Pointer(StrToInt('$'+addr));
If your pointer values are large and you are targeting a 64-bit compiler, consider using StrToInt64() instead.
Note that a typecast from integer to a pointer is needed.
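The same string-to-pointer conversion can be sketched in C for comparison. This is an illustration only: the helper name parse_hex_address is hypothetical, and the resulting pointer must not be dereferenced unless the address is known to be valid in the current process.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses a hexadecimal string such as "005F5770"
   (strtoull with base 16 also accepts an optional "0x" prefix) and returns
   it as a pointer. Returns NULL if the string is empty, has trailing
   characters, or does not fit in uintptr_t, so address 0 cannot be told
   apart from a failure. */
static void *parse_hex_address(const char *s)
{
    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(s, &end, 16);
    if (end == s || *end != '\0' || errno == ERANGE || value > UINTPTR_MAX) {
        return NULL;
    }
    /* Go through uintptr_t so the integer is pointer-sized on both
       32-bit and 64-bit targets. */
    return (void *)(uintptr_t)value;
}
```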
I'm trying to output the same string twice in extended inline ASM in GCC, on 64-bit Linux.
int main()
{
const char* test = "test\n";
asm(
"movq %[test], %%rdi\n" // Debugger shows rdi = *address of string*
"movq $0, %%rax\n"
"push %%rbp\n"
"push %%rbx\n"
"call printf\n"
"pop %%rbx\n"
"pop %%rbp\n"
"movq %[test], %%rdi\n" // Debugger shows rdi = 0
"movq $0, %%rax\n"
"push %%rbp\n"
"push %%rbx\n"
"call printf\n"
"pop %%rbx\n"
"pop %%rbp\n"
:
: [test] "g" (test)
: "rax", "rbx","rcx", "rdx", "rdi", "rsi", "rsp"
);
return 0;
}
Now, the string is outputted only once. I have tried many things, but I guess I am missing some caveats about the calling convention. I'm not even sure if the clobber list is correct or if I need to save and restore RBP and RBX at all.
Why is the string not outputted twice?
Looking with a debugger shows me that somehow when the string is loaded into rdi for the second time it has the value 0 instead of the actual address of the string.
I cannot explain why, it seems like after the first call the stack is corrupted? Do I have to restore it in some way?
Specific problem to your code: RDI is not maintained across a function call (see below). It is correct before the first call to printf but is clobbered by printf. You'll need to temporarily store it elsewhere first. A register that isn't clobbered will be convenient. You can then save a copy before printf, and copy it back to RDI after.
I do not recommend doing what you are suggesting (making function calls in inline assembler). It will be very difficult for the compiler to optimize things. It is very easy to get things wrong. David Wohlferd wrote a very good article on reasons not to use inline assembly unless absolutely necessary.
Among other things, the 64-bit System V ABI mandates a 128-byte red zone. That means you can't push anything onto the stack without potential corruption. Remember: doing a CALL pushes a return address on the stack. A quick and dirty way to resolve this problem is to subtract 128 from RSP when your inline assembler starts and then add 128 back when finished.
The 128-byte area beyond the location pointed to by %rsp is considered to
be reserved and shall not be modified by signal or interrupt handlers. Therefore,
functions may use this area for temporary data that is not needed across function
calls. In particular, leaf functions may use this area for their entire stack frame,
rather than adjusting the stack pointer in the prologue and epilogue. This area is
known as the red zone.
Another issue to be concerned about is the requirement for the stack to be 16-byte aligned (or possibly 32-byte aligned depending on the parameters) prior to any function call. This is required by the 64-bit ABI as well:
The end of the input argument area shall be aligned on a 16 (32, if __m256 is
passed on stack) byte boundary. In other words, the value (%rsp + 8) is always
a multiple of 16 (32) when control is transferred to the function entry point.
Note: This requirement for 16-byte alignment upon a CALL to a function is also required on 32-bit Linux for GCC >= 4.5:
In context of the C programming language, function arguments are pushed on the stack in the reverse order. In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary when calling a function (previous versions only required a 4-byte alignment.)
Since we call printf in inline assembler we should ensure that we align the stack to a 16-byte boundary before making the call.
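The red-zone skip and the 16-byte alignment described above are simple address arithmetic. The sketch below only models that computation on plain integers; the function names and any sample address you feed it are made up for illustration.

```c
#include <stdint.h>

/* Models "add $-128, %rsp": step over the 128-byte red zone below RSP. */
static uint64_t skip_red_zone(uint64_t rsp)
{
    return rsp - 128;
}

/* Models "and $-16, %rsp": clear the low four bits, rounding the address
   down to a multiple of 16 (-16 is 0xFFFFFFFFFFFFFFF0 in two's complement). */
static uint64_t align_down_16(uint64_t rsp)
{
    return rsp & ~(uint64_t)15;
}
```

Because the aligned value can be up to 15 bytes lower than the value before the and, the unaligned RSP has to be saved somewhere first so that it can be restored afterwards.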
You also have to be aware that some registers are preserved across a function call and some are not. Those that may be clobbered by a function call are listed in Figure 3.4 of the 64-bit ABI (see previous link): RAX, RCX, RDX, RSI, RDI, R8-R11, XMM0-XMM15, MMX0-MMX7, and ST0-ST7. All of these are potentially destroyed, so they should be put in the clobber list if they don't appear in the input and output constraints.
The following code should satisfy most of the conditions to ensure that inline assembler that calls another function will not inadvertently clobber registers, preserves the redzone, and maintains 16-byte alignment before a call:
int main()
{
const char* test = "test\n";
long dummyreg; /* dummyreg used to allow GCC to pick available register */
__asm__ __volatile__ (
"add $-128, %%rsp\n\t" /* Skip the current redzone */
"mov %%rsp, %[temp]\n\t" /* Copy RSP to available register */
"and $-16, %%rsp\n\t" /* Align stack to 16-byte boundary */
"mov %[test], %%rdi\n\t" /* RDI is address of string */
"xor %%eax, %%eax\n\t" /* Variadic function set AL. This case 0 */
"call printf\n\t"
"mov %[test], %%rdi\n\t" /* RDI is address of string again */
"xor %%eax, %%eax\n\t" /* Variadic function set AL. This case 0 */
"call printf\n\t"
"mov %[temp], %%rsp\n\t" /* Restore RSP */
"sub $-128, %%rsp\n\t" /* Add 128 to RSP to restore to orig */
: [temp]"=&r"(dummyreg) /* Allow GCC to pick available output register. Modified
before all inputs consumed so use & for early clobber*/
: [test]"r"(test), /* Choose available register as input operand */
"m"(test) /* Dummy constraint to make sure test array
is fully realized in memory before inline
assembly is executed */
: "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11",
"xmm0","xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
"xmm8","xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
"mm0","mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7",
"st", "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)"
);
return 0;
}
I used an input constraint to allow the template to choose an available register for passing the address of the string (test). This ensures that we have a register holding that address between the calls to printf. I also get the assembler template to choose an available location for storing RSP temporarily by using a dummy register. The registers chosen will not include any already listed as an input, output, or clobber operand.
This looks very messy, but failure to do it correctly could lead to problems later as your program becomes more complex. This is why calling functions that conform to the System V 64-bit ABI within inline assembler is generally not the best way to do things.
I have a C DLL and want to call it from Delphi XE3 Update 2.
Curiously it seems that in my project calling it dynamically IS different to calling it statically. Here is the 'minimal' code to reproduce (I have changed the Lib/functionnames):
program testProject;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils, System.classes, Windows;
function keylist_open (keylist: PPointer): Integer; external 'libLib';
var
Handle: HINST;
DLLName: PChar = 'libLib.dll';
type
Tkeylist_open = function(keylist: PPointer): Integer; stdcall;
const
keylist_openDynamic: Tkeylist_open = nil;
var
keylist: Pointer;
begin
Handle := LoadLibrary(DLLName);
if Handle = 0 then
Exit;
@keylist_openDynamic := GetProcAddress(Handle, 'keylist_open');
keylist_open(@keylist);
if (keylist = nil) then
Writeln('static: keylist is nil');
keylist_openDynamic(@keylist);
if (keylist = nil) then
Writeln('dynamic: keylist is nil');
end.
The output is
static: keylist is nil
This means that calling the function dynamically behaves differently from calling it statically:
keylist does get initialized correctly by the dynamic call.
Looking into the generated assembler code, I see that the address of the variable 'keylist'
is put into the eax register:
testProject.dpr.34: keylist_open(@keylist);
004D16A2 B804B04D00 mov eax,$004db004
004D16A7 E8ECC6FFFF call keylist_open
then
testProject.dpr.12: function keylist_open (keylist: PPointer): Integer; external 'libLib';
004CDD98 FF255CC54D00 jmp dword ptr [$004dc55c]
and another jump
libLib.keylist_open:
5B364508 E903A23D00 jmp $5b73e710
but then, in the DLL (I do not know which function this is, some entry point or the keylist routine), there is
5B73E710 55 push ebp
5B73E711 8BEC mov ebp,esp
5B73E713 81ECDC000000 sub esp,$000000dc
5B73E719 53 push ebx
5B73E71A 56 push esi
5B73E71B 57 push edi
5B73E71C 8DBD24FFFFFF lea edi,[ebp-$000000dc]
5B73E722 B937000000 mov ecx,$00000037
5B73E727 B8CCCCCCCC mov eax,$cccccccc
...
It seems that the parameter passed in eax is being overwritten.
two lines later the code for the dynamic call is:
testProject.dpr.37: keylist_openDynamic(@keylist);
004D16CE 6804B04D00 push $004db004
004D16D3 FF15F0564D00 call dword ptr [$004d56f0]
jumping to
libLib.keylist_open:
5B364508 E903A23D00 jmp $5b73e710
and thus to the very same code. But as the parameter is now not stored in eax, overwriting eax does not matter.
Can anyone shed some light on what is going wrong here, i.e. what is wrong with my static code and why?
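The two versions differ in the calling convention. The run-time linking variant (Tkeylist_open) is declared stdcall, while the load-time linking variant has no calling convention specified and therefore uses register, Delphi's default, which passes the first argument in EAX rather than on the stack.
Make the calling conventions match and all will be well.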
I am rewriting some C functions in ASM for practicing.
My memset function is setting RAX to the same address passed in the RDI register.
But gcc is sign-extending EAX into RAX with the CDQE instruction.
char super[] = "suuuuuuuuuuper";
res = memset(super, 't', 4);
printf("memset = {%s} (%p) res = %p\n", super, super, res);
Output :
memset = {ttttuuuuuuuper} (0x7fffffd30250) res = 0xffffffffffd30250
Then a segmentation fault would occur if I try to access the address stored in res.
I can just edit the binary file and replace the CDQE instruction with two NOP instructions, and it will run perfectly.
But I was wondering if there's something else, such as a GCC flag, to avoid that instruction?
Make sure that the code calling memset() has seen a proper prototype for memset() so it knows that the function returns a void* instead of an int.
Of course you'll also need to pass the -fno-builtin-memset option to the compiler (or something equivalent) to make sure the compiler calls your function at all.
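The corrupted pointer in the question is what you get when the 64-bit return value is treated as a 32-bit int and then sign-extended back to 64 bits, which is what CDQE does. The sketch below reproduces that arithmetic with the address from the question. The function name is hypothetical, and the narrowing conversion relies on the wraparound behaviour that GCC and Clang provide.

```c
#include <stdint.h>

/* Hypothetical model of the bug: keep only the low 32 bits of the pointer
   (as a return value assumed to be "int" would), then sign-extend to
   64 bits the way CDQE extends EAX into RAX. */
static uint64_t truncate_and_sign_extend(uint64_t ptr)
{
    /* Out-of-range conversion to a signed type is implementation-defined;
       GCC and Clang wrap modulo 2^32. */
    int32_t as_int = (int32_t)(uint32_t)ptr;
    return (uint64_t)(int64_t)as_int;
}
```

Feeding it 0x7fffffd30250 gives 0xffffffffffd30250, the value printed for res, because bit 31 of the low half happens to be set.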
I was playing around a bit to get a better grip on calling conventions and how the stack is handled, but I can't figure out why main allocates three extra double words when setting up the stack (at <main+0>). It's neither aligned to 8 bytes nor 16 bytes, so that's not why as far as I know. As I see it, main requires 12 bytes for the two parameters to func and the return value.
What am I missing?
The program is C code compiled with "gcc -ggdb" on a x86 architecture.
Edit: I removed the -O0 flag from gcc, and it made no difference to the output.
(gdb) disas main
Dump of assembler code for function main:
0x080483d1 <+0>: sub esp,0x18
0x080483d4 <+3>: mov DWORD PTR [esp+0x4],0x7
0x080483dc <+11>: mov DWORD PTR [esp],0x3
0x080483e3 <+18>: call 0x80483b4 <func>
0x080483e8 <+23>: mov DWORD PTR [esp+0x14],eax
0x080483ec <+27>: add esp,0x18
0x080483ef <+30>: ret
End of assembler dump.
Edit: Of course I should have posted the C code:
int func(int a, int b) {
int c = 9;
return a + b + c;
}
void main() {
int x;
x = func(3, 7);
}
The platform is Arch Linux i686.
The parameters to a function (including, but not limited to main) are already on the stack when you enter the function. The space you allocate inside the function is for local variables. For functions with simple return types such as int, the return value will normally be in a register (eax, with a typical 32-bit compiler on x86).
If, for example, main was something like this:
int main(int argc, char **argv) {
char a[35];
return 0;
}
...we'd expect to see at least 35 bytes allocated on the stack as we entered main to make room for a. Assuming a 32-bit implementation, that would normally be rounded up to the next multiple of 4 (36, in this case) to maintain 32-bit alignment of the stack. We would not expect to see any space allocated for the return value. argc and argv would be on the stack, but they'd already be on the stack before main was entered, so main would not have to do anything to allocate space for them.
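The rounding mentioned above is the usual round-up-to-a-power-of-two computation. A minimal sketch, with a hypothetical helper name:

```c
#include <stddef.h>

/* Rounds size up to the next multiple of align.
   align must be a power of two (4, 8, 16, ...). */
static size_t round_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

With size 35 and align 4 this yields 36, matching the figure in the text.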
In the case above, after allocating space for a, a would typically start at [esp-36], argv would be at [esp-44] and argc would be at [esp-48] (or those two might be reversed -- depending on whether arguments were pushed left to right or right to left). In case you're wondering why I skipped [esp-40], that would be the return address.
Edit: Here's a diagram of the stack on entry to the function, and after setting up the stack frame:
Edit 2: Based on your updated question, what you have is slightly roundabout, but not particularly hard to understand. Upon entry to main, it's allocating space not only for the variables local to main, but also for the parameters you're passing to the function you call from main.
That accounts for at least some of the extra space being allocated (though not necessarily all of it).
It's alignment. I assumed for some reason that esp would be aligned from the start, which it clearly isn't.
gcc aligns stack frames to 16 bytes by default, which is what happened.